Optimize PostgreSQL server settings for extremely wide tables





We have a dedicated server that runs only a PostgreSQL 9.4.4 instance: 2 physical cores, 16 logical cores, 128 GB RAM.



On this server we have quite atypical data: several tables with approximately 500 columns each, ranging from 0.5 to 2 million rows. All columns are of type smallint or integer. The typical operations we run are:




  • perform an aggregate function on one column, grouped by another column (this may also be a window function)

  • read a subset of columns (e.g. 20) across all rows

  • transform the entire table into a new table (typically with no joins to other tables; these are mostly simple CASE WHEN ... THEN ... END statements — see the sketches after this list)
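
For concreteness, the three kinds of operations might look like the sketches below; all table and column names (wide_table, col_*) are hypothetical placeholders:

-- 1. Aggregate one column grouped by another (a window function works similarly):
SELECT col_17, avg(col_42) AS avg_42
FROM wide_table
GROUP BY col_17;

-- 2. Read a small subset of the ~500 columns, all rows:
SELECT col_1, col_2, col_3 /* ... ~20 columns in total */
FROM wide_table;

-- 3. Rewrite the entire table with simple CASE expressions:
CREATE TABLE wide_table_v2 AS
SELECT id,
       CASE WHEN col_1 IS NULL THEN 0 ELSE col_1 END AS col_1
FROM wide_table;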


There are at most two concurrent users. There is a single-column primary key, but it is almost never used in queries. There are no additional table constraints or indexes.



What would be the recommended configuration of the PostgreSQL server? Increasing shared_buffers to 60GB and work_mem to 10GB? Anything else?
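
For scale: 500 four-byte columns is roughly 2 KB per row (500 × 4 bytes plus a ~24-byte tuple header), so even a 2-million-row table is on the order of 4 GB of heap and fits comfortably in 128 GB of RAM. For reference, the settings in question would be written in postgresql.conf as below; the values shown are the ones proposed above plus one common companion setting, given as a sketch rather than a verified recommendation:

# postgresql.conf excerpt -- proposed values, not a verified recommendation
shared_buffers = 60GB          # as proposed; common guidance is closer to 25% of RAM
work_mem = 10GB                # as proposed; note this applies per sort/hash node, per query
effective_cache_size = 96GB    # assumption: a planner hint for OS cache size; allocates no memory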



Note: Using such strange tables is not bad design in this case. We really do have that many attributes per case. We use this data as input to in-database machine learning (MADlib) and as pre-processing storage before using other statistical tools.










postgresql postgresql-9.4 postgresql-performance






asked Aug 20 '15 at 13:57









Tomas Greif


  • possibly related: stackoverflow.com/questions/2966524/…

    – Jack Douglas
    Aug 20 '15 at 14:31











  • What OS are you running on? Do you have a battery backed RAID card? Do you have an estimate of the size on disk of the table? :)

    – Kassandry
    Aug 20 '15 at 19:53











  • I still question the wisdom of using such wide tables. If that's the best way to model it for your needs then maybe PostgreSQL isn't the best tool for the job.

    – Craig Ringer
    Aug 21 '15 at 1:31



















1 Answer
How many of those columns do you use for grouping? If it's relatively few, then I would recommend restructuring the data into a long format, where each grouping (category) column is kept, and each grouped-by (metric) column is instead jammed into two columns, variable and value, similar to how R's reshape2::melt function works. For instance, a table:



id|cat1|cat2|metric1|metric2


Would become:



id|cat1|cat2|variable|value
id|cat1|cat2|metric1|value of metric 1 column
...
id|cat1|cat2|metric2|value of metric 2 column


The table would become K times longer, with K being the number of metrics you'd like to melt. This can actually improve query performance if you add indexes on your category columns.
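
In PostgreSQL this melt can be written with a LATERAL VALUES list; a minimal sketch, reusing the hypothetical names from the example above:

-- Melt metric1/metric2 into (variable, value) pairs; all names are illustrative.
CREATE TABLE long_table AS
SELECT w.id, w.cat1, w.cat2, m.variable, m.value
FROM wide_table AS w
CROSS JOIN LATERAL (
    VALUES ('metric1', w.metric1),
           ('metric2', w.metric2)
) AS m (variable, value);

-- Index the category columns so grouped aggregations can use them.
CREATE INDEX ON long_table (cat1, cat2, variable);

Per-metric aggregations then become ordinary GROUP BY queries filtered on variable.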



If that doesn't improve performance, then I'd recommend using a tool other than Postgres, such as Apache Spark.






answered 14 mins ago by cangers (a new contributor)