Optimize PostgreSQL server settings for extremely wide tables
We have a dedicated server running only PostgreSQL 9.4.4: 2 physical cores, 16 logical cores, 128 GB RAM.
On this server we have quite atypical data: several tables with approx. 500 columns each, ranging from 0.5 to 2 million rows. All columns are of smallint or integer data type. The typical operations we do:
- perform an aggregation function on one column, grouped by another column (this may also be a window function)
- read a subset of columns (e.g. 20) across all rows
- transform the entire table into a new table (typically with no joins to other tables; these are mostly simple CASE WHEN ... THEN ... END expressions), as sketched below
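For concreteness, the three operation types look roughly like this (wide_table and its columns are hypothetical names, not our actual schema):

    -- 1) Aggregate one column grouped by another (sometimes as a window function instead):
    SELECT group_col, avg(metric_col) AS avg_metric
    FROM wide_table
    GROUP BY group_col;

    SELECT id, metric_col,
           avg(metric_col) OVER (PARTITION BY group_col) AS group_avg
    FROM wide_table;

    -- 2) Read a small subset of columns (e.g. 20 of the ~500) across all rows:
    SELECT col_001, col_002, /* ... */ col_020
    FROM wide_table;

    -- 3) Transform the whole table into a new one with simple CASE expressions:
    CREATE TABLE wide_table_derived AS
    SELECT id,
           CASE WHEN col_001 > 0 THEN 1 ELSE 0 END AS col_001_flag,
           CASE WHEN col_002 BETWEEN 1 AND 5 THEN col_002 ELSE NULL END AS col_002_clean
    FROM wide_table;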
There are two concurrent users at maximum. There is a primary key (single column), but this is almost never used in queries. There are no additional table constraints or indexes.
What would be the recommended PostgreSQL server configuration? Increasing shared_buffers to 60GB and work_mem to 10GB? Anything else?
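To make that concrete, the settings under consideration would look like this in postgresql.conf; the values are the ones proposed above (plus effective_cache_size, added here only as an assumption), not tested recommendations:

    # postgresql.conf sketch -- values under consideration, not verified recommendations
    shared_buffers = 60GB          # roughly half of the 128GB RAM held in PostgreSQL's own buffer cache
    work_mem = 10GB                # per sort/hash operation; a single query can use several multiples of this
    effective_cache_size = 100GB   # assumption: planner hint about memory available for caching, not an allocation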
Note: Using such wide tables is not bad design in this case; we really do have that many attributes per case. We use this data as input to in-database machine learning (MADlib) and as pre-processing storage before feeding other statistical tools.
postgresql postgresql-9.4 postgresql-performance
asked Aug 20 '15 at 13:57
Tomas Greif
possibly related: stackoverflow.com/questions/2966524/…
– Jack Douglas♦
Aug 20 '15 at 14:31
What OS are you running on? Do you have a battery backed RAID card? Do you have an estimate of the size on disk of the table? :)
– Kassandry
Aug 20 '15 at 19:53
I still question the wisdom of using such wide tables. If that's the best way to model it for your needs then maybe PostgreSQL isn't the best tool for the job.
– Craig Ringer
Aug 21 '15 at 1:31
1 Answer
How many of those columns do you use for grouping? If it's relatively few, I would recommend restructuring the data into a long format, where each grouping (category) column is kept as-is and each metric column is instead folded into two columns, variable and value, similar to how R's reshape2::melt function works. For instance, a table:
id | cat1 | cat2 | metric1 | metric2
Would become:
id | cat1 | cat2 | variable | value
id | cat1 | cat2 | metric1  | value of the metric1 column
...
id | cat1 | cat2 | metric2  | value of the metric2 column
The table becomes K times longer, with K being the number of metrics you melt. This can actually improve query performance if you add indexes on your category columns; a sketch of the melt in SQL follows.
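For instance, the melt can be written in plain SQL with a LATERAL cross join over a VALUES list, producing one output row per metric (the table and column names here are just the illustrative ones from above):

    -- Illustrative melt of two metric columns into long format
    CREATE TABLE long_table AS
    SELECT w.id, w.cat1, w.cat2, m.variable, m.value
    FROM wide_table AS w
    CROSS JOIN LATERAL (
        VALUES ('metric1', w.metric1),
               ('metric2', w.metric2)
    ) AS m(variable, value);

    -- Then index the category columns to speed up grouped aggregates:
    CREATE INDEX ON long_table (cat1);
    CREATE INDEX ON long_table (cat2);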
If that doesn't improve performance, then I'd recommend using a different tool than Postgres, such as Apache Spark.
answered 14 mins ago
cangers