Optimize PostgreSQL server settings for extremely wide tables
We have a dedicated server running only PostgreSQL 9.4.4: 2 physical cores, 16 logical cores, 128 GB RAM.
On this server we have quite atypical data: several tables with approx. 500 columns each, ranging from 0.5 to 2 million rows. All columns are of smallint or integer data type. The typical operations we do:
- perform an aggregation function on one column, grouped by another column (this may also be a window function)
- read a subset of columns (e.g. 20) across all rows
- transform the entire table into a new table (typically with no joins to other tables; these are mostly simple CASE WHEN ... THEN ... END expressions), as sketched below
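For concreteness, the three operation types look roughly like this (wide_table and its columns are hypothetical names, not our actual schema):

    -- 1) Aggregate one column grouped by another (sometimes as a window function instead):
    SELECT group_col, avg(metric_col) AS avg_metric
    FROM wide_table
    GROUP BY group_col;

    SELECT id, metric_col,
           avg(metric_col) OVER (PARTITION BY group_col) AS group_avg
    FROM wide_table;

    -- 2) Read a small subset of columns (e.g. 20 of the ~500) across all rows:
    SELECT col_001, col_002, /* ... */ col_020
    FROM wide_table;

    -- 3) Transform the whole table into a new one with simple CASE expressions:
    CREATE TABLE wide_table_derived AS
    SELECT id,
           CASE WHEN col_001 > 0 THEN 1 ELSE 0 END AS col_001_flag,
           CASE WHEN col_002 BETWEEN 1 AND 5 THEN col_002 ELSE NULL END AS col_002_clean
    FROM wide_table;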
There are two concurrent users at maximum. There is a primary key (single column), but this is almost never used in queries. There are no additional table constraints or indexes.
What would be the recommended PostgreSQL server configuration? Increasing shared_buffers to 60GB and work_mem to 10GB? Anything else?
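To make that concrete, the settings under consideration would look like this in postgresql.conf; the values are the ones proposed above (plus effective_cache_size, added here only as an assumption), not tested recommendations:

    # postgresql.conf sketch -- values under consideration, not verified recommendations
    shared_buffers = 60GB          # roughly half of the 128GB RAM held in PostgreSQL's own buffer cache
    work_mem = 10GB                # per sort/hash operation; a single query can use several multiples of this
    effective_cache_size = 100GB   # assumption: planner hint about memory available for caching, not an allocation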
Note: Using such wide tables is not bad design in this case; we really do have that many attributes per case. We use this data as input to in-database machine learning (MADlib) and as pre-processing storage before feeding other statistical tools.
postgresql postgresql-9.4 postgresql-performance
asked Aug 20 '15 at 13:57
Tomas Greif
possibly related: stackoverflow.com/questions/2966524/…
– Jack Douglas♦
Aug 20 '15 at 14:31
What OS are you running on? Do you have a battery backed RAID card? Do you have an estimate of the size on disk of the table? :)
– Kassandry
Aug 20 '15 at 19:53
I still question the wisdom of using such wide tables. If that's the best way to model it for your needs then maybe PostgreSQL isn't the best tool for the job.
– Craig Ringer
Aug 21 '15 at 1:31
1 Answer
How many of those columns do you use for grouping? If it's relatively few, I would recommend restructuring the data into a long format, where each grouping (category) column is kept as-is and each metric column is instead folded into two columns, variable and value, similar to how R's reshape2::melt function works. For instance, a table:
id | cat1 | cat2 | metric1 | metric2
Would become:
id | cat1 | cat2 | variable | value
id | cat1 | cat2 | metric1  | value of the metric1 column
...
id | cat1 | cat2 | metric2  | value of the metric2 column
The table becomes K times longer, with K being the number of metrics you melt. This can actually improve query performance if you add indexes on your category columns; a sketch of the melt in SQL follows.
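For instance, the melt can be written in plain SQL with a LATERAL cross join over a VALUES list, producing one output row per metric (the table and column names here are just the illustrative ones from above):

    -- Illustrative melt of two metric columns into long format
    CREATE TABLE long_table AS
    SELECT w.id, w.cat1, w.cat2, m.variable, m.value
    FROM wide_table AS w
    CROSS JOIN LATERAL (
        VALUES ('metric1', w.metric1),
               ('metric2', w.metric2)
    ) AS m(variable, value);

    -- Then index the category columns to speed up grouped aggregates:
    CREATE INDEX ON long_table (cat1);
    CREATE INDEX ON long_table (cat2);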
If that doesn't improve performance, then I'd recommend using a different tool than Postgres, such as Apache Spark.
answered 14 mins ago
cangers