Why does MongoDB extend FileSize if it is already 5x larger than DataSize?
I recently had to assign more disk space to my MongoDB 2.4.8 instance.
This instance continually receives transactions, makes some updates to them, and then deletes them after 3 months. I would therefore expect disk usage to stay relatively constant.
The documents have a relatively uniform size of about 5 KB.
db.stats()
{
    "db" : "mydb",
    "collections" : 16,
    "objects" : 4.71578e+006,
    "avgObjSize" : 5368.2594088278856000,
    "dataSize" : 25315551828.0000000000000000,
    "storageSize" : 111230508336.0000000000000000,
    "numExtents" : 128,
    "indexes" : 41,
    "indexSize" : 1398799136.0000000000000000,
    "fileSize" : 122280738816.0000000000000000,
    "nsSizeMB" : 16,
    "dataFileVersion" : {
        "major" : 4,
        "minor" : 5
    },
    "ok" : 1.0000000000000000
}
I understand that disk usage will be larger than data size due to preallocation and fragmentation, but I cannot see any reasonable explanation for a 5 to 1 ratio other than a large historical delete or a bug.
Is MongoDB unable to reuse space properly, so that we must schedule manual repair jobs on otherwise completely stable systems, or do I have another problem somewhere?
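For reference, the ratios behind the "5 to 1" figure can be worked out directly from the numbers above. This is only a small sketch in plain JavaScript: `spaceRatios` is a hypothetical helper, not a MongoDB function, and treating `storageSize - dataSize` as "space not holding document data" is a rough approximation.

```javascript
// Values copied from the db.stats() output above (all in bytes).
var stats = {
  dataSize: 25315551828,
  storageSize: 111230508336,
  indexSize: 1398799136,
  fileSize: 122280738816
};

// Hypothetical helper: summarises how much of the allocated space holds data.
function spaceRatios(s) {
  return {
    storageToData: s.storageSize / s.dataSize,
    fileToData: s.fileSize / s.dataSize,
    // Rough estimate of bytes inside allocated extents that hold no document data.
    unusedInExtents: s.storageSize - s.dataSize
  };
}

var ratios = spaceRatios(stats);
console.log(ratios.storageToData.toFixed(2)); // 4.39
console.log(ratios.fileToData.toFixed(2));    // 4.83
console.log((ratios.unusedInExtents / Math.pow(1024, 3)).toFixed(1) + " GiB"); // 80.0 GiB
```

In the mongo shell the same helper could presumably be fed `db.stats()` directly, since it only reads those four fields.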
mongodb mongo-repair
Maybe the following article can help you further: stackoverflow.com/questions/13390160/…
– aldwinaldwin
Jul 13 '15 at 9:15
Most probably it is indeed the deleted space that isn't being reused. Since you need to do maintenance anyway to recover the space, it may be advisable to upgrade to v3.0 at the same time. There seem to be a lot of improvements between 2.4 and 3.0. Also, instead of deleting/removing everything older than 3 months yourself, have a look at TTL indexes.
– aldwinaldwin
Jul 13 '15 at 9:21
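As a rough illustration of the TTL suggestion, the sketch below builds the arguments for a TTL index that expires documents after 90 days. The helper `ttlIndexSpec`, the collection name `transactions`, and the field name `createdAt` are all assumptions made for the example; the indexed field would need to hold a date value.

```javascript
// Hypothetical helper: builds the key pattern and options for a TTL index.
function ttlIndexSpec(dateField, days) {
  var keys = {};
  keys[dateField] = 1; // ascending index on the date field
  return {
    keys: keys,
    options: { expireAfterSeconds: days * 24 * 60 * 60 }
  };
}

var spec = ttlIndexSpec("createdAt", 90);
console.log(JSON.stringify(spec));
// {"keys":{"createdAt":1},"options":{"expireAfterSeconds":7776000}}

// In a 2.4 mongo shell this would be applied roughly as:
//   db.transactions.ensureIndex(spec.keys, spec.options)
```

Note that a TTL index only automates the delete. The expired documents are still removed like any other, so by itself it would not be expected to change how the freed space is reused.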
Thanks. The article you refer to does address the common issue of reclaiming space. However, my concern is not reclamation but the fact that the database file size never seems to stabilize even with a constant amount of data. I would expect that at some point there should be 100% reuse of the provisioned space.
– Karl Ivar Dahl
Jul 13 '15 at 10:07
It seems that 2.4 uses 'exact fit allocation', which means that free space is only reused when a new document happens to fit into it, and that is unlikely. The power of 2 sizes allocation strategy can efficiently reuse freed records to reduce fragmentation. Quantizing record allocation sizes into a fixed set of sizes increases the probability that an insert will fit into the free space created by an earlier document deletion or relocation (docs.mongodb.org/manual/core/storage/#power-of-2-allocation). 100% reuse without any maintenance is not possible, except for capped collections.
– aldwinaldwin
Jul 13 '15 at 10:21
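To make the quantization idea concrete, here is a minimal sketch that rounds a record size up to the next power of two. It is a simplified model, not MongoDB's allocator: `nextPowerOf2` is a hypothetical helper, and real records also carry a small header that is ignored here.

```javascript
// Simplified model: smallest power of two that is >= the requested size.
function nextPowerOf2(bytes) {
  var size = 1;
  while (size < bytes) {
    size *= 2;
  }
  return size;
}

var avgObjSize = 5368; // avgObjSize from the db.stats() output, rounded down
var slotSize = nextPowerOf2(avgObjSize);

console.log(slotSize);                             // 8192
console.log((slotSize / avgObjSize).toFixed(2));   // 1.53
```

Under this model every ~5 KB document occupies an 8 KB slot, so each slot freed by a delete fits any later document of similar size. The price is roughly 1.5x space per document, which is still far below the ratio reported in the question.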
You are basically seeing poor re-use (I wrote the SO answer linked above). There are several reasons why it can happen: the free list can become poisoned (rare), new data may not fit in the free space, or you may be hitting the timeout on the free list search, which defaults to allocating new space rather than slowing things down. You should look at the power of 2 sizes option as mentioned; you can also schedule repairs/resyncs every few weeks to reclaim space, or look at the newer versions and storage engines for better disk space utilisation in general.
– Adam C
Jul 13 '15 at 11:31
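The "give up and allocate new space" behaviour described in this comment can be pictured with a toy model. The sketch below is not MongoDB's actual free list code: `takeFreeSlot` is a hypothetical function, and a fixed probe budget (`maxProbes`) stands in for the search timeout.

```javascript
// Toy model only. freeSlots is an array of free slot sizes in bytes.
// Returns the index of the slot that was reused (and removes it from the list),
// or -1 when no fitting slot is found within the probe budget, in which case
// the caller would have to allocate fresh space.
function takeFreeSlot(freeSlots, needed, maxProbes) {
  var limit = Math.min(freeSlots.length, maxProbes);
  for (var i = 0; i < limit; i++) {
    if (freeSlots[i] >= needed) {
      freeSlots.splice(i, 1);
      return i;
    }
  }
  return -1;
}

var exactFitSlots = [5200, 5300, 5350, 5400, 6000];

// Fitting slots exist, but they lie beyond the probe budget.
console.log(takeFreeSlot(exactFitSlots.slice(), 5368, 3)); // -1

// With a larger budget, the slot at index 3 is reused.
console.log(takeFreeSlot(exactFitSlots.slice(), 5368, 5)); // 3

// With uniform (quantized) slot sizes, the first probe already fits.
console.log(takeFreeSlot([8192, 8192, 8192], 5368, 3));    // 0
```

The first call shows how space can keep growing even though suitable free slots exist: slightly-too-small slots at the front of the list use up the search budget.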
edited Jul 13 '15 at 9:58
Karl Ivar Dahl
asked Jul 13 '15 at 8:46
Karl Ivar Dahl
1 Answer
Based on the comments I have received, the following actions seem to address my concerns:
- Migrate existing collections to power of 2 sizes.
- Run repair or compact periodically to keep the free list search efficient, so that the fallback allocation of new disk space on timeout is avoided.
- Only capped collections should be considered "100% maintenance-free".
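As a sketch of what the first two actions might look like in practice, the helper below builds the corresponding command documents. `maintenanceCommands` and the collection name `transactions` are assumptions made for the example; `collMod` with `usePowerOf2Sizes`, `compact`, and `repairDatabase` are the 2.4-era commands, each of which would be passed to `db.runCommand(...)` in the mongo shell.

```javascript
// Hypothetical helper: builds the command documents for the actions above.
// The command name must be the first key in each document.
function maintenanceCommands(collectionName) {
  return {
    // Switch the collection to power of 2 sizes for future allocations.
    enablePowerOf2: { collMod: collectionName, usePowerOf2Sizes: true },
    // Defragment a single collection.
    compact: { compact: collectionName },
    // Rewrite the whole database, releasing unused space.
    repair: { repairDatabase: 1 }
  };
}

var cmds = maintenanceCommands("transactions");
console.log(JSON.stringify(cmds.enablePowerOf2));
// {"collMod":"transactions","usePowerOf2Sizes":true}

// In the mongo shell, for example:
//   db.runCommand(cmds.enablePowerOf2)
//   db.runCommand(cmds.compact)
```

Changing the allocation strategy should only affect records allocated afterwards, and both `compact` and `repairDatabase` block normal operations while they run, so they are best tried on a secondary or in a maintenance window first.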
answered Jul 14 '15 at 8:48
Karl Ivar Dahl