Why does MongoDB extend FileSize if it is already 5x larger than DataSize



























I recently had to assign more disk space to my MongoDB 2.4.8 instance.
This instance continually receives transactions, makes some updates to them, and deletes them after 3 months. I would therefore expect disk usage to stay relatively constant.
The documents have a relatively uniform size of about 5 KB.



db.stats()
{
    "db" : "mydb",
    "collections" : 16,
    "objects" : 4.71578e+006,
    "avgObjSize" : 5368.2594088278856000,
    "dataSize" : 25315551828.0000000000000000,
    "storageSize" : 111230508336.0000000000000000,
    "numExtents" : 128,
    "indexes" : 41,
    "indexSize" : 1398799136.0000000000000000,
    "fileSize" : 122280738816.0000000000000000,
    "nsSizeMB" : 16,
    "dataFileVersion" : {
        "major" : 4,
        "minor" : 5
    },
    "ok" : 1.0000000000000000
}
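As a quick sanity check of the ratio in question, the figures from the output above can be plugged into a few lines of plain JavaScript (written for Node.js). The helper name `ratio` and the variable names are only illustrative; the byte counts are copied from `db.stats()`.

```javascript
// Byte counts copied from the db.stats() output above.
const stats = {
  dataSize: 25315551828,
  storageSize: 111230508336,
  indexSize: 1398799136,
  fileSize: 122280738816,
};

// Illustrative helper, not part of the mongo shell.
function ratio(numerator, denominator) {
  return numerator / denominator;
}

const GiB = 1024 * 1024 * 1024;

// How much larger the data files are than the documents they hold.
const fileToData = ratio(stats.fileSize, stats.dataSize);

// How much larger the allocated extents are than the documents they hold.
const storageToData = ratio(stats.storageSize, stats.dataSize);

// Space inside the extents that is not occupied by document data.
const unusedInExtents = stats.storageSize - stats.dataSize;

console.log("fileSize / dataSize    =", fileToData.toFixed(2));
console.log("storageSize / dataSize =", storageToData.toFixed(2));
console.log("unused space in extents (GiB) =", (unusedInExtents / GiB).toFixed(1));
```

Both ratios come out between 4 and 5, so most of the gap sits inside the extents already allocated to the collections rather than in preallocated but unused data files.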


I understand that disk usage will be larger than data size due to preallocation and fragmentation, but I cannot see any reasonable explanation for a 5 to 1 ratio other than a large historical delete or a bug.



Is MongoDB unable to reuse space properly, so that we must schedule manual repair jobs on otherwise completely stable systems, or do I have another problem somewhere?





















  • Maybe the following article can help you further: stackoverflow.com/questions/13390160/…

    – aldwinaldwin
    Jul 13 '15 at 9:15











  • Most probably it is indeed the deleted space that isn't being reused. Since you need to do maintenance anyway to recover the space, it may be advisable to upgrade to v3.0 at the same time. There seem to be a lot of improvements between 2.4 and 3.0. Also, instead of deleting/removing everything older than 3 months yourself, have a look at the TTL index.

    – aldwinaldwin
    Jul 13 '15 at 9:21
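    For reference, a TTL index for a three-month retention could be sketched as follows in the mongo shell. The collection name `transactions` and the field name `createdAt` are hypothetical, and the indexed field is assumed to hold a BSON Date; the function takes the shell's `db` object as a parameter.

```javascript
// Sketch for the mongo shell on 2.4. "transactions" and "createdAt" are
// hypothetical names; the indexed field is assumed to hold a BSON Date.
var NINETY_DAYS_IN_SECONDS = 90 * 24 * 60 * 60;

function addExpiryIndex(db) {
  // ensureIndex is the 2.4-era shell helper for creating an index.
  return db.transactions.ensureIndex(
    { createdAt: 1 },
    { expireAfterSeconds: NINETY_DAYS_IN_SECONDS }
  );
}
```

    In the shell this would be called as `addExpiryIndex(db)`. A TTL index only automates the deletes; it does not by itself change how the freed space is reused.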













  • Thanks. The article you refer to does address the common issue of reclaiming space. I am, however, not concerned about reclamation but about the fact that the database file size never seems to stabilize even with a constant amount of data. I would expect that at some point there should be 100% reuse of provisioned space.

    – Karl Ivar Dahl
    Jul 13 '15 at 10:07






  • It seems that 2.4 uses 'exact fit allocation', which means that free space 'might' be reused by a document that fits it exactly, but that is unlikely. The power of 2 sizes allocation strategy can efficiently reuse freed records to reduce fragmentation. Quantizing record allocation sizes into a fixed set of sizes increases the probability that an insert will fit into the free space created by an earlier document deletion or relocation. See docs.mongodb.org/manual/core/storage/#power-of-2-allocation. 100% reuse without any maintenance is not possible, except for capped collections.

    – aldwinaldwin
    Jul 13 '15 at 10:21








  • You are basically seeing poor re-use (I wrote the SO answer linked above). There are several reasons why it can happen: the free list can become poisoned (rare), new data may not fit in the free space, or you may be hitting the timeout on the free list search, in which case MongoDB defaults to allocating new space rather than slowing things down. You should look at the power of 2 sizes option as mentioned, you can schedule repairs/resyncs every few weeks to reclaim space, or you can look at the newer versions and storage engines for better disk space utilisation in general.

    – Adam C
    Jul 13 '15 at 11:31
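To illustrate why quantized record sizes make reuse more likely, here is a deliberately simplified model of power of 2 sizing. It is not MongoDB's allocator: the function name `nextPowerOf2`, the 32-byte starting bucket and the sample document sizes are assumptions made for this sketch, and record headers and any upper size limits are ignored.

```javascript
// Simplified model of power-of-2 record sizing. This is NOT MongoDB's
// allocator: it ignores record headers and any upper size limits.
function nextPowerOf2(bytes) {
  var size = 32; // assumed smallest bucket for this sketch
  while (size < bytes) {
    size *= 2;
  }
  return size;
}

// Hypothetical document sizes (bytes) scattered around the 5 KB average.
var documentSizes = [4900, 5368, 5900, 7400];

documentSizes.forEach(function (bytes) {
  console.log(bytes + " bytes -> record of " + nextPowerOf2(bytes) + " bytes");
});
```

In this model every sample size lands in the same 8192-byte bucket, so a record freed by one document can be reused by any of the others. The price is slack inside each record, roughly a third of it for a document of average size here.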























mongodb mongo-repair














edited Jul 13 '15 at 9:58







Karl Ivar Dahl

















asked Jul 13 '15 at 8:46









Karl Ivar Dahl















1 Answer














Based on the comments I have received, the following actions seem to address my concerns:

  • Migrate existing collections to power of 2 sizes.

  • Run repair or compact periodically so that the free list search stays fast and the fallback of allocating new disk space on timeout is avoided.

  • Treat only capped collections as "100% maintenance-free".
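A minimal sketch of the first two actions for the mongo shell, assuming the periodic maintenance step refers to the `compact` command or `db.repairDatabase()`. The function names are hypothetical wrappers that take the shell's `db` object and a collection name; only `collMod` with `usePowerOf2Sizes`, `compact` and `db.repairDatabase()` are MongoDB's own.

```javascript
// Sketch for the mongo shell on 2.4. Function names are hypothetical.

// Switch a collection to power of 2 record sizes. This affects only
// records allocated from now on; existing records keep their sizes.
function enablePowerOf2Sizes(db, collectionName) {
  return db.runCommand({ collMod: collectionName, usePowerOf2Sizes: true });
}

// Defragment one collection. compact blocks other operations on the
// database while it runs, so it belongs in a maintenance window.
function compactCollection(db, collectionName) {
  return db.runCommand({ compact: collectionName });
}

// Rewrite the whole database. This needs spare disk space and also
// blocks the database while it runs.
function repairWholeDatabase(db) {
  return db.repairDatabase();
}
```

In the shell this might be used as `enablePowerOf2Sizes(db, "transactions")`, where the collection name is again hypothetical. As far as I understand, on this storage engine `compact` tidies up space inside the existing data files without shrinking them, while a repair or a resync of a replica set member is what actually returns space to the operating system.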
















        answered Jul 14 '15 at 8:48









        Karl Ivar Dahl





























