Oracle: Database writers idle until 'Checkpoint not complete' even with multiple redo logs

I work on an application with a steady data inflow of 400 MB/hr. For that purpose we have set up 10 redo log groups of 100 MB each, so under normal load we roll over a log approximately every 15 minutes.



Because of limited disk space, we have a retention job that deletes 'old' data after a certain time. When this job runs, it performs cascading deletes based on timestamps. So far so good...



While the data removal process is running, the redo log switch interval shortens to about 15 seconds, and after 9 log rollovers we see the infamous 'Checkpoint not complete' message.
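As a reference point, the switch frequency can be read straight from the data dictionary. The following is only a minimal sketch; V$LOG_HISTORY is the standard view (one row per switch), and the per-hour grouping is an arbitrary choice for illustration:

    -- Count redo log switches per hour; the retention-job window shows up
    -- as a spike from ~4 switches/hour to a switch every ~15 seconds.
    SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour,
           COUNT(*)                               AS log_switches
    FROM   v$log_history
    GROUP  BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
    ORDER  BY 1;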



We observed with top and iotop how the log writer (ora_lgwr_XXXX) and the database writers (ora_dbwX_XXXX) behave, and found that the database writers are completely idle right up to the point of 'Checkpoint not complete'. The log writer writes at MB/s rates while the database writers sit idle. According to the documentation, the database writers start writing when a checkpoint is initiated, and a checkpoint is initiated at every redo log rollover.



So, why are the database writers not working until the redo logs are all full?



Shouldn't I see some significant I/O from the database writers after the first checkpoint that is created by the first redo log rollover?



We see only a few accumulated kB for the two database writers before the 'Checkpoint not complete' message, that is, over at least 9 redo log rollovers. Afterwards, the database writers write the data to the database files at MB/s rates.



Can the database writer processes be made more aggressive, ideally depending on the amount of data pending in the redo logs? I do not need every byte integrated into the database files immediately, but when a lot of data is pending in the database buffer cache (and the redo logs are still active) and a lot of data keeps coming in, shouldn't there be a mechanism that drives the database writers to higher throughput?
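For context, the checkpointing targets that actually drive DBWR can be inspected as sketched below. V$INSTANCE_RECOVERY and V$PARAMETER are standard views in 11.2; which columns are the interesting ones here is an assumption on my part:

    -- How far DBWR lags behind the checkpoint target, and what that target is.
    SELECT target_mttr,          -- effective MTTR target in seconds (0 = none)
           estimated_mttr,       -- estimated crash-recovery time right now
           actual_redo_blks,     -- redo blocks that would need recovery
           target_redo_blks,     -- lag DBWR is being asked to maintain
           optimal_logfile_size, -- Oracle's suggested redo log size (MB)
           ckpt_block_writes     -- buffers written because of checkpointing
    FROM   v$instance_recovery;

    -- The parameters that control incremental checkpointing on this instance.
    SELECT name, value
    FROM   v$parameter
    WHERE  name IN ('fast_start_mttr_target',
                    'log_checkpoint_interval',
                    'log_checkpoint_timeout');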



Metadata:




  • 8 vCPUs

  • 56 GB RAM

  • Oracle 11.2.0.3

  • log_checkpoint_interval=0

  • log_checkpoint_timeout=1800

  • db_writer_processes=1

  • 10 redo log groups, 100MB each

  • 1× 1 TB partition via iSCSI for all data (oradata)










Tags: oracle · performance · oracle-11g-r2 · performance-tuning

asked Jun 22 '17 at 9:18, edited Jun 26 '17 at 17:44
– Rick-Rainer Ludwig


  • That's a very small size for redo logs in this day and age

    – Philᵀᴹ
    Jun 22 '17 at 10:21











  • I think @Philᵀᴹ is right; with your settings, the 'Checkpoint not complete' message is a symptom of underlying problems. "I do not need to have every byte written to disk immediately": yup, you don't, but the DB has to in order to guarantee consistency. During your busy 'delete' time, see how many logs are switched, then use that info to resize your logs. I'd say at least 500M x 10 would be a start, and then monitor. Also, is archive_lag_target set? To what value?

    – Raj
    Jun 22 '17 at 12:01











  • We have 1 GB of redo logs in total for an average data volume of 400 MB per hour, and we roll over every 15 min. That looks reasonable at first. I made the comment about the disk writes more precise, because I meant the integration of the data into the database files. I wonder why the database writers do not try to keep up with the logs. At the moment we need about 3 min to fill up the redo logs completely. Data integration is slower than log writing, that's clear, but if the database writers did a better job, we could increase the redo log fill-up time, and maybe that would already be sufficient?

    – Rick-Rainer Ludwig
    Jun 22 '17 at 12:55











  • In your transaction that loads 400 MB/hr, when and how often do you commit?

    – Raj
    Jun 22 '17 at 17:48











  • I do not know the exact frequency, but a couple of times per second. We get the data in packets every 3 to 5 seconds and have several different commits for metadata, some raw data, statistical values and calculations... We are talking about ~100 kB per second, spread over maybe 2 or 3 commits in that second.

    – Rick-Rainer Ludwig
    Jun 22 '17 at 18:09
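To put numbers on the commit rate discussed in these comments, the cumulative instance statistics can be sampled; this is only a sketch, and taking two samples and diffing them is left out for brevity ('user commits', 'redo size' and 'redo writes' are standard V$SYSSTAT statistics):

    -- Counters are cumulative since instance startup; sample twice and diff
    -- to get commits/sec and redo bytes/sec.
    SELECT name, value
    FROM   v$sysstat
    WHERE  name IN ('user commits', 'redo size', 'redo writes');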
















1 Answer
I guess you could play with the fast_start_io_target and log_checkpoint_timeout initialization parameters to entice database writers to write out dirty blocks more aggressively. Unless your redo logs are on a different I/O path from your data files, you are unlikely to see any benefits from this though, because both log and database writers will compete for the same I/O bandwidth. A better choice, as others have mentioned, would be to increase the size of your redo logs to accommodate long-running transactions.






answered Jun 22 '17 at 13:44
– mustaccio

  • For Oracle 11g, the parameter is deprecated in favor of FAST_START_MTTR_TARGET. That one in turn is overridden by log_checkpoint_timeout, so we are left with only one knob. The latter I will try at the end of the week, but it is not very promising: the timeout also just creates a checkpoint, exactly like a redo log rollover, so why should the database writer start any earlier? The main question is still: why do the database writers start so damn late!?

    – Rick-Rainer Ludwig
    Jun 26 '17 at 17:43
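To make the two remedies discussed here concrete, the sketch below shows what they could look like. The group number, file path and target values are made-up examples, not recommendations for this particular system:

    -- Incremental checkpointing: give DBWR an MTTR target so it trickles
    -- dirty buffers out continuously. As noted in the comment above, this
    -- competes with log_checkpoint_timeout, which would have to be zeroed.
    ALTER SYSTEM SET fast_start_mttr_target = 120 SCOPE = BOTH;
    ALTER SYSTEM SET log_checkpoint_timeout = 0 SCOPE = BOTH;

    -- Bigger redo logs: add larger groups, then retire the 100 MB groups
    -- one at a time once they become INACTIVE.
    ALTER DATABASE ADD LOGFILE GROUP 11 ('/oradata/redo11a.log') SIZE 1G;
    ALTER SYSTEM SWITCH LOGFILE;
    ALTER SYSTEM CHECKPOINT;
    ALTER DATABASE DROP LOGFILE GROUP 1;
    -- Repeat ADD / SWITCH LOGFILE / CHECKPOINT / DROP for the remaining
    -- small groups, then remove the old redo files at the OS level.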










