Large INSERTs performance optimization

I have 15 Amazon AWS EC2 t1.micro instances which simultaneously populate an Amazon RDS MySQL db.m2.xlarge database using large INSERTs (40,000 rows per query).

The queries are sent continuously. The table is InnoDB with two INT columns, and there is an index on both columns. CPU utilization of the RDS instance is about 30% while it is receiving data.

With a single EC2 instance the insert speed is orders of magnitude faster than when I run 15 instances simultaneously. Moreover, the group of 15 instances gets slower and slower until the speed becomes totally unsatisfactory.



How can I optimize performance for this process?



UPDATE



My SHOW CREATE TABLE result is the following:



CREATE TABLE `UserData` (
`uid` int(11) NOT NULL,
`data` int(11) NOT NULL,
PRIMARY KEY (`uid`,`data`),
KEY `uid` (`uid`),
KEY `data` (`data`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1


I need both indexes because I have to fetch data both by uid and by data value.

I insert data with INSERT INTO UserData (uid, data) VALUES (1,2),(1,3),(1,10),... with 40,000 (uid, data) pairs per statement.

15 parallel instances insert ~121,000,000 rows in 2 hours, but I am sure it can be much faster.










mysql innodb optimization insert amazon-rds

asked May 13 '13 at 0:09 by zavg

  • Please present the SHOW CREATE TABLE your_table output. In particular, is there an AUTO_INCREMENT column? Which PRIMARY KEY? What is the order in which rows are inserted? – Shlomi Noach, May 13 '13 at 5:10

  • @ShlomiNoach Thank you for the reply; I have updated the post. – zavg, May 13 '13 at 12:42

  • Are your INSERTs enclosed in a transaction, i.e. START TRANSACTION; INSERT ...; COMMIT;? (See the sketch below.) – Cristian Porta, May 13 '13 at 18:50

  • @cristian-porta No, but I use bulk inserts with 40,000 values in one query. – zavg, May 13 '13 at 19:04
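For reference, the explicit-transaction pattern Cristian Porta's comment describes would look roughly like this for the UserData table (a sketch only; the row values are placeholders):

START TRANSACTION;
INSERT INTO UserData (uid, data) VALUES (1,2),(1,3),(1,10);   -- bulk rows, placeholder values
INSERT INTO UserData (uid, data) VALUES (2,7),(2,9),(2,11);   -- more bulk rows
COMMIT;

With autocommit enabled (the MySQL default), each multi-row INSERT already runs as its own implicit transaction, which matches what the asker describes.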
















1 Answer

Two hints for you:

  1. The KEY uid is redundant, because it is covered by the PRIMARY KEY (uid is its leftmost column).

  2. 40,000 rows at a time might make for too large a transaction. Although these are very small rows (two INTs), this may cause the transaction to go to disk, depending on your settings. I usually go with around 1,000 rows at a time (as low as 100 and as high as 10,000).
     Please try doing 40 * 1,000 and see if this works better for you than 1 * 40,000 (see the sketch after this list).
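A minimal sketch of the two hints in SQL, assuming the UserData table from the question (the row values are placeholders):

-- Hint 1: drop the redundant secondary index; the PRIMARY KEY (uid, data)
-- already covers lookups by uid.
ALTER TABLE UserData DROP INDEX `uid`;

-- Hint 2: send ~1,000 rows per statement instead of 40,000, so each
-- (autocommitted) transaction stays small.
INSERT INTO UserData (uid, data) VALUES (1,2),(1,3),(1,10);    -- ... up to ~1,000 pairs
INSERT INTO UserData (uid, data) VALUES (1,11),(1,12),(1,15);  -- ... up to ~1,000 pairs
-- ... about 40 such statements per former 40,000-row chunk ...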







answered May 17 '13 at 19:54 by Shlomi Noach

  • OK, thank you for the advice. I will try your technique and measure the performance increase. Currently I am using the following optimizations: 1) I have dropped all indexes while writing to the DB, 2) I have set AUTOCOMMIT=0 and COMMIT after a very large sequence of INSERTs (about 1,500,000 rows), 3) I am using increased innodb_buffer_pool_size and bulk_insert_buffer_size, 4) I have also changed the t1.micro instances to m1.small and the m2.xlarge to m1.xlarge for better I/O performance (see the sketch after these comments). – zavg, May 17 '13 at 21:09

  • Number 2) sounds like an anti-optimization to me. And what do you mean by dropping all indexes while writing? Are you ALTERing the table before and after the INSERTs? If so, don't do it. – Shlomi Noach, May 19 '13 at 17:00

  • I insert data into the table without indexes and add them after all the INSERTs; after that I don't write to this table. – zavg, May 19 '13 at 22:52
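A rough sketch of the loading pattern these comments describe, assuming the same UserData table (row values are placeholders; server settings such as innodb_buffer_pool_size are changed through the RDS parameter group and are not shown):

SET autocommit = 0;

INSERT INTO UserData (uid, data) VALUES (1,2),(1,3),(1,10);   -- bulk inserts while the table has no secondary indexes
INSERT INTO UserData (uid, data) VALUES (2,7),(2,9),(2,11);
-- ... many more bulk INSERT statements ...
COMMIT;                                                       -- commit only after a very large batch

-- Secondary index added once the load is complete:
ALTER TABLE UserData ADD INDEX `data` (`data`);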










