Efficiently bulk upsert unrelated rows












2















As part of a manual replication process between databases with different but related schemas, for each table, we identify new and modified records in the source database and feed them via a table valued parameter to a stored procedure in the SQL Server destination database. The source database is not SQL Server and is in a different data center.



For each row to be upserted, the stored procedure queries for the primary keys of the related rows already in the destination database.



Currently, we do a single SELECT to get the primary keys, followed by a single MERGE to perform the upsert. There are two aspects of this approach that I believe may not be as efficient as possible.




  1. An implicit transaction unnecessarily wraps the MERGE. The database would remain consistent even with each row being upserted one at a time. If an row's upsert fails, we want the remaining rows to proceed.


  2. MERGE interleaves inserts and sets as it goes through the rows, which is fine, but we don't need this. It would be acceptable to set all the modified rows and subsequently insert all the new rows.



Based on the flexibility we have, the MERGE performance tip to use UPDATE and INSERT seems to apply to our case:




When simply updating one table based on the rows of another table, improved performance and scalability can be achieved with basic INSERT, UPDATE, and DELETE statements.




Do I understand that right? Am I better off with a separate UPDATE and INSERT? And what about the implicit transaction? Is performing a single SELECT, UPDATE, and INSERT over a large batch of rows most efficient, or is it better to take advantage of the ability to do one row at time by using a FOR loop? Or something else?



In general, what is the most efficient way to upsert a large batch or rows to a SQL Server table in which the rows are not transactionally related and sets and inserts need not be interleaved?










share|improve this question
















bumped to the homepage by Community 16 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.











  • 1





    Aaron Bertrand wrote about some potential gotcha's with MERGE that you may want to be aware of. Another thing I'd change is the table variable to a temp table. There are more blogs about this out there. Regarding If an row's upsert fails, we want the remaining rows to proceed., this sounds like RBAR methodology. I'd expect most people, even using UPSERT, would want the entire transaction rolled back if either the batch update or insert failed

    – scsimon
    Nov 1 '18 at 19:34


















2















As part of a manual replication process between databases with different but related schemas, for each table, we identify new and modified records in the source database and feed them via a table valued parameter to a stored procedure in the SQL Server destination database. The source database is not SQL Server and is in a different data center.



For each row to be upserted, the stored procedure queries for the primary keys of the related rows already in the destination database.



Currently, we do a single SELECT to get the primary keys, followed by a single MERGE to perform the upsert. There are two aspects of this approach that I believe may not be as efficient as possible.




  1. An implicit transaction unnecessarily wraps the MERGE. The database would remain consistent even with each row being upserted one at a time. If an row's upsert fails, we want the remaining rows to proceed.


  2. MERGE interleaves inserts and sets as it goes through the rows, which is fine, but we don't need this. It would be acceptable to set all the modified rows and subsequently insert all the new rows.



Based on the flexibility we have, the MERGE performance tip to use UPDATE and INSERT seems to apply to our case:




When simply updating one table based on the rows of another table, improved performance and scalability can be achieved with basic INSERT, UPDATE, and DELETE statements.




Do I understand that right? Am I better off with a separate UPDATE and INSERT? And what about the implicit transaction? Is performing a single SELECT, UPDATE, and INSERT over a large batch of rows most efficient, or is it better to take advantage of the ability to do one row at time by using a FOR loop? Or something else?



In general, what is the most efficient way to upsert a large batch or rows to a SQL Server table in which the rows are not transactionally related and sets and inserts need not be interleaved?










share|improve this question
















bumped to the homepage by Community 16 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.











  • 1





    Aaron Bertrand wrote about some potential gotcha's with MERGE that you may want to be aware of. Another thing I'd change is the table variable to a temp table. There are more blogs about this out there. Regarding If an row's upsert fails, we want the remaining rows to proceed., this sounds like RBAR methodology. I'd expect most people, even using UPSERT, would want the entire transaction rolled back if either the batch update or insert failed

    – scsimon
    Nov 1 '18 at 19:34
















2












2








2








As part of a manual replication process between databases with different but related schemas, for each table, we identify new and modified records in the source database and feed them via a table valued parameter to a stored procedure in the SQL Server destination database. The source database is not SQL Server and is in a different data center.



For each row to be upserted, the stored procedure queries for the primary keys of the related rows already in the destination database.



Currently, we do a single SELECT to get the primary keys, followed by a single MERGE to perform the upsert. There are two aspects of this approach that I believe may not be as efficient as possible.




  1. An implicit transaction unnecessarily wraps the MERGE. The database would remain consistent even with each row being upserted one at a time. If an row's upsert fails, we want the remaining rows to proceed.


  2. MERGE interleaves inserts and sets as it goes through the rows, which is fine, but we don't need this. It would be acceptable to set all the modified rows and subsequently insert all the new rows.



Based on the flexibility we have, the MERGE performance tip to use UPDATE and INSERT seems to apply to our case:




When simply updating one table based on the rows of another table, improved performance and scalability can be achieved with basic INSERT, UPDATE, and DELETE statements.




Do I understand that right? Am I better off with a separate UPDATE and INSERT? And what about the implicit transaction? Is performing a single SELECT, UPDATE, and INSERT over a large batch of rows most efficient, or is it better to take advantage of the ability to do one row at time by using a FOR loop? Or something else?



In general, what is the most efficient way to upsert a large batch or rows to a SQL Server table in which the rows are not transactionally related and sets and inserts need not be interleaved?










share|improve this question
















As part of a manual replication process between databases with different but related schemas, for each table, we identify new and modified records in the source database and feed them via a table valued parameter to a stored procedure in the SQL Server destination database. The source database is not SQL Server and is in a different data center.



For each row to be upserted, the stored procedure queries for the primary keys of the related rows already in the destination database.



Currently, we do a single SELECT to get the primary keys, followed by a single MERGE to perform the upsert. There are two aspects of this approach that I believe may not be as efficient as possible.




  1. An implicit transaction unnecessarily wraps the MERGE. The database would remain consistent even with each row being upserted one at a time. If an row's upsert fails, we want the remaining rows to proceed.


  2. MERGE interleaves inserts and sets as it goes through the rows, which is fine, but we don't need this. It would be acceptable to set all the modified rows and subsequently insert all the new rows.



Based on the flexibility we have, the MERGE performance tip to use UPDATE and INSERT seems to apply to our case:




When simply updating one table based on the rows of another table, improved performance and scalability can be achieved with basic INSERT, UPDATE, and DELETE statements.




Do I understand that right? Am I better off with a separate UPDATE and INSERT? And what about the implicit transaction? Is performing a single SELECT, UPDATE, and INSERT over a large batch of rows most efficient, or is it better to take advantage of the ability to do one row at time by using a FOR loop? Or something else?



In general, what is the most efficient way to upsert a large batch or rows to a SQL Server table in which the rows are not transactionally related and sets and inserts need not be interleaved?







sql-server transactional-replication etl upsert table-valued-parameters






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 1 '18 at 20:13







Edward Brey

















asked Nov 1 '18 at 19:27









Edward BreyEdward Brey

1115




1115





bumped to the homepage by Community 16 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.







bumped to the homepage by Community 16 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.










  • 1





    Aaron Bertrand wrote about some potential gotcha's with MERGE that you may want to be aware of. Another thing I'd change is the table variable to a temp table. There are more blogs about this out there. Regarding If an row's upsert fails, we want the remaining rows to proceed., this sounds like RBAR methodology. I'd expect most people, even using UPSERT, would want the entire transaction rolled back if either the batch update or insert failed

    – scsimon
    Nov 1 '18 at 19:34
















  • 1





    Aaron Bertrand wrote about some potential gotcha's with MERGE that you may want to be aware of. Another thing I'd change is the table variable to a temp table. There are more blogs about this out there. Regarding If an row's upsert fails, we want the remaining rows to proceed., this sounds like RBAR methodology. I'd expect most people, even using UPSERT, would want the entire transaction rolled back if either the batch update or insert failed

    – scsimon
    Nov 1 '18 at 19:34










1




1





Aaron Bertrand wrote about some potential gotcha's with MERGE that you may want to be aware of. Another thing I'd change is the table variable to a temp table. There are more blogs about this out there. Regarding If an row's upsert fails, we want the remaining rows to proceed., this sounds like RBAR methodology. I'd expect most people, even using UPSERT, would want the entire transaction rolled back if either the batch update or insert failed

– scsimon
Nov 1 '18 at 19:34







Aaron Bertrand wrote about some potential gotcha's with MERGE that you may want to be aware of. Another thing I'd change is the table variable to a temp table. There are more blogs about this out there. Regarding If an row's upsert fails, we want the remaining rows to proceed., this sounds like RBAR methodology. I'd expect most people, even using UPSERT, would want the entire transaction rolled back if either the batch update or insert failed

– scsimon
Nov 1 '18 at 19:34












1 Answer
1






active

oldest

votes


















0














To elaborate on my comment, I would use a set based approach for better performance with error handling, here is how I would do it. I mixed the two common methods, using JOIN and NOT EXISTS / EXISTS.



Note, I don't use a temp table or table variable here. I've seen performance gains when the databases are on different servers but pulling the data into a temp table before doing an upsert across a linked server or whatever, but I'm not sure if your databases are on the same instance or not.



set xact_abort on;
begin try
begin tran

update d
set d.column = s.column
from destinationTable d
inner join sourceTable s on
s.id = d.id

insert into destinationTable (...column list...)
select s.1, s.2, ...
from sourceTable s
where not exists(select id from destinationTable d where d.id = s.id)

commit
end try
begin catch
if @@trancount > 0 rollback tran
declare @msg nvarchar(2048) = error_message()
raiserror (@msg, 16, 1)
return 55555
end catch


Some good reads




  • Erlan Sommarskog's Error and Transaction Handling

  • Aaron Bertrand's Use Caution w/ SQL Server MERGE

  • Martin Smith's break down of Table Variable vs Temp Tables and his suggestions on when you should use one over the other

  • Using LEFT JOIN or NOT EXISTS


In the case that the data is in on a different server a temp table could increase the performance of the insert (or may be the only option if it's not SQL Server). The above code would be the same, except the sourceTable would be replaced with #sourceTable.



If the table is large, then you may want to consider staging tables instead of #tempTables so that you don't beat up tempdb too bad. BULK INSERT into the staging tables (sourceTable) and then do your upsert from here.






share|improve this answer


























  • I added a clarification to the question that the source database is not SQL Server and is in a different data center.

    – Edward Brey
    Nov 1 '18 at 20:13











  • unserstood @EdwardBrey. Not much would change for me, except probably bulk inserting into stagingTables and then doing the upsert. Naturally, these would have what ever PK, FK, and indexes you need to make your upserts perform well

    – scsimon
    Nov 1 '18 at 20:22











Your Answer








StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f221569%2fefficiently-bulk-upsert-unrelated-rows%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









0














To elaborate on my comment, I would use a set based approach for better performance with error handling, here is how I would do it. I mixed the two common methods, using JOIN and NOT EXISTS / EXISTS.



Note, I don't use a temp table or table variable here. I've seen performance gains when the databases are on different servers but pulling the data into a temp table before doing an upsert across a linked server or whatever, but I'm not sure if your databases are on the same instance or not.



set xact_abort on;
begin try
begin tran

update d
set d.column = s.column
from destinationTable d
inner join sourceTable s on
s.id = d.id

insert into destinationTable (...column list...)
select s.1, s.2, ...
from sourceTable s
where not exists(select id from destinationTable d where d.id = s.id)

commit
end try
begin catch
if @@trancount > 0 rollback tran
declare @msg nvarchar(2048) = error_message()
raiserror (@msg, 16, 1)
return 55555
end catch


Some good reads




  • Erlan Sommarskog's Error and Transaction Handling

  • Aaron Bertrand's Use Caution w/ SQL Server MERGE

  • Martin Smith's break down of Table Variable vs Temp Tables and his suggestions on when you should use one over the other

  • Using LEFT JOIN or NOT EXISTS


In the case that the data is in on a different server a temp table could increase the performance of the insert (or may be the only option if it's not SQL Server). The above code would be the same, except the sourceTable would be replaced with #sourceTable.



If the table is large, then you may want to consider staging tables instead of #tempTables so that you don't beat up tempdb too bad. BULK INSERT into the staging tables (sourceTable) and then do your upsert from here.






share|improve this answer


























  • I added a clarification to the question that the source database is not SQL Server and is in a different data center.

    – Edward Brey
    Nov 1 '18 at 20:13











  • unserstood @EdwardBrey. Not much would change for me, except probably bulk inserting into stagingTables and then doing the upsert. Naturally, these would have what ever PK, FK, and indexes you need to make your upserts perform well

    – scsimon
    Nov 1 '18 at 20:22
















0














To elaborate on my comment, I would use a set based approach for better performance with error handling, here is how I would do it. I mixed the two common methods, using JOIN and NOT EXISTS / EXISTS.



Note, I don't use a temp table or table variable here. I've seen performance gains when the databases are on different servers but pulling the data into a temp table before doing an upsert across a linked server or whatever, but I'm not sure if your databases are on the same instance or not.



set xact_abort on;
begin try
begin tran

update d
set d.column = s.column
from destinationTable d
inner join sourceTable s on
s.id = d.id

insert into destinationTable (...column list...)
select s.1, s.2, ...
from sourceTable s
where not exists(select id from destinationTable d where d.id = s.id)

commit
end try
begin catch
if @@trancount > 0 rollback tran
declare @msg nvarchar(2048) = error_message()
raiserror (@msg, 16, 1)
return 55555
end catch


Some good reads




  • Erlan Sommarskog's Error and Transaction Handling

  • Aaron Bertrand's Use Caution w/ SQL Server MERGE

  • Martin Smith's break down of Table Variable vs Temp Tables and his suggestions on when you should use one over the other

  • Using LEFT JOIN or NOT EXISTS


In the case that the data is in on a different server a temp table could increase the performance of the insert (or may be the only option if it's not SQL Server). The above code would be the same, except the sourceTable would be replaced with #sourceTable.



If the table is large, then you may want to consider staging tables instead of #tempTables so that you don't beat up tempdb too bad. BULK INSERT into the staging tables (sourceTable) and then do your upsert from here.






share|improve this answer


























  • I added a clarification to the question that the source database is not SQL Server and is in a different data center.

    – Edward Brey
    Nov 1 '18 at 20:13











  • unserstood @EdwardBrey. Not much would change for me, except probably bulk inserting into stagingTables and then doing the upsert. Naturally, these would have what ever PK, FK, and indexes you need to make your upserts perform well

    – scsimon
    Nov 1 '18 at 20:22














0












0








0







To elaborate on my comment, I would use a set based approach for better performance with error handling, here is how I would do it. I mixed the two common methods, using JOIN and NOT EXISTS / EXISTS.



Note, I don't use a temp table or table variable here. I've seen performance gains when the databases are on different servers but pulling the data into a temp table before doing an upsert across a linked server or whatever, but I'm not sure if your databases are on the same instance or not.



set xact_abort on;
begin try
begin tran

update d
set d.column = s.column
from destinationTable d
inner join sourceTable s on
s.id = d.id

insert into destinationTable (...column list...)
select s.1, s.2, ...
from sourceTable s
where not exists(select id from destinationTable d where d.id = s.id)

commit
end try
begin catch
if @@trancount > 0 rollback tran
declare @msg nvarchar(2048) = error_message()
raiserror (@msg, 16, 1)
return 55555
end catch


Some good reads




  • Erlan Sommarskog's Error and Transaction Handling

  • Aaron Bertrand's Use Caution w/ SQL Server MERGE

  • Martin Smith's break down of Table Variable vs Temp Tables and his suggestions on when you should use one over the other

  • Using LEFT JOIN or NOT EXISTS


In the case that the data is in on a different server a temp table could increase the performance of the insert (or may be the only option if it's not SQL Server). The above code would be the same, except the sourceTable would be replaced with #sourceTable.



If the table is large, then you may want to consider staging tables instead of #tempTables so that you don't beat up tempdb too bad. BULK INSERT into the staging tables (sourceTable) and then do your upsert from here.






share|improve this answer















To elaborate on my comment, I would use a set based approach for better performance with error handling, here is how I would do it. I mixed the two common methods, using JOIN and NOT EXISTS / EXISTS.



Note, I don't use a temp table or table variable here. I've seen performance gains when the databases are on different servers but pulling the data into a temp table before doing an upsert across a linked server or whatever, but I'm not sure if your databases are on the same instance or not.



set xact_abort on;
begin try
begin tran

update d
set d.column = s.column
from destinationTable d
inner join sourceTable s on
s.id = d.id

insert into destinationTable (...column list...)
select s.1, s.2, ...
from sourceTable s
where not exists(select id from destinationTable d where d.id = s.id)

commit
end try
begin catch
if @@trancount > 0 rollback tran
declare @msg nvarchar(2048) = error_message()
raiserror (@msg, 16, 1)
return 55555
end catch


Some good reads




  • Erlan Sommarskog's Error and Transaction Handling

  • Aaron Bertrand's Use Caution w/ SQL Server MERGE

  • Martin Smith's break down of Table Variable vs Temp Tables and his suggestions on when you should use one over the other

  • Using LEFT JOIN or NOT EXISTS


In the case that the data is in on a different server a temp table could increase the performance of the insert (or may be the only option if it's not SQL Server). The above code would be the same, except the sourceTable would be replaced with #sourceTable.



If the table is large, then you may want to consider staging tables instead of #tempTables so that you don't beat up tempdb too bad. BULK INSERT into the staging tables (sourceTable) and then do your upsert from here.







share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 1 '18 at 20:21

























answered Nov 1 '18 at 20:07









scsimonscsimon

1,398414




1,398414













  • I added a clarification to the question that the source database is not SQL Server and is in a different data center.

    – Edward Brey
    Nov 1 '18 at 20:13











  • unserstood @EdwardBrey. Not much would change for me, except probably bulk inserting into stagingTables and then doing the upsert. Naturally, these would have what ever PK, FK, and indexes you need to make your upserts perform well

    – scsimon
    Nov 1 '18 at 20:22



















  • I added a clarification to the question that the source database is not SQL Server and is in a different data center.

    – Edward Brey
    Nov 1 '18 at 20:13











  • unserstood @EdwardBrey. Not much would change for me, except probably bulk inserting into stagingTables and then doing the upsert. Naturally, these would have what ever PK, FK, and indexes you need to make your upserts perform well

    – scsimon
    Nov 1 '18 at 20:22

















I added a clarification to the question that the source database is not SQL Server and is in a different data center.

– Edward Brey
Nov 1 '18 at 20:13





I added a clarification to the question that the source database is not SQL Server and is in a different data center.

– Edward Brey
Nov 1 '18 at 20:13













unserstood @EdwardBrey. Not much would change for me, except probably bulk inserting into stagingTables and then doing the upsert. Naturally, these would have what ever PK, FK, and indexes you need to make your upserts perform well

– scsimon
Nov 1 '18 at 20:22





unserstood @EdwardBrey. Not much would change for me, except probably bulk inserting into stagingTables and then doing the upsert. Naturally, these would have what ever PK, FK, and indexes you need to make your upserts perform well

– scsimon
Nov 1 '18 at 20:22


















draft saved

draft discarded




















































Thanks for contributing an answer to Database Administrators Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f221569%2fefficiently-bulk-upsert-unrelated-rows%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

SQL Server 17 - Attemping to backup to remote NAS but Access is denied

Always On Availability groups resolving state after failover - Remote harden of transaction...

Restoring from pg_dump with foreign key constraints