Limit the number of results but show how much would be returned












0















I have a big problem in T-SQL with large amount of data where I need to limit number of returned rows to 200K but also to show the number of rows that would be returned. It is needed to be done under 10 min.



At the end for the result I am joining 3 tables which has 10'sK, 100'sK, or millions of rows in various combinations, depending on given parameters. I am using just several rows from each table and need the exact number of distinct rows that are returned.



When I place data into a temp table, writing it to disk takes too long (hours), so I am now trying to run the query twice. First time just for counting and second time for returning the result set without writing it to a temp table. This is because if I have more than 200K rows, than it can take too long otherwise.



I am wondering if COUNT ( DISTINCT [field 1] + ... + [field n] ) is going to take too much resources. How can I make this run faster:



DECLARE @count INT = ( SELECT COUNT( DISTINCT [field 1] + ... + [field n] )
FROM ( #tempDC dc WITH (NOLOCK)
JOIN #tempUID u WITH (NOLOCK) ON dc.[hashUFI] = u.[hashUFI]
AND dc.iId IN ( '', u.iId ) )
LEFT JOIN #rLog ON u.[hashUFI] = #rLog.[hashUFI]
AND #rlog.iId IN ( '', u.iId )
AND #rLog.docId = dc.docId
WHERE dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd
AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart
)


I have indexes on all the hashUFI fields and also on all the pairs (rangeStart, rangeEnd).



This is just a problematic snippet of a much larger procedure (around 2000 lines of code), which is taking data from some audit tables.










share|improve this question
















bumped to the homepage by Community 13 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.











  • 1





    Comments are not for extended discussion; this conversation has been moved to chat.

    – Paul White
    Nov 29 '16 at 14:38
















0















I have a big problem in T-SQL with large amount of data where I need to limit number of returned rows to 200K but also to show the number of rows that would be returned. It is needed to be done under 10 min.



At the end for the result I am joining 3 tables which has 10'sK, 100'sK, or millions of rows in various combinations, depending on given parameters. I am using just several rows from each table and need the exact number of distinct rows that are returned.



When I place data into a temp table, writing it to disk takes too long (hours), so I am now trying to run the query twice. First time just for counting and second time for returning the result set without writing it to a temp table. This is because if I have more than 200K rows, than it can take too long otherwise.



I am wondering if COUNT ( DISTINCT [field 1] + ... + [field n] ) is going to take too much resources. How can I make this run faster:



DECLARE @count INT = ( SELECT COUNT( DISTINCT [field 1] + ... + [field n] )
FROM ( #tempDC dc WITH (NOLOCK)
JOIN #tempUID u WITH (NOLOCK) ON dc.[hashUFI] = u.[hashUFI]
AND dc.iId IN ( '', u.iId ) )
LEFT JOIN #rLog ON u.[hashUFI] = #rLog.[hashUFI]
AND #rlog.iId IN ( '', u.iId )
AND #rLog.docId = dc.docId
WHERE dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd
AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart
)


I have indexes on all the hashUFI fields and also on all the pairs (rangeStart, rangeEnd).



This is just a problematic snippet of a much larger procedure (around 2000 lines of code), which is taking data from some audit tables.










share|improve this question
















bumped to the homepage by Community 13 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.











  • 1





    Comments are not for extended discussion; this conversation has been moved to chat.

    – Paul White
    Nov 29 '16 at 14:38














0












0








0


1






I have a big problem in T-SQL with large amount of data where I need to limit number of returned rows to 200K but also to show the number of rows that would be returned. It is needed to be done under 10 min.



At the end for the result I am joining 3 tables which has 10'sK, 100'sK, or millions of rows in various combinations, depending on given parameters. I am using just several rows from each table and need the exact number of distinct rows that are returned.



When I place data into a temp table, writing it to disk takes too long (hours), so I am now trying to run the query twice. First time just for counting and second time for returning the result set without writing it to a temp table. This is because if I have more than 200K rows, than it can take too long otherwise.



I am wondering if COUNT ( DISTINCT [field 1] + ... + [field n] ) is going to take too much resources. How can I make this run faster:



DECLARE @count INT = ( SELECT COUNT( DISTINCT [field 1] + ... + [field n] )
FROM ( #tempDC dc WITH (NOLOCK)
JOIN #tempUID u WITH (NOLOCK) ON dc.[hashUFI] = u.[hashUFI]
AND dc.iId IN ( '', u.iId ) )
LEFT JOIN #rLog ON u.[hashUFI] = #rLog.[hashUFI]
AND #rlog.iId IN ( '', u.iId )
AND #rLog.docId = dc.docId
WHERE dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd
AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart
)


I have indexes on all the hashUFI fields and also on all the pairs (rangeStart, rangeEnd).



This is just a problematic snippet of a much larger procedure (around 2000 lines of code), which is taking data from some audit tables.










share|improve this question
















I have a big problem in T-SQL with large amount of data where I need to limit number of returned rows to 200K but also to show the number of rows that would be returned. It is needed to be done under 10 min.



At the end for the result I am joining 3 tables which has 10'sK, 100'sK, or millions of rows in various combinations, depending on given parameters. I am using just several rows from each table and need the exact number of distinct rows that are returned.



When I place data into a temp table, writing it to disk takes too long (hours), so I am now trying to run the query twice. First time just for counting and second time for returning the result set without writing it to a temp table. This is because if I have more than 200K rows, than it can take too long otherwise.



I am wondering if COUNT ( DISTINCT [field 1] + ... + [field n] ) is going to take too much resources. How can I make this run faster:



DECLARE @count INT = ( SELECT COUNT( DISTINCT [field 1] + ... + [field n] )
FROM ( #tempDC dc WITH (NOLOCK)
JOIN #tempUID u WITH (NOLOCK) ON dc.[hashUFI] = u.[hashUFI]
AND dc.iId IN ( '', u.iId ) )
LEFT JOIN #rLog ON u.[hashUFI] = #rLog.[hashUFI]
AND #rlog.iId IN ( '', u.iId )
AND #rLog.docId = dc.docId
WHERE dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd
AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart
)


I have indexes on all the hashUFI fields and also on all the pairs (rangeStart, rangeEnd).



This is just a problematic snippet of a much larger procedure (around 2000 lines of code), which is taking data from some audit tables.







sql-server performance t-sql performance-tuning count






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 29 '16 at 14:37









Paul White

52.1k14278450




52.1k14278450










asked Nov 3 '16 at 18:25









SasaSasa

1045




1045





bumped to the homepage by Community 13 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.







bumped to the homepage by Community 13 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.










  • 1





    Comments are not for extended discussion; this conversation has been moved to chat.

    – Paul White
    Nov 29 '16 at 14:38














  • 1





    Comments are not for extended discussion; this conversation has been moved to chat.

    – Paul White
    Nov 29 '16 at 14:38








1




1





Comments are not for extended discussion; this conversation has been moved to chat.

– Paul White
Nov 29 '16 at 14:38





Comments are not for extended discussion; this conversation has been moved to chat.

– Paul White
Nov 29 '16 at 14:38










1 Answer
1






active

oldest

votes


















0














Are you sure you even need the left join to get a good count?

Are you using values from #rLog in the distinct?

And I would go with , distinct not +



this line is different



ON dc.[hashUFI] = #rLog.[hashUFI]


it might help



SELECT COUNT( DISTINCT [field 1] + [field n] )
FROM #tempDC dc WITH (NOLOCK)
JOIN #tempUID u WITH (NOLOCK)
ON dc.[hashUFI] = u.[hashUFI]
AND dc.iId IN ( '', u.iId )
AND dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd
AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart
-- try without the left join and see if the number is different
LEFT JOIN #rLog
ON dc.[hashUFI] = #rLog.[hashUFI]
AND dc.docId = #rLog.docId
AND #rlog.iId IN ( '', u.iId )





share|improve this answer


























  • The number of returned items is different. LEFT JOIN is mandatory here :~(

    – Sasa
    Nov 7 '16 at 9:47













  • I have replaced LEFT JOIN with OUTER APPLY (SELECT TOP (2) [needed fields] FROM #rLog WITH (NOLOCK) WHERE ...<same as ON>...), but it haven't helped either.

    – Sasa
    Nov 7 '16 at 9:59











  • Using dc instead of u haven't helper either. Thanks anyway.

    – Sasa
    Nov 7 '16 at 10:03








  • 3





    What version supports COUNT(DISTINCT column1, column2)?

    – Andriy M
    Nov 29 '16 at 14:15











Your Answer








StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f154183%2flimit-the-number-of-results-but-show-how-much-would-be-returned%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









0














Are you sure you even need the left join to get a good count?

Are you using values from #rLog in the distinct?

And I would go with , distinct not +



this line is different



ON dc.[hashUFI] = #rLog.[hashUFI]


it might help



SELECT COUNT( DISTINCT [field 1] + [field n] )
FROM #tempDC dc WITH (NOLOCK)
JOIN #tempUID u WITH (NOLOCK)
ON dc.[hashUFI] = u.[hashUFI]
AND dc.iId IN ( '', u.iId )
AND dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd
AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart
-- try without the left join and see if the number is different
LEFT JOIN #rLog
ON dc.[hashUFI] = #rLog.[hashUFI]
AND dc.docId = #rLog.docId
AND #rlog.iId IN ( '', u.iId )





share|improve this answer


























  • The number of returned items is different. LEFT JOIN is mandatory here :~(

    – Sasa
    Nov 7 '16 at 9:47













  • I have replaced LEFT JOIN with OUTER APPLY (SELECT TOP (2) [needed fields] FROM #rLog WITH (NOLOCK) WHERE ...<same as ON>...), but it haven't helped either.

    – Sasa
    Nov 7 '16 at 9:59











  • Using dc instead of u haven't helper either. Thanks anyway.

    – Sasa
    Nov 7 '16 at 10:03








  • 3





    What version supports COUNT(DISTINCT column1, column2)?

    – Andriy M
    Nov 29 '16 at 14:15
















0














Are you sure you even need the left join to get a good count?

Are you using values from #rLog in the distinct?

And I would go with , distinct not +



this line is different



ON dc.[hashUFI] = #rLog.[hashUFI]


it might help



SELECT COUNT( DISTINCT [field 1] + [field n] )
FROM #tempDC dc WITH (NOLOCK)
JOIN #tempUID u WITH (NOLOCK)
ON dc.[hashUFI] = u.[hashUFI]
AND dc.iId IN ( '', u.iId )
AND dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd
AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart
-- try without the left join and see if the number is different
LEFT JOIN #rLog
ON dc.[hashUFI] = #rLog.[hashUFI]
AND dc.docId = #rLog.docId
AND #rlog.iId IN ( '', u.iId )





share|improve this answer


























  • The number of returned items is different. LEFT JOIN is mandatory here :~(

    – Sasa
    Nov 7 '16 at 9:47













  • I have replaced LEFT JOIN with OUTER APPLY (SELECT TOP (2) [needed fields] FROM #rLog WITH (NOLOCK) WHERE ...<same as ON>...), but it haven't helped either.

    – Sasa
    Nov 7 '16 at 9:59











  • Using dc instead of u haven't helper either. Thanks anyway.

    – Sasa
    Nov 7 '16 at 10:03








  • 3





    What version supports COUNT(DISTINCT column1, column2)?

    – Andriy M
    Nov 29 '16 at 14:15














0












0








0







Are you sure you even need the left join to get a good count?

Are you using values from #rLog in the distinct?

And I would go with , distinct not +



this line is different



ON dc.[hashUFI] = #rLog.[hashUFI]


it might help



SELECT COUNT( DISTINCT [field 1] + [field n] )
FROM #tempDC dc WITH (NOLOCK)
JOIN #tempUID u WITH (NOLOCK)
ON dc.[hashUFI] = u.[hashUFI]
AND dc.iId IN ( '', u.iId )
AND dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd
AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart
-- try without the left join and see if the number is different
LEFT JOIN #rLog
ON dc.[hashUFI] = #rLog.[hashUFI]
AND dc.docId = #rLog.docId
AND #rlog.iId IN ( '', u.iId )





share|improve this answer















Are you sure you even need the left join to get a good count?

Are you using values from #rLog in the distinct?

And I would go with , distinct not +



this line is different



ON dc.[hashUFI] = #rLog.[hashUFI]


it might help



SELECT COUNT( DISTINCT [field 1] + [field n] )
FROM #tempDC dc WITH (NOLOCK)
JOIN #tempUID u WITH (NOLOCK)
ON dc.[hashUFI] = u.[hashUFI]
AND dc.iId IN ( '', u.iId )
AND dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd
AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart
-- try without the left join and see if the number is different
LEFT JOIN #rLog
ON dc.[hashUFI] = #rLog.[hashUFI]
AND dc.docId = #rLog.docId
AND #rlog.iId IN ( '', u.iId )






share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 29 '16 at 15:10

























answered Nov 5 '16 at 17:48









paparazzopaparazzo

4,6141230




4,6141230













  • The number of returned items is different. LEFT JOIN is mandatory here :~(

    – Sasa
    Nov 7 '16 at 9:47













  • I have replaced LEFT JOIN with OUTER APPLY (SELECT TOP (2) [needed fields] FROM #rLog WITH (NOLOCK) WHERE ...<same as ON>...), but it haven't helped either.

    – Sasa
    Nov 7 '16 at 9:59











  • Using dc instead of u haven't helper either. Thanks anyway.

    – Sasa
    Nov 7 '16 at 10:03








  • 3





    What version supports COUNT(DISTINCT column1, column2)?

    – Andriy M
    Nov 29 '16 at 14:15



















  • The number of returned items is different. LEFT JOIN is mandatory here :~(

    – Sasa
    Nov 7 '16 at 9:47













  • I have replaced LEFT JOIN with OUTER APPLY (SELECT TOP (2) [needed fields] FROM #rLog WITH (NOLOCK) WHERE ...<same as ON>...), but it haven't helped either.

    – Sasa
    Nov 7 '16 at 9:59











  • Using dc instead of u haven't helper either. Thanks anyway.

    – Sasa
    Nov 7 '16 at 10:03








  • 3





    What version supports COUNT(DISTINCT column1, column2)?

    – Andriy M
    Nov 29 '16 at 14:15

















The number of returned items is different. LEFT JOIN is mandatory here :~(

– Sasa
Nov 7 '16 at 9:47







The number of returned items is different. LEFT JOIN is mandatory here :~(

– Sasa
Nov 7 '16 at 9:47















I have replaced LEFT JOIN with OUTER APPLY (SELECT TOP (2) [needed fields] FROM #rLog WITH (NOLOCK) WHERE ...<same as ON>...), but it haven't helped either.

– Sasa
Nov 7 '16 at 9:59





I have replaced LEFT JOIN with OUTER APPLY (SELECT TOP (2) [needed fields] FROM #rLog WITH (NOLOCK) WHERE ...<same as ON>...), but it haven't helped either.

– Sasa
Nov 7 '16 at 9:59













Using dc instead of u haven't helper either. Thanks anyway.

– Sasa
Nov 7 '16 at 10:03







Using dc instead of u haven't helper either. Thanks anyway.

– Sasa
Nov 7 '16 at 10:03






3




3





What version supports COUNT(DISTINCT column1, column2)?

– Andriy M
Nov 29 '16 at 14:15





What version supports COUNT(DISTINCT column1, column2)?

– Andriy M
Nov 29 '16 at 14:15


















draft saved

draft discarded




















































Thanks for contributing an answer to Database Administrators Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f154183%2flimit-the-number-of-results-but-show-how-much-would-be-returned%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

SQL Server 17 - Attemping to backup to remote NAS but Access is denied

Always On Availability groups resolving state after failover - Remote harden of transaction...

Restoring from pg_dump with foreign key constraints