Limit the number of results but show how much would be returned
I have a big problem in T-SQL with large amount of data where I need to limit number of returned rows to 200K but also to show the number of rows that would be returned. It is needed to be done under 10 min.
At the end for the result I am joining 3 tables which has 10'sK, 100'sK, or millions of rows in various combinations, depending on given parameters. I am using just several rows from each table and need the exact number of distinct rows that are returned.
When I place data into a temp table, writing it to disk takes too long (hours), so I am now trying to run the query twice. First time just for counting and second time for returning the result set without writing it to a temp table. This is because if I have more than 200K rows, than it can take too long otherwise.
I am wondering if COUNT ( DISTINCT [field 1] + ... + [field n] )
is going to take too much resources. How can I make this run faster:
DECLARE @count INT = ( SELECT COUNT( DISTINCT [field 1] + ... + [field n] )
FROM ( #tempDC dc WITH (NOLOCK)
JOIN #tempUID u WITH (NOLOCK) ON dc.[hashUFI] = u.[hashUFI]
AND dc.iId IN ( '', u.iId ) )
LEFT JOIN #rLog ON u.[hashUFI] = #rLog.[hashUFI]
AND #rlog.iId IN ( '', u.iId )
AND #rLog.docId = dc.docId
WHERE dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd
AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart
)
I have indexes on all the hashUFI
fields and also on all the pairs (rangeStart, rangeEnd)
.
This is just a problematic snippet of a much larger procedure (around 2000 lines of code), which is taking data from some audit tables.
sql-server performance t-sql performance-tuning count
bumped to the homepage by Community♦ 13 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
add a comment |
I have a big problem in T-SQL with large amount of data where I need to limit number of returned rows to 200K but also to show the number of rows that would be returned. It is needed to be done under 10 min.
At the end for the result I am joining 3 tables which has 10'sK, 100'sK, or millions of rows in various combinations, depending on given parameters. I am using just several rows from each table and need the exact number of distinct rows that are returned.
When I place data into a temp table, writing it to disk takes too long (hours), so I am now trying to run the query twice. First time just for counting and second time for returning the result set without writing it to a temp table. This is because if I have more than 200K rows, than it can take too long otherwise.
I am wondering if COUNT ( DISTINCT [field 1] + ... + [field n] )
is going to take too much resources. How can I make this run faster:
DECLARE @count INT = ( SELECT COUNT( DISTINCT [field 1] + ... + [field n] )
FROM ( #tempDC dc WITH (NOLOCK)
JOIN #tempUID u WITH (NOLOCK) ON dc.[hashUFI] = u.[hashUFI]
AND dc.iId IN ( '', u.iId ) )
LEFT JOIN #rLog ON u.[hashUFI] = #rLog.[hashUFI]
AND #rlog.iId IN ( '', u.iId )
AND #rLog.docId = dc.docId
WHERE dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd
AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart
)
I have indexes on all the hashUFI
fields and also on all the pairs (rangeStart, rangeEnd)
.
This is just a problematic snippet of a much larger procedure (around 2000 lines of code), which is taking data from some audit tables.
sql-server performance t-sql performance-tuning count
bumped to the homepage by Community♦ 13 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
1
Comments are not for extended discussion; this conversation has been moved to chat.
– Paul White♦
Nov 29 '16 at 14:38
add a comment |
I have a big problem in T-SQL with large amount of data where I need to limit number of returned rows to 200K but also to show the number of rows that would be returned. It is needed to be done under 10 min.
At the end for the result I am joining 3 tables which has 10'sK, 100'sK, or millions of rows in various combinations, depending on given parameters. I am using just several rows from each table and need the exact number of distinct rows that are returned.
When I place data into a temp table, writing it to disk takes too long (hours), so I am now trying to run the query twice. First time just for counting and second time for returning the result set without writing it to a temp table. This is because if I have more than 200K rows, than it can take too long otherwise.
I am wondering if COUNT ( DISTINCT [field 1] + ... + [field n] )
is going to take too much resources. How can I make this run faster:
DECLARE @count INT = ( SELECT COUNT( DISTINCT [field 1] + ... + [field n] )
FROM ( #tempDC dc WITH (NOLOCK)
JOIN #tempUID u WITH (NOLOCK) ON dc.[hashUFI] = u.[hashUFI]
AND dc.iId IN ( '', u.iId ) )
LEFT JOIN #rLog ON u.[hashUFI] = #rLog.[hashUFI]
AND #rlog.iId IN ( '', u.iId )
AND #rLog.docId = dc.docId
WHERE dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd
AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart
)
I have indexes on all the hashUFI
fields and also on all the pairs (rangeStart, rangeEnd)
.
This is just a problematic snippet of a much larger procedure (around 2000 lines of code), which is taking data from some audit tables.
sql-server performance t-sql performance-tuning count
I have a big problem in T-SQL with large amount of data where I need to limit number of returned rows to 200K but also to show the number of rows that would be returned. It is needed to be done under 10 min.
At the end for the result I am joining 3 tables which has 10'sK, 100'sK, or millions of rows in various combinations, depending on given parameters. I am using just several rows from each table and need the exact number of distinct rows that are returned.
When I place data into a temp table, writing it to disk takes too long (hours), so I am now trying to run the query twice. First time just for counting and second time for returning the result set without writing it to a temp table. This is because if I have more than 200K rows, than it can take too long otherwise.
I am wondering if COUNT ( DISTINCT [field 1] + ... + [field n] )
is going to take too much resources. How can I make this run faster:
DECLARE @count INT = ( SELECT COUNT( DISTINCT [field 1] + ... + [field n] )
FROM ( #tempDC dc WITH (NOLOCK)
JOIN #tempUID u WITH (NOLOCK) ON dc.[hashUFI] = u.[hashUFI]
AND dc.iId IN ( '', u.iId ) )
LEFT JOIN #rLog ON u.[hashUFI] = #rLog.[hashUFI]
AND #rlog.iId IN ( '', u.iId )
AND #rLog.docId = dc.docId
WHERE dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd
AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart
)
I have indexes on all the hashUFI
fields and also on all the pairs (rangeStart, rangeEnd)
.
This is just a problematic snippet of a much larger procedure (around 2000 lines of code), which is taking data from some audit tables.
sql-server performance t-sql performance-tuning count
sql-server performance t-sql performance-tuning count
edited Nov 29 '16 at 14:37
Paul White♦
52.1k14278450
52.1k14278450
asked Nov 3 '16 at 18:25
SasaSasa
1045
1045
bumped to the homepage by Community♦ 13 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
bumped to the homepage by Community♦ 13 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
1
Comments are not for extended discussion; this conversation has been moved to chat.
– Paul White♦
Nov 29 '16 at 14:38
add a comment |
1
Comments are not for extended discussion; this conversation has been moved to chat.
– Paul White♦
Nov 29 '16 at 14:38
1
1
Comments are not for extended discussion; this conversation has been moved to chat.
– Paul White♦
Nov 29 '16 at 14:38
Comments are not for extended discussion; this conversation has been moved to chat.
– Paul White♦
Nov 29 '16 at 14:38
add a comment |
1 Answer
1
active
oldest
votes
Are you sure you even need the left join to get a good count?
Are you using values from #rLog in the distinct?
And I would go with , distinct not +
this line is different
ON dc.[hashUFI] = #rLog.[hashUFI]
it might help
SELECT COUNT( DISTINCT [field 1] + [field n] )
FROM #tempDC dc WITH (NOLOCK)
JOIN #tempUID u WITH (NOLOCK)
ON dc.[hashUFI] = u.[hashUFI]
AND dc.iId IN ( '', u.iId )
AND dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd
AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart
-- try without the left join and see if the number is different
LEFT JOIN #rLog
ON dc.[hashUFI] = #rLog.[hashUFI]
AND dc.docId = #rLog.docId
AND #rlog.iId IN ( '', u.iId )
The number of returned items is different. LEFT JOIN is mandatory here :~(
– Sasa
Nov 7 '16 at 9:47
I have replaced LEFT JOIN with OUTER APPLY (SELECT TOP (2) [needed fields] FROM #rLog WITH (NOLOCK) WHERE ...<same as ON>...), but it haven't helped either.
– Sasa
Nov 7 '16 at 9:59
Using dc instead of u haven't helper either. Thanks anyway.
– Sasa
Nov 7 '16 at 10:03
3
What version supportsCOUNT(DISTINCT column1, column2)
?
– Andriy M
Nov 29 '16 at 14:15
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f154183%2flimit-the-number-of-results-but-show-how-much-would-be-returned%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
Are you sure you even need the left join to get a good count?
Are you using values from #rLog in the distinct?
And I would go with , distinct not +
this line is different
ON dc.[hashUFI] = #rLog.[hashUFI]
it might help
SELECT COUNT( DISTINCT [field 1] + [field n] )
FROM #tempDC dc WITH (NOLOCK)
JOIN #tempUID u WITH (NOLOCK)
ON dc.[hashUFI] = u.[hashUFI]
AND dc.iId IN ( '', u.iId )
AND dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd
AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart
-- try without the left join and see if the number is different
LEFT JOIN #rLog
ON dc.[hashUFI] = #rLog.[hashUFI]
AND dc.docId = #rLog.docId
AND #rlog.iId IN ( '', u.iId )
The number of returned items is different. LEFT JOIN is mandatory here :~(
– Sasa
Nov 7 '16 at 9:47
I have replaced LEFT JOIN with OUTER APPLY (SELECT TOP (2) [needed fields] FROM #rLog WITH (NOLOCK) WHERE ...<same as ON>...), but it haven't helped either.
– Sasa
Nov 7 '16 at 9:59
Using dc instead of u haven't helper either. Thanks anyway.
– Sasa
Nov 7 '16 at 10:03
3
What version supportsCOUNT(DISTINCT column1, column2)
?
– Andriy M
Nov 29 '16 at 14:15
add a comment |
Are you sure you even need the left join to get a good count?
Are you using values from #rLog in the distinct?
And I would go with , distinct not +
this line is different
ON dc.[hashUFI] = #rLog.[hashUFI]
it might help
SELECT COUNT( DISTINCT [field 1] + [field n] )
FROM #tempDC dc WITH (NOLOCK)
JOIN #tempUID u WITH (NOLOCK)
ON dc.[hashUFI] = u.[hashUFI]
AND dc.iId IN ( '', u.iId )
AND dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd
AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart
-- try without the left join and see if the number is different
LEFT JOIN #rLog
ON dc.[hashUFI] = #rLog.[hashUFI]
AND dc.docId = #rLog.docId
AND #rlog.iId IN ( '', u.iId )
The number of returned items is different. LEFT JOIN is mandatory here :~(
– Sasa
Nov 7 '16 at 9:47
I have replaced LEFT JOIN with OUTER APPLY (SELECT TOP (2) [needed fields] FROM #rLog WITH (NOLOCK) WHERE ...<same as ON>...), but it haven't helped either.
– Sasa
Nov 7 '16 at 9:59
Using dc instead of u haven't helper either. Thanks anyway.
– Sasa
Nov 7 '16 at 10:03
3
What version supportsCOUNT(DISTINCT column1, column2)
?
– Andriy M
Nov 29 '16 at 14:15
add a comment |
Are you sure you even need the left join to get a good count?
Are you using values from #rLog in the distinct?
And I would go with , distinct not +
this line is different
ON dc.[hashUFI] = #rLog.[hashUFI]
it might help
SELECT COUNT( DISTINCT [field 1] + [field n] )
FROM #tempDC dc WITH (NOLOCK)
JOIN #tempUID u WITH (NOLOCK)
ON dc.[hashUFI] = u.[hashUFI]
AND dc.iId IN ( '', u.iId )
AND dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd
AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart
-- try without the left join and see if the number is different
LEFT JOIN #rLog
ON dc.[hashUFI] = #rLog.[hashUFI]
AND dc.docId = #rLog.docId
AND #rlog.iId IN ( '', u.iId )
Are you sure you even need the left join to get a good count?
Are you using values from #rLog in the distinct?
And I would go with , distinct not +
this line is different
ON dc.[hashUFI] = #rLog.[hashUFI]
it might help
SELECT COUNT( DISTINCT [field 1] + [field n] )
FROM #tempDC dc WITH (NOLOCK)
JOIN #tempUID u WITH (NOLOCK)
ON dc.[hashUFI] = u.[hashUFI]
AND dc.iId IN ( '', u.iId )
AND dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd
AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart
-- try without the left join and see if the number is different
LEFT JOIN #rLog
ON dc.[hashUFI] = #rLog.[hashUFI]
AND dc.docId = #rLog.docId
AND #rlog.iId IN ( '', u.iId )
edited Nov 29 '16 at 15:10
answered Nov 5 '16 at 17:48
paparazzopaparazzo
4,6141230
4,6141230
The number of returned items is different. LEFT JOIN is mandatory here :~(
– Sasa
Nov 7 '16 at 9:47
I have replaced LEFT JOIN with OUTER APPLY (SELECT TOP (2) [needed fields] FROM #rLog WITH (NOLOCK) WHERE ...<same as ON>...), but it haven't helped either.
– Sasa
Nov 7 '16 at 9:59
Using dc instead of u haven't helper either. Thanks anyway.
– Sasa
Nov 7 '16 at 10:03
3
What version supportsCOUNT(DISTINCT column1, column2)
?
– Andriy M
Nov 29 '16 at 14:15
add a comment |
The number of returned items is different. LEFT JOIN is mandatory here :~(
– Sasa
Nov 7 '16 at 9:47
I have replaced LEFT JOIN with OUTER APPLY (SELECT TOP (2) [needed fields] FROM #rLog WITH (NOLOCK) WHERE ...<same as ON>...), but it haven't helped either.
– Sasa
Nov 7 '16 at 9:59
Using dc instead of u haven't helper either. Thanks anyway.
– Sasa
Nov 7 '16 at 10:03
3
What version supportsCOUNT(DISTINCT column1, column2)
?
– Andriy M
Nov 29 '16 at 14:15
The number of returned items is different. LEFT JOIN is mandatory here :~(
– Sasa
Nov 7 '16 at 9:47
The number of returned items is different. LEFT JOIN is mandatory here :~(
– Sasa
Nov 7 '16 at 9:47
I have replaced LEFT JOIN with OUTER APPLY (SELECT TOP (2) [needed fields] FROM #rLog WITH (NOLOCK) WHERE ...<same as ON>...), but it haven't helped either.
– Sasa
Nov 7 '16 at 9:59
I have replaced LEFT JOIN with OUTER APPLY (SELECT TOP (2) [needed fields] FROM #rLog WITH (NOLOCK) WHERE ...<same as ON>...), but it haven't helped either.
– Sasa
Nov 7 '16 at 9:59
Using dc instead of u haven't helper either. Thanks anyway.
– Sasa
Nov 7 '16 at 10:03
Using dc instead of u haven't helper either. Thanks anyway.
– Sasa
Nov 7 '16 at 10:03
3
3
What version supports
COUNT(DISTINCT column1, column2)
?– Andriy M
Nov 29 '16 at 14:15
What version supports
COUNT(DISTINCT column1, column2)
?– Andriy M
Nov 29 '16 at 14:15
add a comment |
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f154183%2flimit-the-number-of-results-but-show-how-much-would-be-returned%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Comments are not for extended discussion; this conversation has been moved to chat.
– Paul White♦
Nov 29 '16 at 14:38