Limit the number of results but show how much would be returned

I have a big problem in T-SQL with large amount of data where I need to limit number of returned rows to 200K but also to show the number of rows that would be returned. It is needed to be done under 10 min.

At the end for the result I am joining 3 tables which has 10'sK, 100'sK, or millions of rows in various combinations, depending on given parameters. I am using just several rows from each table and need the exact number of distinct rows that are returned.

When I place data into a temp table, writing it to disk takes too long (hours), so I am now trying to run the query twice. First time just for counting and second time for returning the result set without writing it to a temp table. This is because if I have more than 200K rows, than it can take too long otherwise.

I am wondering if COUNT ( DISTINCT [field 1] + ... + [field n] ) is going to take too much resources. How can I make this run faster:

DECLARE @count INT = ( SELECT COUNT( DISTINCT [field 1] + ... + [field n] )

    FROM ( #tempDC dc WITH (NOLOCK)

            JOIN #tempUID u WITH (NOLOCK) ON dc.[hashUFI] = u.[hashUFI] 

                AND dc.iId IN ( '', u.iId ) )

        LEFT JOIN #rLog ON u.[hashUFI] = #rLog.[hashUFI]

            AND #rlog.iId IN ( '',  u.iId )

            AND #rLog.docId = dc.docId

    WHERE dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd 

        AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart

)

I have indexes on all the hashUFI fields and also on all the pairs (rangeStart, rangeEnd).

This is just a problematic snippet of a much larger procedure (around 2000 lines of code), which is taking data from some audit tables.

edited Nov 29 '16 at 14:37

Paul White♦

52.1k14278450

asked Nov 3 '16 at 18:25

Sasa

1045

bumped to the homepage by Community♦ 13 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

1

Comments are not for extended discussion; this conversation has been moved to chat.

– Paul White♦
Nov 29 '16 at 14:38

add a comment |

I am wondering if COUNT ( DISTINCT [field 1] + ... + [field n] ) is going to take too much resources. How can I make this run faster:

DECLARE @count INT = ( SELECT COUNT( DISTINCT [field 1] + ... + [field n] )

    FROM ( #tempDC dc WITH (NOLOCK)

            JOIN #tempUID u WITH (NOLOCK) ON dc.[hashUFI] = u.[hashUFI] 

                AND dc.iId IN ( '', u.iId ) )

        LEFT JOIN #rLog ON u.[hashUFI] = #rLog.[hashUFI]

            AND #rlog.iId IN ( '',  u.iId )

            AND #rLog.docId = dc.docId

    WHERE dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd 

        AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart

)

I have indexes on all the hashUFI fields and also on all the pairs (rangeStart, rangeEnd).

This is just a problematic snippet of a much larger procedure (around 2000 lines of code), which is taking data from some audit tables.

edited Nov 29 '16 at 14:37

Paul White♦

52.1k14278450

asked Nov 3 '16 at 18:25

Sasa

1045

bumped to the homepage by Community♦ 13 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

1

Comments are not for extended discussion; this conversation has been moved to chat.

– Paul White♦
Nov 29 '16 at 14:38

add a comment |

I am wondering if COUNT ( DISTINCT [field 1] + ... + [field n] ) is going to take too much resources. How can I make this run faster:

DECLARE @count INT = ( SELECT COUNT( DISTINCT [field 1] + ... + [field n] )

    FROM ( #tempDC dc WITH (NOLOCK)

            JOIN #tempUID u WITH (NOLOCK) ON dc.[hashUFI] = u.[hashUFI] 

                AND dc.iId IN ( '', u.iId ) )

        LEFT JOIN #rLog ON u.[hashUFI] = #rLog.[hashUFI]

            AND #rlog.iId IN ( '',  u.iId )

            AND #rLog.docId = dc.docId

    WHERE dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd 

        AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart

)

I have indexes on all the hashUFI fields and also on all the pairs (rangeStart, rangeEnd).

This is just a problematic snippet of a much larger procedure (around 2000 lines of code), which is taking data from some audit tables.

edited Nov 29 '16 at 14:37

Paul White♦

52.1k14278450

asked Nov 3 '16 at 18:25

Sasa

1045

I am wondering if COUNT ( DISTINCT [field 1] + ... + [field n] ) is going to take too much resources. How can I make this run faster:

DECLARE @count INT = ( SELECT COUNT( DISTINCT [field 1] + ... + [field n] )

    FROM ( #tempDC dc WITH (NOLOCK)

            JOIN #tempUID u WITH (NOLOCK) ON dc.[hashUFI] = u.[hashUFI] 

                AND dc.iId IN ( '', u.iId ) )

        LEFT JOIN #rLog ON u.[hashUFI] = #rLog.[hashUFI]

            AND #rlog.iId IN ( '',  u.iId )

            AND #rLog.docId = dc.docId

    WHERE dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd 

        AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart

)

I have indexes on all the hashUFI fields and also on all the pairs (rangeStart, rangeEnd).

This is just a problematic snippet of a much larger procedure (around 2000 lines of code), which is taking data from some audit tables.

sql-server performance t-sql performance-tuning count

edited Nov 29 '16 at 14:37

Paul White♦

52.1k14278450

asked Nov 3 '16 at 18:25

Sasa

1045

edited Nov 29 '16 at 14:37

Paul White♦

52.1k14278450

asked Nov 3 '16 at 18:25

Sasa

1045

edited Nov 29 '16 at 14:37

Paul White♦

52.1k14278450

edited Nov 29 '16 at 14:37

Paul White♦

52.1k14278450

edited Nov 29 '16 at 14:37

Paul White♦

52.1k14278450

asked Nov 3 '16 at 18:25

Sasa

1045

asked Nov 3 '16 at 18:25

Sasa

1045

asked Nov 3 '16 at 18:25

Sasa

1045

bumped to the homepage by Community♦ 13 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

bumped to the homepage by Community♦ 13 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

1

Comments are not for extended discussion; this conversation has been moved to chat.

– Paul White♦
Nov 29 '16 at 14:38

add a comment |

1

Comments are not for extended discussion; this conversation has been moved to chat.

– Paul White♦
Nov 29 '16 at 14:38

Comments are not for extended discussion; this conversation has been moved to chat.

– Paul White♦
Nov 29 '16 at 14:38

add a comment |

1 Answer
1

active

oldest

votes

Are you sure you even need the left join to get a good count?

Are you using values from #rLog in the distinct?

And I would go with , distinct not +

this line is different

ON dc.[hashUFI] = #rLog.[hashUFI]

it might help

SELECT COUNT( DISTINCT [field 1] + [field n] )

  FROM #tempDC dc WITH (NOLOCK)

  JOIN #tempUID u WITH (NOLOCK) 

        ON dc.[hashUFI] = u.[hashUFI] 

       AND dc.iId IN ( '', u.iId ) 

       AND dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd 

       AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart

  -- try without the left join and see if the number is different

  LEFT JOIN #rLog 

        ON dc.[hashUFI] = #rLog.[hashUFI]

       AND dc.docId     = #rLog.docId 

       AND #rlog.iId IN ( '',  u.iId )

edited Nov 29 '16 at 15:10

answered Nov 5 '16 at 17:48

paparazzo

4,6141230

The number of returned items is different. LEFT JOIN is mandatory here :~(

– Sasa
Nov 7 '16 at 9:47

I have replaced LEFT JOIN with OUTER APPLY (SELECT TOP (2) [needed fields] FROM #rLog WITH (NOLOCK) WHERE ...<same as ON>...), but it haven't helped either.

– Sasa
Nov 7 '16 at 9:59

Using dc instead of u haven't helper either. Thanks anyway.

– Sasa
Nov 7 '16 at 10:03

3

What version supports COUNT(DISTINCT column1, column2)?

– Andriy M
Nov 29 '16 at 14:15

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f154183%2flimit-the-number-of-results-but-show-how-much-would-be-returned%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

Are you sure you even need the left join to get a good count?

Are you using values from #rLog in the distinct?

And I would go with , distinct not +

this line is different

ON dc.[hashUFI] = #rLog.[hashUFI]

it might help

SELECT COUNT( DISTINCT [field 1] + [field n] )

  FROM #tempDC dc WITH (NOLOCK)

  JOIN #tempUID u WITH (NOLOCK) 

        ON dc.[hashUFI] = u.[hashUFI] 

       AND dc.iId IN ( '', u.iId ) 

       AND dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd 

       AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart

  -- try without the left join and see if the number is different

  LEFT JOIN #rLog 

        ON dc.[hashUFI] = #rLog.[hashUFI]

       AND dc.docId     = #rLog.docId 

       AND #rlog.iId IN ( '',  u.iId )

edited Nov 29 '16 at 15:10

answered Nov 5 '16 at 17:48

paparazzo

4,6141230

The number of returned items is different. LEFT JOIN is mandatory here :~(

– Sasa
Nov 7 '16 at 9:47

I have replaced LEFT JOIN with OUTER APPLY (SELECT TOP (2) [needed fields] FROM #rLog WITH (NOLOCK) WHERE ...<same as ON>...), but it haven't helped either.

– Sasa
Nov 7 '16 at 9:59

Using dc instead of u haven't helper either. Thanks anyway.

– Sasa
Nov 7 '16 at 10:03

3

What version supports COUNT(DISTINCT column1, column2)?

– Andriy M
Nov 29 '16 at 14:15

add a comment |

Are you sure you even need the left join to get a good count?

Are you using values from #rLog in the distinct?

And I would go with , distinct not +

this line is different

ON dc.[hashUFI] = #rLog.[hashUFI]

it might help

SELECT COUNT( DISTINCT [field 1] + [field n] )

  FROM #tempDC dc WITH (NOLOCK)

  JOIN #tempUID u WITH (NOLOCK) 

        ON dc.[hashUFI] = u.[hashUFI] 

       AND dc.iId IN ( '', u.iId ) 

       AND dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd 

       AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart

  -- try without the left join and see if the number is different

  LEFT JOIN #rLog 

        ON dc.[hashUFI] = #rLog.[hashUFI]

       AND dc.docId     = #rLog.docId 

       AND #rlog.iId IN ( '',  u.iId )

edited Nov 29 '16 at 15:10

answered Nov 5 '16 at 17:48

paparazzo

4,6141230

The number of returned items is different. LEFT JOIN is mandatory here :~(

– Sasa
Nov 7 '16 at 9:47

I have replaced LEFT JOIN with OUTER APPLY (SELECT TOP (2) [needed fields] FROM #rLog WITH (NOLOCK) WHERE ...<same as ON>...), but it haven't helped either.

– Sasa
Nov 7 '16 at 9:59

Using dc instead of u haven't helper either. Thanks anyway.

– Sasa
Nov 7 '16 at 10:03

3

What version supports COUNT(DISTINCT column1, column2)?

– Andriy M
Nov 29 '16 at 14:15

add a comment |

Are you sure you even need the left join to get a good count?

Are you using values from #rLog in the distinct?

And I would go with , distinct not +

this line is different

ON dc.[hashUFI] = #rLog.[hashUFI]

it might help

SELECT COUNT( DISTINCT [field 1] + [field n] )

  FROM #tempDC dc WITH (NOLOCK)

  JOIN #tempUID u WITH (NOLOCK) 

        ON dc.[hashUFI] = u.[hashUFI] 

       AND dc.iId IN ( '', u.iId ) 

       AND dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd 

       AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart

  -- try without the left join and see if the number is different

  LEFT JOIN #rLog 

        ON dc.[hashUFI] = #rLog.[hashUFI]

       AND dc.docId     = #rLog.docId 

       AND #rlog.iId IN ( '',  u.iId )

edited Nov 29 '16 at 15:10

answered Nov 5 '16 at 17:48

paparazzo

4,6141230

Are you sure you even need the left join to get a good count?

Are you using values from #rLog in the distinct?

And I would go with , distinct not +

this line is different

ON dc.[hashUFI] = #rLog.[hashUFI]

it might help

SELECT COUNT( DISTINCT [field 1] + [field n] )

  FROM #tempDC dc WITH (NOLOCK)

  JOIN #tempUID u WITH (NOLOCK) 

        ON dc.[hashUFI] = u.[hashUFI] 

       AND dc.iId IN ( '', u.iId ) 

       AND dc.rangeStart <= dc.rangeEnd AND u.rangeStart <= u.rangeEnd 

       AND u.rangeStart <= dc.rangeEnd AND u.rangeEnd >= dc.rangeStart

  -- try without the left join and see if the number is different

  LEFT JOIN #rLog 

        ON dc.[hashUFI] = #rLog.[hashUFI]

       AND dc.docId     = #rLog.docId 

       AND #rlog.iId IN ( '',  u.iId )

edited Nov 29 '16 at 15:10

answered Nov 5 '16 at 17:48

paparazzo

4,6141230

edited Nov 29 '16 at 15:10

answered Nov 5 '16 at 17:48

paparazzo

4,6141230

answered Nov 5 '16 at 17:48

paparazzo

4,6141230

answered Nov 5 '16 at 17:48

paparazzo

4,6141230

The number of returned items is different. LEFT JOIN is mandatory here :~(

– Sasa
Nov 7 '16 at 9:47

I have replaced LEFT JOIN with OUTER APPLY (SELECT TOP (2) [needed fields] FROM #rLog WITH (NOLOCK) WHERE ...<same as ON>...), but it haven't helped either.

– Sasa
Nov 7 '16 at 9:59

Using dc instead of u haven't helper either. Thanks anyway.

– Sasa
Nov 7 '16 at 10:03

3

What version supports COUNT(DISTINCT column1, column2)?

– Andriy M
Nov 29 '16 at 14:15

add a comment |

The number of returned items is different. LEFT JOIN is mandatory here :~(

– Sasa
Nov 7 '16 at 9:47

I have replaced LEFT JOIN with OUTER APPLY (SELECT TOP (2) [needed fields] FROM #rLog WITH (NOLOCK) WHERE ...<same as ON>...), but it haven't helped either.

– Sasa
Nov 7 '16 at 9:59

Using dc instead of u haven't helper either. Thanks anyway.

– Sasa
Nov 7 '16 at 10:03

3

What version supports COUNT(DISTINCT column1, column2)?

– Andriy M
Nov 29 '16 at 14:15

The number of returned items is different. LEFT JOIN is mandatory here :~(

– Sasa
Nov 7 '16 at 9:47

I have replaced LEFT JOIN with OUTER APPLY (SELECT TOP (2) [needed fields] FROM #rLog WITH (NOLOCK) WHERE ...<same as ON>...), but it haven't helped either.

– Sasa
Nov 7 '16 at 9:59

Using dc instead of u haven't helper either. Thanks anyway.

– Sasa
Nov 7 '16 at 10:03

What version supports COUNT(DISTINCT column1, column2)?

– Andriy M
Nov 29 '16 at 14:15

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Database Administrators Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

l6133VVFpkVgK,FGI n4gfiuE7LYth7Yb,Z60duPTFvRMXrZlxdV79sHxhcsauQzN,x C3l biv6kVZJE,iUVl

搜尋此網誌

Xrhrft