Optimizing queries for 25+ million rows



























I am using MS SQL Server, and I have to run several queries on the same table with different criteria. At first I ran each query against the original table, although they all share some filtering (i.e. Date, status). This took a lot of time (around 2 minutes).



There are duplicate data rows, and all indexes are NON-CLUSTERED. I am only interested in 4 columns for my criteria, and the result should output only the count, for all queries.



Columns needed: TABLE, FIELD, AFTER, DATE; there is an index on each of DATE and TABLE.



After creating a temp table with only the fields I need, the total went down to 1:40 minutes, which is still very bad.



CREATE TABLE #TEMP
(
    [TABLE] VARCHAR(30) NULL,
    [FIELD] VARCHAR(30) NULL,
    [AFTER] VARCHAR(1000) NULL,
    [DATE] DATETIME,
    SORT_ID INT IDENTITY(1,1)
);

CREATE CLUSTERED INDEX IX_ADT ON #TEMP (SORT_ID);

INSERT INTO #TEMP ([TABLE], [FIELD], [AFTER], [DATE])
SELECT [TABLE], [FIELD], [AFTER], [DATE]
FROM mytbl WITH (NOLOCK)
WHERE [TABLE] = 'OTB'
  AND [FIELD] = 'STATUS';


Running this gives: (216598 row(s) affected)



Since not all queries rely on a date range, I didn't include it in the insert. The problem is that the insert alone takes well over a minute; the one above took 1:19.



I want to run something like this for several queries:



SELECT COUNT(*) AS [COUNT]
FROM #TEMP
WHERE [AFTER] = 'R'
  AND [DATE] >= '2014-01-01'
  AND [DATE] <= '2015-01-01';


The problem lies more with the insert than with the selection, but the temp table has far fewer rows than the original table, which should beat scanning the full table several times.



How can I optimize this?



EDIT



I have removed the sort ID; I thought the problem was mainly with the select and not the insert. It was a guess.



I can't create a unique index, since no field (or combination of fields) is unique.



I am using SQL Server 2012.



Table Info: It's a heap and has the following space usage:



name     rows        reserved      data          index_size    unused
mytbl    24869658    9204568 KB    3017952 KB    5816232 KB    370384 KB


































  • @MikaelEriksson I cannot modify production tables..

    – Atieh
    Feb 27 '15 at 15:24













  • If the queries you are trying to optimize are of the form SELECT COUNT(*) AS COUNT FROM original_table WHERE AFTER = 'R' AND DATE >= '2014-01-01' AND DATE < '2015-01-01', why don't you try to optimize each (query) separately? Are you not allowed to add indexes to the table?

    – ypercubeᵀᴹ
    Feb 27 '15 at 15:26











    You need to determine why it's slow. Is it being blocked? Is it waiting for tempdb to grow? Is the execution plan abysmal? Nobody can fix "my query is slow" without more details...

    – Aaron Bertrand
    Feb 27 '15 at 15:27











    Well, seems like a lost cause to me ("I'm not allowed to optimize anything, so lets just push 200K rows in a temp table every time we need to run some queries"). But you could remove the TABLE and FIELD columns from the #temp table (all rows have TABLE = 'OTB' AND FIELD = 'STATUS' for the specific temp table after all.)

    – ypercubeᵀᴹ
    Feb 27 '15 at 15:30













    I did ask for an edit and improvements by adding a detailed (and polite) comment. That's what comments are for. You should also tag your question with the version of SQL Server you are using (e.g. SQL Server 2014). DDL for the table might be helpful too (CREATE TABLE statement). The down vote was because the question was not clear.

    – Paul White
    Feb 27 '15 at 15:55
















sql-server sql-server-2012 performance-tuning






edited Feb 27 '15 at 16:34 by Paul White

asked Feb 27 '15 at 15:12 by Atieh



2 Answers














The question is mainly about how to optimize the select statement:



SELECT [TABLE], [FIELD], [AFTER], [DATE]
FROM mytbl WITH (NOLOCK)
WHERE [TABLE] = 'OTB' AND
[FIELD] = 'STATUS'


Removing the redundant projections and adding the presumed dbo schema:



SELECT [AFTER], [DATE] 
FROM dbo.mytbl WITH (NOLOCK)
WHERE [TABLE] = 'OTB'
AND FIELD = 'STATUS';


Without an index like ([TABLE],[FIELD]) INCLUDE ([AFTER],[DATE]) SQL Server has two main options:




  1. Scan the heap entirely (3GB+); or

  2. Locate rows matching [TABLE] = 'OTB' and [FIELD] = 'STATUS' (using IDX6), then perform a heap (RID) lookup per row to retrieve the [AFTER] and [DATE] columns.


Whether the optimizer chooses a heap scan or an index seek with RID lookup depends on the estimated selectivity of the [TABLE] = 'OTB' and [FIELD] = 'STATUS' predicates. Check whether the estimated number of rows from the seek matches reality; if not, update your statistics. If that condition is reasonably selective, test the query with a table hint forcing the use of the index. If the optimizer is currently choosing the index seek, test performance with an INDEX(0) or FORCESCAN hint to scan the heap.
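For instance, hedged sketches of both hints (IDX6 is the index named in the comments; compare the actual execution plans and timings of the two):



SELECT [AFTER], [DATE]
FROM dbo.mytbl WITH (INDEX(IDX6))  -- force the nonclustered index seek + RID lookups
WHERE [TABLE] = 'OTB'
  AND [FIELD] = 'STATUS';

SELECT [AFTER], [DATE]
FROM dbo.mytbl WITH (FORCESCAN)    -- force a full scan of the heap for comparison
WHERE [TABLE] = 'OTB'
  AND [FIELD] = 'STATUS';
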



Beyond that, you could improve the heap scan a little by reclaiming some of the unused space (370 MB). From SQL Server 2008 onward this can be done by rebuilding the heap. Unused space in heaps often results from deletes performed without a table lock being taken (without one, empty pages are not deallocated from a heap). Tables that experience frequent deletions are often better stored as clustered tables for this reason.
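A sketch of the rebuild (assuming the dbo schema; this takes a table lock while it runs, so schedule it in a maintenance window):



-- Reclaim unused pages by rebuilding the heap (SQL Server 2008+)
ALTER TABLE dbo.mytbl REBUILD;
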



The performance of the heap scan depends on how much of the table is in memory, how much must be read from disk, how full the pages are, the speed of the persistent storage, and whether the scan is I/O- or CPU-bound (parallelism can help with the latter).



If performance is still unacceptable after you have investigated all of the above, try to make the case for a new index. If available on your version of SQL Server, a possible filtered index for the given query would be:



CREATE INDEX index_name
ON dbo.mytbl ([DATE],[AFTER])
WHERE [TABLE] = 'OTB'
AND [FIELD] = 'STATUS';


Also consider index compression, if that is available and beneficial. Without a new index of some kind, there's relatively little you can do to improve the performance of the given query.
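If compression is available on your edition, one possible shape for the filtered index with page compression (a syntax sketch only; measure the space and CPU trade-off first):



CREATE INDEX index_name
ON dbo.mytbl ([DATE],[AFTER])
WHERE [TABLE] = 'OTB'
AND [FIELD] = 'STATUS'
WITH (DATA_COMPRESSION = PAGE);
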
































  • Sorry Paul, there is: IDX6 nonclustered located on PRIMARY TABLE, FIELD. Maybe this would change things you mentioned?

    – Atieh
    Feb 27 '15 at 16:50











  • @Atieh Not materially, no. Unless it also includes DATE and AFTER, using that index will also require an RID lookup. Check the execution plans.

    – Paul White
    Feb 27 '15 at 16:53

































I think there is a case for changing the indexes here because:




  • a) you have a task to do (these multiple queries)

  • b) data warehouse volumes (25+ million rows) and

  • c) a performance problem.


This would also be a good use case for the non-clustered columnstore indexes introduced in SQL Server 2012, i.e. summarising/aggregating a few columns of a large table with many columns.



Although these indexes have the side-effect of making the table read-only (with the exception of partition switching), they can transform the performance of aggregate queries under the right conditions. The read-only aspect can be managed, either by dropping and recreating the index or by partition-switching data into the table.
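For example, the drop-and-recreate pattern might look like this (names follow the test rig; only a sketch, and the index rebuild cost grows with table size):



-- Drop the columnstore index so the table becomes writable again
DROP INDEX _cs ON dbo.largeTable;

-- ... perform the INSERT/UPDATE/DELETE work here ...

-- Recreate the columnstore index for read/aggregate workloads
CREATE NONCLUSTERED COLUMNSTORE INDEX _cs
ON dbo.largeTable ( [TABLE], FIELD, [AFTER], [DATE] );
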



I set up a simple test rig to mimic your setup, and saw a good improvement in performance:



USE tempdb
GO

SET NOCOUNT ON
GO

-- Create a large table
IF OBJECT_ID('dbo.largeTable') IS NOT NULL
DROP TABLE dbo.largeTable
GO
CREATE TABLE dbo.largeTable (

[TABLE] VARCHAR(30) NULL,
FIELD VARCHAR(30) NULL,
[AFTER] VARCHAR(1000) NULL,
[DATE] DATETIME,
SORT_ID INT IDENTITY(1,1),

pad VARCHAR(100) DEFAULT REPLICATE( '$', 100 )
)
GO

-- Populate table
;WITH cte AS (
SELECT TOP 100000 ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) rn
FROM master.sys.columns c1
CROSS JOIN master.sys.columns c2
CROSS JOIN master.sys.columns c3
)
INSERT INTO dbo.largeTable ( [TABLE], FIELD, [AFTER], [DATE] )
SELECT
x.tableName,
y.field,
z.[after],
DATEADD( day, rn % 1111, '1 Jan 2012' )
FROM cte c
CROSS JOIN ( VALUES ( 'OTB' ), ( 'AAA' ), ( 'BBB' ), ( 'CCCC' ) ) x ( tableName )
CROSS JOIN ( VALUES ( 'STATUS' ), ( 'TIME' ), ( 'POWER' ) ) y ( field )
CROSS JOIN ( VALUES ( 'R' ), ( 'X' ), ( 'Z' ), ( 'A' ) ) z ( [after] )

CHECKPOINT

GO 5

EXEC sp_spaceused 'dbo.largeTable'
GO

SELECT MIN([DATE]) xmin, MAX([DATE]) xmax, FORMAT( COUNT(*), '#,#' ) records
FROM dbo.largeTable
GO

-- Optionally clear cache for more comparable results; DO NOT RUN ON PRODUCTION SYSTEM!!
--DBCC DROPCLEANBUFFERS
--DBCC FREEPROCCACHE
--GO

DECLARE @startDate DATETIME2 = SYSUTCDATETIME()

SELECT COUNT(*) AS COUNT
FROM dbo.largeTable
WHERE [AFTER] = 'R'
AND [DATE] >= '2014-01-01'
AND [DATE] <= '2015-01-01'

SELECT DATEDIFF( millisecond, @startDate, SYSUTCDATETIME() ) diff1
GO

-- Add the non-clustered columnstore
CREATE NONCLUSTERED COLUMNSTORE INDEX _cs ON dbo.largeTable ( [TABLE], FIELD, [AFTER], [DATE] )
GO

-- Optionally clear cache for more comparable results; DO NOT RUN ON PRODUCTION SYSTEM!!
--DBCC DROPCLEANBUFFERS
--DBCC FREEPROCCACHE
--GO

-- Check query again
DECLARE @startDate DATETIME2 = SYSUTCDATETIME()

SELECT COUNT(*) AS COUNT
FROM dbo.largeTable
WHERE [AFTER] = 'R'
AND [DATE] >= '2014-01-01'
AND [DATE] <= '2015-01-01'

SELECT DATEDIFF( millisecond, @startDate, SYSUTCDATETIME() ) diff2
GO


My results: 6 seconds vs 0.08 seconds:



(Image: timing output showing the query duration before and after adding the columnstore index.)



In summary, try to build a case with your boss to have the indexes changed, or at least to create some kind of overnight process where these records are carved off to a read-only reporting table/database where you can do your work and add indexing appropriate for that workload.































    Your Answer








    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "182"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f94018%2foptimizing-queries-for-25-million-rows%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    11














    The question is mainly about how to optimize the select statement:



    SELECT [TABLE], [FIELD], [AFTER], [DATE]
    FROM mytbl WITH (NOLOCK)
    WHERE [TABLE] = 'OTB' AND
    [FIELD] = 'STATUS'


    Removing the redundant projections and adding the presumed dbo schema:



    SELECT [AFTER], [DATE] 
    FROM dbo.mytbl WITH (NOLOCK)
    WHERE [TABLE] = 'OTB'
    AND FIELD = 'STATUS';


    Without an index like ([TABLE],[FIELD]) INCLUDE ([AFTER],[DATE]) SQL Server has two main options:




    1. Scan the heap entirely (3GB+); or

    2. Locate rows matching [TABLE] = 'OTB' and [FIELD] = 'STATUS' (using IDX6), then perform a heap (RID) lookup per row to retrieve the [AFTER] and [DATE] columns.


    Whether the optimizer chooses a heap scan or index seek with RID lookup depends on the estimated selectivity of the [TABLE] = 'OTB' and [FIELD] = 'STATUS'predicates. Check to see if the estimated number of rows from the seek matches reality. If not, update your statistics. Test the query with an table hint forcing the use of the index, if that condition is reasonably selective. If the optimizer is currently choosing the index seek, test performance with an INDEX(0) or FORCESCAN hint to scan the heap.



    Beyond that, you could look to improve the scan of the heap a little by removing some of the unused space (370MB). In SQL Server 2008 this can be done by rebuilding the heap. Unused space in heaps often results from deletes performed without a table lock being taken (without a table lock, empty pages are not deallocated from a heap). Tables that experience frequent deletions are often better stored as a clustered table for this reason.



    The performance of the heap scan depends on how much of the table is stored in memory, how much must be read from disk, how full the pages are, the speed of the persistent storage, whether the scan is I/O or CPU bound (parallelism can help).



    If performance is still unacceptable after you have investigated all of the above, try to make the case for a new index. If available on your version of SQL Server, a possible filtered index for the given query would be:



    CREATE INDEX index_name
    ON dbo.mytbl ([DATE],[AFTER])
    WHERE [TABLE] = 'OTB'
    AND [FIELD] = 'STATUS';


    Also consider index compression, if that is available and beneficial. Without a new index of some kind, there's relatively little you can do to improve the performance of the given query.






    share|improve this answer


























    • Sorry Paul, there is: IDX6 nonclustered located on PRIMARY TABLE, FIELD. Maybe this would change things you mentioned?

      – Atieh
      Feb 27 '15 at 16:50











    • @Atieh Not materially, no. Unless it also includes DATE and AFTER, using that index will also require an RID lookup. Check the execution plans.

      – Paul White
      Feb 27 '15 at 16:53
















    11














    The question is mainly about how to optimize the select statement:



    SELECT [TABLE], [FIELD], [AFTER], [DATE]
    FROM mytbl WITH (NOLOCK)
    WHERE [TABLE] = 'OTB' AND
    [FIELD] = 'STATUS'


    Removing the redundant projections and adding the presumed dbo schema:



    SELECT [AFTER], [DATE] 
    FROM dbo.mytbl WITH (NOLOCK)
    WHERE [TABLE] = 'OTB'
    AND FIELD = 'STATUS';


    Without an index like ([TABLE],[FIELD]) INCLUDE ([AFTER],[DATE]) SQL Server has two main options:




    1. Scan the heap entirely (3GB+); or

    2. Locate rows matching [TABLE] = 'OTB' and [FIELD] = 'STATUS' (using IDX6), then perform a heap (RID) lookup per row to retrieve the [AFTER] and [DATE] columns.


    Whether the optimizer chooses a heap scan or index seek with RID lookup depends on the estimated selectivity of the [TABLE] = 'OTB' and [FIELD] = 'STATUS'predicates. Check to see if the estimated number of rows from the seek matches reality. If not, update your statistics. Test the query with an table hint forcing the use of the index, if that condition is reasonably selective. If the optimizer is currently choosing the index seek, test performance with an INDEX(0) or FORCESCAN hint to scan the heap.



    Beyond that, you could look to improve the scan of the heap a little by removing some of the unused space (370MB). In SQL Server 2008 this can be done by rebuilding the heap. Unused space in heaps often results from deletes performed without a table lock being taken (without a table lock, empty pages are not deallocated from a heap). Tables that experience frequent deletions are often better stored as a clustered table for this reason.



    The performance of the heap scan depends on how much of the table is stored in memory, how much must be read from disk, how full the pages are, the speed of the persistent storage, whether the scan is I/O or CPU bound (parallelism can help).



    If performance is still unacceptable after you have investigated all of the above, try to make the case for a new index. If available on your version of SQL Server, a possible filtered index for the given query would be:



    CREATE INDEX index_name
    ON dbo.mytbl ([DATE],[AFTER])
    WHERE [TABLE] = 'OTB'
    AND [FIELD] = 'STATUS';


    Also consider index compression, if that is available and beneficial. Without a new index of some kind, there's relatively little you can do to improve the performance of the given query.






    share|improve this answer


























    • Sorry Paul, there is: IDX6 nonclustered located on PRIMARY TABLE, FIELD. Maybe this would change things you mentioned?

      – Atieh
      Feb 27 '15 at 16:50











    • @Atieh Not materially, no. Unless it also includes DATE and AFTER, using that index will also require an RID lookup. Check the execution plans.

      – Paul White
      Feb 27 '15 at 16:53














    11












    11








    11







    answered Feb 27 '15 at 16:33, edited Feb 27 '15 at 16:59 – Paul White













    I think there is a case for changing the indexes here because:




    • a) you have a task to do (these multiple queries)

    • b) data warehouse volumes (25+ million rows) and

    • c) a performance problem.


    This would also be a good use case for the non-clustered columnstore indexes introduced in SQL Server 2012, i.e. summarising/aggregating a few columns on a large table with many columns.



    Although these indexes have the side-effect of making the table read-only (with the exception of partition switching), they can transform the performance of aggregate queries under the right conditions. The read-only aspect can be managed, either by dropping and recreating the index, or by using partition switching to load data into the table.
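
    A minimal sketch of the drop-and-recreate approach around a periodic load (index and table names as in the test rig below):

    ```sql
    -- Disable the non-clustered columnstore index; while disabled,
    -- the table is writable again.
    ALTER INDEX _cs ON dbo.largeTable DISABLE;

    -- ... bulk load / daily inserts here ...

    -- Rebuild the index so queries can use it again.
    ALTER INDEX _cs ON dbo.largeTable REBUILD;
    ```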



    I set up a simple test rig to mimic your setup, and saw a good improvement in performance:



    USE tempdb
    GO

    SET NOCOUNT ON
    GO

    -- Create a large table
    IF OBJECT_ID('dbo.largeTable') IS NOT NULL
    DROP TABLE dbo.largeTable
    GO
    CREATE TABLE dbo.largeTable (

    [TABLE] VARCHAR(30) NULL,
    FIELD VARCHAR(30) NULL,
    [AFTER] VARCHAR(1000) NULL,
    [DATE] DATETIME,
    SORT_ID INT IDENTITY(1,1),

    pad VARCHAR(100) DEFAULT REPLICATE( '$', 100 )
    )
    GO

    -- Populate table
    ;WITH cte AS (
    SELECT TOP 100000 ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) rn
    FROM master.sys.columns c1
    CROSS JOIN master.sys.columns c2
    CROSS JOIN master.sys.columns c3
    )
    INSERT INTO dbo.largeTable ( [TABLE], FIELD, [AFTER], [DATE] )
    SELECT
    x.tableName,
    y.field,
    z.[after],
    DATEADD( day, rn % 1111, '1 Jan 2012' )
    FROM cte c
    CROSS JOIN ( VALUES ( 'OTB' ), ( 'AAA' ), ( 'BBB' ), ( 'CCCC' ) ) x ( tableName )
    CROSS JOIN ( VALUES ( 'STATUS' ), ( 'TIME' ), ( 'POWER' ) ) y ( field )
    CROSS JOIN ( VALUES ( 'R' ), ( 'X' ), ( 'Z' ), ( 'A' ) ) z ( [after] )

    CHECKPOINT

    GO 5

    EXEC sp_spaceused 'dbo.largeTable'
    GO

    SELECT MIN([DATE]) xmin, MAX([DATE]) xmax, FORMAT( COUNT(*), '#,#' ) records
    FROM dbo.largeTable
    GO

    -- Optionally clear cache for more comparable results; DO NOT RUN ON PRODUCTION SYSTEM!!
    --DBCC DROPCLEANBUFFERS
    --DBCC FREEPROCCACHE
    --GO

    DECLARE @startDate DATETIME2 = SYSUTCDATETIME()

    SELECT COUNT(*) AS COUNT
    FROM dbo.largeTable
    WHERE [AFTER] = 'R'
    AND [DATE] >= '2014-01-01'
    AND [DATE] <= '2015-01-01'

    SELECT DATEDIFF( millisecond, @startDate, SYSUTCDATETIME() ) diff1
    GO

    -- Add the non-clustered columnstore
    CREATE NONCLUSTERED COLUMNSTORE INDEX _cs ON dbo.largeTable ( [TABLE], FIELD, [AFTER], [DATE] )
    GO

    -- Optionally clear cache for more comparable results; DO NOT RUN ON PRODUCTION SYSTEM!!
    --DBCC DROPCLEANBUFFERS
    --DBCC FREEPROCCACHE
    --GO

    -- Check query again
    DECLARE @startDate DATETIME2 = SYSUTCDATETIME()

    SELECT COUNT(*) AS COUNT
    FROM dbo.largeTable
    WHERE [AFTER] = 'R'
    AND [DATE] >= '2014-01-01'
    AND [DATE] <= '2015-01-01'

    SELECT DATEDIFF( millisecond, @startDate, SYSUTCDATETIME() ) diff2
    GO


    My results, 6 seconds vs 0.08 seconds:



    [Screenshot of the timing output: diff1 ≈ 6,000 ms for the heap scan, diff2 ≈ 80 ms with the columnstore index]



    In summary, try to build a case with your boss to have the indexes changed, or at least create some kind of overnight process where these records are carved off to a read-only reporting table/database where you can do your work, with indexing appropriate for that workload.
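
    One hedged sketch of such an overnight carve-off (the `report.mytbl_counts` target table is hypothetical):

    ```sql
    -- Nightly refresh of a narrow, read-only reporting copy holding only
    -- the four columns the count queries need.
    TRUNCATE TABLE report.mytbl_counts;

    INSERT INTO report.mytbl_counts ([TABLE], FIELD, [AFTER], [DATE])
    SELECT [TABLE], FIELD, [AFTER], [DATE]
    FROM dbo.mytbl
    WHERE [TABLE] = 'OTB'
      AND FIELD = 'STATUS';
    ```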






    answered Feb 28 '15 at 10:25 – wBob (last edited by Juan Castillo)






























            Thanks for contributing an answer to Database Administrators Stack Exchange!

