Optimizing queries for 25+ million rows



























I am using MS SQL Server, and I have to run several queries on the same table with different criteria. At first I ran each query against the original table, although they all share some filtering (i.e. Date, status). This took a lot of time (around 2 minutes).



There are duplicate data rows, and all indexes are NON-CLUSTERED. I am only interested in 4 columns for my criteria, and the result should output only the count, for all queries.



Columns needed: TABLE, FIELD, AFTER, DATE; there is an index on each of DATE and TABLE.



After creating a temp table with only the fields I need, the total went down to 1:40 minutes, which is still very bad.



CREATE TABLE #TEMP
(
    [TABLE] VARCHAR(30) NULL,
    [FIELD] VARCHAR(30) NULL,
    [AFTER] VARCHAR(1000) NULL,
    [DATE] DATETIME,
    SORT_ID INT IDENTITY(1,1)
);

CREATE CLUSTERED INDEX IX_ADT ON #TEMP (SORT_ID);

INSERT INTO #TEMP ([TABLE], [FIELD], [AFTER], [DATE])
SELECT [TABLE], [FIELD], [AFTER], [DATE]
FROM mytbl WITH (NOLOCK)
WHERE [TABLE] = 'OTB'
  AND [FIELD] = 'STATUS';


Running this gives: (216598 row(s) affected)



Since not all queries rely on a date range, I didn't include it in the insert. The problem is that the insert alone takes well over a minute; the one above took 1:19.



I want to run something like this for several queries:



SELECT COUNT(*) AS [COUNT]
FROM #TEMP
WHERE [AFTER] = 'R'
  AND [DATE] >= '2014-01-01'
  AND [DATE] <= '2015-01-01';


The problem lies more with the insert than with the selection, but the temp table has far fewer rows than the original table, which should beat scanning the full table several times.



How can I optimize this?



EDIT



I have removed the sort ID; I thought the problem was mainly with the select and not the insert. It was a guess.



I can't create a unique index, since no field (or combination of fields) is unique.



I am using SQL Server 2012.



Table Info: It's a heap and has the following space usage:



name     rows        reserved      data          index_size    unused
mytbl    24869658    9204568 KB    3017952 KB    5816232 KB    370384 KB


































  • @MikaelEriksson I cannot modify production tables..

    – Atieh
    Feb 27 '15 at 15:24













  • If the queries you are trying to optimize are of the form SELECT COUNT(*) AS COUNT FROM original_table WHERE AFTER = 'R' AND DATE >= '2014-01-01' AND DATE < '2015-01-01', why don't you try to optimize each (query) separately? Are you not allowed to add indexes to the table?

    – ypercubeᵀᴹ
    Feb 27 '15 at 15:26











    You need to determine why it's slow. Is it being blocked? Is it waiting for tempdb to grow? Is the execution plan abysmal? Nobody can fix "my query is slow" without more details...

    – Aaron Bertrand
    Feb 27 '15 at 15:27











    Well, seems like a lost cause to me ("I'm not allowed to optimize anything, so lets just push 200K rows in a temp table every time we need to run some queries"). But you could remove the TABLE and FIELD columns from the #temp table (all rows have TABLE = 'OTB' AND FIELD = 'STATUS' for the specific temp table after all.)

    – ypercubeᵀᴹ
    Feb 27 '15 at 15:30













    I did ask for an edit and improvements by adding a detailed (and polite) comment. That's what comments are for. You should also tag your question with the version of SQL Server you are using (e.g. SQL Server 2014). DDL for the table might be helpful too (CREATE TABLE statement). The down vote was because the question was not clear.

    – Paul White
    Feb 27 '15 at 15:55
















sql-server sql-server-2012 performance-tuning






edited Feb 27 '15 at 16:34 by Paul White

asked Feb 27 '15 at 15:12 by Atieh



2 Answers














The question is mainly about how to optimize the select statement:



SELECT [TABLE], [FIELD], [AFTER], [DATE]
FROM mytbl WITH (NOLOCK)
WHERE [TABLE] = 'OTB' AND
[FIELD] = 'STATUS'


Removing the redundant projections and adding the presumed dbo schema:



SELECT [AFTER], [DATE] 
FROM dbo.mytbl WITH (NOLOCK)
WHERE [TABLE] = 'OTB'
AND FIELD = 'STATUS';


Without an index like ([TABLE],[FIELD]) INCLUDE ([AFTER],[DATE]) SQL Server has two main options:




  1. Scan the heap entirely (3GB+); or

  2. Locate rows matching [TABLE] = 'OTB' and [FIELD] = 'STATUS' (using IDX6), then perform a heap (RID) lookup per row to retrieve the [AFTER] and [DATE] columns.


Whether the optimizer chooses a heap scan or an index seek with RID lookup depends on the estimated selectivity of the [TABLE] = 'OTB' and [FIELD] = 'STATUS' predicates. Check whether the estimated number of rows from the seek matches reality; if not, update your statistics. If that condition is reasonably selective, test the query with a table hint forcing the use of the index. If the optimizer is currently choosing the index seek, test performance with an INDEX(0) or FORCESCAN hint to scan the heap.
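For instance, hedged sketches of both hints (IDX6 is the index named in the comments; compare the actual execution plans and timings of the two):



SELECT [AFTER], [DATE]
FROM dbo.mytbl WITH (INDEX(IDX6))  -- force the nonclustered index seek + RID lookups
WHERE [TABLE] = 'OTB'
  AND [FIELD] = 'STATUS';

SELECT [AFTER], [DATE]
FROM dbo.mytbl WITH (FORCESCAN)    -- force a full scan of the heap for comparison
WHERE [TABLE] = 'OTB'
  AND [FIELD] = 'STATUS';
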



Beyond that, you could improve the heap scan a little by reclaiming some of the unused space (370 MB). From SQL Server 2008 onward this can be done by rebuilding the heap. Unused space in heaps often results from deletes performed without a table lock being taken (without one, empty pages are not deallocated from a heap). Tables that experience frequent deletions are often better stored as clustered tables for this reason.
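A sketch of the rebuild (assuming the dbo schema; this takes a table lock while it runs, so schedule it in a maintenance window):



-- Reclaim unused pages by rebuilding the heap (SQL Server 2008+)
ALTER TABLE dbo.mytbl REBUILD;
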



The performance of the heap scan depends on how much of the table is in memory, how much must be read from disk, how full the pages are, the speed of the persistent storage, and whether the scan is I/O- or CPU-bound (parallelism can help with the latter).



If performance is still unacceptable after you have investigated all of the above, try to make the case for a new index. If available on your version of SQL Server, a possible filtered index for the given query would be:



CREATE INDEX index_name
ON dbo.mytbl ([DATE],[AFTER])
WHERE [TABLE] = 'OTB'
AND [FIELD] = 'STATUS';


Also consider index compression, if that is available and beneficial. Without a new index of some kind, there's relatively little you can do to improve the performance of the given query.
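If compression is available on your edition, one possible shape for the filtered index with page compression (a syntax sketch only; measure the space and CPU trade-off first):



CREATE INDEX index_name
ON dbo.mytbl ([DATE],[AFTER])
WHERE [TABLE] = 'OTB'
AND [FIELD] = 'STATUS'
WITH (DATA_COMPRESSION = PAGE);
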
































  • Sorry Paul, there is: IDX6 nonclustered located on PRIMARY TABLE, FIELD. Maybe this would change things you mentioned?

    – Atieh
    Feb 27 '15 at 16:50











  • @Atieh Not materially, no. Unless it also includes DATE and AFTER, using that index will also require an RID lookup. Check the execution plans.

    – Paul White
    Feb 27 '15 at 16:53

































I think there is a case for changing the indexes here because:




  • a) you have a task to do (these multiple queries)

  • b) data warehouse volumes (25+ million rows) and

  • c) a performance problem.


This would also be a good use case for the non-clustered columnstore indexes introduced in SQL Server 2012, i.e. summarising/aggregating a few columns of a large table with many columns.



Although these indexes have the side-effect of making the table read-only (with the exception of partition switching), they can transform the performance of aggregate queries under the right conditions. The read-only aspect can be managed, either by dropping and recreating the index or by partition-switching data into the table.
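For example, the drop-and-recreate pattern might look like this (names follow the test rig; only a sketch, and the index rebuild cost grows with table size):



-- Drop the columnstore index so the table becomes writable again
DROP INDEX _cs ON dbo.largeTable;

-- ... perform the INSERT/UPDATE/DELETE work here ...

-- Recreate the columnstore index for read/aggregate workloads
CREATE NONCLUSTERED COLUMNSTORE INDEX _cs
ON dbo.largeTable ( [TABLE], FIELD, [AFTER], [DATE] );
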



I set up a simple test rig to mimic your setup, and saw a good improvement in performance:



USE tempdb
GO

SET NOCOUNT ON
GO

-- Create a large table
IF OBJECT_ID('dbo.largeTable') IS NOT NULL
DROP TABLE dbo.largeTable
GO
CREATE TABLE dbo.largeTable (

[TABLE] VARCHAR(30) NULL,
FIELD VARCHAR(30) NULL,
[AFTER] VARCHAR(1000) NULL,
[DATE] DATETIME,
SORT_ID INT IDENTITY(1,1),

pad VARCHAR(100) DEFAULT REPLICATE( '$', 100 )
)
GO

-- Populate table
;WITH cte AS (
SELECT TOP 100000 ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) rn
FROM master.sys.columns c1
CROSS JOIN master.sys.columns c2
CROSS JOIN master.sys.columns c3
)
INSERT INTO dbo.largeTable ( [TABLE], FIELD, [AFTER], [DATE] )
SELECT
x.tableName,
y.field,
z.[after],
DATEADD( day, rn % 1111, '1 Jan 2012' )
FROM cte c
CROSS JOIN ( VALUES ( 'OTB' ), ( 'AAA' ), ( 'BBB' ), ( 'CCCC' ) ) x ( tableName )
CROSS JOIN ( VALUES ( 'STATUS' ), ( 'TIME' ), ( 'POWER' ) ) y ( field )
CROSS JOIN ( VALUES ( 'R' ), ( 'X' ), ( 'Z' ), ( 'A' ) ) z ( [after] )

CHECKPOINT

GO 5

EXEC sp_spaceused 'dbo.largeTable'
GO

SELECT MIN([DATE]) xmin, MAX([DATE]) xmax, FORMAT( COUNT(*), '#,#' ) records
FROM dbo.largeTable
GO

-- Optionally clear cache for more comparable results; DO NOT RUN ON PRODUCTION SYSTEM!!
--DBCC DROPCLEANBUFFERS
--DBCC FREEPROCCACHE
--GO

DECLARE @startDate DATETIME2 = SYSUTCDATETIME()

SELECT COUNT(*) AS COUNT
FROM dbo.largeTable
WHERE [AFTER] = 'R'
AND [DATE] >= '2014-01-01'
AND [DATE] <= '2015-01-01'

SELECT DATEDIFF( millisecond, @startDate, SYSUTCDATETIME() ) diff1
GO

-- Add the non-clustered columnstore
CREATE NONCLUSTERED COLUMNSTORE INDEX _cs ON dbo.largeTable ( [TABLE], FIELD, [AFTER], [DATE] )
GO

-- Optionally clear cache for more comparable results; DO NOT RUN ON PRODUCTION SYSTEM!!
--DBCC DROPCLEANBUFFERS
--DBCC FREEPROCCACHE
--GO

-- Check query again
DECLARE @startDate DATETIME2 = SYSUTCDATETIME()

SELECT COUNT(*) AS COUNT
FROM dbo.largeTable
WHERE [AFTER] = 'R'
AND [DATE] >= '2014-01-01'
AND [DATE] <= '2015-01-01'

SELECT DATEDIFF( millisecond, @startDate, SYSUTCDATETIME() ) diff2
GO


My results: 6 seconds vs 0.08 seconds:



(Image: timing output showing the query duration before and after adding the columnstore index.)



In summary, try to build a case with your boss to have the indexes changed, or at least to create some kind of overnight process where these records are carved off to a read-only reporting table/database where you can do your work and add indexing appropriate for that workload.































    Your Answer








    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "182"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f94018%2foptimizing-queries-for-25-million-rows%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    11














    The question is mainly about how to optimize the select statement:



    SELECT [TABLE], [FIELD], [AFTER], [DATE]
    FROM mytbl WITH (NOLOCK)
    WHERE [TABLE] = 'OTB' AND
    [FIELD] = 'STATUS'


    Removing the redundant projections and adding the presumed dbo schema:



    SELECT [AFTER], [DATE] 
    FROM dbo.mytbl WITH (NOLOCK)
    WHERE [TABLE] = 'OTB'
    AND FIELD = 'STATUS';


    Without an index like ([TABLE],[FIELD]) INCLUDE ([AFTER],[DATE]) SQL Server has two main options:




    1. Scan the heap entirely (3GB+); or

    2. Locate rows matching [TABLE] = 'OTB' and [FIELD] = 'STATUS' (using IDX6), then perform a heap (RID) lookup per row to retrieve the [AFTER] and [DATE] columns.


    Whether the optimizer chooses a heap scan or index seek with RID lookup depends on the estimated selectivity of the [TABLE] = 'OTB' and [FIELD] = 'STATUS'predicates. Check to see if the estimated number of rows from the seek matches reality. If not, update your statistics. Test the query with an table hint forcing the use of the index, if that condition is reasonably selective. If the optimizer is currently choosing the index seek, test performance with an INDEX(0) or FORCESCAN hint to scan the heap.



    Beyond that, you could look to improve the scan of the heap a little by removing some of the unused space (370MB). In SQL Server 2008 this can be done by rebuilding the heap. Unused space in heaps often results from deletes performed without a table lock being taken (without a table lock, empty pages are not deallocated from a heap). Tables that experience frequent deletions are often better stored as a clustered table for this reason.



    The performance of the heap scan depends on how much of the table is stored in memory, how much must be read from disk, how full the pages are, the speed of the persistent storage, whether the scan is I/O or CPU bound (parallelism can help).



    If performance is still unacceptable after you have investigated all of the above, try to make the case for a new index. If available on your version of SQL Server, a possible filtered index for the given query would be:



    CREATE INDEX index_name
    ON dbo.mytbl ([DATE],[AFTER])
    WHERE [TABLE] = 'OTB'
    AND [FIELD] = 'STATUS';


    Also consider index compression, if that is available and beneficial. Without a new index of some kind, there's relatively little you can do to improve the performance of the given query.






    share|improve this answer


























    • Sorry Paul, there is: IDX6 nonclustered located on PRIMARY TABLE, FIELD. Maybe this would change things you mentioned?

      – Atieh
      Feb 27 '15 at 16:50











    • @Atieh Not materially, no. Unless it also includes DATE and AFTER, using that index will also require an RID lookup. Check the execution plans.

      – Paul White
      Feb 27 '15 at 16:53
















    11














    The question is mainly about how to optimize the select statement:



    SELECT [TABLE], [FIELD], [AFTER], [DATE]
    FROM mytbl WITH (NOLOCK)
    WHERE [TABLE] = 'OTB' AND
    [FIELD] = 'STATUS'


    Removing the redundant projections and adding the presumed dbo schema:



    SELECT [AFTER], [DATE] 
    FROM dbo.mytbl WITH (NOLOCK)
    WHERE [TABLE] = 'OTB'
    AND FIELD = 'STATUS';


    Without an index like ([TABLE],[FIELD]) INCLUDE ([AFTER],[DATE]) SQL Server has two main options:




    1. Scan the heap entirely (3GB+); or

    2. Locate rows matching [TABLE] = 'OTB' and [FIELD] = 'STATUS' (using IDX6), then perform a heap (RID) lookup per row to retrieve the [AFTER] and [DATE] columns.


    Whether the optimizer chooses a heap scan or index seek with RID lookup depends on the estimated selectivity of the [TABLE] = 'OTB' and [FIELD] = 'STATUS'predicates. Check to see if the estimated number of rows from the seek matches reality. If not, update your statistics. Test the query with an table hint forcing the use of the index, if that condition is reasonably selective. If the optimizer is currently choosing the index seek, test performance with an INDEX(0) or FORCESCAN hint to scan the heap.



    Beyond that, you could look to improve the scan of the heap a little by removing some of the unused space (370MB). In SQL Server 2008 this can be done by rebuilding the heap. Unused space in heaps often results from deletes performed without a table lock being taken (without a table lock, empty pages are not deallocated from a heap). Tables that experience frequent deletions are often better stored as a clustered table for this reason.



    The performance of the heap scan depends on how much of the table is stored in memory, how much must be read from disk, how full the pages are, the speed of the persistent storage, whether the scan is I/O or CPU bound (parallelism can help).



    If performance is still unacceptable after you have investigated all of the above, try to make the case for a new index. If available on your version of SQL Server, a possible filtered index for the given query would be:



    CREATE INDEX index_name
    ON dbo.mytbl ([DATE],[AFTER])
    WHERE [TABLE] = 'OTB'
    AND [FIELD] = 'STATUS';


    Also consider index compression, if that is available and beneficial. Without a new index of some kind, there's relatively little you can do to improve the performance of the given query.






    share|improve this answer


























    • Sorry Paul, there is: IDX6 nonclustered located on PRIMARY TABLE, FIELD. Maybe this would change things you mentioned?

      – Atieh
      Feb 27 '15 at 16:50











    • @Atieh Not materially, no. Unless it also includes DATE and AFTER, using that index will also require an RID lookup. Check the execution plans.

      – Paul White
      Feb 27 '15 at 16:53














    11












    11








    11







    answered Feb 27 '15 at 16:33, edited Feb 27 '15 at 16:59 – Paul White













    I think there is a case for changing the indexes here because:




    • a) you have a task to do (these multiple queries)

    • b) data warehouse volumes (25+ million rows) and

    • c) a performance problem.


    This would also be a good use case for the non-clustered columnstore indexes introduced in SQL Server 2012, i.e. summarising/aggregating a few columns on a large table with many columns.



    Although these indexes have the side-effect of making the table read-only (with the exception of partition switching), they can transform the performance of aggregate queries under the right conditions. The read-only aspect can be managed, either by dropping and recreating the index, or by using partition switching to load data into the table.
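
    A minimal sketch of the drop-and-recreate approach around a periodic load (index and table names as in the test rig below):

    ```sql
    -- Disable the non-clustered columnstore index; while disabled,
    -- the table is writable again.
    ALTER INDEX _cs ON dbo.largeTable DISABLE;

    -- ... bulk load / daily inserts here ...

    -- Rebuild the index so queries can use it again.
    ALTER INDEX _cs ON dbo.largeTable REBUILD;
    ```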



    I set up a simple test rig to mimic your setup, and saw a good improvement in performance:



    USE tempdb
    GO

    SET NOCOUNT ON
    GO

    -- Create a large table
    IF OBJECT_ID('dbo.largeTable') IS NOT NULL
    DROP TABLE dbo.largeTable
    GO
    CREATE TABLE dbo.largeTable (

    [TABLE] VARCHAR(30) NULL,
    FIELD VARCHAR(30) NULL,
    [AFTER] VARCHAR(1000) NULL,
    [DATE] DATETIME,
    SORT_ID INT IDENTITY(1,1),

    pad VARCHAR(100) DEFAULT REPLICATE( '$', 100 )
    )
    GO

    -- Populate table
    ;WITH cte AS (
    SELECT TOP 100000 ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) rn
    FROM master.sys.columns c1
    CROSS JOIN master.sys.columns c2
    CROSS JOIN master.sys.columns c3
    )
    INSERT INTO dbo.largeTable ( [TABLE], FIELD, [AFTER], [DATE] )
    SELECT
    x.tableName,
    y.field,
    z.[after],
    DATEADD( day, rn % 1111, '1 Jan 2012' )
    FROM cte c
    CROSS JOIN ( VALUES ( 'OTB' ), ( 'AAA' ), ( 'BBB' ), ( 'CCCC' ) ) x ( tableName )
    CROSS JOIN ( VALUES ( 'STATUS' ), ( 'TIME' ), ( 'POWER' ) ) y ( field )
    CROSS JOIN ( VALUES ( 'R' ), ( 'X' ), ( 'Z' ), ( 'A' ) ) z ( [after] )

    CHECKPOINT

    GO 5

    EXEC sp_spaceused 'dbo.largeTable'
    GO

    SELECT MIN([DATE]) xmin, MAX([DATE]) xmax, FORMAT( COUNT(*), '#,#' ) records
    FROM dbo.largeTable
    GO

    -- Optionally clear cache for more comparable results; DO NOT RUN ON PRODUCTION SYSTEM!!
    --DBCC DROPCLEANBUFFERS
    --DBCC FREEPROCCACHE
    --GO

    DECLARE @startDate DATETIME2 = SYSUTCDATETIME()

    SELECT COUNT(*) AS COUNT
    FROM dbo.largeTable
    WHERE [AFTER] = 'R'
    AND [DATE] >= '2014-01-01'
    AND [DATE] <= '2015-01-01'

    SELECT DATEDIFF( millisecond, @startDate, SYSUTCDATETIME() ) diff1
    GO

    -- Add the non-clustered columnstore
    CREATE NONCLUSTERED COLUMNSTORE INDEX _cs ON dbo.largeTable ( [TABLE], FIELD, [AFTER], [DATE] )
    GO

    -- Optionally clear cache for more comparable results; DO NOT RUN ON PRODUCTION SYSTEM!!
    --DBCC DROPCLEANBUFFERS
    --DBCC FREEPROCCACHE
    --GO

    -- Check query again
    DECLARE @startDate DATETIME2 = SYSUTCDATETIME()

    SELECT COUNT(*) AS COUNT
    FROM dbo.largeTable
    WHERE [AFTER] = 'R'
    AND [DATE] >= '2014-01-01'
    AND [DATE] <= '2015-01-01'

    SELECT DATEDIFF( millisecond, @startDate, SYSUTCDATETIME() ) diff2
    GO


    My results, 6 seconds vs 0.08 seconds:



    [Screenshot of the timing output: diff1 ≈ 6,000 ms for the heap scan, diff2 ≈ 80 ms with the columnstore index]



    In summary, try to build a case with your boss to have the indexes changed, or at least create some kind of overnight process where these records are carved off to a read-only reporting table/database where you can do your work, with indexing appropriate for that workload.
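
    One hedged sketch of such an overnight carve-off (the `report.mytbl_counts` target table is hypothetical):

    ```sql
    -- Nightly refresh of a narrow, read-only reporting copy holding only
    -- the four columns the count queries need.
    TRUNCATE TABLE report.mytbl_counts;

    INSERT INTO report.mytbl_counts ([TABLE], FIELD, [AFTER], [DATE])
    SELECT [TABLE], FIELD, [AFTER], [DATE]
    FROM dbo.mytbl
    WHERE [TABLE] = 'OTB'
      AND FIELD = 'STATUS';
    ```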






    answered Feb 28 '15 at 10:25 – wBob (last edited by Juan Castillo)






























            Thanks for contributing an answer to Database Administrators Stack Exchange!

