Optimizing queries for 25+ million rows
I am using MS SQL, and I have to run several queries on the same table with different criteria. At first I ran each query against the original table, although they all share some filtering (e.g. date, status). This took a lot of time (around 2 minutes).
There are duplicate data rows, and all indexes are NON-CLUSTERED. I am only interested in 4 columns for my criteria, and the result should output only the count, for all queries.
Columns needed: TABLE, FIELD, AFTER, DATE. There is an index on each of DATE and TABLE.
After creating a temp table with only the fields I need, it went down to 1:40 minutes, which is still very bad.
CREATE TABLE #TEMP
(
    [TABLE] VARCHAR(30) NULL,
    [FIELD] VARCHAR(30) NULL,
    [AFTER] VARCHAR(1000) NULL,
    [DATE] DATETIME,
    SORT_ID INT IDENTITY(1,1)
);

CREATE CLUSTERED INDEX IX_ADT ON #TEMP (SORT_ID);

INSERT INTO #TEMP ([TABLE], [FIELD], [AFTER], [DATE])
SELECT [TABLE], [FIELD], [AFTER], [DATE]
FROM mytbl WITH (NOLOCK)
WHERE [TABLE] = 'OTB'
  AND [FIELD] = 'STATUS';
Running this returns: (216598 row(s) affected)
Since not all queries rely on a date range, I didn't include it in the query. The problem is that the insert alone takes well over a minute: the insert above took 1:19 minutes.
I want to run something like this for several queries:
SELECT COUNT(*) AS COUNT
FROM #TEMP
WHERE [AFTER] = 'R'
  AND [DATE] >= '2014-01-01'
  AND [DATE] <= '2015-01-01';
The problem is more with the insert than with the selection, but the temp table has far fewer rows than the original table, which could beat scanning the full table several times.
How can I optimize this?
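One option worth noting: since all the counts share the same TABLE/FIELD filter, they could be computed in a single pass with conditional aggregation instead of a temp table, avoiding the expensive insert entirely. This is a sketch, assuming the individual queries differ only in their AFTER and DATE predicates (the specific predicates shown are illustrative):

```sql
-- Hypothetical sketch: one scan of the base table produces all the counts.
-- Replace the CASE predicates with the real criteria for each query.
SELECT
    SUM(CASE WHEN [AFTER] = 'R'
              AND [DATE] >= '2014-01-01' AND [DATE] <= '2015-01-01'
             THEN 1 ELSE 0 END) AS count_r_2014,
    SUM(CASE WHEN [AFTER] = 'X' THEN 1 ELSE 0 END) AS count_x,
    COUNT(*) AS count_all
FROM mytbl WITH (NOLOCK)
WHERE [TABLE] = 'OTB'
  AND [FIELD] = 'STATUS';
```

This reads the qualifying rows once and returns every count in one resultset, rather than scanning once to populate #TEMP and again per query.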
EDIT
I have removed the sort ID; I had guessed the problem was mainly with the select rather than the insert.
I can't create a unique index, since no field (or combination of fields) is unique.
I am using SQL Server 2012.
Table Info: It's a heap and has the following space usage:
name rows reserved data index_size unused
mytbl 24869658 9204568 KB 3017952 KB 5816232 KB 370384 KB
sql-server sql-server-2012 performance-tuning
@MikaelEriksson I cannot modify production tables.
– Atieh
Feb 27 '15 at 15:24
If the queries you are trying to optimize are of the form SELECT COUNT(*) AS COUNT FROM original_table WHERE AFTER = 'R' AND DATE >= '2014-01-01' AND DATE < '2015-01-01', why don't you try to optimize each query separately? Are you not allowed to add indexes to the table?
– ypercubeᵀᴹ
Feb 27 '15 at 15:26
You need to determine why it's slow. Is it being blocked? Is it waiting for tempdb to grow? Is the execution plan abysmal? Nobody can fix "my query is slow" without more details...
– Aaron Bertrand♦
Feb 27 '15 at 15:27
Well, seems like a lost cause to me ("I'm not allowed to optimize anything, so let's just push 200K rows in a temp table every time we need to run some queries"). But you could remove the TABLE and FIELD columns from the #temp table (all rows have TABLE = 'OTB' AND FIELD = 'STATUS' for this specific temp table, after all).
– ypercubeᵀᴹ
Feb 27 '15 at 15:30
I did ask for an edit and improvements by adding a detailed (and polite) comment. That's what comments are for. You should also tag your question with the version of SQL Server you are using (e.g. SQL Server 2014). DDL for the table might be helpful too (CREATE TABLE statement). The down vote was because the question was not clear.
– Paul White♦
Feb 27 '15 at 15:55
edited Feb 27 '15 at 16:34 by Paul White♦
asked Feb 27 '15 at 15:12 by Atieh
2 Answers
The question is mainly about how to optimize the select statement:
SELECT [TABLE], [FIELD], [AFTER], [DATE]
FROM mytbl WITH (NOLOCK)
WHERE [TABLE] = 'OTB' AND
[FIELD] = 'STATUS'
Removing the redundant projections and adding the presumed dbo schema:
SELECT [AFTER], [DATE]
FROM dbo.mytbl WITH (NOLOCK)
WHERE [TABLE] = 'OTB'
AND FIELD = 'STATUS';
Without an index like ([TABLE],[FIELD]) INCLUDE ([AFTER],[DATE]), SQL Server has two main options:
- Scan the heap entirely (3GB+); or
- Locate rows matching [TABLE] = 'OTB' AND [FIELD] = 'STATUS' (using IDX6), then perform a heap (RID) lookup per row to retrieve the [AFTER] and [DATE] columns.
Whether the optimizer chooses a heap scan or an index seek with RID lookup depends on the estimated selectivity of the [TABLE] = 'OTB' and [FIELD] = 'STATUS' predicates. Check whether the estimated number of rows from the seek matches reality; if not, update your statistics. If the condition is reasonably selective, test the query with a table hint forcing the use of the index. If the optimizer is currently choosing the index seek, test performance with an INDEX(0) or FORCESCAN hint to scan the heap.
Beyond that, you could improve the heap scan a little by removing some of the unused space (370MB). From SQL Server 2008 onward this can be done by rebuilding the heap. Unused space in heaps often results from deletes performed without a table lock being taken (without a table lock, empty pages are not deallocated from a heap). Tables that experience frequent deletions are often better stored as clustered tables for this reason.
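A sketch of that rebuild (SQL Server 2008 or later; this takes a table lock while it runs):

```sql
-- Rebuild the heap to deallocate empty pages and compact the data,
-- reclaiming the unused space reported by sp_spaceused.
ALTER TABLE dbo.mytbl REBUILD;
```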
The performance of the heap scan depends on how much of the table is already in memory, how much must be read from disk, how full the pages are, the speed of the persistent storage, and whether the scan is I/O- or CPU-bound (parallelism can help).
If performance is still unacceptable after you have investigated all of the above, try to make the case for a new index. If available on your version of SQL Server, a possible filtered index for the given query would be:
CREATE INDEX index_name
ON dbo.mytbl ([DATE],[AFTER])
WHERE [TABLE] = 'OTB'
AND [FIELD] = 'STATUS';
Also consider index compression, if that is available and beneficial. Without a new index of some kind, there's relatively little you can do to improve the performance of the given query.
Sorry Paul, there is: IDX6 nonclustered located on PRIMARY (TABLE, FIELD). Maybe this would change things you mentioned?
– Atieh
Feb 27 '15 at 16:50
@Atieh Not materially, no. Unless it also includes DATE and AFTER, using that index will still require an RID lookup. Check the execution plans.
– Paul White♦
Feb 27 '15 at 16:53
I think there is a case for changing the indexes here because:
- a) you have a task to do (these multiple queries)
- b) data warehouse volumes (25+ million rows) and
- c) a performance problem.
This would also be a good use case for the non-clustered columnstore indexes introduced in SQL Server 2012, i.e. summarising/aggregating a few columns on a large table with many columns.
Although these indexes have the side-effect of making the table read-only (with the exception of partition switching), they can transform the performance of aggregate queries under the right conditions. The read-only aspect can be managed, either by dropping and recreating the index or by partition-switching data into the table.
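For example, a nightly load might bracket the insert with a drop and re-create of the columnstore index (the index and table names here mirror the test rig that follows; the load itself is a placeholder):

```sql
-- SQL Server 2012: a nonclustered columnstore makes the table read-only,
-- so drop it, run the load, then re-create it.
DROP INDEX _cs ON dbo.largeTable;

-- ...nightly INSERT/UPDATE statements against dbo.largeTable here...

CREATE NONCLUSTERED COLUMNSTORE INDEX _cs
    ON dbo.largeTable ( [TABLE], FIELD, [AFTER], [DATE] );
```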
I set up a simple test rig to mimic your setup, and saw a good improvement in performance:
USE tempdb
GO
SET NOCOUNT ON
GO
-- Create a large table
IF OBJECT_ID('dbo.largeTable') IS NOT NULL
DROP TABLE dbo.largeTable
GO
CREATE TABLE dbo.largeTable (
[TABLE] VARCHAR(30) NULL,
FIELD VARCHAR(30) NULL,
[AFTER] VARCHAR(1000) NULL,
[DATE] DATETIME,
SORT_ID INT IDENTITY(1,1),
pad VARCHAR(100) DEFAULT REPLICATE( '$', 100 )
)
GO
-- Populate table
;WITH cte AS (
SELECT TOP 100000 ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) rn
FROM master.sys.columns c1
CROSS JOIN master.sys.columns c2
CROSS JOIN master.sys.columns c3
)
INSERT INTO dbo.largeTable ( [TABLE], FIELD, [AFTER], [DATE] )
SELECT
x.tableName,
y.field,
z.[after],
DATEADD( day, rn % 1111, '1 Jan 2012' )
FROM cte c
CROSS JOIN ( VALUES ( 'OTB' ), ( 'AAA' ), ( 'BBB' ), ( 'CCCC' ) ) x ( tableName )
CROSS JOIN ( VALUES ( 'STATUS' ), ( 'TIME' ), ( 'POWER' ) ) y ( field )
CROSS JOIN ( VALUES ( 'R' ), ( 'X' ), ( 'Z' ), ( 'A' ) ) z ( [after] )
CHECKPOINT
GO 5
EXEC sp_spaceused 'dbo.largeTable'
GO
SELECT MIN([DATE]) xmin, MAX([DATE]) xmax, FORMAT( COUNT(*), '#,#' ) records
FROM dbo.largeTable
GO
-- Optionally clear cache for more comparable results; DO NOT RUN ON PRODUCTION SYSTEM!!
--DBCC DROPCLEANBUFFERS
--DBCC FREEPROCCACHE
--GO
DECLARE @startDate DATETIME2 = SYSUTCDATETIME()
SELECT COUNT(*) AS COUNT
FROM dbo.largeTable
WHERE [AFTER] = 'R'
AND [DATE] >= '2014-01-01'
AND [DATE] <= '2015-01-01'
SELECT DATEDIFF( millisecond, @startDate, SYSUTCDATETIME() ) diff1
GO
-- Add the non-clustered columnstore
CREATE NONCLUSTERED COLUMNSTORE INDEX _cs ON dbo.largeTable ( [TABLE], FIELD, [AFTER], [DATE] )
GO
-- Optionally clear cache for more comparable results; DO NOT RUN ON PRODUCTION SYSTEM!!
--DBCC DROPCLEANBUFFERS
--DBCC FREEPROCCACHE
--GO
-- Check query again
DECLARE @startDate DATETIME2 = SYSUTCDATETIME()
SELECT COUNT(*) AS COUNT
FROM dbo.largeTable
WHERE [AFTER] = 'R'
AND [DATE] >= '2014-01-01'
AND [DATE] <= '2015-01-01'
SELECT DATEDIFF( millisecond, @startDate, SYSUTCDATETIME() ) diff2
GO
My results: 6 seconds vs 0.08 seconds.
In summary, try to build a case with your boss to have the indexes changed, or at least to create some kind of overnight process where these records are carved off to a read-only reporting table/database where you can do your work, with indexing appropriate for that workload.
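A minimal sketch of such an overnight job, assuming a hypothetical reporting schema and table name:

```sql
-- Hypothetical nightly job: carve the relevant rows off into a narrow
-- reporting table, then index it for the COUNT(*) queries.
IF OBJECT_ID('reporting.mytbl_status', 'U') IS NOT NULL
    DROP TABLE reporting.mytbl_status;

SELECT [AFTER], [DATE]
INTO reporting.mytbl_status
FROM dbo.mytbl
WHERE [TABLE] = 'OTB'
  AND [FIELD] = 'STATUS';

-- Clustered on the columns the counting queries filter by.
CREATE CLUSTERED INDEX IX_report
    ON reporting.mytbl_status ([DATE], [AFTER]);
```

The daytime count queries then run against reporting.mytbl_status, where both predicate columns are covered by the clustered index.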
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f94018%2foptimizing-queries-for-25-million-rows%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
The question is mainly about how to optimize the select statement:
SELECT [TABLE], [FIELD], [AFTER], [DATE]
FROM mytbl WITH (NOLOCK)
WHERE [TABLE] = 'OTB' AND
[FIELD] = 'STATUS'
Removing the redundant projections and adding the presumed dbo
schema:
SELECT [AFTER], [DATE]
FROM dbo.mytbl WITH (NOLOCK)
WHERE [TABLE] = 'OTB'
AND FIELD = 'STATUS';
Without an index like ([TABLE],[FIELD]) INCLUDE ([AFTER],[DATE])
SQL Server has two main options:
- Scan the heap entirely (3GB+); or
- Locate rows matching
[TABLE] = 'OTB'
and[FIELD] = 'STATUS'
(usingIDX6
), then perform a heap (RID) lookup per row to retrieve the[AFTER]
and[DATE]
columns.
Whether the optimizer chooses a heap scan or index seek with RID lookup depends on the estimated selectivity of the [TABLE] = 'OTB'
and [FIELD] = 'STATUS'
predicates. Check to see if the estimated number of rows from the seek matches reality. If not, update your statistics. Test the query with an table hint forcing the use of the index, if that condition is reasonably selective. If the optimizer is currently choosing the index seek, test performance with an INDEX(0)
or FORCESCAN
hint to scan the heap.
Beyond that, you could look to improve the scan of the heap a little by removing some of the unused space (370MB). In SQL Server 2008 this can be done by rebuilding the heap. Unused space in heaps often results from deletes performed without a table lock being taken (without a table lock, empty pages are not deallocated from a heap). Tables that experience frequent deletions are often better stored as a clustered table for this reason.
The performance of the heap scan depends on how much of the table is stored in memory, how much must be read from disk, how full the pages are, the speed of the persistent storage, whether the scan is I/O or CPU bound (parallelism can help).
If performance is still unacceptable after you have investigated all of the above, try to make the case for a new index. If available on your version of SQL Server, a possible filtered index for the given query would be:
CREATE INDEX index_name
ON dbo.mytbl ([DATE],[AFTER])
WHERE [TABLE] = 'OTB'
AND [FIELD] = 'STATUS';
Also consider index compression, if that is available and beneficial. Without a new index of some kind, there's relatively little you can do to improve the performance of the given query.
Sorry Paul, there is:IDX6 nonclustered located on PRIMARY TABLE, FIELD
. Maybe this would change things you mentioned?
– Atieh
Feb 27 '15 at 16:50
@Atieh Not materially, no. Unless it also includesDATE
andAFTER
, using that index will also require an RID lookup. Check the execution plans.
– Paul White♦
Feb 27 '15 at 16:53
add a comment |
The question is mainly about how to optimize the select statement:
SELECT [TABLE], [FIELD], [AFTER], [DATE]
FROM mytbl WITH (NOLOCK)
WHERE [TABLE] = 'OTB' AND
[FIELD] = 'STATUS'
Removing the redundant projections and adding the presumed dbo
schema:
SELECT [AFTER], [DATE]
FROM dbo.mytbl WITH (NOLOCK)
WHERE [TABLE] = 'OTB'
AND FIELD = 'STATUS';
Without an index like ([TABLE],[FIELD]) INCLUDE ([AFTER],[DATE])
SQL Server has two main options:
- Scan the heap entirely (3GB+); or
- Locate rows matching
[TABLE] = 'OTB'
and[FIELD] = 'STATUS'
(usingIDX6
), then perform a heap (RID) lookup per row to retrieve the[AFTER]
and[DATE]
columns.
Whether the optimizer chooses a heap scan or index seek with RID lookup depends on the estimated selectivity of the [TABLE] = 'OTB'
and [FIELD] = 'STATUS'
predicates. Check to see if the estimated number of rows from the seek matches reality. If not, update your statistics. Test the query with an table hint forcing the use of the index, if that condition is reasonably selective. If the optimizer is currently choosing the index seek, test performance with an INDEX(0)
or FORCESCAN
hint to scan the heap.
Beyond that, you could look to improve the scan of the heap a little by removing some of the unused space (370MB). In SQL Server 2008 this can be done by rebuilding the heap. Unused space in heaps often results from deletes performed without a table lock being taken (without a table lock, empty pages are not deallocated from a heap). Tables that experience frequent deletions are often better stored as a clustered table for this reason.
The performance of the heap scan depends on how much of the table is stored in memory, how much must be read from disk, how full the pages are, the speed of the persistent storage, whether the scan is I/O or CPU bound (parallelism can help).
If performance is still unacceptable after you have investigated all of the above, try to make the case for a new index. If available on your version of SQL Server, a possible filtered index for the given query would be:
CREATE INDEX index_name
ON dbo.mytbl ([DATE],[AFTER])
WHERE [TABLE] = 'OTB'
AND [FIELD] = 'STATUS';
Also consider index compression, if that is available and beneficial. Without a new index of some kind, there's relatively little you can do to improve the performance of the given query.
Sorry Paul, there is:IDX6 nonclustered located on PRIMARY TABLE, FIELD
. Maybe this would change things you mentioned?
– Atieh
Feb 27 '15 at 16:50
@Atieh Not materially, no. Unless it also includesDATE
andAFTER
, using that index will also require an RID lookup. Check the execution plans.
– Paul White♦
Feb 27 '15 at 16:53
add a comment |
The question is mainly about how to optimize the select statement:
SELECT [TABLE], [FIELD], [AFTER], [DATE]
FROM mytbl WITH (NOLOCK)
WHERE [TABLE] = 'OTB' AND
[FIELD] = 'STATUS'
Removing the redundant projections and adding the presumed dbo
schema:
SELECT [AFTER], [DATE]
FROM dbo.mytbl WITH (NOLOCK)
WHERE [TABLE] = 'OTB'
AND FIELD = 'STATUS';
Without an index like ([TABLE],[FIELD]) INCLUDE ([AFTER],[DATE])
SQL Server has two main options:
- Scan the heap entirely (3GB+); or
- Locate rows matching
[TABLE] = 'OTB'
and[FIELD] = 'STATUS'
(usingIDX6
), then perform a heap (RID) lookup per row to retrieve the[AFTER]
and[DATE]
columns.
Whether the optimizer chooses a heap scan or index seek with RID lookup depends on the estimated selectivity of the [TABLE] = 'OTB'
and [FIELD] = 'STATUS'
predicates. Check to see if the estimated number of rows from the seek matches reality. If not, update your statistics. Test the query with an table hint forcing the use of the index, if that condition is reasonably selective. If the optimizer is currently choosing the index seek, test performance with an INDEX(0)
or FORCESCAN
hint to scan the heap.
Beyond that, you could look to improve the scan of the heap a little by removing some of the unused space (370MB). In SQL Server 2008 this can be done by rebuilding the heap. Unused space in heaps often results from deletes performed without a table lock being taken (without a table lock, empty pages are not deallocated from a heap). Tables that experience frequent deletions are often better stored as a clustered table for this reason.
The performance of the heap scan depends on how much of the table is stored in memory, how much must be read from disk, how full the pages are, the speed of the persistent storage, whether the scan is I/O or CPU bound (parallelism can help).
If performance is still unacceptable after you have investigated all of the above, try to make the case for a new index. If available on your version of SQL Server, a possible filtered index for the given query would be:
CREATE INDEX index_name
ON dbo.mytbl ([DATE],[AFTER])
WHERE [TABLE] = 'OTB'
AND [FIELD] = 'STATUS';
Also consider index compression, if that is available and beneficial. Without a new index of some kind, there's relatively little you can do to improve the performance of the given query.
The question is mainly about how to optimize the select statement:
SELECT [TABLE], [FIELD], [AFTER], [DATE]
FROM mytbl WITH (NOLOCK)
WHERE [TABLE] = 'OTB' AND
[FIELD] = 'STATUS'
Removing the redundant projections and adding the presumed dbo
schema:
SELECT [AFTER], [DATE]
FROM dbo.mytbl WITH (NOLOCK)
WHERE [TABLE] = 'OTB'
AND FIELD = 'STATUS';
Without an index like ([TABLE],[FIELD]) INCLUDE ([AFTER],[DATE])
SQL Server has two main options:
- Scan the heap entirely (3GB+); or
- Locate rows matching
[TABLE] = 'OTB'
and[FIELD] = 'STATUS'
(usingIDX6
), then perform a heap (RID) lookup per row to retrieve the[AFTER]
and[DATE]
columns.
Whether the optimizer chooses a heap scan or index seek with RID lookup depends on the estimated selectivity of the [TABLE] = 'OTB'
and [FIELD] = 'STATUS'
predicates. Check to see if the estimated number of rows from the seek matches reality. If not, update your statistics. Test the query with an table hint forcing the use of the index, if that condition is reasonably selective. If the optimizer is currently choosing the index seek, test performance with an INDEX(0)
or FORCESCAN
hint to scan the heap.
Beyond that, you could look to improve the scan of the heap a little by removing some of the unused space (370MB). In SQL Server 2008 this can be done by rebuilding the heap. Unused space in heaps often results from deletes performed without a table lock being taken (without a table lock, empty pages are not deallocated from a heap). Tables that experience frequent deletions are often better stored as a clustered table for this reason.
The performance of the heap scan depends on how much of the table is stored in memory, how much must be read from disk, how full the pages are, the speed of the persistent storage, whether the scan is I/O or CPU bound (parallelism can help).
If performance is still unacceptable after you have investigated all of the above, try to make the case for a new index. If available on your version of SQL Server, a possible filtered index for the given query would be:
CREATE INDEX index_name
ON dbo.mytbl ([DATE],[AFTER])
WHERE [TABLE] = 'OTB'
AND [FIELD] = 'STATUS';
Also consider index compression, if that is available and beneficial. Without a new index of some kind, there's relatively little you can do to improve the performance of the given query.
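If a filtered index is not an option (for example, the same index must serve queries with other TABLE/FIELD values), the covering index mentioned earlier could be sketched as follows; the index name is illustrative:

```sql
-- Covering index: seek on ([TABLE],[FIELD]), no RID lookups for [AFTER]/[DATE]
CREATE INDEX IX_mytbl_TABLE_FIELD
ON dbo.mytbl ([TABLE], [FIELD])
INCLUDE ([AFTER], [DATE]);
```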
edited Feb 27 '15 at 16:59
answered Feb 27 '15 at 16:33
Paul White♦
Sorry Paul, there is: IDX6 nonclustered located on PRIMARY (TABLE, FIELD). Maybe this would change things you mentioned?
– Atieh
Feb 27 '15 at 16:50
@Atieh Not materially, no. Unless it also includes DATE and AFTER, using that index will also require an RID lookup. Check the execution plans.
– Paul White♦
Feb 27 '15 at 16:53
I think there is a case for changing the indexes here because:
- a) you have a task to do (these multiple queries)
- b) data warehouse volumes (25+ million rows) and
- c) a performance problem.
This would also be a good use case for the non-clustered columnstore indexes introduced in SQL Server 2012, i.e. summarising/aggregating a few columns on a large table with many columns.
Although these indexes have the side-effect of making the table read-only (with the exception of partition switching), they can transform the performance of aggregate queries under the right conditions. The read-only aspect can be managed, either by dropping and recreating the index or by partition-switching data into the table.
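For example, a nightly load could be wrapped in a drop/recreate of the columnstore index (index and column names here match the test rig; adjust for the real table):

```sql
-- Drop the columnstore so the table becomes writable again
DROP INDEX _cs ON dbo.largeTable;

-- ... perform the daily inserts/updates here ...

-- Recreate it so reporting queries get fast columnstore scans again
CREATE NONCLUSTERED COLUMNSTORE INDEX _cs
ON dbo.largeTable ( [TABLE], FIELD, [AFTER], [DATE] );
```

Rebuilding the index costs time proportional to the table size, so this pattern suits overnight batch windows rather than continuous loads.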
I set up a simple test rig to mimic your setup, and saw a good improvement in performance:
USE tempdb
GO
SET NOCOUNT ON
GO
-- Create a large table
IF OBJECT_ID('dbo.largeTable') IS NOT NULL
DROP TABLE dbo.largeTable
GO
CREATE TABLE dbo.largeTable (
[TABLE] VARCHAR(30) NULL,
FIELD VARCHAR(30) NULL,
[AFTER] VARCHAR(1000) NULL,
[DATE] DATETIME,
SORT_ID INT IDENTITY(1,1),
pad VARCHAR(100) DEFAULT REPLICATE( '$', 100 )
)
GO
-- Populate table
;WITH cte AS (
SELECT TOP 100000 ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) rn
FROM master.sys.columns c1
CROSS JOIN master.sys.columns c2
CROSS JOIN master.sys.columns c3
)
INSERT INTO dbo.largeTable ( [TABLE], FIELD, [AFTER], [DATE] )
SELECT
x.tableName,
y.field,
z.[after],
DATEADD( day, rn % 1111, '1 Jan 2012' )
FROM cte c
CROSS JOIN ( VALUES ( 'OTB' ), ( 'AAA' ), ( 'BBB' ), ( 'CCCC' ) ) x ( tableName )
CROSS JOIN ( VALUES ( 'STATUS' ), ( 'TIME' ), ( 'POWER' ) ) y ( field )
CROSS JOIN ( VALUES ( 'R' ), ( 'X' ), ( 'Z' ), ( 'A' ) ) z ( [after] )
CHECKPOINT
GO 5
EXEC sp_spaceused 'dbo.largeTable'
GO
SELECT MIN([DATE]) xmin, MAX([DATE]) xmax, FORMAT( COUNT(*), '#,#' ) records
FROM dbo.largeTable
GO
-- Optionally clear cache for more comparable results; DO NOT RUN ON PRODUCTION SYSTEM!!
--DBCC DROPCLEANBUFFERS
--DBCC FREEPROCCACHE
--GO
DECLARE @startDate DATETIME2 = SYSUTCDATETIME()
SELECT COUNT(*) AS COUNT
FROM dbo.largeTable
WHERE [AFTER] = 'R'
AND [DATE] >= '2014-01-01'
AND [DATE] <= '2015-01-01'
SELECT DATEDIFF( millisecond, @startDate, SYSUTCDATETIME() ) diff1
GO
-- Add the non-clustered columnstore
CREATE NONCLUSTERED COLUMNSTORE INDEX _cs ON dbo.largeTable ( [TABLE], FIELD, [AFTER], [DATE] )
GO
-- Optionally clear cache for more comparable results; DO NOT RUN ON PRODUCTION SYSTEM!!
--DBCC DROPCLEANBUFFERS
--DBCC FREEPROCCACHE
--GO
-- Check query again
DECLARE @startDate DATETIME2 = SYSUTCDATETIME()
SELECT COUNT(*) AS COUNT
FROM dbo.largeTable
WHERE [AFTER] = 'R'
AND [DATE] >= '2014-01-01'
AND [DATE] <= '2015-01-01'
SELECT DATEDIFF( millisecond, @startDate, SYSUTCDATETIME() ) diff2
GO
My results: 6 seconds vs 0.08 seconds.
In summary, try to build a case with your boss to have the indexes changed, or at least to create some kind of overnight process where these records are carved off to a read-only reporting table/database where you can do your work, with indexing appropriate for that workload.
answered Feb 28 '15 at 10:25
wBob
@MikaelEriksson I cannot modify production tables.
– Atieh
Feb 27 '15 at 15:24
If the queries you are trying to optimize are of the form
SELECT COUNT(*) AS COUNT FROM original_table WHERE AFTER = 'R' AND DATE >= '2014-01-01' AND DATE < '2015-01-01'
, why don't you try to optimize each query separately? Are you not allowed to add indexes to the table?
– ypercubeᵀᴹ
Feb 27 '15 at 15:26
You need to determine why it's slow. Is it being blocked? Is it waiting for tempdb to grow? Is the execution plan abysmal? Nobody can fix "my query is slow" without more details...
– Aaron Bertrand♦
Feb 27 '15 at 15:27
Well, seems like a lost cause to me ("I'm not allowed to optimize anything, so let's just push 200K rows into a temp table every time we need to run some queries"). But you could remove the TABLE and FIELD columns from the #temp table (all rows have TABLE = 'OTB' AND FIELD = 'STATUS' for this specific temp table, after all).
– ypercubeᵀᴹ
Feb 27 '15 at 15:30
I did ask for an edit and improvements by adding a detailed (and polite) comment. That's what comments are for. You should also tag your question with the version of SQL Server you are using (e.g. SQL Server 2014). DDL for the table might be helpful too (a CREATE TABLE statement). The down vote was because the question was not clear.
– Paul White♦
Feb 27 '15 at 15:55