Poor SQL Server Performance With Nested “OR” Conditions In LEFT OUTER JOIN
I have a tricky bit of SQL that's giving me a bit of trouble. I'd like to introduce several new columns to a key table whose components may or may not be used based on user configuration (Dimension's 1-10). For architectural reasons these are stored as "UNIQUEIDENTIFIER"s.
However upon the introduction of these new dimension columns in my DB, I'm now facing performance problems in several other stored procs that manipulate data in this key table, one of which I will include below. It appears as though the method that I have implemented to support these new columns is inefficient and I'm not exactly sure why that is the case and the best way to remedy it.
(Included SQL)
--In real usage this would be set by user configuration
DECLARE @Dimension1Enabled BIT = 1
DECLARE @Dimension2Enabled BIT = 1
DECLARE @Dimension3Enabled BIT = 1
DECLARE @Dimension4Enabled BIT = 1
DECLARE @Dimension5Enabled BIT = 1
DECLARE @Dimension6Enabled BIT = 1
DECLARE @Dimension7Enabled BIT = 1
DECLARE @Dimension8Enabled BIT = 1
DECLARE @Dimension9Enabled BIT = 1
DECLARE @Dimension10Enabled BIT = 1
/*
--Here's the table definition for reference
CREATE TABLE [dbo].[Table1](
[Column1] [int] NOT NULL,
[Column2] [uniqueidentifier] NOT NULL,
[Column3] [uniqueidentifier] NOT NULL,
[Column4] [char](3) NOT NULL,
[Dimension1] [uniqueidentifier] NULL,
[Dimension2] [uniqueidentifier] NULL,
[Dimension3] [uniqueidentifier] NULL,
[Dimension4] [uniqueidentifier] NULL,
[Dimension5] [uniqueidentifier] NULL,
[Dimension6] [uniqueidentifier] NULL,
[Dimension7] [uniqueidentifier] NULL,
[Dimension8] [uniqueidentifier] NULL,
[Dimension9] [uniqueidentifier] NULL,
[Dimension10] [uniqueidentifier] NULL,
[Period] [int] NOT NULL,
[Amt] [numeric](28, 12) NOT NULL,
[EndingBal] [numeric](28, 12) NOT NULL,
[PlanningSourceID] [int] NOT NULL,
[Column5] [uniqueidentifier] NULL
) ON [PRIMARY]
--Simply contains 0-12
CREATE TABLE [dbo].[Table2] (
[FiscalPeriod] [int] NOT NULL
,[FiscalPeriodDescription] [varchar](50) NOT NULL
,[FiscalPeriodDescriptionMonthName] [varchar](20) NULL
,[FiscalPeriodName] [varchar](255) NULL
)
*/
SELECT pfd.*
FROM (
SELECT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
,PFDE1.[Dimension3]
,PFDE1.[Dimension4]
,PFDE1.[Dimension5]
,PFDE1.[Dimension6]
,PFDE1.[Dimension7]
,PFDE1.[Dimension8]
,PFDE1.[Dimension9]
,PFDE1.[Dimension10]
,p.FiscalPeriod AS Period
,0 AS Amt
,0 AS EndingBal
,PFDE1.Column5
FROM (
SELECT DISTINCT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
,PFDE1.[Dimension3]
,PFDE1.[Dimension4]
,PFDE1.[Dimension5]
,PFDE1.[Dimension6]
,PFDE1.[Dimension7]
,PFDE1.[Dimension8]
,PFDE1.[Dimension9]
,PFDE1.[Dimension10]
,PFDE1.Column5
FROM [dbo].[Table1] PFDE1
) PFDE1
CROSS JOIN [dbo].[Table2] p
) pfd
LEFT JOIN [dbo].[Table1] pfde ON PFDE.Column1 = pfd.Column1
AND PFDE.Column2 = pfd.Column2
AND PFDE.Column3 = pfd.Column3
AND PFDE.Column4 = pfd.Column4
--This section causes great slowness
AND (@Dimension1Enabled = 0 OR PFDE.Dimension1 = pfd.Dimension1)
AND (@Dimension2Enabled = 0 OR PFDE.Dimension2 = pfd.Dimension2)
AND (@Dimension3Enabled = 0 OR PFDE.Dimension3 = pfd.Dimension3)
AND (@Dimension4Enabled = 0 OR PFDE.Dimension4 = pfd.Dimension4)
AND (@Dimension5Enabled = 0 OR PFDE.Dimension5 = pfd.Dimension5)
AND (@Dimension6Enabled = 0 OR PFDE.Dimension6 = pfd.Dimension6)
AND (@Dimension7Enabled = 0 OR PFDE.Dimension7 = pfd.Dimension7)
AND (@Dimension8Enabled = 0 OR PFDE.Dimension8 = pfd.Dimension8)
AND (@Dimension9Enabled = 0 OR PFDE.Dimension9 = pfd.Dimension9)
AND (@Dimension10Enabled = 0 OR PFDE.Dimension10 = pfd.Dimension10)
AND PFDE.Period = pfd.Period
AND PFDE.Column5 = pfd.Column5
WHERE pfde.Column1 IS NULL
With 50,000 records in Table1 the performance of this query is quite poor, several minutes at least. Remove the "@DimensionXEnabled = 0 OR" condition though and the query will run in several seconds.
I feel I have several options to optimize this, I could do one of the following:
Change the "Dimension" columns to NOT NULL and populate with 0 Guid Values - I believe this would have a performance impact because now I'm coping a large amount of data with each insert. I've tested this option and it does work better but there is a definite performance impact and I'd like to avoid it if possible.
Convert the query to dynamic SQL and generate the join conditions dynamically - Would need to do this in 100+ other places and have a DB full of dynamic SQL, not looking forward to this option either.
Wrap the "Dimension1 = Dimension1" in ISNULL statements rather than based the join condition off of a variable - I believe this would result in a inefficient join statement as the query would no longer be able to use indexes.
I feel that the SQL optimizer is making bad decisions with this query, if I write the query without the OR condition it run's quickly, even though the results are exactly the same. The query plans look almost identical other than that in the long running join, most of the processing time is used in the LEFT OUTER JOIN statements merge:
(With "OR" Conditions)
(Without "OR" Conditions)
I'm unsure how to proceed, my question is, is there a better option?
Update 1
I've been doing more research into this, attempting various ways to get around the poor performance, so far I haven't found any good solutions just more information:
Tried column "IS NULL" without success (
PFDE.Dimension1 IS NULL OR PFDE.Dimension1 = pfd.Dimension1
), ran even slower, that doesn't seem to do it.Tired join "Hints" (HASH, MERGE, LOOP), no performance impact here, in fact in some cases they just ran slower.
Tried option 3, actually this seems to run very quick, I think because Table2 doesn't have any indexes, this approach could be a problem in the future though...
I've tried dynamic SQL (@AaronBertrand recommendation), actually the results were somewhat underwhelming (30 sec+), not sure why it performed so poorly, but that might be something that I could improve with some further optimizations, I'm looking into this more..
I tried the "OPTION (RECOMPILE);" mentioned in this post. Saw definite improvements (~15 secs), still not as fast as not dimensions but maybe something else I can iterate on.
sql-server database-design performance query-performance optimization
add a comment |
I have a tricky bit of SQL that's giving me a bit of trouble. I'd like to introduce several new columns to a key table whose components may or may not be used based on user configuration (Dimension's 1-10). For architectural reasons these are stored as "UNIQUEIDENTIFIER"s.
However upon the introduction of these new dimension columns in my DB, I'm now facing performance problems in several other stored procs that manipulate data in this key table, one of which I will include below. It appears as though the method that I have implemented to support these new columns is inefficient and I'm not exactly sure why that is the case and the best way to remedy it.
(Included SQL)
--In real usage this would be set by user configuration
DECLARE @Dimension1Enabled BIT = 1
DECLARE @Dimension2Enabled BIT = 1
DECLARE @Dimension3Enabled BIT = 1
DECLARE @Dimension4Enabled BIT = 1
DECLARE @Dimension5Enabled BIT = 1
DECLARE @Dimension6Enabled BIT = 1
DECLARE @Dimension7Enabled BIT = 1
DECLARE @Dimension8Enabled BIT = 1
DECLARE @Dimension9Enabled BIT = 1
DECLARE @Dimension10Enabled BIT = 1
/*
--Here's the table definition for reference
CREATE TABLE [dbo].[Table1](
[Column1] [int] NOT NULL,
[Column2] [uniqueidentifier] NOT NULL,
[Column3] [uniqueidentifier] NOT NULL,
[Column4] [char](3) NOT NULL,
[Dimension1] [uniqueidentifier] NULL,
[Dimension2] [uniqueidentifier] NULL,
[Dimension3] [uniqueidentifier] NULL,
[Dimension4] [uniqueidentifier] NULL,
[Dimension5] [uniqueidentifier] NULL,
[Dimension6] [uniqueidentifier] NULL,
[Dimension7] [uniqueidentifier] NULL,
[Dimension8] [uniqueidentifier] NULL,
[Dimension9] [uniqueidentifier] NULL,
[Dimension10] [uniqueidentifier] NULL,
[Period] [int] NOT NULL,
[Amt] [numeric](28, 12) NOT NULL,
[EndingBal] [numeric](28, 12) NOT NULL,
[PlanningSourceID] [int] NOT NULL,
[Column5] [uniqueidentifier] NULL
) ON [PRIMARY]
--Simply contains 0-12
CREATE TABLE [dbo].[Table2] (
[FiscalPeriod] [int] NOT NULL
,[FiscalPeriodDescription] [varchar](50) NOT NULL
,[FiscalPeriodDescriptionMonthName] [varchar](20) NULL
,[FiscalPeriodName] [varchar](255) NULL
)
*/
SELECT pfd.*
FROM (
SELECT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
,PFDE1.[Dimension3]
,PFDE1.[Dimension4]
,PFDE1.[Dimension5]
,PFDE1.[Dimension6]
,PFDE1.[Dimension7]
,PFDE1.[Dimension8]
,PFDE1.[Dimension9]
,PFDE1.[Dimension10]
,p.FiscalPeriod AS Period
,0 AS Amt
,0 AS EndingBal
,PFDE1.Column5
FROM (
SELECT DISTINCT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
,PFDE1.[Dimension3]
,PFDE1.[Dimension4]
,PFDE1.[Dimension5]
,PFDE1.[Dimension6]
,PFDE1.[Dimension7]
,PFDE1.[Dimension8]
,PFDE1.[Dimension9]
,PFDE1.[Dimension10]
,PFDE1.Column5
FROM [dbo].[Table1] PFDE1
) PFDE1
CROSS JOIN [dbo].[Table2] p
) pfd
LEFT JOIN [dbo].[Table1] pfde ON PFDE.Column1 = pfd.Column1
AND PFDE.Column2 = pfd.Column2
AND PFDE.Column3 = pfd.Column3
AND PFDE.Column4 = pfd.Column4
--This section causes great slowness
AND (@Dimension1Enabled = 0 OR PFDE.Dimension1 = pfd.Dimension1)
AND (@Dimension2Enabled = 0 OR PFDE.Dimension2 = pfd.Dimension2)
AND (@Dimension3Enabled = 0 OR PFDE.Dimension3 = pfd.Dimension3)
AND (@Dimension4Enabled = 0 OR PFDE.Dimension4 = pfd.Dimension4)
AND (@Dimension5Enabled = 0 OR PFDE.Dimension5 = pfd.Dimension5)
AND (@Dimension6Enabled = 0 OR PFDE.Dimension6 = pfd.Dimension6)
AND (@Dimension7Enabled = 0 OR PFDE.Dimension7 = pfd.Dimension7)
AND (@Dimension8Enabled = 0 OR PFDE.Dimension8 = pfd.Dimension8)
AND (@Dimension9Enabled = 0 OR PFDE.Dimension9 = pfd.Dimension9)
AND (@Dimension10Enabled = 0 OR PFDE.Dimension10 = pfd.Dimension10)
AND PFDE.Period = pfd.Period
AND PFDE.Column5 = pfd.Column5
WHERE pfde.Column1 IS NULL
With 50,000 records in Table1 the performance of this query is quite poor, several minutes at least. Remove the "@DimensionXEnabled = 0 OR" condition though and the query will run in several seconds.
I feel I have several options to optimize this, I could do one of the following:
Change the "Dimension" columns to NOT NULL and populate with 0 Guid Values - I believe this would have a performance impact because now I'm coping a large amount of data with each insert. I've tested this option and it does work better but there is a definite performance impact and I'd like to avoid it if possible.
Convert the query to dynamic SQL and generate the join conditions dynamically - Would need to do this in 100+ other places and have a DB full of dynamic SQL, not looking forward to this option either.
Wrap the "Dimension1 = Dimension1" in ISNULL statements rather than based the join condition off of a variable - I believe this would result in a inefficient join statement as the query would no longer be able to use indexes.
I feel that the SQL optimizer is making bad decisions with this query, if I write the query without the OR condition it run's quickly, even though the results are exactly the same. The query plans look almost identical other than that in the long running join, most of the processing time is used in the LEFT OUTER JOIN statements merge:
(With "OR" Conditions)
(Without "OR" Conditions)
I'm unsure how to proceed, my question is, is there a better option?
Update 1
I've been doing more research into this, attempting various ways to get around the poor performance, so far I haven't found any good solutions just more information:
Tried column "IS NULL" without success (
PFDE.Dimension1 IS NULL OR PFDE.Dimension1 = pfd.Dimension1
), ran even slower, that doesn't seem to do it.Tired join "Hints" (HASH, MERGE, LOOP), no performance impact here, in fact in some cases they just ran slower.
Tried option 3, actually this seems to run very quick, I think because Table2 doesn't have any indexes, this approach could be a problem in the future though...
I've tried dynamic SQL (@AaronBertrand recommendation), actually the results were somewhat underwhelming (30 sec+), not sure why it performed so poorly, but that might be something that I could improve with some further optimizations, I'm looking into this more..
I tried the "OPTION (RECOMPILE);" mentioned in this post. Saw definite improvements (~15 secs), still not as fast as not dimensions but maybe something else I can iterate on.
sql-server database-design performance query-performance optimization
add a comment |
I have a tricky bit of SQL that's giving me a bit of trouble. I'd like to introduce several new columns to a key table whose components may or may not be used based on user configuration (Dimension's 1-10). For architectural reasons these are stored as "UNIQUEIDENTIFIER"s.
However upon the introduction of these new dimension columns in my DB, I'm now facing performance problems in several other stored procs that manipulate data in this key table, one of which I will include below. It appears as though the method that I have implemented to support these new columns is inefficient and I'm not exactly sure why that is the case and the best way to remedy it.
(Included SQL)
--In real usage this would be set by user configuration
DECLARE @Dimension1Enabled BIT = 1
DECLARE @Dimension2Enabled BIT = 1
DECLARE @Dimension3Enabled BIT = 1
DECLARE @Dimension4Enabled BIT = 1
DECLARE @Dimension5Enabled BIT = 1
DECLARE @Dimension6Enabled BIT = 1
DECLARE @Dimension7Enabled BIT = 1
DECLARE @Dimension8Enabled BIT = 1
DECLARE @Dimension9Enabled BIT = 1
DECLARE @Dimension10Enabled BIT = 1
/*
--Here's the table definition for reference
CREATE TABLE [dbo].[Table1](
[Column1] [int] NOT NULL,
[Column2] [uniqueidentifier] NOT NULL,
[Column3] [uniqueidentifier] NOT NULL,
[Column4] [char](3) NOT NULL,
[Dimension1] [uniqueidentifier] NULL,
[Dimension2] [uniqueidentifier] NULL,
[Dimension3] [uniqueidentifier] NULL,
[Dimension4] [uniqueidentifier] NULL,
[Dimension5] [uniqueidentifier] NULL,
[Dimension6] [uniqueidentifier] NULL,
[Dimension7] [uniqueidentifier] NULL,
[Dimension8] [uniqueidentifier] NULL,
[Dimension9] [uniqueidentifier] NULL,
[Dimension10] [uniqueidentifier] NULL,
[Period] [int] NOT NULL,
[Amt] [numeric](28, 12) NOT NULL,
[EndingBal] [numeric](28, 12) NOT NULL,
[PlanningSourceID] [int] NOT NULL,
[Column5] [uniqueidentifier] NULL
) ON [PRIMARY]
--Simply contains 0-12
CREATE TABLE [dbo].[Table2] (
[FiscalPeriod] [int] NOT NULL
,[FiscalPeriodDescription] [varchar](50) NOT NULL
,[FiscalPeriodDescriptionMonthName] [varchar](20) NULL
,[FiscalPeriodName] [varchar](255) NULL
)
*/
SELECT pfd.*
FROM (
SELECT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
,PFDE1.[Dimension3]
,PFDE1.[Dimension4]
,PFDE1.[Dimension5]
,PFDE1.[Dimension6]
,PFDE1.[Dimension7]
,PFDE1.[Dimension8]
,PFDE1.[Dimension9]
,PFDE1.[Dimension10]
,p.FiscalPeriod AS Period
,0 AS Amt
,0 AS EndingBal
,PFDE1.Column5
FROM (
SELECT DISTINCT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
,PFDE1.[Dimension3]
,PFDE1.[Dimension4]
,PFDE1.[Dimension5]
,PFDE1.[Dimension6]
,PFDE1.[Dimension7]
,PFDE1.[Dimension8]
,PFDE1.[Dimension9]
,PFDE1.[Dimension10]
,PFDE1.Column5
FROM [dbo].[Table1] PFDE1
) PFDE1
CROSS JOIN [dbo].[Table2] p
) pfd
LEFT JOIN [dbo].[Table1] pfde ON PFDE.Column1 = pfd.Column1
AND PFDE.Column2 = pfd.Column2
AND PFDE.Column3 = pfd.Column3
AND PFDE.Column4 = pfd.Column4
--This section causes great slowness
AND (@Dimension1Enabled = 0 OR PFDE.Dimension1 = pfd.Dimension1)
AND (@Dimension2Enabled = 0 OR PFDE.Dimension2 = pfd.Dimension2)
AND (@Dimension3Enabled = 0 OR PFDE.Dimension3 = pfd.Dimension3)
AND (@Dimension4Enabled = 0 OR PFDE.Dimension4 = pfd.Dimension4)
AND (@Dimension5Enabled = 0 OR PFDE.Dimension5 = pfd.Dimension5)
AND (@Dimension6Enabled = 0 OR PFDE.Dimension6 = pfd.Dimension6)
AND (@Dimension7Enabled = 0 OR PFDE.Dimension7 = pfd.Dimension7)
AND (@Dimension8Enabled = 0 OR PFDE.Dimension8 = pfd.Dimension8)
AND (@Dimension9Enabled = 0 OR PFDE.Dimension9 = pfd.Dimension9)
AND (@Dimension10Enabled = 0 OR PFDE.Dimension10 = pfd.Dimension10)
AND PFDE.Period = pfd.Period
AND PFDE.Column5 = pfd.Column5
WHERE pfde.Column1 IS NULL
With 50,000 records in Table1 the performance of this query is quite poor, several minutes at least. Remove the "@DimensionXEnabled = 0 OR" condition though and the query will run in several seconds.
I feel I have several options to optimize this, I could do one of the following:
Change the "Dimension" columns to NOT NULL and populate with 0 Guid Values - I believe this would have a performance impact because now I'm coping a large amount of data with each insert. I've tested this option and it does work better but there is a definite performance impact and I'd like to avoid it if possible.
Convert the query to dynamic SQL and generate the join conditions dynamically - Would need to do this in 100+ other places and have a DB full of dynamic SQL, not looking forward to this option either.
Wrap the "Dimension1 = Dimension1" in ISNULL statements rather than based the join condition off of a variable - I believe this would result in a inefficient join statement as the query would no longer be able to use indexes.
I feel that the SQL optimizer is making bad decisions with this query, if I write the query without the OR condition it run's quickly, even though the results are exactly the same. The query plans look almost identical other than that in the long running join, most of the processing time is used in the LEFT OUTER JOIN statements merge:
(With "OR" Conditions)
(Without "OR" Conditions)
I'm unsure how to proceed, my question is, is there a better option?
Update 1
I've been doing more research into this, attempting various ways to get around the poor performance, so far I haven't found any good solutions just more information:
Tried column "IS NULL" without success (
PFDE.Dimension1 IS NULL OR PFDE.Dimension1 = pfd.Dimension1
), ran even slower, that doesn't seem to do it.Tired join "Hints" (HASH, MERGE, LOOP), no performance impact here, in fact in some cases they just ran slower.
Tried option 3, actually this seems to run very quick, I think because Table2 doesn't have any indexes, this approach could be a problem in the future though...
I've tried dynamic SQL (@AaronBertrand recommendation), actually the results were somewhat underwhelming (30 sec+), not sure why it performed so poorly, but that might be something that I could improve with some further optimizations, I'm looking into this more..
I tried the "OPTION (RECOMPILE);" mentioned in this post. Saw definite improvements (~15 secs), still not as fast as not dimensions but maybe something else I can iterate on.
sql-server database-design performance query-performance optimization
I have a tricky bit of SQL that's giving me a bit of trouble. I'd like to introduce several new columns to a key table whose components may or may not be used based on user configuration (Dimension's 1-10). For architectural reasons these are stored as "UNIQUEIDENTIFIER"s.
However upon the introduction of these new dimension columns in my DB, I'm now facing performance problems in several other stored procs that manipulate data in this key table, one of which I will include below. It appears as though the method that I have implemented to support these new columns is inefficient and I'm not exactly sure why that is the case and the best way to remedy it.
(Included SQL)
--In real usage this would be set by user configuration
DECLARE @Dimension1Enabled BIT = 1
DECLARE @Dimension2Enabled BIT = 1
DECLARE @Dimension3Enabled BIT = 1
DECLARE @Dimension4Enabled BIT = 1
DECLARE @Dimension5Enabled BIT = 1
DECLARE @Dimension6Enabled BIT = 1
DECLARE @Dimension7Enabled BIT = 1
DECLARE @Dimension8Enabled BIT = 1
DECLARE @Dimension9Enabled BIT = 1
DECLARE @Dimension10Enabled BIT = 1
/*
--Here's the table definition for reference
CREATE TABLE [dbo].[Table1](
[Column1] [int] NOT NULL,
[Column2] [uniqueidentifier] NOT NULL,
[Column3] [uniqueidentifier] NOT NULL,
[Column4] [char](3) NOT NULL,
[Dimension1] [uniqueidentifier] NULL,
[Dimension2] [uniqueidentifier] NULL,
[Dimension3] [uniqueidentifier] NULL,
[Dimension4] [uniqueidentifier] NULL,
[Dimension5] [uniqueidentifier] NULL,
[Dimension6] [uniqueidentifier] NULL,
[Dimension7] [uniqueidentifier] NULL,
[Dimension8] [uniqueidentifier] NULL,
[Dimension9] [uniqueidentifier] NULL,
[Dimension10] [uniqueidentifier] NULL,
[Period] [int] NOT NULL,
[Amt] [numeric](28, 12) NOT NULL,
[EndingBal] [numeric](28, 12) NOT NULL,
[PlanningSourceID] [int] NOT NULL,
[Column5] [uniqueidentifier] NULL
) ON [PRIMARY]
--Simply contains 0-12
CREATE TABLE [dbo].[Table2] (
[FiscalPeriod] [int] NOT NULL
,[FiscalPeriodDescription] [varchar](50) NOT NULL
,[FiscalPeriodDescriptionMonthName] [varchar](20) NULL
,[FiscalPeriodName] [varchar](255) NULL
)
*/
SELECT pfd.*
FROM (
SELECT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
,PFDE1.[Dimension3]
,PFDE1.[Dimension4]
,PFDE1.[Dimension5]
,PFDE1.[Dimension6]
,PFDE1.[Dimension7]
,PFDE1.[Dimension8]
,PFDE1.[Dimension9]
,PFDE1.[Dimension10]
,p.FiscalPeriod AS Period
,0 AS Amt
,0 AS EndingBal
,PFDE1.Column5
FROM (
SELECT DISTINCT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
,PFDE1.[Dimension3]
,PFDE1.[Dimension4]
,PFDE1.[Dimension5]
,PFDE1.[Dimension6]
,PFDE1.[Dimension7]
,PFDE1.[Dimension8]
,PFDE1.[Dimension9]
,PFDE1.[Dimension10]
,PFDE1.Column5
FROM [dbo].[Table1] PFDE1
) PFDE1
CROSS JOIN [dbo].[Table2] p
) pfd
LEFT JOIN [dbo].[Table1] pfde ON PFDE.Column1 = pfd.Column1
AND PFDE.Column2 = pfd.Column2
AND PFDE.Column3 = pfd.Column3
AND PFDE.Column4 = pfd.Column4
--This section causes great slowness
AND (@Dimension1Enabled = 0 OR PFDE.Dimension1 = pfd.Dimension1)
AND (@Dimension2Enabled = 0 OR PFDE.Dimension2 = pfd.Dimension2)
AND (@Dimension3Enabled = 0 OR PFDE.Dimension3 = pfd.Dimension3)
AND (@Dimension4Enabled = 0 OR PFDE.Dimension4 = pfd.Dimension4)
AND (@Dimension5Enabled = 0 OR PFDE.Dimension5 = pfd.Dimension5)
AND (@Dimension6Enabled = 0 OR PFDE.Dimension6 = pfd.Dimension6)
AND (@Dimension7Enabled = 0 OR PFDE.Dimension7 = pfd.Dimension7)
AND (@Dimension8Enabled = 0 OR PFDE.Dimension8 = pfd.Dimension8)
AND (@Dimension9Enabled = 0 OR PFDE.Dimension9 = pfd.Dimension9)
AND (@Dimension10Enabled = 0 OR PFDE.Dimension10 = pfd.Dimension10)
AND PFDE.Period = pfd.Period
AND PFDE.Column5 = pfd.Column5
WHERE pfde.Column1 IS NULL
With 50,000 records in Table1 the performance of this query is quite poor, several minutes at least. Remove the "@DimensionXEnabled = 0 OR" condition though and the query will run in several seconds.
I feel I have several options to optimize this, I could do one of the following:
Change the "Dimension" columns to NOT NULL and populate with 0 Guid Values - I believe this would have a performance impact because now I'm coping a large amount of data with each insert. I've tested this option and it does work better but there is a definite performance impact and I'd like to avoid it if possible.
Convert the query to dynamic SQL and generate the join conditions dynamically - Would need to do this in 100+ other places and have a DB full of dynamic SQL, not looking forward to this option either.
Wrap the "Dimension1 = Dimension1" in ISNULL statements rather than based the join condition off of a variable - I believe this would result in a inefficient join statement as the query would no longer be able to use indexes.
I feel that the SQL optimizer is making bad decisions with this query, if I write the query without the OR condition it run's quickly, even though the results are exactly the same. The query plans look almost identical other than that in the long running join, most of the processing time is used in the LEFT OUTER JOIN statements merge:
(With "OR" Conditions)
(Without "OR" Conditions)
I'm unsure how to proceed, my question is, is there a better option?
Update 1
I've been doing more research into this, attempting various ways to get around the poor performance, so far I haven't found any good solutions just more information:
Tried column "IS NULL" without success (
PFDE.Dimension1 IS NULL OR PFDE.Dimension1 = pfd.Dimension1
), ran even slower, that doesn't seem to do it.Tired join "Hints" (HASH, MERGE, LOOP), no performance impact here, in fact in some cases they just ran slower.
Tried option 3, actually this seems to run very quick, I think because Table2 doesn't have any indexes, this approach could be a problem in the future though...
I've tried dynamic SQL (@AaronBertrand recommendation), actually the results were somewhat underwhelming (30 sec+), not sure why it performed so poorly, but that might be something that I could improve with some further optimizations, I'm looking into this more..
I tried the "OPTION (RECOMPILE);" mentioned in this post. Saw definite improvements (~15 secs), still not as fast as not dimensions but maybe something else I can iterate on.
sql-server database-design performance query-performance optimization
sql-server database-design performance query-performance optimization
edited Jan 17 at 22:48
David Rogers
asked Jan 17 at 15:54
David RogersDavid Rogers
1095
1095
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
Sounds like a kitchen sink query. You can make this work well for most cases with dynamic SQL, since you can generate a different execution plan for each combination of parameters.
DECLARE @sql nvarchar(max) = N'
SELECT DISTINCT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
...
,PFDE1.[Dimension10]
,PFDE1.Column5
FROM [dbo].[Table1] PFDE1
) PFDE1
CROSS JOIN [dbo].[Table2] p
) pfd
LEFT JOIN [dbo].[Table1] pfde
ON PFDE.Column1 = pfd.Column1
AND PFDE.Column2 = pfd.Column2
AND PFDE.Column3 = pfd.Column3
AND PFDE.Column4 = pfd.Column4'
+ CASE WHEN @Dimension1Enabled = 1 THEN
N' AND PFDE.Dimension1 = pfd.Dimension1' ELSE N'' END
+ CASE WHEN @Dimension2Enabled = 1 THEN
N' AND PFDE.Dimension2 = pfd.Dimension2' ELSE N'' END
+ ...
+ CASE WHEN @Dimension10Enabled = 1 THEN
N' AND PFDE.Dimension10 = pfd.Dimension10' ELSE '' END
+ N' AND PFDE.Period = pfd.Period
AND PFDE.Column5 = pfd.Column5
WHERE pfde.Column1 IS NULL;';
EXEC sys.sp_executesql @sql;
I'd also drop the DISTINCT
- if it's really necessary because of duplicates then there is an underlying problem to be fixed as well.
Yep, this would be a good implementation of option 2, but I'm kinda hoping to avoiding this as if I followed this approach here I'd have to resort to this everywhere else I implement this, there are 100+ places and I'd prefer not to maintain a DB full of dynamic SQL...
– David Rogers
Jan 17 at 16:21
Well 1 and 3 don't sound like viable options either, since 1 is a performance hit and 3 is not going to change anything (they boil down to the same logic), so... sometimes you just have to pick your poison. A pattern is just a pattern and if dealing with users screaming is more important than a little extra code, you need to make a choice.
– Aaron Bertrand♦
Jan 17 at 16:35
Well each has it's advantages and disadvantages, the issue is I'm not just making a decision for this one query, I'm making a decision for 10+ tables, 100+ procs, and many, many other associated items, it will take over a month to implement and test it properly, and once it's done I'll have to live with the consequences in over 100 production environment for years. So this question really is only the tip of the iceberg, a small manifestation of a broader architectural decision that I need to make and then live with. Though I accept that I may need to go with more than one option...
– David Rogers
Jan 17 at 23:37
Well we as a peer group can tell you your options, and tell you which ones we think are best, but we can't crowd-source your decision for you. As for the testing mentioned in your update, did you look at any execution plans? Any query that takes 15 or 20 seconds is already 100 times worse than it probably should be. Why is it taking so long? What are the wait types? Is it being blocked? Are you returning 40 billion rows to an SSMS grid over a slow network connection? If you read the kitchen sink article I posted, dynamic SQL can help for presence/absence of parameters, and option (recompile)
– Aaron Bertrand♦
Jan 18 at 12:16
...can help for data skew. You can also combine them. But given a 15 second query, using dynamic SQL or not is unlikely to fix that, there is something else bigger going on here.
– Aaron Bertrand♦
Jan 18 at 12:20
add a comment |
IMO,Biggest mistake is the place where you are using CROSS JOIN
.
First self join [dbo].[Table1]
and filter the resultset then with this resultset use CROSS JOIN
.It will give tremendous boost .
SELECT pfd.*
FROM (
SELECT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
,PFDE1.[Dimension3]
,PFDE1.[Dimension4]
,PFDE1.[Dimension5]
,PFDE1.[Dimension6]
,PFDE1.[Dimension7]
,PFDE1.[Dimension8]
,PFDE1.[Dimension9]
,PFDE1.[Dimension10]
,p.FiscalPeriod AS Period
,0 AS Amt
,0 AS EndingBal
,PFDE1.Column5
FROM (
SELECT DISTINCT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
,PFDE1.[Dimension3]
,PFDE1.[Dimension4]
,PFDE1.[Dimension5]
,PFDE1.[Dimension6]
,PFDE1.[Dimension7]
,PFDE1.[Dimension8]
,PFDE1.[Dimension9]
,PFDE1.[Dimension10]
,PFDE1.Column5
FROM [dbo].[Table1] PFDE1
LEFT JOIN [dbo].[Table1] pfde ON PFDE.Column1 = PFDE1.Column1
AND PFDE.Column2 = PFDE1.Column2
AND PFDE.Column3 = PFDE1.Column3
AND PFDE.Column4 = PFDE1.Column4
AND PFDE.Period = PFDE1.Period
AND PFDE.Column5 = PFDE1.Column5
AND (@Dimension1Enabled = 0 OR PFDE.Dimension1 = PFDE1.Dimension1)
AND (@Dimension2Enabled = 0 OR PFDE.Dimension2 = PFDE1.Dimension2)
AND (@Dimension3Enabled = 0 OR PFDE.Dimension3 = PFDE1.Dimension3)
AND (@Dimension4Enabled = 0 OR PFDE.Dimension4 = PFDE1.Dimension4)
AND (@Dimension5Enabled = 0 OR PFDE.Dimension5 = PFDE1.Dimension5)
AND (@Dimension6Enabled = 0 OR PFDE.Dimension6 = PFDE1.Dimension6)
AND (@Dimension7Enabled = 0 OR PFDE.Dimension7 = PFDE1.Dimension7)
AND (@Dimension8Enabled = 0 OR PFDE.Dimension8 = PFDE1.Dimension8)
AND (@Dimension9Enabled = 0 OR PFDE.Dimension9 = PFDE1.Dimension9)
AND (@Dimension10Enabled = 0 OR PFDE.Dimension10 = PFDE1.Dimension10)
WHERE pfde.Column1 IS NULL
) PFDE1
CROSS JOIN [dbo].[Table2] p
) pfd
Hope my point is clear,it is untested.
No doubts Dynamic Sql will further improve it, EXEC sys.sp_executesql @sql
I'd also drop the DISTINCT - if it's really necessary because of
duplicates then there is an underlying problem to be fixed as well.
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f227393%2fpoor-sql-server-performance-with-nested-or-conditions-in-left-outer-join%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
Sounds like a kitchen sink query. You can make this work well for most cases with dynamic SQL, since you can generate a different execution plan for each combination of parameters.
DECLARE @sql nvarchar(max) = N'
SELECT DISTINCT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
...
,PFDE1.[Dimension10]
,PFDE1.Column5
FROM [dbo].[Table1] PFDE1
) PFDE1
CROSS JOIN [dbo].[Table2] p
) pfd
LEFT JOIN [dbo].[Table1] pfde
ON PFDE.Column1 = pfd.Column1
AND PFDE.Column2 = pfd.Column2
AND PFDE.Column3 = pfd.Column3
AND PFDE.Column4 = pfd.Column4'
+ CASE WHEN @Dimension1Enabled = 1 THEN
N' AND PFDE.Dimension1 = pfd.Dimension1' ELSE N'' END
+ CASE WHEN @Dimension2Enabled = 1 THEN
N' AND PFDE.Dimension2 = pfd.Dimension2' ELSE N'' END
+ ...
+ CASE WHEN @Dimension10Enabled = 1 THEN
N' AND PFDE.Dimension10 = pfd.Dimension10' ELSE '' END
+ N' AND PFDE.Period = pfd.Period
AND PFDE.Column5 = pfd.Column5
WHERE pfde.Column1 IS NULL;';
EXEC sys.sp_executesql @sql;
I'd also drop the DISTINCT
- if it's really necessary because of duplicates then there is an underlying problem to be fixed as well.
Yep, this would be a good implementation of option 2, but I'm kinda hoping to avoiding this as if I followed this approach here I'd have to resort to this everywhere else I implement this, there are 100+ places and I'd prefer not to maintain a DB full of dynamic SQL...
– David Rogers
Jan 17 at 16:21
Well 1 and 3 don't sound like viable options either, since 1 is a performance hit and 3 is not going to change anything (they boil down to the same logic), so... sometimes you just have to pick your poison. A pattern is just a pattern and if dealing with users screaming is more important than a little extra code, you need to make a choice.
– Aaron Bertrand♦
Jan 17 at 16:35
Well each has it's advantages and disadvantages, the issue is I'm not just making a decision for this one query, I'm making a decision for 10+ tables, 100+ procs, and many, many other associated items, it will take over a month to implement and test it properly, and once it's done I'll have to live with the consequences in over 100 production environment for years. So this question really is only the tip of the iceberg, a small manifestation of a broader architectural decision that I need to make and then live with. Though I accept that I may need to go with more than one option...
– David Rogers
Jan 17 at 23:37
Well we as a peer group can tell you your options, and tell you which ones we think are best, but we can't crowd-source your decision for you. As for the testing mentioned in your update, did you look at any execution plans? Any query that takes 15 or 20 seconds is already 100 times worse than it probably should be. Why is it taking so long? What are the wait types? Is it being blocked? Are you returning 40 billion rows to an SSMS grid over a slow network connection? If you read the kitchen sink article I posted, dynamic SQL can help for presence/absence of parameters, and option (recompile)
– Aaron Bertrand♦
Jan 18 at 12:16
...can help for data skew. You can also combine them. But given a 15 second query, using dynamic SQL or not is unlikely to fix that, there is something else bigger going on here.
– Aaron Bertrand♦
Jan 18 at 12:20
add a comment |
Sounds like a kitchen sink query. You can make this work well for most cases with dynamic SQL, since you can generate a different execution plan for each combination of parameters.
DECLARE @sql nvarchar(max) = N'
SELECT DISTINCT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
...
,PFDE1.[Dimension10]
,PFDE1.Column5
FROM [dbo].[Table1] PFDE1
) PFDE1
CROSS JOIN [dbo].[Table2] p
) pfd
LEFT JOIN [dbo].[Table1] pfde
ON PFDE.Column1 = pfd.Column1
AND PFDE.Column2 = pfd.Column2
AND PFDE.Column3 = pfd.Column3
AND PFDE.Column4 = pfd.Column4'
+ CASE WHEN @Dimension1Enabled = 1 THEN
N' AND PFDE.Dimension1 = pfd.Dimension1' ELSE N'' END
+ CASE WHEN @Dimension2Enabled = 1 THEN
N' AND PFDE.Dimension2 = pfd.Dimension2' ELSE N'' END
+ ...
+ CASE WHEN @Dimension10Enabled = 1 THEN
N' AND PFDE.Dimension10 = pfd.Dimension10' ELSE '' END
+ N' AND PFDE.Period = pfd.Period
AND PFDE.Column5 = pfd.Column5
WHERE pfde.Column1 IS NULL;';
EXEC sys.sp_executesql @sql;
I'd also drop the DISTINCT
- if it's really necessary because of duplicates then there is an underlying problem to be fixed as well.
Yep, this would be a good implementation of option 2, but I'm kinda hoping to avoiding this as if I followed this approach here I'd have to resort to this everywhere else I implement this, there are 100+ places and I'd prefer not to maintain a DB full of dynamic SQL...
– David Rogers
Jan 17 at 16:21
Well 1 and 3 don't sound like viable options either, since 1 is a performance hit and 3 is not going to change anything (they boil down to the same logic), so... sometimes you just have to pick your poison. A pattern is just a pattern and if dealing with users screaming is more important than a little extra code, you need to make a choice.
– Aaron Bertrand♦
Jan 17 at 16:35
Well each has it's advantages and disadvantages, the issue is I'm not just making a decision for this one query, I'm making a decision for 10+ tables, 100+ procs, and many, many other associated items, it will take over a month to implement and test it properly, and once it's done I'll have to live with the consequences in over 100 production environment for years. So this question really is only the tip of the iceberg, a small manifestation of a broader architectural decision that I need to make and then live with. Though I accept that I may need to go with more than one option...
– David Rogers
Jan 17 at 23:37
Well we as a peer group can tell you your options, and tell you which ones we think are best, but we can't crowd-source your decision for you. As for the testing mentioned in your update, did you look at any execution plans? Any query that takes 15 or 20 seconds is already 100 times worse than it probably should be. Why is it taking so long? What are the wait types? Is it being blocked? Are you returning 40 billion rows to an SSMS grid over a slow network connection? If you read the kitchen sink article I posted, dynamic SQL can help for presence/absence of parameters, and option (recompile)
– Aaron Bertrand♦
Jan 18 at 12:16
...can help for data skew. You can also combine them. But given a 15 second query, using dynamic SQL or not is unlikely to fix that, there is something else bigger going on here.
– Aaron Bertrand♦
Jan 18 at 12:20
add a comment |
Sounds like a kitchen sink query. You can make this work well for most cases with dynamic SQL, since you can generate a different execution plan for each combination of parameters.
DECLARE @sql nvarchar(max) = N'
SELECT DISTINCT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
...
,PFDE1.[Dimension10]
,PFDE1.Column5
FROM [dbo].[Table1] PFDE1
) PFDE1
CROSS JOIN [dbo].[Table2] p
) pfd
LEFT JOIN [dbo].[Table1] pfde
ON PFDE.Column1 = pfd.Column1
AND PFDE.Column2 = pfd.Column2
AND PFDE.Column3 = pfd.Column3
AND PFDE.Column4 = pfd.Column4'
+ CASE WHEN @Dimension1Enabled = 1 THEN
N' AND PFDE.Dimension1 = pfd.Dimension1' ELSE N'' END
+ CASE WHEN @Dimension2Enabled = 1 THEN
N' AND PFDE.Dimension2 = pfd.Dimension2' ELSE N'' END
+ ...
+ CASE WHEN @Dimension10Enabled = 1 THEN
N' AND PFDE.Dimension10 = pfd.Dimension10' ELSE '' END
+ N' AND PFDE.Period = pfd.Period
AND PFDE.Column5 = pfd.Column5
WHERE pfde.Column1 IS NULL;';
EXEC sys.sp_executesql @sql;
I'd also drop the DISTINCT
- if it's really necessary because of duplicates then there is an underlying problem to be fixed as well.
Sounds like a kitchen sink query. You can make this work well for most cases with dynamic SQL, since you can generate a different execution plan for each combination of parameters.
DECLARE @sql nvarchar(max) = N'
SELECT DISTINCT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
...
,PFDE1.[Dimension10]
,PFDE1.Column5
FROM [dbo].[Table1] PFDE1
) PFDE1
CROSS JOIN [dbo].[Table2] p
) pfd
LEFT JOIN [dbo].[Table1] pfde
ON PFDE.Column1 = pfd.Column1
AND PFDE.Column2 = pfd.Column2
AND PFDE.Column3 = pfd.Column3
AND PFDE.Column4 = pfd.Column4'
+ CASE WHEN @Dimension1Enabled = 1 THEN
N' AND PFDE.Dimension1 = pfd.Dimension1' ELSE N'' END
+ CASE WHEN @Dimension2Enabled = 1 THEN
N' AND PFDE.Dimension2 = pfd.Dimension2' ELSE N'' END
+ ...
+ CASE WHEN @Dimension10Enabled = 1 THEN
N' AND PFDE.Dimension10 = pfd.Dimension10' ELSE '' END
+ N' AND PFDE.Period = pfd.Period
AND PFDE.Column5 = pfd.Column5
WHERE pfde.Column1 IS NULL;';
EXEC sys.sp_executesql @sql;
I'd also drop the DISTINCT
- if it's really necessary because of duplicates then there is an underlying problem to be fixed as well.
answered Jan 17 at 16:16
Aaron Bertrand♦Aaron Bertrand
150k18284482
150k18284482
Yep, this would be a good implementation of option 2, but I'm kinda hoping to avoiding this as if I followed this approach here I'd have to resort to this everywhere else I implement this, there are 100+ places and I'd prefer not to maintain a DB full of dynamic SQL...
– David Rogers
Jan 17 at 16:21
Well 1 and 3 don't sound like viable options either, since 1 is a performance hit and 3 is not going to change anything (they boil down to the same logic), so... sometimes you just have to pick your poison. A pattern is just a pattern and if dealing with users screaming is more important than a little extra code, you need to make a choice.
– Aaron Bertrand♦
Jan 17 at 16:35
Well each has it's advantages and disadvantages, the issue is I'm not just making a decision for this one query, I'm making a decision for 10+ tables, 100+ procs, and many, many other associated items, it will take over a month to implement and test it properly, and once it's done I'll have to live with the consequences in over 100 production environment for years. So this question really is only the tip of the iceberg, a small manifestation of a broader architectural decision that I need to make and then live with. Though I accept that I may need to go with more than one option...
– David Rogers
Jan 17 at 23:37
Well we as a peer group can tell you your options, and tell you which ones we think are best, but we can't crowd-source your decision for you. As for the testing mentioned in your update, did you look at any execution plans? Any query that takes 15 or 20 seconds is already 100 times worse than it probably should be. Why is it taking so long? What are the wait types? Is it being blocked? Are you returning 40 billion rows to an SSMS grid over a slow network connection? If you read the kitchen sink article I posted, dynamic SQL can help for presence/absence of parameters, and option (recompile)
– Aaron Bertrand♦
Jan 18 at 12:16
...can help for data skew. You can also combine them. But given a 15 second query, using dynamic SQL or not is unlikely to fix that, there is something else bigger going on here.
– Aaron Bertrand♦
Jan 18 at 12:20
add a comment |
Yep, this would be a good implementation of option 2, but I'm kinda hoping to avoiding this as if I followed this approach here I'd have to resort to this everywhere else I implement this, there are 100+ places and I'd prefer not to maintain a DB full of dynamic SQL...
– David Rogers
Jan 17 at 16:21
Well 1 and 3 don't sound like viable options either, since 1 is a performance hit and 3 is not going to change anything (they boil down to the same logic), so... sometimes you just have to pick your poison. A pattern is just a pattern and if dealing with users screaming is more important than a little extra code, you need to make a choice.
– Aaron Bertrand♦
Jan 17 at 16:35
Well each has it's advantages and disadvantages, the issue is I'm not just making a decision for this one query, I'm making a decision for 10+ tables, 100+ procs, and many, many other associated items, it will take over a month to implement and test it properly, and once it's done I'll have to live with the consequences in over 100 production environment for years. So this question really is only the tip of the iceberg, a small manifestation of a broader architectural decision that I need to make and then live with. Though I accept that I may need to go with more than one option...
– David Rogers
Jan 17 at 23:37
Well we as a peer group can tell you your options, and tell you which ones we think are best, but we can't crowd-source your decision for you. As for the testing mentioned in your update, did you look at any execution plans? Any query that takes 15 or 20 seconds is already 100 times worse than it probably should be. Why is it taking so long? What are the wait types? Is it being blocked? Are you returning 40 billion rows to an SSMS grid over a slow network connection? If you read the kitchen sink article I posted, dynamic SQL can help for presence/absence of parameters, and option (recompile)
– Aaron Bertrand♦
Jan 18 at 12:16
...can help for data skew. You can also combine them. But given a 15 second query, using dynamic SQL or not is unlikely to fix that, there is something else bigger going on here.
– Aaron Bertrand♦
Jan 18 at 12:20
Yep, this would be a good implementation of option 2, but I'm kinda hoping to avoiding this as if I followed this approach here I'd have to resort to this everywhere else I implement this, there are 100+ places and I'd prefer not to maintain a DB full of dynamic SQL...
– David Rogers
Jan 17 at 16:21
Yep, this would be a good implementation of option 2, but I'm kinda hoping to avoiding this as if I followed this approach here I'd have to resort to this everywhere else I implement this, there are 100+ places and I'd prefer not to maintain a DB full of dynamic SQL...
– David Rogers
Jan 17 at 16:21
Well 1 and 3 don't sound like viable options either, since 1 is a performance hit and 3 is not going to change anything (they boil down to the same logic), so... sometimes you just have to pick your poison. A pattern is just a pattern and if dealing with users screaming is more important than a little extra code, you need to make a choice.
– Aaron Bertrand♦
Jan 17 at 16:35
Well 1 and 3 don't sound like viable options either, since 1 is a performance hit and 3 is not going to change anything (they boil down to the same logic), so... sometimes you just have to pick your poison. A pattern is just a pattern and if dealing with users screaming is more important than a little extra code, you need to make a choice.
– Aaron Bertrand♦
Jan 17 at 16:35
Well each has it's advantages and disadvantages, the issue is I'm not just making a decision for this one query, I'm making a decision for 10+ tables, 100+ procs, and many, many other associated items, it will take over a month to implement and test it properly, and once it's done I'll have to live with the consequences in over 100 production environment for years. So this question really is only the tip of the iceberg, a small manifestation of a broader architectural decision that I need to make and then live with. Though I accept that I may need to go with more than one option...
– David Rogers
Jan 17 at 23:37
Well each has it's advantages and disadvantages, the issue is I'm not just making a decision for this one query, I'm making a decision for 10+ tables, 100+ procs, and many, many other associated items, it will take over a month to implement and test it properly, and once it's done I'll have to live with the consequences in over 100 production environment for years. So this question really is only the tip of the iceberg, a small manifestation of a broader architectural decision that I need to make and then live with. Though I accept that I may need to go with more than one option...
– David Rogers
Jan 17 at 23:37
Well we as a peer group can tell you your options, and tell you which ones we think are best, but we can't crowd-source your decision for you. As for the testing mentioned in your update, did you look at any execution plans? Any query that takes 15 or 20 seconds is already 100 times worse than it probably should be. Why is it taking so long? What are the wait types? Is it being blocked? Are you returning 40 billion rows to an SSMS grid over a slow network connection? If you read the kitchen sink article I posted, dynamic SQL can help for presence/absence of parameters, and option (recompile)
– Aaron Bertrand♦
Jan 18 at 12:16
Well we as a peer group can tell you your options, and tell you which ones we think are best, but we can't crowd-source your decision for you. As for the testing mentioned in your update, did you look at any execution plans? Any query that takes 15 or 20 seconds is already 100 times worse than it probably should be. Why is it taking so long? What are the wait types? Is it being blocked? Are you returning 40 billion rows to an SSMS grid over a slow network connection? If you read the kitchen sink article I posted, dynamic SQL can help for presence/absence of parameters, and option (recompile)
– Aaron Bertrand♦
Jan 18 at 12:16
...can help for data skew. You can also combine them. But given a 15 second query, using dynamic SQL or not is unlikely to fix that, there is something else bigger going on here.
– Aaron Bertrand♦
Jan 18 at 12:20
...can help for data skew. You can also combine them. But given a 15 second query, using dynamic SQL or not is unlikely to fix that, there is something else bigger going on here.
– Aaron Bertrand♦
Jan 18 at 12:20
add a comment |
IMO,Biggest mistake is the place where you are using CROSS JOIN
.
First self join [dbo].[Table1]
and filter the resultset then with this resultset use CROSS JOIN
.It will give tremendous boost .
SELECT pfd.*
FROM (
SELECT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
,PFDE1.[Dimension3]
,PFDE1.[Dimension4]
,PFDE1.[Dimension5]
,PFDE1.[Dimension6]
,PFDE1.[Dimension7]
,PFDE1.[Dimension8]
,PFDE1.[Dimension9]
,PFDE1.[Dimension10]
,p.FiscalPeriod AS Period
,0 AS Amt
,0 AS EndingBal
,PFDE1.Column5
FROM (
SELECT DISTINCT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
,PFDE1.[Dimension3]
,PFDE1.[Dimension4]
,PFDE1.[Dimension5]
,PFDE1.[Dimension6]
,PFDE1.[Dimension7]
,PFDE1.[Dimension8]
,PFDE1.[Dimension9]
,PFDE1.[Dimension10]
,PFDE1.Column5
FROM [dbo].[Table1] PFDE1
LEFT JOIN [dbo].[Table1] pfde ON PFDE.Column1 = PFDE1.Column1
AND PFDE.Column2 = PFDE1.Column2
AND PFDE.Column3 = PFDE1.Column3
AND PFDE.Column4 = PFDE1.Column4
AND PFDE.Period = PFDE1.Period
AND PFDE.Column5 = PFDE1.Column5
AND (@Dimension1Enabled = 0 OR PFDE.Dimension1 = PFDE1.Dimension1)
AND (@Dimension2Enabled = 0 OR PFDE.Dimension2 = PFDE1.Dimension2)
AND (@Dimension3Enabled = 0 OR PFDE.Dimension3 = PFDE1.Dimension3)
AND (@Dimension4Enabled = 0 OR PFDE.Dimension4 = PFDE1.Dimension4)
AND (@Dimension5Enabled = 0 OR PFDE.Dimension5 = PFDE1.Dimension5)
AND (@Dimension6Enabled = 0 OR PFDE.Dimension6 = PFDE1.Dimension6)
AND (@Dimension7Enabled = 0 OR PFDE.Dimension7 = PFDE1.Dimension7)
AND (@Dimension8Enabled = 0 OR PFDE.Dimension8 = PFDE1.Dimension8)
AND (@Dimension9Enabled = 0 OR PFDE.Dimension9 = PFDE1.Dimension9)
AND (@Dimension10Enabled = 0 OR PFDE.Dimension10 = PFDE1.Dimension10)
WHERE pfde.Column1 IS NULL
) PFDE1
CROSS JOIN [dbo].[Table2] p
) pfd
Hope my point is clear,it is untested.
No doubts Dynamic Sql will further improve it, EXEC sys.sp_executesql @sql
I'd also drop the DISTINCT - if it's really necessary because of
duplicates then there is an underlying problem to be fixed as well.
add a comment |
IMO,Biggest mistake is the place where you are using CROSS JOIN
.
First self join [dbo].[Table1]
and filter the resultset then with this resultset use CROSS JOIN
.It will give tremendous boost .
SELECT pfd.*
FROM (
SELECT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
,PFDE1.[Dimension3]
,PFDE1.[Dimension4]
,PFDE1.[Dimension5]
,PFDE1.[Dimension6]
,PFDE1.[Dimension7]
,PFDE1.[Dimension8]
,PFDE1.[Dimension9]
,PFDE1.[Dimension10]
,p.FiscalPeriod AS Period
,0 AS Amt
,0 AS EndingBal
,PFDE1.Column5
FROM (
SELECT DISTINCT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
,PFDE1.[Dimension3]
,PFDE1.[Dimension4]
,PFDE1.[Dimension5]
,PFDE1.[Dimension6]
,PFDE1.[Dimension7]
,PFDE1.[Dimension8]
,PFDE1.[Dimension9]
,PFDE1.[Dimension10]
,PFDE1.Column5
FROM [dbo].[Table1] PFDE1
LEFT JOIN [dbo].[Table1] pfde ON PFDE.Column1 = PFDE1.Column1
AND PFDE.Column2 = PFDE1.Column2
AND PFDE.Column3 = PFDE1.Column3
AND PFDE.Column4 = PFDE1.Column4
AND PFDE.Period = PFDE1.Period
AND PFDE.Column5 = PFDE1.Column5
AND (@Dimension1Enabled = 0 OR PFDE.Dimension1 = PFDE1.Dimension1)
AND (@Dimension2Enabled = 0 OR PFDE.Dimension2 = PFDE1.Dimension2)
AND (@Dimension3Enabled = 0 OR PFDE.Dimension3 = PFDE1.Dimension3)
AND (@Dimension4Enabled = 0 OR PFDE.Dimension4 = PFDE1.Dimension4)
AND (@Dimension5Enabled = 0 OR PFDE.Dimension5 = PFDE1.Dimension5)
AND (@Dimension6Enabled = 0 OR PFDE.Dimension6 = PFDE1.Dimension6)
AND (@Dimension7Enabled = 0 OR PFDE.Dimension7 = PFDE1.Dimension7)
AND (@Dimension8Enabled = 0 OR PFDE.Dimension8 = PFDE1.Dimension8)
AND (@Dimension9Enabled = 0 OR PFDE.Dimension9 = PFDE1.Dimension9)
AND (@Dimension10Enabled = 0 OR PFDE.Dimension10 = PFDE1.Dimension10)
WHERE pfde.Column1 IS NULL
) PFDE1
CROSS JOIN [dbo].[Table2] p
) pfd
Hope my point is clear,it is untested.
No doubts Dynamic Sql will further improve it, EXEC sys.sp_executesql @sql
I'd also drop the DISTINCT - if it's really necessary because of
duplicates then there is an underlying problem to be fixed as well.
add a comment |
IMO,Biggest mistake is the place where you are using CROSS JOIN
.
First self join [dbo].[Table1]
and filter the resultset then with this resultset use CROSS JOIN
.It will give tremendous boost .
SELECT pfd.*
FROM (
SELECT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
,PFDE1.[Dimension3]
,PFDE1.[Dimension4]
,PFDE1.[Dimension5]
,PFDE1.[Dimension6]
,PFDE1.[Dimension7]
,PFDE1.[Dimension8]
,PFDE1.[Dimension9]
,PFDE1.[Dimension10]
,p.FiscalPeriod AS Period
,0 AS Amt
,0 AS EndingBal
,PFDE1.Column5
FROM (
SELECT DISTINCT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
,PFDE1.[Dimension3]
,PFDE1.[Dimension4]
,PFDE1.[Dimension5]
,PFDE1.[Dimension6]
,PFDE1.[Dimension7]
,PFDE1.[Dimension8]
,PFDE1.[Dimension9]
,PFDE1.[Dimension10]
,PFDE1.Column5
FROM [dbo].[Table1] PFDE1
LEFT JOIN [dbo].[Table1] pfde ON PFDE.Column1 = PFDE1.Column1
AND PFDE.Column2 = PFDE1.Column2
AND PFDE.Column3 = PFDE1.Column3
AND PFDE.Column4 = PFDE1.Column4
AND PFDE.Period = PFDE1.Period
AND PFDE.Column5 = PFDE1.Column5
AND (@Dimension1Enabled = 0 OR PFDE.Dimension1 = PFDE1.Dimension1)
AND (@Dimension2Enabled = 0 OR PFDE.Dimension2 = PFDE1.Dimension2)
AND (@Dimension3Enabled = 0 OR PFDE.Dimension3 = PFDE1.Dimension3)
AND (@Dimension4Enabled = 0 OR PFDE.Dimension4 = PFDE1.Dimension4)
AND (@Dimension5Enabled = 0 OR PFDE.Dimension5 = PFDE1.Dimension5)
AND (@Dimension6Enabled = 0 OR PFDE.Dimension6 = PFDE1.Dimension6)
AND (@Dimension7Enabled = 0 OR PFDE.Dimension7 = PFDE1.Dimension7)
AND (@Dimension8Enabled = 0 OR PFDE.Dimension8 = PFDE1.Dimension8)
AND (@Dimension9Enabled = 0 OR PFDE.Dimension9 = PFDE1.Dimension9)
AND (@Dimension10Enabled = 0 OR PFDE.Dimension10 = PFDE1.Dimension10)
WHERE pfde.Column1 IS NULL
) PFDE1
CROSS JOIN [dbo].[Table2] p
) pfd
Hope my point is clear,it is untested.
No doubts Dynamic Sql will further improve it, EXEC sys.sp_executesql @sql
I'd also drop the DISTINCT - if it's really necessary because of
duplicates then there is an underlying problem to be fixed as well.
IMO,Biggest mistake is the place where you are using CROSS JOIN
.
First self join [dbo].[Table1]
and filter the resultset then with this resultset use CROSS JOIN
.It will give tremendous boost .
SELECT pfd.*
FROM (
SELECT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
,PFDE1.[Dimension3]
,PFDE1.[Dimension4]
,PFDE1.[Dimension5]
,PFDE1.[Dimension6]
,PFDE1.[Dimension7]
,PFDE1.[Dimension8]
,PFDE1.[Dimension9]
,PFDE1.[Dimension10]
,p.FiscalPeriod AS Period
,0 AS Amt
,0 AS EndingBal
,PFDE1.Column5
FROM (
SELECT DISTINCT PFDE1.Column1
,PFDE1.Column2
,PFDE1.Column3
,PFDE1.Column4
,PFDE1.Dimension1
,PFDE1.Dimension2
,PFDE1.[Dimension3]
,PFDE1.[Dimension4]
,PFDE1.[Dimension5]
,PFDE1.[Dimension6]
,PFDE1.[Dimension7]
,PFDE1.[Dimension8]
,PFDE1.[Dimension9]
,PFDE1.[Dimension10]
,PFDE1.Column5
FROM [dbo].[Table1] PFDE1
LEFT JOIN [dbo].[Table1] pfde ON PFDE.Column1 = PFDE1.Column1
AND PFDE.Column2 = PFDE1.Column2
AND PFDE.Column3 = PFDE1.Column3
AND PFDE.Column4 = PFDE1.Column4
AND PFDE.Period = PFDE1.Period
AND PFDE.Column5 = PFDE1.Column5
AND (@Dimension1Enabled = 0 OR PFDE.Dimension1 = PFDE1.Dimension1)
AND (@Dimension2Enabled = 0 OR PFDE.Dimension2 = PFDE1.Dimension2)
AND (@Dimension3Enabled = 0 OR PFDE.Dimension3 = PFDE1.Dimension3)
AND (@Dimension4Enabled = 0 OR PFDE.Dimension4 = PFDE1.Dimension4)
AND (@Dimension5Enabled = 0 OR PFDE.Dimension5 = PFDE1.Dimension5)
AND (@Dimension6Enabled = 0 OR PFDE.Dimension6 = PFDE1.Dimension6)
AND (@Dimension7Enabled = 0 OR PFDE.Dimension7 = PFDE1.Dimension7)
AND (@Dimension8Enabled = 0 OR PFDE.Dimension8 = PFDE1.Dimension8)
AND (@Dimension9Enabled = 0 OR PFDE.Dimension9 = PFDE1.Dimension9)
AND (@Dimension10Enabled = 0 OR PFDE.Dimension10 = PFDE1.Dimension10)
WHERE pfde.Column1 IS NULL
) PFDE1
CROSS JOIN [dbo].[Table2] p
) pfd
Hope my point is clear,it is untested.
No doubts Dynamic Sql will further improve it, EXEC sys.sp_executesql @sql
I'd also drop the DISTINCT - if it's really necessary because of
duplicates then there is an underlying problem to be fixed as well.
answered 6 mins ago
KumarHarshKumarHarsh
81358
81358
add a comment |
add a comment |
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f227393%2fpoor-sql-server-performance-with-nested-or-conditions-in-left-outer-join%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown