DBPedias

Your Database Knowledge Community

Joe Casella

  1. Meme Monday: Please, Don’t Ask Stupid Questions

    I’ll never forget my very first day as a philosophy TA. I’d been assigned to”Introduction to Ethics” and Professor Oldenquist was describing his expectations and class mechanics. After explaining that, “no, ethical theory is not just a matter of opinion” and “yes, we can give you an objective grade”, he walked to the edge of the stage, smiled wryly, and said: “There’s one more thing I’d like to clear up: you may hear some of your professors say that ‘there’s no such thing as a stupid question’. They’re wrong. That’s rubbish. There are stupid questions, please don’t ask them in my class.”

    He paused for effect and continued, “A stupid question is one whose sole purpose is to make the speaker look bad or the questioner look good.  Please, don’t ask them in my class”.

    We still got our share of “stupid” questions (embarrassing new TAs is a Big 10 sport, after all) but I think that bit of administrivia was his most important lesson of the quarter.

    Which brings me to “Stupid SQL Questions”, the subject of this month’s MemeMonday creative writing assignment.

    Whenever I’m at a user group or seminar and the speaker says “there’s no such thing as a stupid question.”   I remember Professor Oldenquist.  Yes, there are stupid questions.  In a professional context, a stupid question is one that’s designed to make the speaker look bad, the questioner look good, or one that derails the speaker from the presentation’s topic to “a problem I encountered this week.”

    Yes, there are stupid SQL questions: please, don’t ask them.


  2. Meme Monday: Please, Don’t Ask Stupid Questions

    I’ll never forget my very first day as a philosophy TA. I’d been assigned to”Introduction to Ethics” and Professor Oldenquist was describing his expectations and class mechanics. After explaining that, “no, ethical theory is not just a matter of opinion” and “yes, we can give you an objective grade”, he walked to the edge of the stage, smiled wryly, and said: “There’s one more thing I’d like to clear up: you may hear some of your professors say that ‘there’s no such thing as a stupid question’. They’re wrong. That’s rubbish. There are stupid questions, please don’t ask them in my class.”

    He paused for effect and continued, “A stupid question is one whose sole purpose is to make the speaker look bad or the questioner look good.  Please, don’t ask them in my class”.

    We still got our share of “stupid” questions (embarrassing new TAs is a Big 10 sport, after all) but I think that bit of administrivia was his most important lesson of the quarter.

    Which brings me to “Stupid SQL Questions”, the subject of this month’s MemeMonday creative writing assignment.

    Whenever I’m at a user group or seminar and the speaker says “there’s no such thing as a stupid question.”   I remember Professor Oldenquist.  Yes, there are stupid questions.  In a professional context, a stupid question is one that’s designed to make the speaker look bad, the questioner look good, or one that derails the speaker from the presentation’s topic to “a problem I encountered this week.”

    Yes, there are stupid SQL questions: please, don’t ask them.


  3. Finding an Expression Definition in a Graphic Execution Plan

    Today on Twitter #SqlHelp, someone asked how to find the definition of an expression in a graphic execution plan. I thought I’d pass along my answer in case it helps someone else.

    Consider a very simple query, which includes a scalar operator, such as a SUM():

    1. SELECT ProductID
    2.       , TotalQty = sum(sd.OrderQty)
    3.   FROM AdventureWorks2008R2.Sales.SalesOrderDetail  sd
    4. GROUP BY ProductID

    It produces a simple graphical execution plan like this:

    Expression_01

    Examining the Hash Match operator shows an output named “EXPR1002”. It’s a calculated expression, but what’s its definition?

    Expression_02

    To find a calculated expression’s definition, select the operator and hit F4 to bring up the Properties window.  The definition is in the Defined Properties entry.  Here, it’s the SUM() of the OrderQuantity.

    image

    In a more complex query the output of one operator may be used by another operator, resulting in expressions operating on expressions, so you may need to check more than one step to find the fields involved.


  4. Finding an Expression Definition in a Graphic Execution Plan

    Today on Twitter #SqlHelp, someone asked how to find the definition of an expression in a graphic execution plan. I thought I’d pass along my answer in case it helps someone else.

    Consider a very simple query, which includes a scalar operator, such as a SUM():

    1. SELECT ProductID
    2.       , TotalQty = sum(sd.OrderQty)
    3.   FROM AdventureWorks2008R2.Sales.SalesOrderDetail  sd
    4. GROUP BY ProductID

    It produces a simple graphical execution plan like this:

    Expression_01

    Examining the Hash Match operator shows an output named “EXPR1002”. It’s a calculated expression, but what’s its definition?

    Expression_02

    To find a calculated expression’s definition, select the operator and hit F4 to bring up the Properties window.  The definition is in the Defined Properties entry.  Here, it’s the SUM() of the OrderQuantity.

    image

    In a more complex query the output of one operator may be used by another operator, resulting in expressions operating on expressions, so you may need to check more than one step to find the fields involved.


  5. TSQL-Tuesday # 18: Using a Recursive CTE to Create a Calendar Table

    TsqlLogo_thumb4[4]

    This month’s TSQL-Tuesday topic is so much fun to write about that I’ve decided to create a second post.

    When Recursive CTEs were introduced in SQL Server 2005, many people gave examples of how they can be used to generate a “numbers” table.  (I’ll ignore the debate about whether it’s the best way to generate the table, or if we should even care).   Well, once you have a numbers table it’s easy to create a Calendar table or Date Dimension.

    The script consists of three parts.  The first section is a recursive CTE, using a method attributed to Itzik Ben-Gan, that generates a numbers table.  In the second section, each number in the table is added to your starting date to generate a list of days between the start and end dates.  Each date is also associated with the first date of its fiscal year using a creative approach I learned from this forum post.  This “first date” of a fiscal year will be used to perform the fiscal date calculations in the final step. Finally, the days CTE is used with a series of date functions to populate the calendar table or date dimension.

    1.  
    2.  
    3. DECLARE @StartDate smalldatetime = '01/01/2009' –First Calendar date to include in table
    4. DECLARE @EndDate smalldatetime = '12/31/2011' –Last calendar date to include in the table
    5. DECLARE @FiscalBeginMonth smallint = 10  –Month in which the fiscal year begins e.g. October 1
    6. ;
    7.  
    8.    WITH –This secton generates the number table
    9.         E00(N) AS (SELECT 1 UNION ALL SELECT 1),
    10.         E02(N) AS (SELECT 1 FROM E00 a, E00 b),
    11.         E04(N) AS (SELECT 1 FROM E02 a, E02 b),
    12.         E08(N) AS (SELECT 1 FROM E04 a, E04 b),
    13.         E16(N) AS (SELECT 1 FROM E08 a, E08 b),
    14.         E32(N) AS (SELECT 1 FROM E16 a, E16 b),
    15.    cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32),
    16.     –This CTE generates a list of calendar dates
    17. CalendarBase as (
    18. SELECT
    19.         DateKey = n
    20.       , CalendarDate = DATEADD(day, n - 1, @StartDate )
    21.       , FiscalYearBegin = DATEADD(MONTH, DATEDIFF(MONTH, 0, DATEADD(DAY, n - 1, @StartDate )) - (12 + DATEPART(MONTH, DATEADD(day, n - 1, @StartDate )) - @FiscalBeginMonth) % 12, 0)
    22.    FROM cteTally
    23.   WHERE N <= DATEDIFF(day, @StartDate , @EndDate +1)
    24. )
    25.  
    26. –Finally, use the list of calendar dates to fill the date dimension
    27. SELECT DateKey
    28.     , IsoDate       = CONVERT(char(8), CalendarDate, 112)
    29.     , CalendarDate
    30.     , CalendarYear  = YEAR(CalendarDate)
    31.     , CalendarQuarter =  (DATEPART(QUARTER,CalendarDate) )
    32.     , CalendarMonth = MONTH(CalendarDate)
    33.     , CalendarDay   = DATEPART(DAY, CalendarDate)
    34.     , DayofWk       = DATEPART(Dw, CalendarDate)
    35.     , CalendarWeekOfMonth = DATEDIFF(week, DATEADD(day,1, CalendarDate-DAY(CalendarDate) + 1) -1, CalendarDate) +1
    36.     , WeekofYr      = DATEPART(WEEK, CalendarDate)
    37.     , DayofYr       = DATEPART(DAYOFYEAR, CalendarDate)
    38.     , NameMonth     = DATENAME(Month, CalendarDate)
    39.     , NameDay       = DATENAME (Weekday,CalendarDate )
    40.     , FiscalYear    = CASE WHEN MONTH(CalendarDate)  < @FiscalBeginMonth THEN  YEAR(CalendarDate)
    41.                            ELSE YEAR(CalendarDate) + 1 END
    42.     , FiscalMonth    = DATEDIFF( MONTH, FiscalYearBegin, CalendarDate) + 1
    43.     , FiscalWeek     = DATEDIFF( WEEK, FiscalYearBegin, CalendarDate) + 1
    44.     , FiscalDay      = DATEDIFF( day, FiscalYearBegin, CalendarDate) + 1
    45.   FROM CalendarBase

    Here’s a sample of the output:

    1. DateKey         IsoDate  CalendarDate            CalendarYear CalendarQuarter CalendarMonth CalendarDay DayofWk     CalendarWeekOfMonth WeekofYr    DayofYr     NameMonth            NameDay        FiscalYear  FiscalMonth FiscalWeek  FiscalDay
    2. ————— ——– ———————– ———— ————— ————- ———– ———– ——————- ———– ———– ——————– ————– ———– ———– ———– ———–
    3. 1               20100510 2010-05-10 00:00:00     2010         2               5             10          2           3                   20          130         May                  Monday         2010        8           33          222
    4. 2               20100511 2010-05-11 00:00:00     2010         2               5             11          3           3                   20          131         May                  Tuesday        2010        8           33          223
    5. 3               20100512 2010-05-12 00:00:00     2010         2               5             12          4           3                   20          132         May                  Wednesday      2010        8           33          224
    6. 4               20100513 2010-05-13 00:00:00     2010         2               5             13          5           3                   20          133         May                  Thursday       2010        8           33          225
    7. 5               20100514 2010-05-14 00:00:00     2010         2               5             14          6           3                   20          134         May                  Friday         2010        8           33          226

    Do I get extra credit for writing two posts?


  6. TSQL-Tuesday # 18: Using a Recursive CTE to Create a Calendar Table

    TsqlLogo_thumb4[4]

    This month’s TSQL-Tuesday topic is so much fun to write about that I’ve decided to create a second post.

    When Recursive CTEs were introduced in SQL Server 2005, many people gave examples of how they can be used to generate a “numbers” table.  (I’ll ignore the debate about whether it’s the best way to generate the table, or if we should even care).   Well, once you have a numbers table it’s easy to create a Calendar table or Date Dimension.

    The script consists of three parts.  The first section is a recursive CTE, using a method attributed to Itzik Ben-Gan, that generates a numbers table.  In the second section, each number in the table is added to your starting date to generate a list of days between the start and end dates.  Each date is also associated with the first date of its fiscal year using a creative approach I learned from this forum post.  This “first date” of a fiscal year will be used to perform the fiscal date calculations in the final step. Finally, the days CTE is used with a series of date functions to populate the calendar table or date dimension.

    1.  
    2.  
    3. DECLARE @StartDate smalldatetime = '01/01/2009' –First Calendar date to include in table
    4. DECLARE @EndDate smalldatetime = '12/31/2011' –Last calendar date to include in the table
    5. DECLARE @FiscalBeginMonth smallint = 10  –Month in which the fiscal year begins e.g. October 1
    6. ;
    7.  
    8.    WITH –This secton generates the number table
    9.         E00(N) AS (SELECT 1 UNION ALL SELECT 1),
    10.         E02(N) AS (SELECT 1 FROM E00 a, E00 b),
    11.         E04(N) AS (SELECT 1 FROM E02 a, E02 b),
    12.         E08(N) AS (SELECT 1 FROM E04 a, E04 b),
    13.         E16(N) AS (SELECT 1 FROM E08 a, E08 b),
    14.         E32(N) AS (SELECT 1 FROM E16 a, E16 b),
    15.    cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32),
    16.     –This CTE generates a list of calendar dates
    17. CalendarBase as (
    18. SELECT
    19.         DateKey = n
    20.       , CalendarDate = DATEADD(day, n - 1, @StartDate )
    21.       , FiscalYearBegin = DATEADD(MONTH, DATEDIFF(MONTH, 0, DATEADD(DAY, n - 1, @StartDate )) - (12 + DATEPART(MONTH, DATEADD(day, n - 1, @StartDate )) - @FiscalBeginMonth) % 12, 0)
    22.    FROM cteTally
    23.   WHERE N <= DATEDIFF(day, @StartDate , @EndDate +1)
    24. )
    25.  
    26. –Finally, use the list of calendar dates to fill the date dimension
    27. SELECT DateKey
    28.     , IsoDate       = CONVERT(char(8), CalendarDate, 112)
    29.     , CalendarDate
    30.     , CalendarYear  = YEAR(CalendarDate)
    31.     , CalendarQuarter =  (DATEPART(QUARTER,CalendarDate) )
    32.     , CalendarMonth = MONTH(CalendarDate)
    33.     , CalendarDay   = DATEPART(DAY, CalendarDate)
    34.     , DayofWk       = DATEPART(Dw, CalendarDate)
    35.     , CalendarWeekOfMonth = DATEDIFF(week, DATEADD(day,1, CalendarDate-DAY(CalendarDate) + 1) -1, CalendarDate) +1
    36.     , WeekofYr      = DATEPART(WEEK, CalendarDate)
    37.     , DayofYr       = DATEPART(DAYOFYEAR, CalendarDate)
    38.     , NameMonth     = DATENAME(Month, CalendarDate)
    39.     , NameDay       = DATENAME (Weekday,CalendarDate )
    40.     , FiscalYear    = CASE WHEN MONTH(CalendarDate)  < @FiscalBeginMonth THEN  YEAR(CalendarDate)
    41.                            ELSE YEAR(CalendarDate) + 1 END
    42.     , FiscalMonth    = DATEDIFF( MONTH, FiscalYearBegin, CalendarDate) + 1
    43.     , FiscalWeek     = DATEDIFF( WEEK, FiscalYearBegin, CalendarDate) + 1
    44.     , FiscalDay      = DATEDIFF( day, FiscalYearBegin, CalendarDate) + 1
    45.   FROM CalendarBase

    Here’s a sample of the output:

    1. DateKey         IsoDate  CalendarDate            CalendarYear CalendarQuarter CalendarMonth CalendarDay DayofWk     CalendarWeekOfMonth WeekofYr    DayofYr     NameMonth            NameDay        FiscalYear  FiscalMonth FiscalWeek  FiscalDay
    2. ————— ——– ———————– ———— ————— ————- ———– ———– ——————- ———– ———– ——————– ————– ———– ———– ———– ———–
    3. 1               20100510 2010-05-10 00:00:00     2010         2               5             10          2           3                   20          130         May                  Monday         2010        8           33          222
    4. 2               20100511 2010-05-11 00:00:00     2010         2               5             11          3           3                   20          131         May                  Tuesday        2010        8           33          223
    5. 3               20100512 2010-05-12 00:00:00     2010         2               5             12          4           3                   20          132         May                  Wednesday      2010        8           33          224
    6. 4               20100513 2010-05-13 00:00:00     2010         2               5             13          5           3                   20          133         May                  Thursday       2010        8           33          225
    7. 5               20100514 2010-05-14 00:00:00     2010         2               5             14          6           3                   20          134         May                  Friday         2010        8           33          226

    Do I get extra credit for writing two posts?


  7. T-SQL Tuesday #18-CTEs: What a CTE is Not

    TsqlLogo

    A common beginner’s misconception about Common Table Expressions (CTEs) is that they are a real result set, like those produced by a temporary table or table variable. In fact, the opposite is true: they’re really just a way to simplify and encapsulate your code. For this month’s TSQL-Tuesday, focused on CTEs, I want to illustrate this difference with an example of how a (non-recursive) CTE can be both more and less efficient than a temporary table at accessing data.

    Let’s start with a simple query comparing the sales volume and average price between a given year and the previous year expressed as both a CTE and a derived table:

    1.  
    2. /* CTE Version */
    3. WITH SalesData as
    4. (
    5. SELECT sd.ProductId
    6.      , SalesYr  = YEAR(sh.OrderDate)
    7.      , AvgPrice = Avg(UnitPrice)
    8.   FROM AdventureWorks2008R2.sales.SalesOrderHeader sh
    9.   JOIN AdventureWorks2008R2.sales.SalesOrderDetail sd
    10.     ON sh.SalesOrderID = sd.SalesOrderID
    11. GROUP BY sd.ProductId,YEAR(sh.OrderDate)
    12. )
    13.  
    14. SELECT s1.ProductId
    15.     , s1.SalesYr  as year1
    16.     , s2.SalesYr  as year2
    17.     , s1.AvgPrice as Year1AvgPrice
    18.     , s2.AvgPrice as Year2AvgPrice
    19.     , s2.AvgPrice/s1.AvgPrice as Year2Change
    20.   FROM SalesData s1
    21. INNER JOIN SalesData s2
    22.     ON s1.Productid = s2.ProductId
    23.    AND s1.SalesYr = s2.SalesYr -1
    24.   
    25. /* Derived Table Version */
    26. SELECT s1.ProductId
    27.     , s1.SalesYr  as year1
    28.     , s2.SalesYr  as year2
    29.     , s1.AvgPrice as Year1AvgPrice
    30.     , s2.AvgPrice as Year2AvgPrice
    31.     , s2.AvgPrice/s1.AvgPrice as Year2Change
    32.   FROM (
    33.           SELECT sd.ProductId
    34.              , SalesYr  = YEAR(sh.OrderDate)
    35.              , AvgPrice = Avg(UnitPrice)
    36.           FROM AdventureWorks2008R2.sales.SalesOrderHeader sh
    37.           JOIN AdventureWorks2008R2.sales.SalesOrderDetail sd
    38.             ON sh.SalesOrderID = sd.SalesOrderID
    39.         GROUP BY sd.ProductId,YEAR(sh.OrderDate)
    40.        ) s1
    41.   INNER JOIN
    42.     (
    43.         SELECT sd.ProductId
    44.              , SalesYr  = YEAR(sh.OrderDate)
    45.              , AvgPrice = Avg(UnitPrice)
    46.           FROM AdventureWorks2008R2.sales.SalesOrderHeader sh
    47.           JOIN AdventureWorks2008R2.sales.SalesOrderDetail sd
    48.             ON sh.SalesOrderID = sd.SalesOrderID
    49.         GROUP BY sd.ProductId,YEAR(sh.OrderDate)
    50.     )             s2
    51.     ON s1.Productid = s2.ProductId
    52.   AND s1.SalesYr = s2.SalesYr -1

    Both the CTE and the derived table generate the same execution plan:

    image

    Statistics IO for both queries is also identical:

    1.  
    2. (347 row(s) affected)
    3. Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    4. Table ‘SalesOrderDetail’. Scan count 2, logical reads 2480, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    5. Table ‘SalesOrderHeader’. Scan count 2, logical reads 1372, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    So, from an execution standpoint, the derived table and CTE are essentially the same. Now, consider the same query using a temporary table:

    1.  
    2.   /* Aggregate Data in Temp Table */
    3. SELECT sd.ProductId
    4.      , SalesYr  = YEAR(sh.OrderDate)
    5.      , AvgPrice = Avg(UnitPrice)
    6.   INTO #TmpSales
    7.   FROM AdventureWorks2008R2.sales.SalesOrderHeader sh
    8.   JOIN AdventureWorks2008R2.sales.SalesOrderDetail sd
    9.     ON sh.SalesOrderID = sd.SalesOrderID
    10. GROUP BY sd.ProductId,YEAR(sh.OrderDate)
    11.  
    12. /*Return Sales Info */
    13. SELECT s1.ProductId
    14.     , s1.SalesYr  as year1
    15.     , s2.SalesYr  as year2
    16.     , s1.AvgPrice as Year1AvgPrice
    17.     , s2.AvgPrice as Year2AvgPrice
    18.     , s2.AvgPrice/s1.AvgPrice as Year2Change
    19.   FROM #TmpSales s1
    20. INNER JOIN #TmpSales s2
    21.     ON s1.Productid = s2.ProductId
    22.    AND s1.SalesYr = s2.SalesYr -1

    Along with its query plan:

    image

    And Statistics IO:

    1. Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    2. Table ‘SalesOrderDetail’. Scan count 1, logical reads 1240, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    3. Table ‘SalesOrderHeader’. Scan count 1, logical reads 686, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    4.  
    5. (613 row(s) affected)
    6.  
    7. (347 row(s) affected)
    8. Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    9. Table ‘#TmpSales’. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Notice that in both the CTE and derived table versions, SalesOrderHeader and SalesOrderDetail are each accessed twice and the aggregations are each calculated twice.  The temporary table version accesses each table once, aggregates the results, and then uses the smaller temporary table to produce the final results.  As a result, the CTE version is nearly twice as expensive as the temporary table.   If a CTE were truly a “results set” (as some authors and speakers have presented it), then we should only see SalesOrderHeader  and SalesOrderDetail accessed once, just as with the temporary table, and it should have a similar IO cost.  They’re not.  Conclusion: a CTE is not a temp table or stored results set.

    In this particular case, I’ve structured my queries so that the CTE was a less efficient way to access the data. There are times, however, that the compiler can take advantage of the CTE structure and create a more efficient way to access the data.

    Here are two (oversimplified) queries to illustrate this point:

    1.  
    2.  
    3.   /* CTE Version */
    4. WITH SalesData as
    5. (
    6.     SELECT sd.ProductId
    7.          , SalesYr  = YEAR(sh.OrderDate)
    8.          , AvgPrice = Avg(UnitPrice)
    9.          , AvgOrderQty =AVG(OrderQty)
    10.       FROM AdventureWorks2008R2.sales.SalesOrderHeader sh
    11.       JOIN AdventureWorks2008R2.sales.SalesOrderDetail sd
    12.         ON sh.SalesOrderID = sd.SalesOrderID
    13.     GROUP BY sd.ProductId,YEAR(sh.OrderDate)
    14. )
    15.  
    16. SELECT ProductId
    17.     , AvgPrice
    18.     , AvgOrderQty
    19.   FROM SalesData
    20. WHERE SalesYr = ’2006′;
    21.   /* Aggregate Data in Temp Table */
    22.  
    23. SELECT sd.ProductId
    24.      , SalesYr  = YEAR(sh.OrderDate)
    25.      , AvgPrice = Avg(UnitPrice)
    26.      , AvgOrderQty =AVG(OrderQty)
    27.   INTO #TmpSales
    28.   FROM AdventureWorks2008R2.sales.SalesOrderHeader sh
    29.   JOIN AdventureWorks2008R2.sales.SalesOrderDetail sd
    30.     ON sh.SalesOrderID = sd.SalesOrderID
    31. GROUP BY sd.ProductId,YEAR(sh.OrderDate)
    32.  
    33.  
    34. /*Return Sales Info */
    35.  
    36. SELECT ProductId
    37.     , AvgPrice
    38.     , AvgOrderQty
    39.   FROM #TmpSales
    40. WHERE SalesYr = ’2006′

    Along with their Statistics IO:

    1. ***CTE***
    2. (132 row(s) affected)
    3. Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    4. Table ‘SalesOrderDetail’. Scan count 1, logical reads 285, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    5. Table ‘SalesOrderHeader’. Scan count 1, logical reads 686, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    6.     
    7. ***Temp Table***
    8.   
    9. Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    10. Table ‘SalesOrderDetail’. Scan count 1, logical reads 1240, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    11. Table ‘SalesOrderHeader’. Scan count 1, logical reads 686, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    12.  
    13. (613 row(s) affected)
    14.  
    15. (132 row(s) affected)
    16. Table ‘#TmpSales’. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    The temporary table approach still pays the cost of aggregating all the data first and it has the same IO cost as the first sample query.  With the CTE, however, the compiler is able to take advantage of the WHERE clause when it’s expanding the query.  As a result, it aggregates less data and uses about half the IO that’s involved in creating and reading from the unfiltered temporary table.   The text showplan reveals this application of the WHERE clause:

    1. StmtText
    2. —————————————————————————————————————————————————————————————————————————————————————-
    3.   |–Compute Scalar(DEFINE:([Expr1005]=CASE WHEN [Expr1018]=(0) THEN NULL ELSE [Expr1019]/CONVERT_IMPLICIT(money,[Expr1018],0) END, [Expr1006]=CASE WHEN [Expr1018]=(0) THEN NULL ELSE [Expr1020]/CONVERT_IMPLICIT(int,[Expr1018],0) END))
    4.        |–Hash Match(Aggregate, HASH:([sd].[ProductID]) DEFINE:([Expr1018]=COUNT(*), [Expr1019]=SUM([AdventureWorks2008R2].[Sales].[SalesOrderDetail].[UnitPrice] as [sd].[UnitPrice]), [Expr1020]=SUM([AdventureWorks2008R2].[Sales].[SalesOrderDetail].[OrderQ
    5.             |--Merge Join(Inner Join, MERGE:([sh].[SalesOrderID])=([sd].[SalesOrderID]), RESIDUAL:([AdventureWorks2008R2].[Sales].[SalesOrderDetail].[SalesOrderID] as [sd].[SalesOrderID]=[AdventureWorks2008R2].[Sales].[SalesOrderHeader].[SalesOrderID] as [
    6.                  |--Clustered Index Scan(OBJECT:([AdventureWorks2008R2].[Sales].[SalesOrderHeader].[PK_SalesOrderHeader_SalesOrderID] AS [sh]),  WHERE:(datepart(year,[AdventureWorks2008R2].[Sales].[SalesOrderHeader].[OrderDate] as [sh].[OrderDate])=(2006))
    7.                  |–Clustered Index Scan(OBJECT:([AdventureWorks2008R2].[Sales].[SalesOrderDetail].[PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID] AS [sd]), ORDERED FORWARD)

    So which approach is better, the CTE or temporary table?  As always, it depends on your data and your use.  For a little help deciding, why not check out the rest of the posts in this month’s TSQL-Tuesday?


  8. T-SQL Tuesday #18-CTEs: What a CTE is Not

    TsqlLogo

    A common beginner’s misconception about Common Table Expressions (CTEs) is that they are a real result set, like those produced by a temporary table or table variable. In fact, the opposite is true: they’re really just a way to simplify and encapsulate your code. For this month’s TSQL-Tuesday, focused on CTEs, I want to illustrate this difference with an example of how a (non-recursive) CTE can be both more and less efficient than a temporary table at accessing data.

    Let’s start with a simple query comparing the sales volume and average price between a given year and the previous year expressed as both a CTE and a derived table:

    1.  
    2. /* CTE Version */
    3. WITH SalesData as
    4. (
    5. SELECT sd.ProductId
    6.      , SalesYr  = YEAR(sh.OrderDate)
    7.      , AvgPrice = Avg(UnitPrice)
    8.   FROM AdventureWorks2008R2.sales.SalesOrderHeader sh
    9.   JOIN AdventureWorks2008R2.sales.SalesOrderDetail sd
    10.     ON sh.SalesOrderID = sd.SalesOrderID
    11. GROUP BY sd.ProductId,YEAR(sh.OrderDate)
    12. )
    13.  
    14. SELECT s1.ProductId
    15.     , s1.SalesYr  as year1
    16.     , s2.SalesYr  as year2
    17.     , s1.AvgPrice as Year1AvgPrice
    18.     , s2.AvgPrice as Year2AvgPrice
    19.     , s2.AvgPrice/s1.AvgPrice as Year2Change
    20.   FROM SalesData s1
    21. INNER JOIN SalesData s2
    22.     ON s1.Productid = s2.ProductId
    23.    AND s1.SalesYr = s2.SalesYr -1
    24.   
    25. /* Derived Table Version */
    26. SELECT s1.ProductId
    27.     , s1.SalesYr  as year1
    28.     , s2.SalesYr  as year2
    29.     , s1.AvgPrice as Year1AvgPrice
    30.     , s2.AvgPrice as Year2AvgPrice
    31.     , s2.AvgPrice/s1.AvgPrice as Year2Change
    32.   FROM (
    33.           SELECT sd.ProductId
    34.              , SalesYr  = YEAR(sh.OrderDate)
    35.              , AvgPrice = Avg(UnitPrice)
    36.           FROM AdventureWorks2008R2.sales.SalesOrderHeader sh
    37.           JOIN AdventureWorks2008R2.sales.SalesOrderDetail sd
    38.             ON sh.SalesOrderID = sd.SalesOrderID
    39.         GROUP BY sd.ProductId,YEAR(sh.OrderDate)
    40.        ) s1
    41.   INNER JOIN
    42.     (
    43.         SELECT sd.ProductId
    44.              , SalesYr  = YEAR(sh.OrderDate)
    45.              , AvgPrice = Avg(UnitPrice)
    46.           FROM AdventureWorks2008R2.sales.SalesOrderHeader sh
    47.           JOIN AdventureWorks2008R2.sales.SalesOrderDetail sd
    48.             ON sh.SalesOrderID = sd.SalesOrderID
    49.         GROUP BY sd.ProductId,YEAR(sh.OrderDate)
    50.     )             s2
    51.     ON s1.Productid = s2.ProductId
    52.   AND s1.SalesYr = s2.SalesYr -1

    Both the CTE and the derived table generate the same execution plan:

    image

    Statistics IO for both queries is also identical:

    1.  
    2. (347 row(s) affected)
    3. Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    4. Table ‘SalesOrderDetail’. Scan count 2, logical reads 2480, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    5. Table ‘SalesOrderHeader’. Scan count 2, logical reads 1372, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    So, from an execution standpoint, the derived table and CTE are essentially the same. Now, consider the same query using a temporary table:

    1.  
    2.   /* Aggregate Data in Temp Table */
    3. SELECT sd.ProductId
    4.      , SalesYr  = YEAR(sh.OrderDate)
    5.      , AvgPrice = Avg(UnitPrice)
    6.   INTO #TmpSales
    7.   FROM AdventureWorks2008R2.sales.SalesOrderHeader sh
    8.   JOIN AdventureWorks2008R2.sales.SalesOrderDetail sd
    9.     ON sh.SalesOrderID = sd.SalesOrderID
    10. GROUP BY sd.ProductId,YEAR(sh.OrderDate)
    11.  
    12. /*Return Sales Info */
    13. SELECT s1.ProductId
    14.     , s1.SalesYr  as year1
    15.     , s2.SalesYr  as year2
    16.     , s1.AvgPrice as Year1AvgPrice
    17.     , s2.AvgPrice as Year2AvgPrice
    18.     , s2.AvgPrice/s1.AvgPrice as Year2Change
    19.   FROM #TmpSales s1
    20. INNER JOIN #TmpSales s2
    21.     ON s1.Productid = s2.ProductId
    22.    AND s1.SalesYr = s2.SalesYr -1

    Along with its query plan:

    image

    And Statistics IO:

    1. Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    2. Table ‘SalesOrderDetail’. Scan count 1, logical reads 1240, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    3. Table ‘SalesOrderHeader’. Scan count 1, logical reads 686, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    4.  
    5. (613 row(s) affected)
    6.  
    7. (347 row(s) affected)
    8. Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    9. Table ‘#TmpSales’. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Notice that in both the CTE and derived table versions, SalesOrderHeader and SalesOrderDetail are each accessed twice and the aggregations are each calculated twice.  The temporary table version accesses each table once, aggregates the results, and then uses the smaller temporary table to produce the final results.  As a result, the CTE version is nearly twice as expensive as the temporary table.   If a CTE were truly a “results set” (as some authors and speakers have presented it), then we should only see SalesOrderHeader  and SalesOrderDetail accessed once, just as with the temporary table, and it should have a similar IO cost.  They’re not.  Conclusion: a CTE is not a temp table or stored results set.

    In this particular case, I’ve structured my queries so that the CTE was a less efficient way to access the data. There are times, however, that the compiler can take advantage of the CTE structure and create a more efficient way to access the data.

    Here are two (oversimplified) queries to illustrate this point:

    1.  
    2.  
    3.   /* CTE Version */
    4. WITH SalesData as
    5. (
    6.     SELECT sd.ProductId
    7.          , SalesYr  = YEAR(sh.OrderDate)
    8.          , AvgPrice = Avg(UnitPrice)
    9.          , AvgOrderQty =AVG(OrderQty)
    10.       FROM AdventureWorks2008R2.sales.SalesOrderHeader sh
    11.       JOIN AdventureWorks2008R2.sales.SalesOrderDetail sd
    12.         ON sh.SalesOrderID = sd.SalesOrderID
    13.     GROUP BY sd.ProductId,YEAR(sh.OrderDate)
    14. )
    15.  
    16. SELECT ProductId
    17.     , AvgPrice
    18.     , AvgOrderQty
    19.   FROM SalesData
    20. WHERE SalesYr = ’2006′;
    21.   /* Aggregate Data in Temp Table */
    22.  
    23. SELECT sd.ProductId
    24.      , SalesYr  = YEAR(sh.OrderDate)
    25.      , AvgPrice = Avg(UnitPrice)
    26.      , AvgOrderQty =AVG(OrderQty)
    27.   INTO #TmpSales
    28.   FROM AdventureWorks2008R2.sales.SalesOrderHeader sh
    29.   JOIN AdventureWorks2008R2.sales.SalesOrderDetail sd
    30.     ON sh.SalesOrderID = sd.SalesOrderID
    31. GROUP BY sd.ProductId,YEAR(sh.OrderDate)
    32.  
    33.  
    34. /*Return Sales Info */
    35.  
    36. SELECT ProductId
    37.     , AvgPrice
    38.     , AvgOrderQty
    39.   FROM #TmpSales
    40. WHERE SalesYr = ’2006′

    Along with their Statistics IO:

    1. ***CTE***
    2. (132 row(s) affected)
    3. Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    4. Table ‘SalesOrderDetail’. Scan count 1, logical reads 285, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    5. Table ‘SalesOrderHeader’. Scan count 1, logical reads 686, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    6.     
    7. ***Temp Table***
    8.   
    9. Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    10. Table ‘SalesOrderDetail’. Scan count 1, logical reads 1240, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    11. Table ‘SalesOrderHeader’. Scan count 1, logical reads 686, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    12.  
    13. (613 row(s) affected)
    14.  
    15. (132 row(s) affected)
    16. Table ‘#TmpSales’. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    The temporary table approach still pays the cost of aggregating all the data first and it has the same IO cost as the first sample query.  With the CTE, however, the compiler is able to take advantage of the WHERE clause when it’s expanding the query.  As a result, it aggregates less data and uses about half the IO that’s involved in creating and reading from the unfiltered temporary table.   The text showplan reveals this application of the WHERE clause:

    1. StmtText
    2. —————————————————————————————————————————————————————————————————————————————————————-
    3.   |–Compute Scalar(DEFINE:([Expr1005]=CASE WHEN [Expr1018]=(0) THEN NULL ELSE [Expr1019]/CONVERT_IMPLICIT(money,[Expr1018],0) END, [Expr1006]=CASE WHEN [Expr1018]=(0) THEN NULL ELSE [Expr1020]/CONVERT_IMPLICIT(int,[Expr1018],0) END))
    4.        |–Hash Match(Aggregate, HASH:([sd].[ProductID]) DEFINE:([Expr1018]=COUNT(*), [Expr1019]=SUM([AdventureWorks2008R2].[Sales].[SalesOrderDetail].[UnitPrice] as [sd].[UnitPrice]), [Expr1020]=SUM([AdventureWorks2008R2].[Sales].[SalesOrderDetail].[OrderQ
    5.             |--Merge Join(Inner Join, MERGE:([sh].[SalesOrderID])=([sd].[SalesOrderID]), RESIDUAL:([AdventureWorks2008R2].[Sales].[SalesOrderDetail].[SalesOrderID] as [sd].[SalesOrderID]=[AdventureWorks2008R2].[Sales].[SalesOrderHeader].[SalesOrderID] as [
    6.                  |--Clustered Index Scan(OBJECT:([AdventureWorks2008R2].[Sales].[SalesOrderHeader].[PK_SalesOrderHeader_SalesOrderID] AS [sh]),  WHERE:(datepart(year,[AdventureWorks2008R2].[Sales].[SalesOrderHeader].[OrderDate] as [sh].[OrderDate])=(2006))
    7.                  |–Clustered Index Scan(OBJECT:([AdventureWorks2008R2].[Sales].[SalesOrderDetail].[PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID] AS [sd]), ORDERED FORWARD)

    So which approach is better, the CTE or temporary table?  As always, it depends on your data and your use.  For a little help deciding, why not check out the rest of the posts in this month’s TSQL-Tuesday?


  9. Meme Monday: I’ve Got 99 Problems…

    In this month’s Meme Monday, Tom LaRock challenges us to list nine things that can go wrong with SQL that aren’t a result of disk issues.  Here are mine:

    1. Selecting from a view returns an error because an underlying table was changed.
    2. “String or binary data would be truncated” error on insert.
    3. Parameter sniffing.
    4. High CXPACKET activity because of an incorrect MAXDOP setting.
    5. Unnecessary query compilation/recompilation because code was written as Database..Table instead of explicitly specifying a schema. (e.g. Databse.dbo.Schema).
    6. Deadlocks.
    7. ODBC Timeout in the client.
    8. “Cannot generate SSPI context”.
    9. Using IDENTITY_INSERT  on an identity column creating duplicate values (because the column doesn’t have a unique constraint).
    On a really bad Monday, I can probably find nine problems in ten minutes!

  10. Preventing PsPad from Opening the Last File Used on Startup

    I recently started using PsPad as a general purpose text editor. It’s freeware, portable, and offers a lot of great editing features such as syntax highlighting, bracket matching, and regular expressions.  It also has a macro recorder and can be scripted in VBScript and JScript.

    One thing I didn’t like was that the application defaulted to opening the last open file or project on startup. It’s not obvious how to change the setting because the it isn’t listed as a startup setting. Here’s where to find it:

    On the Settings menu, click Program Settings to open the Program Settings property sheet.

    image

    Select the Files and Dirs settings to open the file preferences:

    image

    Change the Save Work/Desktop setting from “In PsPad opened files only” to “Don’t restore files”.

    image

    Click OK and restart PsPad.  It will now open to an empty workspace.


  1. 1
  2. Next ›
  3. Last »