DBPedias

Your Database Knowledge Community

Lee Everest

  1. What three things brought you here?

    I was notified of a database geek meme going around by some of the well-known SQL gurus (Paul Randal, Tom LaRock, Grant Fritchey, Tim Mitchell, and others) and since I was invited I thought that I would go ahead and participate as well.  This one asks “What three things or events brought you here?”, so here’s my entry.

     

    Microsoft Visual Basic 4.0 training

    I was actually a finance guy in another life after doing some other odds-and-ends after college, nothing to write home about. (Or even acknowledge, really).  I studied computers and got a degree in information systems, but chose for whatever reason not to go that route. COBOL, SAS, and JCL were the tools, and the IBM 360 IBM 3081 were the playgrounds. I wasn’t a great programmer or anything; all of the best ones were coming out of the computer science department, and not from the school of business.  I was decent though, but remember fighting the computer into the late hours of the night, doing assignments and trying to get programs to run. Little did I know that I’d someday end up right back in the field. 

    While working for this finance company, I took notice of a guy by the name of Kevin Hill, you might have heard of the fellow, who was a colleague of mine, a really sharp computer guy, and a good friend. Kevin is a DBA at Terremark, formerly DataReturn, and works for super-MVP Jason Massie.  Wow, hard to believe that Kevin and I have been friends for almost 20 years.  He used to do some really neat database stuff as far back as 1995, while I was into office automation-type programming.  We used these tools to get our jobs done efficiently and had a lot of fun back then.  I happened to almost transfer to another state to work for their home office and take a true IT job, but turned down the offer.  It’s difficult for us Native Texans to leave our home state, isn’t it?

    The next best thing that awaited me was doing more of what I did in Dallas, and in order to do it even better I asked the branch manager if I could go to VB 4.0 training to learn more about programming Visual Basic.  He had seen (and liked) some of the stuff that I did - tweaking of finance models, reporting, various tools - so agreed, wrote the check for 2 g’s, off I was and running.  Geeking with Microsoft Access and Excel, I never thought that doing Visual Basic 4.0 and VBA would lead me to the world of relational databases, and watching Kevin do some really great stuff, I was drawn more to rows and columns more than the procedural world.  I had even built a web site and made web pages back in 1997, but for some reason HTML didn’t feel right.

     

    image 

     

     

     

    Don Bishop and Match.com

    I left a really good job, took a pay cut, and got a chance to go full-time in IT.  A little-known company by the name of the www.oneandonlynetwork.com owned www.oneandonly.com, was a thriving match and dating site based in Dallas back in the mid-90’s that no one really knew about, yet was booming in the early days of the internet.  Run by a super sharp group, Will, Eva, and Nelson Bunker, and others, they hired me and eventually gave me the opportunity to be a DBA; had they not seen any potential, I would probably be some business analyst somewhere hating my job.

    Later while there, a gentleman by the name of Don Bishop, who I refer to as my mentor, was hired on to take over the responsibilities of the database team.  Don had been in computers almost fifty years, had owned his own business, was very IT savvy, and programmed and built systems for years.  Under his close direction and training, I gained an incredible amount of knowledge within a very short period of time both in SQL Server development and administration.  (Too bad a lot of the DBA candidates that I interview today didn’t have a ‘mentor’.  They lack the skills and intangibles that take years to obtain because they don’t have someone to learn from).

    One and Only Network sold the dating site to CitySearch.com, a USA Networks company, and merged it with Match.com.  Today, Match thrives and is still the leading web site for finding that someone special.  Don, in his 70’s, is retired now for the most part, although he worked up to last year and tells me that he would consider a contract position if it were close to his home.  A brilliant IT and database technician, Don could to this day run circles around half of the DBAs in Dallas, and working with him was an inspiration and challenged me to greater heights.  I went on to take c and c++ courses at the community college level, and eventually obtained an advanced degree in the field.

     

     

    Software Architects

    Moving to Software Architects – SARK – a consulting company that did custom application development was a major step in my career. Too bad the company was sold by another firm, because the place was really special.  The thing that the move did for my career was increase greatly my ability to deliver, exposed me to some great developers and programmers, and taught me how to work on a team to deliver much within tight deadlines.  SARK wasn’t a “body shop”, but rather a consulting firm staffed with top-notch individuals that delivered custom applications. Getting to work with these folks in a number of settings brought my skill-set to an even greater level, and for this I am thankful.

     

    Lee Everest

  2. Stored Procedure performance using “sp_” prefix – Myth or fact?

    Geeking with sp_.

     

    WHAT’S IN A NAME?

    We have all heard that prefixing SQL Server stored procedures with “sp_” is a no-no; you have probably read the many reasons not to do this, but the most familiar ones to developers are that 1) Microsoft uses the “sp_” naming convention for their system stored procedures and so this should be avoided, 2) a procedure prefixed with “sp_” might conflict with one of their procedures and thus may never be called, 3) the SQL engine may first look to the master database for an “sp_” stored procedure before looking at the particular user database where the procedure call originated which might cause a delay, 4) these procedures might not upgrade cleanly, and 5) other resolution problems. There are probably more – maybe you have some to add - but these are the ones that I am familiar with.

    THAT’S WHAT SHE SAID

    But, do we really know this to be truth, or do have we simply taken someone’s word on it, treated it as sacred cow, or simply restated it for years it because it sounds good? Interestingly, Microsoft or Books Online specifically, issues only the following in their help section on creating stored procedures:

    We strongly recommend that you not use the prefix sp_ in the procedure name. This prefix is used by SQL Server to designate system stored procedures. For more information, see Creating Stored Procedures (Database Engine).

    In the Naming Stored procedures section of Books Online they add the following to the above:

    We recommend that you do not create any stored procedures using sp_ as a prefix. SQL Server uses the sp_ prefix to designate system stored procedures. The name you choose may conflict with some future system procedure. If your application uses nonschema qualified name references and your own procedure name conflicts with a system procedure name, your application will break because the name binds to the system procedure, not your own.

    So here’s my question…has anyone actually ever tested this to find out for themselves? I certainly haven’t investigated this closely, and I’ve been writing stored procedures for a long time. If you answered yes, your findings didn’t make Google!, at least that I could see anyway. So, I’ve decided to find out if this is really fact or myth.

    TEST SCENARIO TO FIND OUT IF IT REALLY MATTERS

    For my testing cases, I have chosen to the following rules and constants for my scenario to test the performance of an sp_ stored procedure to one named proc_:

    • Use a script from a previous blog to insert records into a table.
    • Wrap it in a loop, place it in a stored procedure.
    • Call three processes from sqlcmd, asynchronously, via a batch file.
    • Use separate procedures, one prefixed sp_ and with proc_.
    • Use DBCC FREEPROCCACHE, DBCC DROPCLEANBUFFERS for each run.  
    • Truncate table for each run.
    • Make sure log/data files have plenty of space beforehand so that no growth will occur during run.
    • Alternate runs in a logical order
    • Use implicit transactions
    • Do not use any TSQL query hints

    Using the above list, I was able to create a very intensive process which, through calculation and perfmon, revealed between 2500 and 5000 transactions per second.  I ran this over a minute and one half period, give or take, to insert several hundred thousand rows into a single table. Below are some of the results, in seconds:

     

    image

    Figure 1. Raw data of sp_ vs proc_

     

    Figure 1 is a listing of the runs included in the test.  These raw numbers have been plucked out of SQL Profiler; I chose not to use perfmon because, while I could have recorded the results to a log file, I reasoned that the number of records inserted into my table is constant, so I could easily find transactions/second by simply dividing them by the time it took for the batch to complete. Doing a few simple averages I come up with the following:

     

    image

    Figure 2. Analysis of Figure 1.

     

    Again, the three entries in my results come from each of the procedures fired three times asynchronously via sqlcmd from a batch file (asynchronously so they would all start at the same time and finish nearly the same time).  The results from Figure 2 indicate that both the average times and average trans/sec are considerably faster using proc_ than sp_.  The time to completion for sp_ as a percentage of proc_ ranges from roughly 14% to 23% slower.  The average run-time in milliseconds across all runs was 134.72 to 114.07 seconds (432000 inserts), or 18% faster, and an average trans/sec of 3830 to 3230, which is roughly 16% more transactions/sec.  The sp_ on average was slower and performed fewer transactions given the time frame that it took to perform the inserts.  Another interesting statistical view in Figure 3.  I have created a conditional tail expectation, or CTE in statistical-speak, on both procedures’ 24 runs.

     

    image

    Figure 3. Conditional Tail Expectation statistical view 

    This particular analysis orders each of the runs and then allows me to find the expectation if I were to run these at any given point in time. (Twenty-four in my sample is somewhat small, but I will go ahead and use here for demonstration purposes).  For 50% of the time that I run each procedure, I am sure that sp_ will run in 105 seconds, and proc_ will run in 97 seconds.  For approximately 80% of the time, I am sure that sp_ will take 123 seconds to run, but proc_ will only take 107 seconds to run, and finally, for nearly 95% of the time that I run either procedure, sp_ will take about 130 seconds, but proc_ will only take 111 seconds; this is nearly 15% faster given the average of all runs preceding.  We can conclude, therefore, that for this test a procedure prefixed with sp_ was clearly slower than a procedure prefixed with proc_ by somewhere in the neighborhood of 15%, on average.

    CONCLUSION

    From our test, we can see that we should not use sp_ for naming stored procedures. Why would anyone want to do this anyway? Remember that if you prefix your stored procedure with sp_ and place it in the master database, you can call it from any database context without having to change the context of the batch or script;  I guess this is handy, but since I have been writing procedures I’ve really never had to do this that I can recall.  Maybe I have used it somewhere, but I don’t remember.

     

    Lee Everest

    -----------------------------

     

    Interesting factoid – usp_ came from the Wrox set of books circa 2001. Ugh.

    happy_new_year

     

    Code: http://www.texastoo.com/images/sp_test.zip

  3. Plagiarism and SQL Server blogs, articles...

    If it's not your work, it's not your work.


    I never paid much attention to the several articles out on the web about plagiarism with respect to SQL Server, blogs, etc., that is, until it happened to me. About a year ago I did a really cool blog on an interesting idea, and a month or so later, there it was in a national publication. Very nice, right?  You probably won’t notice this until a piece of work that you author gets either copied and pasted, or regurgitated by someone else.  Truth be told, in my estimation it’s not flattery.  Spending hours and hours on a concept to put out on a blog only to have it used by someone else, especially on a big national site or outlet, really sucks.  That’s why I always, always put links in the references section on my work where applicable when I get an idea from someone else. And, I challenge anyone to look at something that I did on my blog to tell me that it is either not original work, or work that has a link to give credit where credit is due.  It's the right thing to do.

    The SQL Server community is strong, much stronger than others out there today. The involvement of many is huge, the MVP program is great, and the countless number of sites and blogs helping others in our field is unmatched.  Folks spend far more time than myself either travelling, speaking, writing, etc. for really nothing more than the satisfaction of lending a helping hand, learning, or assisting others to become better.  I appreciate this – after all, I had a mentor teach me the ropes back in the 1990’s who had fifty plus years of experience in computers.  You think I had an edge over the others because of this?  Oh yeah, so much that one year my salary doubled because of his great training.  And now that the older guys such as he have gone on to retirement, it’s this same community who include people like myself and the folks on the many sites (sqlserverpedia.com, sqlservercentral.com, sqlmag.com, others) that collectively share in training, mentoring, and researching for the new guys coming up, hopefully to give them assistance or an edge down the road.  And believe me, they need it.  I have been interviewing people at my current consulting gig, a global billion dollar company, the past month helping them find ETL folks, and 99% of the resumes that I read represent a candidate who does not have the necessary skills or competency to perform the job.

    If you are unsure whether to reference someone or something (even Books Online), do it.  Not only will you lend credibility to your posting or article, but will show the professionalism and courtesy toward others. And that’s what it’s all about really. Professionalism.  Are you a hack, or are you someone that is a trained and educated writer who understands the very basic tenets and concepts of writing and authoring? When I got my master’s degree, this notion was deeply instilled into the student. We had to sign an agreement saying that we would go out of our way to do take the steps necessary to reference other’s work.  This is the difference that demonstrates a master’s level writer and competency to one with the skills of a high-schooler, or less actually. Finally, if others find out later that you did not cite someone else's work, you've just made yourself look like a complete idiot in the eyes of many.

     

    Lee Everest

     

     

     

  4. Table Variables – still a mystery

    Geeking with table variables. Again.

     

     

    Table variables have always been somewhat interesting to work with when developing in SQL Server. I remember when they came out in SQL Server 2000 we were all excited about trying them out, something new from SQL 7 that mainly sparked our interest I suppose. I mostly remember that there were a slew of KB articles, books, magazine articles, etc. each with various opinions and proclamations about them, such as table variables are faster than temp tables (or slower), or that there is less locking, blocking, or recompiles than temp tables, that they’re somehow in memory where temp tables are not, that they could not be indexed (or could be), or that you should use them for fewer rows while keeping # tables for the larger data sets…the list goes on and on and on. You remember some of these I know, and most of the time information was in conflict.

    All of this had made them somewhat of a mystery. I began dorking with them this go-round because I wanted to 1) figure out what kind of object that they really were, with an idea that they were some sort of a temporary table, and 2) attempt to disprove an earlier fact or “myth” that you could not build statistics on them, knowing that you could in fact index them, either with primary key or unique constraint. (Many think that just because there are no statistics on them that an index is useless without them by the way, which is positively untrue, has been documented widely, and probably worth posting on in the future).

    So, let’s figure out if we can answer these two questions. Run the following script, and do it in CTL + D (Grid) mode since that’s how I’ll be referencing the output. Essentially the script creates a temporary table, inserts rows, and then creates a table variable and inserts rows. Then, because table variables are run-time objects – meaning that they’re destroyed at the completion of the batch and not when closing a connection as are local temporary tables - we need to capture the metadata about them on the fly. We’ll attempt to put this data in a table called dbo.Stats, and then try to gather other information about them. This will allow us to answer questions one and two, above.

     

     

     

    SET NOCOUNT ON

    USE Test

    IF OBJECT_ID('tempdb..#tab') IS NOT NULL

           DROP TABLE #tab

          

    CREATE TABLE #tab (id char(200))

    INSERT #tab VALUES ('A')

    INSERT #tab VALUES ('A')

    INSERT #tab VALUES ('A')

     

    GO

     

    DECLARE @tbl TABLE (TableVarID bigint IDENTITY  , TableVarString char(467))

     

    DECLARE @id INT

    SET @id=1

     

    INSERT INTO @tbl (TableVarString) VALUES ('A')

    INSERT INTO @tbl (TableVarString) VALUES ('B')

    INSERT INTO @tbl (TableVarString) VALUES ('C')

    INSERT INTO @tbl (TableVarString) VALUES ('D')

    INSERT INTO @tbl (TableVarString) VALUES ('E')

     

    --'Creating Stats table'

    IF OBJECT_ID ('Stats') IS NOT NULL

           DROP TABLE Stats

     

    CREATE TABLE Stats

                  (ID INT IDENTITY (1,1), TBL SYSNAME, OB AS OBJECT_ID(TBL))

     

    USE tempdb

    INSERT test..Stats (tbl)

    SELECT TABLE_NAME

    FROM INFORMATION_SCHEMA.TABLES

    WHERE TABLE_NAME LIKE '%#%'

     

    USE Test

    SELECT * FROM Stats

     

    DECLARE @stat varchar (8000) , @ObjIdInQuestion int

    SELECT @id = MIN(id) FROM Stats

    SELECT @stat = tbl FROM Stats WHERE id=@id

     

    DECLARE @str varchar (max)

           ,@help varchar (999)

     

    WHILE (@id IS NOT NULL )

    BEGIN

           SELECT OBJECT_NAME(OBJECT_ID) as ObjName, @stat as TableName,

                  * from tempdb.sys.indexes  WHERE OBJECT_ID =

                               (select id from tempdb.sys.sysobjects where name = @stat)

          

           SELECT @ObjIdInQuestion = OBJECT_ID

                  from tempdb.sys.indexes  WHERE OBJECT_ID =

                               (SELECT id FROM tempdb.sys.sysobjects WHERE name = @stat)

                 

           DECLARE @str1 varchar (max)

           BEGIN TRY

                  SET @str1 = 'create statistics xxx'+ left(@stat, 10)+' on ' + @stat + '(id)'

                  PRINT 'Creating stats for ' + @stat

                  EXEC (@str1)

           END TRY

          

           BEGIN CATCH

                  PRINT '>>> Error occured creating stats:!'

                  SELECT ERROR_NUMBER() as Error_Messg

           END CATCH

          

           SET @str = 'UPDATE STATISTICS ['+ @stat + ']'

           BEGIN TRY

                  USE tempdb

                  EXEC (@str)         

                  IF EXISTS (SELECT * FROM sys.stats WHERE name = 'xxx'+@stat)

                  BEGIN

                           PRINT '>>> Successful update statistics!'

                  END

           --USE Test

           END TRY

          

           BEGIN CATCH

                  SELECT @str + ': not found'      

                  SELECT ERROR_NUMBER() as ErrorNum

                               ,ERROR_MESSAGE() as ErrorMessage

                               ,ERROR_PROCEDURE() as ErrorProcedure

                               ,ERROR_STATE()       as ErrorState

                               ,ERROR_SEVERITY() as ErrorSeverity

                               ,ERROR_LINE() as ErrorLine

           END CATCH

          

           SELECT 'SYSINDEXES',*

           FROM tempdb.sys.sysindexes where id=@ObjIdInQuestion

                 

           SELECT 'SYSOBJECTS',*

           FROM tempdb.sys.sysobjects where id=@ObjIdInQuestion

                        

           SELECT @id = MIN(id) FROM test.dbo.Stats WHERE id > @id

           SELECT @stat = tbl FROM test.dbo.Stats WHERE id=@id

    END

     

    SELECT *

    FROM INFORMATION_SCHEMA.COLUMNS

     

    --EXEC sp_msforeachtable 'USE tempdb update statistics ?' 

    GO

     

     

    image

    Figure 1.  Table variables are in fact ‘temp tables’

     

    image

    Figure 2.  Our temporary table in sys.sysindexes, sys.sysobjects

     

     

    image

    Figure 3.  Our table variable and object id, also a heap

     

    image

    Figure 4. Error attempting to create statistics, and error ‘not found’ when updating statistics.

     

    image

    Figure 5.  Success  creating statistics on temp table, and error when creating on table variable (message tab in SSMS)

     

    image

    Figure 6.  Our table variable as viewed in INFORMATION_SCHEMA.COLUMNS

     

     

    From Figures 1-6, we can see that table variables are in fact ‘temp’ tables; they appear to be the same when viewing in system tables and views, and display all of the similar information that a table or temporary table might. You can see that I create and then update statistics on our temp table in the loop, and then the second pass in the loop, it fails to when attempting same on the table variable. Also interesting to note, while table variables have object_ids, they don’t get created in the same manner. Crack open SQL Profiler and verify – you will not see an object created for table variables during the execution of this script. I found this odd that Microsoft doesn’t show this.

    Note that in Figure 3 we clearly see that our table exists, but when performing create statistics, the table is not found, as shown in figure 4.  For fun, remove the comment on the exec sp_msforeachtable; you'll find that the script dies because the table does not exist. Again, we know that it exists from figures 1 and 6 because at runtime we interrogate system views and discover its existence.  This is very clear in the INFORMATION_SCHEMA.COLUMNS view because we find the column names and data types that we created.

    In conclusion, it is easy to see from the screenshots that table variables, while obviously not exactly the same as temporary tables, are represented in SQL Server as a temporary table, and have all of the same metadata in system tables as their counterparts, including naming type, object_ids, type, minlen, dpages, etc.  From this test, we can also see that you cannot create any statistics on one of these tables to go along with constraints and indexes that we can place on them. Go ahead and add PRIMARY KEY CLUSTERED after TableVarID bigint IDENTITY  in the script. Also, they do in fact reside in tempdb, so maybe we answered three questions here?  Finally, I found nothing that differentiates them from temporary tables inside of any of the sys tables. 

    While nothing terribly exciting was uncovered here (such as my previous geeking with them when I realized that INSERT EXEC is now possible, beginning with SQL 2005), we at least clarified a few things…or maybe we didn’t.

     

    Lee

     

    ----------------

    Is it time for Thanksgiving Dinner already?  I'm starving.

    smilie_thanks_017

     

     

     

  5. Practical Bit mask in SQL Server

    Effective, simple, and fun.

     

    I always enjoy doing some ‘practical bitmasking’ whenever I can, if for no other reason than to show someone that there are some cool situations to use the technique. If you’ve happened to catch some of my other blogs or presentations on bitmasking, you know that it’s very slick and a great way to solve a wide array of problems.  Would you find a place to use this in every day coding? Of course not. Every month?  I doubt that very seriously as well. However, when you do find a neat place to use it, you can show off your computer science savvy to your friends and family and appreciate the greatness of masking.

     

    I just ran into a 1000 + line trigger that interrogated, for each country_id, 18 or 20 attributes, checking for the existence of a flag in each attribute. If any of the attributes – at least one – was turned on, then the extract_flag should be set to true. This flag indicated that, yes, let’s go ahead and download the 200 MB file from the AS/400.  A worthwhile endeavor since if no flag which represented a process was set to true, then the download was skipped because the file wasn’t being used at that time. Not a bad setup for sure, except for the 1000+ line trigger of course; who wants to download a bunch of files that will sit there and not be used?

     

    Let’s set up the data and take a look at the process similar to mine at work:

     

     

    CREATE TABLE Config

           (ID int PRIMARY KEY NOT NULL

           ,Extract_Flag bit

           ,A_val bit DEFAULT 0

           ,B_val bit DEFAULT 0

           ,C_val bit DEFAULT 0

           ,D_val bit DEFAULT 0

           ,E_val bit DEFAULT 0

           ,F_val bit DEFAULT 0

           ,G_val bit DEFAULT 0

           ,H_val bit DEFAULT 0

           )

    GO

     

    INSERT INTO Config (ID) VALUES (100)

    INSERT INTO Config (ID) VALUES (200)

    INSERT INTO Config (ID) VALUES (300)

     

    GO

     

    UPDATE Config

    SET B_val = 1

    WHERE ID=100

     

    UPDATE Config

    SET C_val = 1

    WHERE ID=100

     

    UPDATE Config

    SET H_val = 1

    WHERE ID=100

     

    UPDATE Config

    SET A_val = 1

    WHERE ID=200

    SELECT * FROM Config

     

     

     

     

    So, similar situation here – I’ve got some flags for an ID, and if any one of them are set to true, make sure that the Extract_Flag attribute is also set to true. In the case of ID 300, none of the flags are turned on, meaning that each of the processes have been shut off, so the source file download can be skipped. The nasty trigger which I speak so highly of was an FOR UPDATE trigger that checked all of the attribute flags, and if all were zero, then shut the Extract_Flag  off.  The update to true for any of the flags wasn’t the nasty part; rather, it got ugly when any of the flags were set to zero, which meant that all of the other flags needed to be checked to determine if the Extract_Flag needed to be set false.

     

    To test this, you could issue the following statement, one for each ID:

     

    UPDATE Config

    SET Extract_Flag =  A_val|B_val|C_val|E_val|F_val|G_val|H_val

    FROM Config

    WHERE ID=100

     

     

     

     

     

     

     

    Here’s the easy way to do it!  Run the following and see if you don’t like this one better than the above: 

     

     

    IF OBJECT_ID ('Config', 'U') IS NOT NULL

           DROP TABLE Config

    GO

     

    CREATE TABLE Config

           (ID int PRIMARY KEY NOT NULL

           ,Extract_Flag as A_val|B_val|C_val|E_val|F_val|G_val|H_val

           ,A_val bit DEFAULT 0

           ,B_val bit DEFAULT 0

           ,C_val bit DEFAULT 0

           ,D_val bit DEFAULT 0

           ,E_val bit DEFAULT 0

           ,F_val bit DEFAULT 0

           ,G_val bit DEFAULT 0

           ,H_val bit DEFAULT 0

           )

    GO

     

    UPDATE Config

    SET B_val = 1

    WHERE ID=100

      

    UPDATE Config

    SET C_val = 1

    WHERE ID=100

     

    UPDATE Config

    SET H_val = 1

    WHERE ID=100

     

    UPDATE Config

    SET A_val = 1

    WHERE ID=200

     

    GO

    SELECT * FROM Config

     

     

    For more info, check out the links below.

     

    Thanks for reading!

    Lee

     

     

    -----------------------------

    I used a Neti Pot today – thought I would be the first to ever drown from one.

     

     

     

     

    Links:

     

    ·         Introduction to Bitmasking and bit manipulation

    ·         Bitmasking using the SQLCLR

    ·         Sweet search engine similar one formerly used at Match.com (includes c# project and all code/data)

     

     

  6. Dynamic Pivot in TSQL

    Making 'dynamic' even more dynamic

    In this month’s SQL Server Magazine (11/2009) an article was featured on the concept of dynamic pivot.  This was of interest to me because, since I peruse the stats on how people arrive at this site, I have seen several searches from Google! looking for dynamic pivot examples.  Truth be told, I haven’t run into a situation to dynamically pivot anything in a production environment, so I’m not sure what all of the drooling is all about. But, since there is a demand, I thought I’d throw my .02 cents in, and I’ll use SQLMag’s example as my base reference.

    The author did a good job with the write-up - my hat is off to him. However, glancing at the article one can see that concept could be carried further…very little is “dynamic” about his code, so l invite you to look at it and mine and decide for yourself. First, let’s set up the tables and data. I have included both his [1] DDL as well as that from another blog posting elsewhere on this site (to demonstrate that it will work in a dynamic fashion, all things being equal).

     

    DROP TABLE DemoTable

    GO

    CREATE TABLE DemoTable

    (

     AssignmentName VARCHAR(255),

     StudentName VARCHAR(255),

     Grade INT

    )

    GO

    INSERT INTO DemoTable

     Values('Assignment1', 'John Smith', 70)

    INSERT INTO DemoTable

     Values('Assignment1', 'Jane Smith', 80)

    INSERT INTO DemoTable

     Values('Assignment1', 'Paul Smith', 75)

    INSERT INTO DemoTable

     Values('Assignment2', 'John Smith', 50)

    INSERT INTO DemoTable

     Values('Assignment2', 'Paul Smith', 65)

    INSERT INTO DemoTable

     Values('Assignment3', 'Jane Smith', 70)

    GO

     

     

     

     

    DROP TABLE PlantMetrics;

    GO

    CREATE TABLE PlantMetrics

    (PlantID int

    ,TransDt datetime

    ,Metric char (4)

    ,Value nvarchar (10)

    ,CONSTRAINT pk_plmt PRIMARY KEY CLUSTERED (PlantID, TransDt, Metric)

    );

    GO

    INSERT PlantMetrics

    VALUES (1,'7/1/09', 'MHRS', '24')

    ,(1,'7/1/09', 'WTHR', 'Sunny')

    ,(1,'7/1/09', 'TEMP', '88')

    ,(2,'7/1/09', 'TEMP', '94')

    ,(3,'7/2/09', 'DOWN', '7')

    GO

     

     

     

     

    Below is his code [1]:

    DECLARE @SQL as VARCHAR (MAX)

    DECLARE @Columns AS VARCHAR (MAX)

     

    SELECT @Columns=

           COALESCE(@Columns + ',','') + QUOTENAME(AssignmentName)

    FROM

    (

           SELECT DISTINCT AssignmentName

           From DemoTable

    ) AS B

    ORDER BY B.AssignmentName

     

    SET @SQL='

    WITH PivotData AS

    (

           SELECT

             AssignmentName,

             StudentName,

             Grade

           FROM DemoTable

    )

    SELECT

           StudentName,

           ' + @Columns + '

    FROM PivotData

    PIVOT

    (

           SUM(Grade)

           FOR AssignmentName

           IN (' + @Columns + ')

    ) AS PivotResult

    ORDER BY StudentName'

     

    EXEC (@SQL)

     

    A few things to point out: don’t use SUM for the aggregate. MIN or MAX works better, because sometimes the value that is the subject of the pivot is not an integral (integer, numeric, etc.) type.  Also, I see some columns that could be replaced here to make the entire code dynamic.  Finally, here is my code:

     

    DECLARE @SQL as VARCHAR(MAX)

    DECLARE @NewColumnList AS VARCHAR(MAX)

    DECLARE @table table (SourceColumnNames varchar (100))

    DECLARE @TargetTable varchar (100)

    -- Column names

    DECLARE @PivotColumn varchar (100)

    DECLARE @AnchorColumn varchar (100)

    -- Pivot

    DECLARE @PivotValue varchar (100)

    DECLARE @Operation char (3)

     

    SET @TargetTable = 'DemoTable'

    SET @PivotColumn = 'AssignmentName'

    SET @AnchorColumn = 'StudentName'

    SET @PivotValue = 'Grade'

    SET @Operation = 'MAX'

     

    INSERT @table

    EXEC ('SELECT ' + @PivotColumn + ' FROM ' + @TargetTable )

    SELECT @NewColumnList = COALESCE(@NewColumnList + ', ','') + QUOTENAME(SourceColumnNames)

    FROM

    (

           SELECT DISTINCT SourceColumnNames

           FROM @table

    ) AS B

     

     

    SET @SQL = '

    WITH PivotData AS

    (

    SELECT

           ' + @PivotColumn + ',

           ' + @AnchorColumn + ',

           ' + @PivotValue + '

    FROM ' + @TargetTable+ '

    )

    SELECT

           ' + @AnchorColumn + ',

           ' + @NewColumnList + '

    FROM PivotData

    PIVOT

    (

           '+@Operation+'(' + @PivotValue + ')

           FOR ' + @PivotColumn + '

           IN (' + @NewColumnList + ')

    ) AS PivotResult

    ORDER BY ' + @AnchorColumn + ''

     

    EXEC (@SQL)

    GO

     

    You can see that there is nothing terribly difficult or complicated about this.  Change up a few things, add a few variables, and we’re set! Notice my naming convention for the variables used to help in understanding this thing.  I named these for no reason other than they seemed to make sense to me, considering Books Online didn’t attempt to help out much with the components of the syntax. Here’s the skinny on the var names:

    ·          @TargetTable – the name of the dynamic table

    ·          @PivotColumn –the row name that will be used to generate columns in the pivot

    ·          @AnchorColumn – the column that stays put and is unchanged in its’ purpose in life

    ·          @PivotValue –the metric that in the pivot table

    ·          @Operation – probably shouldn’t have even used this, could have hard coded it MAX or MIN because sometimes values can’t be summed or averaged.

    In closing, do we need ‘dynamic’ pivot (or unpivot for that matter)?  Not sure, but I know that I haven’t needed it in production for either of my past two clients. Who knows…maybe someone can use something like this; as always let me know.

    Happy dynamic pivoting!

    Lee

     

    -----------------------------

    If I start walking to Las Vegas tonight, I think I can make SQLServerConnections before its over

     

     

     

     

    [1]  Sellers, M. (2009). Pivoting the Dynamic Way. SQL Server Magazine, November 2009. Retrieved 11/6/2009 from http://www.sqlmag.com/Article/ArticleID/102722/Pivoting_the_Dynamic_Way.html

     

     

     

     

  7. Deleting Data Quickly in SQL Server, Part 4

    Taking care of the transaction log (or not).

     

    Introduction

    It’s great to see that the response from the first three in this series of blogs indicates that they have been help to some folks. I continue getting searches and questions of this simple yet effective technique; there are many SQL Server installations in need of removing old records, and I’m glad I have been of assistance. At some point, an OLTP system will eventually need to be purged – maybe the data is now stored in a data warehouse, or maybe it’s simply past the point of requirements for an online system or online reporting goals.

    Purging old data

    I suggest once again that you go back and read the first three blogs for deleting data quickly to get an idea of what we’re doing here, and why. In a busy online system, simply hitting F5 to delete rows is a big no-no, and the method that I suggest avoids the problems associated with doing such. By now, your system probably has records ten years of age or older. Is this data being used? Are the indexes representing this data (that you rebuild every week) being used? How much smaller would your database backups become by removing these old records? How fast would your queries be by not having to spend the I/O or logical/physical reads traversing this data? Finally, how much disk space is being consumed by data that is not being used all of the time? These are the types of questions that should be answered when deleting old data in your system.

    Deleting data and the transaction log

    The TSQL DELETE statement is a fully logged operation, and when you delete data you will automatically get two unavoidable operations that coincide with the removal of rows; you delete data in SQL Server, and the data will get written to the transaction log. And, if you delete data, you’ll get the statement wrapped by an implicit transaction, unless you specify a BEGIN TRANSACTION with your delete. There is no way around this, regardless of the recovery model that is set for the database. That is, there is no “turning off” this action in any version of SQL Server.

    So, you would think that if you delete millions of rows you would need lots of space to handle the records until a checkpoint has occurred and a SHRINKFILE has been run. Wrong. Using the technique that I present in this series, there is no need to fiddle with the transaction log, and certainly, no need to include some elaborate and unnecessary code within your delete statement logic to back up the log with x number of deletes to y number of files as the operation progresses. Why? Because the statement that I have presented is wrapped within the aforementioned implicit transaction; simply deleting will not. Let’s take a look at what I’m describing here.

    In the following (Figure 1), I delete three million rows with a delete statement. While the phone rings that users are blocked, timeouts are occurring, and the boss wants someone’s head, you’ll see that the transaction log has gotten big – much bigger than when we started. We start off with 256MB but end up with almost 3GB worth of transaction log. Figure 2 is a SQL Profiler shot of what occurred when this statement was issued.

    clip_image002

    Figure 1. Simple delete on 3 million rows

    clip_image004

    Figure 2. SQL Profiler view of Figure 1.

    Notice that there is an implicit transaction that was mentioned that is wrapped around this delete; using this statement the log will grow depending on the number of rows deleted and the size of the dataset that is removed.

    Now, let’s look at my suggested delete statement using in the same light as above:

    clip_image006

    Figure 3. Our delete on 3 million rows

    clip_image008

    Figure 4. SQL Profiler view of Figure 3.

    Notice how each small chunk is wrapped in an implicit transaction – these nice, small chunks to delete serve several purposes, as we have discussed previously, including the one described here, which helps keep the transaction log from growing in some fashion that could cause problems when the delete occurs.

    Next Up

    I’ll have another or two additions in this series, including how to monitor the chunk size to gauge the relative pain or lack thereof that the delete is causing to your online users. Please let me know how this is working for you in your production system.

    Lee

    ------------------------

     

    "Look, when I was a kid, I inhaled frequently. That was the point."  President Obama

    HalloweenSmiley

     

     

  8. Handling Deadlocks in SQL Server 2005/2008

    Are Deadlocks becoming a thing of the past?

     

    Unless I missed it, I didn’t see his example

    There was a blog post [1] awhile back by super-stealth DBA dude Brad McGehee that described the notion of using the new Try/Catch syntax for handling deadlocks. Great idea and good blog posting! The only problem that I found, however, was that he never gave an example that actually showed the benefit of using such. So I thought that I’d go ahead and create a scenario that shows a deadlock, and then use the feature that he mentions to handle one. After I wrote mine, I peeked in Books Online (BOL) to find that they did a somewhat similar example, and per usual, I liked mine better than theirs; easier, simpler, and more straightforward IMHO, so I hope you find same. Let’s look at the code.


    Set up the tables and data

    Run the following in SQL Server Management Studio (SSMS)

     

    DROP TABLE ##test1, ##test2
    go
    CREATE TABLE ##test1 (id int PRIMARY KEY CLUSTERED)
    GO
    CREATE TABLE ##test2 (id int PRIMARY KEY CLUSTERED)
    GO
    INSERT INTO ##test1 VALUES (1)
    INSERT INTO ##test2 VALUES (1)
    GO
    

     

    Instructions next up. Please follow carefully or suffer severe flogging

    Here are the very simple and easy-to-understand instructions for all of the below code examples. Please read them – the order of execution is important, and if you don’t execute them in this order using the particular window as directed, the demos won’t work for you. When doing our own testing, either commit or rollback transactions often, or the code also won’t work.

    • Run SCRIPT 1 in an SSMS window
    • Run SCRIPT 2 in another SSMS window
    • Run SCRIPT 3 back in the window that you ran in step (1)

     

    Lunch is served at 1205 today

    The example, a variation from WardyIT’s blog [2], is an example of a “Hold and Wait” type of deadlock, where one process is waiting on another while that process attempts to gain access to a held process. Run example 1 (using the instructions given) and you should get a 1205 deadlock error. Remember – Script 1-> window 1, script 2 –> window 2, script 3 –> window 1.  Awesome! Great job.

     

    --EXAMPLE 1
    --SCRIPT 1 in window 1
    
    BEGIN TRAN
        UPDATE ##test1 SET id = 1
    
    --SCRIPT 2 in window 2
    BEGIN TRAN
        UPDATE ##test2 SET id = 2
        UPDATE ##test1 SET id = 2
        
    --SCRIPT 3 in window 1
    BEGIN TRAN
        UPDATE ##test2 SET id = 1

     

    Msg 1205, Level 13, State 51, Line 3

    Transaction (Process ID 52) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

     

    TRY/CATCH is catching!

    Thank goodness for TRY/CATCH in TSQL; we now have a facility to handle errors thoroughly. Run Example 2 now in order to see that this logic does not work with the old syntax; I have deliberately commented the code to simple BEGIN/END blocks. You’ll notice that SQL Server punts when it receives an error, and what is happening is that the constructs cannot trap errors and essentially no exception handling is occurring.

     

    --EXAMPLE 2
    --SCRIPT 1
    BEGIN TRAN
        UPDATE ##test1 SET id = 1
    
    --SCRIPT 2
    DECLARE @err int
    
    BEGIN TRAN
    Retry:
        IF (@@TRANCOUNT=0)
            BEGIN TRAN
            
        BEGIN --TRY                
            UPDATE ##test2
            SET id = 2
    
            UPDATE ##test1
            SET id = 2
         END --TRY
         
        BEGIN --CATCH
            SET @err = @@ERROR
         
            IF (@err = 1205)
            BEGIN
                PRINT 'RETRY OCCURRED!'
                ROLLBACK
                SET @err = 0
                GOTO Retry
            END
        END --CATCH    
    COMMIT    
    
    --SCRIPT 3
        
    DECLARE @err int
    
    BEGIN TRAN
        Retry:
            IF (@@TRANCOUNT=0)
                BEGIN TRAN
                
            BEGIN --TRY    
                UPDATE ##test2
                SET id = 1
             END --TRY
             
            BEGIN --CATCH
                SET @err = @@ERROR
             
                IF (@err = 1205)
                BEGIN
                    PRINT 'RETRY OCCURRED!'
                    ROLLBACK
                    SET @err = 0
                    GOTO Retry
                END    
            END --CATCH
    COMMIT 


    Deadlocks – a dying breed

    Run example three now and check out the results. Notice that I use a GOTO. If you don’t like one, don’t use one. For me, though, this made sense. You’ll see that BOL created a cheesy loop for their transactions…think I’ll stay clear of all that, plus mine is shorter anyway. BOL also says that “GOTO statements cannot be used to enter a TRY or CATCH block”, but I noticed that it didn’t say a darn thing about exiting them, which is exactly what we do here.

    --EXAMPLE 3
    --SCRIPT 1
        BEGIN TRAN
        UPDATE ##test1 SET id = 1 
    
    --SCRIPT 2
    DECLARE @err int
    
    BEGIN TRAN
        Retry:
            IF (@@TRANCOUNT=0)
                BEGIN TRAN
                
            BEGIN TRY                
                UPDATE ##test2
                SET id = 2
    
                UPDATE ##test1
                SET id = 2
             END TRY
             
             BEGIN CATCH
                SET @err = ERROR_NUMBER()
             
                IF (@err = 1205)
                BEGIN
                    PRINT 'RETRY OCCURRED!'
                    ROLLBACK
                    SET @err = 0
                    GOTO Retry
                END
            END CATCH    
    COMMIT    
        
    
    --SCRIPT 3    
    DECLARE @err int
    
    BEGIN TRAN
        Retry:
            IF (@@TRANCOUNT=0)
                BEGIN TRAN
                
            BEGIN TRY    
                UPDATE ##test2
                SET id = 1
             END TRY
             
             BEGIN CATCH
                SET @err = ERROR_NUMBER()
             
                IF (@err = 1205)
                BEGIN
                    PRINT 'RETRY OCCURRED!'
                    ROLLBACK
                    SET @err = 0
                    GOTO Retry
                END    
            END CATCH
    COMMIT    
    
    

    Conclusion. Did you catch all of this, there buddy, or did you even try?

    Deadlocks can be handled a number of different ways. There are four different types of deadlocks, and we looked at and resolved a common “Hold and Wait” type in this blog posting. Here I showed a code snippet that created a deadlock, and then another one that took the same code and solved the deadlock problem using a retry label for a GOTO statement and a TRY/CATCH block to handle the error. In previous versions of SQL Server you couldn’t “trap” the error – SQL Server simply raised an internal error before your code could kick in, and one or more of the threads became a deadlock victim. The beauty of TRY/CATCH is that these constructs serve as a true mechanism for error handling, and allow the SQL dev to create robust code for trapping and handling errors.

    What I did here was a “proof of concept”  in tech speak – I am not at a client who is at this point suffering from deadlocks, so I cannot fully test this code. However, give this concept a shot if you’re dying from the dreaded deadlock; some variation of this might help you out.

    Lee Everest

    ----------------------------

    No, deadlocks are not a popular hairstyle of many NFL and NCAA football players.


    EM Banana Rastafarian

     

     

     

    [1]. McGehee, B. (2005). Using TRY/CATCH to Resolve a Deadlock in SQL Server 2005.  Retrieved on 9/22/2009 from http://www.sql-server-performance.com/articles/per/deadlock_sql_2005_p1.aspx

    [2]. WardyIT (2005). Creating a deadlock. Retrieved on 9/22/09 from http://wardyit.com/blog//blog/archive/2005/12/12/65.aspx 

     

  9. Deleting Data Quickly in SQL Server, Part 3

    Part three of this series of posts on deleting data will cover monitoring the rows that get deleted, and part four goes over server monitoring (locking, waits, etc.). If you haven’t gone over the first two in the series, you might want to now. Make sure and run these scripts in the order that they appear, and for crying out loud please don’t use any of these scripts on any production table until you have tested and tweaked the process.  I don’t want a mailbox full of nasty complaints that you’ve completely removed all of the data from all of the tables in your production backups and that you have requested my home address.  This is a proven technique that I have used in production, but it is only a replica – I most likely left out something important.

     

     

    STEP (1) -   CREATE OUR DATA
    Let’s create our test data by running the script that we previously had.  I have this set to 25 million rows.

     

    SET NOCOUNT ON

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

     

    IF OBJECT_ID ('tempdb..##t1') IS NOT NULL

        DROP TABLE ##t1;

    GO

    CREATE TABLE ##t1 (a int IDENTITY NOT NULL PRIMARY KEY, b char (10))

    go

    SET ROWCOUNT  25000000

    INSERT INTO ##t1 (b)

    SELECT 'a'

    FROM

        (SELECT 1 as Col

        FROM sys.syscolumns A, sys.syscolumns b) as Records

    GO

    SET ROWCOUNT 0

     

     

     

     

    STEP (2)  - BUILD OUR CONTROL TABLE

    This table, dbo.Chunksize, will hold values that will drive our delete script and help out with our reporting and monitoring.  The trigger has been placed on the table allow only one row in the table;  if more than one row were inserted, the calculations for reporting would be jacked.

     

     

    IF OBJECT_ID('Chunksize') IS NOT NULL

           DROP TABLE Chunksize

    go

    CREATE TABLE  Chunksize

            (id int identity (1,1) NOT NULL

            ,Chunk int

           ,Target int

           ,TotalBeginningRowcount int

           ) 

    GO

    CREATE TRIGGER  [tr_OneRowOnChunksize]

    ON Chunksize

    FOR INSERT

    AS

    BEGIN

           DECLARE @Rows int

     

           SELECT @Rows = COUNT(1)

           FROM Chunksize

     

           IF (@Rows != 1)

           BEGIN

                  RAISERROR ('Error!!! One row allowed to avoid invalid results!', 15, 2)

                  ROLLBACK

                  RETURN

           END

    END

    GO

       

     

    -- Initial insert of Chunksize
    INSERT Chunksize (Chunk, Target, TotalBeginningRowcount)

    SELECT   1 as InitialChunk -- Can change this later as script runs

            , 7000000 as Target -- Total # of rows we want to delete.

            , (SELECT COUNT (1) FROM ##t1)  -- Beginning rowcount in table.

    GO

     

    SELECT * From Chunksize

     

    GO

     

     

    This is the initial seed to the table; a value of 1 for the initial chunk of data to be deleted in each iteration of the loop, a value of our target number of rows that we wish to delete, and a beginning count of rows in the table for later reporting calculations. So, let’s say that you had a table of 150 million rows, and you wanted to delete 30 million rows, you would put the target to 30 million, and set the initial chunk to whatever you like (I usually start with 1, and then gradually move this up.  If you start too big, you could begin blocking users right off that bat…something to avoid).

     

     

    STEP (3)   -  Build the reporting table and Seed the Results table

     

    Run these two scripts together to seed the reporting table.


    IF OBJECT_ID ('Results') IS NOT NULL

           DROP TABLE Results

    go

    CREATE TABLE Results (id int identity (1,1) NOT NULL

                         ,Chunk int

                         ,TimeRecord datetime

                         ,Rows decimal (10,2)

                         )

    GO

     

    INSERT INTO Results

    SELECT 1 as Chunk, GETDATE(), (SELECT COUNT(1) FROM ##t1)

     

    GO

      

     


    STEP (4) – Start the delete statement   

     

    Do this step in a separate window.

     

    DECLARE @Rows int

            ,@Chunksize int

                  ,@Target int

     

    SELECT @Rows=0

    SELECT @Chunksize = Chunk FROM Chunksize

    SELECT @Target = Target FROM Chunksize

     

    There:

    DELETE TOP (@Chunksize) FROM ##t1

    SELECT @Rows = @Rows + @@ROWCOUNT

    IF (@Rows < @Target)

    BEGIN

        SELECT @Chunksize = Chunk FROM Chunksize WHERE id=1

           GOTO There

    END

    GO 

     

     

     

    STEP (5) -  Report Monitoring and adjusting the Chunk size

     

    Do this step in a separate window. 

     

    DECLARE @Chunk int

    SET @Chunk=1

     

    IF @Chunk!= (SELECT Chunk FROM Chunksize)

           UPDATE Chunksize SET Chunk=@Chunk

     

    -- DELETE Results

    IF (DATEDIFF (ss,  (SELECT MAX(TimeRecord) FROM Results), GETDATE())> 3)

           or ((SELECT COUNT(1) FROM Results)  =0)

    BEGIN

           INSERT INTO Results

           SELECT @Chunk, GETDATE(), (SELECT COUNT(1) FROM ##t1)

           PRINT 'Sucess! Record added to dbo.Result table'

    END

    ELSE

           PRINT 'Error!  Attempting to poll Chunksize too often'

     

     

    WITH CTE (Number_Of_Seconds, Rows_Deleted, Rows_Deleted_Per_Second

                  ,Chunk_Size, Target, TotalBeginningRowcount, CountInTable)

    AS (

           SELECT DATEDIFF(ss, b.TimeRecord, a.TimeRecord) as Number_Of_Seconds

                  , b.Rows-a.Rows as Rows_Deleted

                  ,(b.Rows-a.Rows )/DATEDIFF(ss, b.TimeRecord, a.TimeRecord)

    AS Rows_Deleted_Per_Second

                  , b.Chunk as Chunk_Size

                  ,(SELECT Target from Chunksize) as Target

                  ,(SELECT TotalBeginningRowcount FROM Chunksize)

    AS TotalBeginningRowcount

                  ,(SELECT COUNT(1) FROM ##t1)  as CountInTable

           FROM Results b

           INNER JOIN Results a

                  ON b.id = a.id-1

           )

    SELECT Chunk_Size

           , AVG(Rows_Deleted_Per_Second) AS Average_Rows_Deleted_Sec

           , AVG(Rows_Deleted_Per_Second)*60 as Average_Rows_Minute

           , 1000000/AVG(NULLIF(Rows_Deleted_Per_Second, 0)) 

    AS [Delete:Seconds_Per_Million]

           , 1000000/AVG(NULLIF(Rows_Deleted_Per_Second,0))/60

    AS [Delete:Minutes_Per_Million]  

           , Target - (TotalBeginningRowcount - CountInTable)

    AS RemainingRowsFromTarget

           , CAST((Target - (TotalBeginningRowcount - CountInTable) )

                  / (AVG(Rows_Deleted_Per_Second)*60 ) AS int) AS MinutesTillDone

    FROM CTE

    GROUP BY Target, Chunk_Size, CountInTable, TotalBeginningRowcount, CountInTable

     

     

    As our script runs and deletes rows, you will run this script to insert our status into the Results table and modify our chunk size if desired.  The attributes for the Results table are:


                Chunk – the chunk size that has been set, initially to 1, that will be the parameter for TOP

                TimeRecord – this is  used in calculations to follow

                Rows – the number of rows that are in the table, again for more calculations

     

    The CTE reports on how the delete is doing, at what rate the deletes are doing grouped by chunks size, and the estimated number of rows being deleted, as well as the remaining time left in the “MinutesTillDone” column.  After a few seconds, I run this and receive the following:

      

     

     

     

     

    We can see (barely) that for a chunk size of 1, the average rows deleted/sec is around 667, with the other calculations based on this valued. MinutesTillDone is based on the delete size, target, and how many rows are in the table as can be seen in the CTE.  Let’s set the chunk in script # 5 to 5, and view the results:

      

     

     

     

    Let’s up it to 10, 100, 1000, and finally, 5000.

      

     

     

     

     

    Notice that with an increase in chunk size comes a big increase in Average Rows Deleted/ Sec and Minute, and a big decrease in MinutesTillDone, all as we would expect. In part four, we will look to see how we can know what to set this chunk size to in order to minimize the blocking of user processes.

     

    Lee

     

    ------------------------------

     

     Is "America's Next top Model" on tonight???

     

    Smileys

  10. Deleting Data Quickly in SQL Server, Part 1. Their fast-ordered delete vs. mine

    Sharing some old tricks for deleting rows in a table

    I was looking around the SQL Blogs again this evening and stumbled on SQLCat and a blog entitled Fast ordered delete [1] from last May.  I won’t paste their code – get it at the link - but apparently the MySpace boyz were in Redmond, and had a delete problem;  they were faced with deleting large amounts of older data and were having an issue – apparently it took too long on large tables.  This is a problem that I faced a long, long time ago, and did a lot of experimentation and work on it.  At a local, large retail company located here in the Dallas/Ft. Worth metroplex, they had production tables in the 60-100 million row range that needed data removed upon archival.  (Actually, the problem in this exercise is not the deletes themselves, but really is the blocking that the deletions can cause.  I’m very surprised that this dude Kevin Stephenson didn’t bring that up as a concern in his shop.  Chopping large amounts of data out of a table is a no-no for the most part, while taking nice chunks is best.  Maybe this is the reason that they choose the 10000 value. My part 2 of this blog series will show you how to maximize what I call the “chunk size” while avoiding the blocking).

    SQLCat’s solution was a fairly simple one, and that was to use a view with a TOP clause to delete a chunk of data.  Not bad IMHO. Their reasoning for doing this can be seen in the below image.  Using a delete via a derived table that represented 10,000 rows, the got this execution plan [2]:

    The results were the following:

     

    Table 't1'. Scan count 2, logical reads 30564, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 1, logical reads 20152, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
       CPU time = 78 ms,  elapsed time = 80 ms.

     

    Changing to a view with a TOP, the significantly reduced their time and read, since they didn’t have to hit the table twice (see above plan).  That plan look like so [3]

     

     

    The results were the following:

    Table 't1'. Scan count 1, logical reads 654, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
       CPU time = 31 ms,  elapsed time = 75 ms.

     

    And now for mine. You may have seen something like this before, maybe not. The first time that I did was about 5 or so years ago and I thought that it was slick.  Yeah, forget the “best practice” garbage about not using goto. This is a script and not a stored procedure, so anything is fair game.

    set rowcount 1000
    declare @cnt int
    set @cnt=0
    label:
    delete from t1
    set @cnt = @cnt + @@rowcount
    if @cnt < 10000
        goto label

     

    I won’t past the statistics because for each iteration, several rows will be printed. Summing them, you will see if you run for yourself that while their logical reads were about 200 less, my time was 63ms…16% faster.  Not only that, but mine takes smaller chunks, and is much easier on the transaction log and blocking on other processes (I removed 1000 at a time and they 10000).  Uh, whassat you say?  Rowcount is deprecated? Very astute, you are, but no worries, because you should remember that DELETE, INSERT, AND UPDATE now can use TOP; simply change the code to this:

     

    declare @cnt int
    set @cnt=0
    label:
    delete top (1000) from t1
    set @cnt = @cnt + @@rowcount
    if @cnt < 10000
        goto label

     

    Running this one we get actually 840 reads and 64ms, so it’s comparable to the old one using ROWCOUNT, and, yes, still faster than theirs. Come back next time for part 2 and I’ll show you how we can 1) maximize the chunk size for deletion and 2) monitor the process while it’s running.

     


    Lee

    -------------------------

     

     

     

    “My mind is a raging torrent, flooded with rivulets of thought cascading into a waterfall of creative alternatives”

     

     

    References


    [1] Stephenson, K, and Kollar, L.  Fast Ordered Delete. Retrieved on 9/2/2009 from http://sqlcat.com/msdnmirror/archive/2009/05/20/fast-ordered-delete.aspx 

    [2] Stephenson, K. and Kollar, L. Fast Ordered Delete FBB4/Pic1_2. Retrieved on 9/2/2009 from http://blogs.msdn.com/blogfiles/sqlcat/WindowsLiveWriter/Fastordereddelete_FBB4/Pic1_2.jpg

    [3] Stephenson, K. and Kollar, L. Fast Ordered Delete FBB4/Pic2_2. Retrieved on 9/2/2009 from http://blogs.msdn.com/blogfiles/sqlcat/WindowsLiveWriter/Fastordereddelete_FBB4/Pic2_2.jpg

     

     

  1. « First
  2. ‹ Previous
  3. 4