DBPedias

Your Database Knowledge Community

Michael Swart

  1. Myself in 2004

    This is me in 2004

    9 years ago

    I’m in the center with the glasses

    This picture was taken in an electronics factory in Zhongshan, China.

    I’m not a big traveler and I never have been.  I thought the opportunity to see China was a once in a lifetime thing. In my case, I found myself making the trip three times courtesy of my employer. It wasn’t the adventure I thought it would be. Unlike most tourists to China, my memories don’t include much else other than hotels, ferries and factory locations like the one you see here. I remember counting the days until each trip was over.

    When I look at this picture, I remember a lot of things. I remember the anti-static lab coats and I’m reminded that I used to have more hair than I do now. The machine in front of me is meant to test a motherboard for the upcoming XBox 360. The guy on the right was an engineer, a hardware guy. My job was to install and support the software that collected test results and store them in a SQL Server database.

    In Over My Head?

    I remember the other people in this photo. It was an amazing assembly of talent. The people here were smart. But not just smart, they were smart and competent. They were ambitious and passionate about what they do and it’s rare to see that. I felt a little overwhelmed and I felt a little out of place. Everyone seemed so confident about what needed to be done. I guess I was no different. I knew what I had to do, but the confidence I showed was half acting.

    For the most part, I was lucky. I installed the software and waited for a problem to support. It’s a testament to the company I worked for that I was able to wait more than work. But it wasn’t all super-smooth sailing.

    Eventually I was asked to troubleshoot a burning issue and I wasn’t used to the pressure; hopefully it didn’t show. I felt like I was thrown into the deep end in order to learn how to swim. The head guy asked me to resolve something (and by the way Michael, the factory can’t operate until you do). The issues I faced were new to me and the pressure was on. Here’s a small subset of the kind of things I was asked to tackle:

    • Replication configuration issues. <bleah!>
    • Log files filling up (because replication’s log reader wasn’t operating properly)<ouch>
    • Concurrency bottlenecks. <just the beginning of the rest of my life>

    In my career up until then, I was always able to ask a more experienced colleague for help. Here was the first time where I was it. There were no other colleagues to call on. If I couldn’t crack this nut, it wasn’t going to get cracked. In this case it was me and books online vs. SQL Server 2000. I did eventually get through those days and it felt amazing to beat those issues. But during the crisis itself, there was some anxiety.

    Out of the Deep End

    I came away from those experience with a bit more confidence than I started with. It was the first time I thought, “I’m fine, I can handle this”. By the end of my trips to China, the confidence I was showing wasn’t acting any more. I wasn’t just solving such crises, I was avoiding them. And back in Canada, I was asked to be on-call overnight in order to field questions from others in China.

    I don’t believe there exists a training course anywhere that is equivalent to the confidence gained by solving these FIX IT NOW crises.

    And it didn’t stop there, these other things helped boost my confidence even further:

    • SQL Server released 2005 with features that made a DBAs life so much easier than when supporting 2000.
    • I left my software development job for a different job focusing on databases full time.
    • twitter and #sqlhelp happened. It made me feel like I had the world on call. I don’t use it as much as I used to, but it’s nice to now it’s still there if I need it.

    The Guy in the Picture

    I’ve showed this picture a couple times to others in the past year. Each time I was encouraged to blog about it. This is me doing that. I have a lot of stories I could tell about these trips. In fact, I wish now that I kept a journal. So dear reader, if sometime in the future, we find ourselves hanging out and have nothing to chat about, ask me about 2004 Michael. Until then…

  2. The Sch-M lock is Evil

    A necessary evil, but still evil. Why? Because it it won’t share with Sch-S and Sch-S is taken and held for absolutely everything (Yes, even your NOLOCK queries). And that can lead to some interesting concurrency problems. Let me explain.

    Sch-M

    Sch-M is an abbreviation for Schema Modification. It refers to a kind of lock that is taken on a table/index or other object whenever you want to modify that object. SQL Server allows only one Sch-M lock on an object at a time. So if you want to modify a table, your process waits to takes a Sch-M lock on that table. Once that lock is granted, the modification is completed and then the lock is released.

    Sch-S

    Sch-S is an abbreviation for Schema Stability. It is a kind of lock that is taken on an object when a process doesn’t want that object to change its definition. It makes sense. If I’m reading a set of rows through a table, I don’t want a column to disappear on me half way through. SQL Server allows many Sch-S locks on a table.

    Sch-S vs. Sch-M

    But Sch-S locks are incompatible with Sch-M locks. This means that when you want to modify a table, you’re not granted a Sch-M lock immediately. You have to wait for everyone using that table to finish using it. You’re essentially put on hold until the existing queries complete and their existing Sch-S locks are released. This also means that while you’re waiting, every query who wants to begin using that table is waiting in line behind you too. Basically “Everybody outta the pool while the lifeguards change shifts.” But that’s usually acceptable right? Database schema modifications are a big enough change to require a maintenance window.

    Index Rebuilds Are Table Modifications

    It’s true, if you have the luxury of maintenance windows for your DB changes, you’ll be alright. But you also have to consider your database maintenance plans (automated or otherwise). Those plans can launch index rebuilds while the database is online. And all index rebuilds also count as table modifications and take Sch-M locks. An index rebuild has syntax like this:

    ALTER INDEX [PK_MyTable] ON [MyTable] REBUILD WITH (ONLINE=ON)

    Hopefully you’ve remembered that ONLINE=ON part. When you use that part, the index is rebuilt in the background and at the end of that processing time, a Sch-M lock is taken and released very quickly.

    But maybe you’re not so lucky. Maybe you’re not running 2012 yet and have an index that includes blobs. Or maybe you’re running on Standard Edition. In those cases you won’t be able to use the ONLINE=ON feature. In that case, the Sch-M lock is taken by the rebuild process and it’s held the entire time that index is rebuilt. During the rebuild, that index is now truly offline. No access for you.

    You Can Get Stuck

    Just like I did. A while ago, I was asked to help with this exact situation. An index rebuild had been running for hours it was offline and the Sch-M lock that was held was preventing anybody from using or even looking at that table. I was stuck between a rock and a hard place. I had to choose between letting the index rebuild complete (which could take hours) or cancelling the job (whose rollback could take hours). There was nothing I could do to avoid additional hours of downtime. There was another bit of irony in my situation. We didn’t care about the existing data in that table. A truncate table or drop/recreate table would have suited us just fine.

    … Like Really Stuck

    It occurred to me to try something out. What if I created an identical empty table with a different name. We didn’t need any data in the locked table. So using a new table could work. And because the locked table is only accessed by stored procedures, I can modify those procedures to use the new table instead of the locked one.

    Nope!

    For some reason, the ALTER PROCEDURE requires a Sch-S lock on the old table, the table it no longer refers to. The sprocs can’t operate or be modified without a Sch-S lock on that locked table. This sketch illustrates my situation.

    Make sure your index plans never attempt to rebuild big indexes offline.

    Extras

    Some relevant DBAReactions gifs:

  3. T-SQL Tuesday #41 – Presenting and loving it?

    T-SQL Tuesday LogoBob Pusateri hosts this months T-SQL Tuesday. The topic is Presenting and Loving it! You’ll notice that the topic of this blog post takes the exclamation mark and turns it into a question mark. The thing is that I’m not much of a presenter. I’ve presented twice in my life to groups who were not coworkers. Three times if you count my best-man speech for my brother.

    In Bob’s TSQL Tuesday invitation, he asks us “How did you come to love presenting?” and “When was the first time you presented and really loved it?” I’ll let you know when and if that ever happens.

    Take my SQL Server 2000 instances ... Please!

    But guess what? It turns out that I’m giving a talk to the Toronto SQL Server User Group tonight which will let me grow my presentation CV from two SQL talks to three. I bet no other blogger can brag that they’ve given 33% of their SQL Server Presentations on the same day as Bob’s T-SQL Tuesday about presentations.

    Advice

    So I’ve got a very short list of advice to give you today. I can only describe some of the things that I’ve done to prepare:

    • I follow Paul Randal’s advice at Configuring SSMS for presenting. I follow it to the letter. It’s easy to set up and makes SSMS readable.
    • I install Zoomit. Just in case. I don’t think I have any content that requires it, but you never know.
    • I installed SSMS Toolspack by Mladen Prajdić. The SQL Snippets feature (which are still way better than SSMS’s native snippets feature). Are great for keeping demo scripts at your fingertips. I have snippets for “d1″, “d2″, “d3″ and “d4″. Which correspond to the demos I plan to show.

    In Toronto?

    So are you in Toronto today? Got plans? Come on out to the user group tonight. I expect it will be pretty fun. I’m really comfortable with the topic and so I think it will be a blast. So when was the first time I presented and really loved it? Ask me again tomorrow.

  4. Checking Out Amazon Redshift

    In order to refresh my memory about what I learned in University, I’ve been watching a course that UC Berkeley made available on Youtube. It’s been a good course so far. Recently I got to the topic of logical fallacies. And so I’m reminded that the following is not a valid argument: Jimmy advises X; Jimmy profits if you follow advice X; Therefore, Not X. It’s not valid in the academic sense. In the practical sense, I have to remember to not distrust all commercials and marketing videos.

    But it’s hard, especially when Googling for “Big Data” or “Analytics”, I find it very difficult to find anything to read that’s both unbiased and useful. Maybe it’s because I’m still skeptical about any solution that is promoted by the people who stand to profit from following their advice (I’m trying not to discount their advice, I just squint at it a little).

    So when Amazon came out with a publicly available Beta for their new Redshift Datawarehouse service (accompanied by a slick marketing video), I decided to kick the tires. Here’s some notes I made:

    Amazon’s Redshift Commercial

    Their commercial is here. And I have to say it reminded me of an infomercial. You know the kind where they try to sell fancy mop gadgets by showing how ridiculously clumsy people can be with regular mops. The Amazon commercial is light on technical details but I don’t think it was meant for an audience like me and you. I made a conscious effort not to hold that against them.

    Warehouse in the cloud

    Having a warehouse in the cloud makes a lot of sense in some ways. Pay-as-you-go pricing is what you want when you prefer operational costs over capital costs. Many businesses who don’t have I.T. as a core competency will find this a pretty attractive alternative to running servers or an appliance themselves. But it can get expensive quick, Amazon advertises less than $1000 / Terabyte / year. But with upfront costs for reserved pricing and a two terabyte minimum. The smallest rate you can get away with is $3000 per year for three years. In practice it will likely be significantly more. I can’t see anyone choosing Redshift without doing due diligence on the pricing, but it’s probably not going to be dirt cheap.

    Connecting to Redshift
    Star Trek's Jean Luc Picard says "Very well Data, open a channel"
    Connections are made to Redshift only through ODBC (or JDBC) using Postgres drivers. Some challenges:

    • Picking my way through 32 bit vs 64 bit was tricky. Getting and using the right driver took some care.
    • Uploading through this connection is slow. So don’t try it this way. I mean it’s possible, but it’s simply not recommended. I learned this lesson not by reading through Amazon docs, but by attempting to run an SSIS job to funnel data into an ODBC destination. It was actually working, but it was impractical because it was so slow.

    Creating the warehouse schema wasn’t too hard: I scripted a subset of tables from an existing warehouse. I stripped indexes, constraints and identities. There were a couple syntax differences (int becomes integer, bit becomes boolean, and GO becomes semicolon) but it went pretty smooth. For Redshift, in lieu of indexes, you choose columns on each table for the distribution key and the sort key. Underneath the covers, Redshift uses ParAccel and so if you’re familiar with that, you’ve got a great head start. Otherwise, Amazon’s Developer Guide is a good start. I’d like to hear more from others who know more about modeling for Redshift; it seems like a regular star schema will work well here.

    Loading data is best through S3

    I created a SSIS package that takes data from my relational warehouse. It takes that data and dumps it into delimited files (Hat tip to @peschkaj for advice on delimiter values). Then I gzipped the whole thing. I loaded those gzipped files into Amazon S3 and loaded data into Redshift using those files. Initially, I was afraid of Amazon S3 (What? I have to sign up for and learn about another service?) but working with Amazon S3 was dead simple.

    Consuming data

    I connected a Tableau client to Redshift using an ODBC connection. This Tableau discussion tells me that’s the current way to connect Tableau and Redshift. There are quite a few SQL limitations imposed by the ODBC protocol. So the Tableau experience was not too smooth. Tableau has a new Redshift connector coming out in a matter of weeks which should clear these limitations.
    The point is that Amazon claims Redshift integrates seamlessly with a list of reporting apps (including Tableau). I believe it really will by the time Redshift is released, it’s just right now there’s a pretty big seam.

    Next?

    I’m going to get around to evaluating Google Big Query and of course Microsoft’s HDInsight. You might notice that lately, I find myself in a phase of learning (fun!) and so this post and the next few will be a description of my exploration.

  5. Follow up on Ad Hoc TVP contention

    Last month I posted PAGELATCH Contention on 2:1:103. It described a troubleshooting experience I had. I was troubleshooting a performance problem that was felt when applications issued very frequent ad hoc Table queries that used Table Valued Parameters (TVPs).

    I’ve recently learned some extra things from the SQL Server team at Microsoft that I wanted to pass on. So this post acts as a kind of update to that one.

    I know. I've drawn Columbo before, but it's my blog.

    Reproduction

    One thing I did was to come up with an application that reproduces the contention on 2:1:103 in the same way as production. I share that here now.

    What this reproduction does is launch fifty processes. Each process uses multiple threads to hit a db server with an ad hoc query that uses a TVP.  For the curious here it is: Download AdHocTVPContention.zip and then open readme.txt. This reproduction is a mashup of

    • A simple batch script which launches an executable 50 times at once. A technique I gleaned from Paul Randal in a demo of his.
    • A multi-threaded executable which sends queries asynchronously. A framework I first developed in a post I wrote on concurrency.
    • The ad hoc TVP query itself. It was a bit of a modification of my example at Table Valued Parameters, A Short Complete Example. The only difference is that the command is a query instead of a stored procedure.

    What I’ve Learned Since Then

    The Microsoft team helped me understand a few things and I want to share what I’ve learned. The main thing is that the particular version of SQL Server (including service pack level) matters a lot:

    SQL Server 2005 and Earlier

    TVPs weren’t supported then. (This post isn’t for you).

    SQL Server 2008 and R2

    In my earlier post, I dismissed a fix provided by Microsoft. Fix: Poor performance in SQL Server 2008 R2 when table-valued functions use many table variables. At the time, I dismissed it for a number of reasons:

    • The workaround listed on that kb article page did not help us.
    • The symptoms that were described at that issue did not match my own scenario (i.e. I wasn’t using table-valued functions).
    • The title mentioned R2 and I thought (mistakenly) that this fix was not applicable to SQL Server 2008.

    Microsoft helped me learn that although the symptoms in that fix differ from mine, the improvement in that fix involves a change to code that also improves my scenario. So with the fix applied, on my test environment, I saw that SQL Server could handle at least 6 times the volume of ad hoc queries than it could before (your mileage may vary).

    SQL Server 2012

    But there’s one other thing I noticed. When I ran the reproduction on a SQL Server 2012 environment, the process just flew through the workload! Temp table creation was no longer a bottleneck; there was no more contention on temp table creation. The Microsoft Engineer I worked with noticed this too and eventually it prompted a CSS blog entry. You can read all about this 2012 performance improvement at Temp table caching improvement for table valued parameters in SQL Server 2012

    Partner vs MVP

    I’m very lucky to have many investigation options available to me when I’m digging into a particular quirk of SQL Server behaviour. There’s the public options like #sqlhelp or dba.stackexchange or Microsoft support. And for me there’s some private options like the MVP mailing list. But this month, I feel lucky to work as a Microsoft partner. It’s through that connection that I learned so much.

    Too Long; Didn’t Read

    If you’re experiencing 2:1:103 contention because of too frequent ad hoc TVP queries then:

  6. Some Failed Blog Topics

    I list some blog topics that I don’t plan to complete.

    I have a rule of thumb that I try to follow when writing. If I can write something in a simpler way I do. If I can cut out words, sentences or whole paragraphs without altering the meaning, I do. Sometimes the thing I’m left with is so short, simple and straightforward that I wouldn’t even bother tweeting it.

    For example, a couple weeks ago, after a shortening revision, I wound up with “Index your tables properly while designing databases”. That’s not the title, that’s the whole thing. Something so short and simple doesn’t deserve a blog post and so neither did the long version of that post!

    So I’m cleaning house. I want to post a bunch of ideas that were never good enough on their own to make the cut.

    Knowledge Philanthropy

    Those two words sound great together don’t they? I’ve always felt like more of a knowledge consumer. In fact this blog you’re reading is an effort on my part to “give back”. But no matter what, I’ve always felt like I’ve received way more than I’ve given.

    Your Database is Not Perfect

    But it doesn’t need to be. It needs to be stable, scalable, and easy to maintain. It needs to meet any requirements you care to define. Anything after that is going to be a harder sell.

    Measuring the Effects of Fragmentation

    An interesting experiment I started, but it led me past DBA-land into SAN Admin land and I was not comfortable saying anything with authority here.

    Q & A for questions I made up:

    I must have written the following while half asleep and I have absolutely no idea what it could mean:

    ala Esquire: Fake… subtly bad advice “Jesus Bless You” Generic advice that gets suddenly personal: Why do delis always give pickles with sandwiches?” “Ordering Pheasant”

    Quotestring — Safe way to use Data as Code?

    No, I wouldn’t count on it.

    Hey bud!

    The following was inspired by comments at “Last Word On Scans” It was drawn for drawing’s sake.

    Wanna buy a query plan operator?

    Range Scans on Multicolumn Keys

    Say you have a table with an index on (LastName, FirstName) and you want to select all the rows which sort between “Lucas, George” and “Spielberg, Stephen”. I’ve never been able to find straightforward syntax that results in a query plan with a simple range scan.

    I’m a Paul White Fan

    Not very many of my friends and family hold up their side of the conversation when I talk about databases. But at the PASS Summit, there are hundreds of people who will! And I think that’s why I love it. I always intended to write about my experience at 2012′s summit, but I couldn’t think of anything meaningful to write. I knew that Paul White’s session was a highlight for me, but I didn’t have much else to say.

    paulwhitewanted

    On Being Competent

    Billy Joel described himself once like this:

    “… I consider myself to be an inept pianist, a bad singer, and a merely competent songwriter. What I do, in my opinion, is by no means extraordinary. I am, as I’ve said, merely competent. But in an age of incompetence, that makes me extraordinary.”

    That particular quote probably struck me at a time when I wasn’t feeling particularly perfect enough at what I do.

    UNDELETE

    Is here! Oh actually, april fools. It’s actually called Point in time restore, and it’s been available for a while. And the syntax is a little different…

  7. PAGELATCH_EX Contention on 2:1:103

    This blog post is meant to help people troubleshoot page latch contention on 2:1:103. If that’s what brought you to this page, then hopefully you find it useful. But first…

    A textbook entitled "Don't Panic"

    Initial Investigation

    Last week I was asked to help tackle a production outage. Queries were slow enough that the system was considered unavailable. And just like any other performance problem I started by having SQL Server tell me what was wrong with itself.

    • I first started with sp_whoisactive but it took about 30 seconds for it to return. Several queries were blocked by others and some of the lead blockers were waiting for a PAGELATCH_EX on 2:1:103
    • Then I checked on the most common waits using a query found at Paul Randal’s article: Wait statistics, or please tell me where it hurts. (I don’t have that page bookmarked, every time I’m in trouble, I google “Tell me where it hurts Paul Randal”). I learned that PAGELATCH_EX contention was our most serious wait type. Paul mentions that he sees this kind of contention most commonly on tempdb.
    • I followed a link from Paul’s article to Robert Davis’s Breaking Down TempDB Contention. A script there helped me to discover that while there was lots of tempdb contention, it was all on page 2:1:103, which is not PFS, GAM or SGAM.

    Finding More Info

    So I looked for more help.

    • Web Search: My work colleague found and then pointed me to Latch waits on 2:1:103? You are probably creating too many temp tables in SQL Server by Microsoft’s Matt Wrock who faced the same problem. If you found this page because you’re facing the same issue then stop reading this article and go read Matt’s first. He explains what’s going on better than I could. Think of this article as a kind of sequel to his article.
    • StackOverflow: (serverfault in this case) has TempDB contention on sysmultiobjrefs SQL 2005. The best answer there is Matt pointing back to his blog article.
    • Microsoft Support who helped us out, but maybe not as quick as we’d like. To be fair, the turnaround time that we would have been happy with was measured in nanoseconds.

    With that information it became clear to us that we were creating temp tables too often. And we were creating them in a way that made it impossible for SQL Server to cache. Did you know that? That SQL Server can cache temp tables? When a query is done with a temp table, SQL Server can truncate that table and give it to the next query to avoid having to create it again. Cached temp tables reduce tempdb contention including contention on this page. But as it turns out, SQL Server cannot cache temp tables from ad hoc queries.

    But who creates temp tables that often? We did, just by using a table valued parameter in a parameterized query. Since SQL Server began supporting table valued parameters (introduced in SQL Server 2008), we have been gradually moving towards this practice in lieu of sending xml to be shredded.

    … And More Info Including Some Other Links

    I can’t help including some extra articles I found on 2:1:103 contention.

    Robert Davis blogged about 2:1:103 contention in Tempdb Contention That Can’t Be Soothed. His advice is to remove statements in code like “SELECT … INTO”.  However, I believe that such statements contribute to PAGELATCH contention only if the statement is not part of a stored procedure (i.e. can’t be cached). I also think there are more common causes of 2:1:103 PAGELATCH contention than SELECT … INTO statements.

    But Robert did link to a demo by Paul Randal (an absolutely amazing 5 minute demo linked from a 2011 issue of his newsletter). Paul tells us

    • That “the SQL Team knows about this. It’s a known issue. Hopefully something will be done about it in one of the future releases.”
    • But unlike SGAM or GAM contention, there’s absolutely nothing you can do about this to spread the contention around.

    From a DBA point of view there’s not much that can be done. So as DB Developers, we have to find a workaround. But before I get to that, I want to mention some things that were not so useful.

    … And Some Bad Info

    There was some red herrings out there…

    • Add more tempdb files (nope, wrong kind of tempdb contention).
    • Many resources suggested enabling trace flag T1118. It’s a trace flag that eases tempdb contention, but not this kind.
    • Get faster tempdb disks? No, PAGELATCH contention is for in-memory copies of pages, not disk (that’s PAGEIOLATCH).
    • Tune the queries in question? Not a bad idea, but this problem is about the number of problem queries, not the performance of each.
    • A fix from Microsoft Support: Fix: Poor performance in SQL Server 2008 R2 when table-valued functions use many table variables. Oooh… so close, but we weren’t using table-valued functions. And the workarounds they list (disabling AUTO_UPDATE_STATISTICS) did not help. But maybe it might help you?
    • The Object:Created event. Whether traced with Profiler or collected with an Extended Events session, this event can report on created temp tables. Maybe I can use this event to tell me which queries are creating the tempdb tables. Nope, not this time! This event  has two drawbacks which make it useless for troubleshooting 2:1:103 contention:
      • The Object:Created event reports the creation of temp tables even when they’re cached (which don’t need a latch on 2:1:103).
      • The Object:Created event doesn’t report the declaration of any table variables (which may need a latch on 2:1:103).

    Strategy: So What Do We Do?

    Knowing is half the battle right? But that means we still have a lot of work to do. I’m going to recap what we know so far. We know that we have trouble when there are queries that:

    • are executed frequently
    • create temp tables (either explicitly or by declaring table valued parameters).
    • are not cached (Microsoft explains when temp tables are not cached. In my case, it was because they were ad-hoc queries)
    • require a page latch on 2:1:103.

    There is a performance counter that can track the all of the above (except maybe for that last bullet). It’s called Temp Tables Creation Rate and it’s found in the perf counter category “General Statistics”. Now this is a metric you can trust. We found that a high temp table creation rate was tightly correlated to the trouble we were seeing. So when troubleshooting, look at this performance counter (and leave the “Object:Create” event alone).

    So now what do we do? First, we must find the ad hoc queries that create these temp tables. Then, we have to put them in stored procedures so that the temp tables can be cached. Alternatively, we could reduce the need for creating them. It’s a workaround, but it’s what we’ve got.

    Finding Such Queries Is Difficult

    But here’s the hard part. In a high volume system, it’s difficult to identify exactly which queries are causing the most trouble. Microsoft support can go through tons of collected trace and performance data to try to find such queries, but it’s a long process. On our side, we looked at sys.dm_os_waiting_tasks:

    Select *
    From sys.dm_os_waiting_tasks

    And we saw all the contention on 2:1:103, but when we tried to look up the SQL text for it

    Select wt.*, st.text
    From sys.dm_os_waiting_tasks wt
    left join sys.dm_exec_requests er
        on er.session_id = wt.session_id
    outer apply fn_get_sql(er.sql_handle) st
    where wt.resource_description is not null

    The text was often unavailable. Basically, I’m guessing maybe the dmv’s I was using weren’t quick enough to tell me which queries were suffering from (or causing) contention on 2:1:103. So I decided to look through the cache for query candidates that might create temp tables. here’s what I came up with. It’s not a comprehensive list and there might be false positives, but it might be enough to go on. If you know your applications well, you can tailor the filters below to something more relevant for you.

    select cp.plan_handle, sql_handle, text, refcounts, usecounts
    from sys.dm_exec_cached_plans cp
    join sys.dm_exec_query_stats qs
        on cp.plan_handle = qs.plan_handle
    cross apply sys.dm_exec_sql_text(qs.sql_handle) st
    where (
           (
            st.text like N'%READONLY%'  -- possible table valued parameter (prepared queries)
            or st.text like N'%TABLE %' -- possible table creation
            or st.text like '%#%'       -- possible use of temp table
            or st.text like '%INTO%'    -- possible table creation
           )
           and objtype in ('prepared', 'ad hoc') -- temp tables not cachable with these
    )
    order by usecounts desc

    Going Forward

    Personally after helping identify and implement the workarounds. I’m doing a couple things:

    • I’m recommending that developers not create temp tables or declare table variables that cannot be cached. For now, this means we use stored procedures for any query that uses table variables or temp tables.
    • We now have thresholds on our performance tests which look at the performance counter Temp Tables Creation Rate.
    • I created a Microsoft Connect item. If you’re troubleshooting the same problem, head over there and let Microsoft know you’re having trouble too.
  8. Joins are Commutative and SQL Server Knows it

    I want to explore a common question I get from people who are getting up to speed on this whole SQL thing. At one time or another, we’ve all wondered whether we get any performance improvements by varying the order that we join tables together (and by joins I mean inner joins).

    The answer is no, so you can safely stop messing with the join order of your tables for performance reasons. So…

    Malcolm Reynolds of Firefly points a gun and says 'Quit your fussin'

    The point is that when SQL Server executes a query it explores query plans with different join orders, it then evaluates the estimated cost of each plan and picks the best one. It follows directly that changing the order of the tables in the from clause is not an effective optimization technique.

    In other words, A JOIN B is equivalent to B JOIN A and SQL Server knows it. You can see this for yourself:

    Showing that Joins are Commutative

    The best way to demonstrate that is to come up with an example where SQL Server chooses a different join order for a query plan than the order specified in the query.

    The Setup

    First create tables A, B and C and populate them

    use tempdb;
     
    create table A
    (
        id int identity 
            constraint PK_A primary key,
        value uniqueidentifier default newid()
    );
     
    create table B
    (
        id int identity
            constraint PK_B primary key,
        value uniqueidentifier default newid()
    );
     
    create table C
    (
        id int identity
            constraint PK_C primary key,
        value uniqueidentifier default newid()
    );
     
    GO
     
    set nocount on;
    insert A default values;
    GO 100
     
    insert B default values
    GO 500
     
    insert C default values
    GO 1000

    Query With Two Joins

    Check out the following query and query plan (with no query hints):

    select a.value, b.value, c.value
    from c
        join b on b.id = c.id
        join a on a.id = b.id
    where c.id > 95

    Plan with no hints
    Notice that SQL Server has changed the join order from C-B-A to A-B-C because it’s better that way.

    Same Query With Restricted Join Order

    If you really want to, you can force the join order with the FORCE ORDER query hint. We’re basically telling the query optimizer to not explore plans with different join orders. The query optimizer uses different rules when exploring different plans to evaluate. One of the rules is called JoinCommute. We can actually turn it off using the undocumented query hint QUERYRULEOFF.

    select a.value, b.value, c.value
    from c
        join b on b.id = c.id
        join a on a.id = b.id
    where c.id > 95
    OPTION ( QUERYRULEOFF JoinCommute)

    And we see that turning off JoinCommute gives behavior and performance just like FORCE ORDER.
    Don't Commute Joins Plan

    Kind of neat eh? Now when the next person asks you whether the join order makes any difference, you can confidently say no. And if that person is from Missouri (the “Show me” state) you can point them to this post.

  9. A Grade School Data Project

    This post is a story about my daughter’s school project with a plug for Wolfram Alpha and maybe Excel.

    My daughter came to me the other day and asked me to help her with some homework… She said she needed data. She was learning about plotting graphs in math class and she was asked to bring in some data that she could plot. Learning, Math, Data. I felt pretty proud (Just as any parent might when kids take an interest in their profession.)

    Personally, I thought this was great! My kid is asking me about data! At the very least, I could explain to her a little better what it is that I do at work. At best maybe she could “catch” what that I like about the topic.

    The project itself was pretty straightforward. The students were asked to bring in a list of numbers that they could use in class to create a chart. The data could be about anything and there were extra marks for plotting two series instead of one.

    What We Did

    So I asked my daughter what she wanted to bring… it could be anything she was curious about. Eventually we:

    • We started at Wolfram Alpha
    • And we picked the local temperature for 2012.
    • For the extra series, I picked temperature for a location that I thought would contrast well with the local temperature. That location was Sydney Australia.
    • This led me to search Wolfram Alpha for 2012 temperature for Kitchener, Canada and Sydney, Australia
    • But we needed the data points, this led me to use the most beautiful and useful feature that Wolfram Alpha provides, Data Download:
    • A little hidden, but super-useful

      Pretty much the only feature analysts need

    • It was the work of a minute (using Excel) to summarize the average monthly temperatures for both cities. I got the data into the right shape, created a pivot table, and adjusted some of the filters.

    After everything was done, the chart itself was interesting. As I expected, the average temperature (here) reaches its maximum in July and we saw the opposite trend in Australia. One surprising thing was that the temperatures varies very little in Sydney. It looks like 10°C (50 °F) is a very cold day in Australia. So the numbers confirmed what we guessed and showed us something new… Not bad.

    Keeping away from SQL Server (For Now)

    Not once did I even mention or use SQL Server or tables or databases. And I didn’t dwell on the Excel features. I just explained what we were looking at. In this case, I thought that it was better to show what the data tools could do before showing how they do it.

    Missed Opportunitiy?

    But I feel like I must have missed out on something here. Don’t you think so? Tracking temperature is kind of boring…  I felt like a carpenter who was trying to show off what could be done with a router but ending up with a birdhouse. What kind of data would you have picked? I tried to think of something that’s actually interesting or useful but came up empty and went with average temperature. If you have any better ideas, let me know.

  10. Well That Wasn’t Obvious

    I’ve discovered that non-intuitive lessons stick in the brain the best. I share three examples that I either learned recently (or at least re-learned recently).

    Uniqueifiers on Indexes That I Thought Shouldn’t Need Them

    Do you remember what a Uniquifier is? SQL Server uses these hidden values on rows to keep non-clustered indexes in sync with their non-unique clustered indexes. Think of this questionable schema:

    create table Actors
    (
    	ActorId int identity not null,
    	LastName nvarchar(100) not null,
    	FirstName nvarchar(100) not null,
    	AgentId int null,
    	primary key nonclustered (ActorId)
    )
     
    create clustered index IX_Actors
    	on Actors(LastName, FirstName)

    Then add some data:

    insert Actors (LastName, FirstName, AgentId)
    values ('Douglas', 'Michael', 3),
           ('Douglas', 'Michael', 4),
           ('Keaton', 'Michael', 5),
           ('Douglas', 'Kirk', 4)

    I can demonstrate the uniquifier value here because of the non-unique clustered index:

    select sys.fn_PhysLocFormatter (%%physloc%%) as [physical rid], *
    from Actors -- in my case, this shows page 231
     
    DBCC TRACEON (3604);
    DECLARE @dbid int = DB_ID(); 
    DBCC PAGE(@dbid, 1, 231, 3); 
     
    /* showing (among other things):
    ...
    Slot 1 Offset 0x60 Length 51
    UNIQUIFIER = 0                       
    LastName = Douglas                   
    FirstName = Michael                  
    ActorId = 1                          
    AgentId = 3                          
    KeyHashValue = (16035ff378a3)        
     
    Slot 2 Offset 0x93 Length 55
    UNIQUIFIER = 1                       
    LastName = Douglas                   
    FirstName = Michael                  
    ActorId = 2                          
    AgentId = 4                          
    KeyHashValue = (16035e22af06)        
    ...
    */

    Cool! Just like real life (See Michael Douglas (I – XXVI) at IMDB). This uniquifier value behaves very much like the roman numerals you see at IMDB.

    Here’s the part that wasn’t obvious… what if I chose a clustered index whose columns include the primary key…

    drop index IX_Actors ON Actors;
    create clustered INDEX IX_Actors
    	ON Actors(LastName, FirstName, ActorId)

    That should be unique right? The index includes the primary key column (ActorId) that guarantees it. So SQL Server shouldn’t need the uniquifier right?

    But it does! 

    SQL Server doesn’t count on the index being unique unless you say so. I actually discovered the above while using Kendra Little’s sp_BlitzIndex, a tool so underexposed it’s pasty.

    UPDATE Modifies Column Values Only Once

    What does the following code produce?

    -- set up test table 
    declare @test TABLE (id int, value int);
    declare @source TABLE (id int, increment int);
     
    insert @test (id, value)
    values (1, 0), (2, 0)
     
    insert @source (id, increment)
    values (1, 10),(1, 20),(1, 30),(2, 100),(2, 100),(2, 100)
     
    -- "sum": add each value in the source to the existing value
    update @test 
    set value = value + increment
    from @test t
    join @source s
        on s.id = t.id
     
    -- check the results
    select id, value
    from @test
    /* The results:
    id          value
    ----------- -----------
    1           10
    2           100
    */

    Surprised? I think this example (or one very much like it) surprises those of us who started our careers as programmers. Many of us follow up this lesson with the related “UPDATE statements don’t have GROUP BY clauses” and then the lesson “How do I use CTEs?”

    Indexed Views Don’t Support Max/Min Aggregates

    Indexed views support a couple aggregate functions like COUNT_BIG() and SUM(). And with some trickery you can calculate AVG() and STDEV(). But SQL Server restricts the use of MAX() and MIN() in indexed views despite how useful they’d be.

    It might help to understand why. SQL Server maintains indexed views as physical database objects and it can maintain aggregate values like COUNT() and SUM() by processing only the changing rows in the base table while safely ignoring the rest of the table. But I can’t say the same for MAX() or MIN(). If SQL Server supported MAX() and MIN() in indexed views, then when you delete the row in the base table that represents the MAX() value, SQL Server would have to search the rest of the base table to find the new MAX() value.

    Check out the microsoft.connect feature suggestion Expand aggregate support in indexed views (MIN/MAX). Aaron Bertrand created this Connect suggestion and I like it because it the feedback is constructive. I like it because of its description, comments and the useful workaround. The Microsoft team even gave some insight into how they almost included this feature in SQL 2008. This connect item only seems to be lacking an E.T.A. so go and cast your vote!

    Bonus Content

    I didn’t draw any illustrations this week so I’m including some bonus content (admittedly written by others):

    Non-obvious Things From Twitter Friends

    Request for Tweets: What thing in sql surprised you? Something not obvious. For me it was how the TRY and CATCH didn't always.

     I love how the log shipping "use default data and log folders" ... doesn't

    how about how INSERT <tbl> VALUES (1),(2),(3) limits you to 1000 entries

    Rolling time-window filtered index. The cool surprise - building each new index uses the existing one, so it's really quick.

    Trivia

    • Did you know Michael Keaton (born Michael Douglas) changed his name to avoid confusion with that other guy? Pretty wise.
    • Jes Schultz Borland reminded me recently that good writers use more active verbs. I took that advice to heart and turned the writing of this article into an exercise. I avoided using words like is, was or are here and I think it turned out pretty well.
  1. 1
  2. Next ›
  3. Last »