DBPedias

Your Database Knowledge Community

Valentino Vranken

  1. SSIS OLE DB Source, Parameters And Comments: A Dangerous Mix!

    Once again I’ve been wasting some time because of a silly bug.  This time it was due to the OLE DB Source component and the way it works with parameters.  If you are in a situation where you know your query is working fine and yet no records are going down the data flow, here’s a possible solution!

    Disclaimer: this issue exists up until SQL Server 2008 R2.  Read on for details!

    The Situation

    I had a Data Flow with an OLE DB Source that uses one parameter, for instance:

    select ProductAlternateKey, EnglishProductName
    from dbo.DimProduct
    --some really smart comment goes here
    where Color = ?

    I knew the query was working fine because when executed through SSMS and with the question mark replaced with ‘blue’, it would return 28 rows:

    28 records in Management Studio

    But when executed in BIDS, through either Execute Package or Execute Task, it would return zero records:

    Zero records, zilch, nada, niente, none at all!

    So I thought something must be going wrong with the package variable that gets passed into the source parameter, somehow.  I’m not going into details on what I tried out in my attempt to get this working, but I can tell you that I started to get really irritated.  My colleague Koen Verbeeck (b|t) can confirm this because I called him over to my desk to help me think! (thanks btw!) Smile

    After some further tinkering with the data flow, we had our smart moment of the day and decided to launch SQL Server Profiler to see what BIDS was sending to the server!  I’m not sure if you’re aware of this but BIDS is doing some metadata-related stuff when preparing queries.  As far as I can tell, it also tries to determine the parameter type by running the following query:

     set fmtonly on select Color from  dbo.DimProduct
    --some really smart comment goes here where 1=2 set fmtonly off

    When creating this statement, it seems to use the whole FROM clause of the original query, including any trailing comments.  It combines that with a SELECT statement that contains the field that gets filtered and it appends " where 1=2 set fmtonly off".

    But alas, apparently it’s not aware that lines can be commented out by using a double dash.  So part of its generated statement is commented out.  What it should have done is used some CRLFs, especially in front of the WHERE clause.  But it didn’t.

    So, as a result of that, FMTONLY remains on while the SELECT statement gets executed, resulting in zero records!

    For those unfamiliar with the FMTONLY setting:

    Returns only metadata to the client. Can be used to test the format of the response without actually running the query.

    And I can actually confirm what I’m stating here by changing the query to the following:

    set fmtonly off;
    select ProductAlternateKey, EnglishProductName
    from dbo.DimProduct
    --some really smart comment goes here
    where Color = ?

    28 records down the pipe!

    We've got data!

    But this hack is a little too dirty to put in production.  So what else can we do?  Well, use block-style comments instead and we won’t face the issue!

    select ProductAlternateKey, EnglishProductName
    from dbo.DimProduct
    /* some even smarter comment goes here */
    where Color = ?

    So, as I mentioned at the start of the post, this behavior can be reproduced using SSIS versions prior to 2012.  What about 2012 then?  Here’s the result of the Data Flow using the first query mentioned above:

    SSIS 2012: we've got data, even with the "faulty" query!

    Alright, that works better!  Now let’s use Profiler to check what’s going on here.  This is the first statement that gets executed:

    exec [sys].sp_describe_undeclared_parameters N'select ProductAlternateKey, EnglishProductName
    from dbo.DimProduct
    --some really smart comment goes here
    where Color = @P1'

    Further down, I also see this one:

    exec [sys].sp_describe_first_result_set N'select ProductAlternateKey, EnglishProductName
    from dbo.DimProduct
    --some really smart comment goes here
    where Color = @P1',N'@P1 nvarchar(15)',1

    It is using an entirely different approach, no longer using the FMTONLY setting!  Hang on, this rings a bell!  Look what the BOL page for SET FMTONLY (2012 version) specifies:

    Do not use this feature. This feature has been replaced by sp_describe_first_result_set (Transact-SQL), sp_describe_undeclared_parameters (Transact-SQL), sys.dm_exec_describe_first_result_set (Transact-SQL), and sys.dm_exec_describe_first_result_set_for_object (Transact-SQL).

    Cool stuff!

    Conclusion

    If you’re not on SQL Server 2012 yet, be careful with comments in OLE DB Sources in the SSIS Data Flow!  Ow, and get the SQL Server Profiler off its dusty shelf now and then!

    Have fun!

    Valentino.

    Share

  2. SSMS: Connect To Several Servers In One Click (Okay, Two)

    I’ve been using the SQL Server Management Studio (aka SSMS) since it was first released with SQL Server 2005.  And yet, it hasn’t stopped to surprise me.  Earlier this week I’ve discovered a feature which I will use frequently as of now!

    While doing my day-time job at the customer, I have a habit of connecting to several servers each time when I open SSMS.  I always connect to them in the same order so that I can quickly locate them in the Object Explorer.

    The servers I want open all the time are DEV DB server, DEV SSIS server, UAT DB server and UAT SSIS server.  The SSIS servers are needed to get quick access to the Job Agent, while the DB servers are what I use all the time to actually do my job.  To avoid confusion: on those SSIS servers, I’m connecting to the Database Engine, not the SSIS service.

    So, earlier this week I was trying to find a method to easily connect to these servers with as few clicks as possible.  And guess what: I found one! (Well, otherwise I wouldn’t be writing this of course.)

    In the following paragraphs I’ll describe a method which you can use to connect to several servers at once, with just two clicks!  I can tell you, it sure beats the Connect To Server popup window!

    Screenshots taken using SQL Server 2012, but this should work as of 2005.

    First we’ll create a group of all the servers that we’d like to connect to with just two clicks.

    Open the Management Studio and switch to the Registered Servers view.  If you can’t find it, use the menu to select View > Registered Servers or hit CTRL+ALT+G on your keyboard.

    Open the Registered Servers view

    Right-click the Local Server Groups node located under Database Engine and select New Server Group… to create a new group.

    Creating a new registered servers group

    Give it a good name and click OK.

    New Server Group Properties

    Right-click the new group and select New Server Registration….

    Enter the server credentials as appropriate.

    New Server Registration

    What’s important here, besides the obvious such as entering correct credentials, is the Registered server name.  The servers will be ordered alphabetically using that name, and that’s also the order in which they’ll be opened!  So if you’re like me and you want your servers to be opened in a specific order, you’ll need some naming creativity or use numbers.  In my case I’m glad that DEV orders alphabetically before UAT, which is what I want. Smile

    Ow, in case you’re wondering about that server name shown in the screenshot above, the dot refers to localhost and sql2012 is the instance name.

    To demonstrate that I’m not joking, I’ll now register a second server and name it “Another server”.  After clicking the Save button, here’s what the Registered Servers window displays:

    "Another server" shown above "My DB Server"

    As I told you, “Another server” is shown above “My DB Server” even though it was created later.

    Now, what you’ve all been waiting for, how do I tell SSMS to connect to these servers?

    Click number one is a right-click on the group name, My Favorite Servers in this case:

    Use Object Explorer on the group name to open all servers in the group!

    And click number two is the one on Object Explorer!

    SSMS will now switch to the Object Explorer window and connect to the servers, in alphabetical order!

    Object Explorer is connected to multiple servers by just two clicks!

    How’s that for a time-saver huh?  I hope you’ll enjoy this as much as I will!

    Have fun!

    Valentino.

    Share

  3. Creating Multiple-Column Reports

    Now and then I get inspired to write about a certain topic as a result of what I encounter on the forums.  Here’s one of those moments.

    The examples in this article are created using SQL Server 2012 RTM – 11.0.2100.60 (X64).  I’m also using the free AdventureWorksDW2012 sample database available at CodePlex.  But that doesn’t imply that you actually need 2012 to get this to work.  The principle explained in this article should also work on SQL Server 2008 and even 2005.

    You can download the resulting RDL from my SkyDrive.

    Introduction

    Have you ever needed to create a report that shows a list of something spread over multiple columns?  Then you may have come across a feature called newsletter-style reports.  Excited to have found out about that feature, you started creating your report.  But alas, upon rendering the preview you discovered that it didn’t work.  Further investigation made you conclude that the feature is only supported in two specific renderers (PDF and Image).  Ouch!

    In this article I’ll be showing you a method to create multiple-column reports that you can actually use with all renderers!

    Setting The Scene

    We’ve been asked to create a report that shows a list of product codes with each category displayed on a separate page.  The codes should be displayed in four columns.

    Implementing The Report

    To get these requirements implemented, we’ll be using some T-SQL skills in combination with nested tables.  So, first step is the dataset query.

    The Dataset

    The query retrieves only the fields needed in the report plus one additional, calculated field called DisplayColumn.  This additional field indicates the column in which the code should get displayed.  Here’s what the results look like:

    List of product codes with column number by category

    As you can see, the column numbering starts at 1, goes up to 4 and then starts again at 1.  It also starts at 1 again at the start of a new category.  This is needed because we’ll be grouping the data on category in the report.

    Now, let me show you that query and explain a bit how the DisplayColumn is getting calculated:

    declare @numberOfColumns int = 4;
    
    select dpc.EnglishProductCategoryName, dp.ProductAlternateKey
        , (ROW_NUMBER() OVER (
            PARTITION BY dpc.EnglishProductCategoryName
            ORDER BY dp.ProductAlternateKey) + @numberOfColumns - 1) % @numberOfColumns + 1
        as DisplayColumn
    from dbo.DimProduct dp
    inner join dbo.DimProductSubcategory dps on dps.ProductSubcategoryKey = dp.ProductSubcategoryKey
    inner join dbo.DimProductCategory dpc on dpc.ProductCategoryKey = dps.ProductCategoryKey;
    

    The main ingredient here is the ROW_NUMBER function.  According to its definition, it “returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition”.

    That brings us to the OVER clause, and its PARTITION BY statement.  In our example we partition by category because we need the numbering to start over for each category.

    Next up is the ORDER BY part.  We order by ProductAlternateKey, which is the product code that our report should display.  We want the codes to be ordered alphabetically so the column numbering should take that order into account.

    As you can tell, I’m using some numeric trickery in that ORDER BY.  The reason for that is because I’d like my numbers to start at 1 and end at 4.  If we would just do “row_number % 4” (% is the modulo operator btw), then our last column would be zero instead of 4.

    Okay, that’s part one of the method explained.  Over to the visualization part.

    Setting The Tables

    We’re going to create a table that shows the product category in a header with the product codes spread over four columns below the header.

    So, first drag a table into the design canvas and give it an additional column.

    To quickly link the table with the dataset, drag the product code field into the detail cell of the first column.

    Now we’re going to set up the grouping.  We’ll also be making use of nested tables and because tables cannot be nested on the detail level of a tablix, we need to set up grouping even on that detail level.  So first, open up the Group Properties of the Details level by right-clicking it in the Row Groups pane:

    Opening up the Detail-level Group Properties

    Click the Add button to add a group and group on the product category:

    Group the Detail level on product category

    We want to display a header on the category level, so we’ll need to add another group on top of this one.  Right-click the Details row group once more and add a Parent Group:

    Adding a Parent Group to the Details level

    Group by product category once more and activate the Add group header checkbox:

    Adding parent group on product category

    Here’s what our table now looks like:

    Some cleaning-up to be done

    We don’t need that first column and we don’t need that top row, so delete both.  Now select the four textboxes in the upper row, right-click and select Merge Cells:

    Merging the header cells to prevent text getting cut off or wrapped

    This way the header line gets to use all available space.  Now put the product category field into the remaining merged cell and render the report:

    Grouping set up as expected

    If all went well you should be seeing the four categories.  That means the grouping part is implemented.

    Of course, the detail level is not yet as needed because only the first code of each category is being shown and the three other columns are still empty.

    So let’s address that now!

    Nesting The Tables

    Currently our four detail-level cells are just plain textboxes.  But textboxes don’t allow filtering and we need filtering because the first column should show only the product codes with DisplayColumn equal to 1, second column 2, and so on.

    So what we’re going to do is nest another tablix inside each of those four cells.  This nested tablix should be one-celled, so just one column and one detail-level row.  That can be done through several ways.  I’ll show you two.

    To set up the first detail-level cell, drag another Table inside that first textbox (the one that currently contains the product code).  That gives us this:

    Nesting a table in a table

    Now select the nested table by clicking one of its cells:

    Select the nested table

    It may be a bit tricky so if needed, temporarily increase the height of the detail-level row to enlarge the nested table.

    Now remove two columns and remove the header row of the nested table.  Then right-click the grey square in the upper-left corner and open up the Tablix Properties:

    image

    These are the properties of the nested table.

    Switch to the Filters page, click the Add button and set the filter to DisplayColumn = 1.

    Filter on DisplayColumn

    Click OK to close the properties.

    Now put the product code field again in that first detail-level cell and render the report:

    One column done, three remaining!

    One column done, three remaining!

    Let’s set up the second column.  This time, drag a List into the second detail-level cell:

    image

    A List is in fact a one-celled tablix with a Rectangle inside.  But we don’t need that rectangle here, we need a Textbox.  So select the rectangle by clicking once inside the cell.  You can tell that the rectangle is selected by looking at the Properties window, as shown above.

    Now hit the Delete button on the keyboard.  In the Properties window, you should see Rectangle changing into Text Box:

    Removing the Rectangle from the List leaves us with a Text Box

    Once more we’ve got a one-celled tablix.  Set up filtering on DisplayColumn = 2, using the method as explained when setting up the first column (hint: upper-left grey square).

    Put the product code field inside the second detail-level cell and render the report:

    Second column set up

    If all went well, your second column should now display some codes as well.

    Two more to go!

    Setting up the remaining two columns is actually really easy.  Select the nested tablix inside the second cell by clicking the small grey square in the upper-left corner.  Then hit CTRL+C, select the third detail-level cell and hit CTRL+V.  Also paste into the fourth detail-level cell.

    Now change the filtering to 3 and 4 for column three and four and render the report:

    All four columns display product codes

    All four columns display product codes!

    What now remains is some visual cleanup and adding page breaks on the category group.  I’m only going into details on the page break.

    In the Row Groups of the main table, right-click the upper group and select Group Properties.  Then switch to the Page Breaks page and check the Between each instance of a group checkbox:

    Adding a page break on the category group

    Render the report to see the final result:

    Report implemented: product codes in four columns grouped per category!

    The first page displays the product codes of the first category, spread over four columns!  The other pages contain the other categories.

    Requirements implemented!

    Conclusion

    Even though it takes a bit of work and some tricks, it is possible to create multi-column (aka newsletter-style) reports using basic SSRS components whilst still supporting all renderers.

    I hope you enjoyed this article, have fun!

    Valentino.

    References

    ROW_NUMBER() function

    Share

  4. Displaying Checkboxes In An SSRS Report

    On the forums I now and then encounter questions asking for a checkbox control.  And the methods usually presented are either through images or by using a certain font one way or another.  However, as of SQL Server 2008 R2 there is actually a third and interesting alternative which comes really close to actually having a control!

    In this article I’ll be presenting those three methods.

    Screenshots are made using SSRS 2012, and so is the Checkboxes.rdl available for download on my Skydrive.

    Checkbox Control, huh?

    You may be wondering why people would want to put a checkbox on a report.  After all, reports are not capable of accepting input – except through parameters but that’s a different story – and isn’t that what a checkbox is all about?

    Not entirely.  Reporting Services is not only used for data exploration.  Sometimes people use it to produce data-driven printouts, such as letters or even checklists.  In that perspective, having checkbox control functionality would indeed be useful.

    A Silly Scenario

    My yummy pasta sauceMy imagination is failing me a little today so I came up with this silly example: a recipe checklist.  And today we’ll be cooking some pasta sauce!

    This is the query that produces the list of ingredients:

    select 'Yummy Pasta Sauce' as Recipe, 'zucchini' as Ingredient, 1 as Quantity, 'piece' as Unit, 1 as Needed
    union select 'Yummy Pasta Sauce', 'mushrooms', 500, 'g', 1
    union select 'Yummy Pasta Sauce', 'minced meat', 1, 'kg', 1
    union select 'Yummy Pasta Sauce', 'Brussels sprout', 1, 'kg', 0
    union select 'Yummy Pasta Sauce', 'onion', 2, 'piece', 1
    union select 'Yummy Pasta Sauce', 'tomato sauce', 1, 'L', 1
    union select 'Yummy Pasta Sauce', 'potato', 1, 'piece', 0
    union select 'Yummy Pasta Sauce', 'Brussels sprout', 1, 'kg', 0

    I’m cheating a bit here, all the data is hardcoded in the query.  Normally you’d of course have a database that contains your recipes.

    And I’ve also introduced some ingredients which I wouldn’t really want in my sauce, such as Brussels sprouts. Silly, but it gives me a good excuse to use the red checkbox.

    Time to explore the possibilities.

    Method 1: Images

    I will not be going into full detail to explain the usage of images in SSRS reports.  If you need additional information on that, please first read my previous article that covers all possibilities of putting images on reports.

    As I don’t have access to an inventory of stock images, I opened my favorite drawing program, Paint.NET, and created two images myself.  Then I embedded both images in the report:

    Two images embedded in the report

    I created a dataset using the query above.  Then I set up a Table with a group on the Recipe field whilst adding a header row to display the name of the recipe.

    The row groups: Recipe > Details

    The first column will show the ingredient details, using the following expression:

    =Fields!Quantity.Value & " "
        & Fields!Unit.Value
        & " of " & Fields!Ingredient.Value

    In the second column in the detail level cell, I did a drag & drop of one of my images.  That gives the following pop-up:

    Dragging an image into the report brings up the Image Properties

    Clicking the fx button brings up the brings up the Expression editor, in which I created the following expression:

    =IIF(Fields!Needed.Value = 1, "checkbox_true", "checkbox_false")

    Rendering the report shows us:

    Using images to display checkboxes on a report

    While easy to implement, a disadvantage to this method is that you need to get hold of the images.  For my example I created them myself to avoid any copyright issues, but I wouldn’t call them fit for professional purposes.  For a recipe list they serve just fine though! Smile

    Mission one accomplished: the report shows some image-based textboxes!  On to mission two then!

    Method 2: The Wingdings Font

    I won’t be going into too much detail on this method either.  It’s already been explained by Krzysztof Kryczka in his article here.

    In short, you just use the Wingdings font with the appropriate character to show an image.

    In my table, I added an additional column to the right of the last one and entered the following expression:

    =IIF(Fields!Needed.Value = 1, Chr(252), Chr(251))

    As you can see in the character table on Krzysztof’s site, character 252 is a check mark while 251 is a cross.  Using the Chr() function, I can put the character on the report without typing the letter. (I actually doubt that they even match a letter so typing it would be difficult anyway.)

    After setting some font-related properties, here’s what the rendered report looks like:

    Using the Wingdings font to show a check mark

    A small disadvantage to this method is that the Wingdings font needs to be installed on the SSRS server.

    Mission two accomplished!  On to number three!

    Method 3: The Indicator Control

    Have you already used the Indicator control?  It’s new since SQL Server 2008 R2!

    The list of Report Items in the Toolbox contains an Indicator

    Let’s create a third column to the right of the last one and drag an Indicator into it.  That brings up following pop-up:

    The SSRS Indicator comes in four different types with several different shapes!

    In our context here the “3 Symbols (Uncircled)” seem like a valid option, so I’m selecting that.

    The next step is to inform the indicator which field it should use.  In our case, it’s the Needed field.  It can be selected by using the dropdown next to (Unspecified).

    Telling the Indicator which field is should use

    Let’s have a look at the rendered report now:

    Using the Indicator control to display check marks!

    We’ve used the built-in Indicator control to show check marks, mission three accomplished!

    (remark from audience:)

    Erm, mister SSRS dude sir, how does the indicator know what image it should show?

    (me thinking:)

    Darn, they noticed I cheated a little.

    You’re right, I have something to confess.  I made use of some handy defaults of the Indicator to skip some steps.  Ready to have some fun?  Add the following line to the query in the dataset:

    union select  'Yummy Pasta Sauce', 'Let''s break it!', 1, 'attempt', 100

    Without any other changes, render the report:

    Oh no, I've broken the Indicator!

    Oh my, look at that! The image and font-based check marks are still okay but the Indicator is broken!

    Right, next test, in the additional query line added earlier, change the 100 to 2 and render the report:

    Now the Indicator is totally broken!

    That surely didn’t better the situation, it made it worse!  Don’t worry, the explanation is fairly easy.  Let’s have a look at the actual settings of the indicator.  Open the Indicator properties by right-clicking the indicator and selecting Indicator Properties.  Then select the Value and States page:

    The default settings of the indicator

    As you can see, the units are measured using percentages with “red” starting at zero while ending at 33.  That means that, based on all available values in the dataset, all values that fall in the first 33% will become red.

    And that is exactly what the two tests above demonstrate.  If we take the last one, the value ranges from zero to two.  So zero is 0%, 1 is 50% and 2 is 100%.  That’s why all ones end up as the yellow exclamation mark!

    If all you need is a green/red check mark like in our example, you need to ensure that you’ve only got two possible values and the “red” value needs to be lower than the “green” value.  That way you can make use of the default values of the indicator, making implementation a piece of cake.

    Alternatively, if you’d like to customize the behavior of the indicator, that’s an option as well.  Have a look at what the Icon dropdown produces:

    The Indicator ships with several different icons to be used

    Any of those built-in icons can be selected.  Customizing the color is easy too, just use the Color dropdown.  And of course the numeric ranges can be changed as well.

    If the percentage-based measurement doesn’t work out well in your situation, you can switch to Numeric:

    The indicator can use either Percentage or Numeric measurement

    If that hasn’t convinced you yet, then I don’t know what will…

    Alright alright, one more:

    Putting smileys on your SSRS report

    Conclusion

    In this article I’ve demonstrated three different methods of implementing checkboxes in SSRS reports.  Contrary to what is currently still believed, it is actually possible to add checkboxes to a report making use of just built-in functionality!

    Remember: have fun!

    Valentino.

    References

    MSDN: Indicators

    Share

  5. Counters In SSIS: What Are Your Options?

    The other day I needed a counter in my SSIS Control Flow.  Using the Foreach Loop container I was looping over a bunch of files and needed to count the number of files that got imported successfully.  The counter had to comply with the following requirements:

    • Easy to implement
    • Fast

    Let’s test some different methods on getting this implemented. 

    Counting Possibilities

    Two of the methods are possible as of SSIS 2005 while the third one is new since SSIS 2012. So yeah, the screenshots are taken using SQL Server 2012 RTM.

    Setting The Scene

    We’ll start from a new package and create a package variable of type Int32.  This variable will keep track of the count.

    To be able to performance test the different possibilities, I’m using a For Loop container.  This loop is using another package variable called loop, type Int32, and performs the loop 10,000 times.  In case you don’t know how to set up such a loop, check out this screenshot:

    Using a For Loop container to test task performance

    So my package contains the following two variables:

    Two package variables of type Int32

    The cnt variable is the one to actually hold the count.

    To test performance I’ll be using the command-line dtexec utility so that the debugging overhead does not mess up the numbers.   I’ve also executed each method at least three times to ensure a certain result was not “by accident”.

    Using a Script Task

    The most logical component that came to mind was a Script Task.

    Using a couple lines of code, C# in this case, the counter value can be incremented with one:

    int cnt = (int)Dts.Variables["cnt"].Value;
    Dts.Variables["cnt"].Value = ++cnt;

    The above code assumes that the cnt variable has been put in the ReadWriteVariables property:

    The User::cnt variable is in the ReadWriteVariables for the script to use

    Here’s what that looks like in the Control Flow:

    For Loop container with a Script Task

    So how fast does this execute?

    Looping 10,000 times over the Script Task takes 8 seconds

    About eight seconds, that’s acceptable.

    However, what I don’t like about this method is how long it takes to implement.  I mean, it takes more than just a couple of seconds, right?  And the fact that you actually need to use custom .NET code to perform such a trivial task as adding one to a number.  Using .NET code is good for complex situations, when there’s no other option.  But the risk of a bug is always larger, imagine I wrote cnt++ instead of ++cnt, what do you think the results would be? (Hint: my laptop would crash before the counter reaches 10,000).

    On to another option then!

    Using a Execute SQL Task

    Instead of resorting to .NET code, increasing a number by one is easy to achieve using a simple T-SQL statement, right?  So I thought, let’s try a Execute SQL Task!

    Here’s the query:

    select ? + 1 as cnt

    What does the task look like?

    Using Execute SQL Task to increase a package variable

    ResultSet has been set to Single row.

    The Parameter Mapping page has got the User::cnt variable specified:

    Specifying the User::cnt package variable in the Parameter Mapping page

    And the Result Set page has got the single value from the single row mapped to the User::cnt package variable:

    Mapping the cnt result to the User::cnt variable

    What do you say, easier to implement than the Script method?  I do think so!

    This method has one limitation though: it needs a Connection Manager connecting to a SQL Server database.  However, in most ETL packages you’ll probably have that present already.  What I was a bit worried about though is the overhead of connecting to the server, how much will it be?

    Let’s find out!

    The Execute SQL Task needs almost a minute to increase the counter

    That’s right, using the Execute SQL Task to increment the variable 10,000 times takes about a minute.  On my laptop.  Connecting to the tempdb on my local instance.  When testing this over a network, it even resulted in timings over four minutes.  So this solution is really unacceptable in my opinion.

    However, we can give it one more try.  A Connection Manager has got a property called RetainSameConnection.  By default this is set to False which means that our test above has opened and closed ten thousand connections to my local tempdb.  Oh my, of course that takes a while!

    Setting it to True gives us the following result:

    Setting RetainSameConnection to True speeds up the process by three

    About twenty seconds, which is about one third of the previous result.  That surely is better.  And what’s perhaps even more interesting is that a similar result is achieved when connecting to a database over the network: from over four minutes down to twenty seconds.  So yeah, this would work for me.

    Sidenote: for other interesting use of the RetainSameConnection property, check out this post regarding transactions by Matt Masson.

    But we’re not stopping just yet.  As of SQL Server 2012, we’ve got a third possibility!

    Using an Expression Task (SQL2012!)

    Instead of resorting to custom .NET code or (ab)using the Execute SQL Task, in SSIS 2012 we’ve got a new task: the Expression Task.

    The Expression Task builds and evaluates SSIS expressions, nifty!

    As you can read in the SSIS Toolbox, the Expression Task builds and evaluates SSIS Expressions that set variable values at runtime.  That surely sounds exactly like what we’d need, doesn’t it?

    So how does that work?

    Using the Expression Task to increase the counter

    Really easy to set up, we just specify that the User::cnt variable should be set to itself plus one.  When put in a For Container, we’ve got a counter!

    But how does it perform?

    The Expression Task is the easiest and the fastest method, mission accomplished!

    About seven seconds, which is even slightly faster than the Script Task method!

    With that we’ve found the method that complies with both requirements: easy to implement and performs fast!  Now how am I going to convince my clients to upgrade to 2012, hmm, need to think a bit…

    Conclusion

    We found out that the new Expression Task is a very useful task for its purpose.  In our case we used it to create a counter.

    If you’re not on SQL Server 2012 yet, better stick to either Script Task or Execute SQL Task with RetainSameConnection set to True.

    Have fun!

    Valentino.

    Share

  6. SSIS, Flat Files And Accents (é, è, …)

    This is a follow-up to my article on Loading Complex XML Using SSIS and XSLT.  In that article I demonstrated how you can convert complex XML into simple CSV using XSLT in SSIS.

    Dealing With Special Characters

    If you’ve followed the instructions in my article mentioned above and you need to deal with special characters such as the é and è encountered in the French language, you probably noticed that it wouldn’t really work as expected.  In fact, in your final result you may have ended up with the special characters being replaced with other, even more special, characters.  Obviously not good.

    Here’s an explanation on the reason why that happens, and also how to deal with it.

    Setting The Scene

    Imagine the following sample XML, representing a really huge book collection:

    
    <books>
        <book>
            <title>The Hitchhiker's Guide to the Galaxy</title>
            <author>Douglas Adams</author>
            <language>EN</language>
            <description>The Hitchhiker's Guide to the Galaxy is a science fiction comedy series created by Douglas Adams.</description>
        </book>
        <book>
            <title>Le Trône de fer</title>
            <author>George R.R. Martin</author>
            <language>FR</language>
            <description>Le Trône de fer est une série de romans de fantasy de George R. R. Martin, dont l'écriture et la parution sont en cours. Martin a commencé à l'écrire en 1991 et le premier volume est paru en 1996. Prévue à l'origine comme une trilogie, la série compte désormais cinq volumes publiés et deux autres sont attendus.</description>
        </book>
    </books>
    

    As you can see, the second book in the list is the French version of the first book in the A Song of Ice and Fire series by George R.R. Martin and as it goes with French, there are some accents in the description of the book.

    We’ll use the following XSLT to convert it to CSV:

    
    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs=http://www.w3.org/2001/XMLSchema xmlns:fn="http://www.w3.org/2005/xpath-functions">
      <xsl:output method="text" version="1.0" encoding="UTF-8" indent="no"/>
      <xsl:template match="/">
        <xsl:text>BookTitle;Author;Language;Description</xsl:text>
        <xsl:text>&#13;&#10;</xsl:text>
    
        <xsl:for-each select="books/book">
          <xsl:text>"</xsl:text>
          <xsl:value-of select="title"/>
          <xsl:text>";"</xsl:text>
          <xsl:value-of select="author"/>
          <xsl:text>";"</xsl:text>
          <xsl:value-of select="language"/>
          <xsl:text>";"</xsl:text>
          <xsl:value-of select="description"/>
          <xsl:text>"</xsl:text>
          <xsl:text>&#13;&#10;</xsl:text>
        </xsl:for-each>
    
      </xsl:template>
    </xsl:stylesheet>
    

    Using an XML Task in the Control Flow, as explained in my article, we’d get the following output:

    BookTitle;Author;Language;Description

    "The Hitchhiker’s Guide to the Galaxy";"Douglas Adams";"EN";"The Hitchhiker’s Guide to the Galaxy is a science fiction comedy series created by Douglas Adams."

    "Le Trône de fer";"George R.R. Martin";"FR";"Le Trône de fer (A Song of Ice and Fire) est une série de romans de fantasy de George R. R. Martin, dont l’écriture et la parution sont en cours. Martin a commencé à l’écrire en 1991 et le premier volume est paru en 1996. Prévue à l’origine comme une trilogie, la série compte désormais cinq volumes publiés et deux autres sont attendus."

    So far so good, all accents are still present!

    Then we’d import the file using a Flat File Source component in a Data Flow Task.  Here’s what the General page of the Flat File Connection Manager would look like:

    Flat File Connection Manager: General

    We’ve set double-quote as Text Qualifier and checked the Column names in the first data row textbox.

    Switching to the Columns page we’d get the following:

    Flat File Connection Manager: Columns - the Preview has messed up the accents!

    Hang on, that’s not right!  The Preview is not displaying our accents as expected!  Oh my, what’s going on here? Let’s call the code page detectives!

    A Mismatch Investigation

    Take a good look at the XSLT which we’ve used to convert the XML into CSV, especially the xsl:output line:

    <xsl:output method="text" version="1.0" encoding="UTF-8" indent="no"/>

    That line specifies that the text output should be encoded using the UTF-8 code page.

    Now take a good look at the General page in the screenshot earlier, more precisely this part:

    Code page: 1252 (ANSI - Latin I) is not what we need right now!

    Indeed, code page 1252 (ANSI – Latin I).  While the input is UTF-8.  Of course that results in a mismatch of certain characters, as demonstrated here.  The fix is fairly easy, just change the Code page setting to 65001 (UTF-8).

    Code page: 65001 (UTF-8) - much better!

    If we now switch back to the Columns page we should come to the following result:

    Flat File Connection Manager: Columns page preview with accents!

    Ah, sure looks better doesn’t it?  All accents are present as expected.

    But in case you thought that’s it, I’d advise you to think again.  Don’t worry, I’ll demonstrate what I mean.  Let’s do that by setting up a simple Data Flow.

    Setting Up The Data Flow

    Throw in a Flat File Source and specify our Flat File Connection Manager.  I also prefer to keep NULLs as they come in, using the Retain null values from the source as null values in the data flow checkbox.

    Flat File Source: Connection Manager

    If you click the Preview button you should get similar output as shown one screenshot earlier.

    Now hook this up to an OLE DB Destination that writes the incoming data into a table in your favorite database:

    OLE DB Destination is not happy :(

    As you can see, our destination is not entirely happy with all this.  Here are the details of one of the error messages:

    Validation error. Data Flow Task: Data Flow Task: The column "BookTitle" cannot be processed because more than one code page (65001 and 1252) are specified for it.

    Looks like once more we’ve got a code page conflict.  And we sure do. Clicking the Data Flow connector between the Flat File source and OLE DB destination shows us the following:

    Data Flow Path Editor shows that our strings are encoded using the 65001 code page.

    Each of our incoming string values is encoded using the 65001 (UTF-8) code page.  But our database was created using the Latin1_General_CI_AS collation.  So we’ve indeed got a code page conflict!

    Fear not, that’s easily remedied.  Add a Derived Column transformation in between the source and destination and convert each incoming string value using a cast expression such as this one:

    (DT_STR, 50, 1252)BookTitle_IN

    Note: whenever I need to manipulate incoming columns to create a second version of the same column, I rename the incoming column to TheColumn_IN.  The new version will be called TheColumn and preferably TheColumn is the name of the field in the destination table.  This makes it easy to distinguish all columns later down the flow.

    Here’s what the final version of the Derived Column looks like:

    Using the Derived Column transformation to cast the incoming strings into the correct code page.

     

    Next we’ll need to open the Destination and change the mapped fields to the new ones.  Because my new columns are called exactly the same as the fields in the destination table, I can do that easily.  In the Mappings page, all I need to do is right-click the grey background in between the two tables and click Select All Mappings, hit the Delete button, right-click again and click Map Items By Matching Names:

    Using Map Items By Matching Names, easy!

    With the data flow finished, let’s give our package a run!

    Flat File Source has got a length issue!

    Ouch, our source is not happy!  A closer examination of the Output pane brings us to the following error:

    Error: 0xC02020A1 at Data Flow Task, Flat File Source [16]: Data conversion failed. The data conversion for column "Description" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.".

    Oh right, so far we haven’t bothered looking at the actual length of the data that we’re importing.  Actually, what is the length of our data flow columns??  Well, if you’ve been paying close attention you should have noticed the number 50 several times in the screenshots and expressions above.  That’s indeed the default length for text columns when importing a flat file.

    And if you scroll back up to the sample XML, you’ll notice that the content for the description is longer than 50 characters, thus causing our error!  Let’s find out how to get that solved!

    Fixing The Field Length Issue

    The first step in getting this fixed is opening up the Advanced page in the Flat File Connection Manager editor.

    Flat File Connection Manager: using the Advanced page to change field length.

    Then select the Description field and change its OutputColumnWidth property from 50 to 500.

    That will cause the source to generate a warning.  Remove this warning by opening and closing the source editor.  Click the Yes button in the popup that appears.

    The next step is changing the expression for the Description field in the Derived Column to this:

    (DT_STR,500,1252)Description_IN

    Indeed, the field length is one of the parameters in that cast.  The other numeric parameter is obviously the code page.

    Having done that you’ll notice that the destination will start complaining.  Of course, you’ll need to adapt the destination table to reflect the field length increase as well.  So change the table definition and open/close the destination editor to make it happy.

    Alright, let’s run the package once more!

    Finally the data flow is happy with it all and has inserted two records:

    That's more like it: all components colored green!

    And what does our table contain?  Let’s find out:

    All accents have been imported!

    That’s looking good for sure!

    Conclusion

    In this follow-up article I have demonstrated what might go wrong when you need to deal with special characters while importing flat files, and how to solve your possible issues.  In case you missed the original article, have a look through this link.

    Have fun!

    Valentino.

    References

    Wikipedia: UTF-8

    Share

  7. Building Reports With Dynamic Datasets

    On the forums I regularly encounter questions like:

    I have a report and want to show A, B, C but sometimes I want X, Y, Z and not A, B, C. How?

    Or, in other words:

    I have a parameter and based on the selection I want to return different fields in my dataset.  How can I do that?

    My initial reaction would be “I don’t think you can do that”.  But then I thought it would be useful in certain situations and decided to try it out. And guess what?  You can do that!  Here’s how.

    Just to make sure everyone is on track: this article is not about dynamicity in terms of rows returned, that would be filtered datasets and you’ll already find plenty of references on the internet.  This article is about a varying number of columns in the dataset, which is a little less straightforward.

    The database used in the examples is AdventureWorks2008R2, available at Codeplex.  And the screenshots are taken from SQL Server 2008 R2 x64 SP1.

    The Scenario

    The report we’re going to build should show a list of products sold.  But the report is used by different user groups: those who just want to see the sales numbers and those who also care about stock levels!

    So by default the report should show a list of items and number sold, but it should also be possible to render that same report whilst displaying the stock-related statistics.

    And to make it even a bit more interesting, by default the data should be ordered according to product number but in “stock level mode” the ordering should put those with the lowest current stock first.

    The Report

    Let’s get started!  The first step in creating a report is often the writing of a SELECT statement.  In this scenario we’ll be needing two of them, both in the same dataset.

    The Dataset

    Our dataset is going to need a parameter to be able to decide what type of user is requesting the report. Let’s call that parameter WithStockData, and its type will be Boolean.

    The layout of the dataset statement will be this:

    if @WithStockData = 1
    -- sales and stock data
    else
    -- regular sales data

    A simple IF statement, taking the parameter into account.  The parameter is a boolean value so when it’s True, it equals to 1.

    And here’s the full statement for the dataset:

    if @WithStockData = 1
        -- sales and stock data
        select P.ProductNumber, P.Name
            , P.SafetyStockLevel, P.ReorderPoint
            , SUM(SOD.OrderQty) SoldQuantity, SUM(I.Quantity) InventoryQuantity
            , SUM(I.Quantity) - SUM(SOD.OrderQty) CurrentStock
        from Production.Product P
        inner join Sales.SalesOrderDetail SOD on SOD.ProductID = P.ProductID
        inner join Production.ProductInventory I on I.ProductID = P.ProductID
        group by P.ProductNumber, P.Name, P.SafetyStockLevel, P.ReorderPoint
        order by SUM(I.Quantity) - SUM(SOD.OrderQty) asc
    else
        -- regular sales data
        select P.ProductNumber, P.Name
            , SUM(SOD.OrderQty) SoldQuantity
        from Production.Product P
        inner join Sales.SalesOrderDetail SOD on SOD.ProductID = P.ProductID
        group by P.ProductNumber, P.Name
        order by P.ProductNumber asc;

    Some data as returned by the SELECT in the then part:

    The sales and stock data: 7 fields in total

    And some data as returned by the query in the else part:

    The regular sales data only consists of three columns

    As you can clearly see, the first query returns seven fields while the second one contains only three.  You can also see that both results are ordered differently.

    Now, let’s get the fun started!  Create a new report, set up a data source that points to the AdventureWorks2008R2 database and create a dataset with the query above:

    Setting up the dataset

    Power tip: to create the dataset, do not right-click on the Datasets node in the Report Data pane, but right-click on the Data Source and then select Add Dataset.  That saves you some work because the Data Source will be pre-populated.  All you need to do is paste the query in the Query field and give it a decent Name.

    Click the OK button to close the Dataset Properties.

    Now open the new dataset in the Report Data pane and count its fields:

    Our dataset contains seven fields!

    You should come to seven!  How nice, all of our fields are there.  This is not always the case, but I’ll handle that later.

    The Parameters node in the Report Data pane should now contain a new parameter called @WithStockData:

    The @WithStockData parameter got created automatically

    Double-click it to get to its properties and change the Data type to Boolean (by default it’s Text).

    The WithStockData parameter properties

    If you want, you can also specify a default value.  The value should be either “true” or “false”:

    Our Boolean parameter is set to false as default

    With the dataset fully set up, let’s now move on to visualizing it.

    Displaying Dynamic Columns

    Put a Table on the report Design and set it up as follows:

    The Table contains seven columns

    All seven columns have been added, and to avoid whitespace in the table when some columns are hidden I’ve put the dynamic columns at the end.

    One step remains: even though the values won’t always be present, the columns will not disappear automatically.  To take care of that, we’ll enter an expression on the Hidden property of each column.  The expression looks like this:

    =Fields!InventoryQuantity.IsMissing

    Or visually:

    Using the IsMissing property to hide a column dynamically

    To set up the expression, click the grey area above the column title to select it and then locate the Hidden property in the Properties pane.

    Each column in a dataset has got the IsMissing property.  When its value is True, it means that the column is not present and should thus be hidden.

    And here’s what the rendered report looks like:

    Report with all columns displayed

    Once more, with the parameter set to False:

    The report showing sales data only

    The four stock-related columns are nicely hidden!  And the ordering is working as well because we’ve taken care of that in the dataset’s queries.

    Easy, huh?  Well, sure, but I’ve made sure that the process went as smoothly as possible.  It takes some knowledge on how SSRS actually works.  Let’s make this clear by adapting the example just a little.

    Understanding The Dataset

    Delete the current dataset and create a new one, using the following statement (ensure that you give it the same name as the original one):

    if @WithStockData = 0
        -- regular sales data
        select P.ProductNumber, P.Name
            , SUM(SOD.OrderQty) SoldQuantity
        from Production.Product P
        inner join Sales.SalesOrderDetail SOD on SOD.ProductID = P.ProductID
        group by P.ProductNumber, P.Name
        order by P.ProductNumber asc
    else
        -- sales and stock data
        select P.ProductNumber, P.Name
            , P.SafetyStockLevel, P.ReorderPoint
            , SUM(SOD.OrderQty) SoldQuantity, SUM(I.Quantity) InventoryQuantity
            , SUM(I.Quantity) - SUM(SOD.OrderQty) CurrentStock
        from Production.Product P
        inner join Sales.SalesOrderDetail SOD on SOD.ProductID = P.ProductID
        inner join Production.ProductInventory I on I.ProductID = P.ProductID
        group by P.ProductNumber, P.Name, P.SafetyStockLevel, P.ReorderPoint
        order by SUM(I.Quantity) - SUM(SOD.OrderQty) asc;

    The only difference with the previous version is that the IF condition is reversed and thus the two queries are swapped.

    Now render the report.  What do you see?

    An error occurred during local report processing.

    The definition of the report ‘/DynamicDataset’ is invalid.

    The Hidden expression for the text box ‘Textbox7’ refers to the field ‘InventoryQuantity’. Report item expressions can only refer to fields within the current dataset scope or, if inside an aggregate, the specified dataset scope.

    Letters in the names of fields must use the correct case.

    Oh my, it’s broken!

    Now take a good look at the available dataset fields:

    The dataset only has three columns, oh my!

    That’s right, only three!  Four of them have gone missing!  The reason for that is because SSRS uses the first SELECT query it encounters in the whole statement to determine the available fields.  It’s not able to automatically detect the different situations and create all the fields that can possibly be returned.

    One way to ensure all fields are created is to put the query that returns all possible fields as first query, which is what I initially did.  But of course that’s not always an option.

    Manually Adding Fields To A Dataset

    Luckily it’s possible to manually add fields to the dataset.  You can do this by clicking the Add button in the Fields page of the Dataset Properties and then selecting Query Field.

    Manually adding additional fields to the dataset

    So, add the four missing fields:

    The four additional fields added manually

    If you now render the report, it should behave exactly the same as in the initial version!

    What About Those Warnings?

    If you’re someone who pays attention to the Error List pane, you may have noticed some warnings.  Two for each dynamic field to be exact.

    Here’s an example of the rsMissingFieldInDataSet warning:

    Warning    1    [rsMissingFieldInDataSet] The dataset ‘dsProductSales’ contains a definition for the Field ‘SafetyStockLevel’. This field is missing from the returned result set from the data source.    C:\test\SSRS\SSRS2008\DynamicDataset.rdl    0    0

    And here’s the rsErrorReadingDataSetField warning:

    Warning    2    [rsErrorReadingDataSetField] The dataset ‘dsProductSales’ contains a definition for the Field ‘SafetyStockLevel’. The data extension returned an error during reading the field. There is no data for the field at position 4.    C:\test\SSRS\SSRS2008\DynamicDataset.rdl    0    0

    I’ve got a developer background, so I always try to remove all warnings.  So if you really want to get rid of those warnings too, even that’s possible.  But it will require some Custom Code writing.  I already covered that topic some years ago, when I wrote an article about Detecting Missing Fields.

    Conclusion

    With this article I believe to have demonstrated that datasets can be quite flexible, even though it doesn’t always seem so.

    Have fun!

    Valentino.

    Share

  8. Passing A DateTime Parameter Into The Execute SQL Task

    When you’ve used SSIS for a while, you may have run into the following situation already.  Or maybe today is your first time and that’s the reason that you’ve arrived here.

    “Huh, what’s he talking about?”, I hear you thinking.  Read on then. :-)

    The Scenario

    You’ve got a stored procedure or another SQL statement that needs to get called from the Execute SQL Task in the Control Flow of your package.  So far so good.  One of the parameters that needs to get passed into the statement is of the DateTime type.

    How would you do that?

    Parameter Mapping – Take 1

    Following the KISS principle, let’s say we’ve got the following really complex table in our database:

    create table dt ( dtVal datetime );

    And in our Execute SQL task we have this extremely complex INSERT statement:

    insert into dt values (?)

    The statement is expecting one parameter.  The parameter that I want to pass into it is System::StartTime which is of type DateTime as shown in the screenshot below.

    Show the system variables by activating the Show System Variables button

    “Hang on, how did you get the Variables window to display the system variables?”

    Ah, good question, by clicking that Show System Variables button, indicated with the red rectangle.

    So you set up the Parameter Mapping as follows, specifying DBTIMESTAMP as Data Type and zero as Parameter Name because it’s the first parameter in the statement:

    Execute SQL Task: Parameter Mapping

    Then you decide to give it a test run.  But alas, it throws you the following error:

    Error: 0xC002F210 at Execute SQL Task 1, Execute SQL Task: Executing the query “insert into dt values (?)” failed with the following error: “Invalid time format”. Possible failure reasons: Problems with the query, “ResultSet” property not set correctly, parameters not set correctly, or connection not established correctly.

    So we’ve got a datetime column in the table and we’ve got a DateTime package variable.  But alas, the Execute SQL Task is not happy with passing this value to the query.

    Now what?

    The SqlStatementSource Expression

    Let’s try another method then.  Instead of passing the parameter’s value through the Parameter Mapping page, we’ll set up an expression that constructs the whole INSERT statement, including the parameter’s value.

    Have a look at the following expression:

    "insert into dt values ('" +
    
    (DT_STR, 4, 1252) DATEPART("yyyy", @[System::StartTime]) + "-" +
    
    (DT_STR, 2, 1252) DATEPART("mm", @[System::StartTime]) + "-" +
    
    (DT_STR, 2, 1252) DATEPART("dd", @[System::StartTime]) + " " +
    
    (DT_STR, 2, 1252) DATEPART("hh", @[System::StartTime]) + ":" +
    
    (DT_STR, 2, 1252) DATEPART("mi", @[System::StartTime]) + ":" +
    
    (DT_STR, 2, 1252) DATEPART("ss", @[System::StartTime]) + "." +
    
    (DT_STR, 3, 1252) DATEPART("ms", @[System::StartTime]) + "')"

    It uses the DATEPART function to fetch parts of the System::StartTime variable and feed it into the INSERT statement using a format that works all the time (YYYY-MM-DD HH:MM:SS.MIL).  Here’s what it generated when I clicked the Evaluate Expression button in the Expression Builder:

    insert into dt values (’2011-5-31 17:59:37.0′)

    So where exactly would you specify that expression?  In the Execute SQL Task editor, open up the Expressions page.  Then click the Expressions item in the Misc list so that the button with the ellipsis appears.  Now click that button, select SqlStatementSource as property and click the Ellipsis button in the Expression field to get to the Expression Builder.

    Then you’ll end up with something like this:

    The Property Expressions Editor with an expression specified for the SqlStatementSource property

    Give the package another run.  If everything has been set up as expected, the Execute SQL Task should color green and a select on the table should give one record:

    Our test table contains one timestamp!

    Hang on, does it really have to be this complicated?

    Well, maybe not…

    Parameter Mapping – Take 2

    So let’s give the Parameter Mapping another go.

    Set up the Execute SQL Task just like in Take 1 above, with one small difference: select DATE instead of DBTIMESTAMP as Data Type for the parameter.

    Choose DATE as Data Type when passing a DateTime package variable into the Execute SQL Task

    Now give the package another run.  Look at that, it colors green and there’s an extra record in the table:

    An extra timestamp was written to the table

    DATE doesn’t seem like the most logical type to choose in this scenario, which is why most people won’t even consider it.  But it works!  Actually, “DATE” is not really the best name that could be given to this particular data type.  Here’s the description of DT_DATE (not to be confused with DT_DBDATE!) according to MSDN:

    A date structure that consists of year, month, day, hour, minute, seconds, and fractional seconds. The fractional seconds have a fixed scale of 7 digits.

    The DT_DATE data type is implemented using an 8-byte floating-point number. Days are represented by whole number increments, starting with 30 December 1899, and midnight as time zero. Hour values are expressed as the absolute value of the fractional part of the number. However, a floating point value cannot represent all real values; therefore, there are limits on the range of dates that can be presented in DT_DATE.

    On the other hand, DT_DBTIMESTAMP is represented by a structure that internally has individual fields for year, month, day, hours, minutes, seconds, and milliseconds. This data type has larger limits on ranges of the dates it can present.

    What this means is that you have to be careful when using this type.  Even though it works fine today, it may not run fine in a similar scenario that required different date ranges.  But obviously you’ve got that covered by your unit test scenarios!

    Conclusion

    In this article I have demonstrated how a DateTime package variable can be passed as parameter into the Execute SQL Task in more than one different way.  My method of preference is the one using the DATE type in the Parameter Mapping.

    Have fun!

    Valentino.

    References

    SSIS Execute SQL Task

    SSIS DatePart function

    KISS Principle

    SSIS Junkie: Datetime variables don’t always do what you expect

    Share

  9. Loading Complex XML Using SSIS

    In my previous article I showed you how the XML Source component can be used to load XML files into a SQL Server database, using fairly simple XML structures.  In this follow-up article I will demonstrate how to tackle the complex XML issue.

    The Complex XML Example

    You probably know that SSRS reports, RDLs, are actually XML files.  And they’re not the easiest types of XML files around.  To humans they are still readable but the structure can be quite complex.  So there we’ve got our example: an RDL.  More specifically I’ll be using the RDL that’s available for download in one of my earlier articles.

    The Goal

    Every good example has got a goal.  Our goal today is to retrieve a list of datasets and fields as defined in the RDL.  Shouldn’t be too difficult, right?

    Using The XML Source Component

    Let’s try to get this done through the XML Source component with which we’re very familiar by now.  You know the drill: drag an XML Source into your Data Flow, open it up and configure the XML and XSD locations.

    Note: to be able to do this I cheated a bit by manually manipulating the RDL a little.  More precisely I removed all the namespace references from the <report> tag and further down the XML (removed “rd:”).

    With both files configured, let’s have a look at the Columns page:

    The XML Source component handling a really complex XML file

    Look at that massive list of output flows!  In total I’ve gotten 45 of them, all for free!  Even if you’re up to the task of creating 45 output tables, do you really want to find out how to get these joined together?  To prevent creating that bunch of tables you may consider using the Merge Join component… 45 times in your data flow. Didn’t think so!

    Sure, it would run fine if you manage to get it all constructed.  But in my opinion this is just too silly to try out because there’s an interesting alternative.

    And that alternative is XSLT – eXtensible Stylesheet Language Transformations.

    Using XSLT

    With XSLT you describe what you want to retrieve from the XML document and what it should look like.  In this example we’ll be retrieving the list of datasets and their fields, in CSV format.  CSV stands for Comma-Separated Values, although I prefer the term “Character-Separated Values” as the separator is not always a comma.

    To be able to write correct XSLT, you need to know what the XML structure looks like.  Here are the first 31 lines of the sample RDL file mentioned earlier.

    <?xml version="1.0" encoding="utf-8"?>
    <Report>
      <AutoRefresh>0</AutoRefresh>
      <InitialPageName>A Very Unique Name</InitialPageName>
      <DataSources>
        <DataSource Name="srcContosoDW">
          <DataSourceReference>ContosoDW</DataSourceReference>
          <SecurityType>None</SecurityType>
          <DataSourceID>b7a3d32c-e95d-4acf-bb99-9d60755303ea</DataSourceID>
        </DataSource>
      </DataSources>
      <DataSets>
        <DataSet Name="dsProductList">
          <Query>
            <DataSourceName>srcContosoDW</DataSourceName>
            <CommandText>select DPC.ProductCategoryName, DPS.ProductSubcategoryName, DP.ProductName
    from dbo.DimProduct DP
    inner join dbo.DimProductSubcategory DPS
        on DPS.ProductSubcategoryKey = DP.ProductSubcategoryKey
    inner join dbo.DimProductCategory DPC
        on DPC.ProductCategoryKey = DPS.ProductCategoryKey;</CommandText>
          </Query>
          <Fields>
            <Field Name="ProductCategoryName">
              <DataField>ProductCategoryName</DataField>
              <TypeName>System.String</TypeName>
            </Field>
            <Field Name="ProductSubcategoryName">
              <DataField>ProductSubcategoryName</DataField>
              <TypeName>System.String</TypeName>
            </Field>

    As you can see, the main node is called Report.  Nested under Report we’ve got DataSets, which can have several DataSet elements.  Each DataSet has a set of Fields with one or more Field elements.  Using that information we come to the following XSLT.

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions">
      <xsl:output method="text" version="1.0" encoding="UTF-8" indent="no"/>
      <xsl:template match="/">
        <xsl:text>DataSource;DataSet;Field</xsl:text>
        <xsl:text>&#13;&#10;</xsl:text>
    
        <xsl:for-each select="Report/DataSets/DataSet/Fields/Field">
          <xsl:text>"</xsl:text>
          <xsl:value-of select="../../Query/DataSourceName"/>
          <xsl:text>";"</xsl:text>
          <xsl:value-of select="../../@Name"/>
          <xsl:text>";"</xsl:text>
          <xsl:value-of select="@Name"/>
          <xsl:text>"</xsl:text>
          <xsl:text>&#13;&#10;</xsl:text>
        </xsl:for-each>
    
      </xsl:template>
    </xsl:stylesheet>

    So, what is the XSLT describing?  On line three, we say that the output should be text in UTF-8 encoding.  The “template match” on the fourth line takes the whole XML document into consideration, hence the forward slash.  Then on line five we start writing output through the xsl:text tag.  This is our header line.  As you can see we’re using the semi-colon as column separator in the CSV output.  Line six adds a CRLF (carriage-return + line feed) to the output.

    Then the fun part starts.  If you have experience with XPath, the way XSLT walks through the XML document should look familiar to you.

    The xsl:for-each tag loops over all the Fields in all the DataSets in the document.

    Using the xsl:value-of tag, we can fetch values out of the XML.  The first value being retrieved is the name of the data source that dataset is using.  (I’ve added the retrieval of the data source to demonstrate how element values are retrieved.)  The path to the DataSourceName element is Report/DataSets/DataSet/Query/ so we use the double-dot syntax to navigate two levels up in the XML tree.  The value of the element itself is retrieved by just using its name, as demonstrated in the XSLT above.

    The next value-of tag retrieves the Name attribute of the DataSet, hence the two levels up, and the final value-of fetches the Name attribute of the Field element.

    Now that the XSLT is clear for everyone, how do we apply it to our XML document?  Here comes the time for SSIS once more!

    Open up the BIDS with the Control Flow of an SSIS package active and throw in an XML Task component.

    The XML Task, one of the Control Flow Items in Integration Services

    Double-click the component to open up the XML Task Editor.  This is what it looks like by default:

    XML Task Editor: default settings

    As this is an all-round XML task that can handle several XML-related tasks, the first setting that we need to modify is called OperationType.  That’s not too complicated because it comes with a dropdown and XSLT is one of the possible values.

    The different operation types supported by the XML Task

    With XSLT selected, the editor transforms into the following:

    The XML Task Editor with XSLT as OperationType

    Now we need to configure where the task can find our XML file, through the Source property.  Click the Source textbox to make the dropdown appear and select <New File connection…>.

    You can create a new File Connection through the XML Task Editor

    In the File Connection Manager Editor, leave the Usage type at Existing file and select the RDL.

    Next up we’re going to specify where the task can find the XSLT that needs to be applied to the XML.  That can be done through the Second Operand settings.  As SecondOperandType, select File Connection.  Use the dropdown of the SecondOperand property to create a second new file connection that points to your XSLT file.

    With that set up as well, only one step remains.  The task still doesn’t know where the output should be saved.  Or that it actually should get saved.  So first switch the SaveOperationResult property to True.  As you can see, DestinationType is already set to File Connection, that’s what we need.  Use the dropdown of the Destination property to create a third new file connection.  This time however, Usage Type should be set to Create File.  Specify path and filename for the output file and click OK to close the File Connection Manager Editor.

    This is what our XML Task now looks like in the editor:

    The XML Task Editor with all input and output files specified, as expected for our XSLT experiment

    As shown above, I’ve called the output file DatasetInfo.csv.

    One more property that can be interesting is the OverwriteDestination property.  Setting it to True can ease the testing of your package if you need to execute it multiple times.  Which you’ll probably want when your XSLT is not giving the expected output.  Don’t forget to set it to False afterwards (depending on what behavior you actually expect from your package).

    Okay, now close the XML Task Editor and execute the package.  If you haven’t made any mistakes, the task should color green and you should have an extra file on your hard drive somewhere.  Here’s what the content of my DatasetInfo.csv looks like:

    DataSource;DataSet;Field

    “srcContosoDW”;”dsProductList”;”ProductCategoryName”

    “srcContosoDW”;”dsProductList”;”ProductSubcategoryName”

    “srcContosoDW”;”dsProductList”;”ProductName”

    “srcContosoDW”;”dsProductList”;”ProductCategoryColor”

    “srcContosoDW”;”dsProductList”;”EasterEgg”

    Look at that, a list of fields, all part of the dsProductList dataset.

    “Hang on, wasn’t this article going to demonstrate how to get complex XML files imported into our database?  And now you’re writing the data to a file?!”

    Well yeah, you’re right.  Unfortunately the XML Task does not offer the possibility to write to a table in a database.  So to get the data imported into your database you’ll need to set up a Data Flow that imports the CSV files.  But that shouldn’t be too difficult to achieve, right?

    Mission accomplished!

    Conclusion

    With this article I have shown how Integration Services can be used to retrieve data out of complex XML files, without actually using the XML Source component.  I hope you’ve enjoyed reading it as much as I had while writing.  Or maybe you know another interesting method to get complex XML imported.  Feel free to post comments!

    Have fun!

    Valentino.

    References

    XSLT (Wikipedia)

    CSV (Wikipedia)

    XML Task (MDSN)

    Share

  10. Loading XML Using SSIS

    SQL Server Integration Services can read XML files, that’s known by every BI developer.  (If you didn’t, don’t worry, I’m aiming this article at newcomers as well.)

    But how far can you go?  When does the XML Source component become unusable?  Let’s find out!

    To create the examples I’m using the following SQL Server version:

    Microsoft SQL Server 2008 R2 (RTM) – 10.50.1600.1 (X64)   Apr  2 2010 15:48:46   Copyright (c) Microsoft Corporation  Enterprise Edition (64-bit) on Windows NT 6.1 <X64> (Build 7600: )

    Basic Example

    This first example is a really simple XML file containing a list of colors with their corresponding RGB code.

    <colors>
      <color RGB="FF0000">Red</color>
      <color RGB="00FF00">Green</color>
      <color RGB="0000FF">Blue</color>
      <color RGB="FFFFFF">White</color>
      <color RGB="000000">Black</color>
    </colors>

    Let’s import this into a database.  Open up the BIDS, create an SSIS project and throw a Data Flow Task into the package and open it up.

    The component that we’re now most interested in is the XML Source, one of the components in the Data Flow Sources category in the Toolbox.

    The XML Source component

    Add one of those to your Data Flow and double-click it to open up the XML Source Editor.

    The Data Access Mode should be set to XML file location, which is the default setting.  The other options are XML file from variable – useful if you’ve got the file path and name of the XML file in a variable – and XML data from variable – interesting if your XML data is actually stored in a variable.

    As XML Location, select the .xml file.  Our XML sample does not have an inline schema, so we can’t use that checkbox.  And we can’t click the OK button either, it’s grayed out.  The source component really expects a description of the XML structure before the editor can be closed.

    The bottom of the screen even shows a warning with the following message:

    XML Schema (XSD) is not specified. Select an existing XSD or click Generate XSD to create an XSD from the XML file.

    So, what are you waiting for,  Click the Generate XSD button to let the XML Source Editor generate the XSD schema for us.  Real easy, right?

    Remember where you save the file, and when it’s generated, select the .xsd file in the XSD location textbox.  As you can see, the OK button will become available.  But don’t click it just yet.

    Here’s what the XML Source Editor now looks like:

    XML Source Editor with an XML and XSD file specified

    Let’s now move on to the second page of the XML Source Editor, called Columns.  When you open it, you’ll receive the following popup with a couple of warnings:

    Warning gets displayed when opening the Columns page

    The editor is letting us know that the columns that are being generated do not have a maximum length specified.  So it’s setting them to Unicode (DT_WSTR) with a length of 255.  Click the OK button to get rid of that message and to be able to see the generated columns.

    Note: if your data elements or attributes may contain longer strings then you should have a look at modifying the length specification.  This can be done through the Advanced Editor, which is opened by right-clicking the XML Source.  The Input and Output Properties page is the one you’re after.

    The Columns page, showing the columns that the XML Source generated for us

    As you can see, our only attribute – RGB, is nicely put in a column with the same name.  The value of each <color> node however is not put in a column called Color.  By default, this value is put into a column called “text”.  Which is a weird name for a column in an SSIS data flow if you ask me.  The good thing is that you can just rename it by changing the Output Column value.

    Let’s test this out.  My favorite way is to add a Multicast component to the Data Flow, then add a Data Viewer on the connector (right-click the green arrow, select Data Viewers, click Add > OK > OK).  Now execute the package to get this result:

    Testing the output of the XML Source through the Data Viewer

    Mission accomplished, we’ve retrieved data from a very basic XML file!

    Adding Some Complexity

    Let’s move on to the second example of this article.  The difference with the previous example is that now we’ve got multiple nested structures to deal with.

    The example represents a list of book reviews, including some details on the books themselves.  A book can have multiple writers and obviously multiple reviews as well.

    <books>
        <book pages="300">
            <title>Microsoft SQL Server 2008 R2 Master Data Services</title>
            <category>Information Technology</category>
            <authors>
                <author>Jeremy Kashel</author>
                <author>Tim Kent</author>
                <author>Martyn Bullerwell</author>
            </authors>
            <reviews>
                <review>If you're looking for an excellent book on the new Master Data Services component of SQL Server 2008 R2, definitely check this one out!  To be released in June 2011 by Packt Publishing!</review>
            </reviews>
        </book>
        <book pages="832">
            <title>Inside Microsoft SQL Server 2008: T-SQL Querying</title>
            <category>Information Technology</category>
            <authors>
          <author>Itzik Ben-gan</author>
          <author>Lubor Kollar</author>
          <author>Dejan Sarka</author>
          <author>Steve Kass</author>
            </authors>
            <reviews>
                <review>Every "Inside SQL Server" book can be recommended, especially when written by Itzik!</review>
            </reviews>
        </book>
        <book pages="1137">
            <title>The Lord of the Rings</title>
            <category>Fantasy</category>
            <authors>
                <author>J.R.R. Tolkien</author>
            </authors>
            <reviews>
                <review>Like fantasy?  What are you waiting for then?  It's a classic!</review>
                <review>If you liked the movie, you'll love the book.</review>
            </reviews>
        </book>
    </books>

    Configure an XML Source so that it uses the books.xml file, generate the XSD and specify its location.  I’m not going into details on that, the procedure is the same as in our first example above.

    Now open up the Columns page to have a closer look at how the XML data is going to get imported.

    XML Source generates multiple=

    So how does the XML Source component deal with the multiple nested structures?  It generates multiple outputs!  If you select another output from that dropdown, you get to see its fields.

    To get a clear understanding of what exactly is going on, let’s connect each output with an OLE DB Destination component.  The target table can be generated based on the incoming fields by clicking the New button.  Replace the table name in the generated CREATE TABLE script with a clear one that fulfills your naming convention requirements – such as NO SPACES IN A TABLE NAME for instance – and hit the OK button.

    Destinatio table can be generated by using the New button in the OLE DB Destination Editor

    Now that the table is created, it will be automatically selected in the Name of the table or the view dropdown.  Don’t forget to visit the Mappings page so that the, well, mappings are created.  If no field names were modified in the CREATE TABLE script then all fields should be mapped automatically based on their names.

    With all five destinations added, execute the package.

    Each XML Source output is connected to an OLE DB Destination - executes fine

    So now we’ve loaded the data from our XML file into a database, but the data is spread over five tables.  How do we retrieve that data?  Join them together!

    select * from XML_book
    inner join XML_authors on XML_authors.book_Id = XML_book.book_Id
    inner join XML_author on XML_author.authors_Id = XML_authors.authors_Id
    inner join XML_reviews on XML_reviews.book_Id = XML_book.book_Id
    inner join XML_review on XML_review.reviews_Id = XML_reviews.reviews_Id;

    And the result looks like this:

    XML data imported into the SQL Server database

    Conclusion

    We have managed to flatten the data from an XML file containing multiple nested repeating nodes, nice huh?  But do you also feel the limitation using this method?  In terms of modern XML, this was still a fairly easy XML file and yet we already needed five tables to store the data.  Can you imagine what this will give with a really complex file?

    Watch this blog for the follow-up article where I will try to import data from a really complex XML file!

    Have fun!

    Valentino.

    References

    XML Source component

    Share

  1. 1
  2. Next ›
  3. Last »