DBPedias

Your Database Knowledge Community

Chris Webb

  1. Accumulating Data In An Excel Table Using Data Explorer and PowerPivot

    One of the first questions I get asked after showing someone PowerPivot for the first time is “Can I add new data to a PowerPivot table that already has data in it?”. Out of the box, of course, the answer is no: when you process a table in PowerPivot you have to reload all the data from your data source, you can’t just append new data (unless you’re using copy/paste to load data, which isn’t a good idea). However, there are a lot of self-service BI scenarios where the ability to do this would be extremely useful: for example, you might want to scrape stock quotes from a web page every day and then, in an Excel workbook, accumulate that data in a table so you can analyse historical stock prices with PowerPivot. I ran into a scenario very much like this last week and I thought that Data Explorer should be able to help here. It can, but it’s not obvious how to do it – hence this blog post!

    Here’s a super-simple example of how to accumulate data in a table then. Let’s start with a csv file that contains the following data:

    Product,Sales
    Apples,1
    Oranges,2

    It’s straightforward to import this data into Excel using Data Explorer and the ‘From csv’ data source:

    image

     

    Here’s the code that Data Explorer generates:

    let

        Source = Csv.Document(File.Contents("C:\InputData.csv")),

        FirstRowAsHeader = Table.PromoteHeaders(Source),

        ChangedType = Table.TransformColumnTypes(FirstRowAsHeader,

                                  {{"Product", type text}, {"Sales", type number}})

    in

        ChangedType

     

    Now, let’s imagine that you want to keep the data from this file in Excel and every time you click Refresh in Data Explorer you add the data from the file onto the end of the existing data you’ve already captured. The first thing you’ll probably want to do in this scenario is add a new column to the data that gives the date and time that the data was loaded, and you can do that quite easily in Data Explorer using the DateTimeZone.UtcNow() function as follows:

    Table.AddColumn(ChangedType, "Load Date", each DateTimeZone.UtcNow())

    Data Explorer has functionality to append the data from one query onto the end of another query, but the problem you have to solve now is that when you click Refresh you want the new data to be appended onto the end of the data that has already been collected. It’s a recursive scenario not unlike the one I grappled with here. The solution to this problem is to first of all load the data into the PowerPivot (ie what we should be calling the Excel Data Model now) by clicking on the Load To Data Model link in the Data Explorer query pane:

    image

    Then, on a new sheet, create an Excel query table that returns all the data from the PowerPivot table that you’ve just loaded data into. Kasper shows how to do this here; there’s no need for any special DAX, you just need to connect to the PowerPivot table in the Existing Connections dialog:

    image

    image

    At this point you should have two tables on two sheets that contain the same data. The next step is to modify the original Data Explorer query so that it contains a new step that appends data from the table you’ve just created (ie the table getting the data from PowerPivot) onto the data from the csv file. This can be done with three new steps, first to get the data from the new Excel table:

    Excel.CurrentWorkbook(){[Name="ExistingData"]}[Content]

    Then to make sure the Load Date is treated as a DateTimeZone type:

    Table.TransformColumnTypes(GetExistingData,{{"Load Date", type datetimezone}})

    Then finally to combine the two tables:

    Table.Combine({ChangedType1,InsertedCustom})

    Now, whenever you Refresh your Data Explorer query, you will see the data from the csv file appended to the data that has already been loaded:

    image

    image

    Here’s the complete code:

    let

        Source = Csv.Document(File.Contents("C:\InputData.csv")),

        FirstRowAsHeader = Table.PromoteHeaders(Source),

        ChangedType = Table.TransformColumnTypes(FirstRowAsHeader,

                      {{"Product", type text}, {"Sales", type number}}),

        InsertedCustom = Table.AddColumn(ChangedType, "Load Date", each DateTimeZone.UtcNow()),

        Custom1 = Excel.CurrentWorkbook(){[Name="Table_Input_Data"]}[Content],

        ChangedType1 = Table.TransformColumnTypes(Custom1,{{"Load Date", type datetimezone}}),

        Custom2 = Table.Combine({ChangedType1,InsertedCustom})

    in

        Custom2

    Now as I said, this is just a super-simple example and in the real world you’d need extra functionality to do things like delete rows you’ve already loaded and so on; but that’s all doable I think. It’s also worth mentioning that I encountered some strange errors and behaviour when implementing this, partly due to Data Explorer still being in preview I guess, so if you want to recreate this query you’ll need to follow my instructions exactly.

    You can download the sample workbook here, and the csv file here.


  2. UseRelationship() and Tabular Row Security

    Quick summary: DAX measures in SSAS Tabular that use the UseRelationship() function return an error when row security is applied to a table. I’m surprised this hasn’t been documented somewhere – I know Marco came across it some time ago, but I ran into it again recently so I thought I’d mention it.

    Consider the following simple SSAS Tabular model, based on Adventure Works DW:

    image

    There’s an active relationship between DateKey and OrderDateKey, and an inactive relationship between DateKey and ShipDateKey. The following measure returns the sum of Sales Amount and activates the inactive relationship:

    Sales Amount by Ship Date:=
    CALCULATE(SUM([SalesAmount]), USERELATIONSHIP(FactInternetSales[ShipDateKey], DimDate[DateKey]))

    image

    However, when there’s row-level security defined on the DimDate table (though not FactInternetSales) you will see an error for this measure when you browse the model:

    image

    image

    ERROR – CALCULATION ABORTED: USERELATIONSHIP function cannot be used while querying table ‘FactInternetSales’ because of the row level security defined on table ‘DimDate’.

    No workaround, I’m afraid, but this isn’t a bug, it’s a known limitation.


  3. SQLBits XI Summary

    So, another SQLBits is over. After the London event last year, we (ie the SQLBits Committee, which I’m a member of) decided to scale things back a bit and return to a more manageable, friendly size, and to concentrate more on making the conference fun to attend. That’s not to say we didn’t want to maintain our high standards regarding content – and yet again we had some great sessions from world-class speakers – but a conference isn’t just about the presentations, it’s also about networking, meeting people face-to-face that you’ve only had contact with online, and having a few beers to facilitate this. As SQL Server professionals we’re a lot better off as part of a wider community: in terms of our technical knowledge, in terms of who we know to ask for help when we hit a problem, in terms of finding our next job, and in many other ways. I hope SQLBits does its bit to help build that community.

    You can see what people are saying about SQLBits by following @SQLBits and searching for the #SQLBits hashtag on Twitter, and liking the SQLBits Facebook page; there are some eye-popping photos there, not to mention a video of my performance in the pie-eating competition. If you were there and you’ve got more photos and videos, please share them!

    It only remains for me to thank the rest of the committee, Simon, Martin, JRJ, Darren, Chris T-O, Tim and Allan; our team of helpers, ably led by Annette; Helen, for her work on the party and merchandising; our sponsors; our speakers; and of course everyone who attended and made this the best SQLBits so far. I know I always say that, but it really is true.


  4. PowerPivot Workbook Size Optimizer

    Browsing through my RSS feeds this morning, I saw a new download on the Microsoft site: an Excel addin (Excel 2013 only, I think) called the PowerPivot Workbook Size Optimizer. You can get it here:
    http://www.microsoft.com/en-us/download/details.aspx?id=38793

    Here’s the blurb from the site:

    The Workbook Size optimizer for Excel can better compress data inside workbooks that use PowerPivot or PowerView if this data comes from external data sources. The best size compression can be achieved for workbooks based on SQL Server databases and there are a few tricks we can do for other SQL datasources as well. The optimizer will install as an add in to excel and will provide you with a nice wizard to better compress the size of your workbook. Using the optimizer you can often get more than 1,000,000 rows datasets in a workbook under 10 MB, share it in SharePointOnline and interact withit using the Excel Web App in any browser.

    Here’s a screenshot:

    image

    Despite a testing a few models with data from Adventure Works I couldn’t get it to suggest any changes (it didn’t spot that I had imported a column containing binary data, hmmm) but I guess it needs more testing on larger/more diverse data sources. Maybe there’s a blog post coming from the PowerPivot team coming soon explaining how to use this?


  5. SSAS on Windows Azure Virtual Machines

    You may have already seen the announcement about Windows Azure Virtual Machines today; what isn’t immediately clear (thanks to Teo Lachev for the link) is that Analysis Services 2012 Multidimensional and Reporting Services are installed on the new SQL Server images. For more details, see:
    http://msdn.microsoft.com/en-us/library/jj992719.aspx

    SSAS 2012 Tabular is also supported but not initially installed.


  6. Applying a Function to Every Cell in a Table in Data Explorer

    Now that the PASS Business Analytics Conference is over, I can get back to playing around with Data Explorer and blogging about it. I’ve been working on a fun demo that I’ll try to blog about later this week, but in the course of creating this demo I came across a technique that I didn’t end up using but which I thought deserved a post on its own: how to apply a function to every cell in a table, rather than just every cell in a column.

    For example, let’s imagine that you have a table that looks like this:

    image

    …and you want to add one to every single cell in the table, so you get:

    image

    It’s possible in the UI by creating lots of custom columns and then deleting the original columns, for sure, but I found a more elegant solution. Here’s the full DE code:

    let

        Source = Excel.CurrentWorkbook(){[Name="Input"]}[Content],

        FunctionToApply = (x) =>; x + 1,

        GetColumnNames = Table.ColumnNames(Source),

        TransformList = List.Transform(GetColumnNames, each {_ , FunctionToApply}),

        Output = Table.TransformColumns(Source, TransformList)

    in

        Output

    Here’s what each step does:

    • Source: gets the data from the Excel table named Input

      image

    • FunctionToApply: defines a new function that takes a single parameter, x, and returns the value x+1. See this post for more details on using function in Data Explorer.image
    • GetColumnNames: returns a list object which contains the names of all of the columns in the table returned in the Source step.

      image

    • TransformList: this is the interesting step! It creates a new list based on GetColumnNames, but whereas GetColumnNames contains just one record per item in the list (the column name), this returns a list of lists, each with two items: the column name and a reference to the function FunctionToApply. It looks like this in the UI:

      imageIf you click on the first of the list links shown (and this isn’t part of the process, I’m only doing this to show what’s there) you see the following:

      image

    • Output: the list of lists created in the previous step can now be passed to the Table.TransformColumns() function to apply the function FunctionToApply() to every cell in every column in the table.

      image

    I’m sure this is going to be useful to me at some point in the future… You can download the sample workbook here.


  7. PASS Business Analytics Conference Summary

    The PASS Business Analytics Conference in Chicago finished yesterday, and because I was there and because I did a fair amount of cheerleading for it (see here and here for example) I wanted to post a few thoughts on how it went.

    I’ll be honest and say that I had a few worries before the event. Would anyone want to go? Would the sessions be a repeat of what get on the BI tracks at the PASS Summit and hundreds of other SQL Server conferences? In fact, everything went really, really well. Some of the sessions were quite sparsely attended (though this had nothing to do with the quality of the content – some of the best sessions didn’t get much of an audience) but overall there was a very respectable number of people (1200ish?). I had as many people in my session on OData as I’d get at any other large conference, and it was standing room only in at least one of Marco’s sessions. I also rather liked the fact that it was smaller than the Summit – it made it much easier to meet all the people I wanted to meet. If it carries on for a few years it could easily attract a much larger number of people.

    Regarding the content I was particularly pleased because a lot of the topics I’d asked for turned up on the schedule. In fact one thing that struck me (and a few other people said the same thing to me as well) was that this was the first conference I’d been to in a long time where there were sessions that I really wanted to see in every time slot. My favourite session of the whole conference was Marc Smith on NodeXL; anyone that reads my blog knows I’ve been a big fan of NodeXL for a long time, but I learned a lot from this session because it concentrated on the basics of social network analysis rather than the tool itself. This was a prime example of the kind of topic that you simply wouldn’t get at a regular SQL Server conference – it was a business analytics session. Even the more technical presentations, such as the one on HPC Services for Excel, was outside the usual boundaries of SQL Server BI. Incidentally, I must get round to playing with HPC Services for Excel – you could use it to parallelise some DAX calculations, or even to batch process large numbers of PowerPivot models on desktop machines overnight…

    So, in summary, the conference was a big success and I had a great time. I’ll definitely be going back next year. And did I mention that I got to meet Steven Levitt of Freakonomics fame?


  8. GeoFlow Public Preview Available

    First big news from the PASS BA Conference: the public preview for GeoFlow is now available. You can download it here:
    http://www.microsoft.com/en-us/download/details.aspx?id=38395

    Here are the official announcements with all the details:
    http://blogs.technet.com/b/dataplatforminsider/archive/2013/04/11/day-2-pass-business-analytics-conference-new-3d-mapping-analytics-tool-for-excel.aspx
    http://blogs.office.com/b/microsoft-excel/archive/2013/04/11/dallas-utilities-electricity-seasonal-use-simulation-with-geoflow-preview-and-powerview.aspx

    GeoFlow is an addin for Excel 2013 that allows you to visualise your data on a 3D map, to zoom in and explore that data, and record ‘tours’ of this data. It’s a lot of fun! As a taster, here’s a screenshot of a visualisation showing English secondary schools exam results data (average A-Level point score per pupil) broken down by school gender of entry:

    image

    UPDATE: one other thing I have to mention is that when this was announced in the keynote at the PASS BA Conference this morning, Amir Netz did an absolutely astounding demo showing GeoFlow’s touch-based capabilities running on a massive Perceptive Pixel screen (I think it was this one: http://www.perceptivepixel.com/products/82-lcd-multi-touch-display). It was possibly the most impressive demo I’ve seen of any Microsoft BI product. Anyway, I got to play on it myself later and it was as cool as it looked. If you’ve got $80000 burning a hole in your pocket then you could do worse than invest in one of these babies.


  9. LightSwitch and Self-Service BI

    Visual Studio LightSwitch has been on my list of Things To Check Out When I Have Time for a while now; my upcoming session on the uses of OData feeds for BI at the PASS BA Conference (which will be a lot more exciting than it sounds – lots of cool demos – please come!) has forced me to sit down and take a proper look at it. I have to say I’ve been very impressed with it. It makes it very, very easy for people with limited coding skills like me to create data-driven line-of-business applications, the kind that are traditionally built with Access. Check out Beth Massi’s excellent series of blog posts for a good introduction to how it works.

    How does LightSwitch relate to self-service BI though? The key thing here is that aside from its application-building functionality, LightSwitch 2012 automatically publishes all the data you pull into it as OData feeds; it also allows you to create parameterisable queries on that data, which are also automatically published as OData. Moreover, you can publish a LightSwitch app that does only this – it has no UI, it just acts as an OData service.

    This is important for self-service BI in two ways:

    • First of all, when you’re a developer building an app and need to provide some kind of reporting functionality, letting your end users connect direct to the underlying database can cause all kinds of problems. For example, if you have application level security, this will be bypassed if all reporting is done from the underlying database; it makes much more sense for the reporting data to come from the app itself, and LightSwitch of course does this out of the box with its OData feeds. I came across a great post by Paul van Bladel the other day that sums up these arguments much better than I ever could, so I suggest you check it out.
    • Secondly, as a BI Pro setting up a self-service BI environment, you have to solve the problem of managing the supply of data to your end users. For example, you have a PowerPivot user that needs sales data aggregated to the day level, but only for the most recent week, plus a few other dimension tables to with it, but who can’t write the necessary SQL themselves. You could write the SQL for them but once that SQL is embedded in PowerPivot it becomes very difficult to maintain – you would want to keep as much of the complexity out of PowerPivot as possible.  You could set up something in the source database – maybe a series of views – that acts as a data supply layer for your end users. But what if you don’t have sufficient permissions on the source database to go in and create the objects you need? What if your source data isn’t actually in a database, but consists of other data feeds (not very likely today, I concede, but it might be in the future)? What if you’re leaving the project and need to set up a data supply layer that can be administered by some only-slightly-more-technical-than-the-rest power user? LightSwitch has an important role to play here too I think: it makes it very extremely easy to create feeds for specific reporting scenarios, and to apply security to those feeds, without any specialist database, .NET coding or SQL knowledge.

    These are just thoughts at this stage – as I said, I’m going to do some demos of this in my session at the PASS BA Conference, and I’ll turn these demos into blog posts after that. I haven’t used LightSwitch as a data provisioning layer in the real world, and if I ever do I’m sure that will spur me into writing about it too. In the meantime, I’d be interested in hearing your feedback on this…


  1. 1
  2. Next ›
  3. Last »