DBPedias

Your Database Knowledge Community

Rod Colledge

  1. See you in Chicago

    The SQL Server Community, specifically PASS, has provided me with enormous opportunities for professional growth, and so I always look to give back in whatever way I can. In the last 5 years, my focus has been exclusively on Business Intelligence, so when I heard about the PASS Business Analytics conference being held in April in Chicago, I knew this was something I wanted to be a part of. I submitted an abstract, and was selected to speak, so in April, I’ll be making the 18 hour trip to Chicago with one of my StrataDB colleagues, Mr PopBI, aka Peter O’Gorman.

    What immediately struck me about the conference is that it uses the phrase “Business Analytics”, instead of “Business Intelligence”. “Business Intelligence” is used almost universally these days as a blanket phrase for all sorts of things. So I did some research on the difference between the two terms; Intelligence is a synonym for “Aptitude”, “Clever”, and “Brain Power”, all of which are fairly generic terms which could really mean whatever you want them to mean. “Analytics”, on the other hand, is a synonym for “Investigation”, “Scrutiny” and “Breakdown” which I believe to be much more of a descriptive term for the work we as BI professionals are engaged in.

    Before you think I’ve set off on a self-indulgent grammar excursion, this is a really important distinction, because what we as BI professionals do, or what we should do, is provide business users with a platform for information discovery - Investigation, Scrutiny and Breakdown. It’s really important to understand that the sole reason we exist as IT professionals is not just to make lots of money and drink beer, but to support the business. At the end of the day, they’re the reason why we’re here, and if we forget that, they’ll forget about us. So with this in mind, my session, Self-Service Business Analytics in 2013, is about how we as IT professionals can assist the business using the Microsoft BI platform.

    Including the term “Self-Service” in my title was always going to open me up to criticism from my colleagues. Take this blog post for example, Self-Service Business Intelligence: It’s Wrong, Bad and Shouldn’t be Anyone’s Goal – Wow, that’s a broad sweeping statement! Read the post, however, and the core point is that the business cannot (effectively) perform BI without IT professionals, but the reverse is also true; IT professionals cannot perform BI without the business. This is something that Microsoft has long recognised; effective BI systems are those that combine the traditional strengths of Corporate BI (Data Quality, Security, Governance and Performance) with the flexibility and agility of Self-Service BI. Microsoft’s term for this is Managed Self-Service BI.

    My session will explore the awesome benefits of Managed Self-Service BI – how we, as IT professionals, can work with the business to achieve truly meaningful business outcomes. It is, after all, the whole reason for our existence.

    If you’re a BI professional or a business user with a keen interest in analytics, this is the conference to attend in 2013. I’m really pumped about this one, and I’d love to see you there - I might even shout you an (Australian) beer!

    PASS_BAC

  2. Using Policy-based Management to Check Sysadmin Membership

    I was sent an email recently from a reader of my simple-talk article Policy-based Management and Central Management Servers asking about the usage of the @WindowsUsersAndGroupsInSysadminRole property to detect the existence of sysadmins outside a known base; in other words, a scheduled check to see if unauthorised users had gained access to the sysadmin role.

    As much as I love Policy-based Management, the documentation and error messages leave a little to be desired, so it wasn't immediately obvious how to use this property to achieve the desired result.

    Here's the solution...

    The trick is using the Array function as shown below. "Field" is simply @WindowsUsersAndGroupsInSysadminRole, but "Value" uses the Array function to convert a CSV list of domain accounts (AD groups or users) into an array data type for comparison with the array data type returned by the @WindowsUsersAndGroupsInSysadminRole function.

    Condition

    On my StrataDB development environment, the results of this policy execution is shown below;

    Results

    The policy failed because the actual list (array) of sysadmin accounts includes more than the expected list (array).

    Hope this helps.

    Cheers,
    Rod.

  3. Unrelated Dimensions

    What’s wrong with this picture?

    UnrelatedDimensionSlicing

    I’ve browsed the AdventureWorks cube, and sliced Internet Sales by the Employee Department hierarchy. The same dollar amount ($29,358,677.22) is showing for every employee, which is obviously wrong and the sort of thing that confuses everyone who uses the cube.

    The issue here is that the Employee Department hierarchy is not related to the Internet Sales Amount measure, and by default, the measure’s ALL member (grand total) is shown for any unrelated dimensions.

    As a developer, it’s easy to dismiss this as a non-issue; it doesn’t make any sense to slice internet sales by staff member, however, this is the sort of thing that makes business people nervous about the quality of the BI system, and whether or not they trust it.

    There’s two ways around this type of problem; Perspectives and IgnoreUnrelatedDimensions.

    Perspectives, an Enterprise Edition feature, allows grouping together measures and their related dimensions. In the above example, there’s a “Direct Sales” perspective which excludes the Employee dimension and includes the Internet Sales Amount measure, therefore avoiding this type of invalid cube usage.

    The other option is IgnoreUnrelatedDimensions, a property of a measure group, as shown here;

      IgnoreUnrelatedDimensions

    The default value for this is True, leading to the behaviour in the first image. By setting this to False, as shown here, unrelated dimension slicing is prevented; if attempted, the measure value will be blank, and no dimension members will show.

    More details on this property can be found in this blog post from Hilmar Buchta.

    Cheers,
    Rod.

  4. SSAS: Dynamic Security, Kerberos and PerformancePoint

    Richard Lees recently blogged SSAS Dynamic Security. It reminded me that I had an outstanding blog post of my own on the same topic, with a slightly different twist involving PerformancePoint and Kerberos. Before we get to the twist, let’s recap on dynamic security within SQL Server Analysis Services (SSAS).

    The concept behind SSAS dynamic security is quite simple. Like SQL Server itself, SSAS uses role based security as a means to restrict the user’s access to various parts of the cube. A commonly cited example is restricting sales people from a particular sales region dimensioning by other sales regions e.g.; Australian sales people should not be able to view sales figures for Canada.

    Using an SSAS role, we can implement this quite easily. As shown below, I’ve created a new role, and on the Dimension Data page, restricted the Sales Territory Country to Australia. On the Membership page (not shown), I would then select the appropriate Windows login(s), and those users would then be restricted from dimensioning by countries other than Australia.

    CreateRole

     

    So far so good. Nice and easy. But, let’s imagine a situation in which there were thousands of sales territories. Creating a role for each one of them (and then managing membership over time) is simply not feasible.

    Enter dynamic security. In essence, we use a many to many structure which maintains the mappings between users and allowed dimension members in tables. Using the AdventureWorks sample, we would create a table structure like this ...

    Schema

     

    The UserTerritoryBridge table simply maps users (Windows accounts) to Sales Territories. In a typical BI implementation, this data would be maintained by an ETL process from the source system that defines such mappings. Note that the structure allows for 1 user to be mapped to many territories and vice versa. A true "many to many" relationship.

    The next step is defining the usage of these tables inside Analysis Services. In summary, we create a dimension using the dimUser table, and a measure group using the UserTerritoryBridge table. Such a measure group is commonly called a “factless fact table”. In both cases, we make their measures and attributes hidden; we don’t want users seeing them, they’re meaningless and there for security purposes only. Once defined, the cube structure looks like this ...

     

    Cubestructure

     

    ... and the dimension usage looks like this ...

     

    Dimensionusage

     

    With those structures in place, we can go back to our role, and modify it to be dynamic, i.e.; instead of creating a role per user/group, we now have a single role which handles ALL users. We do that by first selecting “Deselect all members” on the Basic tab, and then on the advanced tab use an MDX expression as follows ...

     

    Dynamicrole

     

    This expression uses the UserName() function to determine the current user, and uses that to filter out the Sales Territory Country members the user is not setup to view. Note the “Enable Visual Totals” checkbox down the bottom. This ensures the user can only see measure totals for the dimension members they can see. Without that selected, they will see totals for all countries, including those they cannot see when browsing the cube.

    So that’s a basic implementation of SSAS dynamic security. Now comes the twist (with triple pike).

    Consider an implementation of a PerformancePoint dashboard through SharePoint 2010 which uses the cube as a data source i.e.; a user browsing the cube through a PerformancePoint dashboard. In terms of user authentication/delegation, we have;

    1. The user logs onto Windows,
    2. The user connects to SharePoint,
    3. SharePoint launches PerformancePoint services to invoke a Dashboard,
    4. The Dashboard connects to Analysis Services

    This is Kerberos heaven (or hell). There’s a lot of hopping there. More hops than a Kangaroo on a caffeine binge. There’s a lot that can go wrong, particularly when you add an ISA/Proxy server to the mix.

    From a PerformancePoint perspective, there are now 3 authentication options for data sources;

     

    Pps

     

    Unattended Service Account (first option) is fairly straight forward – All users connect to the data source using the security context of the defined account. Simple, but not granular enough for custom security.

    Per-User Identity (third option) solves this problem, assuming you have a working Kerberos environment.

    An additional option is listed in the middle; “Unattended Service Account and add authenticated user name in connection string”.  As per the description, this adds the user’s login details to the connection string, and can be consumed within Analysis Services by using the CustomData() function, instead of the UserName() function, which is a nice little workaround for environments that are “Kerberos challenged”.

    Now, something important to point out here is that this does not mean we can forget about Kerberos. As Richard and I discussed in the comments on his post, someone with access to the SSAS cube through a different front end client could potentially circumvent the intended security by altering the CustomData component of the connection string to connect as a different user. This presupposes a sophisticated user, but it’s a concern nonetheless. So at best, this is a temporary workaround for environments that ONLY use PerformancePoint as the front end for their BI solution.

    To finish, I just wanted to leave you with this blog post from Adam Saxton, titled “My Kerberos Checklist…”

    Enjoy :-)

     

    Cheers,

    Rod.

  5. Project Phoenix ... Update 1

    Since my first Project Phoenix post, I’ve received a lot of interest from MVPs, Vendors and the community in general, and I’m pleased to announce the following updates;

    1. MVPs Greg Low and Arnie Rowland have each contributed one of their MSDN subscriptions, bringing the total to 5 so far,
    2. We’re expanding the project to include New Zealand,
    3. Telerik have kindly volunteered their Ultimate Collection for .NET product,
    4. More details on additional training and learning products to come in the following days!

    Further, we now have the links available for both under/unemployed applicants and non-profits to submit project proposals.

    More details to follow, all of which will be posted to the dedicated Project Phoenix page

  6. Project Phoenix

    In January this year, I received the MVP award for my contributions to the SQL Server community, joining an amazing group of people from around the world, each of which contribute to the community in their own unique way. Be it running user groups or websites, writing blogs or speaking at events, each of these people contribute their own time to make our community the vibrant and dynamic success that it is today.

    This year, Microsoft rewarded MVPs with 3 MSDN Ultimate subscriptions to give away in any manner of their choosing. Each MSDN Ultimate subscription grants access to Visual Studio Ultimate, SQL Server, Windows Server and almost all other Microsoft software. I’ve wrestled with many different ideas on how best to give these away, none of which I was really happy with; each of these subscriptions retail at ~ $12,000 USD, so it’s not an easy decision.

    Recently, I was inspired by MVP Arnie Rowland’s decision to create Project Phoenix, a program in which under/unemployed people are granted software, books & training materials in return for devoting their time to develop solutions for non-profit organisations. In Arnie’s own words ...

    "...The idea is to provide the recipient access to all of the tools needed to improve his/her skills, an opportunity to gain practical experience, the potential to earn a recommendation and/or referral –and to positively contribute to society as a form of 'give-back'. No free lunch, just sweat equity –the kind that makes us all feel good for the effort..."

    The program, based in the US, has been an amazing success, and looks to be gaining in strength. A truly inspirational idea from Arnie, and exactly the type of thing that makes me proud to be a fellow MVP.

    I asked Arnie about the possibility of running something similar in Australia, and he enthusiastically supports the idea, so today, I’m announcing Australia’s own Project Phoenix, and to kick start the program, I’ll be donating my own 3 MSDN subscriptions. In coming weeks, I’ll be announcing more details, including qualification/award rules, and a formal application process. For now, this is a call to arms;

    1. If you’re an Australian non-profit organisation in need of software/database development services, email me a description of the services required and your location,
    2. If you’re from an organisation willing to contribute books, software, hardware or training, email me the details of what you’ll contribute,
    3. If you’re an MVP willing to contribute your MSDN licenses to this program, email me,
    4. If you’re a successful software developer willing to assist the award recipients in their development efforts, email me the support services you can provide
    Together, let’s make this happen.

    Cheers,
    Rod.
  7. Distinct Count, NULL & Analysis Services

    Consider the following record set …

    A
    B
    C
    NULL
    D

    … and the following T-SQL command;

    SELECT COUNT(DISTINCT(<value>))
    FROM <table>

    In SQL Server, running the above command on the record set will return 4, because the NULL value is ignored. What about the equivalent command in Analysis Services? More on that shortly …

    I’m currently implementing a BI solution for a client in which the fact table contains a “Purchase Order” column. The granularity of the fact table is the invoice line item, with dimensions for products, clients, date etc …. Two of the questions this solution needs to answer are;

    1.    How many invoices did we process?, and
    2.    How many purchase orders did we process?

    The answer to the first question is easy. Every invoice in the fact table contains an invoice number so we can use a simple DistinctCount aggregate function to return the number of invoices as a measure. Note we use *distinct* count instead of count because the granularity of the fact table is the line item, so many rows will have the same invoice number.

    Answering the second question (how many purchase orders) is a little bit more difficult. Not every invoice has a corresponding purchase order. “Direct invoices” are those that are paid without needing to have a pre-existing purchase order. In these cases, the Purchase Order column is NULL.

    Back to the example at the start of the post; a distinct count in Analysis Services works differently than T-SQL. Where T-SQL will return 4, Analysis Services returns 5, considering NULL as something different than the other values. As a result, using the same DistinctCount aggregate function on the purchase order column will return one more than it should, assuming there are some invoices with a NULL purchase order value.

    There’s a few ways around this problem, but the method I used was to create a new named calculation column in the data source view called NULL_PO_Exists with the following expression;

    CASE WHEN PurchaseOrder IS NULL THEN 1 ELSE 0 END


    Next, I created a new hidden measure called [Maximum NULL PO Exists] which used the MAX aggregate function over the NULL_PO_Exists column.

    Next up, I set the visible property to false for the existing Purchase Order count measure called [Total Purchase Orders], and created a new calculation [Purchase Order Total] which used the following expression;

    [Measures].[Total Purchase Orders] - [Measures].[Maximum NULL PO Exists]

    In summary, the new measure is using the old measure as a base, and then subtracting 1 (if NULLS exist) or 0 (if no NULLS exist). A bit of fiddling around, but it works :-)

    Finally, there’s a number of performance considerations when using distinct count. This white paper from Denny Lee does a great job of explaining the problem and a number of best practices for optimizing its performance.

    Cheers,

    Rod.

  8. Distinct Count, NULL & Analysis Services

    Consider the following record set …

    A
    B
    C
    NULL
    D

    … and the following T-SQL command;

    SELECT COUNT(DISTINCT(<value>))
    FROM <table>

    In SQL Server, running the above command on the record set will return 4, because the NULL value is ignored. What about the equivalent command in Analysis Services? More on that shortly …

    I’m currently implementing a BI solution for a client in which the fact table contains a “Purchase Order” column. The granularity of the fact table is the invoice line item, with dimensions for products, clients, date etc …. Two of the questions this solution needs to answer are;

    1.    How many invoices did we process?, and
    2.    How many purchase orders did we process?

    The answer to the first question is easy. Every invoice in the fact table contains an invoice number so we can use a simple DistinctCount aggregate function to return the number of invoices as a measure. Note we use *distinct* count instead of count because the granularity of the fact table is the line item, so many rows will have the same invoice number.

    Answering the second question (how many purchase orders) is a little bit more difficult. Not every invoice has a corresponding purchase order. “Direct invoices” are those that are paid without needing to have a pre-existing purchase order. In these cases, the Purchase Order column is NULL.

    Back to the example at the start of the post; a distinct count in Analysis Services works differently than T-SQL. Where T-SQL will return 4, Analysis Services returns 5, considering NULL as something different than the other values. As a result, using the same DistinctCount aggregate function on the purchase order column will return one more than it should, assuming there are some invoices with a NULL purchase order value.

    There’s a few ways around this problem, but the method I used was to create a new named calculation column in the data source view called NULL_PO_Exists with the following expression;

    CASE WHEN PurchaseOrder IS NULL THEN 1 ELSE 0 END


    Next, I created a new hidden measure called [Maximum NULL PO Exists] which used the MAX aggregate function over the NULL_PO_Exists column.

    Next up, I set the visible property to false for the existing Purchase Order count measure called [Total Purchase Orders], and created a new calculation [Purchase Order Total] which used the following expression;

    [Measures].[Total Purchase Orders] - [Measures].[Maximum NULL PO Exists]

    In summary, the new measure is using the old measure as a base, and then subtracting 1 (if NULLS exist) or 0 (if no NULLS exist). A bit of fiddling around, but it works :-)

    Finally, there’s a number of performance considerations when using distinct count. This white paper from Denny Lee does a great job of explaining the problem and a number of best practices for optimizing its performance.

    Cheers,

    Rod.

  9. Built to last .... for how long?

    One of the enduring memories from my childhood was my father bemoaning the design and build quality of contemporary objects. Whether it was the aesthetics of a newly constructed building or the durability of a recently acquired television, the song remained the same … “They don’t build ‘em like they did in my day”.

    In many ways he was right, although what he meant by “my day” was never well defined. Presumably “his day” included the construction of the pyramids and the Sistine Chapel! His point, while imbued with plenty of artistic license, raises a good question; Are we less concerned with build quality today?

    In exploring this question, let’s use a 2 year old cell phone as an example. If my 2 year old phone stops working, do I send it away for repair? No, I buy a new one because a) it will probably be cheaper than repairing the old one and b) the technology has improved so much that I’ll get a much better phone anyway. The manufacturers know this, so they have no real incentive to engineer products that outlast the warranty period.

    In that context, when it comes to database design, there are a number of important questions we need to ask ourselves as database professionals. Firstly, how much time should we spend engineering designs that achieve maximum performance when modern hardware can easily mask the effects of poor design, and secondly, with the increasing emphasis on timely delivery of working solutions, how much time should we spend on flexible designs that cater for unknown future usage scenarios?

    The answer to both of those questions is “it depends”. A purist would argue otherwise, but a pragmatic design approach suggests that we take advantage of modern hardware and spend less time wringing every last drop of performance from our designs. In a similar manner, a certain element of design flexibility can deliver future cost savings, but should not be used as a substitute for gathering business requirements at the commencement of a development project.

    Flexible and high performance database designs are often the unconscious side effect of using an experienced designer, someone whose default approach to database design has been shaped over many years of experience. Those same people often cringe when they see the work of less experienced practitioners whose designs only work due to the modern hardware they run on.

    In terms of design flexibility, an area I often target is the relationships between entities. For example, a classic entity relationship is between a product and a product category. There’s a number of ways of representing this relationship. We could store the category as a column within the product table, or create a separate table for categories and include a foreign key in the product table. In both cases, there are inherent design limitations here; What if we need to include subcategories, or add a product to more than one category? With a little bit more thought, we can create a many to many table between products and categories, and design a parent-child relationship within the categories table. Such a design provides a lot more flexibility, and the cost of implementing the design up front is orders of magnitude cheaper than retrofitting the design at a later point.

    Performance can be seen in a similar light; the system may perform fine with the current production load, but what happens if the user base triples overnight? Depending on the application design, throwing more hardware at the problem may not be a viable solution.

    While this post doesn’t provide any answers, I hope that these questions make you think about the database design process a little more deeply, and perhaps help you to build a solution that stands the test of time, and maybe even make my father stand back and admire your creation :-)

    Cheers

  1. 1
  2. Next ›
  3. Last »