DBPedias

Your Database Knowledge Community

Audrey Hammonds

  1. Excited to be participating in the Global Windows Azure Bootcamp Next Saturday April 27th, 2013

    from Julie Smith

    Saturday I was at Innovative Architects with several of my colleagues prepping labs for an exciting event being held April 27th. It’s the Global Windows Azure Bootcamp and it’s a very cool idea and thing that Microsoft is doing.  There will be sessions all over the world (see below map) where Microsoft will be hosting free training on Azure.  While we are all training, we will also each deploy a processor to a Global Render Farm. Here is a cute video explaining what a render farm is:  http://www.youtube.com/watch?v=4VUWrZRCtI8 (this video is a plug for renderfarm.fi , which is a volunteer/free service, NOT affiliated with Microsoft, I just thought it offered a cute and good explanation of what a render farm is.)  Here (link) is another video showing creator Alan Smith explaining the render farm he created for Azure.

    The session in Atlanta will be hosted by two of my colleagues from Innovative Architects, Vaishali Shah and Rob Bramhill.  They have devoted weeks of time to prepping the day and their hard work was very evident yesterday as we were doing final preparations.

    Apparently penguins are participating from Antarctica.

    Apparently penguins are participating from Antarctica.

    I look forward to working more with Azure, specifically creating HDInsight clusters to apply the same type of scalability to data as this Render Farm experiment applies to processing animations.


  2. A Giant Thank You, and a Link

    First of all, thanks so much to everyone who supported, organized, and attended the Day of Data Warehousing fundraiser yesterday. I am humbled and honored to have had a chance to spend the day with all of you. Couldn’t have asked for a better group to spend a Thursday with!

    As promised, here is the link to my slides, demos, databases, and documentation: http://sdrv.ms/16tszZZ

    If you have any questions, comments, feedback, or just want to say hi, please get in touch!

    Personal E-Mail: audreydhammonds@gmail.com
    Work E-Mail: audrey.hammonds@innovativearchitect.com
    Twitter: @DataAudrey

    Model on, my friends…

    –Audrey


  3. Importing Values into DQS Domains from Excel, and Gratitude

    by Julie Smith

    I’m back from the MVP Summit and processing a staggering number of thoughts. Not only am I processing the brain dump of NDA technical goodness of exciting new stuff coming to the product, but a number of personal reflections. How lucky I am to have been somehow deemed worthy of this honor. How fortunate the SQL MVPs are in the number of sessions, pampering and genuine respect we receive from the SQL Server Product team. How lucky I am to be a part of one of the closest technical communities out there. The SQL Community calls itself family because we truly have a network of close, warm friendships. It dawned on me recently that I could go practically anywhere in the world (the world!) and find a colleague I have met from being a part of the SQL Community. I am a happy lady right now (in spite of the double ear infection).

    1badge

    Another benefit to you, my readers from my attendance to the Summit is a new motivation to continue sharing my knowledge with the community.  That said, let me be the millionth prodigal blogger to apologize for my lack of blogging.  Insert every lame but true excuse here:  I’ve been busy, I’ve had things come up in my personal life, all true, but I know you want to hear from me.

    So from the back logs of my “things I wanted to blog about” I picked a random DQS tidbit:  How to import domain values and synonyms into DQS from Excel.  Get ready for a wee bit of technical content!

    When I first started working with DQS and saw the ability to import values from Excel, I used the function, but only to import the actual values.  I did not realize until PASS Summit this year that you could also import the synonyms.  I was asked if you could do so during a presentation on DQS, and luckily Rakesh Parida from Microsoft was in the audience and was able to field the question.

    Let me do a small intro to DQS so that you have context.  In Data Quality Services, the equivalent to a database which we are all familiar with is called a Knowledge Base.  Knowledge Bases contain Domains, which are the equivalent to columns or attributes in a database.

    In a domain, you can create domain values, which are the expected values for an attribute in the data.  The beauty of DQS is that we can also create relationships or synonyms between domain values.  As an example, if I have a list of vending machine suppliers for my company, I know that I should expect to see Coca Cola and Pepsi.  But as an information worker in my company, I also know that sometimes I see Coke instead of Coca Cola.  Coke is valid in my domain, but it should be corrected to Coca Cola so that at the end of the month, I know for certain how much I spent with the one company Coca Cola.  Coca Cola and Coke are both valid domain values, but Coca Cola in my example would be the leading value.

    So I’m going to show you how to import both domain values and their synonyms into a DQS Knowledge Base from Excel.  Here is the data in Excel.  The first column are the leading values, or the correct value for the domain.  The Second column contains values possible for the domain, but that should be corrected to the leading value.  Keep in mind that if a row in my spreadsheet has a value for both Corrected and Synonym, then DQS is going to insert two Domain values, one for each, but with the second having a correct to value.

    1excelJPG

    Let’s set up a quick Knowledge Base and a Domain with datatype String for the Vendor:

    1. From the DQS Client, click New Knowledge Base.  
    2. Type in “Vendors” for the Name.  
    3. Leave None as the choice in the Create Knowledge Base from text box.
    4. Click Next

    2client


    CreateVendorsDomain

    To create the Domain:

    1. Click the Create a Domain button in Domain Management.
    2. Once the Domain properties modal box comes up, type in Vendors for the Domain name.  Leave the other values as is.  This will give you a string datatype Domain.

    addDomain

    domainProperties

    Once the Domain is created, you will see the tabs for various properties of the domain above.  We are going to be in the Domain Values Tab:

    domainvaluetab

    From here, click on the third button from the right (the Import Values Button) and select “Import valid values from Excel”.  That will bring up the Import Values dialog box.  Select your workbook and sheet names:


    importvalues

    be sure to click the box beside “Use first row as header” and click ok.  You’ll get a logging dialog to let you know how it went.

    importLog

    Now you will see your values from Excel available as values in your domain.  Not only did you import the values, but the leading values.  So when we run actual data through this KB in a cleansing project, the input of “Coke” will return as the having a corrected value of “Coca Cola”.

    1imported

    I hope this was a useful demo of loading your Knowledge Bases more quickly and I look forward to another wonderful year of SQL Server Community.


  4. Full Day of Training Offered by the DataChix! (On Pi Day, no less)

    Do you want to learn about Data Warehouses?  Do you want to learn about Extract, Transform, Load with SSIS?

    Do you like Pi?

    piday







    Do you like Pie?

    Learn DataWarehousing with the DataChix!

    SQL Server MVPs and popular blogging team Audrey Hammonds and Julie Smith of DataChix.com offer this one day of training to benefit The Cloverleaf School of Atlanta in cooperation with The American Legion Post 251, Duluth, GA.  Audrey will take you on a guided tour of converting a transactional model into Dimension and Fact tables.  After the model has been created, Julie will guide you through loading those tables using SQL Server Integration Services.

    To top it all off, it’s happening on March 14th folks.

    That’s right. 3/14  Pi Day.  Could there be a better way to celebrate the irrational number which represents the ratio of a circle’s circumference to its diameter than by using SSIS?  NOPE!  But maybe we’ll get some delicious Pie to celebrate as well!

    Audrey and Julie bring over 25 years of warehousing experience to this fantastic day of training, but don’t say that to their faces!

    audreyjulieaction2

    Part I Dimensional Modeling with Audrey

    Why Transactional databases are the way they are—Codd’s rules to live by

    Why Codd’s rules don’t work for Reporting– Kimball’s rules to live by

    Fun at the whiteboard converting a transactional model to a dimensional model.

    Part II Extract Transform and Load using SSIS with Julie

    A tour of SSIS

    audreyjulieaction3








    Control Flow Basics

    Data Flow Basics

    How to load Dimensions with SSIS

    How to load Facts with SSIS

    Why SSIS has got such a bad rap sometimes.

    Frameworks

    What happened and when

    When things go wrong, how to know the details

    Top quirks of SSIS—what, why, and how to get around them.

    Data Quality!  We’ll throw in a little Data Quality Services with our ETL!

    Lunch and Continental Breakfast provided.  Please contact the organizer at the above email for any special dietary considerations!  Thanks and see you on Pi Day!

    FOR TICKETS—–http://datawarehousingdatachix.eventbrite.com/#

    Can’t make it to this great day of SSIS training to benefit the school, but you’d still like to support our mission? http://t.co/VX6yGftC


  5. Great way to begin 2013! (Hint: it involves a TLA)

    Quick, happy announcement… My fellow Datachix, Julie Smith, has received the SQL Server MVP award for 2013!  Also, I’ve been renewed as a SQL Server MVP for 2013.  What does this mean?  Yeah… 100% of the people who blog on this site (all two of us) are MVPs!

    mvp

    So, Julie, while I am not the first to congratulate you, let me be the first to do it on our blog. :)

    On a personal note, I am grateful and humbled to be a part of the MVP community for another year.  Having experienced 12 months of the program, I’m honored that Microsoft saw fit to include me for another year.

    As a bonus, here’s a picture of Julie being awesome:

    Wizard

    Rock on, my friend!

    –Audrey

    p.s. Note to self:  It’s been a while since you blogged about anything, Audrey.  Get your act together and post more often!  (Nodding…)


  6. “Pattern Matching” in Data Quality Services –Domain Rules

    Guess what guys?  I’m speaking at PASS Summit this year!  My session titled Data Quality Services—Finally!    was selected and I get to go to the premier SQL Server conference and present on a topic which I find truly exciting.  Data Quality Services (DQS) is new with SQL Server 2012.  Along with Master Data Services (MDS), it represents Microsoft’s entry into Master Data Management (MDM).

    The ideal use of MDS and DQS represent to me a paradigm shift in the way I.T. folks and business folks interact.  These tools are not just another way for I.T. to develop solutions for “the business” based on requirements (whether those requirements are thoroughly documented via a watershed lifecycle or verbally shared with agile).  DQS and MDS were explicitly designed for continued and iterative use by business users with expert Knowledge about their data.  In MDM speak, these experts are called Data Stewards.  I recommend that anyone in Data continue getting familiar with these concepts.  Data Governance and Master Data Management are terms you are likely to see from now on.  For more information I recommend David Loshin (http://mdmbook.com/) .  His book,  Master Data Management is a comprehensive guide to this topic.  I’ve also found useful articles here , including articles on ways for I.T. to get “buy in “ from C-levels.

    So, back to today’s post.  As I was prepping my session for PASS Summit on Data Quality Services, I was extremely fortunate to be in communication with Matthew Roche and Matt Masson of Microsoft, who are also presenting on DQS  (and MDS– actually all of EIM) at Summit.  These two are both uber “ersum”, always willing to help me, answering emails at all hours of day and night.

    Let’s dive in:  Keeping data clean with DQS starts with creating a Knowledge Base.  This is done in the Data Quality Client.  Inside a Knowledge Base you create and maintain Domains.  Single domains are the equivalent to a column or attribute in a database.

    The Data Quality Client.

    Domain properties are configured using a combination of five tabs in the Data Quality Client:

    1. Domain Properties –Data Type and other basic properties
    2. Domain Values –area where possible values, correct, invalid and incorrect are stored, as well as corrected values.
    3. Reference Data—area to map the domain value to Reference Data Sources, like those supplied by vendors such as Melissa Data.
    4. Term-Based Relations—area to configure strings within a domain and a replacement value.  Where the Domain Value would replace the entire content of an attribute, a term based relation would replace a part of it.  So if I have “Ave.”  related to “Avenue” as a term based relation, and Apple Ave comes through, the TBR will turn it into Apple Avenue.
    5. Domain Rules—area to create validation rules for values in the domain.

    The Domain Tabs

    Domain rules are pretty exciting to me.  (I might need to get out more though).  If I want to make sure my Canadian friends receive my chain letters (Matt Masson LOVES snail mail chain letters), I could create a domain for Canadian Postal Codes:  For a domain with a string data type, you get a plethora of options for defining valid data:

    But wait, there’s even more:

    Most of these are self-explanatory.  But I wanted to know what type of wildcard symbols were used with the “pattern” choices.  The definition on MSDN had a definition for it, but not a complete one:

    BOL definition is not quite complete.

    So I emailed Matthew Roche with the question, and he sent me a complete (unofficial but trustworthy, no guarantee expressed or implied ) definition within a few hours from the DQS dev team.  Many thanks to Matthew and the team:

    The rules of the patterns are as following:

     

    1. Any letter (not only A) is considered as a pattern for any letter
    2. Any digit (not only 1) is considered as a pattern for any digit
    3. Any special character (not a letter and not a digit) is considered as pattern for itself
    4. Square brackets define optional matching

     Examples:

     Pattern for the current standard of Washington vehicle license: AAA-1111

    1.      The same pattern can be written as ABC-1234 and will present the same pattern
    2.      “-“ sign represents itself, while letters and digits represent any letters and digits
    3.      Matching samples: AIA-4831

    4.    Not matching samples: AIA-2A34 (not a digit), AIA+3456 (wrong special character)
     Let us say the new license standard will allow 4 or 5 digits in the end. Here we can use square brackets to define optional part.

    New pattern would be: AAA-1111[1]

    Matching samples: AIA-4831 (without additional digit), AIA-34561 (with additional digit)

    So back to my domain rule.  The format for Canadian postal codes is:  letter-number-letter space number-letter-number.

    Using A for a letter and 1 for a number, I could define my rule as this:  A1A 1A1

    But I could get silly and also define it like this, where H is the letter and 0 (zero) is the number:  H0H 0H0 (this is actually the postal code used for Santa Claus in Canada—pretty cute eh?)

    With that, the following value would be valid: V5S 3Y2

    While this value would be invalid:  VSS 3Y2  — “S” is a letter, not a number.

    And running data through a cleansing project, both valid and invalid would go something like this:

    Note that some of these are invalid as “Domain Value” because they were in the kb previously as invalid.  The ones designated as failed rule, were new values to the kb.

    I hope this “sneak peek” of DQS has been of interest to you.  And I hope to see you all in Seattle!

    Resources:

    http://blogs.msdn.com/b/dqs/archive/2012/03/09/one-stop-resource-for-data-quality-services-dqs-in-sql-server-2012.aspx

    Channel 9 videos of TechEd Elad Ziklik, Matt Masson, Matthew Roche


  7. Don’t Forget to Vote for PASS Summit’s Lightning Talks!

    Tomorrow is the deadline for PASS Summit’s Lightning Talk Community Votes–Make your choice known for your favorite sessions.  And if you need help with the menu, may I suggest the ham?

    Also, if you are in the area, I hope you are considering Columbus, Georgia’s first SQL Saturday (#167) !  I am delighted to be speaking at this event!


  8. SQL Saturday 126: I presented DQS Finally! in Indianapolis

    Thanks to the folks who came out to see my DQS presentation in Indy this past weekend.  Thanks to Caroline Bailey, Hope Foley, Kyle Neier , Eddie Weurch, and the rest of the Indy crew who put on an excellent event.  Here is my slide deck as promised.  I met many fantastic folks and had a paleolithic good time.


  9. VMWare Player Version Issues

    A Non SQL Server Blog post

    I was working on a presentation last night and I tried to open a virual machine on vmware player, and I got this error:  Invalid configuration file. File “filename.vmx” was created by a VMware product with more features than this version of VMware Player and cannot be used with this version of VMware Player.  Cannot open configuration file filename.vmx.

    Search engine results were not plentiful.  I finally found the solution here.  What had happened was that the virtual machine I was trying to open had been created in a newer version of VMWare (4.0) than my Player was (3.1.4).  By opening the vmx file in notepad and changing the line virtualHW.version = “8″  to : virtualHW.version = “7″, I was able to open the virtual machine and carry on.  The other option is to do a free upgrade of your version of VMWare player.  The problem is the error message doesn’t tell you that clearly.

    Posting this because it seemed like the error message doesn’t yield a ton of content via search engines and hoping it will save someone else’s behind like it did mine  :)


  10. We don’t write, we don’t call…it’s like we’re a couple of jerks.

    I check our blog every now and then and hang my head in shame.  We don’t write, we don’t call.  What happened?  Well actually a lot of good things.  We failed to announce on our site when Audrey became an MVP!  

    Audrey’s an MVP!!!!

    That happened in January!  We’ve got this marketing thing nailed I tell you.

    Audrey and I are both finishing up the final touches on our chapters for Wiley’s SQL Server 2012 Bible, which should be available in August.   

    I would like to point out that this book took as long to finish as it would have taken me to carry two children to term, and in many ways was more painful than bearing the two children I have. So for me over the last 18 months, I would often think “Gosh, you should blog”. And then my conscience would scream at me: “How dare you put letters in a row in a document that isn’t going to the Wiley Editor?” and then I’d go have a cookie.   

    I am now working as a consultant again for Innovative Architects. My boss is funnier than I am, which is hard for a ham like me to take, but other than that I’m really tickled to be onboard.  We are building a very impressive data services practice and I love my team.

    I’m hitting the road with a newer presentation.  It’s all about Data Quality Services and it’s like the director’s cut version of the ten minute DQS talk I gave at the Special Ops tour in Atlanta.  I’ll be presenting Data Quality Services—Finally!  to the Columbus user group  on June 25th (thanks Tim Radney (b | t).

    Data Quality Services is a tool designed to capture the “folklore” of every organization.  You know, the guy (let’s call him Phil)  who’s worked with the data for 10 years and has all of the variations memorized.  He can look at a spreadsheet and tell you which rows to consolidate, which rows are garbage, and he has all the macros needed to clean that crap up.  Well now there is a way to capture the folklore in Phil’s brain and apply it much more efficiently.  Its main interface, the DQS client, is very user friendly, as it is designed for the business user rather than developers.  But the projects created with it can be applied in SSIS and MDS.

    A nice clean interface with which we shall clean the data.

    SSIS has now been kicked out of the official BI stack (sort of) and is now joins DQS and Master Data Services new a new stack called Enterprise Information Management.  Read more about it here.     I’m thrilled to have a whole new area of Data Geekiness to dive into with these additions to SQL Server, and all of the clients I speak with seem pretty stoked about Data Quality as well.


  1. 1
  2. Next ›
  3. Last »