Recently we attended and sponsored the Global Data Summit. I can honestly say that it was the most diverse set of speakers and attendees that I’ve ever had the privilege to meet. I’d like to share with you some of my experiences both as an attendee and as a sponsor of the event.
As a sponsor, I wasn’t able to attend as many sessions as I'd have liked. I was, however, able to meet individually with most of the speakers and listen to their 5 x 5 presentations, which were a real hit at the conference. Just listening to these 5-minute sessions, I was struck by the diversity of the speakers and their knowledge of data solutions.
Mike Ferguson demonstrated how to derive structure from unstructured data using machine learning combined with text mining techniques to provide sentiment analysis. I’m no expert in text analytics, but it was really interesting to get a perspective from somebody who is clearly an expert in his field.
Dirk Lerner's presentation on bitemporal modeling at first gave me a brain freeze. However, his explanation of using a Data Vault satellite to track changes over time, and of collapsing those changes into a dimension for analytics, not only addressed a requirement for financial reporting in Germany but was also quite ingenious.
The next session that resonated with me was John Myers’ discussion about metadata and its importance within the data world. It was refreshing to listen to an expert like John discuss the challenges in the big data space, and how metadata is even more important when using "unstructured" data. This is similar to some of the work we have done using metadata within BimlFlex to perform schema on read in a persistent staging layer, as described in this excellent post by Roelant Vos: Embrace your Persistent Staging Area for Eventual Consistency
Rick van der Lans’ session about data virtualization with double flexibility and agility was very informative, and the audience really liked his double and triple Oreo analogy. I saw many parallels between Rick’s insights and what we have implemented in BimlFlex for Azure SQL Data Warehouse deployments. His logical data warehouse architecture was also a great example of the type of metadata-driven solution that John advocated in his session.
The session by Ralph Hughes about Agile2020 was of particular interest to me and was also the only full session that I was able to attend, due to the busy schedule at our sponsor booth. Ralph highlighted the need for a better way of managing data warehouse projects, and he and his team have developed one of the best methodologies I’ve seen. His presentation was clear in explaining the gap between requirements, development, and testing. I’ve taken numerous ideas from his presentation, especially around automated testing. We’re looking at ways of incorporating the tools he recommended with BimlFlex. We are also working towards implementing this methodology as our standard project delivery approach, which I think of as “supercharged” agile for data warehousing. We will also be recommending that our customers either contact Ralph or consult one of his many great books when embarking on a data warehouse project.
Claudia Imhoff’s session, "How to lie with statistics", was amusing and highlighted the different ways the same data can be represented with wildly different narratives and conclusions. It was the first time that I saw a presentation from somebody as knowledgeable and experienced as Claudia showing how an analyst could "cheat" with data. The tongue-in-cheek delivery made it all the more fun and engaging. One of her slide bullet points stood out to me the most: "Hide your sources at all cost, which stops anyone from verifying or disproving your facts." Having worked with data for more years than I'd like to count, I have seen too often that reports cannot be verified against a single source of truth like an enterprise data warehouse.
Damian Towler, our guest, presented a session on the selection process CUA undertook in evaluating and choosing a metadata-driven data warehouse automation solution. It was very well received because we, as the vendor, decided it best that our potential customers talk to an existing customer without us in the room to influence the conversation. Damian took the audience through his selection criteria and independent scoring process. He then showed them CUA's single customer view solution and how they were able to implement it in only a couple of weeks. We have incorporated their evaluation criteria into our existing spreadsheet that you can download here. Please let us know if you have any criteria you would like us to add.
Len Silverston had a session on the Universal Data Model. I was somewhat skeptical at first about the concept of a Universal Data Model, thinking that it might be a one-size-fits-all approach. However, the more I listened, the more it made sense. I also thought about some of our customers and how much time they spend figuring out the "right" target model, which is exactly the work Len has already done for you. Later during the summit (or more correctly, as we were packing up), Damian facilitated a discussion with Len and Hans. Hans had announced earlier in the day that he and Len were collaborating on converting the Universal Data Models to Data Vault models. Damian shared that identifying the core business concepts using a pre-fabricated core of Hubs and Links (a backbone) would have significantly reduced their project timeline. This led us to explore whether BimlFlex can be used to accelerate converting his 3NF Universal Data Models to a Data Vault model. It all makes sense. Imagine you’re in the insurance or health industry and could use a predefined target model as a starting point, saving you weeks or months on your project. We hope to bring you more information about our collaboration soon.
There were many other great sessions at the Global Data Summit, and I’ve only highlighted ones that resonated the most with me. Many attendees and speakers were discussing Big Data, machine learning, and even blockchain at the event. As a sponsor of the event, we had many interesting discussions with so many attendees and received extremely positive feedback for BimlFlex and our roadmap for the future.
In summary, the Global Data Summit was one of the most enjoyable and diverse data events I have attended. A big thank you to all the sponsors, attendees, speakers and organizers for a fantastic time. I especially would like to thank Hans and Patti Hultgren for making us feel welcome in Genesee.
Investing in designing and implementing a substantial Data Warehouse creates a foundation for a successful Data Platform architecture. Using a configurable Data Warehouse Automation solution that supports all the best bits of Azure SQL Data Warehouse as standard is essential. For more information on why we use this approach, please also read this blog post by Roelant Vos: Embrace your Persistent Staging Area for Eventual Consistency.
Azure SQL Data Warehousing
Leverage metadata-driven data warehouse automation and data transformation optimized specifically for all Microsoft Azure SQL Data Warehouse options. The ability to extract, compress, and prepare data at the source is critical to delivering an optimized solution. Using PolyBase with parallel files, you can improve data warehouse load performance by well over ten times compared with traditional SSIS packages.
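To illustrate the parallel-file idea (the helper names below are ours for illustration, not BimlFlex's actual implementation), a source extract can be round-robined into several compressed files written concurrently, giving PolyBase multiple files to read in parallel:

```python
import csv
import gzip
import os
from concurrent.futures import ThreadPoolExecutor

def write_partition(rows, path):
    """Write one partition of the extract as a gzip-compressed CSV file."""
    with gzip.open(path, "wt", newline="") as f:
        csv.writer(f).writerows(rows)
    return path

def extract_to_parallel_files(rows, out_dir, num_files=8):
    """Round-robin the extracted rows into num_files compressed files,
    written concurrently, so PolyBase can read them in parallel."""
    partitions = [[] for _ in range(num_files)]
    for i, row in enumerate(rows):
        partitions[i % num_files].append(row)
    os.makedirs(out_dir, exist_ok=True)
    with ThreadPoolExecutor(max_workers=num_files) as pool:
        futures = [
            pool.submit(write_partition, part,
                        os.path.join(out_dir, f"extract_{n:03d}.csv.gz"))
            for n, part in enumerate(partitions)
        ]
        return [f.result() for f in futures]
```

A matching PolyBase external table would then point its location at the folder, and the engine distributes the files across its readers.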
We demonstrate extracting data from a source system that can be staged or persisted into tables, or loaded directly into type 1, 2 or 6 dimensions and facts.
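For context on those dimension loads, a type 2 dimension keeps history by expiring the current row and inserting a new one whenever a tracked attribute changes. A minimal in-memory sketch of that pattern, with illustrative names rather than BimlFlex's generated code:

```python
from datetime import date

def apply_scd2(dimension, incoming, business_key, today=None):
    """Apply a type 2 change to an in-memory dimension (a list of dicts).
    Expires the current row for the business key and inserts a new
    current row whenever a tracked attribute differs."""
    today = today or date.today()
    current = next(
        (r for r in dimension
         if r[business_key] == incoming[business_key] and r["is_current"]),
        None,
    )
    if current is None:
        # New business key: insert the first current row.
        dimension.append({**incoming, "valid_from": today,
                          "valid_to": None, "is_current": True})
    elif any(current.get(k) != v for k, v in incoming.items()):
        # Attribute change: close the old row, open a new one.
        current["valid_to"] = today
        current["is_current"] = False
        dimension.append({**incoming, "valid_from": today,
                          "valid_to": None, "is_current": True})
    return dimension
```

A type 1 load simply overwrites the attributes in place, and type 6 combines both: the history rows above plus a "current value" column maintained on every row.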
In the previous webinar, we touched on data warehousing using Azure SQL Data Warehouse; this time we will go into detail, showing the parallelism and transformation using PolyBase.
Traditionally, most of the project time is spent connecting to source systems and configuring CDC and parameters to extract data. We will look at how easily BimlFlex implements scaling out your data ingestion by creating parallel threads and multiple files. This approach is vital for optimal performance, as explained by James Serra in his blog post PolyBase explained.
BimlFlex data warehouse automation, especially when combined with Azure SQL Data Warehousing, is worth investigating if you are about to embark on a modern data warehouse project.