Combining DBT with BimlFlex
Written by Roelant Vos on 8.21.2022
During our travels, we are regularly asked how BimlFlex compares against DBT. Our BimlFlex data solution automation framework would be compared against all our competitors, of course, but DBT is the odd one out in this case. It is worth covering this in detail.
This is because BimlFlex and DBT are not really competing. In fact, they can complement each other.
DBT stands for ‘Data Build Tool’ and is an effective approach for organizing data pipelines using simple configuration files and a Command Line Interface.
At a high level, DBT works by defining a project that contains ‘models’. Models contain SQL files that are saved to a designated directory where DBT can find them. A project itself references a ‘profile’ which contains connectivity information, and DBT will look after deployment. Models can be referenced to each other to construct a data logistics pipeline with separate steps, which can be visualized as a data logistics lineage graph.
The fact that everything (i.e. configuration files, model files) are stored in plain text (i.e. not as binaries) means that it’s a good match for data solution automation framework – which BimlFlex provides.
As a software for managing orchestration and deployment for data logistics processes, DBT does not provide a framework or patterns out of the box. It comes with templating engine support (Jinja) that allows users to define their own SQL templates, but users would have to provide the patterns themselves as well as the framework to design and manage the metadata that would power those templates.
In short, it does not provide a data solution automation framework, and this is where BimlFlex comes in.
BimlFlex contains production-ready templates for common architectures as well as a Graphical User Interface (GUI) to manage the models and metadata. BimlFlex compiles this design metadata into executable artifacts for various target platforms – including the Microsoft Database family, Synapse, Snowflake and Databricks. BimlFlex can natively generate fully-fledged Execute Pipelines and Mapping Data Flows, but it can also generate SQL Stored Procedures.
If BimlFlex would be combined with DBT, data teams would be able to leverage DBT’s strengths but not have to invent or manage their own automation framework and metadata management solution.
With some configuration you can configure BimlFlex to generate data logistics processes to a directory that is configured for DBT and manage orchestration there.
Options for using BimlFlex with DBT
Even though BimlFlex’s automation framework is a good fit for DBT, there are a few approaches you can take which may require some customization of the BimlFlex output.
Because DBT’s workflow uses Create Table As Select (CTAS) statements in queries or common table expressions (CTEs) to define tables, it does not directly support table using Data Definition Language (DDL) statements. For the same reason, Stored Procedures are usually not part of the DBT workflow either. In DBT, this is managed through SQL SELECT statements.
A possible approach is to use DBT to execute the BimlFlex-generated Stored Procedures as Macros. This means that the procedure requires a wrapper such as in this example:
{% macro output_message(return code) %} <Stored Procedure> call output_message(); {% endmacro %}
In BimlFlex, this can be done using the Extension Points for the Stored Procedures.
Using more advanced use of Extension Points and Bimlscript, it is also an option to modify the procedure into one or more SQL statements or common table expressions. This will remove the procedure headers and use C# or Visual Basic to spool the results into DBT model files.
Data sources can be emitted using the same techniques to represent the DBT source Yaml files, for example:
sources: - name: AdventureWorksLT tables: - name: address - name: customer
Of course, it is also possible to combine the strengths of BimlFlex and DBT in different ways. You can use the standard BimlFlex features to generate Execute Pipelines, SSIS Packages or Mapping Data Flows to connect to the data sources, and load these into a staging layer from where DBT can pick up the data logistics using BimlFlex generate SQL statements.
If you want to know more, please reach out to the BimlFlex team to see how you can combine the power of BimlFlex’ code generation with DBTs orchestration and lineage capabilities.
Share