Dev Diary - Adding Transformations to Mapping Data Flows

Written by Roelant Vos on 2.1.2022

TAGS: Biml,BimlFlex,adf,DataFlowMapping


The Mapping Data Flows ('data flows') feature of Azure Data Factory (ADF) provides a visual editor to define complex data logistics and integration processes. Data flows provides a variety of components to direct the way the data should be manipulated, including a visual expression editor that supports a large number of functions.

Work is now nearing completion to make sure BimlFlex can incorporate bespoke logic using this expression language, and generate the corresponding Mapping Data Flows.

For every object in every layer of the designed solution architecture, it is possible to add derived columns. Derived columns are columns that are not part of the 'source' selection in a source-to-target data logistics context. Instead, they are derived in data logistics process itself.

This is done using Derived Column Transformations in Mapping Data Flows, in a way that is very similar to how this works in SQL Server Integration Services (SSIS).

In BimlFlex, you can define complex transformation logic this way using the data flow expression syntax. The resulting code will be added to the selected Mapping Data Flow patterns, and visible as Derived Column Transformations. You can also define dependencies between derived columns, for example that the output of a calculation is used as input for the next one.

Consider the screenshot below.

alt text here…

The column 'MyDerivedColumn' is defined as part of the 'Account' object. The 'IsDerived' checkbox is checked, and the 'Dataflow Expression' is provided. This metadata configuration will generate a Derived Column Transformation with this column name, and the expression.

The resulting column is added to the output dataset.

alt text here…

In the first screenshot, the 'solve order' property was also set - with a value of 0.

The solve order directs BimlFlex in generating the logic in a certain (incremental) order. So a lower solve order will be created as an earlier Derived Colum Transformation, and a higher solve order later. The exact numbers do not matter, only that some are higher or lower than others.

If multiple derived columns are defined with the same solve order, they will be generated in the same Derived Column Transformation. This makes it possible to break apart complex logic in separate steps, and use them for different purposes.

This is one way to allow potentially complex logic to be defined in the metadata, and use this to generate consistent output that meets the requirements.