Dev Diary - Connecting to unsupported data sources using Mapping Data Flows and BimlFlex

Written by Roelant Vos on 2.4.2022

TAGS: Biml,BimlFlex,adf,DataFlowMapping


When configuring BimlFlex to use Mapping Data Flows ('data flows'), the generated data flow will in principle directly connect to the configured Source Connection and process the data according to further configurations and settings.

For example, to initially 'stage' data into the data solution patterns such as the ones below can be generated:

In both these cases, however, this relies on data flows to actually be able to directly connect to the data source. But, this is not always supported. The following link to the Microsoft documentation shows which technology is supported by the various areas of Azure Data Factory (ADF).

As Varigence, we still want to be able to enable our users to mix-and-match the BimlFlex project configuration using our supported data sources - even though this may not (yet) be supported directly by the ADF technology in question.

For example, BimlFlex supports using Salesforce REST APIs as data source, and this can be used to generate ADF pipelines that stage and process data further. If you would want to use data flows to implement complex transformations -for example using derived columns- for a Salesforce data source that would be an issue.

To make sure our users can still use data flows for data sources that are not (yet) natively supported, the BimlFlex engine has been updated to detect this. If this is the case, the generated output will include additional processing steps to first 'land' the data into a data lake or blob storage, and then process the data from here using data flows.

In practice, this means that an additional Copy Data activity will be added to the ADF pipeline that starts the data flow. The ADF pipeline acts as 'wrapper' for the data flow and already includes the interfaces to the runtime control framework (BimlCatalog).

To summarize if a BimlFlex is configured to use data flows and - contains a source connection that is supported, then a direct connection will be generated - contains a source connection that is not support, then an additional landing step will be generated

Consider the following project configuration:

alt text here…

In this example, a Salesforce connection is configured as source. A staging connection is also defined and ultimately the data will be integrated into a Data Vault model. No persistent staging area connection is configured for this project. Note that the integration stage is set to Mapping Data Flows. Because Salesforce connections are not natively supported for data flows, a landing step will be added by BimlFlex.

To direct BimlFlex where to land the data, the source connection will need to be configured with a landing connection. This is shown in the screen shot below.

alt text here…

The result in the generated output will appear as per the screenshot below. The Copy Data activity will land the data before the data flow will process these to the staging area.

alt text here…