Dev Diary – Defining Mapping Data Flow Parameters with Biml

Written by Roelant Vos on 8.31.2021

TAGS: Biml,BimlFlex,adf,DataFlowMapping

Share

The previous post on developing Delta Lake using Azure Data Factory Mapping Data Flows covered how and why we use parameters in our BimlFlex-generated Mapping Data Flow data logistics processes.

As a brief recap, parameters for a Mapping Data Flow can both inherit values from the process that executes it (the Execute Pipeline) as well as hold values that can be used to define the data flow itself.

For BimlFlex this means the Mapping Data Flows can integrate with the BimlCatalog, the data logistics control framework, and provide a central access to important information received from the metadata framework. For example, parameters contain the composition of the Integration Key and the names of standard fields as they have been configured in the BimlFlex App.

Adding Parameters using Biml

The code snippet below uses the example from the previous post on this topic and adds a number of parameters to the Mapping Data Flow. Once defined, these become accessible to the Execute Pipeline that calls the Mapping Data Flow so that information can be passed on.

The first four parameters of this example reflect this behaviour. The start time of the individual process and its batch (if there is one) are passed to the Mapping Data Flow so that these values can be added to the resulting data set.

The other parameters are examples of attributing both the name of custom columns as well as their default values. This is really useful to dynamically add columns to your data flow.

This approach allows any custom or standard configuration to make its way to the generated data logistics process.

As a final thing to note, some parameters are defined as an array of strings by using the string[] data type. Mapping Data Flows has a variety of functionality that allows easy use of an array of values. This is especially helpful when working with areas where combinations of values are required, such as for instance calculation of full row hash values / checksums or managing composite keys. Future posts will go into this concept in more detail.

For now, let's have a look at the syntax for defining parameters in Mapping Data Flows - using the Parameters segment.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<DataFactories> 
    <DataFactory Name="bfx-dev-deltalake-demo">
        <Dataflows>
            <MappingDataflow Name="HelloWorld">
                <Sources>                       
                    <InlineAzureSqlDataset 
                        Name="HelloWorldExampleSource" 
                        LinkedServiceName="ExampleSourceLS" 
                        AllowSchemaDrift="false" 
                        ValidateSchema="false" />                       
                </Sources>
                <Sinks>                     
                    <InlineAzureSqlDataset 
                        Name="HelloWorldExampleTarget" 
                        LinkedServiceName="ExampleTargetLS" 
                        SkipDuplicateInputColumns="true" 
                        SkipDuplicateOutputColumns="false" 
                        InputStreamName="bla.Output" />
                </Sinks>
                <Parameters>
                    <Parameter Name="BatchExecutionId" DataType="integer">-1</Parameter>
                    <Parameter Name="BatchStartTime" DataType="string">'1900-01-01'</Parameter>
                    <Parameter Name="RowAuditId" DataType="integer">-1</Parameter>
                    <Parameter Name="ProcessStartTime" DataType="string">'1900-01-01'</Parameter>
                    <Parameter Name="EffectiveFromDateAttributeName" DataType="string">'FlexRowEffectiveFromDate'</Parameter>
                    <Parameter Name="RowRecordSourceAttributeName" DataType="string">'FlexRowRecordSource'</Parameter>
                    <Parameter Name="RowHashAttributeName" DataType="string">'FlexRowHash'</Parameter>
                    <Parameter Name="RowSandingValue" DataType="string">'~'</Parameter>
                    <Parameter Name="RowSourceIdAttributeName" DataType="string">'FlexRowSourceId'</Parameter>
                    <Parameter Name="RowRecordSource" DataType="string">'awlt'</Parameter>                   
                    <Parameter Name="SourceKeyAccount" DataType="string[]">['AccountCodeAlternateKey']</Parameter>                   
                    <Parameter Name="SourceColumnsAccount" DataType="string[]">['AccountDescription','AccountType','Operator','CustomMembers','ValueType','CustomMemberOptions']</Parameter>
                    <Parameter Name="SchemaNameSource" DataType="string">'dbo'</Parameter>
                    <Parameter Name="TableNameSource" DataType="string">'Account'</Parameter>
                </Parameters>
            </MappingDataflow>              
        </Dataflows>
        <LinkedServices>
            <AzureSqlDatabase Name="ExampleSourceLS" ConnectionString="data source=example.com;"></AzureSqlDatabase>
            <AzureSqlDatabase Name="ExampleTargetLS" ConnectionString="data source=example.com;"></AzureSqlDatabase>                
        </LinkedServices>
    </DataFactory>
</DataFactories>
</Biml>

Comments