Defining the Data Flow in Biml
Written by Varigence Blog on 6.15.2011
This post is part 5 of a series on using Biml in BIDS Helper. This post builds on some of the information and the sample from the previous posts.
In the previous post in the series, I talked about controlling the order of execution in the control flow. In this post, the focus will be on the dataflow, and controlling how the data in the pipeline flows from one component to the next. This post uses a new table as the target of the data flow, so you may want to review Part 2: Creating Tables using Biml and BimlScript to see how to create the table locally. The Biml below describes the table. You can create it in the database of your choice – I used a database named Target.
<Biml xmlns="http://schemas.varigence.com/biml.xsd"> <Connections> <OleDbConnection Name="Source" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=AdventureWorksDW2008R2;Integrated Security=SSPI;"/> <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=Target;Integrated Security=SSPI;"/> </Connections> <Tables> <Table Name="DimAccount_Test" ConnectionName="Target"> <Columns> <Column Name="AccountKey" /> <Column Name="ParentAccountKey" IsNullable="true" /> <Column Name="AccountCodeAlternateKey" IsNullable="true" /> <Column Name="ParentAccountCodeAlternateKey" IsNullable="true" /> <Column Name="AccountDescription" DataType="String" Length="50" IsNullable="true" /> <Column Name="AccountType" DataType="String" Length="50" IsNullable="true" /> <Column Name="Operator" DataType="String" Length="50" IsNullable="true" /> <Column Name="CustomMembers" DataType="String" Length="300" IsNullable="true" /> <Column Name="ValueType" DataType="String" Length="50" IsNullable="true" /> <Column Name="CustomMemberOptions" DataType="String" Length="200" IsNullable="true" /> </Columns> </Table> </Tables> </Biml>
With the table created, we can move on to the interesting part – transforming the data. In a simple, straightforward data flow, the Biml compiler will do most of the work for you. Take this data flow as an example:
<Dataflow Name="Dataflow 1"> <Transformations> <OleDbSource Name="Source" ConnectionName="Source"> <DirectInput>SELECT * FROM dbo.DimAccount</DirectInput> </OleDbSource> <OleDbDestination Name="Target" ConnectionName="Target"> <ExternalTableOutput Table="dbo.DimAccount_Test"/> </OleDbDestination> </Transformations> </Dataflow>
In this case, you don’t have to specify any data paths. The Biml compiler will infer that the OleDbSource’s output should be connected to the input of the OleDbDestination.
The compiler is able to do this by using default outputs. In Biml, most components have a default output defined. In the absence of other information, the compiler will automatically connect the default output of a transformation to the input of the next component defined in the Biml. So, if we use a slightly more complex data flow, like this:
<Dataflow Name="Dataflow 2"> <Transformations> <OleDbSource Name="Source" ConnectionName="Source"> <DirectInput>SELECT * FROM dbo.DimAccount</DirectInput> </OleDbSource> <Lookup Name="Check For Existing" OleDbConnectionName="Target" NoMatchBehavior="RedirectRowsToNoMatchOutput"> <DirectInput>SELECT AccountKey FROM dbo.DimAccount</DirectInput> <Inputs> <Column SourceColumn="AccountKey" TargetColumn="AccountKey"/> </Inputs> </Lookup> <ConditionalSplit Name="Test ID Range"> <OutputPaths> <OutputPath Name="High ID"> <Expression>AccountKey >= 100</Expression> </OutputPath> </OutputPaths> </ConditionalSplit> <OleDbDestination Name="Target" ConnectionName="Target"> <ExternalTableOutput Table="dbo.DimAccount_Test"/> </OleDbDestination> </Transformations> </Dataflow>
We end up with a data flow that still automatically connects data paths between components. In this case, though, it’s probably not doing exactly what we want, since it’s just connecting the default outputs. The Conditional Split (“Test ID Range”) in this example is connected by the default output, but we want to use the”High ID” output to filter out IDs less than 100. In the case of the Lookup (“Check For Existing”), the default output being used is the “Match” output, but we only want the non-matched records, so that only new rows are inserted.
I explicitly choose the option BIDS to display the path Source Names for this screenshot – by default, they aren’t displayed in the generated package. You can change the setting in BIDS by selecting the path, opening the Properties tool window, and changing the PathAnnotation property to SourceName.
So how would we change the Biml to get the desired results? If we add an InputPath element to the appropriate components, we can control which output is tied to the component’s input. In this case, we need to add explicit InputPath instructions to the Conditional Split (that will reference the Lookup’s NoMatch output) and to the OleDbDestination (which will reference the ConditionalSplit’s High ID output).
<Dataflow Name="Dataflow 3"> <Transformations> <OleDbSource Name="Source" ConnectionName="Source"> <DirectInput>SELECT * FROM dbo.DimAccount</DirectInput> </OleDbSource> <Lookup Name="Check For Existing" OleDbConnectionName="Target" NoMatchBehavior="RedirectRowsToNoMatchOutput"> <DirectInput>SELECT AccountKey FROM dbo.DimAccount</DirectInput> <Inputs> <Column SourceColumn="AccountKey" TargetColumn="AccountKey"/> </Inputs> </Lookup> <ConditionalSplit Name="Test ID Range"> <InputPath OutputPathName="Check For Existing.NoMatch"/> <OutputPaths> <OutputPath Name="High ID"> <Expression>AccountKey >= 100</Expression> </OutputPath> </OutputPaths> </ConditionalSplit> <OleDbDestination Name="Target" ConnectionName="Target"> <InputPath OutputPathName="Test ID Range.High ID"/> <ExternalTableOutput Table="dbo.DimAccount_Test"/> </OleDbDestination> </Transformations> </Dataflow>
This gives you the following data flow.
That’s a few examples of controlling the data paths in a data flow. There are a few other bits of information that are important to know about data paths in the data flow.
Most components have a default output named “Output”, and a second output named “Error” for the error output (if the component supports errors). The Multicast component has no default output, so you always need to explicitly define the data path mapping from it to the next component. The Union All, Merge, and Merge Join components need to be explicitly mapped, since they support multiple inputs. The Slowly Changing Dimension (SCD) transformation has multiple outputs. The “New” output is the default. There are also outputs named “Unchanged”, “FixedAttribute”, “ChangingAttribute”, “HistoricalAttribute”, and “InferredMember”. The Percentage Sampling and Row Sampling transformations have two output named “Selected” (the default) and “Unselected”. The sample Biml for this post is on my SkyDrive. Please download it and try it out with the latest release of BIDS Helper.
Share