Varigence Blog
Implementing Row Condensing in BimlFlex
Efficient data processing is crucial in data warehousing, especially when managing historical data in Data Vault Satellites. Row condensing ensures that only meaningful changes are captured, eliminating redundant records. This guide details how to implement advanced row condensing during both the staging and Data Vault loading processes using BimlFlex.
Understanding Row Condensing
Row condensing involves processing data deltas to ensure that only genuine changes are loaded into historized tables. This process prevents the storage of unnecessary records, optimizing storage and query performance.
Consider the following scenario:
Integration Key | RowEffectiveFromDate | City | Country |
---|---|---|---|
awlt~540 | 2025-02-19 | Toronto | Canada |
awlt~540 | 2025-02-20 | Montreal | Canada |
awlt~540 | 2025-02-21 | Toronto | Canada |
In this example, the attribute City
changes from 'Toronto' to 'Montreal' and back to 'Toronto'. A simple SELECT DISTINCT
would not suffice to identify these changes accurately, as it would miss the reversion to 'Toronto'. Advanced row condensing logic is required to capture such 'A-B-A' changes effectively.
Implementing Row Condensing in BimlFlex
BimlFlex provides a robust framework to implement advanced row condensing through its metadata-driven approach. By configuring specific Settings
and utilizing BimlFlex's templating system, you can automate the detection and processing of true data changes. BimlFlex provide you with an option Delta Collapse Rows
to configure this for both the Persistent Staging
and Satellite
loads.
Sample Implementation
Below is a sample SQL implementation demonstrating how BimlFlex handles row condensing during the staging process:
SELECT STG.[Address_SK], STG.[Address_BK] AS [Integration Key], STG.[AddressID], STG.[AddressLine1], STG.[AddressLine2], STG.[City], STG.[StateProvince], STG.[CountryRegion], STG.[PostalCode], STG.[rowguid], STG.[ModifiedDate], STG.[RowEffectiveFromDate], STG.[RowAuditId], STG.[RowRecordSource], STG.[RowHash] INTO [#TMP_AWLT_01_Sales_01_Address_STAGE] FROM ( SELECT SRC.[Address_SK], SRC.[Address_BK] AS [Integration Key], SRC.[AddressID], SRC.[AddressLine1], SRC.[AddressLine2], SRC.[City], SRC.[StateProvince], SRC.[CountryRegion], SRC.[PostalCode], SRC.[rowguid], SRC.[ModifiedDate], SRC.[RowEffectiveFromDate], SRC.[RowAuditId], SRC.[RowRecordSource], SRC.[RowHash], LEAD(SRC.[RowEffectiveFromDate]) OVER (PARTITION BY SRC.[AddressID], SRC.[RowHash] ORDER BY SRC.[RowEffectiveFromDate]) AS [LEAD_DATE_DIFF], LAG(SRC.[RowEffectiveFromDate]) OVER (PARTITION BY SRC.[AddressID], SRC.[RowHash] ORDER BY SRC.[RowEffectiveFromDate]) AS [LAG_DATE_DIFF], COALESCE(LAG(SRC.[RowHash]) OVER (PARTITION BY SRC.[AddressID] ORDER BY SRC.[RowEffectiveFromDate]), SRC.[RowHash]) AS [LAG_HASH_DIFF], COALESCE(LEAD(SRC.[RowHash]) OVER (PARTITION BY SRC.[AddressID] ORDER BY SRC.[RowEffectiveFromDate]), SRC.[RowHash]) AS [LEAD_HASH_DIFF] FROM [#TMP_AWLT_01_Sales_01_Address_PRE_STAGE] AS SRC LEFT OUTER JOIN [#TMP_AWLT_01_Sales_01_Address_CURRENT] AS TGT ON SRC.[Address_BK] = TGT.[Address_BK] AND SRC.[RowEffectiveFromDate] <= TGT.[RowEffectiveFromDate] WHERE TGT.[Address_BK] IS NULL ) AS STG WHERE STG.[LAG_DATE_DIFF] IS NULL OR (STG.[RowHash] <> STG.[LAG_HASH_DIFF] AND STG.[LAG_DATE_DIFF] IS NOT NULL) OR (STG.[RowHash] <> STG.[LAG_HASH_DIFF] AND STG.[LAG_HASH_DIFF] = STG.[LEAD_HASH_DIFF]);
In this script:
- Window Functions:
LEAD
andLAG
functions are used to compare the current row'sRowHash
with the previous and next rows, based on theRowEffectiveFromDate
. - Change Detection: Rows are filtered to include only those where a genuine change is detected by comparing
RowHash
values. - Eliminating Redundancy: The
WHERE
clause ensures that only rows representing actual changes are processed further.
By incorporating this logic into your BimlFlex framework, you can automate the row condensing process, ensuring efficient and accurate data loading into your Persistent Staging
and Data Vault
.
Conclusion
Implementing advanced row condensing in BimlFlex streamlines your data integration processes by capturing only meaningful changes. Leveraging BimlFlex's metadata-driven approach and configurable templates, you can ensure your Data Vault remains efficient, accurate, and free from redundant data.
For more detailed information on configuring these attributes in BimlFlex, refer to the BimlFlex Settings Documentation.