Varigence Blog

Implementing Row Condensing in BimlFlex

Written by Peter Avenant on 2.20.2025

Efficient data processing is crucial in data warehousing, especially when managing historical data in Data Vault Satellites. Row condensing ensures that only meaningful changes are captured, eliminating redundant records. This guide details how to implement advanced row condensing during both the staging and Data Vault loading processes using BimlFlex.

Understanding Row Condensing

Row condensing involves processing data deltas to ensure that only genuine changes are loaded into historized tables. This process prevents the storage of unnecessary records, optimizing storage and query performance.

Consider the following scenario:

Integration Key RowEffectiveFromDate City Country
awlt~540 2025-02-19 Toronto Canada
awlt~540 2025-02-20 Montreal Canada
awlt~540 2025-02-21 Toronto Canada

In this example, the attribute City changes from 'Toronto' to 'Montreal' and back to 'Toronto'. A simple SELECT DISTINCT would not suffice to identify these changes accurately, as it would miss the reversion to 'Toronto'. Advanced row condensing logic is required to capture such 'A-B-A' changes effectively.

Implementing Row Condensing in BimlFlex

BimlFlex provides a robust framework to implement advanced row condensing through its metadata-driven approach. By configuring specific Settings and utilizing BimlFlex's templating system, you can automate the detection and processing of true data changes. BimlFlex provide you with an option Delta Collapse Rows to configure this for both the Persistent Staging and Satellite loads.

Sample Implementation

Below is a sample SQL implementation demonstrating how BimlFlex handles row condensing during the staging process:

SELECT
    STG.[Address_SK],
    STG.[Address_BK] AS [Integration Key],
    STG.[AddressID],
    STG.[AddressLine1],
    STG.[AddressLine2],
    STG.[City],
    STG.[StateProvince],
    STG.[CountryRegion],
    STG.[PostalCode],
    STG.[rowguid],
    STG.[ModifiedDate],
    STG.[RowEffectiveFromDate],
    STG.[RowAuditId],
    STG.[RowRecordSource],
    STG.[RowHash]
INTO
    [#TMP_AWLT_01_Sales_01_Address_STAGE]
FROM
(
    SELECT
        SRC.[Address_SK],
        SRC.[Address_BK] AS [Integration Key],
        SRC.[AddressID],
        SRC.[AddressLine1],
        SRC.[AddressLine2],
        SRC.[City],
        SRC.[StateProvince],
        SRC.[CountryRegion],
        SRC.[PostalCode],
        SRC.[rowguid],
        SRC.[ModifiedDate],
        SRC.[RowEffectiveFromDate],
        SRC.[RowAuditId],
        SRC.[RowRecordSource],
        SRC.[RowHash],
        LEAD(SRC.[RowEffectiveFromDate]) OVER (PARTITION BY SRC.[AddressID], SRC.[RowHash] ORDER BY SRC.[RowEffectiveFromDate]) AS [LEAD_DATE_DIFF],
        LAG(SRC.[RowEffectiveFromDate]) OVER (PARTITION BY SRC.[AddressID], SRC.[RowHash] ORDER BY SRC.[RowEffectiveFromDate]) AS [LAG_DATE_DIFF],
        COALESCE(LAG(SRC.[RowHash]) OVER (PARTITION BY SRC.[AddressID] ORDER BY SRC.[RowEffectiveFromDate]), SRC.[RowHash]) AS [LAG_HASH_DIFF],
        COALESCE(LEAD(SRC.[RowHash]) OVER (PARTITION BY SRC.[AddressID] ORDER BY SRC.[RowEffectiveFromDate]), SRC.[RowHash]) AS [LEAD_HASH_DIFF]
    FROM
        [#TMP_AWLT_01_Sales_01_Address_PRE_STAGE] AS SRC
    LEFT OUTER JOIN
        [#TMP_AWLT_01_Sales_01_Address_CURRENT] AS TGT
        ON SRC.[Address_BK] = TGT.[Address_BK]
        AND SRC.[RowEffectiveFromDate] <= TGT.[RowEffectiveFromDate]
    WHERE
        TGT.[Address_BK] IS NULL
) AS STG
WHERE
    STG.[LAG_DATE_DIFF] IS NULL
    OR (STG.[RowHash] <> STG.[LAG_HASH_DIFF] AND STG.[LAG_DATE_DIFF] IS NOT NULL)
    OR (STG.[RowHash] <> STG.[LAG_HASH_DIFF] AND STG.[LAG_HASH_DIFF] = STG.[LEAD_HASH_DIFF]);

In this script:

  • Window Functions: LEAD and LAG functions are used to compare the current row's RowHash with the previous and next rows, based on the RowEffectiveFromDate.
  • Change Detection: Rows are filtered to include only those where a genuine change is detected by comparing RowHash values.
  • Eliminating Redundancy: The WHERE clause ensures that only rows representing actual changes are processed further.

By incorporating this logic into your BimlFlex framework, you can automate the row condensing process, ensuring efficient and accurate data loading into your Persistent Staging and Data Vault.

Conclusion

Implementing advanced row condensing in BimlFlex streamlines your data integration processes by capturing only meaningful changes. Leveraging BimlFlex's metadata-driven approach and configurable templates, you can ensure your Data Vault remains efficient, accurate, and free from redundant data.

For more detailed information on configuring these attributes in BimlFlex, refer to the BimlFlex Settings Documentation.