Task 3: Month-End Balance History - Architecture Diagram¶
High-Level Query Flow¶
flowchart TD Start([Query Start]) --> Filter[Filter Transactions
Q1 2024] Filter --> Branch{Two Paths} Branch -->|Path 1| UniqueAccts[Extract Unique Accounts] Branch -->|Path 2| LastTx[Find Last Transaction
per Account per Month] UniqueAccts --> Months[Generate Month Spine
Jan, Feb, Mar 2024] Months --> CrossJoin[CROSS JOIN
Account × Month] CrossJoin --> Spine[Account×Month Spine
Complete Coverage] LastTx --> Window[Window Function
ROW_NUMBER] Window -->|PARTITION BY account_id, month
ORDER BY tx_date DESC, id DESC| Ranked[Ranked Transactions] Ranked --> FilterLast[Filter row_number = 1] FilterLast --> LastTxData[Last TX per Month Data] Spine --> Join[LEFT JOIN] LastTxData --> Join Join -->|Match on account_id
and month| Result[Final Result
account_id, balance, month] Result --> Order[ORDER BY
account_id, month_end] Order --> Output([Query Output]) style Start fill:#f57c00,color:#fff style Filter fill:#7b1fa2,color:#fff style Branch fill:#fbc02d,color:#111 style UniqueAccts fill:#7b1fa2,color:#fff style LastTx fill:#7b1fa2,color:#fff style Months fill:#7b1fa2,color:#fff style CrossJoin fill:#7b1fa2,color:#fff style Spine fill:#f57c00,color:#fff style Window fill:#7b1fa2,color:#fff style Ranked fill:#7b1fa2,color:#fff style FilterLast fill:#7b1fa2,color:#fff style LastTxData fill:#1976d2,color:#fff style Join fill:#7b1fa2,color:#fff style Result fill:#388e3c,color:#fff style Order fill:#7b1fa2,color:#fff style Output fill:#2e7d32,color:#fff
Q1 2024] Filter --> Branch{Two Paths} Branch -->|Path 1| UniqueAccts[Extract Unique Accounts] Branch -->|Path 2| LastTx[Find Last Transaction
per Account per Month] UniqueAccts --> Months[Generate Month Spine
Jan, Feb, Mar 2024] Months --> CrossJoin[CROSS JOIN
Account × Month] CrossJoin --> Spine[Account×Month Spine
Complete Coverage] LastTx --> Window[Window Function
ROW_NUMBER] Window -->|PARTITION BY account_id, month
ORDER BY tx_date DESC, id DESC| Ranked[Ranked Transactions] Ranked --> FilterLast[Filter row_number = 1] FilterLast --> LastTxData[Last TX per Month Data] Spine --> Join[LEFT JOIN] LastTxData --> Join Join -->|Match on account_id
and month| Result[Final Result
account_id, balance, month] Result --> Order[ORDER BY
account_id, month_end] Order --> Output([Query Output]) style Start fill:#f57c00,color:#fff style Filter fill:#7b1fa2,color:#fff style Branch fill:#fbc02d,color:#111 style UniqueAccts fill:#7b1fa2,color:#fff style LastTx fill:#7b1fa2,color:#fff style Months fill:#7b1fa2,color:#fff style CrossJoin fill:#7b1fa2,color:#fff style Spine fill:#f57c00,color:#fff style Window fill:#7b1fa2,color:#fff style Ranked fill:#7b1fa2,color:#fff style FilterLast fill:#7b1fa2,color:#fff style LastTxData fill:#1976d2,color:#fff style Join fill:#7b1fa2,color:#fff style Result fill:#388e3c,color:#fff style Order fill:#7b1fa2,color:#fff style Output fill:#2e7d32,color:#fff
Window Function Detail¶
flowchart LR Input[Transactions Table
Filtered to Q1 2024] --> Window[Window Function] Window -->|PARTITION BY| Part1[account_id] Window -->|PARTITION BY| Part2[DATE_TRUNC month] Window -->|ORDER BY| Order1[tx_date DESC] Window -->|ORDER BY| Order2[id DESC] Part1 --> Calc[Calculate ROW_NUMBER] Part2 --> Calc Order1 --> Calc Order2 --> Calc Calc --> Ranked[Ranked Results
row_number per partition] Ranked --> Filter[Filter row_number = 1] Filter --> Output[Last Transaction
per Account per Month] style Input fill:#f57c00,color:#fff style Window fill:#7b1fa2,color:#fff style Part1 fill:#bdbdbd,color:#111 style Part2 fill:#bdbdbd,color:#111 style Order1 fill:#bdbdbd,color:#111 style Order2 fill:#bdbdbd,color:#111 style Calc fill:#7b1fa2,color:#fff style Ranked fill:#388e3c,color:#fff style Filter fill:#7b1fa2,color:#fff style Output fill:#2e7d32,color:#fff
Filtered to Q1 2024] --> Window[Window Function] Window -->|PARTITION BY| Part1[account_id] Window -->|PARTITION BY| Part2[DATE_TRUNC month] Window -->|ORDER BY| Order1[tx_date DESC] Window -->|ORDER BY| Order2[id DESC] Part1 --> Calc[Calculate ROW_NUMBER] Part2 --> Calc Order1 --> Calc Order2 --> Calc Calc --> Ranked[Ranked Results
row_number per partition] Ranked --> Filter[Filter row_number = 1] Filter --> Output[Last Transaction
per Account per Month] style Input fill:#f57c00,color:#fff style Window fill:#7b1fa2,color:#fff style Part1 fill:#bdbdbd,color:#111 style Part2 fill:#bdbdbd,color:#111 style Order1 fill:#bdbdbd,color:#111 style Order2 fill:#bdbdbd,color:#111 style Calc fill:#7b1fa2,color:#fff style Ranked fill:#388e3c,color:#fff style Filter fill:#7b1fa2,color:#fff style Output fill:#2e7d32,color:#fff
Data Transformation Pipeline¶
flowchart TD T1[transactions table
100M rows] --> Filter[WHERE tx_date
BETWEEN 2024-01-01
AND 2024-03-31] Filter --> Split{Two Processing Paths} Split -->|Path A| Distinct[SELECT DISTINCT
account_id] Split -->|Path B| Window[Window Function
ROW_NUMBER] Distinct --> CrossJoin[CROSS JOIN with
months VALUES] CrossJoin --> Spine[(Account×Month Spine)] Window --> Rank[Rank Transactions
by account_id, month] Rank --> Filter1[WHERE row_number = 1] Filter1 --> LastTx[(Last TX per Month)] Spine --> Join[LEFT JOIN] LastTx --> Join Join --> Final[Final Result Set
acct, balance, month] style T1 fill:#f57c00,color:#fff style Filter fill:#7b1fa2,color:#fff style Split fill:#fbc02d,color:#111 style Distinct fill:#7b1fa2,color:#fff style Window fill:#7b1fa2,color:#fff style CrossJoin fill:#7b1fa2,color:#fff style Spine fill:#f57c00,color:#fff style Rank fill:#7b1fa2,color:#fff style Filter1 fill:#7b1fa2,color:#fff style LastTx fill:#1976d2,color:#fff style Join fill:#7b1fa2,color:#fff style Final fill:#2e7d32,color:#fff
100M rows] --> Filter[WHERE tx_date
BETWEEN 2024-01-01
AND 2024-03-31] Filter --> Split{Two Processing Paths} Split -->|Path A| Distinct[SELECT DISTINCT
account_id] Split -->|Path B| Window[Window Function
ROW_NUMBER] Distinct --> CrossJoin[CROSS JOIN with
months VALUES] CrossJoin --> Spine[(Account×Month Spine)] Window --> Rank[Rank Transactions
by account_id, month] Rank --> Filter1[WHERE row_number = 1] Filter1 --> LastTx[(Last TX per Month)] Spine --> Join[LEFT JOIN] LastTx --> Join Join --> Final[Final Result Set
acct, balance, month] style T1 fill:#f57c00,color:#fff style Filter fill:#7b1fa2,color:#fff style Split fill:#fbc02d,color:#111 style Distinct fill:#7b1fa2,color:#fff style Window fill:#7b1fa2,color:#fff style CrossJoin fill:#7b1fa2,color:#fff style Spine fill:#f57c00,color:#fff style Rank fill:#7b1fa2,color:#fff style Filter1 fill:#7b1fa2,color:#fff style LastTx fill:#1976d2,color:#fff style Join fill:#7b1fa2,color:#fff style Final fill:#2e7d32,color:#fff
Join Logic Detail¶
flowchart LR Spine[Account×Month Spine
All Combinations] --> Join[LEFT JOIN] LastTx[Last TX per Month
Filtered Data] --> Join Join -->|ON| Cond1[account_id = account_id] Join -->|ON| Cond2[DATE_TRUNC month_end
= tx_month_start] Join -->|ON| Cond3[row_number = 1] Cond1 --> Match{Match Found?} Cond2 --> Match Cond3 --> Match Match -->|Yes| WithBalance[Row with balance] Match -->|No| WithNull[Row with NULL balance] WithBalance --> Result[Result Set] WithNull --> Result style Spine fill:#f57c00,color:#fff style LastTx fill:#1976d2,color:#fff style Join fill:#7b1fa2,color:#fff style Cond1 fill:#bdbdbd,color:#111 style Cond2 fill:#bdbdbd,color:#111 style Cond3 fill:#bdbdbd,color:#111 style Match fill:#fbc02d,color:#111 style WithBalance fill:#388e3c,color:#fff style WithNull fill:#757575,color:#fff style Result fill:#2e7d32,color:#fff
All Combinations] --> Join[LEFT JOIN] LastTx[Last TX per Month
Filtered Data] --> Join Join -->|ON| Cond1[account_id = account_id] Join -->|ON| Cond2[DATE_TRUNC month_end
= tx_month_start] Join -->|ON| Cond3[row_number = 1] Cond1 --> Match{Match Found?} Cond2 --> Match Cond3 --> Match Match -->|Yes| WithBalance[Row with balance] Match -->|No| WithNull[Row with NULL balance] WithBalance --> Result[Result Set] WithNull --> Result style Spine fill:#f57c00,color:#fff style LastTx fill:#1976d2,color:#fff style Join fill:#7b1fa2,color:#fff style Cond1 fill:#bdbdbd,color:#111 style Cond2 fill:#bdbdbd,color:#111 style Cond3 fill:#bdbdbd,color:#111 style Match fill:#fbc02d,color:#111 style WithBalance fill:#388e3c,color:#fff style WithNull fill:#757575,color:#fff style Result fill:#2e7d32,color:#fff
Example Data Flow¶
sequenceDiagram participant T as transactions table participant U as Unique Accounts participant M as Month Spine participant S as Account×Month Spine participant W as Window Function participant L as Last TX Data participant J as JOIN participant R as Result T->>U: SELECT DISTINCT account_id
WHERE tx_date in Q1 U->>M: CROSS JOIN months M->>S: Generate Spine T->>W: Window Function
PARTITION BY account, month
ORDER BY tx_date DESC W->>L: Filter row_number = 1 S->>J: LEFT JOIN L->>J: ON account_id AND month J->>R: Final Result
(acct, balance, month)
WHERE tx_date in Q1 U->>M: CROSS JOIN months M->>S: Generate Spine T->>W: Window Function
PARTITION BY account, month
ORDER BY tx_date DESC W->>L: Filter row_number = 1 S->>J: LEFT JOIN L->>J: ON account_id AND month J->>R: Final Result
(acct, balance, month)
Performance Optimization Strategy¶
flowchart TD Table[transactions table
Partitioned by tx_date] --> Prune[Partition Pruning] Prune -->|WHERE tx_date
BETWEEN 2024-01-01
AND 2024-03-31| Scan[Scan Only
Jan-Mar Partitions] Scan --> Cluster[Clustered by
account_id] Cluster --> Fast[Fast DISTINCT
Fast Window Function] Fast --> Index[Indexed Columns
account_id, tx_date] Index --> Join[Efficient JOIN] Join --> Result[Optimized Result] style Table fill:#f57c00,color:#fff style Prune fill:#7b1fa2,color:#fff style Scan fill:#7b1fa2,color:#fff style Cluster fill:#7b1fa2,color:#fff style Fast fill:#7b1fa2,color:#fff style Index fill:#7b1fa2,color:#fff style Join fill:#7b1fa2,color:#fff style Result fill:#2e7d32,color:#fff
Partitioned by tx_date] --> Prune[Partition Pruning] Prune -->|WHERE tx_date
BETWEEN 2024-01-01
AND 2024-03-31| Scan[Scan Only
Jan-Mar Partitions] Scan --> Cluster[Clustered by
account_id] Cluster --> Fast[Fast DISTINCT
Fast Window Function] Fast --> Index[Indexed Columns
account_id, tx_date] Index --> Join[Efficient JOIN] Join --> Result[Optimized Result] style Table fill:#f57c00,color:#fff style Prune fill:#7b1fa2,color:#fff style Scan fill:#7b1fa2,color:#fff style Cluster fill:#7b1fa2,color:#fff style Fast fill:#7b1fa2,color:#fff style Index fill:#7b1fa2,color:#fff style Join fill:#7b1fa2,color:#fff style Result fill:#2e7d32,color:#fff
Output Structure¶
graph TD Result[Query Result] --> Col1[acct
account_id] Result --> Col2[balance
new_balance or NULL] Result --> Col3[month
YYYY-MM format] Col1 --> Sort[ORDER BY
account_id, month_end] Col2 --> Sort Col3 --> Sort Sort --> Output[Final Output] Output --> Example1[1001, 1500.00, 2024-01] Output --> Example2[1001, 2000.00, 2024-02] Output --> Example3[1001, NULL, 2024-03] style Result fill:#388e3c,color:#fff style Col1 fill:#bdbdbd,color:#111 style Col2 fill:#bdbdbd,color:#111 style Col3 fill:#bdbdbd,color:#111 style Sort fill:#7b1fa2,color:#fff style Output fill:#2e7d32,color:#fff style Example1 fill:#bdbdbd,color:#111 style Example2 fill:#bdbdbd,color:#111 style Example3 fill:#757575,color:#fff
account_id] Result --> Col2[balance
new_balance or NULL] Result --> Col3[month
YYYY-MM format] Col1 --> Sort[ORDER BY
account_id, month_end] Col2 --> Sort Col3 --> Sort Sort --> Output[Final Output] Output --> Example1[1001, 1500.00, 2024-01] Output --> Example2[1001, 2000.00, 2024-02] Output --> Example3[1001, NULL, 2024-03] style Result fill:#388e3c,color:#fff style Col1 fill:#bdbdbd,color:#111 style Col2 fill:#bdbdbd,color:#111 style Col3 fill:#bdbdbd,color:#111 style Sort fill:#7b1fa2,color:#fff style Output fill:#2e7d32,color:#fff style Example1 fill:#bdbdbd,color:#111 style Example2 fill:#bdbdbd,color:#111 style Example3 fill:#757575,color:#fff
Error Handling & Edge Cases¶
flowchart TD Input[Input Data] --> Check1{Has Transactions
in Q1?} Check1 -->|No| Empty[Return Empty Result] Check1 -->|Yes| Check2{Has Accounts?} Check2 -->|No| Empty Check2 -->|Yes| Process[Process Query] Process --> Edge1{Multiple TX
Same Day?} Edge1 -->|Yes| TieBreak[Use id DESC
as tie-breaker] Edge1 -->|No| Continue TieBreak --> Continue Continue --> Edge2{No TX in
Specific Month?} Edge2 -->|Yes| NullBalance[Return NULL balance] Edge2 -->|No| ValidBalance[Return balance] NullBalance --> Output ValidBalance --> Output Output([Final Output]) style Input fill:#f57c00,color:#fff style Check1 fill:#fbc02d,color:#111 style Check2 fill:#fbc02d,color:#111 style Empty fill:#d32f2f,color:#fff style Process fill:#7b1fa2,color:#fff style Edge1 fill:#fbc02d,color:#111 style TieBreak fill:#7b1fa2,color:#fff style Continue fill:#7b1fa2,color:#fff style Edge2 fill:#fbc02d,color:#111 style NullBalance fill:#ffa000,color:#111 style ValidBalance fill:#388e3c,color:#fff style Output fill:#2e7d32,color:#fff
in Q1?} Check1 -->|No| Empty[Return Empty Result] Check1 -->|Yes| Check2{Has Accounts?} Check2 -->|No| Empty Check2 -->|Yes| Process[Process Query] Process --> Edge1{Multiple TX
Same Day?} Edge1 -->|Yes| TieBreak[Use id DESC
as tie-breaker] Edge1 -->|No| Continue TieBreak --> Continue Continue --> Edge2{No TX in
Specific Month?} Edge2 -->|Yes| NullBalance[Return NULL balance] Edge2 -->|No| ValidBalance[Return balance] NullBalance --> Output ValidBalance --> Output Output([Final Output]) style Input fill:#f57c00,color:#fff style Check1 fill:#fbc02d,color:#111 style Check2 fill:#fbc02d,color:#111 style Empty fill:#d32f2f,color:#fff style Process fill:#7b1fa2,color:#fff style Edge1 fill:#fbc02d,color:#111 style TieBreak fill:#7b1fa2,color:#fff style Continue fill:#7b1fa2,color:#fff style Edge2 fill:#fbc02d,color:#111 style NullBalance fill:#ffa000,color:#111 style ValidBalance fill:#388e3c,color:#fff style Output fill:#2e7d32,color:#fff