Task 3: Month-End Balance History - Pseudocode¶
Overview¶
This document provides a high-level pseudocode representation of the SQL query for generating month-end balance history. The full SQL implementation is available in the appendix.
Query Flow¶
flowchart TB Start[Generate Month-End Balances] --> Months[Create Month Spine
Jan, Feb, Mar 2024] Months --> Accounts[Get Unique Accounts
From Transactions] Accounts --> Spine[Generate Account×Month
Cartesian Product] Spine --> LastTx[Find Last Transaction
Per Account Per Month] LastTx --> Balance[Calculate Running Balance
Using Window Functions] Balance --> Result[Final Result
Account, Month, Balance] style Start fill:#f57c00,color:#fff style Months fill:#7b1fa2,color:#fff style Accounts fill:#7b1fa2,color:#fff style Spine fill:#f57c00,color:#fff style LastTx fill:#7b1fa2,color:#fff style Balance fill:#388e3c,color:#fff style Result fill:#2e7d32,color:#fff ```text ## Query Objective Generate month-end account balances for Q1 2024 (January, February, March), showing NULL for months with no transactions. ## High-Level Algorithm ```text FUNCTION generate_month_end_balances(): // Step 1: Generate Month Spine months = [ '2024-01-31', '2024-02-29', '2024-03-31' ] // Step 2: Get Unique Accounts unique_accounts = SELECT DISTINCT account_id FROM transactions WHERE tx_date >= '2024-01-01' AND tx_date < '2024-04-01' // Step 3: Generate Account×Month Spine (Cartesian Product) account_months = CROSS_JOIN(unique_accounts, months) // Result: All combinations of (account_id, month_end) // Step 4: Find Last Transaction per Account per Month monthly_last_tx = SELECT account_id, new_balance, DATE_TRUNC('month', tx_date) AS tx_month_start, ROW_NUMBER() OVER ( PARTITION BY account_id, DATE_TRUNC('month', tx_date) ORDER BY tx_date DESC, id DESC ) AS row_number FROM transactions WHERE tx_date >= '2024-01-01' AND tx_date < '2024-04-01' // Filter to only the last transaction (row_number = 1) last_tx_per_month = FILTER monthly_last_tx WHERE row_number = 1 // Step 5: Join Spine with Data result = SELECT account_months.account_id AS acct, last_tx_per_month.new_balance AS balance, TO_CHAR(account_months.month_end, 'YYYY-MM') AS month FROM account_months LEFT JOIN last_tx_per_month ON account_months.account_id = last_tx_per_month.account_id AND DATE_TRUNC('month', account_months.month_end) = last_tx_per_month.tx_month_start // Step 6: Order Results ORDER BY account_months.account_id, account_months.month_end RETURN result END FUNCTION ```text ## Detailed Step-by-Step Logic ### Step 1: Month Spine Generation ```text CREATE months AS ( VALUES ('2024-01-31'), ('2024-02-29'), ('2024-03-31') ) // Purpose: Define the reporting months explicitly ```text ### Step 2: Unique Accounts Extraction ```text CREATE unique_accounts AS ( SELECT DISTINCT account_id FROM transactions WHERE tx_date >= '2024-01-01' AND tx_date < '2024-04-01' ) // Purpose: Get all accounts that had activity in Q1 2024 // Optimization: Partition pruning on tx_date ```text ### Step 3: Account×Month Spine (Cartesian Product) ```text CREATE account_months AS ( SELECT unique_accounts.account_id, months.month_end FROM unique_accounts CROSS JOIN months ) // Purpose: Generate complete spine of (account, month) combinations // Ensures gaps (months with no transactions) are visible as NULL ```text ### Step 4: Last Transaction Identification ```text CREATE monthly_last_tx AS ( SELECT account_id, new_balance, DATE_TRUNC('month', tx_date) AS tx_month_start, ROW_NUMBER() OVER ( PARTITION BY account_id, DATE_TRUNC('month', tx_date) ORDER BY tx_date DESC, id DESC ) AS row_number FROM transactions WHERE tx_date >= '2024-01-01' AND tx_date < '2024-04-01' ) // Purpose: Identify the last transaction per account per month // Window Function: ROW_NUMBER() partitions by account and month // Ordering: tx_date DESC (most recent first), id DESC (tie-breaker) // Filter: Keep only row_number = 1 (last transaction) ```text ### Step 5: Join Spine with Data ```text SELECT account_months.account_id AS acct, monthly_last_tx.new_balance AS balance, TO_CHAR(account_months.month_end, 'YYYY-MM') AS month FROM account_months LEFT JOIN monthly_last_tx ON account_months.account_id = monthly_last_tx.account_id AND DATE_TRUNC('month', account_months.month_end) = monthly_last_tx.tx_month_start AND monthly_last_tx.row_number = 1 // Purpose: Match account×month combinations with their last transaction // LEFT JOIN: Preserves all account×month combinations (NULL for missing months) ```text ## Data Flow Diagram (Conceptual) ```text Input: transactions table ↓ [Filter: tx_date in Q1 2024] ↓ ├─→ [Extract Unique Accounts] → account_months (spine) └─→ [Find Last TX per Month] → monthly_last_tx ↓ [Window Function: ROW_NUMBER] ↓ [Filter: row_number = 1] ↓ [LEFT JOIN with spine] ↓ Output: (account_id, balance, month) - balance = NULL if no transaction in that month ```text ## Key Design Decisions 1. **Spine Generation**: CROSS JOIN ensures all account×month combinations are present, making gaps explicit 1. **Window Function**: ROW_NUMBER() efficiently identifies last transaction without multiple scans 1. **LEFT JOIN**: Preserves all account×month combinations, showing NULL for months with no activity 1. **Partition Pruning**: WHERE clause on tx_date enables partition pruning for 100M row tables 1. **Deterministic Ordering**: id DESC as tie-breaker ensures consistent results ## Performance Optimizations ```text FOR 100M row tables: 1. Partitioning Strategy: - Table partitioned by tx_date (month/day) - WHERE clause enables partition pruning - Only scans Jan-Mar 2024 partitions 2. Clustering/Indexing: - Data clustered by account_id within partitions - Speeds up DISTINCT and window function operations - Reduces shuffle operations 3. Window Function Efficiency: - Single pass over data (no multiple scans) - Partitioned by account_id and month - Ordered by tx_date DESC for last transaction 4. Join Strategy: - LEFT JOIN preserves spine (no data loss) - Join on account_id and month (indexed columns) ```text ## Expected Output Format ```text | acct | balance | month | | --- | --- | --- | | 1001 | 1500.00 | 2024-01 | | 1001 | 2000.00 | 2024-02 | | 1001 | NULL | 2024-03 | | 1002 | 500.00 | 2024-01 | | 1002 | NULL | 2024-02 | | 1002 | 750.00 | 2024-03 | ...
Jan, Feb, Mar 2024] Months --> Accounts[Get Unique Accounts
From Transactions] Accounts --> Spine[Generate Account×Month
Cartesian Product] Spine --> LastTx[Find Last Transaction
Per Account Per Month] LastTx --> Balance[Calculate Running Balance
Using Window Functions] Balance --> Result[Final Result
Account, Month, Balance] style Start fill:#f57c00,color:#fff style Months fill:#7b1fa2,color:#fff style Accounts fill:#7b1fa2,color:#fff style Spine fill:#f57c00,color:#fff style LastTx fill:#7b1fa2,color:#fff style Balance fill:#388e3c,color:#fff style Result fill:#2e7d32,color:#fff ```text ## Query Objective Generate month-end account balances for Q1 2024 (January, February, March), showing NULL for months with no transactions. ## High-Level Algorithm ```text FUNCTION generate_month_end_balances(): // Step 1: Generate Month Spine months = [ '2024-01-31', '2024-02-29', '2024-03-31' ] // Step 2: Get Unique Accounts unique_accounts = SELECT DISTINCT account_id FROM transactions WHERE tx_date >= '2024-01-01' AND tx_date < '2024-04-01' // Step 3: Generate Account×Month Spine (Cartesian Product) account_months = CROSS_JOIN(unique_accounts, months) // Result: All combinations of (account_id, month_end) // Step 4: Find Last Transaction per Account per Month monthly_last_tx = SELECT account_id, new_balance, DATE_TRUNC('month', tx_date) AS tx_month_start, ROW_NUMBER() OVER ( PARTITION BY account_id, DATE_TRUNC('month', tx_date) ORDER BY tx_date DESC, id DESC ) AS row_number FROM transactions WHERE tx_date >= '2024-01-01' AND tx_date < '2024-04-01' // Filter to only the last transaction (row_number = 1) last_tx_per_month = FILTER monthly_last_tx WHERE row_number = 1 // Step 5: Join Spine with Data result = SELECT account_months.account_id AS acct, last_tx_per_month.new_balance AS balance, TO_CHAR(account_months.month_end, 'YYYY-MM') AS month FROM account_months LEFT JOIN last_tx_per_month ON account_months.account_id = last_tx_per_month.account_id AND DATE_TRUNC('month', account_months.month_end) = last_tx_per_month.tx_month_start // Step 6: Order Results ORDER BY account_months.account_id, account_months.month_end RETURN result END FUNCTION ```text ## Detailed Step-by-Step Logic ### Step 1: Month Spine Generation ```text CREATE months AS ( VALUES ('2024-01-31'), ('2024-02-29'), ('2024-03-31') ) // Purpose: Define the reporting months explicitly ```text ### Step 2: Unique Accounts Extraction ```text CREATE unique_accounts AS ( SELECT DISTINCT account_id FROM transactions WHERE tx_date >= '2024-01-01' AND tx_date < '2024-04-01' ) // Purpose: Get all accounts that had activity in Q1 2024 // Optimization: Partition pruning on tx_date ```text ### Step 3: Account×Month Spine (Cartesian Product) ```text CREATE account_months AS ( SELECT unique_accounts.account_id, months.month_end FROM unique_accounts CROSS JOIN months ) // Purpose: Generate complete spine of (account, month) combinations // Ensures gaps (months with no transactions) are visible as NULL ```text ### Step 4: Last Transaction Identification ```text CREATE monthly_last_tx AS ( SELECT account_id, new_balance, DATE_TRUNC('month', tx_date) AS tx_month_start, ROW_NUMBER() OVER ( PARTITION BY account_id, DATE_TRUNC('month', tx_date) ORDER BY tx_date DESC, id DESC ) AS row_number FROM transactions WHERE tx_date >= '2024-01-01' AND tx_date < '2024-04-01' ) // Purpose: Identify the last transaction per account per month // Window Function: ROW_NUMBER() partitions by account and month // Ordering: tx_date DESC (most recent first), id DESC (tie-breaker) // Filter: Keep only row_number = 1 (last transaction) ```text ### Step 5: Join Spine with Data ```text SELECT account_months.account_id AS acct, monthly_last_tx.new_balance AS balance, TO_CHAR(account_months.month_end, 'YYYY-MM') AS month FROM account_months LEFT JOIN monthly_last_tx ON account_months.account_id = monthly_last_tx.account_id AND DATE_TRUNC('month', account_months.month_end) = monthly_last_tx.tx_month_start AND monthly_last_tx.row_number = 1 // Purpose: Match account×month combinations with their last transaction // LEFT JOIN: Preserves all account×month combinations (NULL for missing months) ```text ## Data Flow Diagram (Conceptual) ```text Input: transactions table ↓ [Filter: tx_date in Q1 2024] ↓ ├─→ [Extract Unique Accounts] → account_months (spine) └─→ [Find Last TX per Month] → monthly_last_tx ↓ [Window Function: ROW_NUMBER] ↓ [Filter: row_number = 1] ↓ [LEFT JOIN with spine] ↓ Output: (account_id, balance, month) - balance = NULL if no transaction in that month ```text ## Key Design Decisions 1. **Spine Generation**: CROSS JOIN ensures all account×month combinations are present, making gaps explicit 1. **Window Function**: ROW_NUMBER() efficiently identifies last transaction without multiple scans 1. **LEFT JOIN**: Preserves all account×month combinations, showing NULL for months with no activity 1. **Partition Pruning**: WHERE clause on tx_date enables partition pruning for 100M row tables 1. **Deterministic Ordering**: id DESC as tie-breaker ensures consistent results ## Performance Optimizations ```text FOR 100M row tables: 1. Partitioning Strategy: - Table partitioned by tx_date (month/day) - WHERE clause enables partition pruning - Only scans Jan-Mar 2024 partitions 2. Clustering/Indexing: - Data clustered by account_id within partitions - Speeds up DISTINCT and window function operations - Reduces shuffle operations 3. Window Function Efficiency: - Single pass over data (no multiple scans) - Partitioned by account_id and month - Ordered by tx_date DESC for last transaction 4. Join Strategy: - LEFT JOIN preserves spine (no data loss) - Join on account_id and month (indexed columns) ```text ## Expected Output Format ```text | acct | balance | month | | --- | --- | --- | | 1001 | 1500.00 | 2024-01 | | 1001 | 2000.00 | 2024-02 | | 1001 | NULL | 2024-03 | | 1002 | 500.00 | 2024-01 | | 1002 | NULL | 2024-02 | | 1002 | 750.00 | 2024-03 | ...
Edge Cases Handled¶
- No Transactions in Month: Returns NULL balance (explicit gap)
- Multiple Transactions Same Day: Uses id DESC as tie-breaker
- No Transactions for Account: Account still appears in spine with NULL balances
- Transactions Outside Range: Filtered out by WHERE clause
See Also¶
- Full SQL Implementation:
APPENDIX_B_FULL_SQL_CODE.md - Schema Definition:
schema.sql - Test Data:
tests/test_data.sql