Appendix B: Full SQL Implementation Code¶
This appendix contains the complete SQL implementation for generating month-end balance history.
File: balance_history_2024_q1.sql
/*
Task 3: Month-End Balance History
Goal: Show account balance history at the end of each month for Jan-Mar 2024.
Constraint: 100M records (requires efficient indexing/partitioning logic).
History Logic: Month-end balance is the last `new_balance` observed within a month (ordered by `tx_date`, tie-broken by `id`).
This implementation keeps months with no transactions as `NULL` (gaps are explicit via the account×month spine).
If business requirements expect a carry-forward "as-of" balance, extend the final select with a forward-fill of the last known balance.
Time semantics note: this query uses `tx_date` as the ordering/reporting timestamp; if the domain distinguishes transaction vs posting date,
month-end reporting should use the posting/cutoff timestamp.
Engine Note: syntax is standard ANSI/Presto SQL.
*/
-- 1. Generate the spine of all (Account, Month) combinations to ensure gaps are visible
WITH months AS (
SELECT CAST(date_column AS DATE) AS month_end
FROM (
VALUES
('2024-01-31'),
('2024-02-29'),
('2024-03-31')
) AS t (date_column)
),
unique_accounts AS (
SELECT DISTINCT account_id FROM transactions
-- WHERE tx_date >= '2024-01-01' -- Optimization: filter partition scan if possible
),
account_months AS (
SELECT
a.account_id,
m.month_end
FROM unique_accounts AS a
CROSS JOIN months AS m
),
-- 2. Find the last transaction for each account in each month
-- Efficient strategy: Window function partitioning by account/month
monthly_last_tx AS (
SELECT
account_id,
new_balance,
DATE_TRUNC('month', tx_date) AS tx_month_start,
ROW_NUMBER() OVER (
PARTITION BY account_id, DATE_TRUNC('month', tx_date)
ORDER BY tx_date DESC, id DESC -- id tie-breaker ensures deterministic result
) AS rn
FROM transactions
WHERE tx_date >= '2024-01-01' AND tx_date < '2024-04-01'
)
-- 3. Join spine with data to produce final report
SELECT
am.account_id AS acct,
tx.new_balance AS balance, -- Formatting string may vary by dialect (Presto: format_datetime)
TO_CHAR(am.month_end, 'YYYY-MM') AS month
FROM account_months AS am
LEFT JOIN monthly_last_tx AS tx
ON
am.account_id = tx.account_id
AND DATE_TRUNC('month', am.month_end) = tx.tx_month_start
AND tx.rn = 1
ORDER BY am.account_id, am.month_end;
/*
Performance Considerations for 100M rows:
1. Partitioning: The 'transactions' table should be partitioned by tx_date (month/day).
This allows the query engine to prune partitions outside Jan-Mar 2024 immediately.
2. Clustering/Indexing: Data should be sorted/clustered by 'account_id' within partitions.
This makes the 'DISTINCT account_id' and window function operations much faster (avoids heavy shuffle).
*/
/*
Query Explanation:
This query produces month-end balance history for Q1 2024 (January, February, March).
1. Month-end balance definition:
- The balance at the end of a month is defined as the last transaction's `new_balance`
observed within that month (ordered by `tx_date` descending, tie-broken by `id`).
2. Account×month spine:
- The query generates a complete spine of all (account, month) combinations using
CROSS JOIN to ensure gaps are visible.
- Accounts with no activity in a month return NULL for balance (explicit gaps).
3. Window function approach:
- Uses ROW_NUMBER() partitioned by account and month to identify the last transaction.
- Efficient for large datasets (100M rows) as it avoids multiple scans.
4. Partition pruning:
- The WHERE clause filters to Jan-Mar 2024, enabling partition pruning on `tx_date`.
- This minimizes data scanned and improves query performance.
5. Engine portability:
- Uses standard ANSI/Presto SQL syntax.
- Compatible with Athena, BigQuery, Snowflake, and other modern SQL engines.
- Avoids engine-specific features for maximum portability.
Ownership & Governance (Task 3):
| Aspect | Owner | Responsibility |
| --- | --- | --- |
| **Gold Tables** | Business (Finance) | Analysts query Gold tables only for reporting |
| **Silver Tables** | Domain Teams | Technical users may query Silver for ad-hoc analysis |
| **Bronze Layer** Data Platform Team Not queried for analytics (raw audit trail only)
Rules:
- Analysts query Gold tables only (business contracts, stable schemas).
- Silver tables used by technical users for ad-hoc analysis and debugging.
- Bronze is not queried for analytics (immutable raw data, platform team access only).
*/ See Also¶
- Pseudocode:
SQL_PSEUDOCODE.md - Architecture Diagram:
SQL_DIAGRAM.md - Schema Definition:
schema.sql - Test Data:
tests/test_data.sql