Data-Modeling & Transaction Mechanics Deck
Key trade-offs, patterns, and pitfalls every architect should recall on-demand.
Tap a question to reveal its answer.
1. Schema-on-write vs. schema-on-read — what’s the core distinction?
• Schema-on-write: Data must conform before it’s stored (e.g., relational tables).
• Schema-on-read: Structure is interpreted when data is consumed (e.g., JSON docs).
2. Why might you denormalize part of a relational schema?
Eliminate expensive joins and speed up read paths, accepting extra write complexity and possible inconsistencies.
3. Ledger vs. spreadsheet world-view?
• Ledger: Append-only log; flexible, audit-friendly; compute current state on demand.
• Spreadsheet: Current state table; reads cheap; harder to scale consistent updates.
4. Primary benefit of embedding sub-documents in a document DB?
All related data fetched or updated atomically in a single read/write.
5. What is a ‘hot partition’ and how do you defuse it?
Shard/key receiving disproportionate traffic; mitigate by re-hashing, adding key randomness, or tiered caching.
6. When are composite indexes preferred?
When common queries filter on the same ordered set of columns — one index covers multiple predicates.
7. Hash vs. B-tree index?
• Hash: O(1) equality lookup; no range scans.
• B-tree: Log-time equality & range; supports ORDER BY and BETWEEN.
8. Read Committed vs. Serializable isolation in one line each.
• RC: Blocks dirty reads only.
• Serializable: Guarantees the outcome equals some single-threaded order; prevents phantoms & lost updates.
9. Dirty read, non-repeatable read, phantom — which isolation levels stop each?
• Dirty read: blocked by RC+.
• Non-repeatable read: blocked by RR+.
• Phantom: blocked only by Serializable.
10. MVCC’s super-power?
Readers take a snapshot; they never block writers and vice versa—high read concurrency with snapshot consistency.
11. Two-phase locking (2PL) downside compared to MVCC?
Pessimistic locks can deadlock and stall readers; throughput drops under contention.
12. WAL: why ‘write ahead’?
Changes hit the sequential redo log before page flush; crash recovery replays the log to restore durability.
13. Group commit in a sentence.
Batch several transactions’ log flushes into one disk fsync, trading milliseconds of latency for higher throughput.
14. Checkpoint purpose?
Flush dirty pages & mark a log position so recovery only replays records after that point.
15. When to use triggers despite overhead?
Mandatory audit trails or integrity actions that must fire with every change, independent of application code.
16. Materialized view vs. ordinary view?
Materialized stores query results physically for fast reads; ordinary view is a stored query executed on demand.
17. Designing a Cassandra table: first rule?
Pick a partition key that evenly distributes data and matches primary query pattern; avoid multi-partition fan-out.
18. Why are secondary indexes rare in leaderless KV stores?
Index maintenance isn’t globally coordinated; queries would need scatter-gather across all nodes — expensive.
19. Top three transaction anomalies architects cite in interviews?
Dirty read, lost update, phantom read.
20. Single-sentence cheat sheet: Money transfer system?
Fully normalized ledger tables + account index, one Serializable transaction debiting and crediting rows atomically.
21. How do latches differ from locks?
Latches are short, in-memory, spin-based mutexes protecting internal data structures; locks enforce ACID isolation.
22. What is write amplification (index perspective)?
Each logical write also updates every affected index page, multiplying I/O cost.
23. Trigger cautionary tale?
‘Spaghetti triggers’: hidden cascades that balloon a tiny update into opaque, slow chain reactions — hard to debug.
24. Schema migration with zero downtime — headline steps?
Add new columns (backward-compatible), dual-write old+new, background backfill, flip reads, drop legacy fields.
25. Read replica caveat to call out?
Staleness — enforce read-your-writes or route user-critical reads to primary.