SerialReads

Database Foundations & Landscape

Jun 02, 2025

Database Foundations & Landscape

Why We Need Databases (The Core Problem)

Imagine keeping a bank’s ledger in a notebook. It works for one teller, but as transactions grow and multiple tellers work in parallel, consistency falls apart. We need a system that multiple people (or services) can use at scale without tripping over each other. This is the core role of a database: to safely store data, handle concurrent updates, and retrieve information efficiently – even as usage explodes. Databases shine where naive solutions (like simple file storage) break down under heavy load, complex queries, or errors. They ensure correctness (you don’t lose money in your account due to a crash) and performance (your social feed loads quickly despite millions of users). The challenge is that achieving this is hard: you must juggle reliability, speed, and scale all at once. As data grows, issues like slow queries, lost updates, or partial failures become serious. A solid mental model of database fundamentals – before diving into vendor specifics – helps in reasoning through these trade-offs on a whiteboard.

Building an Intuitive Mental Model

Think of a database as a highly organized library or ledger clerk for your app’s data. It doesn’t just put data on disk; it also enforces rules (no two users with the same username if that’s a key), coordinates multiple readers/writers (so one client’s update doesn’t corrupt another’s read), and optimizes access (like a library index speeding up book searches). This mental model guides many design decisions. For example, if you picture data as a ledger of transactions, you intuitively understand why atomicity (all or nothing updates) is vital – you wouldn’t want to subtract money from Alice without adding it to Bob in a transfer. Before we name-drop “ACID” or “NoSQL,” it helps to reason in these real-world terms. The goal is to internalize concepts: e.g., eventually, multiple notebooks need to sync up (eventual consistency) or a librarian might categorize books by subject or author (like choosing different data models and indexes). With this intuition, we can now explore technical foundations.

Data Models: Relational vs. NoSQL Families

Relational databases (SQL databases) organize data into tables (think Excel sheets) with predefined schemas. This schema-on-write approach means you decide the structure upfront and every entry conforms. It’s like a well-defined form you must fill out – great for consistency and complex querying (joins), but inflexible if your data is highly varied. In contrast, NoSQL databases embrace a variety of models for flexible or large-scale needs, often using schema-on-read (store data first, define structure when reading). The NoSQL family spans several categories:

- Key-value stores (e.g., Redis, DynamoDB): the simplest model – a key maps to a value. Extremely fast lookups and easy to partition, but queries beyond “get/put by key” are limited.
- Document stores (e.g., MongoDB): each record is a self-contained JSON-like document, so related data can be nested together and the schema can evolve per document.
- Wide-column stores (e.g., Cassandra, HBase): rows are grouped into partitions and can carry many sparse columns; built for huge write volumes and horizontal scale.
- Graph databases (e.g., Neo4j): nodes and relationships are first-class citizens, making traversals like “friends of friends” natural to express.

Each of these NoSQL types has different strengths – e.g., key-value and wide-column prioritize simple operations and scalability, sometimes at the cost of complex query capability or strict consistency. Relational databases, on the other hand, excel at structured queries and multi-object transactions, but historically were harder to scale out horizontally (though modern SQL engines and NewSQL have mitigated this). A useful way to think of it: Relational is schema-first, great for consistent, interrelated data (like a banking database with accounts and transfers), whereas NoSQL is often schema-flexible, great for fast-evolving or massive-scale data (like capturing billions of web click events or iterating quickly on a new app’s data model). Many systems today use a mix: e.g., user accounts in Postgres, activity logs in Cassandra, caching in Redis (a key-value store), and maybe a search index in Elasticsearch – each chosen for what it does best.

Consistency and CAP: ACID vs. BASE Models

When we say a database transaction is ACID, we mean it guarantees a set of properties that keep data correct despite errors or crashes:

- Atomicity: a transaction’s changes apply entirely or not at all – a transfer never debits Alice without crediting Bob.
- Consistency: each transaction moves the database from one valid state to another, honoring declared constraints and invariants.
- Isolation: concurrent transactions don’t see each other’s half-finished work; the outcome is as if they ran one at a time (to the degree set by the isolation level).
- Durability: once committed, data survives crashes or power loss (typically via a write-ahead log flushed to disk).

Relational databases typically adhere to ACID, which makes life easier when correctness matters – you trust the DB to handle consistency. However, in distributed systems and some NoSQL stores, strict ACID is relaxed in favor of availability or performance.
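
To make atomicity concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The accounts table, balances, and simulated failure are invented for illustration – real code would rely on a CHECK constraint or application logic rather than a hard-coded exception:

```python
import sqlite3

# Minimal sketch of an atomic transfer (table and data are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE name = 'alice'")
        (bal,) = conn.execute("SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
        if bal < 0:
            raise ValueError("insufficient funds")  # abort mid-transfer
        conn.execute("UPDATE accounts SET balance = balance + 150 WHERE name = 'bob'")
except ValueError:
    pass

# Neither update is visible: the failed transfer rolled back as a single unit.
print(conn.execute("SELECT * FROM accounts ORDER BY name").fetchall())
# [('alice', 100), ('bob', 0)]
```

That all-or-nothing behavior is what keeps application code simple: it never has to clean up a half-applied transfer.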

This is where BASE comes in – an alternative philosophy often used in large-scale NoSQL systems. BASE stands for Basically Available, Soft state, Eventually consistent:

- Basically Available: the system keeps responding to reads and writes even when parts of it fail, possibly returning slightly stale data.
- Soft state: the system’s state may change over time even without new input, as replicas exchange updates in the background.
- Eventually consistent: if writes stop, all replicas converge to the same value – there is just no guarantee of exactly when.

BASE sacrifices the “always consistent right now” guarantee for the sake of partition tolerance and uptime. This ties into the famous CAP Theorem: in a distributed system, you can’t have Consistency, Availability, and Partition tolerance all perfectly at once – you must choose to forgo one. Here, consistency in CAP means all nodes see the same data at the same time, availability means the system can respond (perhaps with older data) even when some nodes are down, and partition tolerance means the system tolerates network splits without total failure. CA systems (e.g., a single-instance relational DB or a sync-master setup) choose consistency+availability over partition tolerance – they might just fail or stop on a network split. AP systems (like Dynamo-style NoSQL stores) choose availability + partition tolerance, accepting stale reads during a partition. CP systems (like MongoDB or Redis in certain modes) choose consistency + partition tolerance, meaning during a partition they may sacrifice availability (e.g., some operations stall) to keep data consistent. The intuitive take-away: know your requirements. For a chat application, maybe it’s okay if a message order is slightly out of sync for a moment (BASE/AP) as long as the service is up; for a bank ledger, you likely want ACID/CP – it’s better to deny service or fail over than show incorrect balances. Good design is about picking the right spot in this spectrum – many modern databases even offer tunable consistency (for example, Cassandra can be configured how many replicas must confirm a write). In interviews, expressing this trade-off thinking (“what do we lose if we want zero downtime across regions?”) is key.
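
The replica-counting idea behind tunable consistency reduces to a small piece of arithmetic. The sketch below is a toy check of the quorum overlap rule used by Dynamo-style stores, not any particular database’s implementation:

```python
# With N replicas, writes acknowledged by W of them, and reads consulting R of them,
# R + W > N guarantees every read overlaps at least one replica that saw the latest write.
def read_sees_latest_write(n_replicas: int, w: int, r: int) -> bool:
    """True if any R read replicas must intersect the W replicas that acknowledged the write."""
    return r + w > n_replicas

print(read_sees_latest_write(3, w=2, r=2))  # True  -> quorum writes + quorum reads: strong reads
print(read_sees_latest_write(3, w=1, r=1))  # False -> fastest, but reads may be stale (eventual consistency)
```

Lowering W and R buys latency and availability; raising them buys consistency – the same spectrum the CAP discussion describes, expressed as knobs.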

OLTP vs. OLAP Workloads (and the Rise of HTAP)

Not all database use cases are alike. Broadly, we differentiate OLTP (Online Transaction Processing) from OLAP (Online Analytical Processing). OLTP is the bread-and-butter of applications: transactional, fine-grained operations. Imagine an ATM system – each withdrawal is an OLTP transaction: small, quick, and touching only a few records, but the system handles thousands per second with low latency. Here, the emphasis is on latency (each operation must be fast) and concurrency (many operations at once). The data access pattern is typically random reads/writes of a few rows by key (e.g., fetch account by ID and update it).

OLAP, on the other hand, is about analytical queries on large data sets. Think of a business intelligence dashboard or a data warehouse query: “What was the total sales by region last quarter?” These queries crunch through millions of rows. Latency per query can be seconds or minutes, which is acceptable because they do heavy lifting and are typically issued by scheduled reports or analysts rather than by live user actions. The emphasis is on throughput (scanning lots of data efficiently) and aggregations. OLAP queries often do full table scans or large joins but not very frequently, whereas OLTP does many small queries continuously.

Because of these differences, systems historically specialized. OLTP databases (e.g., MySQL, PostgreSQL) are row-oriented and optimized for writes and random reads, while OLAP databases (e.g., data warehouse appliances, or modern engines like ClickHouse) typically use columnar storage and are optimized for big scans and compressibility. It was common to extract data from OLTP systems into a separate OLAP system for analytics (the ETL process), because each is tuned so differently.

However, maintaining two systems is painful (data duplication, lag between OLTP and OLAP). This has led to HTAP (Hybrid Transaction/Analytical Processing) systems that try to do both. Modern cloud data platforms and some databases (TiDB, SingleStore, etc.) aim to let you run analytical queries on the same system that handles transactions – often by mixing storage engines or replicating data under the hood. For example, one approach is to have a row-store and a column-store behind a unified SQL interface, so OLTP queries hit the row storage and OLAP queries can automatically use a columnar replica. The rise of in-memory computing and distributed SQL (NewSQL) also feeds HTAP – using memory or fast networks to soften the traditional trade-off. The bottom line: understand the workload. If an interviewer says the system must handle user actions and heavy analytics, you might propose either splitting the workload or using an HTAP technology. At least mention how row vs. column storage could come into play: OLTP tends to favor row-based storage (all fields of a record together, good for writing a whole record or reading one user’s info) whereas OLAP loves columnar (each column stored contiguously, good for reading one attribute for many records, like summing sales for all rows).

Storage Engines and Indexes: Rows, Columns, and Trees

Physical storage layout has a huge impact on performance. A row-store keeps all columns of a row together on disk. This is great for OLTP – when you fetch a row by its primary key, you get all its info in one disk I/O. A column-store keeps values of each column together (e.g., all customer names in one segment, all customer ages in another). This shines for OLAP – if you run SELECT SUM(amount) FROM sales WHERE region='West', a column-store can read just the amount and region columns, greatly reducing I/O and using CPU-efficient vectorized processing. Column stores also compress data well (adjacent values are similar), so they can scan millions of values in memory quickly. The trade-off is that writing a new record (which spans many columns) is costlier, and reading an entire entity (all columns) is slower because its fields are scattered across column segments. Many analytical DBs mitigate this with tricks like batching writes and partitioning.
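
A tiny in-memory illustration of the layout difference, with plain Python lists standing in for disk pages (not a real storage engine):

```python
# Row-store: each record's fields kept together.
rows = [
    {"order_id": 1, "region": "West", "amount": 30.0},
    {"order_id": 2, "region": "East", "amount": 12.5},
    {"order_id": 3, "region": "West", "amount": 7.25},
]

# Column-store: each attribute stored contiguously.
columns = {
    "order_id": [1, 2, 3],
    "region": ["West", "East", "West"],
    "amount": [30.0, 12.5, 7.25],
}

# OLTP-style access: fetch one whole record -> a single lookup in the row layout.
print(rows[1])

# OLAP-style access: SUM(amount) WHERE region='West' -> touches only two columns.
total = sum(a for a, r in zip(columns["amount"], columns["region"]) if r == "West")
print(total)  # 37.25
```

The column layout never touches order_id for the aggregate – exactly the I/O saving (and compression opportunity) columnar engines exploit at scale.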

Within either layout, we have to consider heap files vs. indexed (clustered) storage. A heap file is an unsorted pile of records – new inserts just go to any free space. It’s simple, but finding data requires an index or a full scan. A clustered index means the main table storage is ordered by a key (e.g., sorted by customer_id). This can make range queries on that key very fast (all records with customer_id between X and Y are contiguous on disk) and also inherently gives you an index on that key (usually a B-tree). The downside is that maintaining that order on inserts can cost extra work (might need to split pages, etc.), and you typically only get one clustered order (others must be secondary indexes). For example, InnoDB (MySQL’s engine) clusters by primary key. If your access pattern aligns (range scans by primary key), it’s a win. If not, it’s no worse than a heap with an index.

Secondary indexes are additional structures that allow fast lookup by other columns. Think of an index in a book that maps a term to pages. Commonly, databases use B-tree indexes for this. A B-tree is a balanced search tree that keeps keys sorted and allows logarithmic time search. It’s optimized for the block-based storage of disks: each node (page) might contain many keys, minimizing disk hits. B-trees excel at range queries and point lookups on sorted data. If you query “give me all users with name starting ‘Sm...’”, a B-tree on the name field can quickly seek to the first Sm and then read in order. Most relational DB indexes are B+ trees (a variant where all data is in leaf nodes and leaves are linked for in-order traversal).
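
Because SQLite stores tables and indexes as B-trees, it offers a convenient way to see the “seek to the first ‘Sm’ and read in order” behavior – the schema and data below are made up for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [("Smith",), ("Smythe",), ("Jones",), ("Small",), ("Taylor",)])
conn.execute("CREATE INDEX idx_users_name ON users(name)")  # secondary B-tree index

# Prefix search expressed as a range: seek to 'Sm', read leaf entries in order until 'Sn'.
query = "SELECT name FROM users WHERE name >= 'Sm' AND name < 'Sn' ORDER BY name"
print(conn.execute(query).fetchall())  # [('Small',), ('Smith',), ('Smythe',)]

for row in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(row)  # the plan should report a SEARCH ... USING INDEX idx_users_name
```

The ORDER BY comes essentially for free here – the index already keeps names sorted, so no separate sort step is needed.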

An alternative found in many NoSQL/newer systems is the LSM (Log-Structured Merge) tree. An LSM isn’t a single tree but a design that defers sorting: new data is first buffered (in memory, sorted structure), and periodically flushed to disk in sequential order. Disk storage ends up as multiple sorted runs that are merged in the background. Writes are very fast (sequential I/O) and can be buffered, which is why LSM-based stores (like Cassandra, RocksDB, Scylla) can sustain high write rates. Reads, however, may need to check multiple sorted files and merge results (mitigated by structures like Bloom filters and partition indexes). LSMs tend to have higher write throughput but potentially more read amplification (data might be in several places). B-trees update pages in place (turning updates into random writes on disk), which is slower for writes but faster for reads (data lives in a single structure). With modern SSDs (which handle random I/O much better than HDDs), the B-tree vs LSM trade-off line has moved: random writes are less costly than before, though LSMs still shine for write-heavy workloads or where sequential access is paramount.
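
A heavily simplified sketch of the LSM idea (a real engine adds a write-ahead log, background compaction of the runs, and Bloom filters to skip runs that cannot contain a key):

```python
import bisect

class TinyLSM:
    """Toy LSM: a mutable memtable plus immutable sorted runs, newest first."""
    def __init__(self, memtable_limit=4):
        self.memtable = {}      # newest writes, buffered in memory
        self.runs = []          # flushed, sorted [(key, value), ...] lists, newest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value                # fast: no disk seek, just a buffer insert
        if len(self.memtable) >= self.memtable_limit:
            self.runs.insert(0, sorted(self.memtable.items()))  # flush as one sorted run
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:                  # newest data wins
            return self.memtable[key]
        for run in self.runs:                     # may probe several runs: read amplification
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

db = TinyLSM()
for i in range(10):
    db.put(f"user:{i}", {"n": i})
print(db.get("user:3"), "runs flushed:", len(db.runs))  # {'n': 3} runs flushed: 2
```

Writes only touch the memtable and append whole sorted runs (sequential I/O); reads may have to consult the memtable plus every run, which is the read-amplification trade-off described above.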

Databases also use hash indexes in some cases. A hash index is like a hash table on disk: very fast point lookups (given exactly a key, it tells you where the data is), but no ordering – you can’t get ranges or sorted traversal easily. Some systems use hash indexes for primary keys (Oracle has optional hash clusters; some NoSQL caches are essentially distributed hash tables). In interviews, if you mention index structures, remember: B-trees (good general-purpose, ordered), LSM (optimized for heavy writes, used in many big NoSQL engines), and hash (fast equality lookups, but limited use). Also note index maintenance costs – every index must be updated on writes, so more indexes = slower inserts.
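
A toy contrast, with plain Python structures standing in for on-disk index pages:

```python
from bisect import bisect_left, bisect_right

# Hash index: key -> record location. O(1) point lookups, but no notion of order.
hash_index = {"jones": 42, "small": 7, "smith": 17}
print(hash_index["jones"])                      # exact-match lookup: fast

# Ordered (B-tree-like) index: sorted keys support range and prefix queries.
sorted_keys = sorted(hash_index)
lo = bisect_left(sorted_keys, "sm")
hi = bisect_right(sorted_keys, "sn")
print(sorted_keys[lo:hi])                       # ['small', 'smith'] -- keys in the 'sm' prefix range

# The hash index cannot answer that prefix query without scanning every key.
```

This is why hash structures show up mainly for strict key-value access, while B-trees remain the general-purpose default.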

Query Planning and Optimization

When you issue a query (e.g., an SQL SELECT with joins), the database must decide how to execute it. This is the job of the query planner/optimizer. It’s like Google Maps for data: it finds possible routes to get your answer and chooses the fastest by estimated cost. The planner uses statistics about your data: how many rows in a table, value distributions, etc., to guess things like “this filter will hit ~100 rows” or “joining table A and B this way will yield ~1M results”. Based on these estimates, it picks an execution plan: which indexes to use, which table to scan first, which join algorithm (nested loop, hash join, etc.). Modern optimizers are cost-based: they assign a cost to each step (based on I/O, CPU, etc.) and seek the lowest total cost plan.

A simple example: Suppose you have SELECT * FROM Orders o JOIN Customers c ON o.cust_id = c.id WHERE c.region = 'West'. The optimizer might consider using an index on Customers.region to find Western customers, then looping through their orders, versus scanning Orders and looking up each customer, etc. If stats say only 5% of customers are West, using the index and then join might be cheapest. If 90% are West, maybe a different approach (like scanning orders and filtering customers) is chosen.
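
Most engines let you inspect the chosen plan with EXPLAIN. The snippet below uses SQLite’s EXPLAIN QUERY PLAN purely because it ships with Python – the schema is invented, and the exact plan text varies by engine and by the statistics available:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER REFERENCES customers(id));
    CREATE INDEX idx_customers_region ON customers(region);
    CREATE INDEX idx_orders_cust ON orders(cust_id);
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM orders o JOIN customers c ON o.cust_id = c.id
    WHERE c.region = 'West'
""").fetchall()
for _, _, _, detail in plan:
    print(detail)
# Typically something like:
#   SEARCH c USING INDEX idx_customers_region (region=?)
#   SEARCH o USING INDEX idx_orders_cust (cust_id=?)
# i.e., find the Western customers via the region index, then loop over each one's orders.
```

In PostgreSQL or MySQL the equivalent is EXPLAIN (or EXPLAIN ANALYZE to also run the query), whose output includes the estimated row counts the optimizer based its choice on.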

Cardinality estimation – estimating how many rows pass through each step – is where errors often occur. If the optimizer misestimates (e.g., thinks a filter is highly selective when it’s not, or fails to account for correlated conditions), it might pick a suboptimal plan. A common pitfall: multi-column correlations. Imagine a table with WHERE state = 'CA' AND city = 'San Francisco'. The optimizer might treat these independently and think “state=CA returns 10% of rows, city=SF returns 1%, combined maybe 0.1%” – but in reality San Francisco is in California, so the conditions are correlated and the combined selectivity is closer to 1%, not 0.1%. Such misestimates lead to wrong join choices or index usage. Other pitfalls include outdated stats (the data changed, but stats didn’t, so the planner’s estimates are off) and non-indexable predicates (using a function on a column can thwart index use). Interviewers might probe this by asking, for example, “why might a database not use an index even if one exists for a query?” – expected answer: the optimizer decides the indexed plan isn’t actually cheaper, perhaps due to low selectivity or misestimation, or because reading from the index then table is actually more I/O than a full table scan.

The key point: the database tries to automate query decisions. For a system design discussion, you should be aware that adding an index doesn’t guarantee speed – the query planner must choose to use it, and it might not if the cost model says not to. Also, the complexity of optimization is why database engines are hard to build – as one paper joked, it’s “harder than rocket science” to get right. But as a designer, focus on conceptual behaviors: e.g., a large join across distributed nodes might need data shuffling (expensive), so maybe pre-partition data to avoid that. Those kinds of insights show you understand what the optimizer wants to do.

Durability and Fault Tolerance: WAL, Checkpoints, and Hardware

To guarantee durability (the “D” in ACID), databases use a Write-Ahead Log (WAL). The idea is simple: before applying any changes to the main data files, record the intent in an append-only log on disk. This log is fsynced (force-written) to disk on commit. If the database crashes, it can replay the log to redo any operations that were committed but not fully applied to the data files. The WAL ensures that even if the process dies mid-flight, there’s a record of what was supposed to happen, which will be completed upon recovery. This is analogous to saving your work in a journal – if the system goes down, you open the journal and reconcile any incomplete entries with the main ledger.

Checkpointing complements WAL: periodically the database will flush dirty pages (in-memory changes) to disk and mark a checkpoint in the log. This way, the log doesn’t grow forever – on recovery you only need to replay from the last checkpoint. Checkpoints balance recovery time vs runtime overhead.

fsync (and similar calls) matters because operating systems cache disk writes. A commit that isn’t flushed to physical storage can be lost if the OS or machine crashes. So databases explicitly request flushes at critical moments (like commit) to ensure data has hit stable storage. Proper use of fsync (and hardware that honors it) is crucial; infamous bugs and long debates have arisen when disks lie about durability (write caching) or when certain file systems handle flush errors in surprising ways.
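
Here is a minimal sketch of the commit path just described – append to the log, fsync, then apply – plus the replay step recovery would run. The file name and record format are made up; a real engine also handles torn writes, checksums, and checkpoint-based log truncation:

```python
import json
import os

LOG_PATH = "wal.log"  # hypothetical append-only log file

def commit(change: dict, table: dict) -> None:
    with open(LOG_PATH, "a", encoding="utf-8") as log:
        log.write(json.dumps(change) + "\n")   # 1. record the intent in the log
        log.flush()
        os.fsync(log.fileno())                 # 2. force it to stable storage before acknowledging
    table[change["key"]] = change["value"]     # 3. only then apply to the main data structures

def recover() -> dict:
    """Replay the log after a crash to rebuild committed state."""
    table = {}
    if os.path.exists(LOG_PATH):
        with open(LOG_PATH, encoding="utf-8") as log:
            for line in log:
                change = json.loads(line)
                table[change["key"]] = change["value"]
    return table

table = {}
commit({"key": "alice", "value": 100}, table)
commit({"key": "bob", "value": 50}, table)
print(recover())  # {'alice': 100, 'bob': 50} -- recoverable even if step 3 never ran
```

A checkpoint in this sketch would amount to writing `table` out to its own durable file and truncating wal.log up to that point – exactly the recovery-time vs. runtime-overhead balance described above.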

Modern hardware has influenced these designs. With traditional spinning HDDs, sequential writes (like WAL appends) were much faster than random writes (seeking all over a disk). This made WAL + periodic flush (mostly sequential) a big win. SSDs have no seek penalty, drastically improving random I/O performance. This makes random-write B-tree updates more palatable and also makes fsync less costly (SSDs have fast internal caches), but the fundamental need for WAL hasn’t gone away. Meanwhile, persistent memory (NVM) and ultra-fast NVMe SSDs are starting to blur the line between memory and storage. Some systems can memory-map NVM and treat it as an always-persistent store, potentially eliminating the WAL (since you could update data “in place” durably if done carefully, or keep a memory-speed log). These trends haven’t removed the need for careful durability logic, but they do change the performance calculus. For example, flushing 8KB to a traditional disk costs millions of CPU cycles, while on NVMe or NVM it is far cheaper, so fsync is less of a bottleneck than it used to be. SSDs also narrow the read/write gap, so the LSM vs. B-tree choice becomes more workload-specific than strictly hardware-driven: an LSM on SSD still shines for massive write volumes, but the read penalty is less severe on fast SSDs, making B-trees competitive for many workloads.

In summary, durability in design interviews: mention WAL for sure (“on a transaction commit, the DB writes to a log on disk to guarantee it can recover”), mention that data pages are written later (and thus the need for checkpointing), and perhaps note that on modern cloud storage and SSDs, these operations are quite fast – and that cloud-native databases often take it further by replicating data to multiple nodes or regions (sometimes instead of or in addition to local fsync). For instance, some cloud DBs will replicate the WAL to another zone before confirming commit, adding a network durability layer (beyond our scope here, but good to hint at if high availability is a goal).

Bottlenecks and Golden Signals in Databases

Operating a database, you’ll want to watch certain key metrics (often called golden signals) to catch performance issues:

- Buffer/cache hit ratio: the fraction of reads served from memory rather than disk. A falling hit ratio usually means the working set no longer fits in RAM and latency is about to climb.
- Lock contention: time spent waiting on locks (and deadlock counts). Rising lock waits point to hot rows, long-running transactions, or missing indexes.
- Tail latency (P95/P99): the slowest percentiles of query latency. Averages hide pain – the 99th percentile shows what your unluckiest (and often most active) users experience.

Other typical signals include throughput (QPS), error rates (e.g., deadlocks or failed transactions), and resource saturation (CPU, disk IOPS). But the ones above (memory hit ratio, locks, P99 latency) are great to mention for relational systems. For a distributed NoSQL, you might also mention replication lag or split brain occurrences as things to watch, but that’s beyond core fundamentals.
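
Two of these signals reduce to quick arithmetic over raw counters. The numbers below are invented; the blks_hit/blks_read names mirror the counters PostgreSQL exposes in pg_stat_database, but any engine’s equivalents work the same way:

```python
# Buffer/cache hit ratio from cumulative hit/miss counters.
blks_hit, blks_read = 9_800_000, 200_000
hit_ratio = blks_hit / (blks_hit + blks_read)
print(f"buffer hit ratio: {hit_ratio:.1%}")  # 98.0% -- a drop often means the working set outgrew RAM

# Tail latency (P99) from a sample of request latencies in milliseconds.
latencies_ms = sorted([3, 4, 4, 5, 5, 6, 7, 9, 12, 250])  # one slow outlier
p99 = latencies_ms[min(len(latencies_ms) - 1, int(0.99 * len(latencies_ms)))]
print(f"p99 latency: {p99} ms")              # 250 ms -- the average (~30 ms) hides the pain entirely
```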

Database Landscape at a Glance (Engines and their Sweet Spots)

Below is a comparison of some popular database engines, highlighting their primary data model and the workloads they fit best:

| Database Engine (Type) | Primary Data Model | Best-Fit Workload & Strengths |
| --- | --- | --- |
| PostgreSQL, MySQL (Relational RDBMS) | Relational tables (SQL, schema-on-write, ACID); row-store by default | Traditional OLTP (transactions, structured data). Great for strong consistency and complex queries/joins. PostgreSQL shines with complex analytics and extensibility; MySQL is popular for read-heavy web workloads (often with read replicas). |
| MongoDB (Document NoSQL) | Document store (JSON-like documents, schema-flexible) | Flexible schema use cases and rapid development. Suited for semi-structured data (e.g., content management, user profiles). Can handle high read/write throughput with eventual consistency; now also offers ACID transactions on a smaller scale. |
| DynamoDB (Key-Value / Wide-Column NoSQL) | Key-value with optional document support (partition-key and sort-key model; schema-on-read) | Serverless, cloud-native scale-out store. Ideal for simple key-based access patterns at massive scale (e.g., user preferences, IoT data). Excels in availability and auto-scaling. Less optimal for complex queries beyond key lookups or aggregates. |
| Cassandra (Wide-Column NoSQL) | Wide-column store (partitioned rows with many columns; tunable consistency) | High write throughput, multi-datacenter replication. Great for time-series, logging, and use cases requiring append-heavy workloads. Trades strict consistency for availability (AP by default). Best when data model can be denormalized into partition-centric queries. |
| Neo4j (Graph DB) | Graph model (nodes, relationships with properties) | Connected data queries (e.g., social networks, recommendation engines). Optimized for traversals like “friends-of-friends” or graph analytics. Suitable when relationships are central and you need flexible, depth-first queries that are hard to express in SQL. |
| ClickHouse (Analytical Columnar DB) | Columnar storage, SQL interface (distributed OLAP engine) | OLAP / analytical workloads. Excels at aggregating large datasets in real time (e.g., user event analytics, telemetry). High compression and vectorized execution give speed on big scans. Not designed for heavy transactional updates (append-oriented). |
(Note: Many other notable engines exist; this table focuses on a representative set. Each shines in a particular niche – choosing one in a design depends on the data model and access patterns needed.)

What Interviewers Often Probe (Follow-up Questions)


Final tip: Interviewers at the senior/staff level care about conceptual clarity. They might not expect you to recall obscure syntax, but they will probe whether you truly understand the why behind database design decisions. Focus on explaining trade-offs (e.g., “we choose X over Y because it improves throughput at the cost of latency, which in this scenario is acceptable”) and use analogies when appropriate. By mastering these fundamentals – from data models to durability – you’ll be equipped to tackle system design questions with confidence and authority. Good luck!

system-design