
Cache Consistency & Invalidation (System Design Deep Dive)

Jun 07, 2025


Caching is a powerful technique to improve system performance and scalability, but it introduces challenges in consistency and invalidation. In a system design interview, understanding how to keep cache data in sync with the source of truth (usually a database) is crucial. This deep dive covers key concepts from the consistency spectrum to practical invalidation strategies, helping intermediate-to-advanced engineers recall important points and avoid common pitfalls.

Consistency Spectrum in Caching

In distributed caches, different consistency models determine how fresh the cached data is relative to the source. At one end, strong consistency ensures every read returns the most recent write – essentially, the cache behaves like the source of truth. On the other end, eventual consistency allows reads to get stale data temporarily, with the guarantee that if no new updates occur, all caches will eventually converge to the latest state. Between these extremes lie models like bounded staleness and monotonic reads. Bounded staleness guarantees that data in cache won’t be older than a certain time window or number of updates (a defined “staleness” limit). Monotonic reads ensure a client never sees data move backwards in time – once you’ve read a newer value, you won’t later read an older value from cache. There are also session-based guarantees such as read-your-writes (after you update something, your subsequent reads see your update) and causal consistency (observing cause-effect ordering of writes). Achieving stronger consistency often requires more complex cache update schemes or sacrifices in latency/availability, whereas weaker (eventual) consistency is easier but allows stale reads. In an interview, be ready to clarify which consistency level a caching approach provides, and the implications for users seeing up-to-date data.
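
For example, one way to provide monotonic reads on top of an eventually consistent cache is for each client (or session) to remember the highest version it has seen per key and fall back to the source when the cache lags behind. Below is a minimal sketch, assuming each cached entry carries a version number; `cache_get` and `db_get` are hypothetical callables standing in for your own lookups.

```python
class MonotonicReader:
    """Client-side guard for monotonic reads over an eventually consistent cache.

    Assumes cache_get(key) and db_get(key) both return (value, version) tuples;
    these names and the version-carrying layout are illustrative assumptions."""

    def __init__(self):
        self.last_seen_version = {}          # key -> highest version this client has read

    def read(self, key, cache_get, db_get):
        value, version = cache_get(key)
        floor = self.last_seen_version.get(key, 0)
        if version < floor:                  # cache is older than what we already saw
            value, version = db_get(key)     # fall back to the source of truth
        self.last_seen_version[key] = max(floor, version)
        return value
```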

TTL vs. Active Invalidation Strategies

A fundamental choice in cache design is between relying on expiration (TTL) and using active invalidation on data changes. Cache expiration uses a time-to-live on entries – once the TTL passes, the data is evicted or marked stale. This approach is simple and guarantees a bound on staleness (the TTL duration), but in the interim the cache may serve outdated data. Cache invalidation, on the other hand, actively removes or updates cache entries when the underlying data changes. This typically yields fresher data at the cost of more complexity. For instance, an e-commerce inventory cache should probably invalidate items as they sell (real-time changes), whereas a cache of blog posts can just expire every few hours. A quick comparison: invalidation is triggered by data changes and tends to be very accurate, while TTL expiration is time-based and may serve slightly stale data. Expiration is lighter-weight and easier to implement (the cache auto-expires items), whereas active invalidation requires coordination (such as sending messages or hooks on updates) and thus is more complex. In practice, many systems combine both approaches – for example, using long TTLs but also sending an invalidation event for critical changes. That way, the cache refreshes proactively on updates, but if an invalidation is missed, the TTL ensures the data isn’t stale forever. When discussing caching, mention whether you’d use a simple TTL (and how long) or hook into data change events via mechanisms like pub/sub, database triggers, or webhooks.
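
As a rough sketch of combining the two, the cache-aside pattern below puts a TTL on every entry as a backstop and also deletes the entry whenever the underlying row changes. The Redis calls are standard redis-py; `get_product_from_db`, `save_product_to_db`, and the in-memory `_FAKE_DB` are stand-ins for your real data layer.

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)
TTL_SECONDS = 3600          # upper bound on staleness if an invalidation is ever missed

_FAKE_DB = {}               # stands in for the real database in this sketch

def get_product_from_db(product_id):
    return _FAKE_DB.get(product_id, {})

def save_product_to_db(product_id, fields):
    _FAKE_DB.setdefault(product_id, {}).update(fields)

def get_product(product_id: str) -> dict:
    """Cache-aside read: try the cache, fall back to the DB, then repopulate the cache."""
    key = f"product:{product_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    product = get_product_from_db(product_id)         # miss: hit the source of truth
    r.set(key, json.dumps(product), ex=TTL_SECONDS)   # TTL acts as a safety net
    return product

def update_product(product_id: str, fields: dict) -> None:
    """Write path: update the database first, then actively invalidate the cache entry."""
    save_product_to_db(product_id, fields)
    r.delete(f"product:{product_id}")                 # next read repopulates with fresh data
```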

Active invalidation mechanisms often involve a publish/subscribe or callback system. For instance, a service might publish an “object X updated” event to a Redis Pub/Sub channel or Kafka topic, and all cache nodes subscribe and invalidate that key on hearing the event. Another approach is using change data capture (CDC): tools like Debezium can tap into database commit logs and automatically broadcast cache invalidation events on data change. This method is reliable and fast – since it reads the database’s commit log (e.g., the MySQL binlog or Postgres WAL), it won’t miss events, and it evicts caches nearly in real time after each commit. Alternatively, external systems might send webhooks to your service when data updates, prompting your code to clear or update specific cache entries. The key trade-off: active invalidation minimizes stale data but adds complexity and potential points of failure (e.g. what if an invalidation message is lost?). Thus, design robustly: use fallbacks like TTL and consider monitoring for cache-stale reads.
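
A minimal pub/sub sketch using redis-py: the writer publishes the changed key on a channel, and every app server runs a listener thread that evicts that key from its own local cache. The channel name and the plain-dict local cache are assumptions for illustration.

```python
import threading
import redis

r = redis.Redis()
local_cache = {}                               # per-process near cache to evict on change

def publish_invalidation(key: str) -> None:
    """Called by the writer after committing a change to the database."""
    r.publish("cache-invalidation", key)       # channel name is an assumption in this sketch

def invalidation_listener() -> None:
    """Runs on every app server: evicts local entries when an invalidation arrives."""
    pubsub = r.pubsub()
    pubsub.subscribe("cache-invalidation")
    for message in pubsub.listen():
        if message["type"] == "message":
            key = message["data"].decode()
            local_cache.pop(key, None)         # evict; the next read repopulates from the source

threading.Thread(target=invalidation_listener, daemon=True).start()
```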

Cache Write Strategies and Coherence

How writes propagate to the cache and database greatly affects consistency. Common write strategies include:

Write-through – every write goes to the cache and the database synchronously, so the cache is always current, at the cost of extra write latency.
Write-back (write-behind) – writes land in the cache first and are flushed to the database asynchronously; this is fast but risks data loss and temporary divergence if a cache node fails.
Write-around – writes go only to the database, bypassing the cache; the entry is cached later on a read miss, which avoids polluting the cache with rarely-read data.
Cache-aside with invalidation – the application writes to the database and then deletes (or updates) the corresponding cache entry, so the next read repopulates it with fresh data.

Each strategy has implications. In an interview answer, you might say: For strict consistency requirements, a write-through cache or immediate cache update might be used to ensure the cache is always up to date. If optimizing for performance over consistency, a write-back cache could be used, accepting some delay in database persistence. Many real systems simply choose to invalidate cache on write (cache-aside) to keep the design simple – this ensures no stale data is served because any change causes removal of the cache entry. The next read misses, fetches current data, and repopulates the cache with up-to-date information. Remember to address the race condition problem: if a cache invalidation and a cache fill (from nearly simultaneous reads) interleave poorly, you could end up with stale data written to the cache after the invalidation. One solution is to use versioning on cache entries. For example, include a version number or timestamp with cached data so that older writes or fills can be detected and ignored. If a stale value arrives with an earlier version, the cache can reject it or overwrite it with the newer value. (This is the approach described in Meta/Facebook engineering posts on pushing cache consistency to “10 nines”.)

Cache inconsistency race condition. This diagram illustrates a race where an older value overwrites a newer value in the cache. The database initially has x=42, which is cached (frame 1). A client updates x to 43 in the DB (frame 2), and an invalidation event for the new value x=43 is sent to the cache, updating the cache to 43 (frame 3). However, a prior read for x=42 was still in flight; its delayed response arrives and is written to the cache after the invalidation (frame 4), reverting the cache to the stale value 42. Without extra measures, the cache now serves x=42 indefinitely while the database is x=43 – a serious inconsistency. Techniques like version stamps (noting “x @ version 2” vs “version 1”) prevent this by ensuring the stale write is ignored.
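
One hedged way to implement the version-stamp fix is to store the value together with its version and make the write conditional on the version being newer, for example with a small Redis Lua script so the check and the write happen atomically. The key layout and field names below are assumptions for illustration, not a canonical scheme.

```python
import redis

r = redis.Redis()

# Atomic "set only if this version is newer": a delayed read response carrying an old
# value (x=42 @ v1) cannot overwrite a newer one (x=43 @ v2) already in the cache.
SET_IF_NEWER = r.register_script("""
local current = tonumber(redis.call('HGET', KEYS[1], 'version'))
if current == nil or tonumber(ARGV[1]) > current then
  redis.call('HSET', KEYS[1], 'version', ARGV[1], 'value', ARGV[2])
  redis.call('EXPIRE', KEYS[1], ARGV[3])
  return 1
end
return 0
""")

def fill_cache(key: str, value: str, version: int, ttl: int = 3600) -> bool:
    """Returns True if stored; False means the cache already holds a newer version."""
    return bool(SET_IF_NEWER(keys=[key], args=[version, value, ttl]))
```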

Stampede Protection Techniques

Cache “stampedes” (or the thundering herd problem) occur when a cache item expires or is invalidated, and suddenly many concurrent requests go to the database to regenerate it. This surge can overwhelm the database. Several strategies can mitigate this:

Per-key locking – only the first request that misses acquires a lock and regenerates the value; the others wait (or briefly back off) instead of hitting the database. (See the sketch after the figure below.)
Stale-while-revalidate – keep serving the expired value while a single background request refreshes it, trading a little staleness for stable load.
Request coalescing – collapse concurrent requests for the same key into one in-flight fetch whose result is shared by all callers.
TTL jitter / staggered expiry – randomize expiration times so that many popular keys don’t expire at the same moment.

Cache locking to prevent stampede. In this illustration, multiple concurrent requests (R1, R2, R3) ask for the same data. R1 acquires the lock and fetches fresh data from the database, while R2 and R3 are blocked (waiting) during that time. Once the data is fetched and the cache updated, the lock is released and R2/R3 can proceed to get the now-cached data. This mechanism ensures only one expensive database query happens for a given key, rather than three in this example.
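
A sketch of that locking behavior using a Redis SET NX key as the per-key mutex. The lock key naming, timeouts, and the polling fallback are illustrative choices, not a canonical recipe; `recompute` stands in for the expensive database query.

```python
import time
import redis

r = redis.Redis()

def get_with_lock(key: str, recompute, ttl: int = 300, lock_ttl: int = 10):
    """Only one caller regenerates an expired entry; the rest wait briefly and re-read."""
    value = r.get(key)
    if value is not None:
        return value
    lock_key = f"lock:{key}"
    # SET NX acts as the per-key mutex; lock_ttl bounds how long a crashed holder blocks others.
    if r.set(lock_key, "1", nx=True, ex=lock_ttl):
        try:
            value = recompute()              # the single expensive DB query / computation
            r.set(key, value, ex=ttl)
        finally:
            r.delete(lock_key)
        return value
    # Lost the race: poll until the winner has refilled the cache, then fall back if needed.
    for _ in range(50):
        time.sleep(0.1)
        value = r.get(key)
        if value is not None:
            return value
    return recompute()
```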

In an interview, mention that cache stampedes can be devastating under load (the cache hit rate drops to 0% and the database is hammered). To prevent this, you’d implement one or more of the above – e.g., a per-key mutex to serialize regeneration and/or serving stale data while the value is refreshed in the background. Also consider graceful degradation: for less critical data, it might be fine to serve slightly old content rather than take down the system.

Validation Tokens: ETags, Conditional Gets, and CAS

Another aspect of cache consistency is using validation tokens or version tags to verify freshness and coordinate updates. In HTTP caching, for example, responses often include an ETag (entity tag), which is a hash or version identifier for the content. Clients can then do a conditional GET: “Give me the data only if it has changed (If-None-Match: <etag>)”. If the data is unchanged, the server replies with a 304 Not Modified, and the client can continue using its cached copy. This mechanism reduces bandwidth and ensures the client’s cache isn’t serving outdated content beyond its allowed TTL. Similarly, a Last-Modified timestamp header can be used with an If-Modified-Since conditional request. These are ways to keep caches (especially browser or CDN caches) in sync without always fetching the full data.
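
On the client side, a conditional GET is only a few lines, for example with the requests library. The module-level `etag` and `cached_body` variables are a simplification standing in for a real cache.

```python
import requests

etag = None
cached_body = None

def fetch(url: str) -> bytes:
    """Conditional GET: send the last ETag; a 304 means our cached copy is still valid."""
    global etag, cached_body
    headers = {"If-None-Match": etag} if etag else {}
    resp = requests.get(url, headers=headers)
    if resp.status_code == 304:
        return cached_body                   # unchanged: keep serving the cached copy
    etag = resp.headers.get("ETag")          # remember the validator for next time
    cached_body = resp.content
    return cached_body
```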

Within back-end caches like Redis or Memcached, we have analogous concepts. CAS (Check-And-Set) tokens in Memcached, for instance, provide an optimistic locking mechanism. When you read an entry, you get a CAS token; when updating, you send the token back – if someone else modified the data in the meantime, the token will have changed and your update can be rejected. This prevents overwriting a newer value with an older one by accident. It’s useful in race conditions where multiple processes might try to update the same cache entry. Using CAS, you ensure only the client with the correct last-seen version can update, others will notice a conflict and retry (or at least not clobber fresh data). In effect, this is like the version stamp method mentioned earlier but provided as a feature by the caching system.
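
A hedged sketch of such a CAS retry loop using pymemcache, whose gets/cas calls expose Memcached’s CAS tokens (exact return values vary by client library, so treat this as illustrative rather than definitive).

```python
from pymemcache.client.base import Client

client = Client(("localhost", 11211))

def increment_stock(key: str, delta: int, retries: int = 5) -> bool:
    """Optimistic update: re-read and retry if someone else changed the entry first."""
    for _ in range(retries):
        value, cas_token = client.gets(key)            # read the value plus its CAS token
        if value is None:
            return False                               # nothing cached; caller decides what to do
        new_value = str(int(value) + delta).encode()
        # cas() only stores if the token still matches, i.e. nobody updated in between.
        if client.cas(key, new_value, cas_token):
            return True
    return False                                       # too much contention after several retries
```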

More generally, storing a version number or timestamp with each cache entry allows validation. The source of truth (database) can also carry version numbers (e.g., a row last_updated timestamp or a revision number). A cache invalidation message might include the new version, and cache nodes can then skip the invalidation if they already hold a newer version. The Facebook example earlier used a version to solve the ordering race. This concept extends to conditional updates: “only apply this cache update if the cache still holds version N; otherwise do nothing, because a newer version is already present.” By leveraging such tokens and versions, caches achieve higher consistency without fully serializing writes.

Distributed Invalidation and Replication

In a distributed system (multiple servers or a cluster of caches), keeping caches coherent across nodes is challenging. If each app server has its own local cache, one server handling an update must somehow tell all the other servers to evict that item. Two common approaches are broadcast invalidations and centralized caches:

Broadcast invalidation – the node that performs a write publishes an invalidation message (e.g., over Redis Pub/Sub, Kafka, or a multicast bus), and every server with a local cache evicts that key when the message arrives.
Centralized (shared) cache – all application servers read and write a common cache tier (such as a Redis or Memcached cluster), so a single invalidation or update is immediately visible to everyone; the trade-off is a network hop on every lookup and a larger blast radius if that tier fails.

The invalidation latency – the time between a DB write and all caches being notified – is a crucial metric. Ideally it’s low (a few ms), but if the message bus is slow or backlogged, caches might hold stale data longer. Monitoring this latency helps ensure your cache system meets consistency expectations. Another consideration is ordering of invalidations. In complex flows, if messages can arrive out of order, an older invalidation or update can be processed after a newer one, effectively rolling the cache back. Usually, using a single ordered source (such as the DB commit log, or a sequential broker like Kafka partitioned by key) preserves order per key.
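
If Kafka is the transport, keying each invalidation message by the cache key routes all events for that key to the same partition, which preserves per-key order; embedding the commit timestamp also lets consumers measure invalidation latency. A sketch using confluent-kafka, where the topic name and broker address are assumptions:

```python
import json
import time
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})   # broker address is an assumption

def send_invalidation(cache_key: str, version: int) -> None:
    """Keyed by cache_key so one partition (and thus one ordered stream) handles each key."""
    event = {"key": cache_key, "version": version, "committed_at": time.time()}
    producer.produce("cache-invalidation", key=cache_key, value=json.dumps(event))
    producer.flush()   # in production you would batch rather than flush per message
```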

When designing, also consider how to handle missed messages. A pattern is to combine approaches: use pub-sub for speed, but also have a periodic cache reconciliation job or short TTL as insurance. For example, if an invalidation event didn’t reach a server, the TTL will eventually expire the entry so it won’t be stale forever. Some systems also employ gossip protocols or pull-based checks where nodes periodically hash or sample their cache entries to detect divergence.

Hot Keys, Skew, and Bouncing Invalidations

A hot key is a cache entry that gets disproportionately high traffic. In a multi-sharded cache cluster, hot keys can cause load imbalance – e.g., if one cache node is responsible for a very popular item, it can become a bottleneck. This is the hot-key skew problem. An example might be a cache of a trending news article: one shard of your cache cluster might get hammered with requests for that article, affecting its ability to serve other keys. To mitigate this, strategies include replicating hot items across multiple nodes or using a two-tier cache. A two-tier (near/far) cache involves a local in-memory cache on each app server (near cache) with a very short TTL, in front of the distributed cache (far cache). The near cache absorbs the read traffic bursts with slightly stale data, while ensuring that the far cache doesn’t get overwhelmed. Essentially, each application server caches the hot item for, say, a few seconds locally, smoothing out the load, and periodically refreshes from the central cache. Another strategy is to spread the load by key variation – e.g., using consistent hashing that evenly distributes keys. However, by definition a single “very hot” key can’t be split by hashing (since it’s one key), so replication or request-level load balancing is needed.
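
A near/far lookup can be as small as the sketch below: a per-process dict with a short TTL sitting in front of the shared Redis tier. The two-second near-cache TTL is an illustrative staleness budget, not a recommendation.

```python
import time
import redis

far = redis.Redis()
near = {}                        # key -> (value, expires_at); one dict per app server process
NEAR_TTL = 2.0                   # seconds of acceptable staleness for hot keys

def get(key: str):
    """Near cache absorbs hot-key read bursts; the far cache remains the shared tier."""
    entry = near.get(key)
    if entry and entry[1] > time.monotonic():
        return entry[0]                                  # served locally, no network hop
    value = far.get(key)                                 # fall through to the distributed cache
    near[key] = (value, time.monotonic() + NEAR_TTL)
    return value
```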

“Bouncing invalidations” refers to a pathological scenario usually with rapidly changing data. If a particular key is being updated so frequently that the cache hardly ever has a stable value, you might see continuous thrashing: constant invalidation messages, cache evictions, and reloads. The data is “bouncing” in and out of the cache. This can happen with, say, a real-time counter or a popular stock price that updates every second. The cache might invalidate it every second, so effectively it’s rarely serving from cache (or if it does, it’s almost always stale). In such cases, the cache is not very effective and just adds churn. A couple of approaches: you might not cache such rapidly-changing values at all, or design the cache to tolerate a bit of staleness (e.g., update in place in cache and only invalidate if an out-of-order update would violate monotonicity). The earlier race condition example is also a form of inconsistency that can be described as the cache bouncing between values if updates interleave poorly. As mentioned, using versioning or atomic updates (e.g., only invalidate if the cached version matches) can prevent the worst inconsistency issues.

Hot keys and bouncing invalidations are “edge cases” that interviewers might poke at. They want to see if you consider load imbalance and how to handle a value that’s updated so often that caching it is problematic. For hot keys, mention things like sharding, replication, or adjusting TTLs. For very high churn keys, mention possibly disabling caching for that key or coalescing updates (e.g., only publish invalidation at most once per X seconds), because constant invalidation can negate the benefits of caching. The goal is to show you’re aware that one size doesn’t fit all – some data might be cached differently based on its access patterns.
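
Coalescing invalidations for a high-churn key can be a simple per-key rate limit on the publisher side, accepting up to one window of staleness in exchange for far fewer evictions. A sketch, where the window length and channel name are assumptions:

```python
import threading
import time
import redis

r = redis.Redis()
_last_sent = {}                      # key -> timestamp of the last invalidation we published
_lock = threading.Lock()
MIN_INTERVAL = 5.0                   # coalesce: at most one invalidation per key per 5 seconds

def maybe_publish_invalidation(key: str) -> None:
    """High-churn keys generate many updates; forward at most one invalidation per window."""
    now = time.monotonic()
    with _lock:
        if now - _last_sent.get(key, 0.0) < MIN_INTERVAL:
            return                   # a recent invalidation already covers this change
        _last_sent[key] = now
    r.publish("cache-invalidation", key)
```

The trade-off is explicit: consumers may serve data up to MIN_INTERVAL old for that key, which is exactly the kind of bounded staleness you should call out when proposing it.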

Observability: Metrics for Cache Freshness

To manage cache consistency in production, you need good observability. It’s not enough to measure the overall hit rate; you should measure how “fresh” your cache hits are. Key metrics and concepts include:

Fresh-hit vs. stale-hit ratio – of the reads served from cache, how many returned the latest value versus one that had already changed in the database (usually estimated by sampling and comparing against the source of truth).
Cache entry age – the distribution of how old entries are at the moment they are served; a long tail suggests invalidations aren’t reaching some nodes.
Invalidation propagation latency – the time from a database write to the corresponding eviction or update on every cache node.
Stale-read rate – how often users are actually served data that violates your staleness bound, which is the metric that maps most directly to user impact.

Other helpful metrics are cache penetration rate (how often do reads go all the way to DB – which could indicate stale or missing cache), and eviction rate, etc. High eviction or invalidation rates might point to bouncing values or inefficiencies. Logging and tracing can also help – for example, log when a request had to wait for a lock to regenerate cache, or when a stale-while-revalidate served stale data. These give insight into how often your stampede prevention is activating and whether it’s effective.
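
Instrumenting this can be lightweight, for instance with prometheus_client counters and a histogram. The freshness check below assumes you have some way (sampling, a version column) to compare the cached version against the source; the metric names are illustrative.

```python
import time
from typing import Optional
from prometheus_client import Counter, Histogram

cache_reads = Counter("cache_reads_total", "Cache reads by outcome", ["outcome"])
entry_age = Histogram("cache_entry_age_seconds", "Age of entries at the moment they are served")

def record_read(hit: bool, cached_at: Optional[float] = None,
                cached_version: Optional[int] = None,
                source_version: Optional[int] = None) -> None:
    """Classify every read so dashboards can show fresh-hit ratio, not just hit rate."""
    if not hit:
        cache_reads.labels("miss").inc()
        return
    if cached_at is not None:
        entry_age.observe(time.time() - cached_at)
    # "Stale" = the cached version lags the source of truth, as known from sampling or a version column.
    stale = source_version is not None and cached_version != source_version
    cache_reads.labels("hit_stale" if stale else "hit_fresh").inc()
```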

In summary, demonstrate that you would monitor cache behavior. If an interviewer asks “how do you ensure the cache is working correctly?” you could mention tracking the fresh-hit ratio and invalidation delays, not just the hit rate. This shows you understand the end-user impact of caching (serving correct, up-to-date data) and the importance of feedback loops in system design.

Interview Gotchas Checklist

Finally, here’s a quick checklist of “gotchas” and key points to recall in caching questions:

State which consistency level your cache provides (strong, bounded staleness, eventual) and what that means for users seeing stale data.
Combine TTL expiration with active invalidation – the TTL is the safety net when an invalidation message is lost.
Watch for the invalidate-then-fill race; version stamps or CAS tokens stop an old value from overwriting a newer one.
Protect hot paths against stampedes with per-key locks, request coalescing, or stale-while-revalidate.
Plan for hot keys (replication, near/far two-tier caches) and for high-churn keys (coalesce invalidations or consider not caching them).
Preserve invalidation ordering per key, and monitor fresh-hit ratio and invalidation latency, not just the hit rate.

By keeping these points in mind, you can confidently design a caching system that balances speed with correctness. Caching is often a make-or-break topic in system design interviews – demonstrating mastery of consistency and invalidation shows that you understand not just the basics of caching but also its corner cases and failure modes. Good luck, and happy caching!

system-design