SerialReads

CDN Foundations & Core Routing (System Design Deep Dive)

Jun 07, 2025

Content Delivery Networks (CDNs) are a cornerstone of modern web systems, enabling fast and reliable content delivery at global scale. This deep dive explores CDN fundamentals and core routing, covering why CDNs are needed, how requests are routed worldwide, the anatomy of edge caches, cache key design pitfalls, caching policies (TTL and revalidation), invalidation strategies, transport optimizations (TLS, HTTP/2, HTTP/3), key performance metrics, and an example touchpoint with AWS CloudFront (including Lambda@Edge). The discussion is vendor-agnostic with CloudFront and others as illustrative examples.

Why CDNs Exist (Latency, Offload, Scalability, Resiliency)

CDNs exist primarily to improve performance and reliability by bringing content closer to users. By caching content on servers distributed around the globe, CDNs drastically reduce latency — the time it takes for data to travel from server to user. Rather than every user reaching all the way to a distant origin server, a CDN serves many requests from a nearby Point of Presence (PoP), which improves load times and reduces buffering. This also offloads traffic from the origin: fewer requests hit the origin server, preventing it from becoming a bottleneck or single point of failure. The result is greater scalability: a CDN can handle massive spikes by spreading load across its network, ensuring content stays available under high demand. CDNs inherently add redundancy and resiliency as well. If one edge server goes down, others can serve the content, and routing systems simply direct users to an alternative location. In fact, CDNs like Cloudflare use anycast routing (discussed below) so that if one node fails, traffic transparently fails over to the next nearest node. Additionally, because the CDN intercepts requests, it can provide security benefits (absorbing DDoS attacks, shielding the origin, handling TLS, etc.) without exposing the origin directly. In summary, CDNs exist to make content delivery faster (lower latency), lighter on origins (caching/offload), more scalable (distributed load), and more resilient (redundant global infrastructure).

Global Request Flow: DNS, Anycast, and Routing to the Nearest Edge

When a client requests content served via a CDN, the request is typically directed to an optimal edge location through clever DNS resolution or networking. The process often goes like this: the user’s DNS lookup for the content’s domain is answered by the CDN’s authoritative DNS. CDNs use either geo-aware DNS (returning an IP of a nearby edge) or anycast IP addressing. With geo-DNS, the CDN’s DNS server uses the user’s resolver location to hand back the IP of a nearby PoP. The downside of basic DNS-based routing is that it relies on the resolver’s location (which may not reflect the client’s actual location) and is subject to DNS TTL delays when routing changes are needed. With anycast, a single IP address is advertised from many locations; the Internet’s routing protocol (BGP) then automatically directs the client to the “nearest” point announcing that IP (generally in terms of network distance). Many modern CDNs (e.g. Cloudflare, Google) use anycast so that user requests are dynamically routed to the closest data center without complex DNS logic.

Once the client’s request reaches a CDN edge PoP, the CDN’s caching layer takes over. The edge server receiving the request will check its local cache for the content. If the content is cached (a cache hit), it is immediately returned to the user from the edge, yielding a fast response. If it’s a cache miss at that edge, the CDN must fetch it – but rather than always going straight to the origin, CDNs often have an intermediate step: the edge may query a higher-tier cache or regional PoP first. For example, AWS CloudFront uses Regional Edge Caches between the outermost edge locations and the origin. In such a design, an edge location that misses will ask a regional cache; if the regional cache has it, it responds to the edge, avoiding an origin trip. If the regional cache also misses, then the CDN goes to the origin server to retrieve the content. This multi-tier request flow means only the first request in a region has to hit the origin; subsequent requests, even from different nearby cities, can be served by the cached copy in the regional hub. For instance, Alibaba Cloud’s CDN has L1 and L2 caches – the L1 caches are many distributed city-level nodes and the L2 caches are regional nodes. An L1 miss will fetch from an L2; if L2 hits, the content is returned to L1 and then to the user; if L2 misses, only then is the origin contacted (and both caches store the result). In all cases, the goal is to route the user to a proximal cache and minimize how often the origin must be touched.
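To make the lookup order concrete, here is a minimal sketch of that edge-then-regional-then-origin flow, assuming simple in-memory maps stand in for the real cache stores and a hypothetical fetchFromOrigin call stands in for the actual origin request:

```typescript
// Sketch of a two-tier (edge + regional) lookup; Maps stand in for real cache stores.
type CachedObject = { body: string; storedAt: number };

const edgeCache = new Map<string, CachedObject>();      // L1: edge PoP
const regionalCache = new Map<string, CachedObject>();   // L2: regional / shield tier

// Hypothetical stand-in for an HTTP request to the origin server.
async function fetchFromOrigin(key: string): Promise<CachedObject> {
  return { body: `origin response for ${key}`, storedAt: Date.now() };
}

async function handleRequest(key: string): Promise<string> {
  const l1 = edgeCache.get(key);
  if (l1) return l1.body;                     // edge hit: fastest path

  const l2 = regionalCache.get(key);
  if (l2) {
    edgeCache.set(key, l2);                   // fill the edge from the regional tier
    return l2.body;
  }

  const fresh = await fetchFromOrigin(key);   // both tiers missed: go to origin once
  regionalCache.set(key, fresh);              // store in the regional (shield) tier...
  edgeCache.set(key, fresh);                  // ...and at the edge for the next request
  return fresh.body;
}

handleRequest("/images/cat.png").then(console.log);
```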

Edge PoP Cache Anatomy: Multi-Layer Caches, Coalescing, and Shielding

Once a request arrives at a CDN node, what does the caching infrastructure look like? At the level of a single PoP (Point of Presence), there are typically multiple layers of caching in both architecture and storage. On each cache server, content may be stored in a hierarchy: a hot object cache in RAM for very frequently accessed items, and a larger but slower cache on SSD or HDD for less-frequent or large objects. Studies note that a CDN server often has a two-tier cache: a small, fast in-memory layer for hot content, and a large disk-based cache for everything else. The memory cache (HOC: Hot Object Cache) serves the most frequent content with minimal latency, while the disk cache (DC) provides high capacity for the long tail content with slightly higher latency. This approach maximizes both speed and cache hit ratio by using expensive memory for the few objects that account for a lot of traffic, and cheaper storage for the bulk of content.
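A rough sketch of that per-server split might look like the following, assuming one map plays the role of the hot object cache and another stands in for the disk tier; a production cache would use a real eviction policy and actual disk I/O:

```typescript
// Sketch of a per-server two-tier cache: a small hot-object cache (RAM) in front
// of a large, slower store (here a second Map standing in for SSD/HDD).
const HOT_CAPACITY = 1000;                        // hypothetical size limit

const hotObjectCache = new Map<string, string>(); // fast, small (memory)
const diskCache = new Map<string, string>();      // large, slower (disk)

function lookup(key: string): string | undefined {
  const hot = hotObjectCache.get(key);
  if (hot) return hot;                            // served from memory

  const cold = diskCache.get(key);
  if (cold) {
    // Promote the object to the hot tier; evict the oldest entry if at capacity.
    if (hotObjectCache.size >= HOT_CAPACITY) {
      const oldest = hotObjectCache.keys().next().value;
      if (oldest !== undefined) hotObjectCache.delete(oldest);
    }
    hotObjectCache.set(key, cold);
  }
  return cold;                                    // served from disk, or a miss
}
```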

CDNs also organize caches hierarchically across the network. We touched on L1 (edge) and L2 (regional) cache hierarchy above – this is sometimes called a shielding architecture. A shield PoP (or regional cache) is a designated cache that sits between many edge nodes and the origin. It shields the origin from bursts of traffic by ensuring that each piece of content is fetched from the origin at most once (per region) even if dozens of edge locations are requesting it. In practice, after a cache miss at an edge, the request goes to the shield; the shield, if it has the content, responds, otherwise it fetches from origin and stores it. Thereafter all those edges will get the content from the shield instead of hitting origin individually. This greatly reduces origin load, at the cost of one extra hop on a cache miss (edge to shield). Many CDN providers support an optional shield layer (for example, Fastly allows you to designate a specific PoP as a shield for your service). KeyCDN similarly describes “Origin Shield” as an extra caching layer that reduces the load on your origin server even further, adding a header like X-Shield: active when enabled. The benefit is especially seen for widely-distributed traffic: rather than 100 edge nodes each pulling the same new file from origin, perhaps they all fetch from 1-2 regional caches, drastically cutting down origin fetches.

Another important aspect of cache behavior is request coalescing (also known as collapsed forwarding). This is a technique to handle the scenario where many users concurrently request a resource that is not yet cached. Instead of letting each request trigger a separate origin fetch, the CDN will collapse them such that only one request (the first one, or a designated leader) goes to the origin, and the other requests wait for that response. Once the origin response arrives, it is stored in cache and served to all the waiting users. This prevents an “origin stampede” where a cache miss could have caused a flood of identical origin requests. Request collapsing is often enabled by default in CDNs. For example, Google Cloud CDN notes that it actively collapses multiple user-driven cache fill requests for the same key into a single origin request per edge node. Without collapsing, 10 users missing the same object at the same time could trigger 10 origin fetches; with collapsing, the origin might see just 1 request, and the 9 others get the content with a slight delay but much less load on the backend. This technique, combined with shielding, ensures that origin servers are protected from thundering herds when cache misses do happen.
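A minimal single-flight sketch of this idea, assuming an in-memory map of pending promises keyed by cache key (the fetchFromOrigin function is again a stand-in):

```typescript
// Sketch of request coalescing (collapsed forwarding): concurrent misses for the
// same key share one in-flight origin fetch instead of each going to the origin.
const inFlight = new Map<string, Promise<string>>();
const cache = new Map<string, string>();

let originFetches = 0;
async function fetchFromOrigin(key: string): Promise<string> {
  originFetches++;                           // count how often the origin is actually hit
  return `origin body for ${key}`;
}

async function getWithCoalescing(key: string): Promise<string> {
  const cached = cache.get(key);
  if (cached) return cached;

  const pending = inFlight.get(key);
  if (pending) return pending;               // someone already went to origin: wait for it

  const fetchPromise = fetchFromOrigin(key)
    .then((body) => {
      cache.set(key, body);                  // fill the cache for all waiters
      return body;
    })
    .finally(() => inFlight.delete(key));    // allow future refreshes

  inFlight.set(key, fetchPromise);
  return fetchPromise;
}

// Ten simultaneous misses result in a single origin fetch.
Promise.all(Array.from({ length: 10 }, () => getWithCoalescing("/video/intro.mp4")))
  .then((bodies) => console.log(`${bodies.length} responses, ${originFetches} origin fetch`));
```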

Illustration: Multi-tier CDN caching. An edge (L1) cache that misses will query a regional (L2) cache before going to origin. This hierarchy, as in Alibaba Cloud’s CDN, ensures popular content is served from nearby caches and the origin is only hit on an L2 miss. The L2 cache acts as a shield for the origin, consolidating multiple L1 requests into one origin fetch.

Cache Key Design: URL, Query, Cookies, and Avoiding Variant Explosions

A cache key is the identifier a CDN uses to determine if two requests are for the same cached object. Designing what goes into this key is crucial: it must include any parts of the request that can affect the content, but exclude anything that doesn’t, in order to maximize cache hits. By default, many CDNs use a key that consists of the host and path (and sometimes query parameters) of the request URL. For example, Amazon CloudFront’s default cache key includes the domain and URL path; query strings, headers, and cookies are not included unless configured, meaning /images/cat.png?size=large and /images/cat.png?size=small would map to the same cache entry unless the cache policy is configured to include the size query parameter, and requests with different user-agents or cookies would likewise map to the same entry by default. If your content varies by certain headers (say, Accept-Language for localization or a custom header for A/B testing), or by cookie (say, a user login vs logout state), then the CDN must include those in the cache key (or use the HTTP Vary mechanism) to keep responses separate. Failing to do so could serve the wrong content to users (a logged-in view to someone not logged in, for example). On the other hand, including too many request attributes in the cache key can dramatically reduce cache efficiency — this is the cache variant explosion problem.

Consider cookies: many applications set cookies (session IDs, analytics, etc.) on requests. If the CDN by default treated each unique cookie value as part of the cache key, essentially every user would get a different cache entry and your hit ratio would drop to near zero. In fact, some CDNs explicitly avoid caching responses that vary on the Cookie header for this reason. For instance, Google Cloud CDN does not cache content when Vary: Cookie is present, since cookie values are often unique per user and would prevent any cache hits. The guidance is to minimize unnecessary variance. If your HTML or API response is the same for all users who are not logged in, you should strip or ignore cookies for those requests so that they can be cached as a single object for everyone. Similarly, if you have irrelevant query parameters (like tracking query strings), you might configure the CDN to ignore them in the cache key. Many CDN platforms let you define a custom cache key or policy: you can specify which query parameters to include or exclude, which headers to include, and whether to consider cookies. The rule of thumb is to include only what truly changes the response. As AWS CloudFront documentation puts it, if a value in the request influences the origin’s response, it should be in the cache key; if not, including it might lead to unnecessary duplicate cache entries.
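As an illustration of that rule of thumb, the hypothetical helper below builds a cache key from the host, path, an allow-list of query parameters, and a small set of normalized headers; the allow-lists are placeholders for whatever your CDN configuration would specify:

```typescript
// Sketch: build a cache key from only the request attributes that actually change
// the response. Tracking params, cookies, and most headers are deliberately left out.
const ALLOWED_QUERY_PARAMS = ["size", "lang"];      // hypothetical allow-list
const ALLOWED_HEADERS = ["accept-language"];        // hypothetical allow-list

function buildCacheKey(url: URL, headers: Record<string, string>): string {
  // Keep only allow-listed query params, sorted so ?a=1&b=2 and ?b=2&a=1 collide.
  const params = [...url.searchParams.entries()]
    .filter(([name]) => ALLOWED_QUERY_PARAMS.includes(name))
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([name, value]) => `${name}=${value}`)
    .join("&");

  // Keep only allow-listed headers, lower-cased for normalization.
  const headerPart = ALLOWED_HEADERS
    .map((name) => `${name}=${headers[name] ?? ""}`)
    .join("&");

  return `${url.host}${url.pathname}?${params}|${headerPart}`;
}

// Both requests map to the same key: the tracking parameter is ignored.
const key1 = buildCacheKey(new URL("https://example.com/images/cat.png?size=large&utm_source=ad"), {});
const key2 = buildCacheKey(new URL("https://example.com/images/cat.png?size=large"), {});
console.log(key1 === key2); // true
```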

To avoid variant explosion, developers employ techniques: strip or hash volatile cookies, use separate caching for logged-in vs logged-out users, and use the Vary header carefully. For example, instead of varying on an entire User-Agent string (which is almost unique per browser version), you might vary on a simpler header like Accept-Encoding (for compression differences) or add a custom normalized device type header at the edge. The goal is to maximize the number of requests that can reuse the same cached asset. A high cache hit ratio (discussed later) depends on consolidating requests into as few cache variants as possible while still serving the correct content. Monitoring your CDN’s metrics can help spot an explosion of cache keys — e.g. if you inadvertently included a timestamp or request ID in the cache key, you’d see a 0% hit ratio because every request is a miss. Thus, designing cache keys requires balancing granularity (serving the correct variant) with generality (achieving cache reuse). By keeping cache keys as broad as correctness allows, CDNs can serve many users from one cached copy, dramatically improving performance and offload.

Cache TTL and Revalidation: Cache-Control, max-age, stale-while-revalidate, ETags, etc.

Caching wouldn’t work without a mechanism to eventually expire or refresh content. TTL (Time To Live) is the duration that content is considered fresh in cache. It’s typically controlled by the origin server via HTTP headers like Cache-Control or Expires. A common directive is Cache-Control: max-age=N (or its sibling s-maxage=N for shared caches specifically), which tells caches they can consider the content fresh for N seconds. During this time, caches will serve the content without checking back with the origin. Once the max-age expires, the cache must decide how to revalidate or fetch a new copy.

Revalidation is the process of checking if the cached content is still valid or if it has changed on the origin. HTTP provides two main validators for this: the Last-Modified timestamp and the ETag. An ETag is an opaque identifier (often a hash or version ID) that the origin attaches to a resource (e.g., ETag: "v1.abc123"). When the cache’s copy expires or the client explicitly refreshes, the cache (or client) can send a conditional request with If-None-Match: "v1.abc123". The origin will compare the ETag with the current version; if unchanged, it replies 304 Not Modified, which tells the cache that its copy is still good (and typically refreshes the TTL). If the ETag doesn’t match, the origin sends a fresh copy (200 OK with new content) which the cache stores. Similarly, the Last-Modified header can be used with If-Modified-Since in a conditional GET: “Give me the resource only if it has changed since this date.” A 304 response means it hasn’t changed. These mechanisms greatly reduce bandwidth and origin load, because a 304 response has no body – the cache just continues using its stored body.
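A minimal sketch of that validator flow, assuming a runtime with a global fetch (Node 18+ or a browser) and a simplified cache-entry shape with a hypothetical 10-minute TTL:

```typescript
// Sketch of validator-based revalidation: send If-None-Match with the stored ETag;
// a 304 means "keep your copy", a 200 carries a fresh body and a new ETag.
type CacheEntry = { body: string; etag: string | null; expiresAt: number };

const TTL_MS = 600_000; // hypothetical 10-minute freshness lifetime

async function revalidate(url: string, entry: CacheEntry): Promise<CacheEntry> {
  const response = await fetch(url, {
    headers: entry.etag ? { "If-None-Match": entry.etag } : {},
  });

  if (response.status === 304) {
    // Not modified: no body was transferred; just extend the entry's freshness.
    return { ...entry, expiresAt: Date.now() + TTL_MS };
  }

  // Modified (200 OK): store the new body and the new validator.
  return {
    body: await response.text(),
    etag: response.headers.get("ETag"),
    expiresAt: Date.now() + TTL_MS,
  };
}
```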

The Cache-Control header supports more nuanced directives for cache refresh behavior. One important one is stale-while-revalidate (introduced by RFC 5861). This directive allows a cache to continue serving an expired resource (stale) for a certain extra time while it asynchronously revalidates it in the background. For example, Cache-Control: max-age=600, stale-while-revalidate=30 means the cache considers the item fresh for 600 seconds, and if a request comes in at 610 seconds (which is 10 seconds past freshness), it can immediately return the stale content to the client, and at the same time fetch a new copy from origin to update the cache (so the next request gets fresh data). The user gets a fast response (with possibly slightly stale data) instead of waiting for the origin fetch. This is great for content that can tolerate slightly out-of-date data in favor of speed. Not all CDNs originally honored stale-while-revalidate (Cloudflare and others do; historically CloudFront did not without custom Lambda logic), but it’s increasingly common in modern CDN behavior as it aligns with improving hit rates and latency. Another directive, must-revalidate, indicates that once the TTL is up, the cache must revalidate with origin before serving the content (i.e. do not serve stale). This is used for more sensitive data that should not go stale. There’s also stale-if-error which allows serving stale content if the origin is unreachable or returns an error, improving resilience.
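Here is a rough sketch of how a cache might apply a max-age plus stale-while-revalidate pair when deciding how to answer; the windows mirror the example above and the refresh callback is a stand-in for the origin fetch and cache fill:

```typescript
// Sketch: serve fresh content directly; serve stale content immediately while
// refreshing in the background during the stale-while-revalidate window; block
// on a refresh only once that window has also passed.
type Entry = { body: string; fetchedAt: number };

const MAX_AGE_MS = 600_000;   // Cache-Control: max-age=600
const SWR_WINDOW_MS = 30_000; // stale-while-revalidate=30

async function serve(
  entry: Entry | undefined,
  refresh: () => Promise<Entry>,   // stand-in for the origin fetch + cache fill
): Promise<string> {
  const now = Date.now();

  if (entry && now - entry.fetchedAt < MAX_AGE_MS) {
    return entry.body;             // fresh: serve from cache
  }

  if (entry && now - entry.fetchedAt < MAX_AGE_MS + SWR_WINDOW_MS) {
    void refresh();                // stale but within SWR: refresh in the background...
    return entry.body;             // ...and answer immediately with the stale copy
  }

  const fresh = await refresh();   // too stale (or empty cache): wait for the origin
  return fresh.body;
}
```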

In practice, origin servers set caching headers based on content type and desired freshness. Static assets (like versioned JS/CSS files or images) often have a very long max-age (e.g. a year) plus the immutable directive, indicating they never change (often paired with content fingerprinting in file names). Dynamic HTML pages might have Cache-Control: no-cache or a short s-maxage for CDNs with revalidation. It’s worth noting that browsers have their own cache, separate from the CDN edge. The same headers can also control browser caching (e.g. a public max-age for browsers vs a separate s-maxage for shared caches like CDNs). But the focus here is CDN-level caching. CDNs will honor origin Cache-Control unless overridden by configuration. Some CDNs allow you to set default TTLs and ignore or cap what the origin says (for example, to avoid caching something too long by mistake).
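As an illustration (the values are examples, not prescriptions), an origin might map asset types to Cache-Control headers roughly like this:

```typescript
// Sketch: typical Cache-Control choices by asset type (values are illustrative).
function cacheControlFor(path: string): string {
  if (/\.[0-9a-f]{8,}\.(js|css|png|woff2)$/.test(path)) {
    // Fingerprinted static assets: cache "forever"; a new deploy changes the URL.
    return "public, max-age=31536000, immutable";
  }
  if (path.endsWith(".html")) {
    // HTML: browsers revalidate each time (max-age=0), the CDN caches briefly.
    return "public, max-age=0, s-maxage=60, stale-while-revalidate=30";
  }
  // Default: short shared-cache TTL with background refresh.
  return "public, s-maxage=300, stale-while-revalidate=60";
}

console.log(cacheControlFor("/assets/app.abc12345.js"));
console.log(cacheControlFor("/index.html"));
```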

In summary, TTLs and revalidation policies govern when the CDN considers content stale and how it refreshes it. The combination of freshness lifetime (max-age) and validator-based revalidation (ETag/If-None-Match or Last-Modified/If-Modified-Since) allows CDNs to efficiently keep content up-to-date without unnecessarily hammering the origin. Features like stale-while-revalidate further improve user experience by hiding refresh latency and keeping hit ratios high, since the CDN can keep serving old content until the new content arrives. A well-designed cache strategy uses these tools to strike a balance between content freshness and the performance gains of caching.

Cache Invalidation & Purging: Soft vs Hard Purges, Tags, and Conditional Fetches

Even with caching and TTLs, there are times when you need to explicitly purge or invalidate cached content – for example, after a deploy or when user data is updated. CDNs provide APIs or tools to purge cached objects. Two general approaches are used: hard purge and soft purge (sometimes also called invalidate vs expire). A hard purge (immediate invalidation) means the CDN will remove the object from cache outright. After a hard purge, the next request will be a miss and force a fetch from origin (because the cache no longer has the item at all). In contrast, a soft purge marks the cached item as stale but doesn’t completely evict it. The stale object might still be served if allowed by cache policy (for example, under stale-while-revalidate, or when the origin is slow or erroring, since some CDNs can serve stale on error). Essentially, soft purging expires the item so that a new fetch will be triggered on the next request, but it keeps the old copy around just in case. Fastly’s docs explain that after a soft purge, the content “will not automatically be available to serve... but remains available to use in some circumstances”. One of those circumstances is that if a user request comes in right after and the origin fetch is in progress or fails, the CDN could potentially serve the stale content (depending on configuration). Another benefit of soft purge is zero-downtime updates: some CDNs (e.g. Fastly) with soft purge combined with serve-stale can effectively ensure users either get the cached old content or the new content, but never a gap or error. As one article quipped, “soft-purge means the old object serves as STALE until the new one is fetched — zero downtime, zero cold starts.” The tradeoff is that a truly out-of-date item might briefly still be seen. For critical content, a hard purge ensures it’s gone immediately.

CDNs also differentiate purging by scope. The most straightforward is purging by URL (i.e. by cache key). You specify an exact resource (or a wildcard pattern) to invalidate. Many providers support wildcards like “purge all items under /images/”. The more advanced method is tag-based invalidation. In this scheme, content is tagged with one or more labels (sometimes called surrogate keys). For example, you might tag all pages related to “Product 123” with a tag product-123. When that product is updated, you can purge by that tag and thereby invalidate dozens of URLs (product page, related images, API endpoints, etc.) in one go. Fastly pioneered this with Surrogate-Key headers that the origin can send with responses. The CDN stores the mapping of tags to cached objects. Then a surrogate key purge invalidates every object with that key, typically very quickly (Fastly says such purges take on the order of ~150 milliseconds across their network). This is extremely useful for content management systems or APIs where you don’t want to track every URL manually but you know logical groups. Other CDNs like Cloudflare have similar concepts (Cache-Tag header for enterprise plans). Purge-all (flushing the entire cache) also exists but is rarely desirable except in emergency, as it essentially empties the CDN and all requests will start hitting your origin (a potential thundering herd). Some CDNs might disallow purge-all or require special confirmation because of that impact.
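A sketch of the bookkeeping behind tag-based purging, with a flag that distinguishes a hard purge (evict) from a soft purge (mark stale); the data structures are illustrative rather than any vendor’s implementation:

```typescript
// Sketch: surrogate-key (tag) purging. Each cached object records its tags, and an
// inverted index maps each tag to the cache keys carrying it.
type Entry = { body: string; tags: string[]; stale: boolean };

const cache = new Map<string, Entry>();
const tagIndex = new Map<string, Set<string>>(); // tag -> cache keys

function store(key: string, body: string, tags: string[]): void {
  cache.set(key, { body, tags, stale: false });
  for (const tag of tags) {
    if (!tagIndex.has(tag)) tagIndex.set(tag, new Set());
    tagIndex.get(tag)!.add(key);
  }
}

function purgeByTag(tag: string, mode: "hard" | "soft"): void {
  for (const key of tagIndex.get(tag) ?? []) {
    if (mode === "hard") {
      cache.delete(key);               // hard purge: gone now, next request misses
    } else {
      const entry = cache.get(key);
      if (entry) entry.stale = true;   // soft purge: kept, but treated as expired
    }
  }
  if (mode === "hard") tagIndex.delete(tag);
}

// Tag all content for product 123, then invalidate it in one call.
store("/products/123", "<html>...</html>", ["product-123"]);
store("/api/products/123", "{...}", ["product-123"]);
purgeByTag("product-123", "soft");
```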

Apart from explicit purges, conditional fetching is another mechanism to keep cached content fresh without purging. We discussed conditional revalidation (If-Modified-Since / ETag) earlier which is one kind of conditional fetch the CDN can do automatically when content expires. But you can also programmatically trigger validations. For instance, some CDNs let you send a PURGE request (an HTTP method) to an object which either invalidates it or causes the next request to be conditional on origin. In a way, soft purge is like telling the CDN “treat the content as expired now,” which will make the next request revalidate it. There’s nuance here: a revalidation (If-None-Match) will still burden the origin slightly but uses a 304 response if unchanged, whereas a purge with refetch will force a full refetch.

Use cases: If you deploy a new version of your JS bundle, you might purge the old URL so that users immediately get the new file (assuming the URL is the same). If you instead use versioned filenames (e.g. app.abc123.js for the new version), you don’t need to purge – you just start using a new URL (this is a common cache invalidation strategy to avoid the need for purges). For user-specific data that was cached accidentally, a purge might be needed to remedy any data leak. Purging is often integrated into CI/CD workflows or admin dashboards for content publishers. The key is to use it sparingly and logically: excessive purging defeats the purpose of caching. Tag-based purges help target just the content that changed.

In summary, CDNs offer both fine-grained and broad invalidation tools. A hard purge dumps content immediately, ensuring new requests go to origin, whereas a soft purge expires content but lets the CDN potentially serve stale in the interim. Tags (surrogate keys) allow purging by semantic group, which is very powerful for large sites. And any purge can often be done globally within seconds, reflecting the CDN’s ability to update thousands of edge servers quickly. By combining careful cache key design, proper TTLs, and selective purging, one can achieve a caching strategy that is both performant and correct, even as the underlying content changes.

Transport Optimizations: TLS at the Edge, HTTP/2 Multiplexing, HTTP/3 (QUIC)

Beyond caching, CDNs also optimize the transport layer to further accelerate content delivery. One major optimization is TLS termination at the edge. When a client connects to a website over HTTPS, the TLS handshake (and cryptographic setup) normally would happen on the origin server. With a CDN, the TLS session is terminated on the edge server instead. That means the client’s secure connection is established with the nearby CDN node, not the distant origin. The edge node then typically communicates with the origin via its own secure connection (often over the CDN’s private backbone or optimized routes). Terminating TLS at the edge has two big benefits: performance and offload. Performance improves because the TLS handshake (which can involve multiple round trips, especially in older TLS versions) is done with a close server, reducing latency. Modern TLS 1.3 requires only one round trip for the handshake, but even that is noticeable over long distances; doing it within, say, 20 ms of the user (to the edge) instead of 150 ms to an origin is a win. The edge can also cache TLS session parameters (session resumption tickets) to allow subsequent connections or parallel connections to avoid full handshakes. The offload aspect is that the heavy computation of encryption/decryption is handled by the CDN’s infrastructure (which is built to do this at scale), relieving the origin from doing TLS handshakes with every user. For example, Azure Front Door (a CDN-like service) explicitly offloads TLS at the edge, decrypting client traffic at the PoP and only then routing it to the origin. The origin can even be configured to only accept traffic from the CDN, focusing on backend processing while the CDN handles all client-facing encryption.

CDNs also make extensive use of persistent connections and protocol improvements. When an edge needs to talk to an origin, it often reuses long-lived TCP connections or even uses its own optimized protocols to avoid the overhead of establishing new connections for each request. On the client side, CDNs fully support HTTP/2 and HTTP/3 to better utilize each connection. HTTP/2 introduced multiplexing, allowing many requests and responses to be in flight simultaneously over a single TCP connection. With HTTP/1.1, browsers used to open 6-8 parallel connections to work around the one-request-at-a-time limitation of each connection; HTTP/2 removes that need by interleaving streams within one connection. The result is fewer connection setups and better utilization of bandwidth. As noted in one source, HTTP/2 “allows multiple requests and responses over a single connection to improve speed and efficiency.” CDNs typically terminate HTTP/2 at the edge (speaking HTTP/2 to clients), and then communicate to origins using HTTP/1.1 or HTTP/2 depending on the origin’s capability. Some CDNs also maintain connection pools to origins so that even if 1000 users hit an edge for a cache miss at nearly the same time, the edge might use just a few persistent connections to stream data from the origin for all of them, rather than 1000 separate TCP handshakes.

The latest evolution, HTTP/3 over QUIC, is also being embraced by CDNs and browsers. HTTP/3 uses QUIC, which is a transport protocol on top of UDP, bringing several improvements: it eliminates head-of-line blocking at the transport layer and includes built-in multiplexing (each QUIC stream is independent, so packet loss on one stream doesn’t halt others), and it can achieve 0-RTT connection setup for repeat connections (meaning data can sometimes start flowing immediately without waiting for even one handshake round trip). QUIC also has connection migration (the connection can survive network changes, like switching Wi-Fi to mobile data). For users, this means even faster handshakes than TLS 1.3/TCP and more robust performance on flaky networks. HTTP/3’s impact is lower latency, especially in high latency or lossy scenarios — it was designed to solve issues that remained in HTTP/2 (which still ran over TCP, and a lost packet could stall the whole connection due to TCP’s order requirements). Many CDNs (Cloudflare, Fastly, etc.) have enabled HTTP/3 on their edges, allowing clients that support it to take advantage automatically. It’s important to note that HTTP/3 is primarily a client↔edge improvement; the connection from the edge to origin is often still TCP (since QUIC to origin would require origin support, which as of now is uncommon). But since that leg is over data center networks or long-haul links that the CDN can manage, the focus is on optimizing the client last-mile.

In essence, CDNs optimize transport by terminating connections closer to users and using the most advanced protocols available. They handle TLS efficiently (often with custom hardware or highly tuned software), and leverage protocol features like multiplexing, header compression, and parallelism to ensure the network link is fully utilized. The result is not just that bytes travel a shorter distance (caching), but also that the transit is as fast as current technology allows (e.g. no head-of-line blocking, minimal handshakes). This is why a well-tuned CDN can significantly outperform a single origin server even purely in terms of network latency and throughput, aside from caching. A user connected to a nearby CDN node might have a round-trip time of <20ms, whereas to the origin it could be 100ms+. Additionally, CDNs often have high-capacity backbone networks connecting their PoPs, meaning content can traverse the globe on optimized paths (avoiding congested public internet links), sometimes called a “private backbone”; Cloudflare, for example, markets its smart-routing capability as Argo Smart Routing, which finds faster routes to the origin.

Core Metrics: Cache Hit Ratio, Miss Penalty, Origin Fetches, Latency (RTT, P99)

To evaluate and tune CDN performance, engineers look at several core metrics. One fundamental metric is the Cache Hit Ratio (CHR) – the percentage of requests (or bytes) served from cache versus going to origin. A high hit ratio means the CDN is effectively caching content. For example, a 95% cache hit ratio means only 5% of requests go to origin. This is huge for offload: increasing CHR even slightly has a big nonlinear impact on origin load. Fastly notes that going from 90% to 95% hit ratio is not just a 5% improvement – it actually halves the miss rate (from 10% down to 5%), thereby halving the origin requests. In other words, the origin went from seeing 10 out of 100 requests to only 5 out of 100. That can translate to significant origin infrastructure savings and more headroom. CHR can be measured request-wise or byte-wise (sometimes called Origin Offload when measured in bytes). Byte-wise gives weight to large files; if you cache a few huge video files, your byte offload might be high even if request hit ratio is lower. Both are useful to watch.
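The arithmetic behind that point, as a tiny sketch (the 10,000 requests-per-second figure is arbitrary): origin load scales with the miss rate, not the hit ratio, so small gains near the top matter a lot.

```typescript
// Sketch: how origin load scales with the cache hit ratio.
function originRequestsPerSecond(totalRps: number, hitRatio: number): number {
  return totalRps * (1 - hitRatio); // only misses are forwarded to the origin
}

const totalRps = 10_000;
console.log(originRequestsPerSecond(totalRps, 0.90)); // 1000 rps reach the origin
console.log(originRequestsPerSecond(totalRps, 0.95)); // 500 rps: miss rate, and origin load, halved
```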

Another important concept is the miss penalty – this is the extra latency incurred when a request is a cache miss. A cache hit might be served in a few milliseconds from a nearby edge. A miss means the request had to go to origin (possibly far away, plus origin processing time) before the user gets a response. The difference in response time is the penalty users pay on a miss. For example, consider a CDN with an edge in Frankfurt and an origin in New York: a hit from Frankfurt might have ~20–50ms latency to a user in Germany, whereas a miss requires maybe 100ms (edge to origin) + 100ms (origin to edge to user) = ~200ms or more. Real-world data shows a dramatic gap: in one study of Tencent’s CDN, the 99th percentile latency for cache hits was about 9.7 ms, whereas for misses it was about 218 ms. That implies tail-end users (p99) will experience a ~0.21 second delay when content isn’t cached, versus essentially instantaneous responses when it is cached. Thus, misses disproportionately affect the tail latency. This is why beyond average response times, P99 latency is closely watched – it captures those worst-case scenarios (often corresponding to cache misses or other outliers). In performance-sensitive systems, having 1% of requests take an order of magnitude longer can hurt user experience (think of a page where most images load fast but one image takes a second due to a miss).
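A similar back-of-the-envelope sketch shows how the miss penalty feeds into overall latency, treating the hit and miss figures above as representative per-request latencies:

```typescript
// Sketch: expected latency = hitRatio * hitLatency + missRate * missLatency.
function expectedLatencyMs(hitRatio: number, hitMs: number, missMs: number): number {
  return hitRatio * hitMs + (1 - hitRatio) * missMs;
}

// Using the hit/miss figures from the study cited above as representative values:
console.log(expectedLatencyMs(0.95, 9.7, 218)); // ~20.1 ms
console.log(expectedLatencyMs(0.99, 9.7, 218)); // ~11.8 ms
```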

The origin fetch rate (or origin request rate) is closely related to hit ratio and miss rate. It measures how many requests (or bytes) are being forwarded to the origin. If your CDN is doing its job, this number should be as low as possible relative to total traffic. A low origin fetch rate means you are getting good offload – the origin is only seeing new or uncached content requests occasionally. High origin fetch rate might indicate lots of misses (maybe due to uncacheable content or too short a TTL or inefficient cache key usage). Operators often aim to improve hit ratio through techniques like longer TTLs (if acceptable), better cache key normalization, and enabling features like stale-while-revalidate to serve more requests from cache. Origin bandwidth is another angle: CDNs reduce origin egress bandwidth which can save costs if your origin is in the cloud with bandwidth charges, for example.

Edge RTT (Round-Trip Time) is a metric more on the networking side, measuring the latency from user to the edge. This can be approximated by measuring ping or TCP handshake times to the CDN’s edge server. CDNs pride themselves on having a dense network that yields low edge RTTs globally. Cloudflare, for instance, has claimed that they reach 95% of the world’s population within ~50 milliseconds. A lower edge RTT means faster initial response and handshake. If you see users in a certain region with high edge RTT, it might mean the CDN needs a presence there or is not effectively routing them to a closer location. Edge RTT is often averaged or shown in a latency heatmap in CDN analytics.

Finally, tail latency metrics like P90, P95, P99 are crucial to monitor. A CDN might have an average response time of, say, 50 ms, but a P99 of 500 ms if a small fraction of requests are missing or encountering issues. This could still frustrate 1% of your users. By monitoring tail metrics, you can identify if those misses are causing unacceptable delays. Oftentimes, improving cache hit ratio improves tail latency since many of those long requests get eliminated. CDNs also can mitigate tail latency by serving stale on errors or prefetching content before it’s requested (some have prefetch capabilities for next assets).

In summary, metrics like Cache Hit Ratio (how often you hit cache), Origin Offload (bytes or requests not hitting origin), Miss Penalty (latency impact of misses), Origin Fetch Rate (how frequently origin is called), Edge RTT (user to edge latency), and P99 latency (worst-case user experience) collectively tell the story of your CDN’s performance and effectiveness. A well-optimized CDN setup will show high hit ratios (typically >90% for static content heavy sites), low origin request rates, and a tight latency distribution where even P99 isn’t drastically worse than the median. Engineers preparing for system design should be comfortable explaining these metrics and how to improve them (e.g., tweak caching rules to raise hit ratio, use a closer CDN POP to reduce RTT, etc.). These metrics also feed into cost – higher offload means less origin infrastructure and bandwidth cost, and better performance means happier users.

AWS CloudFront Examples: Edge Network and Lambda@Edge

To solidify concepts, let’s touch on how AWS CloudFront (Amazon’s CDN) reflects these principles. CloudFront’s global edge network consists of hundreds of edge locations worldwide, backed by Regional Edge Caches (mid-tier caches) in AWS regions around the world. When you use CloudFront, you get a domain like d123.cloudfront.net. User DNS lookups to that domain route them to the nearest CloudFront edge location, using AWS’s routing infrastructure (which is a mix of anycast and an AWS DNS service). The request flow in CloudFront is: user → nearest edge; if miss, edge → regional cache; if miss, regional → origin. The content then bubbles back to the user and is stored at both the regional cache and the edge for next time. This is essentially the shield-PoP concept built in; CloudFront additionally offers a managed feature called Origin Shield that lets you designate a specific regional cache as a single shield in front of your origin for even more protection. CloudFront caches by default based on URL path and does not include query strings, headers, or cookies unless configured (via Cache Policies). This ties back to cache key design – AWS gives you fine control to forward or not forward query params, specific headers, and cookies, and correspondingly include them in the cache key or not. A common AWS interview scenario is explaining how you would set CloudFront behaviors to maximize cache hits (e.g. not forward unnecessary cookies, or forwarding Accept-Language if you have localized content, etc.).
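As a rough sketch of such a configuration, assuming the AWS CDK v2 aws-cloudfront module (property names should be checked against the current CDK documentation), a cache policy that keys on the path plus a size query parameter and the Accept-Language header, while ignoring cookies, might look like this:

```typescript
import { App, Duration, Stack } from "aws-cdk-lib";
import * as cloudfront from "aws-cdk-lib/aws-cloudfront";

const app = new App();
const stack = new Stack(app, "CdnStack");

// Cache policy sketch: include only what changes the response (the "size" query
// parameter and Accept-Language), ignore cookies, and cache compressed variants.
new cloudfront.CachePolicy(stack, "LocalizedImagesPolicy", {
  comment: "Key on path + size + Accept-Language; ignore cookies",
  defaultTtl: Duration.hours(24),
  minTtl: Duration.seconds(0),
  maxTtl: Duration.days(365),
  queryStringBehavior: cloudfront.CacheQueryStringBehavior.allowList("size"),
  headerBehavior: cloudfront.CacheHeaderBehavior.allowList("Accept-Language"),
  cookieBehavior: cloudfront.CacheCookieBehavior.none(),
  enableAcceptEncodingGzip: true,
  enableAcceptEncodingBrotli: true,
});
```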

CloudFront also exemplifies edge computing capabilities with its Lambda@Edge and newer CloudFront Functions. Lambda@Edge allows you to run custom JavaScript/Node.js code on CloudFront’s edge servers in response to events (like each incoming viewer request, or before the response is sent). This is a powerful extension point: you can modify headers (for A/B tests or security), generate custom responses, rewrite URLs, implement authentication checks, or even generate dynamic content right at the edge. For example, you could use Lambda@Edge on a Viewer Request to inspect the User-Agent and add a custom header that influences caching or routing, or on an Origin Request to rewrite the path to point to a specific bucket/key. In the context of our topics: Lambda@Edge is often used to customize cache keys or routing beyond what the default settings allow. The AWS docs mention that you can alter the cache key via Lambda@Edge, for instance by removing a query param or normalizing a header value before CloudFront caches it. This can help avoid variant explosion by, say, stripping volatile query strings. Another use is header manipulation: CloudFront by default strips certain sensitive headers, but with Lambda@Edge you could forward some after processing. It’s essentially serverless functions at the CDN edge. CloudFront Functions (a newer, more lightweight JavaScript option) similarly allow quick header transforms or URL redirects entirely at the edge, but with some differences in capability and performance profile.
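For instance, a viewer-request function along these lines could strip tracking parameters and collapse the User-Agent into a coarse device-type header (the simplified event typing and the X-Device-Type name are illustrative; the real function would be compiled to Node.js for deployment):

```typescript
// Sketch of a Lambda@Edge viewer-request handler that normalizes the cache key:
// it drops tracking query parameters and collapses User-Agent into a device type.
type CloudFrontHeaders = Record<string, { key: string; value: string }[]>;
type ViewerRequestEvent = {
  Records: [{ cf: { request: { uri: string; querystring: string; headers: CloudFrontHeaders } } }];
};

export const handler = async (event: ViewerRequestEvent) => {
  const request = event.Records[0].cf.request;

  // Drop volatile tracking parameters so they don't fragment the cache.
  const params = new URLSearchParams(request.querystring);
  for (const name of [...params.keys()]) {
    if (name.startsWith("utm_") || name === "fbclid") params.delete(name);
  }
  request.querystring = params.toString();

  // Replace the near-unique User-Agent with a coarse device-type header, which a
  // cache policy could then include in the cache key instead of the raw string.
  const userAgent = request.headers["user-agent"]?.[0]?.value ?? "";
  const deviceType = /mobile/i.test(userAgent) ? "mobile" : "desktop";
  request.headers["x-device-type"] = [{ key: "X-Device-Type", value: deviceType }];

  return request; // returning the (modified) request lets CloudFront continue processing
};
```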

CloudFront integrates with other AWS services, but at its core it’s doing what we’ve discussed: caching content close to users, using a multi-tier cache infrastructure for efficiency, and optimizing transport (CloudFront supports HTTP/2 and HTTP/3 to users, and uses optimized network routes on the AWS backbone to reach origins). It also provides detailed metrics: you can see cache hit rate, total requests, error rates, etc., in CloudWatch. For instance, enabling Origin Shield on CloudFront can improve cache hit ratio by adding that extra regional layer, which is directly reflected in lower origin fetch counts.

When designing a system, one might cite CloudFront (or similar CDNs) as a way to offload static content, reduce load on servers, and improve global latency. It’s important, however, to remain vendor-agnostic in principle: all CDNs operate with similar fundamentals, even if implementation details (like what they call their regional caches or how they let you write edge logic) differ. CloudFront’s Lambda@Edge shows how CDNs are not just static caches now but are becoming programmable at the edge, which opens possibilities beyond simple caching (like personalized content assembly at edge, request filtering, etc., all without hitting the origin). This trend is seen across the industry (Cloudflare Workers, Fastly Compute@Edge, etc.), but that’s an extension beyond the core caching/routing focus of this article.


Conclusion: A solid understanding of CDN foundations is crucial for system design interviews and real-world architecture. We’ve covered why CDNs are indispensable for performance and reliability, how a request travels through DNS or anycast to an edge node, what happens inside the CDN caching layers, how to carefully design cache keys and TTLs, and how to invalidate or update content. We also looked at how CDNs optimize connections via TLS and modern protocols, and which metrics help evaluate a CDN’s effectiveness. With this knowledge, you can confidently discuss how a CDN fits into a system design: it accelerates delivery, reduces load on origin servers, and provides a layer of resilience and security. Whether you use AWS CloudFront, Cloudflare, Akamai, or any other provider, the core concepts remain the same. Understanding these will help you make trade-off decisions (e.g., cache duration vs. freshness, what to vary on, when to purge) and diagnose issues (cache misses, latency spikes) in large-scale systems. In an interview scenario, mentioning CDNs as part of your design for a global service (and elaborating on these points) demonstrates an advanced grasp of distributed system performance optimization and is often key for impressing on scalability and reliability aspects.

system-design