Amazon DynamoDB Primary Key Design – A Comprehensive Guide
May 13, 2025
Introduction
Amazon DynamoDB is a fully managed NoSQL database known for its fast and predictable performance at scale. A crucial aspect of harnessing DynamoDB’s power is the design of its primary keys. Unlike traditional relational databases where you can flexibly query any column, in DynamoDB the primary key design largely dictates your data access patterns and performance. Developers must model data with the “query-first” approach, structuring tables and keys to answer specific application queries efficiently. This guide provides a deep dive into DynamoDB primary key design – covering fundamentals, architectural strategies, data modeling, key selection criteria, scalability considerations, performance optimizations, and real-world best practices. We’ll explore how to choose effective partition and sort keys, leverage single-table design, use secondary indexes, tune performance, ensure consistency, and apply advanced patterns. Diagrams for entity modeling, partitioning, and indexing are included to illustrate concepts. The target audience is software developers looking to build scalable applications on DynamoDB with well-designed primary keys that maximize throughput and minimize cost. Let’s begin with the basics of DynamoDB keys and why they matter for performance.
1. DynamoDB Primary Keys: Definitions and Fundamentals
In DynamoDB, every item in a table is uniquely identified by its primary key. There are two types of primary keys:
- Simple primary key (Partition Key only): A single attribute that acts as the partition key. Each item’s partition key value is hashed to determine which physical partition (storage node) it belongs to. All items with the same partition key are stored together (in the same logical partition).
- Composite primary key (Partition Key + Sort Key): A two-part key where the first part is the partition key and the second part is the sort key (sometimes called a range key). The partition key decides the partition placement, and within each partition, items with the same partition key are ordered by the sort key. Each composite key must be unique; that is, the combination of partition and sort key is unique for every item.
Partition Key (PK): Also known as a hash key, this is used to distribute data across partitions. DynamoDB applies an internal hash function on the partition key value and maps the item to a physical partition. A well-chosen partition key achieves uniform distribution of data and load; this prevents any single partition from becoming a bottleneck. For example, using a user ID as the partition key in a large user table is effective (high cardinality ensures many distinct keys), whereas using a status flag (e.g., “active” or “inactive”) would concentrate all items into a few partition key values – a bad practice leading to “hot” partitions. The impact on performance is direct: DynamoDB allocates throughput and storage at the partition level, so one hot partition key can consume disproportionate resources and throttle other keys.
Sort Key (SK): The optional second part of a composite key that defines item ordering within a partition. It allows range queries – you can Query a partition for items where the sort key is between two values, begins with a substring, etc. Sort keys enable modeling one-to-many relationships by grouping related items under the same partition key. For example, in a table of user orders, the partition key could be UserID and the sort key the OrderDate, so all of a user’s orders are in one partition, sorted by date. Including a sort key does not change how partitions are chosen (that’s solely based on the partition key’s hash), but it provides more flexible querying within that partition. A composite key still requires unique PK+SK pairs, so the sort key ensures uniqueness when multiple items share the same partition key (e.g., a user can have many orders distinguished by different sort key values).
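As a concrete illustration, here is a minimal sketch using the AWS SDK for Python (boto3). The table name Orders and the attribute names are assumptions for this example, not anything mandated by DynamoDB:

```python
import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical table: partition key "UserID" (S), sort key "OrderDate" (S, ISO-8601).
dynamodb = boto3.resource("dynamodb")
orders = dynamodb.Table("Orders")

# Fetch one user's orders for March 2025, sorted by date (ascending by default).
resp = orders.query(
    KeyConditionExpression=Key("UserID").eq("user-123")
    & Key("OrderDate").between("2025-03-01", "2025-03-31T23:59:59")
)
for item in resp["Items"]:
    print(item["OrderDate"], item.get("Total"))
```

Because the condition names the partition key and a sort key range, DynamoDB routes the request to a single partition and returns the user’s orders already sorted by date.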
Performance Implications: DynamoDB’s performance (single-digit millisecond reads/writes) is achieved by distributing data across many servers (partitions) and using primary keys to locate items with O(1) hash lookups. A request that specifies the partition key (and optionally sort key) can be routed directly to the appropriate partition, avoiding full table scans. This is why well-designed keys are critical – if you can use a Query on a key, it’s efficient; if you have to Scan the table because your key design doesn’t support a needed query, it will be much slower and consume more throughput. Every partition can serve a finite amount of capacity. Each partition can handle up to 3,000 read capacity units (RCUs) or 1,000 write capacity units (WCUs) per second. (One RCU = one strongly consistent read of up to 4 KB per second, or two eventually consistent reads; one WCU = one write of up to 1 KB per second.) If all requests target the same partition (i.e., same partition key), they are limited by these per-partition max rates, regardless of overall table throughput. In contrast, if requests spread across many keys (partitions), the table can achieve higher total throughput by parallelizing across partitions.
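To see how the arithmetic works, here is a small Python sketch that applies the rounding rules quoted above (purely illustrative; it mirrors the 4 KB read / 1 KB write units, it is not an official SDK helper):

```python
import math

def read_capacity_units(item_size_bytes: int, strongly_consistent: bool = False) -> float:
    """RCUs to read one item: round size up to 4 KB units; eventually consistent reads cost half."""
    units = math.ceil(item_size_bytes / 4096)
    return units if strongly_consistent else units / 2

def write_capacity_units(item_size_bytes: int) -> int:
    """WCUs to write one item: round size up to 1 KB units."""
    return math.ceil(item_size_bytes / 1024)

# A 3.5 KB item costs 1 RCU strongly consistent, 0.5 RCU eventually consistent, and 4 WCUs to write.
print(read_capacity_units(3500, strongly_consistent=True))  # 1
print(read_capacity_units(3500))                            # 0.5
print(write_capacity_units(3500))                           # 4
```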
Hot vs. Cold Partitions: A hot partition is a partition receiving a disproportionate amount of traffic (reads/writes) or storing an outsized portion of data. Bad key design – e.g., using a timestamp as the partition key so all recent data piles into one partition – can cause hot partitions that lead to throttling and degraded performance. DynamoDB’s adaptive capacity can mitigate hotspots by automatically shifting capacity to hot partitions and even splitting partitions that exceed sustained throughput (a mechanism known as “split for heat”). For example, if one partition key is extremely popular, DynamoDB may transparently split that partition so that the key’s items are spread over two partitions, doubling the throughput available to that key. However, adaptive capacity is not a cure-all – it works best with short-term spikes and still assumes a mostly even key distribution. Relying on it to fix a bad schema can lead to unpredictable performance. The better approach is to design partition keys for uniform workload upfront.
Physical Data Distribution: The figure below illustrates how DynamoDB distributes items across partitions based on the hash of the partition key. Items with different partition key values ("Fish", "Lizard", "Bird", "Cat", "Dog", etc.) are stored on different partitions determined by DynamoDB’s hash function. Note that items are not sorted across the entire table – only within a given partition (if a sort key is used). A new item’s partition is selected by hashing its partition key, ensuring an even spread when keys are diverse.
DynamoDB partitions data based on the partition key’s hash. A partition key like "AnimalType" is hashed by DynamoDB’s internal function f(x) to decide which partition the item goes to. Here, items with AnimalType “Dog”, “Cat”, “Bird”, etc., map to different partitions. Good key design yields many partition key values to spread items and requests evenly.
In summary, the primary key is the chief factor in DynamoDB’s scalability and performance characteristics. A well-chosen primary key (both partition and sort components) distributes data to avoid hotspots, supports your core query patterns directly, and thereby enables DynamoDB to deliver consistent low-latency operation at any scale. Next, we discuss how to model your data and choose keys in the context of your application’s data architecture.
2. Data Modeling and Architectural Considerations
Traditional entity-relationship (ER) modeling must be approached differently for DynamoDB. In relational databases, we normalize data into separate tables and use foreign keys to link them, then perform JOINs at query time. DynamoDB (and NoSQL in general) encourages a nearly opposite approach: denormalization and single-table design, optimizing data layout for your access patterns upfront. This means potentially storing related entities together and duplicating data to avoid expensive cross-table operations. Key design is central in this because your partition and sort keys will often encode relationships. Let’s break down some architectural strategies:
Single-Table vs. Multi-Table Design
Single-Table Design: AWS often recommends using one table for all (or most) of an application’s data. In a single-table schema, different entity types (e.g., Users, Orders, Products) are stored in the same table, distinguished by item attributes and carefully chosen keys. The partition key might include an entity type prefix, and the sort key might encode relationships. For example, a single table could have items where the partition key is Customer#<ID> for customer records and Order#<ID> for orders, but orders also carry a sort key that contains the customer ID, allowing orders to be queried by customer via a Global Secondary Index (we’ll cover GSIs later). The main idea is to co-locate related data: items that would be joined in a relational model are put into the same partition (with the same partition key) in DynamoDB so that a single Query can retrieve everything, instead of multiple queries or scans across tables.
A classic use-case is a many-to-many relationship. In SQL, you’d have three tables (two entity tables and a join table). In DynamoDB single-table design, you can often represent this with two types of items in one table using an adjacency list pattern. One common approach: use a partition key that can represent either entity, and a sort key that distinguishes the linked items. For example, consider an e-commerce scenario with Customers and Orders (one-to-many relationship), plus perhaps Products (many-to-many with Orders). Instead of separate tables, you might design a single table where:
- Partition key = CustomerID; sort key prefixed with ORDER# followed by the OrderID for order items. This way, querying by a customer’s ID returns all their orders (all items in that partition).
- Additionally, you could store a customer’s profile info as an item in the same partition (e.g., sort key = PROFILE#CustomerID) so that a single query fetches the customer and all their orders together (see the sketch below).
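Here is a minimal boto3 sketch of that layout. The table name AppTable and the CUSTOMER#/ORDER# key formats are illustrative choices, not a prescribed schema:

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("AppTable")  # hypothetical single-table name with generic PK/SK attributes

customer_id = "C-1001"

# The customer profile and each of their orders share the same partition key.
table.put_item(Item={"PK": f"CUSTOMER#{customer_id}", "SK": "PROFILE", "Name": "Ada", "Email": "ada@example.com"})
table.put_item(Item={"PK": f"CUSTOMER#{customer_id}", "SK": "ORDER#2025-05-10#O-1", "Total": 42})
table.put_item(Item={"PK": f"CUSTOMER#{customer_id}", "SK": "ORDER#2025-05-12#O-2", "Total": 17})

# One Query on the partition key returns the profile item and every order item together.
everything = table.query(KeyConditionExpression=Key("PK").eq(f"CUSTOMER#{customer_id}"))["Items"]
```

A single Query on the customer’s partition key returns the profile item and all order items in one round trip.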
Another example is a forum application (from AWS documentation): a table Thread could store forum threads. If each thread item’s PK is ForumName and SK is ThreadSubject, all threads of a forum are in one partition. Now, to also store replies, you might put replies in the same table with PK = ForumName and SK = ThreadSubject#ReplyTime, so that replies are listed under their thread. This is a single-table approach (threads and replies in one table) where the sort key hierarchy encodes the relationship.
The benefit of single-table design is performance and scalability: You fetch or update related items with one targeted query, and you maintain DynamoDB’s O(1) key-value lookup nature for all access patterns. It also simplifies having multiple query dimensions without duplicating data in separate tables – you use GSIs to create alternate key access (more on that later). Amazon’s own usage of DynamoDB for massive systems (e.g., Amazon.com’s order pipeline) relies on single-table designs to achieve constant-time queries at scale.
Multi-Table Design: Sometimes, it can make sense to use multiple tables, especially if you have very distinct datasets or access patterns that don’t overlap. For example, an application might keep a user profiles table separate from a transactions table, if they are accessed differently and not needed together. DynamoDB imposes no join or foreign key constraints, so separate tables mean you’ll handle relationships in your application logic. Multiple tables are simpler to reason about (each table = one entity type) and may resemble a relational layout, but be cautious: if your use-case ever needs to combine data from those tables, you’ll end up doing multiple round-trip queries or scans, which is slower and costlier. In general, choose multiple tables only when the data really has no usefully overlapping access patterns. A common reason might be multi-tenant systems where each microservice or bounded context manages its own table for isolation. Another reason might be drastically different workloads (e.g., a high-write timeseries table vs. a read-heavy reference data table) – separating them could allow tuning capacity modes individually.
Denormalization and Redundancy: DynamoDB’s design favors duplication over complex querying. It is often worthwhile to store the same piece of data in multiple items if it spares you from a query that would otherwise scan or join data. For instance, if you have an Orders table and you need quick access to orders by Product, you might create a GSI with partition key as ProductID. But that only gives you order IDs perhaps; if you frequently need product name or price with those queries, it might be better to embed product name and price into each order item (denormalize it) so that queries by ProductID can retrieve all needed info in one go, rather than requiring a follow-up read to a Products table. This kind of redundancy is acceptable because DynamoDB writes are idempotent and you can use transactions (or careful ordering) to keep the duplicates in sync. The goal is to avoid multi-step lookups at read time – disk is usually cheaper than computation, and DynamoDB is optimized for storing and retrieving data quickly, not for relational joins. By eliminating the need for JOINs through thoughtful data duplication, you keep your queries simple and fast at scale.
Entity Relationship (ER) Modeling in DynamoDB: Instead of ER diagrams of normalized tables, you model based on access patterns. Start by listing the questions your application will ask (e.g., “Get a user’s profile and recent posts”, “Find all books in category X published in 2021”, “List top 10 scores for game Y”). Then design your table’s keys such that each of those questions can be answered with a Query or GetItem on a primary key or secondary index. In practice, this often means grouping different types of entities into the same item collection (same partition key) or creating composite keys that concatenate multiple pieces of identifying information. Adjacency list and materialized graph patterns are common techniques to model one-to-many and many-to-many relations. In the adjacency list pattern, you create “edge” items that establish relationships between entities by referencing their keys. For example, to model that Employee X works in Department Y, you might have an item with PK = EMP#X and SK = DEPT#Y (or vice versa) as a pointer. The presence of that item effectively links the two. Your queries can then fetch employees by department or departments by employee by leveraging these relationship items. The DynamoDB Developer Guide provides a rich example of modeling a complex relational schema (with employees, departments, orders, products, etc.) in a single DynamoDB table using such patterns.
Example single-table design (simplified). Here a many-to-many relationship between “racers” and “races” is modeled in one table using composite keys. The Partition Key (PK) and Sort Key (SK) are designed to intermingle two entity types: racer and race. Items with PK starting with racer- represent race results by racer (each racer’s performance in a specific race), while items with PK starting with race- represent the same results organized from the race perspective. By using such keys (e.g., PK=racer-1, SK=race-1 for Racer1’s result in Race1, and PK=race-1, SK=racer-1 for the same data indexed by race), the design supports querying all races for a racer, or all racers in a race. This single table replaces multiple normalized tables and enables efficient one-stop queries.
Designing a single-table schema requires more upfront thinking (what Alex DeBrie calls a “steep learning curve”), but it pays off with simpler, faster queries once in production. However, keep in mind the downsides: single-table designs can be inflexible if your access patterns change significantly (adding a new query often means backfilling data or rethinking keys), and they can make ad-hoc analytics difficult because the data is highly denormalized. If your application’s query patterns are well-known and stable, the single-table approach is usually superior. If you anticipate very dynamic querying needs or simply cannot determine usage upfront, you might lean towards more tables or indexes which sacrifice some efficiency for flexibility. The general advice is: understand single-table design principles even if you choose not to use them everywhere, so you can consciously decide when multiple tables are justified.
Item Collections and Aggregation Strategies
Item Collections: In a DynamoDB table with a composite key, an item collection refers to all items with the same partition key (i.e., a partition’s worth of items, all sharing the PK). Modeling relational data often means creating item collections that represent a logical entity and its related sub-entities. For example, for an online store, you might have an item collection per Order (with the Order header as one item and each Order Item as separate items sharing the OrderID as partition key). This way, retrieving an Order and all its Order Items is a single Query on that partition key. Item collections can be a mix of different item types – the only thing tying them together is the partition key. A practical tip: use sort key prefixes to group item types within a collection. E.g., in the Order collection, have sort keys like ITEM#1, ITEM#2 for order line items, and maybe ORDER#INFO for the order summary. Then a query on PK = Order123 AND begins_with(SK, 'ITEM#') can fetch all line items only, while PK = Order123 with no sort condition fetches everything (including the summary). This kind of scheme is very powerful for modeling one-to-many hierarchies.
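A sketch of those lookups in boto3, assuming a hypothetical Orders table with generic PK/SK string attributes and the sort key prefixes described above:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Orders")  # hypothetical table using the PK/SK scheme above

# Only the line items of Order123, selected via the ITEM# sort-key prefix.
line_items = table.query(
    KeyConditionExpression=Key("PK").eq("Order123") & Key("SK").begins_with("ITEM#")
)["Items"]

# The order summary is a direct key lookup on the well-known sort key value.
summary = table.get_item(Key={"PK": "Order123", "SK": "ORDER#INFO"}).get("Item")
```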
One-to-Many and Hierarchical Data: DynamoDB can model hierarchical data (like organizational charts, category trees, file systems) by cleverly using keys. A common approach is the materialized path pattern: store the path of the node as the sort key, so that a begins_with query can fetch all descendants. For example, for a category hierarchy, you could use PK = “CATEGORY” (just a constant or a top-level id) and SK = “Electronics#Computers#Laptops” to represent that path. Then to get all subcategories under “Electronics”, you query begins_with(SK, 'Electronics#'). Another approach is using separate item types for parent and child relationships (like storing an edge item linking parent and child as mentioned earlier). The design choice depends on how you need to query the data – by entire subtree vs. immediate children, etc. Sort keys are your friend in encoding hierarchy because they maintain order and allow range queries.
Many-to-Many: As touched on, adjacency list items are used. For instance, to model that a student is enrolled in a class (many students to many classes), you might create two types of items: PK = StudentID, SK = ENROLLED#ClassID and PK = ClassID, SK = ENROLLED#StudentID. These “link” items allow you to query by student to find classes or by class to find students. Both items might contain minimal data (just references) or some cached info (like the date of enrollment). This doubles the writes (you write two items for one enrollment action), but then reading in either direction is efficient. It’s a conscious space-time tradeoff.
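One way to keep both link items in sync is to write them in a single transaction. The sketch below assumes a hypothetical Enrollments table with generic PK/SK string attributes:

```python
import boto3

client = boto3.client("dynamodb")
TABLE = "Enrollments"  # hypothetical table name

def enroll(student_id: str, class_id: str, enrolled_on: str) -> None:
    """Write both directions of the many-to-many link in one atomic transaction."""
    client.transact_write_items(
        TransactItems=[
            {"Put": {"TableName": TABLE, "Item": {
                "PK": {"S": student_id},
                "SK": {"S": f"ENROLLED#{class_id}"},
                "EnrolledOn": {"S": enrolled_on},
            }}},
            {"Put": {"TableName": TABLE, "Item": {
                "PK": {"S": class_id},
                "SK": {"S": f"ENROLLED#{student_id}"},
                "EnrolledOn": {"S": enrolled_on},
            }}},
        ]
    )

enroll("STUDENT-42", "CLASS-MATH101", "2025-05-13")
```

If the transaction overhead isn’t warranted, two ordinary PutItem calls with retry logic achieve the same layout.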
Single-Table vs Multi-Table Summary: Use a single-table (denormalized) design to maximize efficiency for known access patterns – it reduces the number of queries and leverages DynamoDB’s strengths. Use multiple tables only for truly unrelated data or when different parts of the data have vastly different usage profiles that can’t be accommodated in one schema. Even in multi-table setups, each table’s internal design will often mimic the above patterns for that subset of data. Designing your keys around access patterns is paramount: “Data is explicitly stored the way the application needs to use it, increasing query efficiency.”
Having set the stage for modeling approaches, we now focus on the first part of the primary key – the partition key – and how to choose one that will scale.
3. Selecting Partition Keys and Distributing Data
The partition key (PK) choice is arguably the most important decision in DynamoDB data modeling. A good partition key ensures your workload is evenly spread across DynamoDB’s infrastructure; a poor choice can concentrate load and cause throttling or hot spots. Here are the key criteria and strategies for partition keys:
- High Cardinality: Choose a partition key that has many possible values (ideally, far more than the number of partitions DynamoDB will use for your table). The more distinct partition key values, the easier it is for DynamoDB to spread your data. For example, if your application has millions of users, UserID is a high-cardinality key – each user generates a separate partition key value. On the other hand, a boolean flag or a small set of categories is low cardinality and not suitable as a primary partition key. The AWS best practices guide explicitly labels things like “Status code (few possible values)” or “Item creation date (rounded to day)” as “Bad” partition key choices due to low uniformity. A good mental check: if you can list all possible PK values off the top of your head or on a single page, it’s probably not high-cardinality enough for a large table.
- Even Access Distribution: Not only should there be many key values, your application’s access patterns should ideally touch them relatively evenly. For instance, a partition key of DeviceID for an IoT table is high-cardinality (millions of devices), but if one device sends 90% of the traffic and the others send 0.1% each, you still have a hotspot on that one device’s key. In an ideal scenario, each partition key is accessed at a rate roughly proportional to the others. Uniform access may not always be achievable, but avoid an extreme skew where one or a few keys dominate traffic. If inherent data skew exists (e.g., one customer is responsible for the majority of transactions), consider mitigating strategies (we’ll discuss sharding patterns shortly).
- Avoiding Hot Partitions: A hot partition occurs when a single partition key value receives a disproportionate amount of read/write traffic, exceeding what a single partition can handle, thus causing throttling. The classic example of a bad partition key is “date” – if you partition by today’s date, all of today’s data goes to the same key/partition. At midnight, it switches to a new partition. This design will cause the current day’s partition to be extremely hot, especially if you have a high volume of writes (and likely idle capacity on older dates). Similarly, using something like a product category as a partition key could be bad if one category (say “Electronics”) has far more items/traffic than others. Rule of thumb: if a single partition key might need to handle more throughput than the per-partition limit (3,000 RCUs / 1,000 WCUs) or store more than 10 GB of data (the item collection limit when LSIs are used), it will become a scaling pain point. Design to split such data across multiple keys.
- Use Composite Keys to Your Advantage: If one dimension of your data doesn’t provide enough uniqueness or distribution on its own, combine it with another. For example, if you were tempted to use “Country” as a partition key (not good, since a few populous countries will dominate), you might combine Country with another attribute, or more effectively, avoid using Country and pick something like UserID or OrderID, which inherently has more variety. A composite primary key (with sort key) also doesn’t necessarily solve distribution (since partition placement is based only on the PK), but it gives you flexibility to include multiple aspects in key design. For instance, a partition key that is a compound of multiple fields (e.g., CustomerID-ProductCategory combined) can sometimes spread access better if, say, different customers focus on different categories. However, this approach can complicate querying (you need both parts to query). Often a better approach for uneven workloads is explicit sharding, described next.
- Write Sharding for Hot Keys: If you identify an access pattern that inherently slams one partition key, you can artificially shard it by adding a random (or calculated) element to the key. The DynamoDB Developer Guide suggests adding a random suffix to partition key values to spread writes. For example, instead of partition key “2014-07-09” for a date, use “2014-07-09#<random 1-200>”, creating 200 distinct partition keys for the same date. Now writes for that date will be evenly spread across 200 partitions, giving 200x throughput headroom. The downside is that when reading, you must query all 200 shards and merge the results, since you don’t know which shard a particular item landed on. This trade-off is worthwhile if write throughput is the bottleneck and reading after writing is either infrequent or can tolerate doing parallel queries. A smarter variant is deterministic sharding: use a hash of some identifier to pick a shard number. For example, hash the OrderID mod 10 and append that to the partition key. Writes are still spread (semi-randomly), but if you need to read a specific order and you know its ID, you can calculate which shard it went to, so single-item reads don’t require scatter-gather. You still need scatter-gather for range queries over a whole partition key group (like “all orders on 2014-07-09”), but depending on the use case that might be acceptable. Sharding is essentially a manual way to tell DynamoDB: treat what would logically be one partition key as many. Use it only when necessary (i.e., you’ve proven a key is hot or will be hot). A sketch of the deterministic variant appears after this list.
- Partition Key as Part of a Composite Natural Key: Sometimes the data naturally has a composite identifier (e.g., TenantID + UserID). In such cases, you might designate the high-level component (TenantID) as the partition key and the lower-level component (UserID) as the sort key, but if you anticipate one tenant being huge, that tenant’s data all lives in one partition (hot!). Instead, consider flipping them or mixing them (some folks use a deterministic hash of TenantID to map onto multiple partitions). Alternatively, if cross-tenant queries aren’t needed, you could include the TenantID in the key in a way that lets one tenant’s data span multiple partition keys. For instance, PartitionKey = TenantID#UserGroup (where UserGroup could be the first letter of the username or similar) splits a big tenant’s users into sub-partitions.
- Cardinality and Data Size Consideration: Each partition key corresponds to an item collection. If a single partition key has too many items (or very large items), that can hit DynamoDB’s limits. In particular, an item collection is capped at 10 GB if you use a Local Secondary Index on the table. Even without LSIs, extremely large collections can be unwieldy to query (a query on one PK could return millions of items, which might strain the client or require heavy pagination). If you find an item collection growing without bound over time (e.g., an IoT sensor that keeps accumulating new readings under one device ID partition), consider strategies like time-based partitioning – e.g., include a time bucket in the partition key, such as DeviceID#2025 for data in year 2025, so each partition key covers a limited timeframe. This ensures old data rotates out to a different key (or table) instead of one key growing indefinitely.
- Examples of Good vs Bad Keys: AWS’s guide gives simple examples: “User ID” – good (many users); “Device ID with uniform access” – good; “One device far more popular than others” – bad; “Status code (few values)” – bad; “Date (rounded)” – bad. Always ask: how many distinct partition key values will I have, and how is traffic spread among them? If the ratio of accessed keys to total keys is low, that’s fine as long as it’s not always the same key being accessed. For instance, even if you have millions of users, if your app usage on a given day only touches thousands of them (not all million), you’re still okay because it’s unlikely all those accesses concentrate on one user – it’s spread by nature. It’s problematic if your pattern inherently funnels most operations to one key (like an “aggregator” key). One anti-pattern is a “singleton” key that everyone touches (e.g., a single partition storing a global counter or global feed). For such cases, you must shard or redesign the access pattern.
- Anticipating Growth: Think not just about current usage but future scale. A partition key that works for 100 GB of data might not for 10 TB. For example, using City as a key might work if you have modest data and cities are evenly used. But if you grow, the “NewYork” partition could exceed 10 GB or throughput limits. It’s safer to design keys that scale linearly with your data volume (like user IDs, order IDs, etc., which naturally increase in cardinality as data grows). Remember you can’t change a table’s primary key after creation; you’d have to migrate data to a new table if it’s wrong. So choose keys with long-term scalability in mind.
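The sketch below illustrates the deterministic sharding variant described in the write-sharding bullet above. The table name, attribute names, and shard count are illustrative assumptions:

```python
import hashlib
import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical table: partition key "DateShard" (S), sort key "OrderID" (S).
table = boto3.resource("dynamodb").Table("OrdersByDay")
NUM_SHARDS = 10

def date_shard(date: str, order_id: str) -> str:
    """Deterministically map an order to one of NUM_SHARDS partition keys for its date."""
    shard = int(hashlib.md5(order_id.encode()).hexdigest(), 16) % NUM_SHARDS
    return f"{date}#{shard}"

# Writes spread across the shards for that date.
table.put_item(Item={"DateShard": date_shard("2014-07-09", "O-789"), "OrderID": "O-789", "Total": 99})

# Point read: recompute the shard from the OrderID, so no scatter-gather is needed.
item = table.get_item(Key={"DateShard": date_shard("2014-07-09", "O-789"), "OrderID": "O-789"}).get("Item")

# Range read for the whole day: fan out across every shard and merge the results.
all_for_day = []
for shard in range(NUM_SHARDS):
    resp = table.query(KeyConditionExpression=Key("DateShard").eq(f"2014-07-09#{shard}"))
    all_for_day.extend(resp["Items"])
```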
Key Takeaway: An effective partition key maximizes the ratio of distinct keys to total data, and your workload touches those keys in a way that no single key is overwhelmed. If you find yourself with a use-case that inherently violates this, apply patterns like write sharding to divide the hot key’s workload among multiple keys. Keep an eye on CloudWatch metrics (e.g., ConsumedCapacity and ThrottledRequests with per-partition key insight) to detect any emerging hot keys. By ensuring even data and request distribution, you unlock DynamoDB’s full performance potential without surprises.
Next, we’ll examine strategies for the sort key (when using composite keys), which opens doors for more advanced querying patterns like hierarchies and time-series data.
4. Advanced Sort Key Strategies
While the partition key handles distribution, the sort key provides powerful capabilities to model relationships and query data flexibly within a partition. Designing the sort key requires thinking about how you need to query or access items that share the same partition key. Here are advanced patterns and best practices for sort keys:
- Hierarchy and Composite Sort Keys: Sort keys can encode a hierarchy or multiple attributes concatenated. By using delimiters or prefix strings, you can pack meaningful data into the sort key. For example, suppose you have an application with Users and each user has multiple items: profile, posts, comments. You might choose a partition key of UserID and a sort key like <Type>#<SubID>. So user 123’s data might have sort keys PROFILE#123, POST#<postId1>, POST#<postId2>, COMMENT#<commentId1>, and so on. By doing this, you achieve two things: 1) all items for user 123 are in one partition (easy retrieval of all user data), and 2) you can query or filter by type using begins_with on the sort key (begins_with(SK, "POST#") to get only posts, for instance). This technique of prefixed sort keys is extremely common in single-table designs. It creates a logical grouping within the partition. In general, think of the sort key as a way to concatenate multiple fields you might want to query by. If your access pattern is “get all orders of a customer in a given year”, you could have SK = YEAR#2025#ORDER#<OrderID>, which allows a begins_with or between on YEAR#2025.
- Sorted Order and Range Queries: One primary benefit of sort keys is to retrieve items in sorted order without additional processing. If you want to fetch the most recent entry or sort results by a timestamp or score, incorporate that into the sort key. Time-Series Data: A common pattern is PK = some entity (device ID, user ID, etc.) and SK = timestamp (ISO-8601 string or epoch) so that items are sorted by time. You can then easily query the last N items (by descending order query with Limit) or a time range (between start and end timestamps). This is great for event logs, sensor readings, or any temporal data per entity. One caution: if data is unbounded in time and a single device or entity will accumulate a huge number of records, that partition can get huge. Solutions include breaking the partition by time buckets (e.g., include YearMonth in the PK or as an SK prefix). For example, PK = DeviceID#2025-05, SK = 2025-05-13T12:00:00Z (full timestamp). This keeps partition sizes manageable per month.
- Reversed Order Queries: DynamoDB sort order is ascending by default, but you can set the ScanIndexForward parameter to false to get results in descending order. This is useful for “latest first” queries (like latest posts, recent events). If you store a timestamp as the sort key, you can quickly get the newest item with a descending query and a limit of 1. Alternatively, some designs use a reverse timestamp (e.g., negative epoch or a bitwise NOT of the timestamp) as the sort key so that lexicographic order is already descending. But that’s an old trick; using ScanIndexForward=false is simpler.
- Prefixes and Suffixes in Sort Key: Besides prefixing the sort key with a type or category, you can also suffix it with additional info if needed. Essentially, the sort key can be a composite of multiple data elements. For example: SK = OrderDate#OrderID. This way, items are primarily sorted by date, but if two orders have the same date (or for uniqueness), the OrderID is included. It also means you could query all orders in a date range regardless of customer by using a GSI that inverts these keys (but focusing on sort key usage: a composite allows queries like “all orders on 2023-05-01” by doing begins_with(SK, "2023-05-01")). Another scenario: modeling a filesystem – PK = FolderID, SK = FILE#<filename> or FOLDER#<subfoldername>. Sorting then groups all files and subfolders together, alphabetically by name. If you wanted to enforce ordering by type (folders first, then files), you could prefix sort keys with something like A_ for folders and B_ for files to control sort priority.
- Numeric Sort Keys and Zero-Padded Keys: If you want numeric sorting (e.g., sort by a score or rank), you might store the score as part of the sort key. String sort keys are compared lexicographically, which means plain numeric strings might not sort numerically as expected (“100” comes before “99”). To fix this, either pad numbers to a fixed width (like always 5 digits, so 00100 vs 00099; a padding sketch follows this list) or invert the number (for high-to-low sort, subtract from a maximum). Alternatively, make the sort key a Number data type: DynamoDB sorts Number sort keys in true numeric order, with no padding needed. This can be used for things like a per-game leaderboard (PK = GameID, SK (Number) = Score, queried in descending order). However, if multiple items can have the same score, you’d need to include a tiebreaker (like a unique ID or a high-resolution timestamp) to keep the composite key unique. Another approach is to keep the numeric value in a separate attribute and use a GSI whose sort key is that numeric attribute.
- Hierarchical Sort Keys for Tree Structures: Suppose you have an org chart with employees and managers. You might use PK = Company, and SK as something like ManagerID#EmployeeID or even the full chain CEO#VP#Manager#Employee. If you store the SK as a full path, you can query everything under a certain manager with a prefix match. There’s a well-known pattern where items store a path string and you query with begins_with(path, 'A>B>C>') to get everything under node C. This works, but watch out for path length and item sizes if the hierarchy is deep. Alternatively, storing parent pointers and doing recursive queries is another way (but that requires iterative queries, which is less ideal).
- Sort Key for Heterogeneous Items (Overloading SK): In a single table, sort keys are often “overloaded” to serve multiple item types. For example, an item that represents a Customer’s profile might use SK = PROFILE# (some constant or the customer’s own ID), whereas an Order item in the same partition might use SK = ORDER#<OrderDate>#<OrderID>. They don’t have the same structure, but that’s okay – DynamoDB does not enforce sort key formatting. The only requirement is that if you query with a condition, the condition must apply to the SK data type of those items. So designing the SK such that a prefix or range condition isolates just one type of item is a way to query that type. GSI overloading (discussed later) is similar but at the index level. Within the base table’s sort key, you might also encode multiple pieces to allow different queries. Be consistent with delimiters and the order of components in the sort key, as this defines what kinds of range queries you can do.
- Time-to-Live and Sort Key Order: If you use DynamoDB’s TTL (Time to Live) feature to expire items (common for session data, temporary logs, etc.), those items will be deleted after a certain time. If your sort key is a timestamp, you might naturally be aging out old sort key values. TTL operates on an attribute you specify (not necessarily the sort key), but many use cases store an “ExpiresAt” timestamp attribute. This doesn’t directly affect sort keys except that if you are removing old items, your item collection stays fresh. Keep in mind TTL expirations are not immediate (there can be up to a 48-hour delay). If order matters, design so that even stale data doesn’t interfere – e.g., if you query only the last 100 items, chances are the expired ones are older than that window.
- Maximizing Query Patterns with Sort Keys: A single sort key can often be used to answer multiple query patterns by careful design. For example, if SK = Status#Date for an order (like SHIPPED#2025-05-13 or PENDING#2025-05-14), you can query a customer’s orders by status using begins_with on the status, or by date range regardless of status, etc. But sometimes combining fields limits you (you might not easily query just by status unless you know the date, or vice versa). In those cases, that’s where secondary indexes come in (to create another key path). But still, think about whether a clever sort key could support a needed query. The general guidance: if a query is always scoped to a single partition (e.g., “find an order by ID for a given customer” – customer is the partition, order ID could be part of the sort key), then no secondary index is needed; but if you need to query across partitions, that’s when GSIs show up.
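As a small illustration of the zero-padding trick for String sort keys (purely a sketch; the key format is an arbitrary choice):

```python
# Zero-pad numeric components of a String sort key so lexicographic order matches numeric order.
def padded_score_sk(score: int, user_id: str, width: int = 10) -> str:
    return f"SCORE#{score:0{width}d}#USER#{user_id}"

print(padded_score_sk(99, "u1"))   # SCORE#0000000099#USER#u1
print(padded_score_sk(100, "u2"))  # SCORE#0000000100#USER#u2 -> sorts after 99, as intended
# If the sort key attribute is a Number type instead, DynamoDB already sorts numerically and no padding is needed.
```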
Recap & Best Practices for Sort Keys: Use sort keys to reflect how you’ll query within a partition – common patterns are chronological (timestamps), alphabetical (strings), or composite (prefixes for types). Always ensure the combination of PK+SK is unique for distinct items (if not, you might have data clobbering). If you encode multiple parts in SK, document the format clearly for your team. Additionally, leverage DynamoDB’s condition expressions on sort keys: e.g., you can do Query PK=X AND SK > Y
(for greater than, less than on range) to do open-ended queries like “everything after this ID” or “newer than date”. This is great for paginating or incremental processing.
As an example of advanced sort key usage, imagine a logging system: PK = HostID, SK = <LogLevel>#<Timestamp>. This allows queries like “get all ERROR logs for a host within the last hour” (begins_with(SK, "ERROR#") plus a time filter expression) and naturally sorts logs by time. Or a game leaderboard: PK = GameID, SK (Number) = Score, and you write scores as negative values so the lowest number = highest score when sorted ascending, or simply sort descending. Then a query on PK=GameID with ScanIndexForward=false gives the top scores. You might include UserID in the SK to make it unique (Score and User combined).
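The leaderboard query looks like this in boto3, assuming a hypothetical Leaderboard table with GameID as a String partition key and Score as a Number sort key:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Leaderboard")  # hypothetical: PK "GameID" (S), SK "Score" (N)

# Top 10 scores for a game: query the partition in descending sort-key order.
top10 = table.query(
    KeyConditionExpression=Key("GameID").eq("game-7"),
    ScanIndexForward=False,  # descending by sort key
    Limit=10,
)["Items"]
```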
Sort keys, when used cleverly, can eliminate the need for scanning or filtering inside the partition – the more you can do with a direct key condition, the cheaper and faster your queries will be. In the next section, we will expand on query practices and how to avoid falling back to inefficient scans by fully utilizing keys and indexes.
5. Query Optimization and Throughput Efficiency
Optimizing data access in DynamoDB means making the most of key lookups and avoiding expensive operations. The primary operations to retrieve data are GetItem (by primary key), Query (by primary key with optional sort key conditions, or on secondary index keys), and Scan (which reads all items in a table or index). To achieve consistent high performance, follow these query optimization principles:
- Prefer Query over Scan: A Query operation narrows the data access by partition key, and optionally by sort key, so it only touches the items that match the key criteria. In contrast, a Scan operation reads through every item in the table (or index) looking for those that match filter conditions. Scans are far less efficient and should be avoided on large tables whenever possible. The DynamoDB documentation explicitly says: “For faster response times, design your tables and indexes so that your applications can use Query instead of Scan.” Scanning a big table can consume a huge amount of read capacity and time – for example, a Scan can return up to 1 MB of data per call, which might be hundreds of items, and if you have to filter out most of them, that’s wasted throughput. If you find yourself needing to scan regularly with filters, that’s a red flag in your data model – likely you need an index that supports that filter as a key, or you need to incorporate that attribute into your primary key. (A sketch contrasting the two approaches appears after this list.)
- Filter Expressions vs. Key Conditions: DynamoDB allows a FilterExpression in Query and Scan to further refine results. But crucially, filtering happens after items are read. This means if you Query a partition and 100 items are retrieved, and then your filter keeps only 5, you still consumed the read capacity for 100 items. Filter expressions are fine if the data you filter out is relatively small or if you use them sparingly, but they don’t save throughput on the items read from disk. The most efficient queries are those that rely purely on key conditions (partition key equality and sort key equality/range) to narrow down to exactly the items you want. Use filter expressions only when adding that attribute to the key schema isn’t feasible but you need to do some minor trimming of results. As a rule: if a filter would drop, say, more than 90% of the items read, that’s very inefficient – better to design an index that avoids reading them in the first place.
- Projected Attributes (Reducing Item Size): When you only need a few attributes from items, use a projection expression to retrieve only those attributes, e.g., request just summaryAttribute (via ProjectionExpression in the SDK) rather than a large JSON blob attribute. This reduces the amount of data transferred to your application. Note, however, that read capacity is calculated from the size of the items DynamoDB reads, not the size of the data returned, so a projection expression by itself doesn’t lower RCU consumption. To actually pay less per read, keep items small or query a secondary index that projects only the attributes you need – a slim index item might count as 1KB instead of 20KB towards your RCUs. (DynamoDB rounds each read up to 4KB chunks, so reading 1KB vs 3KB both cost 1 RCU, but reading 5KB costs 2 RCUs, etc. Smaller items keep each read within a lower bracket.)
- Batch Operations: If you need to retrieve multiple specific items by primary key (not range queries, but discrete items), use BatchGetItem, which can request up to 100 items in a single call. This can be more efficient than 100 separate GetItem calls because it reduces network overhead. DynamoDB will retrieve them in parallel internally (note that BatchGet uses eventually consistent reads by default and doesn’t guarantee all items are returned if some keys were not found – missing ones simply don’t appear). Similarly, BatchWriteItem can write or delete multiple items in one call (with atomicity per item but not across the batch). Use these batch operations to amortize network latency when doing bulk load or bulk fetch operations.
- Beware of Sudden Spikes (Scan Bursts): If you do have to run a Scan (say for an occasional full table analysis or export), be mindful of the impact. A scan of a large table can consume a lot of RCUs quickly and possibly throttle regular traffic. The docs note that a single 1 MB Scan page works out to 128 RCUs with eventually consistent reads (1 MB / 4 KB = 256 read units, halved for eventual consistency). That’s like 128 reads happening in one go, potentially on one partition if the data is contiguous. This can starve other queries to that partition and cause ProvisionedThroughputExceeded errors. To mitigate, DynamoDB supports parallel (segmented) scans, where you split the scan into N segments that can be processed in parallel (each segment is essentially a separate worker scanning a distinct subset of the data). This can expedite scanning large tables using multiple threads or Lambda functions concurrently. However, parallel scans increase throughput consumption linearly with the number of segments, so you must have the capacity to sustain it. A safer approach for constant loads is to throttle your scan – e.g., only scan one segment at a time or sleep between scan requests – or better yet, use DynamoDB Streams or AWS Data Pipeline / Glue to replicate data to an analytics store instead of scanning frequently. In summary, scanning in production, if needed, should be done carefully off-peak or in a controlled manner.
- Use Secondary Indexes Appropriately: If you find yourself needing to query by some attribute that isn’t part of the primary key, that’s exactly what secondary indexes are for. Instead of scanning and filtering on that attribute, create a GSI or LSI so you can Query on it as a key. We’ll cover indexes next in depth, but from the query perspective: always try to turn what would have been a scan+filter into a direct index query. For example, “find all orders with Status = DELAYED” – doing this via scanning the whole Orders table is terrible if only 1% are delayed. Instead, if this is a common query, consider a GSI where partition key = Status (and maybe sort key = OrderDate or something). Then query the index for PK=“DELAYED” to get only those orders. Even if “Status” has low cardinality (like a few states), as a GSI partition key it’s acceptable if those queries are needed (the trade-off is that such a GSI might suffer hot partitions if one status is extremely common; a workaround would be to combine status with another attribute to increase cardinality).
- Throughput and Cost Tuning: DynamoDB pricing and limits revolve around throughput and data size. Some tips:
- If using Provisioned Capacity mode, enable Auto Scaling so that if your traffic increases, DynamoDB can adjust RCUs/WCUs to meet demand (within set limits). This helps avoid throttle errors on unexpected traffic. Monitor the CloudWatch metrics “ConsumedReadCapacityUnits” vs “ProvisionedReadCapacityUnits” to see if you’re under or over-provisioned.
- If you have a very spiky or unpredictable workload, On-Demand capacity mode might be better. It costs more per request than well-utilized provisioned capacity, but you don’t have to manage capacity, and it can instantly absorb up to roughly double your previous peak traffic without prior scaling. Some users find on-demand saves money if their traffic is very low most of the time with occasional bursts, because you pay only for actual reads/writes, with no continuous provisioning. On-demand still uses partition throughput limits internally, but it has more flexibility to add partitions on the fly. It’s also backed by adaptive capacity, which in on-demand mode can be quite responsive in isolating hot keys.
- If using Provisioned, be mindful of the burst capacity – DynamoDB can consume unused capacity credits (5 minutes worth of unused throughput) to serve temporary spikes. This is good for smoothing short bursts, but not a plan for sustained high load.
- Strongly vs Eventually Consistent Reads: Strong consistency gives you fresh data but costs more in read capacity (1 RCU can only read 4KB once per second, whereas eventually consistent can do it twice per second because it might return stale data from replicas). If you can tolerate eventual consistency (which many use-cases can, especially non-critical or non-latest data queries), use the default eventually consistent reads to effectively double your read throughput. If strong consistency is needed (like reading your own write in a single-region scenario), then plan capacity accordingly.
- Reduce Data Transfer: Large attributes (blobs, images) ideally should not be stored directly in DynamoDB if they are frequently accessed or updated – consider storing them in S3 and just keeping a reference in Dynamo. If they are stored in DynamoDB, try not to fetch them every time if not needed (use projections to fetch only when necessary). Also, keep item sizes small when possible, since smaller items = more items per partition capacity unit and better cache utilization in Dynamo’s backend.
- Index-Only Query Patterns: If you design a Global Secondary Index with exactly the data you need (projected attributes), you can sometimes satisfy queries from the index alone without touching the main table. This is good because it offloads read activity to the index, which can have its own capacity provisioned. For example, if you have a large table but create a sparse GSI that tracks only “active items” with just a few attributes, queries on active items can hit this slim index very cheaply.
- Paginating and Deep Queries: DynamoDB has a limit of 1 MB of data per Query/Scan response. If you expect more data, you must paginate using LastEvaluatedKey. Plan your access so that you don’t often have to retrieve enormous datasets in one go. If you need to process a whole table (for analytics or maintenance), consider enabling DynamoDB Streams or exporting to S3 via AWS Data Pipeline rather than scanning repeatedly. For deep pagination (say, page 100 of query results), DynamoDB isn’t great if you have to skip many items – you’d have to read through them or use the LastEvaluatedKey from the prior page. A trick for time-ordered data: you can use sort key ranges to jump to a particular point, or maintain a “pointer” in your own application state.
- Avoiding Small Anti-Patterns:
- Avoid using Scan with a filter on the partition key itself – some think scanning with FilterExpression PK = X is a way to query without specifying the key in the API. But that literally scans every item and returns those with PK X, which is much worse than just using the Query API with PK X. Always use Query when looking for specific keys.
- Don’t abuse FilterExpression for things that could be part of the key. E.g., if you find code like Query PK=Constant with a filter on Type=ABC, that means you’re effectively scanning one huge partition of all items and filtering by type. Instead, incorporate “Type” into the sort key or use a GSI where that type is part of the key.
- If you need to query by a part of the sort key that isn’t easily expressed with begins_with (maybe complex patterns), consider storing a duplicated attribute for that. For example, if the SK is a composite “Type#Value” but you often need to query by Value alone for one partition, you might keep a separate Value attribute and a local secondary index on it (if within the same partition) or a global index. Always weigh adding an index vs. doing multiple queries vs. data duplication – choose based on frequency and cost.
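To tie several of these points together, here is a short boto3 sketch of an efficient key-based query with a projection, next to the scan-and-filter anti-pattern it replaces (the table and attribute names are illustrative assumptions):

```python
import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("Orders")  # hypothetical table: PK "CustomerID", SK "OrderDate"

# Efficient: key conditions narrow the read to one partition and a sort-key range;
# the projection returns only the attributes we need (aliases avoid reserved-word clashes).
resp = table.query(
    KeyConditionExpression=Key("CustomerID").eq("C-1001") & Key("OrderDate").begins_with("2025-05"),
    ProjectionExpression="#d, #s, #t",
    ExpressionAttributeNames={"#d": "OrderDate", "#s": "Status", "#t": "Total"},
)
may_orders = resp["Items"]

# Anti-pattern: a Scan with a filter still reads (and bills for) every item in the table.
# bad = table.scan(FilterExpression=Attr("CustomerID").eq("C-1001"))
```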
By adhering to these principles, you ensure your DynamoDB usage remains efficient. Most performance issues can be traced to doing things DynamoDB isn’t optimized for – like scanning large volumes or filtering in the client. With the right key design and query patterns, DynamoDB queries run in predictable time (typically milliseconds) regardless of table size. This predictability at scale is one of DynamoDB’s biggest advantages, and it’s achieved by keeping almost all operations O(1) via hashed keys and avoiding “full table” operations in normal workflows.
Having covered how to optimize queries, let’s move to secondary indexes, which are crucial for creating additional access patterns without duplicating entire tables.
6. Secondary Indexes: GSIs, LSIs, and Index Design
DynamoDB offers two types of secondary indexes to accommodate queries beyond the primary key: Global Secondary Indexes (GSIs) and Local Secondary Indexes (LSIs). These indexes are essentially alternative key definitions that DynamoDB maintains on your data, allowing efficient queries using different keys. Understanding when and how to use GSIs/LSIs – and their cost/performance implications – is key to flexible DynamoDB data models.
Global Secondary Index (GSI): A GSI is an index with a partition key and optional sort key that can be different from the base table’s primary key. “Global” means the index spans all partitions of the base table (i.e., index items can map to any table partition regardless of the base table’s partitioning). You can think of a GSI as a separate table under the hood: it has its own partitions, storage, and throughput settings (if using provisioned mode). When you write to the base table, DynamoDB will automatically and asynchronously propagate the item to any GSIs where the item has the indexed attributes. Each GSI must have a partition key (different or same as base PK) and can have a sort key. The primary key of the GSI item is this (GSI-PK, GSI-SK). You can then Query the GSI just like a table (using the GSI keys). For example, if your main table key is UserID (PK) and OrderID (SK), but you also want to query orders by ProductID, you could create a GSI with partition key = ProductID, sort key = OrderID (or OrderDate). Now DynamoDB will maintain that index – whenever an order item is written, it will add an entry under the product’s key in the index. Querying the GSI for ProductID = X will give all orders for product X without scanning the main table.
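Querying such an index looks almost identical to querying the table; you just name the index. A sketch, assuming the GSI described above is called ProductIndex (an arbitrary name for this example):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Orders")  # hypothetical table with a GSI named "ProductIndex"

# Query the GSI (partition key ProductID, sort key OrderDate) instead of scanning the base table.
resp = table.query(
    IndexName="ProductIndex",
    KeyConditionExpression=Key("ProductID").eq("P-555") & Key("OrderDate").gte("2025-01-01"),
)
orders_for_product = resp["Items"]  # GSI reads are always eventually consistent
```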
Important properties of GSIs:
- Different Key Schema: The index’s PK and SK can be any top-level attributes from the table (String, Number, or Binary types). They do not need to include the base table’s PK at all (they can, but it’s not required). Indeed, GSIs allow completely new query dimensions. For instance, the base table might use OrderID as PK, but a GSI might use CustomerID as PK, enabling direct queries by customer.
- Eventually Consistent by Nature: GSIs are asynchronously updated. That means there’s a slight propagation delay from the moment a base table write occurs to when it appears in the GSI. All GSI queries are eventually consistent (you cannot request strong consistency on a GSI). In practice the delay is usually small (under a second), but it’s possible a read after write on a GSI might not see the item if done immediately. Plan accordingly: if you need read-after-write consistency on an alternate key, consider instead using a transaction or reading the base table directly.
- Provisioned Throughput: GSIs have separate throughput capacity (unless the table is on-demand). You can provision read/write capacity for each GSI independently of the base table. This is powerful: if you have a heavy query pattern on an attribute, you can isolate it to an index and scale that index’s RCUs/WCUs appropriately, without affecting the base table’s capacity. Conversely, writes to the base table now cost extra: if an item goes to N GSIs, it will consume additional WCUs for each index write. Specifically, when you put or update an item, DynamoDB will write to the GSI if the item has the index key attributes. This additional write consumption can double or triple your cost if you’re not careful. For example, adding one GSI means every table write might become two writes (one to the table, one to the GSI). With provisioned capacity, you must allocate WCUs for the GSI as well. With on-demand, you’re simply billed for the additional write throughput used. So only create GSIs that you truly need for critical queries.
-
Sparse Indexes: One beneficial aspect is that an item only appears in a GSI if it has all the attributes for that index’s keys. This means you can have a GSI that is intentionally “sparse”. For instance, suppose you have an
Orders
table and only some orders have an attributeDelayedReason
(present if order is delayed). You could create a GSI with partition key =DelayedReason
. Orders without that attribute simply won’t exist in the index. So the GSI ends up being an index of “delayed orders by reason”. This is very efficient to query all delayed orders of a certain reason (the index contains only relevant items). This sparseness is useful for filtering data: it’s often better to make an index that only includes the subset of items of interest (by using a field that only those items have) than to have a dense index and filter at query time. Another example: In a single-table design with multiple entity types, you could have a GSI where the PK isEntityType
and SK is some attribute – only items of that type will have that attribute and show up. Or simply add a boolean attribute like “IsActive=true” to certain items and index that to quickly fetch only active ones – inactive ones won’t have the attribute and won’t bloat the index. -
Duplicate Storage: A GSI will store a copy of any attributes you project into it. At minimum, it always stores the primary keys of the base table (so it can fetch other attributes if needed). But any projected attributes (either all, or a set you specify) count towards storage and item sizes in the index. If your table item is, say, 1KB but you only project 100 bytes of it to the GSI, then the GSI item is 100 bytes (plus keys). So you can save storage by not projecting unneeded attributes. However, if you frequently need some data from the index that isn’t projected, DynamoDB will fetch it from the base table at read time (called an Index Fetch), which adds latency and RCU cost. Thus, for performance-critical index queries, project all attributes that you will need in the query result so that it’s an index-only query. For less critical or to save space, project only keys and do an eventual fetch of others (but note: fetching uses eventually consistent reads on the main table, doubling that query’s cost perhaps).
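Here is a rough sketch of the sparse-index idea, assuming the Orders table above and a hypothetical GSI named DelayedOrdersIndex whose partition key is DelayedReason; all names are illustrative:

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
orders = dynamodb.Table("Orders")  # hypothetical table: PK = UserID, SK = OrderID

# This order carries DelayedReason, so it appears in the sparse GSI.
orders.put_item(Item={
    "UserID": "USER#100", "OrderID": "ORDER#2001",
    "Status": "DELAYED", "DelayedReason": "WEATHER",
})

# This order omits DelayedReason entirely, so the sparse GSI never sees it.
orders.put_item(Item={
    "UserID": "USER#100", "OrderID": "ORDER#2002",
    "Status": "SHIPPED",
})

# Querying the sparse index returns only delayed orders for the given reason.
delayed = orders.query(
    IndexName="DelayedOrdersIndex",  # hypothetical GSI: PK = DelayedReason
    KeyConditionExpression=Key("DelayedReason").eq("WEATHER"),
)["Items"]
```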
Local Secondary Index (LSI): An LSI is an index that shares the same partition key as the base table, but has a different sort key. "Local" means it's local to the partition – it doesn't introduce a new partitioning, it just gives an alternate sort key for the same partition. LSIs must be created at table creation time (you can't add them later) and you can have up to 5 per table. Because LSIs use the same PK, queries on an LSI must specify the partition key, just like the base table. They are useful when you have different ways to query items within the same partition. For example, suppose your table's primary key is UserID (PK) and Timestamp (SK), but you also want to query a user's data by Category. You could define an LSI with PK = UserID, SK = Category. Now for each user partition, DynamoDB will maintain an index sorted by Category, which lets you quickly query all items of a user in a certain category (with a Query on the LSI specifying UserID = X and Category = Y as the key condition). Under the hood, an LSI is an alternate sorted copy of each item's keys (plus any projected attributes) stored within the same partition as the base item. This leads to a special limit: all items for a partition key, including their copies in LSIs, count towards a 10 GB limit per partition key. If you exceed 10GB of items for one partition key, you can't add more, which is why heavy usage of LSIs on a very large partition is not recommended.
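As a minimal sketch of the example above, creating a table with such an LSI might look like the following; the table name UserEvents, index name CategoryIndex, KEYS_ONLY projection, and on-demand billing are illustrative choices:

```python
import boto3

client = boto3.client("dynamodb")

client.create_table(
    TableName="UserEvents",
    AttributeDefinitions=[
        {"AttributeName": "UserID", "AttributeType": "S"},
        {"AttributeName": "Timestamp", "AttributeType": "S"},
        {"AttributeName": "Category", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "UserID", "KeyType": "HASH"},
        {"AttributeName": "Timestamp", "KeyType": "RANGE"},
    ],
    LocalSecondaryIndexes=[
        {
            "IndexName": "CategoryIndex",
            "KeySchema": [
                {"AttributeName": "UserID", "KeyType": "HASH"},     # must reuse the table's partition key
                {"AttributeName": "Category", "KeyType": "RANGE"},  # alternate sort key
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }
    ],
    BillingMode="PAY_PER_REQUEST",
)
```

Because LSIs cannot be added later, this definition has to be part of the original CreateTable call.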
Properties of LSIs:
- Strongly Consistent Reads Possible: Because LSIs are local and essentially part of the base partition, you can request strong consistency when reading from an LSI (unlike GSIs). Also, updates to LSIs occur transactionally with the base item (they’re part of the same partition write), so there’s no replication lag issue – reads from LSI will always reflect the current state of that partition’s items. This is one advantage if you need an alternate sort key query with absolute consistency (though many find eventual consistency acceptable in most cases).
- Same Throughput as Base Table: LSIs share the provisioned throughput of the table; they do not have separate capacity. A read or write on an LSI-indexed attribute consumes capacity from the table’s allocation. Writes to items will update all LSIs synchronously, counting against the table’s WCU. So LSIs don’t let you offload throughput like GSIs do; they instead ensure strong consistency and atomicity.
- When to use LSI: LSIs are good if you frequently need to query by a different sort key within the same partition. A common scenario: a table of orders with PK=CustomerId, SK=OrderDate, where you also want to answer "find a customer's largest orders by amount" – you could define an LSI with PK=CustomerId, SK=OrderTotal. Now you can query a customer's orders sorted by amount (or fetch just the top N without scanning the whole partition). Because you must specify the same partition key, LSIs are not for querying across partitions (that's the GSI's role). They're more like giving you multiple sorted views of a partition's data.
- Trade-offs: LSIs add storage overhead (they duplicate the sort key and any projected attributes for each item in the index) and add some write overhead (slightly larger item to store). Also, they restrict some scalability: since all of a partition’s items (and their LSI entries) share partition throughput, if you frequently query the LSI and the table at the same time for the same partition, you might contend on that partition’s resources. If your partition keys are well-distributed, that’s not a problem. If one partition key is huge (lots of items) and you use LSIs on it, scans or heavy queries on that partition via LSI could throttle things (plus the 10GB limit).
- Alternative to LSI: If you find you want an LSI but your partition might be too large, one approach is to incorporate that attribute into the primary key itself or use a GSI. For example, instead of LSI on Category, you could make Category part of the primary sort key (like SK = Category#Timestamp). Then to get all items by category, you still have to scan within the user’s items though unless you also query by category prefix. Or use a GSI with PK = UserID#Category (combined) and SK = Timestamp – effectively partition the user’s items by category at the GSI level. That would circumvent the 10GB limit by splitting into multiple partitions on the GSI.
Index Cost and Performance Considerations:
- Writes: Every index (GSI or LSI) adds overhead to writes. On a write (Put/Update/Delete), DynamoDB will do additional write operations for each index. For GSIs, these are additional capacity units consumed. E.g., writing a 1KB item to a table with 2 GSIs that each index this item will cost 1 WCU for the table + 2 WCUs (one for each GSI) = 3 WCUs total. Additionally, if an update changes the value of an indexed attribute, or an item gains/loses an indexed attribute, the index entry must be deleted and re-inserted (DynamoDB removes the old index entry and adds a new one). This means write amplification. High-write scenarios should minimize the number of GSIs, especially on frequently updated attributes. LSIs, being local, don't consume separate WCUs but still write an additional copy to storage in the background.
- Reads: Querying an index consumes RCUs from the index's capacity (for a GSI) or the table's capacity (for an LSI). If you project only a subset of attributes, queries can be cheaper since less data is read. However, if your use case requires strong consistency or transactional behavior, remember that GSIs can't serve strongly consistent reads and aren't read targets in transactions (transactions operate on base table items; index entries are updated as a consequence of those writes). If you absolutely need an atomic, consistent read of two different key access patterns, you might have to store that data redundantly in one item or use careful transaction logic instead of relying solely on indexes.
- Index Key Design (Pitfalls): When designing GSI keys, apply the same thinking as for table keys. Don't create a GSI with a very low-cardinality partition key (e.g., a GSI where the PK is "Type" with only, say, 3 possible values). That index will have at most 3 partition key values and will likely be unbalanced/hot if accessed frequently. For instance, a GSI on "IsActive" true/false – if 90% of items are active, almost all index entries land under the "true" partition, producing a hot index partition. If you need that, you might shard it (e.g., GSI PK = "ACTIVE#region" or similar to break it up). Similarly, for the GSI sort key, consider what query patterns you need (e.g., do you need range queries on the GSI?). If not, sometimes just a GSI partition key is enough (you can create a GSI with only a partition key, no sort key). For example, to find a user by email address, you could have GSI PK = Email (no sort key, since the email is presumably unique). A Query on that GSI with an Email value then returns at most one item.
- Index Projections: You have three projection types: KEYS_ONLY, INCLUDE (only specific attributes), and ALL (all attributes). Choosing KEYS_ONLY makes the index lightweight (it stores just the table's primary keys) – useful if you only want to check existence or do keys-only operations. But usually you'll want to project at least the attributes the query needs: for GSIs, anything not projected simply isn't available from the index, and for LSIs, DynamoDB fetches non-projected attributes from the base table on demand at extra read cost. Best practice is to project the attributes that are frequently needed by that index's queries and nothing more. That strikes a balance between duplicating the entire item and needing frequent extra reads.
- GSI vs LSI – when to use which: LSIs are somewhat niche due to their constraints. Use an LSI if:
- You need an alternate sort key for querying items that will always share the same partition key you already have.
- You need strong consistency on that alternate query.
- You can design the table from the start with that LSI (since can’t add later).
- Your partition keys won’t accumulate over 10GB of data (or you accept that limit).
Example: You have an IoT device table PK=DeviceID, SK=Timestamp for readings. You also want to query by reading type (say Temperature vs Humidity readings) for a given device. An LSI on reading Type could be appropriate (PK=DeviceID, SK=Type). Then you can Query for DeviceID=123, SK begins_with “Temperature” – DynamoDB can rapidly return all temperature reading items (because internally it keeps a hidden index sorted by Type). However, note you might get them unsorted by time now (because LSI sort key is type, not time). If you needed both, you might even embed time in the LSI sort key or use multiple LSIs (one for Type, one for something else). But remember the 5 LSI limit.
Use a GSI if:
- You need to query across different partition keys or need a totally different partitioning.
- You want to add an index after table creation (only GSIs can be added later).
- The query doesn’t need to be strongly consistent (most typical).
- You want to scale the throughput of that access pattern independently (e.g., heavy read queries on index with eventually consistent fine).
- You potentially want to index only a subset of items (sparse index technique) by making the index key only present on those items.
Example: In a multi-tenant app with table PK=TenantID#UserID, SK=UserData, you might want to quickly get “user by email” regardless of tenant. A GSI with PK=Email, SK=TenantID could allow a query by Email to find the associated tenant and user. Or if you have Orders table and want to query by Product, GSI PK=ProductID makes sense (like the earlier example). LSIs wouldn’t help for that cross-entity query.
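One of the points above is that GSIs, unlike LSIs, can be added to an existing table. As a hedged sketch, the ProductID index from the earlier Orders example could be added later via UpdateTable; the index name, projection, and capacity numbers are illustrative:

```python
import boto3

client = boto3.client("dynamodb")

client.update_table(
    TableName="Orders",
    AttributeDefinitions=[
        {"AttributeName": "ProductID", "AttributeType": "S"},
        {"AttributeName": "OrderDate", "AttributeType": "S"},
    ],
    GlobalSecondaryIndexUpdates=[
        {
            "Create": {
                "IndexName": "ProductIndex",
                "KeySchema": [
                    {"AttributeName": "ProductID", "KeyType": "HASH"},
                    {"AttributeName": "OrderDate", "KeyType": "RANGE"},
                ],
                "Projection": {
                    "ProjectionType": "INCLUDE",
                    "NonKeyAttributes": ["OrderTotal", "Status"],
                },
                # Only needed for provisioned-capacity tables; omit for on-demand.
                "ProvisionedThroughput": {"ReadCapacityUnits": 50, "WriteCapacityUnits": 10},
            }
        }
    ],
)
```

DynamoDB backfills the new index in the background; it is queryable once its status becomes ACTIVE.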
Overloading and Combining Indexes:
A powerful technique mentioned earlier is GSI Overloading, where a single GSI is used to answer multiple query types by using a generic key that can represent different things. For instance, in the AWS example, they had an index where some items used EmployeeName as the index sort key and others used DepartmentID as that sort key, enabling the same GSI to support queries by employee name and by department (distinguished by an attribute in the index key). They did this by making sure the index partition key had a fixed set of values (like "HR" vs "OE", indicating HR data vs OrderEntry data) and the index sort key was used differently per type. Essentially, one index can be "overloaded" to serve different query needs if you carefully design the indexed attributes and their values. This is advanced and requires planning so that the index doesn't mix data in a way that breaks queries. The benefit is you stay within the default limit of 20 GSIs per table by consolidating where possible. If one GSI can do the job of two by indexing a more general attribute, use it. But ensure that query patterns don't conflict (e.g., you wouldn't want sort key values of different shapes mixed in a way that queries overlap incorrectly).
Another note: Each GSI is essentially a table – so you could design one GSI that covers a subset of functionality and another for different queries. Don’t create a GSI per query if you can reuse. But also don’t try to jam everything into one GSI if it complicates more than it helps – it’s a balance.
Finally, consider index maintenance costs: If your table is extremely write-heavy and the additional write latency or cost of GSIs is a concern, lean towards designing primary keys that naturally cover most queries (so you need fewer indexes). Or consider splitting certain data into another table if that yields simpler indexing. But generally, DynamoDB can handle quite a number of GSIs fine – just monitor the replication lag (there’s a CloudWatch metric for it) if you do extreme volumes.
In summary, GSIs and LSIs provide the necessary flexibility to query by alternate keys, at the expense of additional writes and storage. Use GSIs for global query needs (different partition key) and LSIs for local alternative sort orders. Use sparse indexes to your advantage to index only what’s needed. Keep an eye on costs and cardinalities to avoid hot spots in indexes as well (the same principles of key design apply within indexes). Properly utilized, indexes will make your application’s access patterns much richer without resorting to scans or complex client-side logic.
Next, let’s focus on performance and scaling tactics – how to tune throughput, handle large scale growth, and keep costs in check as your DynamoDB usage increases.
7. Performance Tuning and Scaling Tactics
One of DynamoDB’s selling points is scalability without performance degradation, but achieving that requires mindful provisioning and key design to avoid bottlenecks. In this section, we discuss how to tune DynamoDB for high performance and scale, including capacity provisioning, partition management, and cost-aware design choices.
Throughput Provisioning and Auto-Scaling
DynamoDB offers two capacity modes:
- On-Demand Capacity: DynamoDB automatically accommodates your traffic up to certain limits and charges per request. There’s no need to specify RCUs/WCUs, making it essentially maintenance-free. On-demand mode is great for unpredictable or bursty workloads. It leverages Dynamo’s internal adaptive capacity heavily to absorb bursts. Keep in mind if your traffic doubles, your cost roughly doubles – so for consistently high traffic, on-demand might be pricier than provisioned capacity (beyond a threshold, provisioned with autoscaling might be cheaper).
- Provisioned Capacity: You specify an allocation of read and write capacity units. You can adjust these as needed (manually or via auto-scaling rules). This mode can be more cost-effective for steady workloads or if you want to ensure reserved throughput. However, you must monitor usage to avoid throttling if you underestimate, or overspend if you over-provision.
Auto-Scaling: Use auto-scaling on provisioned tables to automatically increase/decrease capacity based on utilization. Typically, you set a target usage percentage (say 70% utilization) and min/max limits. Auto-scaling reacts to load patterns (with some delay) to keep you within target. It’s not instantaneous for sudden spikes, which is why even with auto-scaling some designs choose on-demand for spiky scenarios (auto-scaling can increase capacity within a few minutes, but if you have a sudden 10x spike for a short period, on-demand would handle it without any prior setup whereas provisioned might throttle until auto-scaling catches up).
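A hedged sketch of enabling target-tracking auto-scaling on a provisioned table's read capacity using the Application Auto Scaling API; the table name, bounds, and 70% target are illustrative values:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the table's RCUs as a scalable target.
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/Orders",                      # hypothetical table
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=5,
    MaxCapacity=500,
)

# Attach a target-tracking policy aiming for ~70% utilization.
autoscaling.put_scaling_policy(
    PolicyName="OrdersReadScaling",
    ServiceNamespace="dynamodb",
    ResourceId="table/Orders",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)
```

The same pattern applies to WriteCapacityUnits and to each GSI (using the index-level scalable dimensions).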
Reserved Capacity: If you know you will use a certain baseline throughput for a long time, AWS offers reserved capacity pricing (like commit to X RCUs/WCUs for 1 or 3 years) at a discount. This is a pure cost optimization and doesn’t affect design except budgeting.
Partition Behavior and Adaptive Capacity
As your table grows in size or traffic, DynamoDB will automatically split partitions to maintain performance. Initially, when you create a table with a certain provisioned capacity, DynamoDB will allocate enough partitions to meet that capacity (each partition 1000 WCU/3000 RCU). If you increase provisioned throughput beyond what current partitions can handle, DynamoDB adds more partitions. Also, if any partition’s data size exceeds ~10 GB, it splits on size. This splitting is transparent; however, splits do not redistribute existing items’ keys – they divide the key space. Usually, if one partition is too full or hot, splitting helps because some of the keys (like half the hash range) move to a new partition, balancing load.
Adaptive Capacity: DynamoDB’s adaptive capacity is a behind-the-scenes mechanism that shifts unused throughput to partitions that need it and can even isolate “hot” keys to dedicated partitions quickly. For example, if one particular partition key is hot, Dynamo might decide to give that partition more share of capacity (borrowing from idle partitions) or even split that partition so that key ends up alone. This process, sometimes termed “split for heat”, can occur in minutes for sustained overloads. A demonstration from an AWS blog showed how a single hot partition handling 9000 reads/sec (with eventually consistent reads leveraging all 3 internal replicas) was split after 10 minutes, doubling the throughput capacity (since now the hot key was on two partitions). Adaptive capacity ensures that short-term or partial key skews don’t immediately throttle your application – it finds unused capacity and applies it where needed on a best-effort basis.
However, as noted earlier, adaptive capacity doesn’t absolve you from good key design. It’s there to help with uneven but not pathologically bad distributions and sudden shifts. If you have a single key that uses 90% of traffic consistently, DynamoDB will likely split that to the max, but if after splitting a few times that key alone still requires more than a partition’s max, you will hit a wall. In short, design for gradual scaling across keys, and consider adaptive capacity as a safety net, not a primary strategy.
Handling Throttling and Hot Keys in Production
Even with planning, you may encounter throttled requests (ProvisionedThroughputExceededException errors). AWS re:Post and the docs have guidance on identifying and fixing hot keys. Use CloudWatch metrics at the table level, and enable CloudWatch Contributor Insights for DynamoDB to see which partition keys are accessed most. If a particular key is the culprit, you might implement write sharding after the fact: e.g., start writing new data for that key with random suffixes and adjust reads to aggregate across the shards (a sketch follows below). This can be tricky to retrofit but is doable. Another approach, if reads outnumber writes, is to introduce caching (like DynamoDB Accelerator, DAX) in front of reads to reduce load on a hot key. For writes, you might queue or batch them.
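A minimal write-sharding sketch, assuming a hypothetical Events table keyed by ShardedKey (PK) and Timestamp (SK); the shard count and names are illustrative:

```python
import random
import boto3
from boto3.dynamodb.conditions import Key

NUM_SHARDS = 10  # illustrative shard count
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Events")  # hypothetical table: PK = ShardedKey, SK = Timestamp

def write_event(hot_key: str, timestamp: str, payload: dict) -> None:
    """Spread writes for one hot logical key across N physical partition keys."""
    shard = random.randint(0, NUM_SHARDS - 1)
    table.put_item(Item={"ShardedKey": f"{hot_key}#{shard}", "Timestamp": timestamp, **payload})

def read_events(hot_key: str) -> list:
    """Reads must fan out across all shards and merge the results."""
    items = []
    for shard in range(NUM_SHARDS):
        resp = table.query(KeyConditionExpression=Key("ShardedKey").eq(f"{hot_key}#{shard}"))
        items.extend(resp["Items"])
    return items
```

The trade-off is visible in read_events: one logical read becomes NUM_SHARDS queries, so shard only the keys that genuinely need it.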
If throttling is due to reaching account limits (like sudden large ramp-ups on on-demand tables can hit account-level limits), you might need to request a limit increase or design a gradual warm-up.
Cost-Aware Key and Data Design
Performance and cost go hand in hand in DynamoDB. Some strategies to keep costs down while maintaining speed:
- Avoid Overfetching: As mentioned, don't read 1MB if you only need 1KB. Key design helps here (targeted queries), as does using projections to fetch only the necessary attributes. This reduces RCU usage (cost) directly.
- Use Smaller Items or Compressed Data: If you have large blobs of data that are rarely needed in full (e.g., a user profile with an avatar image or a large preferences JSON), consider splitting them into separate items or compressing them. For example, a user's main item might hold all the small fields, and a secondary item (same PK, different SK like "#LARGE") might hold the big JSON blob. The application reads the main item frequently (cheap) and only reads the blob item when absolutely needed (like editing the profile). This way, typical reads avoid incurring the cost of the large field. You can also compress JSON strings (in the application) before storing them to cut down size, which saves storage and RCUs at the expense of some compute for compression. (A compression sketch follows this list.)
- TTL (Time to Live) to remove stale data: If you have data that only matters for a time (e.g., session records, ephemeral events), set a TTL attribute so DynamoDB auto-deletes it after expiration. This frees storage and reduces the size of partitions, which keeps queries faster and cheaper long-term by not having to sift through expired items. TTL deletions are free (no WCU is charged for the deletion).
- Be mindful of index costs: Each GSI not only adds write cost, but also storage cost (GB-month) for storing index items. If an index isn't providing enough value (maybe a query pattern changed and you no longer use it often), consider removing it to save cost, or consolidate indexes as discussed. Also note that DynamoDB supports table classes (Standard vs Standard-Infrequent Access). If you have an index (or table) that is big but rarely read, you could switch it to the infrequent access class to save on storage cost (at the expense of higher per-request cost). This is similar to S3 storage classes. For example, an index that exists purely for occasional analytics might be cheaper as infrequent access.
- Partition Key and Attribute Sizes: Remember that the primary key attributes count towards item size and are stored in every index entry. If you have very long string keys (like a 500-byte customer ID string), that gets stored in every index entry as well. If possible, use shorter keys (maybe a hash or a numeric surrogate) to reduce size. Binary keys can also be slightly more compact than strings if you're encoding something.
- Batched Writes and Reads: Using BatchWriteItem and BatchGetItem can reduce the number of API calls (which also matters if you're going through API Gateway or anything with per-request costs). So from a cost and throughput perspective, batch up operations when feasible (without exceeding DynamoDB's per-batch limits, of course).
- Leverage Streams for Asynchronous Work: DynamoDB Streams can be enabled to capture changes and process them asynchronously (e.g., to update aggregations, push notifications, etc.). Using Streams doesn't directly lighten DynamoDB's own load, but it lets you keep your core tables lean and move heavy post-processing to stream consumers. For instance, instead of updating a counter attribute in dozens of items whenever an event happens (which would be a lot of writes), just insert one event item and use a Lambda on the stream to increment counters elsewhere or in a separate summary table. This way the main table sees one write per event, and the expensive aggregation is offloaded.
- Scaling Beyond Limits: In extremely large systems (say > 100k writes/sec), you should distribute the load as much as possible. That might mean partitioning your data model horizontally into multiple tables if one table's key space becomes a bottleneck (though DynamoDB can scale single tables impressively high if keys are well-distributed; many AWS customers handle tens of millions of RCU/WCU on single tables by design). Also consider multi-region replication (Global Tables) – not exactly a cost saving, but to serve a global audience with low latency you might use multiple regions, which implicates key design if you want to avoid conflicts (we'll touch on that).
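Following the compression point above, a rough sketch of compressing a rarely-read blob before storing it as a separate "#LARGE" item; the Users table, UserID/ItemType keys, and PrefsBlob attribute are assumptions for illustration:

```python
import gzip
import json
import boto3

dynamodb = boto3.resource("dynamodb")
users = dynamodb.Table("Users")  # hypothetical table: PK = UserID, SK = ItemType

def save_preferences(user_id: str, prefs: dict) -> None:
    # Compress the rarely-read blob and store it under a dedicated sort key.
    blob = gzip.compress(json.dumps(prefs).encode("utf-8"))
    users.put_item(Item={"UserID": user_id, "ItemType": "#LARGE", "PrefsBlob": blob})

def load_preferences(user_id: str) -> dict:
    resp = users.get_item(Key={"UserID": user_id, "ItemType": "#LARGE"})
    raw = resp["Item"]["PrefsBlob"]
    # boto3 may wrap binary attributes in its Binary type; unwrap before decompressing.
    data = raw.value if hasattr(raw, "value") else raw
    return json.loads(gzip.decompress(data).decode("utf-8"))
```

Everyday reads hit only the small main profile item; the compressed blob item is fetched on the rare paths that truly need it.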
Large Items and Dataset Scaling
As the dataset grows:
- Table Partitions Increase: DynamoDB will keep splitting partitions as needed. No action is needed as long as keys are well-distributed, except be aware of the 10GB item collection limit per partition key when LSIs are present. If you approach that, redesign to spread data across more partition keys (like adding a time component to the PK as mentioned). If you're storing extremely large attribute values (close to 400KB each), make sure you truly need to; shifting those payloads to S3 and keeping only a pointer in DynamoDB is often cheaper and simpler for very large blobs.
- Data Modeling for Size: Some use cases require storing huge historical data (like logs for years). Rather than one table with a giant retention, consider archiving old data (to S3, to a backup, or to a separate "cold" table using the infrequent access class). For example, a time-series workload could roll over to a new table each year, move older data to a different table, or delete it via TTL. This keeps the working set smaller.
- Global Tables for Distribution: If you use Global Tables (multi-region), each region's capacity is separate and you pay per region. Key design remains the same per region, but note that Global Tables use last-writer-wins conflict resolution. If two regions write to the same key at nearly the same time, the one with the later timestamp wins and the other write is lost. We mention this here because, to scale globally and avoid conflicts, some designs route writes for certain keys to a "home region" or include the region in the sort key when merging data. If you're not designing a multi-master write scenario, you likely don't need special key changes beyond the awareness that identical primary keys in different regions represent the same logical item.
Monitoring and Tuning
Regularly monitor:
- Consumed vs Provisioned Capacity: If consumed is consistently much lower, you may scale down to save cost. If approaching provisioned often, maybe scale up or check for hot keys causing uneven usage.
- Throttling Events: Even a few are worth investigating (they could indicate traffic spikes beyond your configured capacity or an emerging hot key).
- Latency: If some queries are slower, check if they are using scans or fetching from base table (if an index isn’t projected fully). Perhaps redesign those queries.
- Error rates: Distinguishing conditional check failures from throttles and other errors tells you whether your transaction and consistency assumptions are holding up in practice.
By combining these scaling tactics – adaptive key design, proper capacity mode, and cost optimizations – you can run DynamoDB at high scale efficiently. Many AWS customers run at millions of requests per second on DynamoDB by using these principles (e.g., ensuring partition keys are super-distributed and using GSIs where needed to avoid hot partitions, plus maybe caching for read-heavy scenarios).
Now that we’ve looked at general performance, let’s explore some advanced techniques that utilize primary keys in creative ways for additional capabilities like multi-tenancy, versioning, and handling unusual requirements.
8. Advanced Design Techniques and Patterns
In this section, we cover several advanced patterns that developers can employ in DynamoDB primary key design to solve particular problems or optimize further. These include overloading keys, using sparse attributes, tracking historical versions, and managing unusually large items or attribute sets.
Key Overloading and Multi-Purpose Keys
Key Overloading refers to using the same attribute or key for multiple purposes depending on context. We touched on this concept with GSI overloading; it can also be applied to primary keys in single-table designs. For instance, you might have a single table storing different entity types and use a composite partition key like EntityType#EntityID. Here the prefix indicates what type of entity it is (e.g., "USER#100", "ORDER#200", "PRODUCT#XYZ"). This is overloading the partition key with an embedded type indicator. It allows different entity types to live in one table without clashing keys, and your access patterns can target specific prefixes. Similarly, the sort key can be overloaded: for a user partition, you might use sort keys like "ORDER#" to list orders and "PROFILE#" for the profile. The sort key attribute's content and meaning vary by item type.
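A minimal single-table sketch of these prefixed keys, assuming a hypothetical AppTable with generic PK and SK attributes (the names and prefixes are conventions chosen by the application, not DynamoDB):

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
app = dynamodb.Table("AppTable")  # hypothetical single-table design: generic PK and SK

# One user's profile and two of their orders share a partition, distinguished by SK prefix.
app.put_item(Item={"PK": "USER#100", "SK": "PROFILE#100", "Name": "Alice"})
app.put_item(Item={"PK": "USER#100", "SK": "ORDER#2001", "Total": 42})
app.put_item(Item={"PK": "USER#100", "SK": "ORDER#2002", "Total": 17})

# Fetch only the orders for user 100 by filtering on the sort-key prefix.
orders = app.query(
    KeyConditionExpression=Key("PK").eq("USER#100") & Key("SK").begins_with("ORDER#")
)["Items"]
```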
Another form is using a single GSI to serve multiple query types by overloading keys. Recall the example where the same GSI had to support querying by employee name and by department – the solution was to use an attribute that takes on different forms for different items. Some items put employee name into that attribute, others put department ID, and a "Type" field indicated how to interpret it. A query on GSI for "Name = John Smith" would return only the item where that attribute is an employee name, and a different query "DeptID = 5" would return items of type department. By designing values carefully (maybe prefixing them like "NAME#John Smith" vs "DEPT#5"), you ensure queries don’t overlap incorrectly. This effectively multiplexes multiple indexes into one.
Overloading keys must be done with care to avoid ambiguity, but it’s a powerful pattern to minimize the number of distinct indexes/tables. It relies on the application to assign and interpret key values appropriately (essentially a convention that “if PK starts with X#, it’s of type X”). DynamoDB itself doesn’t care – it treats them as opaque strings – but your application logic and query patterns enforce meaning.
Use Case: In multi-tenant apps, you might encode the tenant in the key. E.g., PK = TenantID#UserID, with a GSI on UserEmail for login. If you want the GSI to be global (emails unique across tenants), you might use GSI PK = Email and include the tenant in another attribute to fetch if needed. Or, if emails can repeat across tenants (not globally unique), you might use GSI PK = Email#TenantId. That's key overloading as well (combining two pieces in one); it ensures uniqueness and allows queries like "find this email in this tenant".
Benefit: Overloading can eliminate the need to store separate attributes for type or category if you encode it in key, which can save space. It also can reduce index count. The drawback is it can be less intuitive and requires consistency in key construction.
Sparse Attributes & Indexes
We discussed sparse indexes (only items that have certain attributes show up). More generally, sparse attributes refer to attributes that are not present on all items, only on those where relevant. DynamoDB’s schema-less nature means you can have “sparse” data easily. This property can be exploited in design:
- Polymorphic Items: In single-table design, different item types can have totally different sets of attributes. E.g., an Order item might have OrderDate and TotalAmount, while a Product item might have Price and Supplier. They can coexist because each item need not carry attributes that don't apply to it. Your GSIs can target attributes present on only one type. For example, create a GSI "OrdersByDate" with PK = OrderDate – only order items have OrderDate, so only they appear in the GSI. Your table might also contain customers, products, etc., none of which pollute that index because they lack that field. This is an example of purposeful sparsity to keep indexes focused.
- Soft Deletes / Flags: Instead of immediately deleting an item, some designs add a flag attribute like Deleted=true. If you key a GSI on an attribute that only "live" items carry (for example an ActiveFlag that you remove when soft-deleting), then "deleting" an item by removing that attribute makes it vanish from the index, because it no longer satisfies the sparse condition. This can be used for "recycle bin" behavior or archiving without actually removing data from the main table. (A code sketch of this pattern appears below.)
- Indexing Subsets: If you want to index only certain items that meet some criteria, you can do it by controlling which items carry the index key attribute. For instance, an e-commerce app might index only orders over $100 in a "HighValueOrders" GSI by adding an attribute HighValue to those items and using it as the GSI partition key (like HighValue=YES). Alternatively, you can set an attribute like ValueCategory = HIGH on qualifying items and leave it absent on others. Either way, your GSI stays small and quick to query ("show me high value orders").
Note: You cannot directly use a condition like "attribute exists" as an index key in DynamoDB (the key simply exists or doesn't on each item), but that existence itself acts as membership. In practice, people sometimes add a dummy attribute (e.g., InIndex = 1) to items to force them into a particular GSI, and omit it on others to exclude them.
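A sketch of the soft-delete idea above, assuming the AppTable from earlier with a hypothetical sparse GSI keyed on an ActiveFlag attribute:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
app = dynamodb.Table("AppTable")  # hypothetical table with a sparse GSI keyed on ActiveFlag

# "Soft delete": removing the sparse key attribute drops the item from the GSI,
# while the item itself stays in the base table for later recovery or auditing.
app.update_item(
    Key={"PK": "DOC#42", "SK": "METADATA"},
    UpdateExpression="REMOVE ActiveFlag SET Deleted = :d",
    ExpressionAttributeValues={":d": True},
)
```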
Historical Data and Versioning
If you need to keep historical versions of items (audit trail, temporal data, or multi-version concurrency), DynamoDB primary keys can help:
- Using Sort Keys for Versions: A straightforward method is to include a version or timestamp in the sort key, so each version of an entity is a separate item in the same partition. For example, PK = DocumentID, SK = VersionNumber (or the timestamp of the update). The latest version could have a special sort key (like an alias "LATEST"), or you always query for the highest sort key. To get the current state, you query and take the last item; to get history, you query the whole partition. This is append-only versioning, and it's great for audit logs (each change is recorded). Write costs increase because each update produces a new item rather than updating in place. If you do this, consider TTL on older versions if indefinite retention isn't needed.
- Optimistic Locking with a Version Attribute: Another approach is to store a version number attribute and use conditional writes (a ConditionExpression with the expected version) to ensure no lost updates. This doesn't necessarily involve the primary key, but sometimes people incorporate the version into the sort key to guarantee uniqueness of each write (using a new sort key each time, which is basically the first approach). (A sketch of optimistic locking follows this list.)
- Soft Deletes and Shadow Items: If you want to keep "deleted" items for history, instead of deleting you might mark them as deleted and perhaps move them in the key space. Some patterns set the sort key on deletion to something like DELETED#timestamp (which moves it out of normal query ranges) or simply keep it as-is with a flag. Or have a separate archive table where the deleted item is copied.
- Event Sourcing: Some advanced designs treat DynamoDB not as a current-state store but as an event store (an append-only log of events). Partition keys could be the aggregate/entity ID, and each event (change) is an item with a sequence number as the sort key. Current state is then derived by reading the events. This is heavier on the read side, but for strict audit requirements and complex business logic it's a viable approach. If doing event sourcing, control partition size (e.g., snapshot and compact periodically if an event list grows huge).
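A minimal optimistic-locking sketch, assuming a hypothetical Documents table with a simple primary key DocumentID and a numeric Version attribute:

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
docs = dynamodb.Table("Documents")  # hypothetical table: simple PK = DocumentID

def update_title(doc_id: str, new_title: str, expected_version: int) -> bool:
    """Update only if nobody else bumped the version since we read the item."""
    try:
        docs.update_item(
            Key={"DocumentID": doc_id},
            UpdateExpression="SET Title = :t, Version = :new",
            ConditionExpression="Version = :expected",
            ExpressionAttributeValues={
                ":t": new_title,
                ":new": expected_version + 1,
                ":expected": expected_version,
            },
        )
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # someone else won the race; re-read and retry
        raise
```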
Large Item Strategies
DynamoDB item size max is 400 KB. When dealing with large data:
- Chunking: If you have large blobs or arrays, you can chunk them into multiple items. For example, an item with PK = FileID and a large attribute (like file content) could be split into items with sort keys part1, part2, ..., which you then reassemble in the application. This is somewhat manual, and if you reach this point, storing the blob in S3 and keeping only a link in DynamoDB is usually cleaner, because S3 has no such small size limits and is cheaper for large binary storage.
- Pagination in Data Modeling: For user-facing features like a timeline or feed, rather than having one item with an ever-growing list, break that list into items by time or index. E.g., for a user's notifications, instead of one item with 1,000 notifications in a list attribute, use partition key = User and sort key = NotificationId for each notification. That means more items, but smaller ones; DynamoDB performs best with many small items rather than a few huge ones.
- Attribute Explosion: If you have a lot of attributes (wide item), note that retrieving the item will read all by default unless projection used. If only a subset is usually needed, consider splitting into two items keyed by same PK: one core item with frequently accessed attrs, one extended item with rarely accessed attrs. Or use an LSI if it helps to project only some attributes for certain queries (not typical; LSIs mostly about alternate sort key).
- Compression: Already mentioned, but especially for text-heavy data (like logs, JSON), compressing (gzip) can often cut size by 70% or more. The trade-off is you can’t query inside compressed data (so this is good for data you treat as blob). But if it’s, say, a large Markdown description that’s rarely updated, storing compressed saves capacity every read/write. There’s no native support, but it’s easy to compress in your app layer before put and after get.
Handling Relational Constraints without Joins
Some advanced techniques address the lack of joins:
- Duplication and Two-Way Links: You might store pointers in both related items. E.g., an Order contains CustomerID, and maybe you also store a list of OrderIDs on the Customer item (for quick reference or quick access in one fetch). But updating that list for every new order might be heavy – a better way is often an index or just querying orders by customer. However, if you needed to get customer and their latest order in one go, you can use a clever trick: store the latest order fields on the Customer item as denormalized. Then a GetItem on customer gives you basic info plus maybe last order date/ID. More rigorous approach: Use transactions to keep things in sync (like in a TransactWrite, update both customer and order).
- Inverted Indexes for Many-to-Many: If you have a many-to-many and want to query both directions efficiently, you might use an intermediary item as mentioned with adjacency lists. Another pattern is an inverted GSI: for table with PK=A, SK=B, create GSI with PK=B, SK=A (just flipping them). This effectively is a built-in many-to-many index. It doubles writes obviously (like storing edges twice). But then you can query by either.
- Conditional Writes for Uniqueness: DynamoDB doesn't have unique constraints beyond the primary key, but you can enforce uniqueness of another field using a unique-record pattern: maintain a separate table or item keyed by that field. For example, to ensure no two users share an email, keep a table (or item) keyed by email. When creating a new user, do a conditional Put on that email item (it succeeds only if the item doesn't already exist). Or use a transaction to write both the user and a record in an "email -> user" mapping table, with conditions that both are new; if either conflicts, the transaction fails, meaning that email is taken. This approach uses keys themselves for enforcement (the email as a key in a separate table, or in the same table if the key format won't collide with user keys). It's a common workaround for unique secondary attributes; a sketch follows this list.
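A hedged sketch of the unique-email pattern above, assuming the AppTable single-table design from earlier plus a hypothetical UniqueEmails table keyed by Email:

```python
import boto3
from botocore.exceptions import ClientError

client = boto3.client("dynamodb")

def create_user(user_id: str, email: str) -> bool:
    """Create the user and claim the email atomically; table and key names are illustrative."""
    try:
        client.transact_write_items(
            TransactItems=[
                {   # the user item itself must not already exist
                    "Put": {
                        "TableName": "AppTable",
                        "Item": {"PK": {"S": f"USER#{user_id}"}, "SK": {"S": "PROFILE"},
                                 "Email": {"S": email}},
                        "ConditionExpression": "attribute_not_exists(PK)",
                    }
                },
                {   # a marker item keyed by email enforces uniqueness of the address
                    "Put": {
                        "TableName": "UniqueEmails",
                        "Item": {"Email": {"S": email}, "UserID": {"S": user_id}},
                        "ConditionExpression": "attribute_not_exists(Email)",
                    }
                },
            ]
        )
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "TransactionCanceledException":
            return False  # email (or user ID) already taken
        raise
```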
Workload Isolation and Multi-Tenancy Keys
If you have multi-tenant data, one often uses the tenant identifier as part of the key (so data naturally partitions by tenant). This can also be used to isolate hot tenants: e.g., if one tenant is extremely active, their partition keys will be hotter. Adaptive capacity might isolate them, but also consider maybe splitting that tenant’s data further (like adding sub-partitioning by user or category under that tenant’s key). Alternatively, sometimes large tenants are given their own table entirely (especially if you want to separate their throughput provisioning or for compliance). That’s more of an architecture choice than key design, but key design can support multi-tenancy well by embedding tenant in the keys. Also, multi-tenancy relates to security which we cover next: you can leverage IAM and keys to ensure one tenant can’t access another’s data.
These advanced techniques illustrate how flexible DynamoDB’s key model can be. You can model complex relationships, manage evolving data, and maintain performance by thinking creatively about how to use keys and item layouts. Always test these patterns with your access patterns to ensure they behave as expected (e.g., no hidden performance issues). Documentation and community solutions often provide blueprints for these scenarios, and we’ve cited a few where applicable.
Now we’ll discuss data integrity and consistency in DynamoDB, including using transactions and keys to avoid conflicts and ensure consistency where needed.
9. Data Consistency, Transactions, and Conflict Resolution
DynamoDB is eventually consistent by default for reads, and it’s designed for high concurrency without locks by using optimistic methods. However, when building applications, you often need to ensure certain operations are atomic, maintain consistency of related items, or resolve conflicting updates. Primary key design can influence how easy (or hard) it is to achieve these goals.
Consistency Levels and Key Design
- Eventually Consistent Reads: By default, a GetItem or Query may return stale data (typically up to about a second old) because it might be served from a replica. If you absolutely require the latest data on a read, you must specify ConsistentRead=true (strongly consistent) on those operations; a sketch follows this list. Strongly consistent reads are only available within a single AWS region (DynamoDB can't do cross-region strong consistency). If you design your keys such that a user or process usually reads its own writes (which is common), eventual consistency is often fine because propagation is usually very fast. However, if a user writes and then immediately reads, to be safe you may do a consistent read (at double the RCU cost) or implement a short delay/retry if an expected update isn't visible yet.
- Isolation of Items: Each item (identified by its primary key) is an independent unit for reads and writes. DynamoDB promises that writes to distinct keys are isolated (no locking between items). This means if you want to maintain invariants across multiple items (e.g., two items should not both exist, or sums of some values must agree), you either do it transactionally or in a way that doesn't require strict isolation. In general, if possible, design so that each item can be updated independently without violating invariants. For example, rather than splitting a single logical record into 5 items that must all be updated together, see if some parts can be eventually consistent or derived. If not, DynamoDB transactions come to the rescue.
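A minimal strongly consistent read, assuming the Users table (UserID/ItemType keys) used in earlier sketches:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Users")  # hypothetical table: PK = UserID, SK = ItemType

# Default GetItem is eventually consistent; ConsistentRead=True forces a strongly
# consistent read at roughly double the RCU cost (and is unavailable on GSIs).
resp = table.get_item(
    Key={"UserID": "USER#100", "ItemType": "PROFILE"},
    ConsistentRead=True,
)
item = resp.get("Item")
```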
Transactional Operations (ACID)
DynamoDB supports ACID transactions via the TransactWriteItems and TransactGetItems APIs. With these, you can group up to 100 actions (raised from the original limit of 25), across one or multiple tables, that either all succeed or all fail together. They also provide features like condition checks. How this ties to keys:
- Atomic Multi-Item Updates: You can use transactions to update multiple items (whether or not they share keys) atomically. For example, moving an item from one partition key to another can be done with a Put under the new key and a Delete of the old key in a single transaction, ensuring no duplicate and no loss.
- Primary Key Choice and Transactions: If you frequently find you need to update many items together, consider grouping them under the same partition key if possible, because they often represent one entity (which you might even have modeled as one item, were it not for size limits or other design constraints). That said, sharing a partition key is not required for transactions; DynamoDB transactions can span arbitrary items across keys and tables. They use an underlying two-phase commit that is handled for you (with proper isolation during the commit).
- Condition Expressions for Conflict Avoidance: You can include conditions in transactions, or even in normal Put/Update calls, to avoid clobbering changes. For instance, a conditional update ("update this item only if the Version attribute = X") ensures that if someone else updated the item first (bumping the version), your update fails rather than overwriting theirs. This is optimistic concurrency control – it doesn't prevent simultaneous writes, but it detects the conflict so one writer fails and can retry or handle it accordingly. To use this, your items usually carry a version or timestamp attribute that you check. Alternatively, you might check something like "if attribute not exists" to ensure you don't inadvertently overwrite an existing item (as in a unique create).
- Idempotency: Designing idempotent operations can reduce conflicts. For example, if your sort key is a unique request ID or timestamp, resubmitting the same operation can be detected as a duplicate (because the item already exists). In contrast, with an update-in-place, a retry might double-apply. So using primary keys to encode operation identity helps naturally avoid duplicates. An example: PK = user, SK = "PAYMENT#<PaymentId>" – if a client retries creating a payment, it will use the same PaymentId, and you can detect that it's already there (or use a conditional put-if-not-exists to create it only once). This way you don't accidentally charge twice.
- State Aggregation vs Eventual Consistency: If you maintain counters or aggregations, decide whether strong consistency is needed. DynamoDB's Update with ADD is atomic on a single item, so two concurrent increments both apply (DynamoDB serializes them on that item); you don't need a lock for per-item counters. But if you maintain a counter in one item that depends on many other items, consider maintaining it separately or computing it at query time. Alternatively, use a transaction to update two places at once when needed (e.g., increment an item count and also add an entry to a list, ensuring both happen or neither does).
Conflict Resolution and Last-Writer-Wins
In a single-region setup, conflicts typically mean two writes to the same item (same primary key) around the same time. DynamoDB will apply them in some order (the one that arrives later will override attributes accordingly). If you use versioning, the second one could be rejected via condition if you expected a version. If you do nothing, “last write wins” at attribute level (basically the final state is as written by last writer, with no automatic merge of attribute values, except list append or numeric add if you used those update operations explicitly).
In multi-region Global Tables, conflict resolution is important: since two different regions might update the same item simultaneously, DynamoDB uses a timestamp attribute (internal) to decide which one wins – essentially last write wins (based on last modification time). This means without precaution you could lose an update. To handle this:
- If possible, avoid updating the same item from multiple regions at the same time. Some architectures designate a “leader” region for certain data or they partition keys by region to minimize conflicts.
- If not, and your use-case demands merging, you’d have to implement custom conflict resolution. For example, store a history of changes or use DynamoDB Streams to detect when an item version was overwritten by replication and then decide how to merge. Or design your keys such that concurrent updates go to different sort keys (so not actually conflicting). For instance, collaborative editing might store changes as new entries rather than updating one item.
Using Keys for Conflict Avoidance: One clever strategy is to incorporate something unique about the writer into the sort key or a separate item. E.g., in a shopping cart, two clients adding items at the same time – if you use item IDs as sort key, they’ll just add two different items (no conflict). Conflict arises if they both try to update the same item (like both increment quantity of product X). Using conditional update with version solves that – one will fail and you can retry by re-reading the new quantity and adding to it. Or design to allow parallel additions (like multiple lines per user, which you later consolidate if needed).
Transactions for Cross-Item Consistency: Use TransactWriteItems to enforce invariants that involve multiple items. E.g., transferring points from UserA to UserB: you need to decrement A's balance and increment B's balance in one go. A transaction can do that (two updates, with a condition that A's balance >= the amount to transfer). This ensures you don't end up decrementing without incrementing or vice versa.
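A sketch of that transfer, assuming a hypothetical Accounts table keyed by UserID with a numeric Balance attribute; both updates commit or neither does:

```python
import boto3

client = boto3.client("dynamodb")

client.transact_write_items(
    TransactItems=[
        {
            "Update": {
                "TableName": "Accounts",
                "Key": {"UserID": {"S": "USER#A"}},
                "UpdateExpression": "SET Balance = Balance - :amt",
                "ConditionExpression": "Balance >= :amt",  # never allow a negative balance
                "ExpressionAttributeValues": {":amt": {"N": "100"}},
            }
        },
        {
            "Update": {
                "TableName": "Accounts",
                "Key": {"UserID": {"S": "USER#B"}},
                "UpdateExpression": "SET Balance = Balance + :amt",
                "ExpressionAttributeValues": {":amt": {"N": "100"}},
            }
        },
    ]
)
```

If the condition fails (insufficient balance), the whole transaction is canceled and neither account changes.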
Item-Level Access Atomicity: Each item update (without transaction) is atomic (all or nothing for that item). But if you have a scenario where two attributes across items must change together, they either must be in one item or use a transaction. Key design can sometimes merge what would be two items into one if atomicity is crucial and item size allows. For instance, if you have a stats summary and a detail record separate but they must be in sync always, consider storing summary info with detail as single item (if it’s logically fine and size is okay) so that a single Update atomically updates both pieces.
Key-Based Idempotency and Workflows
When designing systems with DynamoDB, you can use primary keys to implement idempotent or exactly-once processing:
- For example, when processing a stream or an event, you can use a DynamoDB item as a deduplication log by using some unique event ID as primary key. If an event is processed, insert an item with that ID. If you see the item already exists on a new attempt, you know it’s a duplicate and skip. The efficient key lookup makes this feasible. This uses DynamoDB as a concurrency-safe dedup store (since PutItem with condition "not exists" will succeed only once).
- For ensuring one-time actions, keys that incorporate a unique action ID are handy. Many AWS services (like Step Functions or SQS with dedup) do similar things under the hood with tokens. In DynamoDB, you could embed these in keys.
In summary, DynamoDB provides tools (conditionally write, transactions) to ensure consistency, but your data model can either simplify or complicate their usage. A well-designed key schema often minimizes the need for complex transactions by grouping related data, but when needed, transactions can maintain integrity across items. Using condition expressions with primary keys (like exists or not exists) is a simple way to enforce uniqueness and sequence.
One more note: if you require strong referential integrity (like a foreign key constraint – cannot have Order item without Customer item existing), you would implement that in your application logic or with a transaction that checks existence of one before inserting another (TransactWrite with a ConditionCheck on customer item existence and a Put for order). Designing keys won’t automatically enforce referential integrity, but you can co-locate data (same partition) to at least make it easier to transactionally operate (since it’s in one partition, though Dynamo doesn’t require that for transactions).
Now, we will touch on security aspects like access control and auditing, which also relate to how we choose keys, particularly partition keys for multi-tenant scenarios.
10. Security: Key-Based Access Control, Encryption, and Auditing
Security in DynamoDB operates at the table or item level primarily through IAM policies and encryption settings. Your primary key design can facilitate robust security by aligning with access control requirements and by avoiding exposure of sensitive data.
Fine-Grained Access Control (FGAC) via IAM
DynamoDB can integrate with IAM to enforce item-level permissions. This is done using IAM policy conditions on the DynamoDB API calls, notably the dynamodb:LeadingKeys condition key. This condition lets you restrict access so that a user (or role) can only perform operations on items whose partition key begins with a certain value (usually their user ID or tenant ID). For example, if your table's partition key is UserId, you can set an IAM policy for user Alice that allows dynamodb:GetItem only if dynamodb:LeadingKeys = ["Alice"]. This means Alice can only fetch items where the partition key is "Alice". Similarly, for queries, you might restrict things so that any Query's partition key condition must equal the user's ID.
To leverage this, your partition key must correspond to a security boundary (like a user or tenant identifier). This is a strong argument for including tenant or user IDs in your primary key for multi-tenant tables. For a multi-tenant app, if PK = TenantID and SK = something, you can allow each tenant's role to access only PK = their TenantID. AWS Cognito and web identity federation often use this model: for example, set dynamodb:LeadingKeys = ${cognito-identity.amazonaws.com:sub} (the user's unique ID) in the policy, and design the table PK to be that user's sub ID. The DynamoDB Developer Guide has examples of such policies, where a mobile app user is only allowed to access their own records by matching the partition key to their identity.
Design Tip: If you foresee the need for item-level authorization, design your partition key to include the principal’s ID that will be used in policy. This usually means you wouldn’t have a table partitioned by some other attribute that isn’t tied to user identity, because then you can’t restrict by user easily. For instance, if you partitioned data by product category, you can’t easily restrict one user to only their items, since their items span categories. Much cleaner is partition by user/tenant, then secondary key for category or other grouping within that.
One limitation: dynamodb:LeadingKeys only applies to the partition key prefix (the leading portion). If you have a composite key string that starts with the tenant ID or user ID, that works (the prefix can be the entire PK or just its beginning if you have a complex PK string). If your key is two parts (like Tenant and something else), IAM can still ensure the TenantID supplied in the request equals the allowed value, since the partition key is supplied explicitly in the API call. In essence, plan to use something like Condition: "ForAllValues:StringEquals": { "dynamodb:LeadingKeys": ["TenantA"] } to lock a role to TenantA's items.
Example: A table Messages with PK = UserId and SK = MessageId. We can give each user an IAM policy (via Cognito roles, etc.) that only allows DynamoDB actions on items where UserId equals their own. That means even if they try to query someone else's messages, the request is denied at the authorization level. This is powerful: it pushes enforcement to AWS, reducing the chance of a bug in application code leaking data across tenants.
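A hedged sketch of such a policy for the Messages example, expressed as a Python dict attached to a role as an inline policy; the role name, region, account ID, and policy name are all illustrative placeholders:

```python
import json
import boto3

iam = boto3.client("iam")

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query",
                   "dynamodb:PutItem", "dynamodb:UpdateItem"],
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/Messages",  # placeholder ARN
        "Condition": {
            "ForAllValues:StringEquals": {
                # Substituted at request time with the caller's Cognito identity.
                "dynamodb:LeadingKeys": ["${cognito-identity.amazonaws.com:sub}"]
            }
        },
    }],
}

iam.put_role_policy(
    RoleName="MessagesAppUserRole",   # hypothetical Cognito-federated role
    PolicyName="OwnMessagesOnly",
    PolicyDocument=json.dumps(policy),
)
```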
Note: FGAC applies to any DynamoDB operation, including Query and Scan. For Scan, you can't enforce per-item conditions as easily (scans by design read everything unless restricted). That's another reason to avoid Scans in multi-tenant designs: use Query with keys so the conditions can apply.
Encryption at Rest and in Transit
All DynamoDB data at rest is encrypted by default with AWS-owned CMK (since 2018). You can choose a customer-managed CMK if needed for compliance. This is transparent and doesn’t affect how you design keys, but one consideration: Don’t include sensitive information in keys if possible. Even though data is encrypted at rest, primary key values might appear in CloudTrail logs, metrics, or be used in URLs. For instance, if you enable CloudTrail, the API calls including the key of the item accessed are logged in plaintext in audit logs. If your partition key is something sensitive like a Social Security Number or personal email, that data would show up wherever logs of access are stored (even though the data in DynamoDB itself is encrypted). Also, DynamoDB Streams will carry the entire item image including keys (still within AWS environment, but worth noting). So a good practice: use surrogate keys or tokens for sensitive data. For example, instead of using an email address directly as a partition key, you might hash it or use a user ID. Or if you need to use a natural key like SSN, be aware of the exposure (and perhaps disable CloudTrail logging of those API calls or use encryption on CloudTrail storage).
Encryption in Transit: DynamoDB endpoints require SSL/TLS by default when using HTTPS, so data in transit to/from Dynamo is encrypted. There’s no special design needed for this.
Auditing and Logging
CloudTrail: AWS CloudTrail can log all DynamoDB API calls, which can be used for auditing access (who accessed what item keys, etc.). By using meaningful primary keys (like user IDs), these logs can help track if someone tried to access someone else’s data. However, CloudTrail logs at the API call level – if a Query returns 100 items, the log will show the key of the query (like partition key used), not necessarily all the returned keys (though in a multi-tenant scenario you’d structure queries to only get your own data anyway). CloudTrail will include the Table name, the key(s) used in the request, conditions, etc. So if a malicious actor tries to scan the table or query a key they shouldn’t, CloudTrail can catch that (and IAM ideally prevents it).
Streams for Auditing: If you need to maintain an audit trail of data changes, DynamoDB Streams provides an ordered log of item modifications. You can process the stream and store changes in an audit log (such as a secure S3 bucket, or a separate DynamoDB table that acts as an audit store). This ensures even deletes are recorded: with a stream view type of OLD_IMAGE or NEW_AND_OLD_IMAGES, the stream carries the deleted item’s contents, while KEYS_ONLY records just the key. Many use streams to send data to a centralized log (or to services like Elasticsearch for a temporal record).
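A minimal sketch of such a stream consumer, assuming a Lambda function subscribed to the table’s stream and a hypothetical audit bucket (names are placeholders):

```python
import json
import os
import boto3

s3 = boto3.client("s3")
AUDIT_BUCKET = os.environ.get("AUDIT_BUCKET", "example-audit-bucket")  # placeholder

def handler(event, context):
    """Archive every DynamoDB Stream record to S3 as an audit entry.

    Depending on the stream view type, each record carries the item keys plus
    the old and/or new item images, so deletes are captured as well.
    """
    for record in event.get("Records", []):
        change = record["dynamodb"]
        # Sequence numbers are unique within a shard, so they make stable object keys.
        object_key = f"audit/{record['eventName']}/{change['SequenceNumber']}.json"
        s3.put_object(
            Bucket=AUDIT_BUCKET,
            Key=object_key,
            Body=json.dumps(
                {
                    "event": record["eventName"],  # INSERT / MODIFY / REMOVE
                    "keys": change.get("Keys"),
                    "old": change.get("OldImage"),
                    "new": change.get("NewImage"),
                }
            ).encode("utf-8"),
        )
```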
Key-based Activity Monitoring: By analyzing usage patterns keyed by partition, you can detect anomalies. For example, if you see frequent queries for many different partition keys from one user, is that expected? If each user normally only queries their own PK, access to many distinct PKs might indicate an attempt to scrape data. Tools like GuardDuty might integrate some of this detection. But designing keys per user simplifies monitoring, because anything outside the norm stands out.
Item Attribute for Ownership: Another design tip: even if your partition key is tenant or user, some add an “Owner” attribute redundantly in the item for clarity or additional checks. This isn’t strictly needed if key is clear, but for instance, storing an Access Control List or owner field might be used by your app logic to double-check on certain operations.
Key Design and Multi-Factor Security
Sometimes, you may design keys with security in mind. One minor example: if you want to avoid “enumeration” of IDs (someone guessing IDs), using UUIDs or random strings as keys is better than sequential IDs. If your application exposed some key (like an order number) that’s guessable, a malicious user might try to fetch others by guessing. If keys are random (hard to guess) or scoped by user (so guessing someone else’s key is useless because of IAM), that risk is mitigated.
Access Patterns and Principle of Least Privilege: Determine what level of access each component of your system needs. Many times you might create separate tables for things because they have very different access patterns and you can then easily restrict one role to one table. If combining in a single table, you may rely on item-level permissions with conditions to separate them logically. E.g., storing both user personal data and some public data in one table might complicate access rules; separate tables could be simpler for security boundaries.
Encryption of Sensitive Attributes: If you have fields that are sensitive (like PII, API keys, etc.), and you worry about them being plaintext in Dynamo (even though it’s encrypted at rest, you might worry about a bug printing them or them going to logs), you can encrypt those at application layer. DynamoDB Encryption Client is an AWS library that encrypts item attributes on the client side before sending to Dynamo, using a KMS key you provide. This ensures that even if an attacker got access to DynamoDB data, those attributes are gibberish without the key. The tradeoff is you can’t query on encrypted data (since it’s random ciphertext) – so only do that for fields you never need to filter or key on. Primary keys cannot be client-side encrypted if you need to use them to query (because queries would require the plaintext value to match exactly). So typically you don’t encrypt keys; instead, you design keys to not directly contain sensitive info as mentioned.
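As a simplified stand-in for the DynamoDB Encryption Client, the sketch below encrypts a free-form Notes attribute with KMS before writing, while the key attributes stay in plaintext so they remain queryable (the table name, key alias, and attribute names are assumptions):

```python
import boto3

kms = boto3.client("kms")
table = boto3.resource("dynamodb").Table("Transactions")  # hypothetical table
KMS_KEY_ID = "alias/app-data-key"                          # hypothetical key alias

def put_transaction(account_id: str, txn_id: str, amount: int, notes: str) -> None:
    """Write a transaction item with a client-side-encrypted Notes attribute.

    The partition/sort keys and Amount stay in plaintext so they can still be
    used in key conditions and range queries; Notes is stored as ciphertext.
    """
    ciphertext = kms.encrypt(KeyId=KMS_KEY_ID, Plaintext=notes.encode("utf-8"))[
        "CiphertextBlob"
    ]
    table.put_item(
        Item={
            "AccountID": account_id,   # partition key: plaintext, queryable
            "TransactionID": txn_id,   # sort key: plaintext, queryable
            "Amount": amount,          # needed for range queries, left in the clear
            "Notes": ciphertext,       # stored as a DynamoDB binary attribute
        }
    )
```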
Auditing Example
For illustration, consider a financial application where every transaction record is stored with PK = AccountID, SK = TransactionID. Security wise:
- Each user (account owner) should only see their transactions: use IAM FGAC with dynamodb:LeadingKeys = AccountID for their role.
- All access is logged in CloudTrail: you’ll have a record if someone attempted access to another’s AccountID.
- The table is encrypted at rest with a customer-managed CMK so that if someone somehow got a disk snapshot they can’t read it without key.
- You decide that transaction amounts are sensitive but need to be queried by range (so you can’t encrypt those and still range query). But maybe a field like "Notes" (freeform text) could be encrypted client-side to keep it confidential.
- Use DynamoDB Streams to send all transactions to an audit logging system (maybe a Lambda writes them to an immutable ledger or to S3 for archive). The partition key (AccountID) ensures each account’s events are grouped in the stream for ordering per account, which is nice for replay.
One more security note: key usage in URLs. If your API endpoints include DynamoDB keys, be careful not to expose patterns that leak info. E.g., GET /users/{userid}/orders/{orderid} – if those IDs are guessable, that’s a risk. If they are UUIDs, or the auth token ensures you can only access your own (mapping to that user ID via auth), then it’s okay.
In summary, align your key design with security boundaries. Use user/tenant identifiers as partition keys to leverage IAM item-level control. Keep sensitive data out of keys to avoid inadvertent exposure in logs. Leverage encryption for sensitive attributes not needed in keys. Audit everything with CloudTrail and Streams. This ensures the powerful performance of DynamoDB doesn’t come at the cost of data exposure or breaches, even in multi-tenant or high-security contexts.
11. Real-World Case Studies and Key Design Examples
To solidify these concepts, let’s explore how several real-world systems from different domains have applied DynamoDB primary key design, highlighting successes and pitfalls:
Case Study 1: E-Commerce Order Management (Retail Domain)
Scenario: An e-commerce platform needs to store Customers, Orders, Products, and Inventory in DynamoDB. The access patterns include: get customer profile by ID, list orders by customer (sorted by date), get order by order number, list products by category, check inventory by product, etc.
Design: They choose a single-table design for operational data. The primary key is a composite that encodes entity type and ID. For instance, the partition key could be Customer#<CustID> for customer items, Order#<OrderID> for orders, and Product#<ProdID> for products. They store different types in one table but will use GSIs to support queries by alternate keys:
- Main Table PK/SK: They actually opted for a slight variation: PK = CustomerID for customers and their orders (to group orders under the customer), and separate PKs for items that aren’t naturally under a customer (like products). How? They set PK = CustomerID for both the customer item (with SK = "PROFILE") and order items (with SK = "ORDER#OrderID"). Products and inventory, however, don’t belong to a customer, so they gave them their own PK like Product#<ProdID> so they don’t collide with numeric customer IDs (or they put them in a different table – either approach works). In Amazon’s internal examples, everything often lives in one table with an artificial top-level grouping, but let’s say here they actually used two tables for clarity: a CustomersOrders table and a ProductsInventory table. This separation was perhaps for different access control or scaling reasons (customer data scales with the number of users, product data with the number of SKUs, and they have different usage patterns).
- Orders Access: To get orders by customer, they simply query the CustomersOrders table with PK = CustID, SK begins_with "ORDER#". This returns all orders of that customer sorted by OrderID (which often has time encoded, e.g., "ORDER#20230515-1234"). If OrderID isn’t time-ordered, they might use OrderDate as part of the sort key or an LSI on OrderDate for per-customer ordering. (See the query sketch after this list.)
- Order by ID: If a service needs to fetch an order by its global OrderID without knowing the customer (maybe an admin looks up an order number), they used a GSI: GSI1 with partition key = OrderID and sort key = CustomerID (or no sort key). When an order item is written to the main table, it has attributes OrderID and CustomerID, and they project the needed fields. Now a Query on GSI1 with OrderID gives the order (and tells which customer).
- Products Access: In the ProductsInventory table, the primary key is ProductID (as partition) and maybe a sort key for warehouse or variant. If inventory is stored as separate items (each location’s stock is an item), PK = ProductID, SK = LocationID. That way, to get the stock of a product, you query by ProductID and get all location quantities. They also want to list all products in a category, so they create GSI2 with Category as partition key and ProductName or ID as sort key. Now Query GSI2 where Category = X yields all products in that category.
- Performance: This design handles growth: customers partition data by customer ID (billions of customers -> billions of partitions possible). Order writes/reads are mostly per customer, avoiding hot keys (unless one super-customer like Amazon has millions of orders, but then that might still be fine or they could further shard).
- Cost & Simplicity: By using a single table for customers+orders, they achieved transactional consistency easily via transactions for certain ops (like deleting a customer and all their orders in one TransactWrite with condition checks). They had to carefully manage GSIs: the Order GSI on OrderID has a high-cardinality partition key (OrderID is unique per order, so it’s fine). The Category GSI potentially has uneven keys if some category is extremely large. They planned for that by possibly adding a prefix to the category or splitting by subcategory if needed, or by ensuring read throughput on that index is high enough.
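A sketch of the two order lookups described above, using the table and index names from this case study (the PK/SK attribute names are assumptions):

```python
import boto3
from boto3.dynamodb.conditions import Key

orders_table = boto3.resource("dynamodb").Table("CustomersOrders")

def orders_for_customer(cust_id: str):
    """All of one customer's orders, newest first if OrderID encodes the date."""
    resp = orders_table.query(
        KeyConditionExpression=Key("PK").eq(cust_id) & Key("SK").begins_with("ORDER#"),
        ScanIndexForward=False,  # descending sort-key order
    )
    return resp["Items"]

def order_by_id(order_id: str):
    """Global lookup by OrderID via the GSI1 index described above."""
    resp = orders_table.query(
        IndexName="GSI1",
        KeyConditionExpression=Key("OrderID").eq(order_id),
    )
    return resp["Items"]
```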
Result: The platform scaled to millions of users and orders seamlessly, delivering single-digit-millisecond queries for customer order history and product catalog lookups. To imagine Prime Day scale: Amazon’s internal “Herd” workflow system (backing order processing) moved to DynamoDB and avoided provisioning roughly 1,000 Oracle instances for Prime Day peaks. By keying on customer and order, the workload is spread widely enough that the system handles peak loads without manual sharding.
Common pitfalls avoided: They did not use something like PK = "Orders" and SK = OrderID globally – that would put all orders in one partition (bad). Instead, partition by customer (lots of them). They also didn’t over-index every attribute, just the key ones (OrderID and Category). They learned that scanning by say price range without an index is not feasible – they considered adding GSI on Price for products if needed for a “filter by price” feature, but decided not to as that’s better done in memory or an ElasticSearch.
Case Study 2: Gaming Leaderboards (Gaming Domain)
Scenario: A mobile game tracks high scores and achievements for players worldwide. They need real-time leaderboards per game level and per region, and allow players to query their rank.
Design: DynamoDB is a great fit, as demonstrated by many mobile games (Capcom, PennyPop, and Electronic Arts have all used DynamoDB to scale gaming applications). A common approach:
- Table: Scores with a composite key. PK = GameLevel (or some composite of game and level, e.g., "Level1", "Level2", etc.), SK = Score (as a number, sorted descending perhaps). They also include PlayerID and other data as attributes.
- To get the top N scores of a level: Query PK="Level1" with ScanIndexForward=false, Limit N – this returns the top N because items are ordered by the score-based sort key. (See the query sketch after this list.)
- But to allow multiple players to have the same score and to make the combination unique, the SK might actually be Score#PlayerID (score as the leading part, player as tiebreaker). Possibly store the score zero-padded or inverted so string ordering matches numeric ordering.
- To get a player’s rank, they need to find how many scores are above theirs. Without a direct index, one way is to Query that level from the top until the player is found (which could be inefficient if they are far down). Alternatively, maintain a mapping table or use a GSI keyed by PlayerID to find their score, then compute the rank separately (not trivial without scanning). Some games periodically compute ranks offline and store the rank as an attribute on the player.
- Some games use a GSI with partition key = PlayerID to easily get all scores across levels for a player (profile view), perhaps sorted by level or score.
- Hot partition issue: If one game level is extremely popular (everyone plays Level1), its partition "Level1" will be hit a lot. That could be a hot partition. They might incorporate something like region or shard the level key: e.g., PK = Level#Region. That spreads players of a level across regions logically. Or, if truly global, maybe even break it up by the first letter of the player name: PK = Level#Bucket, where the bucket is 'A'–'Z' based on the player name’s first letter, splitting the partition across 26 keys. This sacrifices an exact sorted list of all players (you’d have one per bucket), but if needed you can merge and sort on the client. The trade-off is accepted to avoid throttling at extreme scale.
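A sketch of the write and top-N read paths under these assumptions (the ScoreKey attribute name and zero-padding width are illustrative):

```python
import boto3
from boto3.dynamodb.conditions import Key

scores = boto3.resource("dynamodb").Table("Scores")

def record_score(level: str, player_id: str, score: int) -> None:
    """Store a score under a zero-padded Score#PlayerID sort key.

    Fixed-width padding keeps string ordering consistent with numeric ordering,
    so the highest scores come first when querying in descending order.
    """
    scores.put_item(
        Item={
            "GameLevel": level,
            "ScoreKey": f"{score:010d}#{player_id}",  # e.g. 0000984213#player42
            "PlayerID": player_id,
            "Score": score,
        }
    )

def top_n(level: str, n: int = 10):
    """Top-N leaderboard read: one partition, descending sort-key order."""
    resp = scores.query(
        KeyConditionExpression=Key("GameLevel").eq(level),
        ScanIndexForward=False,
        Limit=n,
    )
    return resp["Items"]
```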
Result: Games like this achieve massive scale during peak usage (launch of a new game or daily peaks). DynamoDB handles high write rates (every game action maybe posting a score event). A cited benefit from a similar use case: DynamoDB scaled to 1 million writes per second for a gaming app (FanFight) while cutting their costs 50% after migrating from another solution. Their design likely used keys like MatchID or LeagueID as partitions to separate scoreboards, with players as sort keys or part of them.
They avoided pitfalls such as using a global scoreboard key (which would be one hot partition). Instead, keys include at least some high-cardinality component (game, match, region, etc.). They also likely used an “update only if the score is higher” condition: update the item only if the new score exceeds the old one, so each player’s best score is kept and never accidentally lowered.
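A sketch of that “keep the best score” conditional update, assuming a per-player best-score item keyed by (GameLevel, BEST#PlayerID) alongside the leaderboard rows above:

```python
import boto3
from botocore.exceptions import ClientError

scores = boto3.resource("dynamodb").Table("Scores")

def keep_best_score(level: str, player_id: str, new_score: int) -> None:
    """Overwrite a player's best score only if the new score is higher."""
    try:
        scores.update_item(
            Key={"GameLevel": level, "ScoreKey": f"BEST#{player_id}"},
            UpdateExpression="SET #sc = :s",
            ConditionExpression="attribute_not_exists(#sc) OR #sc < :s",
            ExpressionAttributeNames={"#sc": "Score"},
            ExpressionAttributeValues={":s": new_score},
        )
    except ClientError as err:
        # A lower score simply fails the condition; anything else is a real error.
        if err.response["Error"]["Code"] != "ConditionalCheckFailedException":
            raise
```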
Case Study 3: IoT Time-Series (Internet of Things Domain)
Scenario: A company has thousands of IoT sensors globally, each emitting data every second. They need to store sensor readings and query recent data by device, and occasionally run analytics on historical data (likely via exporting to big data systems). They also must avoid hot spots as data tends to come in simultaneously from many devices.
Design:
- Table: SensorData with PK = DeviceID (each device’s data in its own partition), SK = Timestamp (ISO8601 string or epoch). This straightforward model groups data by device, sorted by time, which is ideal for querying “the last hour of data for device X” (Query PK=X between start and end time; see the query sketch after this list).
- This yields a naturally distributed load if there are many devices – each device writes to its own partition key, so writes are distributed. If one device sends far more data, that one partition could get hot, but typically IoT devices emit at similar rates.
- Adaptive capacity: if one device does spike, adaptive capacity will temporarily give it more share or even split that partition if needed. But the cardinality of DeviceID is high (thousands), so likely it’s okay.
- Issues & Strategies: Over time, one device’s partition accumulates potentially millions of items (could hit item collection 10GB limit if LSIs were in use, but if none, it’s just a large partition). That might slow queries for far history. The team decides to roll older data to a cold table or S3 periodically. E.g., after 30 days, they export those items to S3 (maybe via Time to Live attribute). This keeps partitions size moderate.
- They consider using time-based partition keys: e.g., PK = DeviceID#YYYYMM (device plus month). This means each month a new partition per device. That way no single partition grows unbounded. It complicates querying across month boundaries (need to query multiple PKs if crossing month). They weighed this and perhaps opted not to complicate until needed because adaptive capacity can handle a large partition as long as traffic at one time doesn’t exceed a single partition’s throughput.
- They wanted to query latest data for all devices (like a dashboard snapshot). Scanning the whole table is too slow. Instead, they maintain a separate table or GSI that keeps just the latest reading per device (write an item with PK="LATEST" and sort=DeviceID or vice versa, update on each device update). That “last readings” table is always just one item per device. Query it (or scan it) to get all current states is manageable (thousands of items only).
- Performance: With this design, they achieved ingestion of say 100k writes/sec uniformly across devices easily. Read of one device’s timeline is quick due to partition key. Querying a subset (like all devices in one location) needed maybe a GSI or different table though – they might have a GSI with PK = Location and SK = DeviceID to map devices by location, then they can take those device IDs and query their data. Or replicate some data to a TSDB for multi-device queries.
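A sketch of the per-device time-range query from the first bullet above (attribute names and ISO8601 timestamps are assumptions):

```python
import boto3
from boto3.dynamodb.conditions import Key

sensor_data = boto3.resource("dynamodb").Table("SensorData")

def readings_for_device(device_id: str, start_iso: str, end_iso: str):
    """Fetch one device's readings in a time window.

    ISO8601 sort keys make lexicographic BETWEEN match chronological order;
    paginate with LastEvaluatedKey for long windows (omitted here).
    """
    resp = sensor_data.query(
        KeyConditionExpression=Key("DeviceID").eq(device_id)
        & Key("Timestamp").between(start_iso, end_iso)
    )
    return resp["Items"]

# e.g. readings_for_device("sensor-0042", "2025-05-13T10:00:00Z", "2025-05-13T11:00:00Z")
```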
Result: This pattern is widely used for IoT. Amazon’s internal services (like AWS IoT) often use DynamoDB under the hood for device registry or last state. For time-series, many use DynamoDB as a buffer and then archive to cheaper storage. The key design (device as PK, timestamp as SK) is a textbook approach for time-series on Dynamo. Pitfall avoided: using time as partition key (which would lump all devices at a given time into one key – very bad distribution). Instead, deviceID was the partition (high cardinality and uniform access, as each device writes its own data at its own pace). Also, by potentially partitioning by month, they ensure a single partition doesn’t get too huge or exceed any limits.
Case Study 4: Financial Transactions Ledger (Finance Domain)
Scenario: A digital wallet app uses DynamoDB to store transactions and balances. They require strong consistency for balance updates and an audit trail for transactions (cannot lose or double apply any transaction). High volume of small transactions.
Design:
- They use two tables: one for balances and one for transactions.
- Balances Table: PK = AccountID (one per user); attributes: current balance, last txn ID applied, etc. They update this with conditional writes or transactions to ensure correctness (optimistic locking or using TransactWriteItems).
- Transactions Table: PK = AccountID, SK = TransactionTimestamp (or TxnID). They insert every transaction here. If money moves between accounts, they might insert one record in each account’s partition (one negative, one positive). They ensure consistency by using transactions that insert both records (for sender and receiver) and update both balances atomically. DynamoDB’s transactions span items in different partitions, which is a big win (a sharded RDBMS would need a distributed transaction if the accounts live on different shards).
- They also have a GSI maybe to query all transactions by date globally for compliance (PK=TransactionDate, SK=AccountID) to quickly find all transactions on a given day across accounts (sparse if needed).
- Key aspects: AccountID as partition means each user’s transactions are together – easy to get history and easy to ensure serial processing per account. They might even leverage a ConditionExpression on balance updates, e.g., “set balance = balance + :amt if balance = :oldBalance and :amt >= 0”, to avoid overdrafts or double spends. More likely, they use a transaction: a TransactWrite in which the sender’s balance Update carries a condition that the balance covers the amount, the receiver’s balance gets a corresponding Update (add), and Put items record the transaction in both accounts’ partitions. (See the sketch after this list.)
- Performance: Each account is mostly independent, so the design scales with the number of accounts. One account doing many transactions in a short time (like a high-frequency trading bot) could saturate that partition; a single partition key is limited to roughly 1,000 writes per second, and while DynamoDB splits busy partitions, splitting doesn’t help when all writes still target the same key (one key cannot span partitions unless you use a sharding trick). If needed, they could break extremely active accounts by currency or sub-account as separate keys. This is usually unnecessary unless one account sees enormous TPS (which might be the case for something like a central bank account).
- Auditing: All transactions are stored, DynamoDB Streams also capture them to forward to a central ledger system or backup. The keys (account, timestamp) ensure easy retrieval per account and natural time order.
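A sketch of such a transfer as a single TransactWriteItems call under the two-table design above (attribute names, the string timestamp, and the idempotency token are assumptions):

```python
import boto3

dynamodb = boto3.client("dynamodb")

def transfer(sender: str, receiver: str, amount: int, txn_id: str, ts: str) -> None:
    """Atomically debit one account, credit another, and record both ledger rows.

    The balance condition sits on the sender's Update (an item can appear only
    once per transaction), so an insufficient balance aborts all four writes.
    """
    amt = str(amount)
    dynamodb.transact_write_items(
        ClientRequestToken=txn_id,  # dedupes retries of the same transfer for a short window
        TransactItems=[
            {   # debit the sender, but only if the balance covers the amount
                "Update": {
                    "TableName": "Balances",
                    "Key": {"AccountID": {"S": sender}},
                    "UpdateExpression": "SET #bal = #bal - :amt",
                    "ConditionExpression": "#bal >= :amt",
                    "ExpressionAttributeNames": {"#bal": "Balance"},
                    "ExpressionAttributeValues": {":amt": {"N": amt}},
                }
            },
            {   # credit the receiver
                "Update": {
                    "TableName": "Balances",
                    "Key": {"AccountID": {"S": receiver}},
                    "UpdateExpression": "SET #bal = #bal + :amt",
                    "ExpressionAttributeNames": {"#bal": "Balance"},
                    "ExpressionAttributeValues": {":amt": {"N": amt}},
                }
            },
            {   # ledger entry in the sender's partition (negative amount)
                "Put": {
                    "TableName": "Transactions",
                    "Item": {
                        "AccountID": {"S": sender},
                        "TransactionTimestamp": {"S": ts},
                        "TxnID": {"S": txn_id},
                        "Amount": {"N": f"-{amt}"},
                    },
                }
            },
            {   # ledger entry in the receiver's partition (positive amount)
                "Put": {
                    "TableName": "Transactions",
                    "Item": {
                        "AccountID": {"S": receiver},
                        "TransactionTimestamp": {"S": ts},
                        "TxnID": {"S": txn_id},
                        "Amount": {"N": amt},
                    },
                }
            },
        ],
    )
```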
Real example: A fintech called PayPay in Japan uses DynamoDB for a mobile payment app serving 30M users, delivering 300 million in-app messages per day reliably. For transactions, similar scale can be achieved. The atomic multi-item transaction feature was crucial, when it was introduced, for getting these use cases onto Dynamo (previously many settled for eventual consistency or tried complex schemes). Now, with transactions, DynamoDB can satisfy ACID for those specific workflows. One pitfall they watched for: avoiding a “global sequence” or anything else that becomes a single point of contention (like a “system-wide ledger counter” item). Everything is keyed by account or logical partition to distribute load.
Case Study 5: Social Networking Feed (Social Media Domain)
Scenario: A social network stores user profiles, posts, and follows. Needs to show each user a timeline of posts from people they follow, efficiently.
Design Choices: Social graphs are tricky in NoSQL. One approach:
- User Table: PK = UserID for profile info.
- Posts Table: PK = UserID, SK = PostTime (descending perhaps). This stores each user’s own posts.
- To get someone’s personal posts, it’s an easy Query on their PK.
- The challenge: constructing a feed of all posts from the multiple followed users, sorted by time. Two approaches:
  - Fan-out on write: Whenever a user posts, write that post item into not only their own partition, but also into all of their followers’ “feed” partitions. E.g., have a Feeds table with PK = FollowerID, SK = PostTime#AuthorID, and insert a pointer to the new post. Then each user’s feed is readily assembled by a Query on their ID. This approach writes potentially many items per post (if the user has many followers). Big names like Facebook have used similar patterns. Dynamo can handle it if scaled, perhaps with background processing for heavy fan-outs. (See the sketch after this list.)
  - Fan-in on read: When a user requests their feed, do parallel queries to the Posts table for each person they follow (limit a few items each) and merge the results in the application. If a user follows 1,000 people, that’s 1,000 queries – not great. You can batch or cache, but it’s heavy.
They likely opt for a hybrid: popular users (with millions of followers) might have fan-out done via a background job to not overwhelm writes, or use selective fan-out (followers who are online). For others, fan-out on write directly since each post might just be replicated to say 50 followers’ partitions, which Dynamo can do as 50 writes (possibly as a batch for efficiency). The feed read then is a single partition query (cheap).
- Follow relationship: They store a mapping table with PK=UserID, SK=FollowerID for each follower or vice versa. That allows queries like "who do I follow" or "who follows me" easily (though "who follows me" could be millions – might not often be fetched fully).
- Indexes: Could have GSI on follower to invert that if needed. Also maybe GSI for searching users by name prefix etc.
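A sketch of the fan-out-on-write path referenced above, assuming a Feeds table (PK = FollowerID) plus a hypothetical Follows mapping table; pagination of the follower query and error handling are omitted:

```python
import boto3
from boto3.dynamodb.conditions import Key

ddb = boto3.resource("dynamodb")
feeds = ddb.Table("Feeds")
follows = ddb.Table("Follows")  # assumed mapping table: PK = UserID, SK = FollowerID

def fan_out_post(author_id: str, post_time: str, post_id: str) -> None:
    """Copy a pointer to a new post into every follower's feed partition."""
    followers = follows.query(KeyConditionExpression=Key("UserID").eq(author_id))["Items"]
    with feeds.batch_writer() as batch:  # groups writes into 25-item batch requests
        for f in followers:
            batch.put_item(
                Item={
                    "FollowerID": f["FollowerID"],
                    "FeedKey": f"{post_time}#{author_id}",  # PostTime#AuthorID sort key
                    "PostID": post_id,
                }
            )

def read_feed(user_id: str, n: int = 50):
    """A user's feed is a single-partition query, newest first."""
    resp = feeds.query(
        KeyConditionExpression=Key("FollowerID").eq(user_id),
        ScanIndexForward=False,
        Limit=n,
    )
    return resp["Items"]
```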
Result: A system like this could support massive scale. For example, previously, Amazon’s own teams used DynamoDB for the Twitch chat system (lots of fan-out with messages, slight variant). Also, implementations like Reddit’s or Instagram’s feed could conceptually map to this. A common pitfall in this area: doing uncontrolled fan-out that overwhelms throughput. They must monitor and possibly throttle how many writes a single post triggers and perhaps break it into batches with pauses to not hog capacity. Dynamo’s ability to horizontally scale means if you have provisioned capacity high enough, it can do it, but cost might blow up if one user with 10M followers posts frequently.
Lessons/Pitfalls: They discovered that storing feed as a single item (with an array of postIDs) was bad because it continually grows and is a hotspot when all followers try to update it. So they correctly stored each feed entry as an item (PK follower, SK time). They also made sure to use sparse indexes where needed (e.g., maybe an index for “posts with hashtags” to support a hashtag search, but only posts that have that hashtag appear there, which might be a GSI with PK=Hashtag, SK=PostTime, and only posts with that tag have the attribute).
Summary of Domain Outcomes:
- Retail/E-commerce: Achieved scale on order processing using single-table design and GSIs. Saw huge reduction in infrastructure needed (as Amazon noted, DynamoDB allowed them to avoid massive fleets of relational DBs for peak). Common pitfalls avoided include hot keys by always partitioning by high-card fields like customer or order ID.
- Gaming: DynamoDB proved capable of scaling to millions of ops/sec. For instance, the case of FanFight: 1M writes/sec, 50% cost reduction, 4x revenue increase due to scalability. Keys were designed to avoid any single partition bottlenecks (game+region as keys, etc).
- IoT: Companies like Netflix (for some monitoring systems) and other IoT platforms use similar time-series patterns. They can handle continuous ingestion and quick per-device queries. Pitfall: if design had partition key = “SensorType”, e.g., all temperature sensors in one partition, it would fail; they wisely choose device or a compound with device to distribute.
- Finance: Firms like Capital One have talked about using DynamoDB for extreme transactions per second after moving away from mainframes. They needed careful use of transactions and keys to maintain integrity. PayPay’s example shows confidence in Dynamo for high-volume, reliable transactions (the mention of 300M messages daily implies stable high throughput).
- Social Media: DynamoDB was used in Amazon’s own social features (e.g., Amazon Follow, etc.) and others. It’s less public how, but Reddit’s devs have discussed using Cassandra (similar modeling needed). Designing for high fan-out is tricky, but Dynamo’s elastic throughput can handle big spikes if properly provisioned.
Common Pitfalls Observed and Solved in These Cases:
- Hot Keys: The recurring theme: avoid them. The solutions: partition by user, device, etc., or add randomization. For example, a Reddit-like system might partition comments by parent post, but if one post gets huge, that partition can run hot; they might randomize SK prefixes or break the key into “shards” (a write-sharding sketch follows this list).
- Large Items/Collections: Solutions included splitting into multiple items (e.g., one item per list element vs one item with huge list).
- Too Many Indexes: Some initial designs might add a GSI for every queryable field. This inflated costs. They learned to optimize by limiting indexes to truly necessary and sometimes using a composite key or filter for less frequent queries to avoid an index.
- Single-Table Complexity: Some teams struggled with the mental model (learning curve mentioned by Alex DeBrie). But those who mastered it (like the Amazon retail teams or certain game devs) reaped major performance gains. One has to invest in understanding access patterns deeply.
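As referenced in the Hot Keys item, a sketch of write sharding for one hot logical key, borrowing the CHAT#room1#shardN naming used in the best practices section below (the shard count and table name are assumptions):

```python
import random
import boto3
from boto3.dynamodb.conditions import Key

NUM_SHARDS = 8  # fixed shard count agreed between writers and readers (assumption)
table = boto3.resource("dynamodb").Table("Chat")  # hypothetical table

def write_message(room: str, ts: str, body: str) -> None:
    """Spread writes for one hot logical key across N physical partition keys."""
    shard = random.randrange(NUM_SHARDS)
    table.put_item(Item={"PK": f"CHAT#{room}#shard{shard}", "SK": ts, "Body": body})

def read_messages(room: str, since: str):
    """Read path: query every shard and merge by sort key on the client."""
    items = []
    for shard in range(NUM_SHARDS):
        resp = table.query(
            KeyConditionExpression=Key("PK").eq(f"CHAT#{room}#shard{shard}")
            & Key("SK").gte(since)
        )
        items.extend(resp["Items"])
    return sorted(items, key=lambda item: item["SK"])
```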
We have seen through these examples that DynamoDB’s primary key design, when done right, leads to remarkable performance at scale and low operational burden. When done wrong, it can cause throttling or uneven performance – but those are fixable by schema refactoring and using the patterns we discussed.
Finally, we conclude with a summary of best practices and a look at trends.
12. Best Practices and Emerging Trends
Drawing from the discussions above, here is a consolidated list of best practices for DynamoDB primary key design and usage:
-
Design for Access Patterns (“Query-First”): Always start by identifying how the data will be accessed (lookup by what fields, sorted by what, related items fetched together, etc.). Use this to choose your partition key and sort key. If you cannot satisfy an important access pattern with the primary key alone, plan for a GSI or data duplication to cover it. Avoid needing scans.
-
Choose High-Cardinality Partition Keys: Ensure the partition key has a very large number of possible values (and ideally, the workload touches many of them uniformly). Good candidates: user IDs, order IDs, device IDs, session IDs, etc. Bad candidates: boolean flags, single “global” keys, dates (alone), small categories. If one partition key might become a bottleneck, add a composite element or shard it.
-
Leverage Composite Keys: Use sort keys to model 1-to-many relationships and allow range queries (e.g., date ranges, prefixes). Encode multiple pieces of info in the sort key (with delimiters) to maximize query flexibility (like Type#ID or Year#Month#ID). This can eliminate the need for secondary indexes in some cases and keeps related data in the same item collection for efficient retrieval.
-
Single-Table Design for Multiple Entities: Consider a single table for all related entities in your system to enable rich query patterns without JOINs. Use a “type” prefix in keys to distinguish item types (e.g., USER#123, ORDER#12345). Co-locate entities that are frequently accessed together under the same partition key when possible (like storing an order and its order items in one partition). This reduces cross-partition queries and exploits DynamoDB’s fast single-partition operations.
-
Embrace Denormalization: Duplicate data where needed to avoid expensive multi-fetch operations. For example, store an item’s parent name or category in the item even if it’s stored elsewhere, if you often need that when fetching the item. Maintain consistency via conditional writes or transactions, or accept eventual consistency for that field if not critical.
-
Use Secondary Indexes Wisely: Create GSIs to support query patterns that the primary key cannot (alternate lookup keys). But avoid creating too many GSIs; each one adds overhead. Aim to reuse indexes for multiple query types if possible by using generic keys (overloading). Project only necessary attributes to indexes to save space. Use LSIs if you need a different sort key for the same partition (e.g., to sort user’s data by a different attribute), remembering the 10GB item collection limit.
-
Prevent Hot Partitions: Distribute writes and reads. If one partition key starts to dominate, consider write sharding (appending a random or hashed suffix). For example, instead of PK = “CHAT#room1”, use PKs = “CHAT#room1#shard1”…“shardN” and query all N in parallel when reading. Or break time into buckets. For reads, if one key is extremely hot (e.g., a popular celebrity’s feed), you might need to cache or replicate that data to reduce direct hits.
-
Small, Focused Items: Keep item size small as possible for efficiency. Don’t stuff an item with rarely used data; split it out. Avoid unbounded growth of item attributes (use multiple items instead when data grows). Use pagination for large lists (store as multiple items or pages).
-
Use Transactions for Complex Operations: If you need to update multiple items as a unit (maintaining invariants), use TransactWriteItems rather than trying to do two-phase commit manually. Design your keys such that the items can be identified and locked easily in the transaction. For instance, have all related items’ keys if you need to update them together (or know how to derive them to include in the transaction request). Keep transactions small (limit to a few items) and avoid long-running patterns that might conflict often.
-
Exploit Condition Expressions: For idempotency and concurrency, use conditions on writes (e.g., “only insert if not exists” for uniqueness, or “only update if version = X” for optimistic locking). This ties into key design by using something like a version number attribute or choosing unique keys for natural idempotency (like using a request ID as part of key so duplicate requests don’t create duplicates).
-
Plan for Data Lifecycles: If data accumulates, plan how to archive or remove it. Use TTL on items to expire them automatically if appropriate (e.g., session logs, temp data). Partition by date if you need to drop old partitions easily (like separate tables per year or partition key includes year-month). This keeps table slim for hot data and you can offload cold data to cheaper storage.
-
Monitor and Adapt: Continuously monitor CloudWatch metrics (ConsumedCapacity, Throttles, etc.). If certain keys are getting throttled, revisit your key design or use Adaptive Capacity as a hint (it might be splitting partitions, indicating a key is hot). Be ready to add an index or cache if a new query pattern emerges that isn’t well-supported by current schema.
-
Security-Aware Keys: Include tenant or user identifiers in keys to support IAM fine-grained access control. Do not embed secrets or sensitive personal data in keys (which could appear in logs). Use random or hashed keys to prevent guessability if necessary.
-
Testing at Scale: Before finalizing, simulate with large key spaces and access distributions. Tools like DynamoDB’s NoSQL Workbench can help visualize data model and access patterns. Ensure queries remain efficient as data grows (e.g., query that always does a full partition scan of thousands of items might be fine at start but not later; consider adding sort key conditions or indexes preemptively).
-
Document the Data Model: DynamoDB’s flexibility is a double-edged sword – it’s easy to forget how data is structured especially with single-table designs. Clearly document what each entity’s keys look like, what indexes exist and their keys, and what queries are supported. This prevents accidental misuse that could degrade performance (like someone unintentionally scanning because they didn’t realize a GSI existed for that attribute).
-
Stay Updated on New Features: AWS frequently releases enhancements. For instance, DynamoDB Accelerator (DAX) – an in-memory cache for Dynamo – can be added if read latency needs to drop further for certain patterns (with essentially the same key lookups). DynamoDB Global Tables allow multi-region active-active setups – if using these, you must factor in conflict resolution (last-writer-wins) and maybe include region in keys if you want to avoid conflicts. Also newer features like Standard-Infrequent Access table class can cut storage costs for less accessed data, so you might separate data into another table with that class rather than keeping it all in one.
-
Evaluate Single vs Multiple Table Approach: AWS now clarifies that single-table design is great when access patterns are well known and overlapping, but using multiple tables is fine if it makes sense (don’t cram unrelated data together). The key is to not make more tables than necessary for relational reasons – but separating by domain or usage pattern can simplify development. Use multiple tables when different parts of data have no shared access patterns and different lifecycle (e.g., config vs logs).
Emerging Trends:
- Integration with other services: Many architectures use DynamoDB streams to trigger Lambda functions (for reactive programming). We see an increasing trend of using streams + Lambda to maintain secondary indexes manually for complex stuff, or to do multi-table fan-out. The key design is still central, but now “event-driven” usage means your keys might also be chosen to make event processing easier (like including certain fields in keys so the stream consumer can quickly identify event type).
- NoSQL Modeling Tools: More tooling and ORMs (like Amazon’s NoSQL Workbench, or open-source mappers) are emerging to assist with DynamoDB modeling. These sometimes allow you to define an ER model and then simulate a single-table design. They follow the best practices under the hood, but as they evolve, developers might design at a higher level and rely on such tools to implement keys correctly.
- Adaptive Capacity improvements: AWS keeps improving how adaptive capacity mitigates hot keys. It's essentially making DynamoDB more forgiving to uneven access. Some now say you can worry a bit less about minor skews (as adaptive capacity will handle micro-hotspots). However, it still won’t fix a badly designed key pattern where one key handles most traffic. But the threshold for pain might increase (e.g., maybe now it can handle one key using 20% of traffic fine, whereas years ago that might throttle). Always good to stay updated via AWS blogs.
- Global Tables usage: More apps require multi-region high availability. DynamoDB Global Tables have matured. Key design remains the same, but conflict handling is an app concern if concurrent writes to same key in different regions. One trend is designing keys to be “region-scoped” or using strategies like each region writes to its own partition of data and then replicate. Another approach: attach a “region” or “source” marker to data to later reconcile. This is an advanced scenario but increasingly relevant as global apps adopt Dynamo for its multi-region story.
- Beyond Core DynamoDB: Some advanced patterns combine DynamoDB with other AWS services: e.g., using PartiQL (SQL-compatible query language for Dynamo) for ad-hoc queries – though PartiQL doesn’t change key design, it just provides another way to query (it still needs keys unless scanning). Also, Amazon is bridging analytics with Dynamo via services like Amazon Athena federation (query DynamoDB with SQL for analytical purposes). These don’t change OLTP design but provide relief for OLAP queries – meaning you don’t have to contort your keys to serve analytical queries, you can rely on export/athena for that.
In conclusion, Amazon DynamoDB requires thoughtful data modeling – your primary key and secondary indexes form the backbone of a scalable design. By following the best practices of uniform key distribution, modeling data to fit your queries, and leveraging features like GSIs, LSIs, and transactions, you can build applications that seamlessly scale from zero to millions of users with consistent low latency. Many real-world success stories – from e-commerce giants to mobile games – attest that investing in good key design upfront pays off massively in terms of performance, scalability, and even cost optimization. Keep learning from the community and AWS updates, as DynamoDB evolves and new patterns emerge, and you’ll be well-equipped to harness DynamoDB’s full potential in your own projects.
Sources: The guidance and examples above reference key insights from AWS’s DynamoDB documentation and expert blog posts, including best practice guides on partition keys, recommendations from DynamoDB creators on single-table design, as well as real-world case studies of DynamoDB usage at scale which illustrate these principles in action.