Scaling, Partitioning & Performance Quiz

Q1. EASY: In a streaming system, which metric best indicates that consumers are not keeping up with producers?
A growing consumer lag (increasing backlog of unprocessed messages).
A high 99th percentile (p99) processing latency.
A high producer throughput (messages per second).
An increased number of partitions in the system.
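
As a rough illustration of how consumer lag is usually measured, the sketch below subtracts each partition's last committed offset from its log end offset. The function name and the offset dictionaries are hypothetical and do not come from any particular client library.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Return (per-partition lag, total lag).

    end_offsets: partition id -> offset of the next message to be written.
    committed_offsets: partition id -> next offset the group will read.
    A partition with no committed offset is assumed to start at 0.
    """
    lag = {
        partition: end_offsets[partition] - committed_offsets.get(partition, 0)
        for partition in end_offsets
    }
    return lag, sum(lag.values())


# Hypothetical snapshot of a two-partition topic.
per_partition, total = consumer_lag(
    end_offsets={0: 100, 1: 250},
    committed_offsets={0: 90, 1: 200},
)
print(per_partition, total)
```

A single snapshot says little on its own; it is a total that keeps rising across successive snapshots that suggests consumers are falling behind.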

Q2. EASY: What is one effect of using large batch sizes and compressing messages in a messaging system?
Improved throughput and lower latency per message.
Lower network usage with no impact on latency.
Higher overall throughput but potentially increased end-to-end latency.
Reduced throughput and improved latency per message.

Q3. EASY: If one partition in a distributed system becomes a hot partition (receiving much more traffic than others), how can you mitigate this hotspot?
Increase the partition's replication factor to spread load.
Reduce the total number of partitions to concentrate resources.
Route all traffic through a single broker to avoid imbalance.
Modify the partitioning strategy (e.g., add a random key prefix) to better distribute traffic.
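
A minimal sketch of what adding a random key prefix ("salting") looks like, assuming a simple hash-mod partitioner. The helper names, the key `hot-customer`, and the bucket counts are illustrative assumptions only.

```python
import hashlib
import random
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (assumed partitioner)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_partitions


def salted_key(key: str, salt_buckets: int, rng: random.Random) -> str:
    """Prefix the key with a random bucket number to spread one hot key."""
    return f"{rng.randrange(salt_buckets)}:{key}"


rng = random.Random(42)
NUM_PARTITIONS = 8

# Without salting, every message for the hot key lands on one partition.
plain = Counter(partition_for("hot-customer", NUM_PARTITIONS) for _ in range(1000))

# With salting, the same traffic is spread across several partitions.
salted = Counter(
    partition_for(salted_key("hot-customer", 8, rng), NUM_PARTITIONS)
    for _ in range(1000)
)
print(len(plain), len(salted))
```

The cost of this approach is that per-key ordering is no longer guaranteed, and readers must combine results across all salt buckets for a key.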

Q4. EASY: In a messaging pipeline, what is the main purpose of implementing back-pressure?
To maximize throughput by sending messages at the fastest rate possible.
To prevent producers from overwhelming slower consumers.
To replicate messages to multiple consumers simultaneously.
To randomly drop messages when load is high.
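
One common way to signal back-pressure is a bounded buffer between producer and consumer. The sketch below uses Python's standard `queue.Queue`; the `try_produce` helper and the capacity of 2 are hypothetical choices for illustration.

```python
import queue


def try_produce(buffer: queue.Queue, item) -> bool:
    """Try to enqueue without blocking; False means the producer should back off."""
    try:
        buffer.put_nowait(item)
        return True
    except queue.Full:
        return False


buffer = queue.Queue(maxsize=2)  # bounded: at most 2 unprocessed items

# The producer is faster than the consumer: only the first two sends fit.
accepted = [try_produce(buffer, i) for i in range(4)]

# Once the consumer takes an item, the producer may send again.
consumed = buffer.get_nowait()
accepted_after_consume = try_produce(buffer, 99)
print(accepted, consumed, accepted_after_consume)
```

A blocking `put()` would give the same effect by pausing the producer instead of returning `False`.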

Q5. EASY: What is a typical trade-off when replicating data across multiple availability zones or regions?
Improved fault tolerance and availability, but higher network latency.
Lower latency for all users due to multiple data copies.
Elimination of the need for data backups or durability measures.
Reduced data durability because of long-distance replication.

Q6. MEDIUM: Which partitioning strategy minimizes data movement when new partitions or nodes are added?
Hashing keys by taking the key mod the current number of partitions.
Partitioning by sorted key ranges.
Using a single partition for all data.
Using a consistent hashing scheme for key distribution.
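
To make the data-movement question concrete, the sketch below measures what fraction of keys change owner when a cluster grows from 4 to 5 nodes, under hash-mod-N and under a simple consistent hash ring. The `HashRing` class, the node names, and the choice of 100 virtual nodes per node are assumptions for illustration, not any specific system's implementation.

```python
import hashlib
from bisect import bisect_right


def stable_hash(value: str) -> int:
    """Deterministic hash, so results do not vary between runs."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def modulo_owner(key: str, num_nodes: int) -> int:
    """Hash-mod-N placement: the owner depends directly on the node count."""
    return stable_hash(key) % num_nodes


class HashRing:
    """Consistent hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def owner(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect_right(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[idx][1]


def moved_fraction(keys, before, after) -> float:
    """Fraction of keys whose owner differs between two placement functions."""
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)


keys = [f"key-{i}" for i in range(10_000)]

mod_moved = moved_fraction(
    keys, lambda k: modulo_owner(k, 4), lambda k: modulo_owner(k, 5)
)

ring4 = HashRing([f"node-{i}" for i in range(4)])
ring5 = HashRing([f"node-{i}" for i in range(5)])
ring_moved = moved_fraction(keys, ring4.owner, ring5.owner)

print(f"mod-N moved: {mod_moved:.0%}, ring moved: {ring_moved:.0%}")
```

With the ring, keys that change owner move only to the newly added node, while hash-mod-N reshuffles keys among all nodes.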

Q7. MEDIUM: In a system with consumer groups, what is the purpose of tracking an offset for each group?
To evenly distribute messages across all partitions in the cluster.
To throttle producers from sending messages too quickly.
To keep track of the last consumed position in each partition for that group.
To measure the network delay between producers and consumers.

Q8. MEDIUM: In a replicated messaging cluster, how is a new leader chosen for a partition after its leader fails?
All client applications vote to elect a new partition leader.
An in-sync follower replica is promoted to become the new leader.
The partition remains unavailable until the original leader restarts.
The broker with the longest uptime takes over as leader automatically.

Q9. MEDIUM: Why do many high-throughput systems use an append-only log on disk instead of random writes or only in-memory storage?
In-memory storage ensures data will persist even after a server restart.
Sequential disk writes are faster than random writes and, unlike memory-only storage, still provide durability.
Random writes on modern SSDs are faster than sequential writes.
Using an append-only log eliminates the need for any memory caching.

Q10. MEDIUM: Compared to standard queues, what is a known limitation of Amazon SQS FIFO queues?
They cannot preserve the ordering of messages.
They do not support sending messages in batches.
They have a much lower default maximum throughput (around 300 messages per second without batching).
They are not replicated across multiple availability zones.

Q11. MEDIUM: In Apache Kafka, what happens during a consumer group rebalance event?
Consumers pause consuming messages while partition assignments are rearranged.
All committed offsets for the group are reset to the earliest position.
No new messages can be produced to the topic during the rebalance.
The replication factor of each partition is automatically increased.

Q12. HARD: In the context of high-throughput messaging, what does 'zero-copy' data transfer mean?
Using specialized hardware so that message latency is virtually zero.
Moving data from disk to network without copying it through user-space memory.
Replicating messages to a backup node with zero CPU overhead.
Compressing messages so they occupy zero space until needed.

Q13. HARD: In a replicated log, when would a follower be removed from the In-Sync Replicas (ISR) set?
When it falls too far behind the leader in keeping up with new messages.
When it acknowledges writes faster than the leader does.
When it is promoted to leader for a different partition.
When it runs a newer software version than the leader.

Q14. HARD: Which design choice helps a messaging system avoid back-pressure issues?
Using an unbounded queue on the consumer side to buffer incoming messages.
Having producers push messages to consumers as quickly as possible.
Disabling acknowledgments to maximize producer throughput.
Letting consumers pull messages from the queue at their own pace.

Q15. HARD: Requiring acknowledgments from all in-sync replicas (acks=all) instead of only the leader (acks=1) will generally:
Reduce durability of messages but significantly improve throughput.
Have no effect on overall throughput or message loss risk.
Increase reliability (less risk of data loss) but at the cost of some throughput.
Improve throughput by writing to all replicas in parallel.
