Message Bus & Event-Driven Architecture
A comprehensive deck covering core message bus concepts, delivery guarantees, reliability patterns, distributed transactions, schema versioning, observability, pitfalls, and real-world implementations.
1. Why does a growing microservices system often abandon direct API calls in favor of a message bus?
Direct synchronous calls create tight coupling, cascading failures, and “integration spider-webs.” A message bus buffers work, decouples producers/consumers in time and space, and simplifies many-to-many interactions.
2. What are the four fundamental building blocks of a message bus?
• Bus & channels (topics/queues)
• Message envelopes (payload + metadata)
• Publishers (producers)
• Handlers (consumers/subscribers)
3. What key metadata often lives in a message envelope?
Unique ID, type/name, timestamp/origin, routing key, correlation ID, and schema/version headers.
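As an illustration, the envelope fields listed above could be modeled roughly as follows. This is a minimal sketch, not a standard format: the `Envelope` class, its field names, and the `reply_to` helper are all hypothetical.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Envelope:
    """Hypothetical envelope: a payload plus the metadata a bus typically needs."""
    message_type: str
    payload: Dict[str, Any]
    routing_key: str = ""
    correlation_id: Optional[str] = None
    schema_version: int = 1
    origin: str = "unknown-service"  # assumed placeholder for the sending service
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # Serialize metadata and payload together for transport.
        return json.dumps(asdict(self), sort_keys=True)


def reply_to(original: Envelope, message_type: str, payload: Dict[str, Any]) -> Envelope:
    """Build a follow-up message that stays in the same logical flow."""
    return Envelope(
        message_type=message_type,
        payload=payload,
        routing_key=original.routing_key,
        # Reuse the correlation ID; fall back to the first message's own ID.
        correlation_id=original.correlation_id or original.message_id,
    )
```

Each message gets a fresh `message_id`, while `correlation_id` is carried forward so that every hop in one flow can be grouped later.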
4. One-to-one or one-to-many: which bus type is which?
• Command bus ⇒ one-to-one (single handler)
• Event bus ⇒ one-to-many (fan-out)
5. What happens if no handler exists for a command?
It’s typically considered an error—the sender expected an action to occur.
6. What happens if no handler exists for an event?
Nothing; the publisher is unaffected. Events are notifications of fact, not requests.
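The difference between the two bus types in cards 4 to 6 can be sketched with a small in-process example. The class names (`CommandBus`, `EventBus`, `NoHandlerError`) are hypothetical and stand in for whatever a real library provides.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class NoHandlerError(Exception):
    """Raised when a command is sent but nothing is registered to handle it."""


class CommandBus:
    """One-to-one: each command name maps to exactly one handler."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[Any], Any]] = {}

    def register(self, name: str, handler: Callable[[Any], Any]) -> None:
        if name in self._handlers:
            raise ValueError(f"command {name!r} already has a handler")
        self._handlers[name] = handler

    def send(self, name: str, payload: Any) -> Any:
        handler = self._handlers.get(name)
        if handler is None:
            # The sender expected an action, so a missing handler is an error.
            raise NoHandlerError(name)
        return handler(payload)


class EventBus:
    """One-to-many: zero or more subscribers per event name."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, name: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[name].append(handler)

    def publish(self, name: str, payload: Any) -> int:
        handlers = self._subscribers.get(name, [])
        for handler in handlers:
            handler(payload)
        # Zero subscribers is fine: the event is a fact, not a request.
        return len(handlers)
```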
7. Which bus type generally carries stronger logical coupling and why?
Command bus, because the sender assumes a specific responsibility will be fulfilled.
8. What primary benefit does asynchronous messaging provide?
The producer proceeds immediately; failures or slowness downstream don’t block it.
9. Name two scenarios where synchronous messaging over a bus is useful.
• In-process mediator patterns (method-call replacement)
• Request-reply workflows that need an immediate answer
10. Define at-most-once delivery.
A message is delivered 0 or 1 times—no retries; possible loss.
11. Define at-least-once delivery.
A message is redelivered until it is acknowledged, so it arrives at least once and may arrive more than once (duplicates).
12. Why is true exactly-once delivery difficult, and how is it usually approximated?
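When an acknowledgment is lost, the sender cannot tell whether the message was processed, so it must either resend (risking a duplicate) or give up (risking loss). Systems therefore use at-least-once delivery plus deduplication or idempotent processing to achieve “effectively once.”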
13. How do idempotent consumers mitigate at-least-once duplicates?
They detect duplicate message IDs or design operations so re-execution has no additional effect.
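A duplicate-detecting consumer might look like the sketch below. The `IdempotentConsumer` class is hypothetical, and the in-memory set is an assumption made for brevity; a real service would need a durable store shared across restarts and instances.

```python
from typing import Any, Callable, Dict, Set


class IdempotentConsumer:
    """Wraps a handler and skips messages whose ID was already processed."""

    def __init__(self, handler: Callable[[Dict[str, Any]], None]) -> None:
        self._handler = handler
        # Assumption: in-memory only. Production code needs durable storage.
        self._seen: Set[str] = set()

    def handle(self, message_id: str, payload: Dict[str, Any]) -> bool:
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._handler(payload)
        # Record the ID only after the handler succeeds, so a failed
        # attempt can still be retried on redelivery.
        self._seen.add(message_id)
        return True
```

Note that recording the ID and applying the side effect are not atomic here; where that matters, both would have to be written in the same transaction.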
14. What is a dead-letter queue (DLQ)?
A quarantine channel where messages go after repeated processing failures, for manual inspection or special handling.
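One way to picture the DLQ flow is a loop that retries a failing message a fixed number of times and then sets it aside. The function name `process_with_dlq` and the `max_attempts` default are illustrative assumptions; real brokers usually expose this as configuration rather than application code.

```python
from collections import deque
from typing import Any, Callable, Dict, Iterable, List


def process_with_dlq(
    messages: Iterable[Any],
    handler: Callable[[Any], None],
    max_attempts: int = 3,
) -> List[Dict[str, Any]]:
    """Process messages; return the ones that were dead-lettered."""
    queue = deque((message, 0) for message in messages)
    dead_letters: List[Dict[str, Any]] = []
    while queue:
        message, attempts = queue.popleft()
        try:
            handler(message)
        except Exception as exc:
            attempts += 1
            if attempts >= max_attempts:
                # Quarantine with enough context for later inspection.
                dead_letters.append(
                    {"message": message, "attempts": attempts, "error": repr(exc)}
                )
            else:
                # Requeue at the back so healthy messages are not blocked.
                queue.append((message, attempts))
    return dead_letters
```

Requeuing at the back keeps one poison message from stalling the rest, at the cost of strict ordering.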
15. Give two reasons ordering may break even on an ordered queue.
• Multiple parallel consumers
• Sharding/partitioning of topics
16. Why employ exponential backoff on retries?
To avoid hammering a struggling service and worsening the outage.
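A retry schedule with exponential backoff can be computed as in this sketch. The parameter names and defaults (`base`, `cap`, `jitter`) are assumptions chosen for illustration; the optional "full jitter" variant draws each delay uniformly between zero and the computed value so that many clients do not retry in lockstep.

```python
import random
from typing import List, Optional


def backoff_delays(
    max_attempts: int,
    base: float = 0.5,
    cap: float = 30.0,
    jitter: bool = False,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the delay in seconds to wait before each retry attempt."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_attempts):
        # Double the delay on every attempt, but never exceed the cap.
        delay = min(cap, base * (2 ** attempt))
        if jitter:
            delay = rng.uniform(0, delay)
        delays.append(delay)
    return delays
```

For example, `backoff_delays(5, base=1.0, cap=10.0)` yields `[1.0, 2.0, 4.0, 8.0, 10.0]`.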
17. What problem do sagas address?
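Maintaining consistency across multiple services without two-phase commits, by chaining local transactions and undoing completed steps with compensating actions when a later step fails.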
18. Contrast saga choreography with saga orchestration.
• Choreography: events trigger each step, no central controller
• Orchestration: a coordinator issues commands and decides on compensations
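The orchestration style can be sketched as a coordinator that runs steps in order and, on failure, runs the compensations of the completed steps in reverse. The `run_saga` function and the step names in the usage note are hypothetical; a real orchestrator would also persist its progress so it can resume after a crash.

```python
from typing import Callable, List, Tuple

# Each step is (name, action, compensation).
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    log: List[str] = []
    for name, action, compensate in steps:
        try:
            action()
            log.append(f"done:{name}")
            completed.append((name, action, compensate))
        except Exception:
            log.append(f"failed:{name}")
            # Undo the most recent successful step first.
            for done_name, _, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            return False, log
    return True, log
```

With a `reserve_stock` step that succeeds and a `charge_card` step that raises, the log would read `done:reserve_stock`, `failed:charge_card`, `compensated:reserve_stock`.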
19. What is the Transactional Outbox pattern used for?
Atomically persisting both DB changes and the event that announces them, preventing “write-to-DB-but-lost-event” gaps.
20. Why is a message schema a 'public API'?
Multiple independent services rely on its structure and semantics.
21. What is the safest first choice when evolving a message schema?
Make backward-compatible additions (e.g., add optional fields).
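To make the "add optional fields" advice concrete, here is a sketch of a consumer that accepts both an older and a newer version of the same event. The `OrderPlaced` event and its `coupon_code` field are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class OrderPlaced:
    order_id: int
    total: float
    # Hypothetical field added in version 2; optional so that
    # version 1 messages still parse.
    coupon_code: Optional[str] = None


def parse_order_placed(raw: Dict[str, Any]) -> OrderPlaced:
    """Tolerant reader: required fields must exist, unknown fields are ignored."""
    return OrderPlaced(
        order_id=raw["order_id"],
        total=raw["total"],
        coupon_code=raw.get("coupon_code"),
    )
```

Because the parser reads only the fields it knows and defaults the new one, old producers and new producers can coexist while consumers are upgraded.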
22. How does a schema registry help?
It stores every schema version and checks each new version against compatibility rules, so producers can’t register breaking changes and consumers can look up the exact schema a message was written with.
23. Name three bus-level metrics worth monitoring.
Queue/topic length, consumer lag, throughput (msgs/sec or bytes/sec).
24. What role do correlation IDs play in debugging?
They stitch together logs/traces across asynchronous hops to rebuild end-to-end flow.
25. Define back-pressure.
Mechanisms that slow or block producers when consumers fall behind, preventing unbounded queue growth or OOM failures.
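A bounded buffer is the simplest form of back-pressure. The sketch below uses the standard-library `queue.Queue`; the `try_publish` helper is hypothetical and only shows the producer-side choice between rejecting immediately and blocking for a limited time.

```python
import queue
from typing import Any


def try_publish(buffer: queue.Queue, message: Any, timeout: float = 0.0) -> bool:
    """Put a message on a bounded buffer; return False if it is still full."""
    try:
        if timeout > 0:
            # Block the producer for up to `timeout` seconds waiting for space.
            buffer.put(message, timeout=timeout)
        else:
            # Fail fast so the caller can shed load or retry later.
            buffer.put_nowait(message)
        return True
    except queue.Full:
        return False
```

With `queue.Queue(maxsize=2)`, the third publish is rejected until a consumer calls `get()`, which keeps memory use bounded.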
26. What is hidden coupling in an event-driven system?
Undocumented dependencies on event order, timing, or schema that break when any publisher or consumer changes.
27. Why can the message bus itself become a single point of failure?
All traffic flows through it; if the broker cluster is down, inter-service communication halts.
28. How does complexity creep manifest in event-driven designs?
Circular dependencies, hard-to-trace workflows, and difficulty predicting the impact of a change.
29. Give one key trait of a lightweight in-process bus (e.g., MediatR/EventEmitter).
Runs entirely in memory inside one process—great for modular monolith decoupling, not cross-service.
30. Why is a Kafka-style log considered an 'event store'?
It durably retains the stream, ordered within each partition, for as long as the retention policy allows, so consumers can replay history or join late.
31. What integration tasks might a traditional Enterprise Service Bus (ESB) handle beyond routing?
Message transformation, protocol bridging, policy enforcement, and workflow orchestration.
32. Command vs. Event—Which implies required business action?
Command.
33. Command vs. Event—Which is safer to publish without expecting responses?
Event.
34. At-least-once vs. At-most-once—Which demands idempotency?
At-least-once (due to duplicate risk).
35. Ordered processing—Which two common strategies enforce it?
• Single consumer per queue
• Partition-by-key with per-partition FIFO
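Partition-by-key can be sketched as hashing the key to pick a partition, so that all messages for one key land in the same FIFO. The SHA-256 based `partition_for` function is an assumption for illustration and is not the partitioner any particular broker uses.

```python
import hashlib
from typing import Any, List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index with a stable hash."""
    # Python's built-in hash() is randomized per process, so use hashlib.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def route(
    messages: List[Tuple[str, Any]], num_partitions: int
) -> List[List[Tuple[str, Any]]]:
    """Append each (key, payload) to its partition, preserving arrival order."""
    partitions: List[List[Tuple[str, Any]]] = [[] for _ in range(num_partitions)]
    for key, payload in messages:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions
```

Ordering is guaranteed only per key; messages with different keys may be processed in any relative order, and changing `num_partitions` remaps keys.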
36. DLQ threshold—Why keep it low?
A low alert threshold surfaces the first few dead-lettered messages; a spike usually indicates a systemic failure, and a silent buildup hides critical issues.
37. Saga orchestration trade-off?
Central logic visibility vs. a new single point of control.
38. Schema change—Add field vs. rename field: which one breaks consumers?
Rename (non-backward compatible).
39. Back-pressure in pull vs. push systems—Who controls the pace?
• Pull: consumer
• Push: broker, which needs flow control (e.g., prefetch limits or blocking publishers) so it doesn’t overwhelm consumers
40. List three elements you’d log for robust traceability of each message.
• Correlation ID
• Message type/version
• Processing outcome (success/error + duration)
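The three logged elements above could be emitted as one structured record per message, as in this sketch. The `handle_with_trace` wrapper and the envelope keys (`correlation_id`, `type`, `version`, `payload`) are assumptions; in a real consumer the error branch would also trigger the retry or dead-letter policy.

```python
import json
import time
from typing import Any, Callable, Dict


def handle_with_trace(
    envelope: Dict[str, Any],
    handler: Callable[[Any], None],
    emit: Callable[[str], None] = print,
) -> Dict[str, Any]:
    """Run the handler and emit one JSON log line describing the outcome."""
    started = time.perf_counter()
    outcome, error = "success", None
    try:
        handler(envelope["payload"])
    except Exception as exc:
        # Record the failure here; retry/DLQ handling is out of scope.
        outcome, error = "error", repr(exc)
    record = {
        "correlation_id": envelope["correlation_id"],
        "message_type": envelope["type"],
        "schema_version": envelope["version"],
        "outcome": outcome,
        "error": error,
        "duration_ms": round((time.perf_counter() - started) * 1000, 3),
    }
    emit(json.dumps(record))
    return record
```

Emitting JSON rather than free text lets a log aggregator filter by `correlation_id` and reconstruct a flow across services.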
41. If a high-priority queue starts lagging, name two immediate mitigation steps.
• Scale consumer instances
• Throttle lower-priority producers
42. True or False: Exactly-once delivery rules out the need for idempotency.
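False. Exactly-once guarantees usually cover only the broker’s own scope, so side effects in external systems can still repeat; defensive idempotency also protects against logic or configuration errors.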
43. In an event bus, why is 'publish-subscribe' called loosely coupled yet still risky?
Publishers don’t know receivers, but all share implicit schema/semantic contracts that can break silently.
44. What does 'D.R.I.V.E.' remind you to design for in a message bus?
Decoupling, Reliability, Idempotency, Versioning, Exposure (observability).
45. Summarize the golden rule for message schemas in one sentence.
“Change slowly, version explicitly, and stay backward-compatible whenever humanly possible.”