The architecture diagram that launched a thousand Kafka clusters
Somewhere, right now, a team is drawing an architecture diagram. There are boxes for services. There are arrows between them. And in the middle, there's a big rectangle labeled "Kafka" or "Event Bus" or "Message Broker."
Everyone nods approvingly. This is event-driven architecture. This is how Netflix does it. This is modern.
Except it's not event-driven. It's request-response with a queue in the middle. You've added latency, operational complexity, and a $2,000/month Confluent bill, and you've gotten exactly nothing in return.
Let me explain.
Commands are not events
The single most common sin in "event-driven" systems is confusing commands with events.
CreateUser is a command. It's an instruction. It's telling a service to do something. UserCreated is an event. It's a fact. It's telling the world that something happened.
The difference matters enormously. A command has one recipient and expects a result. An event has zero or many recipients and expects nothing. When you put commands on a message queue and call it event-driven architecture, you've built request-response with extra steps. Your "event consumer" is just an API handler that reads from Kafka instead of HTTP, and now you've lost synchronous error handling, request tracing, and the ability to return a response to the caller.
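The distinction fits in a few lines of code. This is a minimal sketch, not any particular framework's API; all the names (CreateUser, UserCreated, the handlers) are hypothetical:

```python
from dataclasses import dataclass

# A command: imperative, one recipient, expects a result (success or failure).
@dataclass
class CreateUser:
    email: str
    name: str

# An event: a fact about the past, zero-or-many recipients, expects nothing back.
@dataclass(frozen=True)
class UserCreated:
    user_id: str
    email: str
    occurred_at: str  # ISO-8601 timestamp

def handle_create_user(cmd: CreateUser) -> UserCreated:
    """Command handler: does the work and returns a result to the caller."""
    user_id = "u-123"  # pretend we inserted a row and got an id back
    return UserCreated(user_id=user_id, email=cmd.email,
                       occurred_at="2024-01-01T00:00:00Z")

def on_user_created(evt: UserCreated) -> None:
    """Event consumer: observes the fact; the producer never hears from it."""
    print(f"welcome email queued for {evt.email}")
```

Note the asymmetry in the signatures: the command handler returns something to its caller, the event consumer returns nothing to anyone. If your "event consumer" needs to send a result back, you have a command, whatever the topic name says.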
I've seen systems where the "event" payload includes a replyTo queue so the consumer can send the response back. At that point, you've reinvented HTTP. Poorly.
The event sourcing trap
Event sourcing is a beautiful idea. Store every state change as an immutable event. Reconstruct state by replaying the event stream. Full audit trail. Time travel. It's elegant.
It's also a trap for 95% of applications.
Here's the test: do you actually need to query the event stream temporally? Do you need to ask "what was the state of this account at 3:47 PM on Tuesday?" If yes, event sourcing might be for you. If no, you've built an append-only log that you have to replay to answer basic questions.
I watched a team spend six months building an event-sourced order system. To answer "what are the active orders for customer X?" they had to replay every event for every order that customer had ever placed. Their 95th percentile query time was 4 seconds. They eventually added a read model (a regular database table) that they projected from the events. So now they had a database, an event store, a projection pipeline, and eventual consistency bugs. They could have just used PostgreSQL.
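The replay-per-query problem is easy to see in miniature. Here's a toy in-memory version (event names and shapes are hypothetical) contrasting answering the query by replay with projecting the events once into a read model:

```python
# A toy event store: a flat list of facts, oldest first.
events = [
    {"type": "OrderPlaced",    "order_id": "o1", "customer": "X"},
    {"type": "OrderPlaced",    "order_id": "o2", "customer": "X"},
    {"type": "OrderCancelled", "order_id": "o2", "customer": "X"},
]

def active_orders_by_replay(customer: str) -> set[str]:
    """Answer the query by replaying history: O(total events) per query."""
    active: set[str] = set()
    for e in events:
        if e["type"] == "OrderPlaced" and e["customer"] == customer:
            active.add(e["order_id"])
        elif e["type"] == "OrderCancelled" and e["customer"] == customer:
            active.discard(e["order_id"])
    return active

# A read model: project the events once into a queryable structure.
# In production this projection runs continuously -- and lags, which is
# exactly where the eventual-consistency bugs come from.
active_orders: dict[str, set[str]] = {}
for e in events:
    if e["type"] == "OrderPlaced":
        active_orders.setdefault(e["customer"], set()).add(e["order_id"])
    elif e["type"] == "OrderCancelled":
        active_orders[e["customer"]].discard(e["order_id"])
```

Both give the same answer; the read model answers it in a lookup instead of a scan. Which is the point: once you need the read model anyway, ask whether the event store underneath it is earning its keep.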
"Eventual consistency" is not an excuse
Speaking of eventual consistency: it's my favorite phrase in software architecture, because of how often it's used as a get-out-of-jail-free card for bugs.
User creates an account. User immediately tries to log in. Login fails because the "UserCreated" event hasn't been processed yet. Support ticket filed. The team's response: "It's eventually consistent. The user just needs to wait."
No. The user does not need to wait. You need to design your system so that the operations users expect to be immediate are, in fact, immediate. Eventual consistency is a valid architectural choice for things like recommendation engines, search indexes, and analytics dashboards. It is not a valid choice for "can the user see the thing they just created."
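The fix for the login example is usually structural, not clever: keep the user-visible write synchronous and push only the side effects through the event pipeline. A sketch, with dicts and lists standing in for the database and the broker (all names hypothetical):

```python
users: dict[str, dict] = {}       # stands in for the primary database
outbound_events: list[dict] = []  # stands in for the broker

def create_account(user_id: str, email: str) -> dict:
    # 1. Synchronous write to the primary store: the user can log in the
    #    instant this returns.
    user = {"id": user_id, "email": email}
    users[user_id] = user
    # 2. Asynchronous fact: welcome emails, analytics, CRM sync can lag
    #    without the user ever noticing.
    outbound_events.append({"type": "UserCreated", "user_id": user_id})
    return user

def login(user_id: str) -> bool:
    # Login reads the primary store, not a projection fed by consumers,
    # so it never waits on event processing.
    return user_id in users
```

The event still exists for everything downstream; it's just no longer on the critical path between "user clicked create" and "user can log in."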
You don't need Kafka
This one hurts, because Kafka is genuinely excellent software. It's also genuinely overdeployed.
If your system processes fewer than 1,000 events per second, you almost certainly don't need Kafka. PostgreSQL's LISTEN/NOTIFY can handle that. Redis Streams can handle that. A simple outbox table with a polling consumer can handle that, and it'll be transactionally consistent with your database writes, which Kafka won't be without a lot of extra work.
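The outbox pattern is worth seeing concretely, because the whole value is in the transaction boundary. Here's a sketch using SQLite standing in for PostgreSQL; table and column names are illustrative:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users  (id TEXT PRIMARY KEY, email TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         topic TEXT, payload TEXT, published INTEGER DEFAULT 0);
""")

def create_user(user_id: str, email: str) -> None:
    # One transaction: the business write and the event row commit or
    # roll back together. No dual-write problem, no lost events.
    with db:
        db.execute("INSERT INTO users VALUES (?, ?)", (user_id, email))
        db.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                   ("user.created", f'{{"user_id": "{user_id}"}}'))

def poll_outbox(batch_size: int = 100) -> list[tuple]:
    # The polling consumer: read unpublished rows in insertion order, hand
    # them to downstream systems, mark them published. This gives
    # at-least-once delivery, so consumers must deduplicate.
    rows = db.execute("SELECT id, topic, payload FROM outbox "
                      "WHERE published = 0 ORDER BY id LIMIT ?",
                      (batch_size,)).fetchall()
    for row_id, _, _ in rows:
        db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    db.commit()
    return rows
```

That's the whole pattern: a table, an insert inside the existing transaction, and a loop. Compare the operational surface of this against a Kafka cluster before deciding you need the cluster.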
Kafka shines when you have millions of events per second, multiple consumer groups with different processing needs, and a genuine requirement for replay. I've used it in a maritime telemetry system processing 2 million+ concurrent vessel streams. Multiple consumers needed the same data: real-time dashboards, historical analytics, alerting systems. Replay was essential because satellite links drop, and when a vessel reconnects, you need to process the backlog without losing ordering guarantees.
That's a legitimate Kafka use case. Your B2B SaaS with 47 microservices and 100 events per second is not.
Schema evolution, or the silent killer
Here's a fun exercise. Take your event-driven system. Change a field name in one event schema. Deploy the producer. Watch what happens to every consumer.
If your answer is "they break silently," congratulations, you have the most common event-driven architecture in production. No schema registry. No versioning strategy. Just JSON payloads with implicit contracts that nobody documents and everyone assumes will never change.
Events are APIs. They need versioning, documentation, and compatibility guarantees. If you wouldn't change a REST API response without versioning, why are you doing it with events?
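At minimum, that means an explicit version on every event and an upcasting step in consumers. A minimal sketch, assuming a hypothetical UserCreated event whose v1 field "name" was renamed to "full_name" in v2:

```python
def upcast_user_created(raw: dict) -> dict:
    """Normalize any known version of UserCreated to the latest shape."""
    # Payloads with no version field are treated as v1 -- the "implicit
    # contract" era of the schema.
    version = raw.get("schema_version", 1)
    if version == 1:
        raw = dict(raw)
        raw["full_name"] = raw.pop("name")  # v1's "name" became v2's "full_name"
        raw["schema_version"] = 2
    if raw.get("schema_version") != 2:
        # Fail loudly on unknown versions instead of silently mis-parsing.
        raise ValueError(f"unsupported UserCreated schema: {version}")
    return raw
```

This is the hand-rolled version of what a schema registry with compatibility checks gives you automatically: producers can't deploy a breaking change, and consumers fail at the boundary instead of deep in business logic.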
When events actually make sense
I'm not anti-event. Events are powerful when applied correctly. Here's when they earn their complexity:
Audit trails. When regulatory or business requirements demand a complete history of state changes, events are the natural representation. Not because you need event sourcing, but because an append-only log of facts is exactly what auditors want.
Multi-consumer fanout. When one thing happens and five different systems need to know about it, events are cleaner than five API calls. The producer doesn't need to know about the consumers. The consumers don't need to coordinate with each other. This is real decoupling.
Cross-service boundaries. Between services owned by different teams, events provide loose coupling that lets teams deploy independently. Inside a service, just call a function.
Genuine replay requirements. When data arrives out of order, when sources go offline and come back, when you need to reprocess historical data with new logic. This is where Kafka and friends earn their keep.
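The fanout case above is the one most worth internalizing, and it fits in a few lines. A sketch with an in-memory topic bus (names hypothetical); a real broker replaces the dict, but the decoupling property is the same:

```python
from collections import defaultdict
from typing import Callable

subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

def subscribe(topic: str, handler: Callable[[dict], None]) -> None:
    subscribers[topic].append(handler)

def publish(topic: str, event: dict) -> None:
    # The producer knows the topic name, never the consumers.
    for handler in subscribers[topic]:
        handler(event)

# Two consumers today; adding a sixth tomorrow touches neither the
# producer nor the existing five.
notifications: list[str] = []
subscribe("order.placed", lambda e: notifications.append(f"email {e['order_id']}"))
subscribe("order.placed", lambda e: notifications.append(f"invoice {e['order_id']}"))
publish("order.placed", {"order_id": "o1"})
```

Run the litmus test on this shape: replacing `publish` with direct API calls would force the producer to know every consumer, so the events here are doing real work.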
The litmus test
Here's how to know if your event-driven architecture is actually event-driven:
Draw your system as a request-response architecture. Remove the message broker. Replace every event publication with an API call to each consumer. If the system works identically, you didn't need events. You needed function calls.
If removing the broker would break something fundamental, if you'd lose the ability to add new consumers without changing the producer, if you'd lose replay, if you'd lose the decoupling between teams, then your events are earning their keep.
The goal isn't to avoid events. It's to use them where they provide value and use simpler tools everywhere else. Your architecture should be as simple as possible, and no simpler. Most of the time, that means a database, some API calls, and a background job queue. Not a distributed log with a $2,000 monthly bill.