How to detect and prevent breaking changes in event schemas


Last week, we looked at 6 ways to version event schemas [1] and found that the best solution is to avoid breaking changes and minimise the need for versioning.

But how exactly do you do that?

How can you prevent accidental breaking changes from creeping in?

You can detect and stop breaking changes:

  • At runtime, when the events are ingested;
  • During development, when schema changes are made;
  • Or a combination of both!

Here are three approaches you should consider.

1. Consumer-Driven Contracts

In consumer-driven contract testing, the consumer writes tests that express their expectations of the provider. These expectations become a contract file that the provider checks before making any change. This ensures the provider does not make changes that will break the consumers.

Pact is a popular framework for consumer-driven contract testing.

In a typical API-to-API scenario, a consumer test simulates an HTTP request and its expected JSON response, Pact captures that as a contract, and the provider verifies it before merging any changes.

The same flow applies to events - the consumer test specifies the topic or queue plus the exact payload, Pact saves it as a message contract, and the event producer runs that pact to make sure its events match the consumer's expectations.
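Here's a minimal sketch of what such a consumer-side message pact could look like, assuming pact-js and Jest-style tests. The "OrderService" and "OrderNotifier" names and the OrderPlaced payload are made up for illustration.

```typescript
// A consumer-side message pact sketch (pact-js + Jest assumed).
import path from "path";
import {
  MessageConsumerPact,
  Matchers,
  synchronousBodyHandler,
} from "@pact-foundation/pact";

// The consumer's real handler for OrderPlaced events (simplified here).
function handleOrderPlaced(event: { orderId: string; total: number }): void {
  if (!event.orderId) {
    throw new Error("orderId is required");
  }
}

const messagePact = new MessageConsumerPact({
  consumer: "OrderNotifier",
  provider: "OrderService",
  dir: path.resolve(process.cwd(), "pacts"), // contract files land here
});

describe("OrderPlaced contract", () => {
  it("can handle an OrderPlaced event", () => {
    return messagePact
      .expectsToReceive("an OrderPlaced event")
      .withContent({
        orderId: Matchers.uuid(),
        total: Matchers.decimal(),
      })
      .withMetadata({ "content-type": "application/json" })
      // The handler is exercised against the expected payload; the resulting
      // pact file is what the producer verifies before merging changes.
      .verify(synchronousBodyHandler(handleOrderPlaced));
  });
});
```

The pact file this test produces is uploaded to a Pact broker, and the event producer runs its verification step against it as part of CI.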

Pros

  • Catch breaking schema changes early.
  • Keeps teams loosely coupled.
  • Flexibility - the provider can still make breaking schema changes that don't affect its consumers.

Cons

  • Additional operational overhead for maintaining pact files and the pact broker.
  • Requires comprehensive test coverage or gaps will slip through.
  • Requires buy-in from the entire organization to be truly useful.
  • Still allows breaking changes when a change is not covered by a consumer test. This can be problematic when replaying old events - for example, when a new consumer needs to seed its data from historical events.

2. Integration tests with schema packages

In large organizations, getting widespread buy-in for consumer-driven contract testing is impractical without some sort of "shock event" that focuses minds*.

That's why the teams at LEGO took a different approach [2] to contract testing their event-driven architecture.

The idea is pretty simple.

Event publishers publish their schemas as NPM packages, and event consumers write tests against these packages.
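As a rough sketch, a consumer test might validate its sample payloads and mapping code against the schema shipped in the publisher's package. The package name "@acme/order-events" and its "orderPlacedSchema" export are hypothetical; Ajv and Jest are assumed as the test tooling.

```typescript
// Consumer-side test against a publisher's (hypothetical) schema package.
import Ajv from "ajv";
import { orderPlacedSchema } from "@acme/order-events";

// The consumer's own mapping code under test.
function toOrderRecord(event: { orderId: string; total: number }) {
  return { id: event.orderId, amountInCents: Math.round(event.total * 100) };
}

describe("OrderPlaced schema compatibility", () => {
  const ajv = new Ajv();
  const validate = ajv.compile(orderPlacedSchema);

  it("our sample payload still matches the published schema", () => {
    const sample = {
      orderId: "b7f6d1c2-4a3e-4b1f-9c2d-8e5a6f7b8c9d",
      total: 42.5,
    };

    // If the publisher ships a new package version with a breaking change,
    // this assertion fails when the consumer upgrades and reruns its tests.
    expect(validate(sample)).toBe(true);
    expect(toOrderRecord(sample).amountInCents).toBe(4250);
  });
});
```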

Pros

  • Low barrier to entry - no need to set up a Pact broker or wait for broad org-wide adoption of a new tool.

Cons

  • Doesn't work as well across programming languages.
  • Lacks enforcement power. The publishers do not know about the consumer tests and do not need to exercise them before committing changes. This leaves plenty of room for breaking changes to creep in before consumers notice.

This approach is simple, but not very effective at catching breaking changes early. Instead, it focuses on giving consumers confidence that their code works against the latest event schema from the publisher.

-----

* At a previous employer, we successfully introduced Pact due to several high-profile outages caused by integration issues. They gave us the political energy to align everyone's short-term priorities to push through an organization-wide change. As the saying goes, "Never let a good crisis go to waste".

3. Schema registry and broker-side validation

Both approaches above rely on testing to catch breaking changes during development. However, these are only as effective as their test coverage.

PostNL takes yet another approach [3], which combines the use of a schema registry and an event broker.

The schema registry serves as the single source of truth for event schemas.

When a producer publishes an updated schema, the registry applies compatibility rules to make sure there are no breaking changes - e.g. fields are not removed, and their data types have not changed.

If the change violates these rules, the registry rejects the schema, preventing incompatible versions from ever being registered.

This provides early feedback to event publishers, preventing breaking changes before they go into production.
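As a minimal sketch, here is what that publish-time check could look like with AWS Glue Schema Registry as one possible registry. The registry and schema names are made up, and the JSON Schema definitions are kept deliberately small.

```typescript
// Publish-time compatibility check sketch (AWS SDK v3, Glue Schema Registry).
import {
  GlueClient,
  CreateSchemaCommand,
  RegisterSchemaVersionCommand,
} from "@aws-sdk/client-glue";

const glue = new GlueClient({});

async function main() {
  // One-off: create the schema with BACKWARD compatibility so the registry
  // rejects new versions that existing consumers could not read.
  await glue.send(new CreateSchemaCommand({
    RegistryId: { RegistryName: "orders-registry" },
    SchemaName: "OrderPlaced",
    DataFormat: "JSON",
    Compatibility: "BACKWARD",
    SchemaDefinition: JSON.stringify({
      type: "object",
      required: ["orderId", "total"],
      properties: {
        orderId: { type: "string" },
        total: { type: "number" },
      },
    }),
  }));

  // Later: try to register a version that removes the "total" field.
  // The registry runs its compatibility check and marks the version as
  // FAILURE instead of making it available.
  const result = await glue.send(new RegisterSchemaVersionCommand({
    SchemaId: { RegistryName: "orders-registry", SchemaName: "OrderPlaced" },
    SchemaDefinition: JSON.stringify({
      type: "object",
      required: ["orderId"],
      properties: { orderId: { type: "string" } },
    }),
  }));

  console.log(result.Status); // expect "FAILURE" for an incompatible change
}

main().catch(console.error);
```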

Furthermore, as each event is ingested, the event broker looks up its schema in the registry and verifies that the payload matches the expected structure and data types.

Invalid messages can be rejected, quarantined, or routed to a dead-letter queue, so only valid events reach consumers.
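Conceptually, the broker-side check boils down to something like the sketch below: look up the schema, validate the payload, and divert anything invalid. This assumes a validation Lambda sitting between ingestion and delivery; the dead-letter queue URL and registry/schema names are made up for illustration.

```typescript
// Runtime validation sketch: validate incoming events against the registered
// schema and quarantine invalid ones in a dead-letter queue.
import Ajv from "ajv";
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";
import { GlueClient, GetSchemaVersionCommand } from "@aws-sdk/client-glue";

const ajv = new Ajv();
const sqs = new SQSClient({});
const glue = new GlueClient({});
const DLQ_URL = process.env.DLQ_URL!; // hypothetical dead-letter queue

export async function validateAndForward(event: {
  detailType: string;
  detail: unknown;
}) {
  // Look up the latest registered schema for this event type.
  const { SchemaDefinition } = await glue.send(new GetSchemaVersionCommand({
    SchemaId: { RegistryName: "orders-registry", SchemaName: event.detailType },
    SchemaVersionNumber: { LatestVersion: true },
  }));

  const validate = ajv.compile(JSON.parse(SchemaDefinition!));

  if (!validate(event.detail)) {
    // Quarantine invalid payloads so only valid events reach consumers.
    await sqs.send(new SendMessageCommand({
      QueueUrl: DLQ_URL,
      MessageBody: JSON.stringify({ event, errors: validate.errors }),
    }));
    return { forwarded: false };
  }

  return { forwarded: true };
}
```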

Pros

  • Centralized governance - a single source of truth for all event schemas, simplifying management and versioning.
  • The combination of publish-time and runtime validation provides complete protection against breaking changes.
  • Simplifies things for both event publishers and consumers, as validation is handled by these central resources.

Cons

  • Running and scaling the schema registry and broker adds operational complexity and cost.
  • Compatibility rules cover structure and data types, but can't enforce business-specific expectations. For example, which events should be fired, and when?
  • Schema lookups and validation add latency overhead.
  • Can be overly restrictive. For example, breaking changes are not allowed even when there are no consumers.

Summary

Consumer-driven contracts enforce each consumer's exact event expectations at development time. They're ideal when you need to validate business-specific rules and your teams are bought into writing contract tests and maintaining high test coverage.

Alternatively, publishers can share event schemas as code libraries for consumers to test against. This is simple to set up and does not require organization-wide buy-in. However, it's not effective at preventing breaking changes.

Using a schema registry and broker-side validation gives you centralised governance and provides complete protection against breaking changes. However, it adds operational overhead and can't enforce business‑specific expectations.

You can also combine these techniques.

For example, use registry + broker validation to block accidental breaking changes and add consumer-driven contract tests to verify that specific business events fire exactly when and how you expect.

Links

[1] Event versioning strategies for event-driven architectures

[2] How LEGO approaches contract testing

[3] (Podcast) Event-driven architecture at PostNL with Luc van Donkersgoed
