Software systems are getting bigger and more complex, and we are constantly looking for ways to test code in production without risking the user experience.

Canary deployments are a popular mechanism for rolling out changes incrementally, allowing us to limit the blast radius in case something goes wrong. However, they’re not without limitations. Canary deployments essentially sacrifice a small portion of users for the greater good. But what if you want to gain insights without impacting any real users? That's where the dark read pattern comes in.

Canary Deployments: a quick primer

Canary deployments let you release updates to a small subset of users while leaving the majority untouched. The process is simple:
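To make the canary flow concrete, here is a minimal Python sketch of the routing step only. It assumes a sticky, hash-based split so the same user always lands in the same group. The names `in_canary`, `route`, the two handlers and the percentage are all hypothetical and not from any specific tool. In practice this job is usually done by a load balancer or deployment tool rather than application code.

```python
import hashlib


def in_canary(user_id: str, canary_percent: float) -> bool:
    """Return True if this user falls inside the canary group.

    Hashing the user ID (an assumption of this sketch) keeps the
    assignment stable, so a user does not flip between versions.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < canary_percent * 100  # e.g. 5% -> buckets 0..499


def route(user_id, canary_percent, stable_handler, canary_handler):
    """Send the request to the canary or the stable version."""
    if in_canary(user_id, canary_percent):
        return canary_handler(user_id)
    return stable_handler(user_id)
```

Rolling forward then means raising `canary_percent` step by step while watching error rates and latency, and rolling back means setting it to 0.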
Canary deployments work well for testing new features or configurations in real-world settings. However, they can degrade the experience for users in the canary group. If something goes wrong, those users will feel it.

The Dark Read pattern

The Dark Read pattern takes a fundamentally different approach. It involves deploying the new version alongside the old one and executing both in parallel. The user request is served from the existing system, but the same request is simultaneously executed against the new system to observe its behaviour and validate its response. This way, you can see how the new code would perform if it were handling production traffic, without impacting the user experience. Think of it as a “shadow test”. The goal is to see:
The Dark Read pattern in action

At DAZN, my team was responsible for rewriting the "schedule service". It decides what content the user sees on the home screen and is one of the most business-critical services. Given that criticality, we opted for the Dark Read pattern.
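As a rough illustration of the pattern (not the DAZN implementation, whose details are not described here), a dark read wrapper could look like the Python sketch below. It assumes both versions can be called as plain functions and that responses can be compared with `==`. The names `serve_with_dark_read`, `old_handler`, `new_handler` and `on_mismatch` are hypothetical.

```python
import logging
from concurrent.futures import Future, ThreadPoolExecutor

logger = logging.getLogger("dark_read")


def serve_with_dark_read(request, old_handler, new_handler,
                         on_mismatch, pool: ThreadPoolExecutor):
    """Serve from the old system while shadow-testing the new one."""
    # Kick off the new version in the background so both run in parallel.
    shadow: Future = pool.submit(new_handler, request)

    # The user is always served by the existing system.
    response = old_handler(request)

    def compare(done: Future) -> None:
        error = done.exception()
        if error is not None:
            # Failures in the new system are recorded, never surfaced.
            logger.warning("dark read failed for %r: %r", request, error)
        elif done.result() != response:
            on_mismatch(request, response, done.result())

    # Compare once the shadow call finishes; the caller does not wait.
    shadow.add_done_callback(compare)
    return response
```

The key design choice is that nothing from the new system can reach the user: its response is only compared, and its errors are only logged. A real setup would also record the latency of each side and might sample only a share of requests to keep costs down.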
We ran the dark reads for several weeks and were able to identify edge cases and fix bugs without impacting any users. This pattern is very effective for backend services where the focus is on response accuracy, latency, and handling load rather than UI or frontend logic. You get the perks of testing in production without the direct risk to user experience.

Why Use Dark Read Over Canary?

1. No User Impact
2. Ideal for Load Testing in Production
3. More Extensive Validation
4. Continuous Monitoring without Worrying about Rollback

Drawbacks

1. Increased Complexity
Running two versions of code in parallel adds architectural complexity and requires infrastructure for mirroring traffic, logging, and comparing results.

2. Applicable Only to Certain Types of Tests
Dark reads are great for validating logic and load handling but won’t help in testing UX, frontend changes, or how users interact with a new feature.

3. Additional Costs
Duplicating traffic and processing it twice leads to increased costs, especially under high traffic.

Conclusion

While the dark read pattern doesn’t replace canary deployments, it’s a useful tool to have in your arsenal. Canary deployments provide controlled, real-world testing with a limited impact radius, while dark reads offer shadow testing without exposing users to the new code at all. For critical backend changes, database migrations, or performance improvements, dark reads enable deeper insights without risking real user impact.