Migrating the database while continuing to serve user requests can be challenging. It’s a question that many students have asked during the Production-Ready Serverless workshop.
So here’s my tried-and-tested approach to migrating a live service to a new database without downtime. I’m going to use DynamoDB as an example but it should work with most other databases.
Before we dive into it, I want to remind you to keep things simple whenever you can. If the database migration can be completed within a reasonable timeframe, then consider doing it over a small maintenance window.
This is often not possible for large applications with a global user base. Or maybe you’re working in a microservices environment where downtime for a single service can impact many others.
However, it might be a good option for smaller applications or applications with a regional user base.
Ok, with that said, let’s go.
Step 1: Make sure all inserts and updates go to the new database.
Step 2: Use the old database as a fallback for read operations. If the requested data is not in the new database, fetch it from the old database and save it into the new database.
This is similar to a read-through cache.
Implementing these two steps will deal with the active data that users are interacting with.
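Here is a minimal sketch of these two steps. It is not tied to any particular database: plain dicts stand in for the old and new databases, and the `save` and `load` function names and the `pk` key attribute are made up for illustration.

```python
from typing import Dict, Optional

# Plain dicts stand in for the two databases in this sketch.
old_db: Dict[str, dict] = {"user-1": {"pk": "user-1", "name": "Ada"}}
new_db: Dict[str, dict] = {}


def save(item: dict) -> None:
    """Every insert and update goes to the new database only."""
    new_db[item["pk"]] = item


def load(pk: str) -> Optional[dict]:
    """Read from the new database first and fall back to the old one."""
    item = new_db.get(pk)
    if item is not None:
        return item
    item = old_db.get(pk)
    if item is None:
        return None
    # Copy the item across so the next read is served by the new database.
    # setdefault only writes if the key is still absent, so a concurrent
    # update that landed in the meantime is not overwritten.
    return new_db.setdefault(pk, item)
```

The first `load("user-1")` is served by the old database and copies the item across. After that, reads and writes for that item only touch the new database.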
Step 3: Run a background script to migrate all data to the new database.
You should start the background script AFTER the application has been updated to perform Steps 1 & 2 above. Once the application has been updated, it will write the active data into the new database.
We need to make sure the script doesn’t overwrite newer versions of the data we’re migrating.
Assuming the new database is a DynamoDB table, we need to use conditional puts. Use the attribute_not_exists conditional function to ensure the item doesn’t exist in the DynamoDB table already.
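As a rough sketch, the conditional put could look something like this. `migrate_item` is a hypothetical helper and `pk` is an assumed partition key name. The `table` argument is expected to behave like a boto3 `Table` resource, whose `put_item` accepts `Item` and `ConditionExpression`. `InMemoryTable` is a stand-in I added so the example runs without AWS credentials.

```python
class ConditionalCheckFailed(Exception):
    """Stand-in for the error DynamoDB returns when a condition fails."""
    response = {"Error": {"Code": "ConditionalCheckFailedException"}}


class InMemoryTable:
    """Hypothetical stand-in for a boto3 Table so the sketch runs offline."""

    def __init__(self, key_attr):
        self.key_attr = key_attr
        self.items = {}

    def put_item(self, Item, ConditionExpression=None):
        key = Item[self.key_attr]
        # Only models the one condition used in this sketch.
        if ConditionExpression and key in self.items:
            raise ConditionalCheckFailed()
        self.items[key] = Item


def migrate_item(table, item, key_attr="pk"):
    """Write one item from the old database unless the table already has it.

    Returns True if the item was written, False if it was skipped.
    """
    try:
        table.put_item(
            Item=item,
            ConditionExpression=f"attribute_not_exists({key_attr})",
        )
        return True
    except Exception as err:
        error = getattr(err, "response", {}).get("Error", {})
        if error.get("Code") == "ConditionalCheckFailedException":
            return False  # the new table already has this item; leave it alone
        raise


table = InMemoryTable("pk")
# The live application has already written a newer version of user-1.
table.put_item(Item={"pk": "user-1", "name": "Grace"})
# The background script then tries to copy the stale version across.
skipped = migrate_item(table, {"pk": "user-1", "name": "Ada"})
written = migrate_item(table, {"pk": "user-2", "name": "Linus"})
```

With real boto3 you would catch `botocore.exceptions.ClientError` rather than a bare `Exception`. I kept the check generic here so the example has no dependencies.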
But what about deletes?
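This sequence of events will be problematic: the background script reads an item from the old database, the application then deletes that item, and the script then writes its stale copy into the new database.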
Oops, we just added a piece of deleted data back into the system!
Thank you, race condition…
To handle this scenario, we can write a tombstone record in the new database. This stops the background script from writing the deleted data back into the system.
However, it might require behaviour change in the application to handle these tombstone records in read operations. Luckily, it doesn’t have to be forever.
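Here is a small sketch of the idea, again with plain dicts standing in for the two databases. The `deleted` flag and the function names are hypothetical, and I'm assuming deletes only touch the new database.

```python
from typing import Dict, Optional

old_db: Dict[str, dict] = {"user-1": {"pk": "user-1", "name": "Ada"}}
new_db: Dict[str, dict] = {}


def delete(pk: str) -> None:
    """Write a tombstone instead of removing the key from the new database."""
    new_db[pk] = {"pk": pk, "deleted": True}


def load(pk: str) -> Optional[dict]:
    """Treat a tombstone as 'not found' and never fall back past it."""
    item = new_db.get(pk)
    if item is not None:
        return None if item.get("deleted") else item
    # Fallback read (copying the item into the new database is omitted here).
    return old_db.get(pk)


def migrate_item(item: dict) -> bool:
    """Conditional put: only write if the key is absent from the new database."""
    if item["pk"] in new_db:
        return False
    new_db[item["pk"]] = item
    return True


# The script has read user-1 from the old database...
stale_copy = old_db["user-1"]
# ...the user deletes it before the script gets to write it...
delete("user-1")
# ...and the tombstone makes the conditional put fail.
migrated = migrate_item(stale_copy)
visible = load("user-1")
```

Because the tombstone occupies the key, the conditional put is rejected and the deleted item stays deleted. Without the tombstone check in `load`, the application would return the tombstone itself, or fall back to the old database and resurrect the item.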
Tombstones are necessary during the migration process. But once the background script has finished you can clean things up by:
This is my simple, 3-step process to migrate a live service to a new database. As mentioned at the start of this post, it should apply to most database systems. For this process to work, your new database needs to support some form of conditional write operation.