Multi-region Cognito User Pools: can we do it?


Cognito doesn't support multi-region user pools (yet), but maybe we can get there ourselves 🤔

Let's do a thought experiment together.

Proposed solution

An API Gateway authorizer has a "ProviderARNs" attribute, where you can reference more than one Cognito User Pool.

These are ARNs, so one assumes it can reference a user pool in another region.
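To make that concrete, here is a minimal sketch that builds the arguments for API Gateway's create_authorizer call (as exposed by boto3) with one user pool ARN per region. The account ID, pool IDs, API ID and authorizer name are made-up placeholders. Whether API Gateway accepts a user pool ARN from another region is the assumption this thought experiment rests on, not something the sketch proves.

```python
def user_pool_arn(region: str, account_id: str, pool_id: str) -> str:
    """Build a Cognito user pool ARN for the given region."""
    return f"arn:aws:cognito-idp:{region}:{account_id}:userpool/{pool_id}"


def authorizer_params(rest_api_id: str, pool_arns: list) -> dict:
    """Arguments for apigateway.create_authorizer with several user pools."""
    return {
        "restApiId": rest_api_id,
        "name": "multi-region-cognito",  # hypothetical authorizer name
        "type": "COGNITO_USER_POOLS",
        "providerARNs": list(pool_arns),
        "identitySource": "method.request.header.Authorization",
    }


# All identifiers below are placeholders for illustration only.
PARAMS = authorizer_params(
    "abc123",
    [
        user_pool_arn("us-east-1", "123456789012", "us-east-1_Example01"),
        user_pool_arn("eu-west-1", "123456789012", "eu-west-1_Example02"),
    ],
)

# To apply it (needs boto3 and AWS credentials):
#   boto3.client("apigateway").create_authorizer(**PARAMS)
```

The same authorizer would be created in each region's REST API, each listing both pools.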

Cognito's "Post confirmation" hook is fired after a signed-up user confirms their user account. This invokes a Lambda function with the "userAttributes" and "clientMetadata".

We can use the information in the payload to create the user in the other region.

BUT, we don't have the user's password.

If we DON'T use passwords, that will not be a problem. That is, we can go for passwordless authentication. For example, we can use one-time passwords [1] or magic links [2].

Since nobody ever types a password, we can set a random one when creating the user in the other region.
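Here is a rough sketch of what that "Post confirmation" Lambda could look like. It uses the AdminCreateUser and AdminSetUserPassword operations via boto3. The region and pool ID are placeholders, the helper names are mine, and it assumes the username from the event can be reused as-is in the other pool. Attributes that Cognito manages itself (sub and anything prefixed cognito:) are skipped, which means the copy gets its own sub.

```python
import secrets
import string

OTHER_REGION = "eu-west-1"             # placeholder
OTHER_POOL_ID = "eu-west-1_Example02"  # placeholder


def random_password(length: int = 32) -> str:
    """Random password with at least one character from every class."""
    classes = [string.ascii_lowercase, string.ascii_uppercase,
               string.digits, "!@#$%^&*"]
    chars = [secrets.choice(c) for c in classes]
    alphabet = "".join(classes)
    chars += [secrets.choice(alphabet) for _ in range(length - len(chars))]
    secrets.SystemRandom().shuffle(chars)
    return "".join(chars)


def copyable_attributes(user_attributes: dict) -> list:
    """Drop attributes Cognito manages itself (sub, cognito:*)."""
    return [
        {"Name": name, "Value": value}
        for name, value in user_attributes.items()
        if name != "sub" and not name.startswith("cognito:")
    ]


def replicate_user(client, pool_id, username, user_attributes):
    """Create the user in the other pool and give it a throwaway password."""
    client.admin_create_user(
        UserPoolId=pool_id,
        Username=username,
        UserAttributes=copyable_attributes(user_attributes),
        MessageAction="SUPPRESS",  # don't send a second invitation message
    )
    client.admin_set_user_password(
        UserPoolId=pool_id,
        Username=username,
        Password=random_password(),
        Permanent=True,
    )


def handler(event, context, client=None):
    # Only react to sign-up confirmations, not forgot-password confirmations.
    if event.get("triggerSource") == "PostConfirmation_ConfirmSignUp":
        if client is None:
            import boto3  # lazy import so the sketch can be tried without AWS
            client = boto3.client("cognito-idp", region_name=OTHER_REGION)
        replicate_user(client, OTHER_POOL_ID, event["userName"],
                       event["request"]["userAttributes"])
    return event  # Cognito expects the event to be returned
```

The optional client argument is only there so the handler can be exercised with a stub. In Lambda it is called with just the event and context.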

In the front end, the user can log in with either user pool. Because the authorizer in each region lists both user pools, the user's JWT can be used to access APIs in either region.

Handling failures

What if the Cognito service is down in one of the regions?

The whole point of going multi-region is to provide higher redundancy, but if Cognito is down in one region, then synchronization is broken.

What do we do then?

We introduce a fallback.

If we can't synchronously add a new user to the other region's user pool, we push a message to an SQS queue.

A Lambda function is subscribed to the queue and will retry the failed operation N times before moving the message to a DLQ for further investigation and manual retries.
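The fallback could be sketched like this. The function names are hypothetical, and the actual "create the user" step is passed in as a callable so the flow is easy to follow. In this sketch N is not in the code at all: it would be the queue's maxReceiveCount in its redrive policy, and the partial batch response only takes effect if ReportBatchItemFailures is enabled on the event source mapping.

```python
import json


def replicate_or_enqueue(replicate, enqueue, user: dict) -> str:
    """Try the synchronous copy first; on any failure, park it on the queue."""
    try:
        replicate(user)
        return "replicated"
    except Exception:
        enqueue(json.dumps(user))
        return "queued"


def make_sqs_enqueue(sqs_client, queue_url: str):
    """Wrap an SQS client (e.g. boto3.client("sqs")) as an enqueue callable."""
    def enqueue(body: str) -> None:
        sqs_client.send_message(QueueUrl=queue_url, MessageBody=body)
    return enqueue


def queue_handler(event, context, replicate):
    """SQS-triggered Lambda: retry each message, report the ones that failed.

    Failed messages go back to the queue. Once a message has been received
    maxReceiveCount times, SQS moves it to the DLQ.
    """
    failures = []
    for record in event["Records"]:
        try:
            replicate(json.loads(record["body"]))
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

In a real deployment the replicate argument would be bound up front (for example with functools.partial), since Lambda only passes the event and context. A retry can also hit a user that was already created, so the replicate step should treat "user already exists" as success.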

Summary

This is how the solution will look at a high level.

There are a few constraints:

  • We have to use API Gateway REST APIs
  • We have to use passwordless authentication

Under these conditions, I think it's a workable solution.

Do you think this will work?

Do you see any flaws in this solution?

And would you like to see a POC for this solution?

Links

[1] Passwordless Authentication made easy with Cognito: a step-by-step guide

[2] Implementing Magic Links with Amazon Cognito: A Step-by-Step Guide
