Fine-grained access control in API Gateway with Cognito groups & Lambda authorizer


In security and access control, authentication and authorization mean two distinct but related things.

Authentication verifies the identity of a user or system.

Authorization determines what actions an authenticated user is allowed to perform in your system.

API Gateway has built-in integration with Cognito, but it doesn’t provide any fine-grained authorization out of the box.

By default, a Cognito authorizer only checks that a user’s bearer token is valid and that the user belongs to the right Cognito User Pool.

There are many ways to implement fine-grained authorization with API Gateway. Here are three that I have come across over the years:

  • Using Lambda authorizer with Cognito groups;
  • Using Cognito access tokens with OAuth scopes;
  • Using Lambda authorizer with Amazon Verified Permissions [1].

Over the next few weeks, let’s look at these approaches in depth and then compare them at the end.

Today, let’s look at Lambda authorizer with Cognito groups.

Model roles with Cognito groups

In Cognito, you can use groups to model the different roles in your system, e.g. Admin, ReadOnly.

Users can belong to more than one group at once, just as they can have multiple roles within a system.

Cognito encodes the groups a user belongs to in the ID token. If you decode the ID token, you will see something like this:
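
As a rough sketch, the snippet below decodes the payload of a locally built sample token so you can see where the groups live. The `cognito:groups` claim is the real claim name; the other values (`sub`, `cognito:username`, the fake signature) are made up for illustration. Note that this only decodes the token. A real authorizer must also verify its signature.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode (but do NOT verify) the payload segment of a JWT."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded without padding, so restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def _b64url(data: dict) -> str:
    """Helper used only to build the illustrative sample token below."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical claims, for illustration only.
sample_claims = {
    "sub": "00000000-0000-0000-0000-000000000000",
    "cognito:groups": ["Admin", "ReadOnly"],
    "cognito:username": "example-user",
    "token_use": "id",
}
sample_token = ".".join(
    [_b64url({"alg": "none"}), _b64url(sample_claims), "fake-signature"]
)

claims = decode_jwt_payload(sample_token)
print(claims["cognito:groups"])  # ['Admin', 'ReadOnly']
```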

Here, we can see the user belongs to both the Admin and ReadOnly groups.

Lambda authorizer

A Lambda authorizer can use this information to generate its policy document. As a reminder, a Lambda authorizer can return a policy document like this:
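
For reference, here is a minimal sketch of that response shape. The `principalId` and `policyDocument` fields are what API Gateway expects from a Lambda authorizer; the helper name `build_authorizer_response`, the account ID, the API ID and the route are placeholders I made up.

```python
import json


def build_authorizer_response(principal_id: str, statements: list) -> dict:
    """Wrap IAM policy statements in the shape a Lambda authorizer returns."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": statements,
        },
    }


# Hypothetical API ARN and route, for illustration only.
example = build_authorizer_response(
    "example-user",
    [
        {
            "Effect": "Allow",
            "Action": "execute-api:Invoke",
            "Resource": (
                "arn:aws:execute-api:us-east-1:123456789012:"
                "abc123/prod/GET/orders"
            ),
        }
    ],
)
print(json.dumps(example, indent=2))
```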

So, we need to take the list of groups a user belongs to and turn them into a set of policy statements.

One approach is to keep a mapping in your code like this.
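
A mapping along these lines might look like the sketch below. The API ARN, the routes and the `allow` helper are all assumptions for illustration, so swap in your own API and the permissions each group should have.

```python
# Hypothetical API ARN prefix: arn:aws:execute-api:{region}:{account}:{api-id}/{stage}
API_ARN = "arn:aws:execute-api:us-east-1:123456789012:abc123/prod"


def allow(method: str, resource_path: str) -> dict:
    """Build an Allow statement for one HTTP method and resource path."""
    return {
        "Effect": "Allow",
        "Action": "execute-api:Invoke",
        "Resource": f"{API_ARN}/{method}/{resource_path}",
    }


# Each Cognito group maps to the statements its members should receive.
GROUP_STATEMENTS = {
    "ReadOnly": [allow("GET", "*")],  # read any resource
    "Admin": [allow("*", "*")],       # any method on any resource
}


def statements_for_group(group: str) -> list:
    """Look up a group's statements; unknown groups get no permissions."""
    return GROUP_STATEMENTS.get(group, [])
```

Keeping the mapping as plain data makes it easy to unit test, since no AWS calls are involved.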

In many systems, there are a small number of roles that supersede each other. That is, they are hierarchical, and a higher role has all the permissions of a lower role plus some more.

In this case, we need to find the most permissive role that the user has.
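
One way to do that is to keep the roles in an ordered list and pick the first one the user has. This is only a sketch: the `ROLES_BY_PRECEDENCE` list and the function name are my own, and the ordering assumes Admin outranks ReadOnly.

```python
from typing import List, Optional

# Ordered from most permissive to least permissive (assumed ordering).
ROLES_BY_PRECEDENCE = ["Admin", "ReadOnly"]


def most_permissive_role(groups: List[str]) -> Optional[str]:
    """Return the highest-ranked role among the user's groups, if any."""
    for role in ROLES_BY_PRECEDENCE:
        if role in groups:
            return role
    # The user is in no known group, so the caller should deny access.
    return None


print(most_permissive_role(["ReadOnly", "Admin"]))  # Admin
print(most_permissive_role(["SomethingElse"]))      # None
```

The authorizer would then look up the statements for that single role in the mapping, and return a Deny policy when no role matches.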

But what if the roles are more lateral? That is, a user’s permissions are derived from all its roles.

Well, that’s easy enough to accommodate.
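
For lateral roles, a sketch of the idea is to take the union of the statements from every group the user belongs to. The group names, routes and API ARN below are hypothetical and only there to make the example runnable.

```python
from typing import List

# Hypothetical API ARN prefix and groups, for illustration only.
API_ARN = "arn:aws:execute-api:us-east-1:123456789012:abc123/prod"


def allow(method: str, resource_path: str) -> dict:
    """Build an Allow statement for one HTTP method and resource path."""
    return {
        "Effect": "Allow",
        "Action": "execute-api:Invoke",
        "Resource": f"{API_ARN}/{method}/{resource_path}",
    }


GROUP_STATEMENTS = {
    "OrdersManager": [allow("GET", "orders"), allow("POST", "orders")],
    "ReportsViewer": [allow("GET", "orders"), allow("GET", "reports")],
}


def merge_statements(groups: List[str]) -> List[dict]:
    """Combine the statements of all the user's groups, skipping duplicates."""
    merged: List[dict] = []
    for group in groups:
        for statement in GROUP_STATEMENTS.get(group, []):
            if statement not in merged:
                merged.append(statement)
    return merged


merged = merge_statements(["OrdersManager", "ReportsViewer"])
print(len(merged))  # 3, because "GET orders" appears in both groups
```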

Conclusion

This is my preferred approach for simple use cases.

It’s easy to follow and test and makes no API calls (i.e. no extra latency overhead).

Furthermore, it does not require Cognito’s Advanced Security Features, which are charged at a much higher rate [2]. This makes it a very cost-efficient approach.

However, using a Lambda authorizer means you need to think about cold starts and their impact on user experience.

Also, the roles and policies are static. Whilst that’s good enough for most simple use cases, it cannot (easily) support more advanced ones, such as letting users create custom roles while maintaining the tenant boundary.

Amazon Verified Permissions is a better fit for more advanced use cases. More on it later.

Links

[1] Amazon Verified Permissions service

[2] Cognito’s pricing page
