Here is one of the most misunderstood aspects of AWS Lambda


One of the most misunderstood aspects of Lambda is how throttling applies to async invocations. Or rather, how it doesn't!

Every Lambda invocation has to go through its Invoke API [1], whether you're invoking the function directly or through an event source such as API Gateway or SNS.

With the Invoke API, you can set InvocationType to either "RequestResponse" (i.e. synchronous) or "Event" (i.e. asynchronous).
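As a rough illustration, the two invocation types can be selected like this in Python. This is a minimal sketch: the helper name `invoke` and its `asynchronous` flag are made up for this example, and `client` is assumed to behave like the object returned by `boto3.client("lambda")`.

```python
import json


def invoke(client, function_name, payload, asynchronous=False):
    """Call Lambda's Invoke API synchronously or asynchronously.

    `client` is assumed to behave like boto3.client("lambda").
    """
    response = client.invoke(
        FunctionName=function_name,
        # "Event" queues the request; "RequestResponse" waits for the result.
        InvocationType="Event" if asynchronous else "RequestResponse",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    # Lambda answers 202 for an accepted "Event" call and 200 for a
    # completed "RequestResponse" call.
    return response["StatusCode"]
```

With boto3 installed and credentials configured, a call would look like `invoke(boto3.client("lambda"), "my-function", {"hello": "world"}, asynchronous=True)`, where "my-function" is a placeholder name.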

Synchronous invocations

With synchronous invocations, throttling limits are checked to make sure you stay within:

  • The regional concurrency limit, and
  • The function's reserved concurrency.

However, this is not true for async invocations.

Async invocations

With asynchronous invocations, the Event Invoke Frontend service (see diagram below) accepts the request and passes it on to an internal queue. It does not check the concurrency limits, so the call succeeds even if the function does not have the concurrency to process the request. That's OK because, given the asynchronous nature of the invocation, the request does not have to be processed right away.

Instead, concurrency limits are checked when the internal poller attempts to invoke the function synchronously.

This means that you will never experience throttling when you invoke a function asynchronously.

Even if you set the reserved concurrency to 0 - which will stop the function from running - the "Event" Invoke call will still succeed.
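If you want to try this yourself, something along these lines should do it. It is only a sketch: `pause_and_enqueue` is a hypothetical helper name, and `client` is assumed to behave like `boto3.client("lambda")`. Based on the behaviour described above, the "Event" call is expected to come back with a 202 status code even though the function cannot run.

```python
def pause_and_enqueue(client, function_name, payload_bytes):
    """Set reserved concurrency to 0, then invoke the function asynchronously.

    `client` is assumed to behave like boto3.client("lambda").
    """
    # A reserved concurrency of 0 stops the function from running at all.
    client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=0,
    )
    # The async call only asks the frontend to enqueue the event,
    # so it does not depend on the function having any concurrency.
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="Event",
        Payload=payload_bytes,
    )
    return response["StatusCode"]
```

Remember to remove or raise the reserved concurrency afterwards, otherwise the queued event will keep being throttled until it ages out.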

But what happens when the internal poller invokes the function synchronously and the function is throttled?

In that case, the invocation request is returned to the internal queue and is retried for up to 6 hours. This is described in the official documentation here [3].
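To make the flow concrete, here is a toy model of it in Python. It is not how Lambda is implemented: the class `ToyAsyncLambda`, its method names, and the idea of a single in-memory queue are all assumptions made for illustration. It only captures the three behaviours described above: the frontend always accepts, the poller enforces concurrency, and throttled events go back on the queue until they are older than 6 hours.

```python
from collections import deque

MAX_EVENT_AGE_SECONDS = 6 * 60 * 60  # the 6-hour retry window described above


class ToyAsyncLambda:
    """Toy model of the async path: an accepting frontend plus a poller."""

    def __init__(self, available_concurrency):
        self.available_concurrency = available_concurrency
        self.queue = deque()   # (event, enqueue_time) pairs
        self.processed = []    # events the poller managed to invoke
        self.expired = []      # events that aged out while throttled

    def invoke_event(self, event, now):
        # The frontend never checks concurrency; it only enqueues.
        self.queue.append((event, now))
        return 202

    def poll(self, now):
        # One polling pass. Concurrency is enforced here, not at the frontend.
        # For simplicity, no invocation finishes during a pass, so
        # `available_concurrency` caps how many events can start per pass.
        free_slots = self.available_concurrency
        still_waiting = deque()
        while self.queue:
            event, enqueued_at = self.queue.popleft()
            if now - enqueued_at > MAX_EVENT_AGE_SECONDS:
                self.expired.append(event)  # retry window exhausted
            elif free_slots > 0:
                free_slots -= 1
                self.processed.append(event)  # synchronous invoke succeeded
            else:
                # Throttled: the event goes back on the queue for a later pass.
                still_waiting.append((event, enqueued_at))
        self.queue = still_waiting
```

Creating `ToyAsyncLambda(available_concurrency=0)` and calling `invoke_event(...)` still returns 202; the event simply sits in the queue until a later `poll(...)` finds a free slot or the event passes the 6-hour mark.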

Async invocations vs. Async event sources

Another important detail to consider is that async event sources such as SNS and EventBridge also invoke Lambda asynchronously.

On paper, they each offer a longer retry period than Lambda's internal queue:

  • EventBridge retries failed deliveries for up to 24 hours.
  • SNS retries failed deliveries for up to 23 days.

But because async invocations never fail due to throttling, they count as successful deliveries for SNS and EventBridge, so these longer retry periods never kick in. Lambda's Event Invoke Frontend service accepts the request, and any throttling errors will be retried for up to 6 hours ONLY.

I asked about this on Twitter, and two of the principal engineers on the Lambda team confirmed my hypothesis above. See their responses here and here.

So what?

Why do these details matter?

Quite a few of you have told me that you prefer SNS -> Lambda over a direct async Lambda invocation because it protects against throttling errors.

Good news: given the above, you don't need the SNS topic (unless you need it for fan-out)!

This is a good thing because:

  • Fewer moving parts.
  • Fewer things to pay for.
  • One less place where things can go wrong (e.g. delivery problem from SNS to Lambda).

You are welcome :-)

This follows one of my most important architectural principles and I think you should follow it too.

Aren't Lambda-to-Lambda calls an anti-pattern?

Yes, synchronous Lambda-to-Lambda calls are an anti-pattern.

However, there are valid use cases for asynchronous Lambda-to-Lambda calls.

For example, when you offload secondary responsibilities (e.g. analytics tracking) from a user-facing API function to a second function and invoke it asynchronously.

This is so that:

  • The user-facing API function can respond to the user more quickly and improve the user experience.
  • We get built-in retries + DLQ/failure destination support for the second, asynchronously invoked function.
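A minimal sketch of this pattern in Python might look like the following. The names `make_handler` and `analytics_function_name`, the event shape, and the response body are all placeholders invented for this example; `lambda_client` is assumed to behave like `boto3.client("lambda")`.

```python
import json


def make_handler(lambda_client, analytics_function_name):
    """Build an API handler that offloads analytics to a second function."""

    def handler(event, context):
        # Primary responsibility: build the user-facing response.
        body = {"message": "order received"}

        # Secondary responsibility: hand it to the second function with an
        # async ("Event") invocation so the user is not kept waiting for it.
        analytics_event = {"type": "order_placed", "path": event.get("path")}
        lambda_client.invoke(
            FunctionName=analytics_function_name,
            InvocationType="Event",
            Payload=json.dumps(analytics_event).encode("utf-8"),
        )

        return {"statusCode": 200, "body": json.dumps(body)}

    return handler
```

Passing the client in as an argument is a design choice for this sketch: it keeps the handler easy to test with a fake client and lets the real client be created once, outside the handler.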

These benefits justify the extra cost of invoking a second function instead of doing everything in the API function.
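The retries and failure destination mentioned above are configured per function. Here is one way that could look in Python, as a sketch only: `configure_async_retries` is a hypothetical helper, `client` is assumed to behave like `boto3.client("lambda")`, and the chosen values are examples rather than recommendations.

```python
def configure_async_retries(client, function_name, failure_destination_arn):
    """Configure async retry behaviour and a failure destination.

    `client` is assumed to behave like boto3.client("lambda").
    """
    return client.put_function_event_invoke_config(
        FunctionName=function_name,
        # Retries after function errors; Lambda allows 0 to 2.
        MaximumRetryAttempts=2,
        # How long an event may wait in the internal queue; 6 hours is the maximum.
        MaximumEventAgeInSeconds=6 * 60 * 60,
        # Where to send events that still fail (e.g. an SQS queue or SNS topic ARN).
        DestinationConfig={"OnFailure": {"Destination": failure_destination_arn}},
    )
```

The failure destination ARN is something you supply; the function's execution role also needs permission to send to it.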

Links

[1] Lambda's Invoke API

[2] AWS re:Invent 2022 - A closer look at AWS Lambda (SVS404-R)

[3] How Lambda handles errors and retries with asynchronous invocation
