AppSync's new async Lambda resolver is great news for GenAI apps


A common challenge in building GenAI applications today is the slow response time of most LLMs (with a few fast exceptions, such as GPT-4o or models served on Groq). To minimize perceived latency and enhance user experience, streaming the LLM response is a must.

As such, we see a common pattern emerge in AppSync:

  1. The caller makes a GraphQL request to AppSync.
  2. AppSync invokes a Lambda resolver.
  3. The Lambda function queues up a task in SQS.
  4. The Lambda resolver returns so that AppSync can respond to the caller immediately. In the meantime, a background SQS function picks up the task and calls the LLM.
  5. The caller receives an acknowledgement from the initial request.
  6. The background function receives the LLM response as a stream and forwards it in chunks (as they are received) to the caller via an AppSync subscription.

This workaround was necessary because AppSync could only invoke Lambda functions synchronously. To support response streaming, the first function had to hand off calling the LLM to something else.

AppSync now supports async Lambda invocations.

On May 30th, AppSync announced [1] support for invoking Lambda resolvers asynchronously.

This works for both VTL and JavaScript resolvers. Setting the new invocationType attribute to Event will tell AppSync to invoke the Lambda resolver asynchronously.

Here's how the VTL mapping template would look:
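A sketch of the request mapping template — the payload shape here is illustrative; forward whatever your function needs:

```vtl
{
  "version": "2018-05-29",
  "operation": "Invoke",
  "invocationType": "Event",
  "payload": $util.toJson($context.arguments)
}
```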

And here's the JavaScript resolver:
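Again a sketch, with an illustrative payload:

```javascript
export function request(ctx) {
  return {
    operation: 'Invoke',
    // 'Event' tells AppSync to invoke the function asynchronously
    invocationType: 'Event',
    payload: { arguments: ctx.arguments },
  };
}

export function response(ctx) {
  // for async invocations, ctx.result is always null
  return ctx.result;
}
```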

The response from an async invocation will always be null, so the corresponding GraphQL field should be nullable.

The new architecture

With this change, we no longer need the background function.

  1. The caller makes a GraphQL request to AppSync.
  2. AppSync invokes a Lambda resolver asynchronously.
  3. AppSync immediately receives a null response and can respond to the original request.
  4. The Lambda function receives the LLM response as a stream and forwards it in chunks (as they are received) to the caller via an AppSync subscription.
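
The forwarding in step 4 can be sketched as follows. `callLlmStream` and `publishChunk` are hypothetical stand-ins: in a real function, `callLlmStream` would be your LLM client's streaming call, and `publishChunk` would fire the AppSync mutation that the caller is subscribed to.

```javascript
// Stand-in for a streaming LLM call: yields response chunks as they arrive.
async function* callLlmStream(prompt) {
  for (const chunk of ['Hello', ', ', 'world']) {
    yield chunk;
  }
}

// Forwards each chunk to the caller as soon as it is received.
// `publishChunk` would trigger an AppSync mutation that fans out
// to subscribers; a sequence number lets the client order chunks.
async function forwardStream(prompt, publishChunk) {
  let seq = 0;
  for await (const chunk of callLlmStream(prompt)) {
    await publishChunk({ seq: seq++, chunk });
  }
  return seq; // number of chunks forwarded
}
```

Because the function is invoked asynchronously, it can keep running for as long as the stream takes, well past the point where AppSync has already responded to the original request.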

This is a simple yet significant quality-of-life improvement from the AppSync team.

It's not just for GenAI applications. The same pattern can be applied to any long-running task requiring more than AppSync's 30s limit.

Links

[1] AWS AppSync now supports long running events with asynchronous Lambda function invocations
