AppSync's new async Lambda resolver is great news for GenAI apps


A common challenge in building GenAI applications today is the slow response time of most LLMs (with notable exceptions such as GPT-4o and models served by Groq). To minimize perceived latency and improve the user experience, streaming the LLM response is a must.

As such, we see a common pattern emerge in AppSync:

  1. The caller makes a GraphQL request to AppSync.
  2. AppSync invokes a Lambda resolver.
  3. The Lambda function queues up a task in SQS.
  4. The Lambda resolver returns immediately so that AppSync can respond to the caller. Meanwhile, a background function (triggered by SQS) picks up the task and calls the LLM.
  5. The caller receives an acknowledgement from the initial request.
  6. The background function receives the LLM response as a stream and forwards it in chunks (as they are received) to the caller via an AppSync subscription.
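Steps 3 and 4 above can be sketched as a Lambda resolver that queues the task and acknowledges right away. This is a minimal sketch; the queue URL environment variable and the payload shape are illustrative assumptions.

```js
import { SQSClient, SendMessageCommand } from '@aws-sdk/client-sqs';

const sqs = new SQSClient({});

export const handler = async (event) => {
  // Hand the slow LLM call off to the background worker via SQS.
  await sqs.send(new SendMessageCommand({
    QueueUrl: process.env.TASK_QUEUE_URL, // assumed environment variable
    MessageBody: JSON.stringify({
      conversationId: event.arguments.conversationId,
      prompt: event.arguments.prompt,
    }),
  }));

  // Return an acknowledgement so AppSync can respond to the caller immediately.
  return { status: 'QUEUED' };
};
```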

This workaround was necessary because, until now, AppSync could only invoke Lambda functions synchronously. To support response streaming, the resolver function had to hand off the LLM call to something else.

AppSync now supports async Lambda invocations.

On May 30th, AppSync announced [1] support for invoking Lambda resolvers asynchronously.

This works for both VTL and JavaScript resolvers. Setting the new invocationType attribute to Event will tell AppSync to invoke the Lambda resolver asynchronously.

Here's how the VTL mapping template would look:
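A minimal request mapping template might look like this (the payload shape is illustrative):

```json
{
  "version": "2018-05-29",
  "operation": "Invoke",
  "invocationType": "Event",
  "payload": $util.toJson($context.arguments)
}
```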

And here's the JavaScript resolver:
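A minimal sketch, with an illustrative payload shape:

```js
export function request(ctx) {
  return {
    operation: 'Invoke',
    // 'Event' tells AppSync to invoke the Lambda function asynchronously
    invocationType: 'Event',
    payload: { field: ctx.info.fieldName, arguments: ctx.args },
  };
}

export function response(ctx) {
  return ctx.result;
}
```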

The response from an async invocation will always be null.

The new architecture

With this change, we no longer need the background function.

  1. The caller makes a GraphQL request to AppSync.
  2. AppSync invokes a Lambda resolver asynchronously.
  3. AppSync immediately receives a null response and can respond to the original request.
  4. The Lambda function receives the LLM response as a stream and forwards it in chunks (as they are received) to the caller via an AppSync subscription.
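Step 4 might look something like the sketch below. Both `llmStream` and `appSyncRequest` are hypothetical helpers (a streaming LLM client and a signed HTTP call to the AppSync GraphQL endpoint), and the `sendChunk` mutation is an illustrative field that clients subscribe to.

```js
// Sketch only: llmStream, appSyncRequest, and the sendChunk mutation are
// assumptions, not part of the AppSync announcement.
export const handler = async (event) => {
  // Call the LLM and consume its response as a stream.
  const stream = await llmStream(event.arguments.prompt);

  for await (const chunk of stream) {
    // Forward each chunk to the caller via a mutation; subscribers on the
    // same conversation receive it through an AppSync subscription.
    await appSyncRequest({
      query: `mutation SendChunk($id: ID!, $text: String!) {
        sendChunk(conversationId: $id, text: $text) { conversationId text }
      }`,
      variables: { id: event.arguments.conversationId, text: chunk.text },
    });
  }
};
```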

This is a simple yet significant quality-of-life improvement from the AppSync team.

It's not just for GenAI applications. The same pattern applies to any long-running task that needs more than AppSync's 30-second timeout.

Links

[1] AWS AppSync now supports long running events with asynchronous Lambda function invocations
