Join 8k+ readers and level up your AWS game with just 5 mins a week. Every Monday, I share practical tips, tutorials and best practices for building serverless architectures on AWS.
When to use Step Functions vs. doing it all in a Lambda function

I’m a big fan of AWS Step Functions. I use it to orchestrate all sorts of workflows, from payment processing to map-reduce jobs. But it’s yet another AWS service you need to learn and pay for. And it introduces additional complexities, such as:
So it’s fair to ask “Why should we even bother with Step Functions?” when you can do all the orchestration in code, inside a Lambda function. Let’s break it down.

Lambda pros

1. Doing all the orchestration in code is simpler. It’s more familiar. Everything you can do in Step Functions, you can do with just a few lines of code. Case in point:

// doX, doSomething, doSomethingElse and doY are placeholders for your business logic
module.exports.handler = async (event) => {
  // error handling
  try {
    await doX()
  } catch (err) {
    // handle errors
  }

  // branching logic
  if (event.condition === true) {
    await doSomething()
  } else {
    await doSomethingElse()
  }

  // parallelism: process every input item concurrently
  const promises = event.input.map(x => doY(x))
  const results = await Promise.all(promises)
  return results
}

2. It’s likely cheaper. A Step Functions state machine would likely use Lambda for its Task states. In which case, you’d end up paying for both:
Paying for two services is likely more expensive than paying for just one.

3. It’s likely more scalable. When you use both Step Functions and Lambda functions (for the Task states), you are subject to the throughput limits of both services. Step Functions Standard Workflows have modest limits on the number of state transitions and the number of executions you can start per second. Both of these limits can be raised. So with proper planning, they wouldn’t be an issue.

Without Step Functions, you are limited only by the concurrent executions limit on Lambda. Similarly, Lambda has a default limit on the number of concurrent executions. Again, with proper planning, and given the recent scaling changes for Lambda [3], you will be OK.

Both the cost and scalability arguments are situational and depend on several architectural choices. E.g. do you use Standard or Express Workflows? Do you use Lambda functions for Task states? Have you estimated your throughput needs and raised the soft limits accordingly?

Because of these factors, they are only “likely” to be true based on what I think the average AWS customer is capable of.

Lambda cons
Step Functions pros
2. Step Functions has a built-in audit history of everything that happened, including:
3. Step Functions has direct service integrations with almost every AWS service. So it’s possible to implement an entire workflow without needing any Lambda functions.

4. No Lambda, no cold starts. No cold starts = more predictable performance.

5. Long execution time. A Standard Workflow can run for up to a year.

6. Callback patterns are a great way to support human decisions (e.g. approve a deployment request) in a workflow.

7. Standard Workflows are arguably the most cost-efficient way to wait, because you pay only for the state transitions, not for the duration.

8. You can implement more robust error handling. This is important for business-critical workflows. To make a workflow more robust, you need to have both:
With Lambda, this puts you between two opposing forces:
It’s difficult to guess the right timeout in these situations. Especially when your workflow might have different branches. And if you get it wrong, then your workflow would be killed off halfway and there’s no easy way to restart from the point of failure. By lifting the error handling and retries out of your code and into the state machine itself, you alleviate this tension.

9. Step Functions lets you resume an execution from that point of failure [4].

Step Functions cons
5. Hard to test. However, this is getting easier with the new TestState API [5].

Conclusion

In conclusion, Step Functions offers a plethora of capabilities. But they come with their own set of complexities and costs. Whether it’s right for you depends on the demands of your use case.

Generally, I advocate for the path of least resistance: simple workflows call for simple solutions. And Lambda functions excel in these scenarios.

However, for more complex workflows, Step Functions can simplify the implementation with its built-in functionalities that would otherwise require custom solutions. For example, when you need to incorporate human decisions into a workflow, you should use Step Functions and leverage its callback patterns instead of creating a bespoke solution. Similarly, for workflows where resuming from a point of failure is important, you should go with Step Functions.

Personally, I heavily lean towards Step Functions for business-critical workflows. The advantages of visualization, audit trails, and robust error handling align with the high stakes involved. These workflows, like payment processing, warrant the extra investment in Step Functions due to their critical nature.

I will leave you with this diagram with the pros & cons captured in one place.

To learn more about testing Step Functions, check out my course, Testing Serverless Architectures [6]. I have a whole chapter dedicated to Step Functions.

Links

[1] Why is Step Functions so hard to test?
[2] What is AWS Step Functions? An in-depth overview.
[3] AWS Lambda functions now scale up to 12X faster
[4] Introducing AWS Step Functions redrive to recover from failures more easily
[5] Does Step Function’s new TestState API make end-to-end tests obsolete?
[6] Testing Serverless Architectures course
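As a rough illustration of the error handling, timeout and callback points made under "Step Functions pros", here is a minimal sketch of a state machine definition written as a plain JavaScript object. The state names, ARNs, queue URL and retry numbers are all made up for illustration. `Retry`, `Catch`, `TimeoutSeconds` and the `.waitForTaskToken` suffix are standard Amazon States Language features. The two helper functions are hypothetical and only estimate how long one Task state could take in the worst case, assuming every attempt runs until it times out.

```javascript
// Sketch only: state names, ARNs, queue URL and retry numbers are hypothetical.
const definition = {
  StartAt: 'ChargeCustomer',
  States: {
    ChargeCustomer: {
      Type: 'Task',
      // hypothetical Lambda function ARN
      Resource: 'arn:aws:lambda:us-east-1:123456789012:function:charge-customer',
      TimeoutSeconds: 10,
      // retries live in the state machine, not in the function's code
      Retry: [{
        ErrorEquals: ['States.Timeout', 'States.TaskFailed'],
        IntervalSeconds: 2,
        MaxAttempts: 3,
        BackoffRate: 2
      }],
      // anything still failing after the retries goes to NotifyFailure
      Catch: [{ ErrorEquals: ['States.ALL'], Next: 'NotifyFailure' }],
      Next: 'WaitForApproval'
    },
    WaitForApproval: {
      Type: 'Task',
      // callback pattern: the execution pauses until the task token is returned
      Resource: 'arn:aws:states:::sqs:sendMessage.waitForTaskToken',
      Parameters: {
        // hypothetical queue URL
        QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123456789012/approvals',
        MessageBody: { 'taskToken.$': '$$.Task.Token', 'input.$': '$' }
      },
      End: true
    },
    NotifyFailure: {
      Type: 'Fail',
      Error: 'ChargeFailed',
      Cause: 'Charging the customer failed after retries'
    }
  }
}

// Total time spent waiting between retries: IntervalSeconds * BackoffRate^n
// for n = 0 .. MaxAttempts - 1. The defaults mirror the Amazon States Language defaults.
function totalRetryWaitSeconds ({ IntervalSeconds = 1, MaxAttempts = 3, BackoffRate = 2 }) {
  let total = 0
  for (let attempt = 0; attempt < MaxAttempts; attempt++) {
    total += IntervalSeconds * Math.pow(BackoffRate, attempt)
  }
  return total
}

// Worst case for one Task state, assuming a single retrier and that the first
// attempt and every retry all run until TimeoutSeconds.
function worstCaseSeconds (task) {
  const retrier = task.Retry[0]
  const { MaxAttempts = 3 } = retrier
  const attempts = 1 + MaxAttempts
  return attempts * task.TimeoutSeconds + totalRetryWaitSeconds(retrier)
}

console.log(worstCaseSeconds(definition.States.ChargeCustomer)) // 54
```

In this sketch, a single step can take up to 54 seconds in the worst case. If the same retries lived inside one Lambda function, its timeout would need to cover that budget for every step on every branch, which is the tension described under point 8.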
by Yan Cui, AWS Serverless Hero