Read on lumigo blog
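Step Functions added support for testing individual states [1] with the new TestState API [2]. It lets you execute an individual state by passing in the state definition, an IAM role and an input. It returns the state's output (or the error and cause if it fails), the status and the next state.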
With the TestState API, you can thoroughly test every state and achieve close to 100% coverage of a state machine.
So does this eliminate the need for Step Functions Local [3]?
Can we do away with end-to-end tests as well?
If not, then where should this new API fit into your workflow and how should you use it?
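As I wrote previously [4], my strategy for testing Step Functions uses a combination of component tests for the Lambda functions, Step Functions Local (with mocks) for the states and execution paths that are hard to reach, and end-to-end tests that execute the state machine in the cloud.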
The TestState API lets you test these hard-to-reach states directly. It should help you achieve better test coverage of your state machine with less effort.
However, it’s worth remembering that it’s not a local simulation tool. In most cases, it wouldn’t help you improve the speed of your feedback loop.
For example, if you’re testing a Lambda-based Task state, then the referenced Lambda function and the relevant IAM role need to be deployed first. Similarly, after you change the Lambda function, you have to deploy the change before you can test the state.
Another good use case for the TestState API is testing input and output processing logic [5]. This includes cases where you modify the current input with the Pass state’s Result field.
Because the TestState API takes in the state definition as an argument, you do not have to redeploy the state machine after every change. Instead, you can iterate on your settings and test them by passing the modified state definition to the TestState API.
For example, take the Task 2 state from the imaginary state machine above:
We can write tests to make sure that the state behaves as expected, for example, that nextState points to the Task 3 state.
We need a way to fetch the definition of our state machine and the IAM role we should use. I like to encapsulate this into a given module.
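A minimal Python sketch of such a given helper might look like the following. It is not the code from the demo project. The function name is hypothetical, and it relies on boto3's `describe_state_machine`, which returns the definition as a JSON string along with the `roleArn`.

```python
import json
from typing import Tuple


def a_state_definition(sfn_client, state_machine_arn: str, state_name: str) -> Tuple[dict, str]:
    """Fetch one top-level state's definition and the state machine's IAM role ARN."""
    described = sfn_client.describe_state_machine(stateMachineArn=state_machine_arn)
    states = json.loads(described["definition"])["States"]
    if state_name not in states:
        # Only top-level states are looked up; states nested inside
        # Parallel or Map states are not handled in this sketch.
        raise KeyError(f"State '{state_name}' not found in {state_machine_arn}")
    return states[state_name], described["roleArn"]
```

Fetching the definition from the deployed state machine keeps the tests in sync with what is actually running, at the cost of requiring a deployment first.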
And we also need a way to call the TestState API with our state definition and input. I like to encapsulate this into a when module.
This keeps my test code simple and easy to read.
(You can try out this demo project here [6])
I can write tests like this for every state in the state machine and cover every scenario.
However, as I mentioned before, both the Lambda function (used by the Task state) and the IAM role need to be deployed first. So your typical workflow would be to make a change, deploy the project and then run the tests.
As you iterate on your state definitions and Lambda functions, how do you maintain a fast feedback loop? Can you avoid having to redeploy the project every time you make a change?
Yes, you can. That’s why we need a full suite of different tests.
You should still perform component-level testing on the Lambda functions involved.
Use “remocal testing” (i.e. execute the Lambda function code locally against remote AWS resources) to maintain a fast feedback loop as you iterate on your Lambda function.
As you iterate on your Lambda function, you can run these tests and execute the latest code locally. Because the code is executed locally, you don’t need to deploy it to the Lambda service.
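To make the idea concrete, here is a small Python sketch. The handler, the `ORDERS_TABLE` environment variable and the item shape are all hypothetical. In a remocal test, the handler runs in your local process while `put_item` goes to the real, deployed DynamoDB table. The optional `table` argument is only there so the sketch can also be exercised without AWS access.

```python
import os


def _remote_table():
    """Resolve the deployed DynamoDB table (assumed to be named by ORDERS_TABLE)."""
    import boto3  # imported lazily so this module loads without AWS access

    return boto3.resource("dynamodb").Table(os.environ["ORDERS_TABLE"])


def handler(event, context=None, table=None):
    """Hypothetical Lambda handler that records an incoming order."""
    table = table or _remote_table()
    item = {"orderId": event["orderId"], "status": "RECEIVED"}
    table.put_item(Item=item)
    return item


if __name__ == "__main__":
    # Remocal run: local code, remote table. Requires AWS credentials
    # and ORDERS_TABLE pointing at a deployed table.
    result = handler({"orderId": "order-123"})
    assert result["status"] == "RECEIVED"
```

Because only the code under test runs locally, a change to the handler can be verified in seconds, while the table, its permissions and its configuration remain the real ones.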
But a Task state is more than just the Lambda function. There is input and output processing, and there are error-handling settings as well.
The TestState API helps you test these settings as we have seen in the example above.
Step Functions Local was best used to test execution paths that are difficult to reach, thanks to its mocking capability.
The ability to test individual states means this is no longer necessary.
Another potential use for Step Functions Local is so that you can iterate on your state machine locally without redeploying the project.
Unfortunately, this doesn’t work very well in practice, because your state machine likely depends on Lambda functions, SNS topics and other AWS resources. So you have to either provide a full simulation of all these resources (e.g. by running LocalStack [7]) or you still have to deploy your project first.
The same dynamic still exists with the TestState API.
But no, you don’t need to use Step Functions Local anymore.
End-to-end tests execute the state machine in the cloud and make sure everything works together. Before the TestState API, end-to-end tests played an important role in my test strategy.
They were the workhorse in my test suites.
From a test coverage point of view, you don’t need end-to-end tests anymore. You can achieve better test coverage with less effort by testing individual states with the TestState API.
However, it’s easy to lose sight of the forest when you only look at the individual trees.
I think there is still value in having end-to-end tests for business-critical execution paths. This is to ensure that all the individual states do indeed function together as a unit.
In a state machine, data flows from one state to the next. You need to make sure that if you change the output from Task #1 (see below) then you also change the conditions in Choice #2.
It’s easy to break the contract between Task #1 and Choice #2 when you’re testing them separately.
This is similar to the kind of integration problems that you often face in a microservices environment. In the context of a state machine, end-to-end tests can help you catch these “integration” problems early.
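The following Python sketch illustrates how such a contract can break. The Choice state, the field names and both outputs are invented for the example, and the path check only understands simple `$.a.b` paths in top-level rules. It is an illustration of the failure mode, not a replacement for an end-to-end test.

```python
def referenced_paths(choice_state: dict) -> set:
    """Collect the Variable paths used by a Choice state's top-level rules."""
    return {rule["Variable"] for rule in choice_state.get("Choices", []) if "Variable" in rule}


def resolves(path: str, document: dict) -> bool:
    """Return True if a simple '$.a.b' style path exists in the document."""
    if not path.startswith("$."):
        return False
    current = document
    for key in path[2:].split("."):
        if not isinstance(current, dict) or key not in current:
            return False
        current = current[key]
    return True


# Hypothetical Choice #2 that branches on a field produced by Task #1.
choice_2 = {
    "Type": "Choice",
    "Choices": [{"Variable": "$.result.status", "StringEquals": "APPROVED", "Next": "Task #3"}],
    "Default": "Task #4",
}

old_task_1_output = {"result": {"status": "APPROVED"}}
new_task_1_output = {"result": {"state": "APPROVED"}}  # field renamed in Task #1

# Paths that Choice #2 expects but the new Task #1 output no longer provides.
missing = [p for p in referenced_paths(choice_2) if not resolves(p, new_task_1_output)]
```

Here `missing` contains `$.result.status`: tests for Task #1 and Choice #2 can each pass on their own inputs, yet the two states no longer agree on the shape of the data that flows between them.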
To summarise:
If you’re testing a Lambda-based Task state, you still have to deploy the project first.
[1] Official announcement blog post for the new TestState API
[2] The TestState API reference
[3] Step Functions Local
[4] A practical guide to testing AWS Step Functions
[5] Input and Output processing in Step Functions
[6] Demo project to illustrate how to use the TestState API
[7] LocalStack