Does Step Functions’ new TestState API make end-to-end tests obsolete?

Step Functions added support for testing individual states [1] with the new TestState API [2], which lets you execute an individual state by supplying:

the state definition
an input
an IAM role

It returns the following:

the output of the state
the status: whether the state succeeded, failed or caught an error
the next state in the execution
the error and cause (where applicable)
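To make the shape of the call concrete, here is a minimal sketch in Python, assuming boto3 (whose Step Functions client exposes the API as test_state). The helper and class names (build_request, StateResult, parse_response) are hypothetical and only illustrate the inputs and outputs listed above.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


def build_request(state_definition: dict, role_arn: str, state_input: dict) -> dict:
    """Build the keyword arguments for a TestState call.

    The API takes the state definition and the input as JSON strings.
    """
    return {
        "definition": json.dumps(state_definition),
        "roleArn": role_arn,
        "input": json.dumps(state_input),
    }


@dataclass
class StateResult:
    status: str                # e.g. SUCCEEDED, FAILED or CAUGHT_ERROR
    output: Any                # parsed JSON output, if the state produced one
    next_state: Optional[str]  # None when the response has no next state
    error: Optional[str]
    cause: Optional[str]


def parse_response(response: dict) -> StateResult:
    """Pull the fields listed above out of a TestState response."""
    raw_output = response.get("output")
    return StateResult(
        status=response["status"],
        output=json.loads(raw_output) if raw_output is not None else None,
        next_state=response.get("nextState"),
        error=response.get("error"),
        cause=response.get("cause"),
    )


# Usage against AWS (needs boto3, credentials and a deployed IAM role):
#   import boto3
#   client = boto3.client("stepfunctions")
#   request = build_request({"Type": "Pass", "End": True}, role_arn, {"hello": "world"})
#   result = parse_response(client.test_state(**request))
```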

With the TestState API, you can thoroughly test every state and achieve close to 100% coverage of a state machine.

So does this eliminate the need for Step Functions Local [3]?

Can we do away with end-to-end tests as well?

If not, then where should this new API fit into your workflow and how should you use it?

What problems does the TestState API solve?

As I wrote previously [4], my strategy for testing Step Functions uses a combination of:

Component tests for individual Lambda functions.
End-to-end tests to cover most execution paths.
Step Functions Local to test hard-to-reach execution paths (using mocks to direct the execution to the target branches).

The TestState API lets you test these hard-to-reach states directly. It should help you achieve better test coverage of your state machine with less effort.

However, it’s worth remembering that it’s not a local simulation tool: the state still runs in the cloud against real resources. In most cases, it won’t speed up your feedback loop.

For example, if you’re testing a Lambda-based Task state, then the referenced Lambda function and the relevant IAM role need to be deployed first. Similarly, after you change the Lambda function, you have to deploy the change before you can test the state again.

Another good use case for the TestState API is testing input and output processing logic [5]. This includes using a Pass state’s Result field to modify the current input.

Because the TestState API takes in the state definition as an argument, you do not have to redeploy the state machine after every change. Instead, you can iterate on your settings and test them by passing the modified state definition to the TestState API.

How to use the TestState API

For example, take the Task 2 state from the imaginary state machine above:

We can write tests to make sure that:

In the happy path, the state succeeds and there is no nextState.
In the error case, the state fails, but the error is caught and the execution proceeds to the Task 3 state.

We need a way to fetch the definition of our state machine and the IAM role we should use. I like to encapsulate this in a given module.
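One possible shape for such a given module, sketched in Python with boto3 as an assumption. The function names (a_state_machine, a_state) are hypothetical. DescribeStateMachine returns both the definition (as a JSON string) and the ARN of the role the state machine executes with.

```python
import json


def a_state_machine(sfn_client, state_machine_arn):
    """Fetch the deployed definition and execution role of a state machine."""
    response = sfn_client.describe_state_machine(stateMachineArn=state_machine_arn)
    return {
        "definition": json.loads(response["definition"]),  # parsed ASL document
        "role_arn": response["roleArn"],
    }


def a_state(state_machine, state_name):
    """Pick one top-level state's definition out of the parsed definition.

    States nested inside Parallel or Map states are not handled by this sketch.
    """
    return state_machine["definition"]["States"][state_name]


# Usage against AWS (needs boto3 and credentials):
#   import boto3
#   machine = a_state_machine(boto3.client("stepfunctions"), state_machine_arn)
#   task2 = a_state(machine, "Task 2")
```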

We also need a way to call the TestState API with our state definition and input. I like to encapsulate this in a when module.

This keeps my test code simple and easy to read.
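As a hedged sketch of the two Task 2 scenarios listed earlier, the checks below work on a raw TestState response. It is Python, the helper names are hypothetical, and the status values are the ones the API reports (SUCCEEDED and CAUGHT_ERROR).

```python
def assert_succeeded_and_ended(response):
    """Happy path: the state succeeds and there is no next state."""
    assert response["status"] == "SUCCEEDED", response
    assert response.get("nextState") is None, response


def assert_error_caught_and_routed(response, expected_next_state):
    """Error path: the state's Catch handles the error and routes the execution."""
    assert response["status"] == "CAUGHT_ERROR", response
    assert response.get("nextState") == expected_next_state, response


# In a test file these might be used as follows, where `run_task2` stands for
# whatever calls TestState for the Task 2 state (an assumption of this sketch):
#
#   def test_task2_error_is_caught():
#       response = run_task2(input_that_makes_the_function_fail)
#       assert_error_caught_and_routed(response, "Task 3")
```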

(You can try out the demo project here [6].)

I can write tests like this for every state in the state machine and cover every scenario.

However, as I mentioned before, both the Lambda function (used by the Task state) and the IAM role need to be deployed first. So your typical workflow would be as follows:

Work on the state machine design.
Implement the Lambda functions.
Deploy the project, including the state machine, Lambda functions, IAM roles and so on.
Run tests against individual states.

As you iterate on your state definitions and Lambda functions, how do you maintain a fast feedback loop? Can you avoid having to redeploy the project every time you make a change?

Yes, you can. That’s why we need a full suite of different tests.

Do we still need component tests?

Yes, you should still perform component-level testing on the Lambda functions involved.

Use “remocal testing” (i.e. execute the Lambda function code locally against remote AWS resources) to maintain a fast feedback loop as you iterate on your Lambda function.

As you iterate on your Lambda function, you can run these tests and execute the latest code locally. Because the code is executed locally, you don’t need to deploy it to the Lambda service.
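To illustrate the idea, here is a minimal remocal sketch in Python. Everything in it is hypothetical: the handler, the ORDERS_TABLE environment variable and the orderId key are invented for the example. The handler runs in the local process, while the DynamoDB table it writes to is the real, deployed one.

```python
import os


def _orders_table():
    """Resolve the deployed table; boto3 is imported lazily so the module loads anywhere."""
    import boto3
    return boto3.resource("dynamodb").Table(os.environ["ORDERS_TABLE"])


def handler(event, context, table=None):
    """Hypothetical Lambda handler: stores an order and returns it."""
    table = table if table is not None else _orders_table()
    item = {"orderId": event["orderId"], "status": "RECEIVED"}
    table.put_item(Item=item)
    return item


def check_handler_stores_order(table=None):
    """Remocal check: call the handler in-process, then read the item back."""
    table = table if table is not None else _orders_table()
    result = handler({"orderId": "remocal-test-1"}, None, table=table)
    # Strongly consistent read, so the item written a moment ago is visible.
    stored = table.get_item(Key={"orderId": "remocal-test-1"}, ConsistentRead=True)["Item"]
    assert stored["status"] == "RECEIVED"
    assert result["orderId"] == stored["orderId"]
    return stored
```

Calling check_handler_stores_order() with no argument runs the local code against the deployed table, so a code change can be verified without redeploying the function.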

But a Task state is more than just the Lambda function. There is input and output processing, and there are error-handling settings as well.

The TestState API helps you test these settings as we have seen in the example above.

Do we still need Step Functions Local?

Step Functions Local was best used to test execution paths that are difficult to reach, thanks to its mocking capability.

The ability to test individual states means this is no longer necessary.

Another potential use for Step Functions Local is so that you can iterate on your state machine locally without redeploying the project.

Unfortunately, this doesn’t work very well in practice.

That’s because your state machine likely depends on Lambda functions, SNS topics and other AWS resources. So you either have to provide a full simulation of all these resources (e.g. by running LocalStack [7]) or you still have to deploy your project first.

The same dynamic still exists with the TestState API.

But no, you don’t need to use Step Functions Local anymore.

Do we still need end-to-end tests?

End-to-end tests execute the state machine in the cloud and make sure everything works together. Before the TestState API, end-to-end tests played an important role in my test strategy.

They were the workhorse in my test suites.

From a test coverage point of view, you don’t need end-to-end tests anymore. You can achieve better test coverage with less effort by testing individual states with the TestState API.

However, it’s easy to lose sight of the forest when you only look at the individual trees.

I think there is still value in having end-to-end tests for business-critical execution paths. This is to ensure that all the individual states do indeed function together as a unit.

In a state machine, data flows from one state to the next. You need to make sure that if you change the output from Task #1 (see below) then you also change the conditions in Choice #2.

It’s easy to break the contract between Task #1 and Choice #2 when you’re testing them separately.

This is similar to the kind of integration problems that you often face in a microservices environment. In the context of a state machine, end-to-end tests can help you catch these “integration” problems early.
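For reference, an end-to-end test of this kind boils down to starting an execution and waiting for it to finish. The sketch below assumes Python with boto3 and a Standard workflow. The function name run_to_completion and its parameters are hypothetical.

```python
import json
import time

TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}


def run_to_completion(sfn_client, state_machine_arn, execution_input,
                      timeout_seconds=60, poll_seconds=2, sleep=time.sleep):
    """Start an execution and poll until it reaches a terminal status.

    Returns the final DescribeExecution response.
    Raises TimeoutError if the execution is still running after the timeout.
    """
    started = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(execution_input),
    )
    deadline = time.monotonic() + timeout_seconds
    while True:
        execution = sfn_client.describe_execution(executionArn=started["executionArn"])
        if execution["status"] in TERMINAL_STATUSES:
            return execution
        if time.monotonic() >= deadline:
            raise TimeoutError(f"execution still {execution['status']} after {timeout_seconds}s")
        sleep(poll_seconds)


# Usage against AWS (needs boto3, credentials and a deployed state machine):
#   import boto3
#   execution = run_to_completion(boto3.client("stepfunctions"), state_machine_arn, {"orderId": "42"})
#   assert execution["status"] == "SUCCEEDED"
```

Because the whole state machine runs, a mismatch between the output of one state and the conditions of the next shows up as a failed or misrouted execution, which is the kind of contract break that per-state tests can miss.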

Summary

To summarise:

The new TestState API is awesome! You can use it to achieve nearly 100% test coverage of your state machines.
Because the business logic of a state machine is often split across Lambda functions and state definitions, you should still have tests for Lambda functions.
You should use “remocal tests” for Lambda functions to help you maintain a fast feedback loop.
Because the TestState API invokes the remote resources referenced by the Task state, you still have to deploy the project first.
You don’t need to use Step Functions Local anymore.
There is still value in end-to-end tests. You should use them to ensure critical business workflows work end-to-end.