Effective Automation Tests

Kevin Fawcett · ITNEXT · Oct 2, 2020

Automation tests are not punishment. Their purpose is not to meet acceptance criteria, pass a code review, or satisfy code coverage tools. They build confidence in the code by preventing regressions and reduce the time spent on manual testing. The following points illustrate how to use them more effectively.

Test behavior, not implementation

Challenge: Write tests for functions using only their parameters and outputs (their signature). This strategy is a key component of test-driven development and will produce simpler application code. If that proves challenging, consider splitting functions to reduce complexity and responsibility.

The fake function below will be used as an example. It increments a number, ensuring that the next one is greater than the previous.

A contrived example that increments a number — applicable to other languages.
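The original snippet isn't reproduced here, so the following is a minimal sketch of what getNextSequence might look like; the generateId import and the module-level lastSequence variable are assumptions.

import { generateId } from './generateId';

let lastSequence = 0;

export function getNextSequence() {
  // Ask the ID generator until it produces something
  // greater than the previous sequence.
  let next = generateId();
  while (next <= lastSequence) {
    next = generateId();
  }
  lastSequence = next;
  return next;
}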

An implementation test targets the internals of the function to determine how to set it up and what to mock.

Sample test that uses the implementation to determine mocks
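The original gist isn't shown, so here is a sketch of what that implementation test might look like, assuming sinon stubs and the module layout from the sketch above (all hypothetical):

import sinon from 'sinon';
import * as idModule from './generateId';
import { getNextSequence } from './getNextSequence';

test('getNextSequence uses generateId', function () {
  // Reaches into the function's internals and fakes its hidden dependency.
  const stub = sinon.stub(idModule, 'generateId').returns(1);

  expect(getNextSequence()).to.equal(1);
  expect(stub).to.be.calledOnce();

  stub.restore();
});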

A behavioral test focuses on outcomes, and only has access to change the values of the function's parameters (in this case, none).

Sample test focusing on the behavior of the getNextSequence function
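A sketch of the behavioral equivalent (again hypothetical, since the original gist isn't shown): no stubs, no internals, only observed output.

import { getNextSequence } from './getNextSequence';

test('each sequence is greater than the last', function () {
  const first = getNextSequence();
  const second = getNextSequence();

  expect(second).to.be.above(first);
});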

Both tests pass, but the one focused on implementation does not survive the following refactor.
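One plausible version of that refactor (the original isn't shown): the hidden generateId dependency is removed entirely.

let lastSequence = 0;

export function getNextSequence() {
  // Incrementing directly still guarantees the next number
  // is greater than the previous one.
  lastSequence += 1;
  return lastSequence;
}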

The behavior of the getNextSequence function didn’t change, and the application still works. However, the implementation test targets a hidden detail that no longer exists and sets up a mock that is no longer valid. It will fail, which is a false negative.

What is a false negative?

A false negative is when a test fails even though the function is working as expected; the failure is not a legitimate application bug.

Dangers of false negatives

  • Developers lose confidence in the tests. Code changes are often verified manually while they are written (TDD is not always followed). A developer will assume a failing test that someone else wrote is unreliable, and may be frustrated at wasting time understanding, updating, or removing it.
  • Developers lose confidence in the application. When a codebase has a reputation for poorly written tests, sweeping changes like upgrading frameworks become intimidating. Those changes end up shelved or never done.
  • Time lost to troubleshooting. Hundreds of lines of setup can make root-cause analysis difficult. The time spent diagnosing a bug in the test code could have been spent on a bug in the application instead. The quicker alternative, disabling the broken tests, costs coverage and confidence.
  • Doubling or tripling build time. A test is flaky when it fails intermittently. When builds can be fixed by restarting them, restarting becomes the first step of troubleshooting. Once a developer accepts that norm, they may not look at the output of a failed build until the second or third attempt.
  • Increased manual testing. If the automation tests aren’t trusted, teams will fall back to manual testing. Manual testing is slower, prone to human error, and more expensive, but provides the missing confidence.

What is a false positive?

False positives occur when a test passes despite the application being broken.

In the previous example, generateId is being stubbed. Imagine the actual function had the following definition.

let counter = 0;

function generateId() {
  // Counting down, not up!
  counter -= 1;
  return counter;
}

The implementation test would erroneously pass because we’re overriding the return value with fake data. The behavioral test would get stuck in an infinite loop, time out, and successfully fail.

Bonus example:

test('create user', function () {
  createUser()
    .then((user) => {
      expect(user.id).to.not.be.null();
    });
});

The test above does not wait for the asynchronous createUser call to finish, so it passes regardless of the outcome. There are multiple fixes available, depending on the testing framework.

// Include a done callback
test('create user', function (done) {
  createUser()
    .then((user) => {
      expect(user.id).to.not.be.null();
      done();
    });
});

// Return the createUser() promise
test('create user', function () {
  return createUser()
    .then((user) => {
      expect(user.id).to.not.be.null();
    });
});

// Use async/await. The test will catch unhandled exceptions
test('create user', async function () {
  const user = await createUser();
  expect(user.id).to.not.be.null();
});

Unfortunately, these kinds of false positives might slip past code review.

Ways to prevent them:

  • Use lint plugins like prefer-expect-assertions (a sample configuration follows this list)
  • With test-driven development, the tests start in a failing state and won’t pass until the implementation is written.
  • Some frameworks, like ava, won’t pass a test unless it makes an assertion.
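For example, a minimal ESLint configuration, assuming the project uses Jest and eslint-plugin-jest (the setup here is illustrative):

// .eslintrc.js
module.exports = {
  plugins: ['jest'],
  rules: {
    // Suggests declaring expect.assertions(n) or expect.hasAssertions(),
    // so a test whose assertions never run fails instead of passing.
    'jest/prefer-expect-assertions': 'warn',
  },
};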

Dangers of false positives

  • Developers lose confidence in the tests. The tests have failed their one and only job. What other tests are not actually testing anything?
  • Quality Assurance (QA) loses confidence in the developers. A QA tester will be frustrated about blatant bugs being introduced into a test environment, despite a passing build with automation tests. They will be slower to adopt the idea of automation and feel the need to manually verify every ticket.
  • Bugs are shipped to production. A benefit of automation tests is verifying every part of the system in a timely manner. QA testers are human and, depending on the company, may not exist at all. Reporting bugs, some of them show-stopping, then falls to the customer, who may not bother and will either silently resent the application or move to another platform.

Code coverage is a guide, not a metric of code quality

Tools like nyc, which report the lines of code missing test coverage, introduced a toxic change in thinking for some managers. Chasing one hundred percent test coverage produces diminishing returns: it encourages testing implementation details so that every if statement is covered, regardless of the effect on behavior. In the getNextSequence example, developer time was wasted setting up mocks for a needless implementation test, and later diagnosing a false negative produced by a valid refactor.

That does not mean the coverage tools are useless. Code coverage tools are there to help, not dictate. They can find oversights where tests are missing entirely, or provide ammunition to present a case to a product owner about code confidence.

Don’t rely on mocking libraries

Many developers consider mocking libraries like sinon a code smell.

“A code smell is a surface indication that usually corresponds to a deeper problem in the system.” ~ Martin Fowler

Mocks can be used to access internal code that is not exposed through a parameter or class variable. However, after moving away from testing implementation details, the need for them dwindles. Other mocks can be removed by following the dependency injection pattern. See the adjustment to the example below.

The ID generator function is passed as a parameter instead of being imported
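A sketch of that adjustment (the original gist isn't shown, so the shape is an assumption):

let lastSequence = 0;

export function getNextSequence(generatorIdFunction) {
  // The generator arrives as a parameter instead of an import,
  // so tests can pass in any implementation they like.
  let next = generatorIdFunction(lastSequence);
  while (next <= lastSequence) {
    next = generatorIdFunction(lastSequence);
  }
  lastSequence = next;
  return next;
}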

Instead of stubbing generatorIdFunction, a function could be passed directly:

getNextSequence((lastSequence) => { return lastSequence + 1; });

While mocks may not be required, they do have a convenient and expressive API, such as expect(callback).to.be.calledOnce(). However, they should only be given access through the parameters.

Handling third-party libraries

Third-party libraries are not a justification for mocking libraries. Using dependency injection, an interface can easily be swapped for a fake implementation.

import serverUploader from "./serverUploader";

function uploadToServer(file) {
  if (!file) {
    // This is what we're testing
    throw new Error("You didn't include a file!");
  }

  serverUploader.upload(file);
}

Instead of importing serverUploader, include it as an injected dependency.

function uploadToServer(serverUploader, file) {
  if (!file) {
    throw new Error("You didn't include a file!");
  }

  serverUploader.upload(file);
}

or

class ServerUploader {
  constructor(serverUploader) {
    this.serverUploader = serverUploader;
  }

  uploadToServer(file) {
    if (!file) {
      throw new Error("You didn't include a file!");
    }

    this.serverUploader.upload(file);
  }
}

With the increased access to serverUploader, providing a fake implementation like { upload: () => { /* do nothing */ } } is trivial.
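For instance, a hypothetical test against the injected version above:

test('throws when no file is given', function () {
  // The fake satisfies the uploader interface without touching the network.
  const fakeUploader = { upload: () => { /* do nothing */ } };

  expect(() => uploadToServer(fakeUploader, undefined)).to.throw("You didn't include a file!");
});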

Third-party code should be tested where possible

External libraries like moment.js have their own tests, so there is no need to cover every case, but it's wise to cover the cases the application uses, or at least not purposely exclude them. Having those tests can catch bugs introduced by version upgrades and ensure functionality persists when switching to an alternative like luxon.
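As a hypothetical example, suppose the application formats dates through its own helper that wraps moment.js. A test covering just that usage survives version upgrades and a future switch to luxon:

import moment from 'moment';

// dueDate.js: the application's own wrapper around the library
export function formatDueDate(isoDate) {
  return moment(isoDate).format('MMM D, YYYY');
}

// dueDate.test.js: covers only the case the application relies on
test('formats a due date the way the UI displays it', function () {
  expect(formatDueDate('2020-10-02')).to.equal('Oct 2, 2020');
});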

Don’t test constants and variable assignment

function getTodoAction(title) {
  return { type: 'ADD_TODO', title };
}

test('returns todo action', function () {
  const todo = getTodoAction('Test');
  expect(todo.type).to.equal('ADD_TODO');
  expect(todo.title).to.equal('Test');
});

This example may seem extreme (I've seen it), but it happens frequently when developers are driven by code coverage tools.

There is nothing to test here because there is no behavior. The test above verifies that a variable is assigned a value, one of the most basic features of a language. If that fails, there will be bigger problems to address.

Not only does this test fail to add value, it removes value. Every time a developer changes the structure of the action or the name of the type, they must also fix the resulting false negative.

These lines would be implicitly covered by integration and end-to-end tests.

Extract and test legacy code in small chunks

Sometimes writing tests is a huge burden, especially when dealing with legacy code that was created without testing in mind. Developers who are afraid to touch this code rely on manual verification to avoid that effort.

Splitting the code into logical, manageable chunks can alleviate the burden: take 100 related lines in a 2,000 line function and extract them into a separate file. With that separation, patterns like dependency injection can be added effortlessly.
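A hypothetical sketch: a hundred lines of late-fee logic buried inside a giant processInvoices function get extracted into their own file, with their dependencies passed in as parameters (all names here are illustrative).

// lateFee.js
export function calculateLateFee(invoiceTotal, daysOverdue, dailyRate) {
  // Extracted from the legacy function; now testable in isolation.
  if (daysOverdue <= 0) {
    return 0;
  }

  return invoiceTotal * dailyRate * daysOverdue;
}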

After splitting a small chunk, stop. Legacy code may not get changed very often, and there are likely not many people that understand it fully. Because of the lack of automation, larger changes introduce risk. Why not release a small change, then split more at a later date?

This may not be achievable if a manager/co-worker is prescribing a sweeping change, although one way to get them on board is using the keywords mitigate and risk.

Another difficulty with legacy codebases is outdated or poorly written tests. Dealing with a 2,000 line test file is difficult, especially if the tests depend on each other. Fortunately, having two test files for the same target is a reasonable solution. Make a new file for new tests, and rename the old file to filename.deprecated.test.js.

Write independent tests

Isolated tests give developers more confidence that data from other tests is not producing false positives and negatives. Consider the following example.

test('user is created', async function () {
  for (let i = 0; i < 10; i += 1) {
    await createUser();
  }

  const users = await getUsers();
  expect(users).to.have.lengthOf(10);
});

This test is not cleaning up after itself, which means every test that runs after it will have ten users in the database. Alternatively, it could be a victim of another test not cleaning up, which could cause it to fail unexpectedly and be difficult to diagnose, especially if that user was created in another file.

A safer approach would be to start with a fresh database on every test, even if it takes longer to run.

beforeEach(function () {
  cleanDatabase();
});

Use test-driven development (TDD)

This methodology makes some people groan; others question its feasibility. The goal is to write tests for a function before writing its implementation, using its intended behavior as a guide. For people used to testing implementation details, that task seems unreasonable.

Following TDD helps developers…

  • think about their interface and response
  • keep functions small, with a single responsibility
  • avoid testing implementation details — there aren’t any yet!
  • think about dependency injection, resulting in fewer or no mocks.
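A minimal sketch of the rhythm, using a hypothetical slugify helper: the test is written first and fails, then the simplest implementation is written to make it pass.

// Step 1: the test, written before any implementation exists. It fails.
test('slugify lowercases and hyphenates a title', function () {
  expect(slugify('Effective Automation Tests')).to.equal('effective-automation-tests');
});

// Step 2: the simplest implementation that makes the test pass.
function slugify(title) {
  return title.toLowerCase().split(' ').join('-');
}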

When to not use it:

  • When creating a proof of concept. This is code in which glaring bugs are acceptable. Move fast, and don't bother writing tests, since it's only an experiment that will be thrown away. The lack of automation can even be used as leverage to destroy it, since product owners might otherwise want to ship it.
  • When the final architecture is unclear. Sometimes developers don't know what's involved in an implementation until they start it. They can't settle on classes and function signatures because they're still not sure they're taking the right approach. Writing tests at this stage will only waste time, as they'll likely be deleted or rewritten.

Remove randomness from tests

Introducing randomness may seem appealing because of the variety of samples.

test('planet is created', function () {
  const planets = ['mars', 'pluto', ...theRestOfThem];
  const randomPlanet = selectRandomFrom(planets);
  const planet = createPlanet(randomPlanet);

  expect(planet.name).to.be.oneOf(planets);
});

There are a few major drawbacks with this test:

  • If this test fails on Pluto (i.e. planets.filter(p => p !== 'pluto')), it will be difficult to diagnose, especially when including planets outside of the Milky Way. This would be considered a false negative because the test sample does not match the business requirement of excluding Pluto.
  • Without logging, the planet that caused the failure will be unknown, and adding logging to a test is a code smell.
  • Recreating the failure locally will be unlikely, and the build will pass on the next CI run. The test will be considered flaky and ignored, defeating the purpose of adding randomness in the first place.
  • Only one case is checked per test run. If all the planets need to be checked, most frameworks can split them into individual tests, e.g. with Jest:
test.each(planets)('planet %s is created', function (planet) { ... });
  • Imagine the createPlanet call trims the name: planet.name.slice(0, 4). The test will fail for every case except mars (the only Milky Way planet whose name is exactly four letters), where it produces a false positive.

Something similar to this happened to me. Someone put randomness in a test touching code I changed. Miraculously, the tests passed locally (1 in 10 chance), and once again on my PR. However, once it was merged, my luck ran out and everyone’s builds started failing. I wasted time trying to diagnose the differences in the environment variables, only to find that the test was the culprit.

Write more integration tests

Write tests. Not too many. Mostly integration. — Guillermo Rauch

Integration Tests > Unit Tests > End to End Tests in terms of value, not necessarily frequency

A new paradigm is emerging that places integration tests as the most valuable layer of the testing pyramid. These tests ensure cohesion between functions and external systems by running the code the same way a user would. Although they may overlap the coverage of some unit tests, unit tests are still valuable.
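A minimal sketch, assuming an Express app exported from ./app and the supertest library (both hypothetical here): the test drives the real HTTP route and the real database, mocking nothing.

import request from 'supertest';
import app from './app';

test('creates a user through the API', async function () {
  const response = await request(app)
    .post('/users')
    .send({ name: 'Kevin' });

  expect(response.status).to.equal(201);
  expect(response.body.id).to.not.be.null();
});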

Integration tests…

  • provide more confidence by using the code in the way a user would.
  • limit exposure to implementation details. The only access available is inputs and outputs.
  • are more reliable and faster than end-to-end tests, which introduce external variables such as browsers, latency, and operating systems.
  • have fewer false positives and negatives.
  • take longer to run than unit tests since they probably interact with external systems like a database.

The last point is only a slight drawback with modern processors. If the main concern is the speed of development, there are options:

  • Slow down. Hurried code is prone to human error. Is the impending deadline really worth losing customers to bugs?
  • Limiting scope can create more time for testing.
  • Many testing frameworks have a --watch mode that only runs tests related to recent changes. The flag isn't used on a CI pipeline.
  • Consider a framework that can run tests asynchronously or in a distributed environment. After all, tests should not be dependent on each other.
  • Disable irrelevant tests locally. Most frameworks have an option to run only the file or test actively being worked on (e.g. describe.only).

Conclusion

Put the same quality into tests as the application code. If a test is not valuable, don’t be afraid to rewrite it or remove it entirely. Don’t take test coverage too seriously. Tests are there to provide confidence, not be a burden or chore.
