Tests versus specs

It’s been popular for some years now to say that tests are “executable specifications.” I think this is a wrong way to to think about programming and leads to buggier programs than the traditional view that tests are tests.

Saying that tests are specs implies that you don’t need separate specifications. If this isn’t what you mean when you call a “test” a “spec,” then my arguments mainly won’t apply, but remember: regardless of what you mean, people will hear, “replace your specifications.”

Programs are three-legged stools that stand on the trio of specs, tests and program code. Stable programs require that each leg receives equal care. When done well, a useful tension between specs, tests and program code improves quality.

The form and labels can vary: specifications can be formal requirements or notes in a bug tracker; tests can be automated or manual. Whatever the form, every program has these three parts.

“Tests are specs” refers to a specific form: automated tests written in a style called “behavior driven.” Behavior-driven means that tests look like this:

describe Frobber... it 'frobs'...

Instead of

TestFrobber... test_frobs()...

From this style, we can infer the reasoning for replacing specs with tests:

Axiom: specs describe what a program should do

Axiom: tests verify that the program does what the specs say

Assume that you can write tests in a style where they describe what programs should do. Or, equivalently, assume you can write specifications as executable code that can verify compliance.

When so written, specs and a tests serve the same purpose. Thus, by the principle that you should eliminate redundancy, they should be the same thing.

The first problem with this argument is that there’s no real basis for believing that you can write specifications in executable language as clearly as if you wrote them in natural language. Things like using describe... it... and expect(x).toBe(y) (instead of assert x == y) superficially make program code look English-like (if you squint) but it’s not at all obvious that they make things more clear. If this style really is more clear, why are tests special? Why not write all code in quasi-English?

By a dubious appeal to Whorfianism – the idea that words we use affect what we do – behavior-driven style supposedly encourages people to write tests more declaratively because you “describe” what the program should do. It is true that programs are nearly always more understandable when written declaratively than imperatively, but again, there is no reason this should be specific to tests; all program code should be written as clearly and declaratively as possible.

Even if we assume that somehow we can write executable language as clearly as natural language, it doesn’t follow that we need only one or the other. We justify combining test and specs because “redundancy is bad.” By that logic, however, we don’t need test-specs to be separate from program code either. If the specification is executable, it doesn’t need to be the test because it could just as easily be the program. And then there was one.

In a sense, you truly can combine tests, specs and program. A program with unwritten program code is nothing but an idea. A program with unwritten tests and specs, is still a program. It’s just not a very good program and we know why.

[When programming] one must perform perfectly. The computer resembles the magic of legend in this respect, too. If one character, one pause, of the incantation is not strictly in proper form, the magic doesn’t work.

– The Mythical Man Month

Automated tests are programs too and just as capricious. A thorough test suite must often be as large as the program it tests and, therefore, will have as many errors.

Moreover, programs are notoriously hard to change without breaking, which brings us to the second major flaw in the logic behind combining tests with specs. It’s true that tests exist to verify conformance with specification, but that is not the only reason; tests also verify that changes don’t break things.

Remembering that tests are programs too, and just as hard to change (safely) as any program, it should be clear that the only way to avoid breaking tests is to avoid changing them. Furthermore, the only way to test the tests themselves is the write them in lock-step with the program under test. This is called the “red-green” cycle and it goes like this:

  1. Write a failing test
  2. Write the code necessary to make the test pass
  3. Repeat

In step one, if you check that the test fails in the way you expect, you know that it tests the code you wrote in step two. Then back away. So long as you don’t change the test, you can be reasonably sure it’s testing for the intended problem.

The benefit of a specification, by contrast, is that you write it before you write the program. It clarifies the problem and the goal and forestalls misguided coding.

Specs also will be incomplete and sometimes plain wrong, because writing a flawless description of program is as hard as writing a flawless program. So, to remain relevant, specs must change as programs change. If your specs are also your tests, not only will they be incomplete and wrong at the beginning, errors introduced by the changes will make them wrong in different ways at the end.

I still hope for executable specifications though I think “property-based” testing is a more promising avenue than “behavior-driven.” However they come, executable specifications will arrive – at latest, computers will learn to understand English better than humans – and even then, specifications cannot replace tests.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s