Tests versus specs

It’s been popular for some years now to say that tests are “executable specifications.” I think this is a wrong way to to think about programming and leads to buggier programs than the traditional view that tests are tests.

Saying that tests are specs implies that you don’t need separate specifications. If this isn’t what you mean when you call a “test” a “spec,” then my arguments mainly won’t apply, but remember: regardless of what you mean, people will hear, “replace your specifications.”

Programs are three-legged stools that stand on the trio of specs, tests and program code. Stable programs require that each leg receives equal care. When done well, a useful tension between specs, tests and program code improves quality.

The form and labels can vary: specifications can be formal requirements or notes in a bug tracker; tests can be automated or manual. Whatever the form, every program has these three parts.

“Tests are specs” refers to a specific form: automated tests written in a style called “behavior driven.” Behavior-driven means that tests look like this:

describe Frobber... it 'frobs'...

Instead of

TestFrobber... test_frobs()...

From this style, we can infer the reasoning for replacing specs with tests:

Axiom: specs describe what a program should do

Axiom: tests verify that the program does what the specs say

Assume that you can write tests in a style where they describe what programs should do. Or, equivalently, assume you can write specifications as executable code that can verify compliance.

When so written, specs and a tests serve the same purpose. Thus, by the principle that you should eliminate redundancy, they should be the same thing.

The first problem with this argument is that there’s no real basis for believing that you can write specifications in executable language as clearly as if you wrote them in natural language. Things like using describe... it... and expect(x).toBe(y) (instead of assert x == y) superficially make program code look English-like (if you squint) but it’s not at all obvious that they make things more clear. If this style really is more clear, why are tests special? Why not write all code in quasi-English?

By a dubious appeal to Whorfianism – the idea that words we use affect what we do – behavior-driven style supposedly encourages people to write tests more declaratively because you “describe” what the program should do. It is true that programs are nearly always more understandable when written declaratively than rather than imperatively, but again, there is no reason this should be specific to tests; all program code should be written as clearly and declaratively as possible.

Even if we assume that somehow we can write executable language as clearly as natural language, it doesn’t follow that we need only one or the other. We justify combining test and specs because “redundancy is bad.” By that logic, however, we don’t need test-specs to be separate from program code either. If the specification is executable, it doesn’t need to be the test because it could just as easily be the program. And then there was one.

In a sense, you truly can combine tests, specs and program. A program with unwritten program code is nothing but an idea. A program with unwritten tests and specs, is still a program. It’s just not a very good program and we know why.

[When programming] one must perform perfectly. The computer resembles the magic of legend in this respect, too. If one character, one pause, of the incantation is not strictly in proper form, the magic doesn’t work.

– The Mythical Man Month

Automated tests are programs too and just as capricious. A thorough test suite must often be as large as the program it tests and, therefore, will have as many errors.

Moreover, programs are notoriously hard to change without breaking, which brings us to the second major flaw in the logic behind combining tests with specs. It’s true that tests exist to verify conformance with specification, but that is not the only reason; tests also verify that changes don’t break things.

Remembering that tests are programs too, and just as hard to change (safely) as any program, it should be clear that the only way to avoid breaking tests is to avoid changing them. Furthermore, the only way to test the tests themselves is to write them in lock-step with the program under test. This is called the “red-green” cycle and it goes like this:

  1. Write a failing test
  2. Write the code necessary to make the test pass
  3. Repeat

In step one, if you check that the test fails in the way you expect, you know that it tests the code you wrote in step two. Then back away. So long as you don’t change the test, you can be reasonably sure it’s testing for the intended problem.

The benefit of a specification, by contrast, is that you write it before you write the program. It clarifies the problem and the goal and forestalls misguided coding.

Specs also will be incomplete and sometimes plain wrong, because writing a flawless description of program is as hard as writing a flawless program. So, to remain relevant, specs must change as programs change. If your specs are also your tests, not only will they be incomplete and wrong at the beginning, errors introduced by the changes will make them wrong in different ways at the end.

I still hope for executable specifications though I think “property-based” testing is a more promising avenue than “behavior-driven.” However they come, executable specifications will arrive – at latest, computers will learn to understand English better than humans – and even then, specifications cannot replace tests.

Do you need a Model?

It’s common for graphical programs to work something like this:

  1. Pull data from persistent storage: database, web server and so on
  2. Tuck that data away in a “Model” object
  3. Display the data
  4. Receive updates from either the user interface or the persistent storage
  5. Hope that the screen or the storage also updates, respectively
  6. Goto 4

The role of “hope” in this scenario is played by something often called “Data Bindings.” The problem with this design is that it’s an obviously bad idea: by keeping a Model like this, you’ve created a cache and cache invalidation is known to be a hard problem.

I’ll assume you’re writing a webapp, because, who isn’t? I claim you don’t need any Model client-side at all. If you have a long memory, you might say that I’m just being ornery and yearning for a time when we didn’t need a megabyte of JavaScript to display a form with one field:

Google's homepageYou’ll say you need these Models and Bindings because modern.

You’d be partly right. This does make me ornery, but let’s step back and look at what might be the number one killer of program maintainability: mutable state. A graphical interface is nothing if not a big pile of mutable state and the job of its programmers is to wrangle the ways it mutates.

In a sense, every useful program needs to grapple with this problem. Programs that lack persistent data or ways to display it are pretty useless, in the way that a tree might fall but nobody knows whether it makes noise and most people don’t care.

I advocate that you scrap the client-side Model, but I don’t claim this will make your webapp easy to write. No matter what, you’re dealing with a pile of mutable state and that’s bound to be tricky.

Why make it trickier than necessary? You really do need widgets on the screen: checkboxes that can be checked, or not, spans that could say one thing, or another. Do you need a shadow copy of those things as well?

Now, not all state duplication is called a Model and not everything called a Model is state duplication. Elm, for example, takes the view that Model is the application state but avoids state duplication by virtue of being purely functional.

Most programs aren’t written in purely functional languages, so the Model is state duplication, but users don’t see Models. They see widgets and to them, the state of the widgets is the state of the data. In a programmer’s ivory tower, you might argue that the Model is the source of truth, but your arguments matter not at all to actual people.

So don’t fight it. The path to enlightenment lies in realizing that once you’ve rendered the data, it doesn’t matter anymore. Throw it away.

Wait, that might work for your Web 2.0 site and its DHTML, but I have a Web APP. It’s modern. It has a span over here and a checkbox over there and the span says ‘frobbing’ or ‘unfrobbing’ depending on whether the checkbox is checked.

Ok, but I’m not seeing how Models and magic Bindings are simpler than this:

checkbox.onchange =
  span.textContent = checkbox.checked ? 'frobbing' : 'unfrobbing'

Well, it’s not just this one span, this blink tag over here needs to appear if you’re not frobbing. With your plan, I need to add another line of code to handle that.

True, but you’d have to add something when backing it with Models and Bindings as well. It wouldn’t be less code, just different code and more indirection.

There are more ways frobbingness can change though. I don’t want to copy this logic in every place.

And well you should not, but you already have the tool to solve that problem: a function. Move the interface update logic to a function and call it anytime frobbingness changes.

I see what you’re saying, but I don’t like it. The span and the blink are in logically different components. I’ll end up with a bunch of functions that update both, but don’t sensibly belong to either.

That’s a very good point.

Partly, the appeal of the client-side Model is that it seems we ought to be able to bind all the widgets to the same object, giving us a kind of fan-in-fan-out approach to updating them. In principle, you’d have just one Frobber and you’d change its isFrobbing property, then all the dependent widgets would get notified. People have been trying to implement this concept since the invention of the computer and it’s gone by many names. Recently, the buzzword is “reactive” programming.

This idea can work well in some systems. Spreadsheets and style sheets are examples of very successful reactive programming models, for example. It works well in limited domains with declarative or purely functional languages.

In non-functional practice, the same data structures aren’t convenient for all widgets. Imagine that Frobbers are parts of Whatzits, but in one place you display all your Frobbers and in another you display the content of a select Whatzit. Your Models probably look something like this:

frobbers:
  - id: 1
    isFrobbing: true
  - id: 2
    isFrobbing: false

selectedWhatzit:
  frobbers:
    - id: 1
      isFrobbing: true

Now which copy of Frobber number 1 is the true Frobber? Ideally, you’ll make them point to the same object so it doesn’t matter. Your Binding magic may or may not be able to make sense of that arrangement and the temptation to duplicate data is strong. Models and Bindings just moved the concrete problem of what appears in widgets to the abstract problem of what goes where in the invisible Model.

With or without magic Models, as the application grows, you’ll need to put things in sensible places and call them in sensible ways. That is the whole job of user interface programming.

If you keep doing this for all the data and all the widgets in your app, you’ll notice you keep doing the same kind of tedious things over and over. This thing over here updates some data. Need to show it over there. Where’s the right place to put that code? Here, there, somewhere else? Decide, repeat.

Naturally, as programmers, we think we can automate away the repetition, but some complexity is fundamental.