Tests versus specs

It’s been popular for some years now to say that tests are “executable specifications.” I think this is a wrong way to to think about programming and leads to buggier programs than the traditional view that tests are tests.

Saying that tests are specs implies that you don’t need separate specifications. If this isn’t what you mean when you call a “test” a “spec,” then my arguments mainly won’t apply, but remember: regardless of what you mean, people will hear, “replace your specifications.”

Programs are three-legged stools that stand on the trio of specs, tests and program code. Stable programs require that each leg receives equal care. When done well, a useful tension between specs, tests and program code improves quality.

The form and labels can vary: specifications can be formal requirements or notes in a bug tracker; tests can be automated or manual. Whatever the form, every program has these three parts.

“Tests are specs” refers to a specific form: automated tests written in a style called “behavior driven.” Behavior-driven means that tests look like this:

describe Frobber... it 'frobs'...

Instead of

TestFrobber... test_frobs()...

From this style, we can infer the reasoning for replacing specs with tests:

Axiom: specs describe what a program should do

Axiom: tests verify that the program does what the specs say

Assume that you can write tests in a style where they describe what programs should do. Or, equivalently, assume you can write specifications as executable code that can verify compliance.

When so written, specs and a tests serve the same purpose. Thus, by the principle that you should eliminate redundancy, they should be the same thing.

The first problem with this argument is that there’s no real basis for believing that you can write specifications in executable language as clearly as if you wrote them in natural language. Things like using describe... it... and expect(x).toBe(y) (instead of assert x == y) superficially make program code look English-like (if you squint) but it’s not at all obvious that they make things more clear. If this style really is more clear, why are tests special? Why not write all code in quasi-English?

By a dubious appeal to Whorfianism – the idea that words we use affect what we do – behavior-driven style supposedly encourages people to write tests more declaratively because you “describe” what the program should do. It is true that programs are nearly always more understandable when written declaratively than imperatively, but again, there is no reason this should be specific to tests; all program code should be written as clearly and declaratively as possible.

Even if we assume that somehow we can write executable language as clearly as natural language, it doesn’t follow that we need only one or the other. We justify combining test and specs because “redundancy is bad.” By that logic, however, we don’t need test-specs to be separate from program code either. If the specification is executable, it doesn’t need to be the test because it could just as easily be the program. And then there was one.

In a sense, you truly can combine tests, specs and program. A program with unwritten program code is nothing but an idea. A program with unwritten tests and specs, is still a program. It’s just not a very good program and we know why.

[When programming] one must perform perfectly. The computer resembles the magic of legend in this respect, too. If one character, one pause, of the incantation is not strictly in proper form, the magic doesn’t work.

– The Mythical Man Month

Automated tests are programs too and just as capricious. A thorough test suite must often be as large as the program it tests and, therefore, will have as many errors.

Moreover, programs are notoriously hard to change without breaking, which brings us to the second major flaw in the logic behind combining tests with specs. It’s true that tests exist to verify conformance with specification, but that is not the only reason; tests also verify that changes don’t break things.

Remembering that tests are programs too, and just as hard to change (safely) as any program, it should be clear that the only way to avoid breaking tests is to avoid changing them. Furthermore, the only way to test the tests themselves is the write them in lock-step with the program under test. This is called the “red-green” cycle and it goes like this:

  1. Write a failing test
  2. Write the code necessary to make the test pass
  3. Repeat

In step one, if you check that the test fails in the way you expect, you know that it tests the code you wrote in step two. Then back away. So long as you don’t change the test, you can be reasonably sure it’s testing for the intended problem.

The benefit of a specification, by contrast, is that you write it before you write the program. It clarifies the problem and the goal and forestalls misguided coding.

Specs also will be incomplete and sometimes plain wrong, because writing a flawless description of program is as hard as writing a flawless program. So, to remain relevant, specs must change as programs change. If your specs are also your tests, not only will they be incomplete and wrong at the beginning, errors introduced by the changes will make them wrong in different ways at the end.

I still hope for executable specifications though I think “property-based” testing is a more promising avenue than “behavior-driven.” However they come, executable specifications will arrive – at latest, computers will learn to understand English better than humans – and even then, specifications cannot replace tests.

Do you need a Model?

It’s common for graphical programs to work something like this:

  1. Pull data from persistent storage: database, web server and so on
  2. Tuck that data away in a “Model” object
  3. Display the data
  4. Receive updates from either the user interface or the persistent storage
  5. Hope that the screen or the storage also updates, respectively
  6. Goto 4

The role of “hope” in this scenario is played by something often called “Data Bindings.” The problem with this design is that it’s an obviously bad idea: by keeping a Model like this, you’ve created a cache and cache invalidation is known to be a hard problem.

I’ll assume you’re writing a webapp, because, who isn’t? I claim you don’t need any Model client-side at all. If you have a long memory, you might say that I’m just being ornery and yearning for a time when we didn’t need a megabyte of JavaScript to display a form with one field:

Google's homepageYou’ll say you need these Models and Bindings because modern.

You’d be partly right. This does make me ornery, but let’s step back and look at what might be the number one killer of program maintainability: mutable state. A graphical interface is nothing if not a big pile of mutable state and the job of its programmers is to wrangle the ways it mutates.

In a sense, every useful program needs to grapple with this problem. Programs that lack persistent data or ways to display it are pretty useless, in the way that a tree might fall but nobody knows whether it makes noise and most people don’t care.

I advocate that you scrap the client-side Model, but I don’t claim this will make your webapp easy to write. No matter what, you’re dealing with a pile of mutable state and that’s bound to be tricky.

Why make it trickier than necessary? You really do need widgets on the screen: checkboxes that can be checked, or not, spans that could say one thing, or another. Do you need a shadow copy of those things as well?

Now, not all state duplication is called a Model and not everything called a Model is state duplication. Elm, for example, takes the view that Model is the application state but avoids state duplication by virtue of being purely functional.

Most programs aren’t written in purely functional languages, so the Model is state duplication, but users don’t see Models. They see widgets and to them, the state of the widgets is the state of the data. In a programmer’s ivory tower, you might argue that the Model is the source of truth, but your arguments matter not at all to actual people.

So don’t fight it. The path to enlightenment lies in realizing that once you’ve rendered the data, it doesn’t matter anymore. Throw it away.

Wait, that might work for your Web 2.0 site and its DHTML, but I have a Web APP. It’s modern. It has a span over here and a checkbox over there and the span says ‘frobbing’ or ‘unfrobbing’ depending on whether the checkbox is checked.

Ok, but I’m not seeing how Models and magic Bindings are simpler than this:

checkbox.onchange =
  span.textContent = checkbox.checked ? 'frobbing' : 'unfrobbing'

Well, it’s not just this one span, this blink tag over here needs to appear if you’re not frobbing. With your plan, I need to add another line of code to handle that.

True, but you’d have to add something when backing it with Models and Bindings as well. It wouldn’t be less code, just different code and more indirection.

There are more ways frobbingness can change though. I don’t want to copy this logic in every place.

And well you should not, but you already have the tool to solve that problem: a function. Move the interface update logic to a function and call it anytime frobbingness changes.

I see what you’re saying, but I don’t like it. The span and the blink are in logically different components. I’ll end up with a bunch of functions that update both, but don’t sensibly belong to either.

That’s a very good point.

Partly, the appeal of the client-side Model is that it seems we ought to be able to bind all the widgets to the same object, giving us a kind of fan-in-fan-out approach to updating them. In principle, you’d have just one Frobber and you’d change its isFrobbing property, then all the dependent widgets would get notified. People have been trying to implement this concept since the invention of the computer and it’s gone by many names. Recently, the buzzword is “reactive” programming.

This idea can work well in some systems. Spreadsheets and style sheets are examples of very successful reactive programming models, for example. It works well in limited domains with declarative or purely functional languages.

In non-functional practice, the same data structures aren’t convenient for all widgets. Imagine that Frobbers are parts of Whatzits, but in one place you display all your Frobbers and in another you display the content of a select Whatzit. Your Models probably look something like this:

frobbers:
  - id: 1
    isFrobbing: true
  - id: 2
    isFrobbing: false

selectedWhatzit:
  frobbers:
    - id: 1
      isFrobbing: true

Now which copy of Frobber number 1 is the true Frobber? Ideally, you’ll make them point to the same object so it doesn’t matter. Your Binding magic may or may not be able to make sense of that arrangement and the temptation to duplicate data is strong. Models and Bindings just moved the concrete problem of what appears in widgets to the abstract problem of what goes where in the invisible Model.

With or without magic Models, as the application grows, you’ll need to put things in sensible places and call them in sensible ways. That is the whole job of user interface programming.

If you keep doing this for all the data and all the widgets in your app, you’ll notice you keep doing the same kind of tedious things over and over. This thing over here updates some data. Need to show it over there. Where’s the right place to put that code? Here, there, somewhere else? Decide, repeat.

Naturally, as programmers, we think we can automate away the repetition, but some complexity is fundamental.

Schroot cheatsheet

I don’t always install software whose idea of installation instructions is curl ... | sudo, but when I do, I jail it. In this case, I’m setting up a chroot for Nodejs:

sudo apt install schroot deboostrap
sudo mkdir /srv/npm-chroot
sudo debootstrap stable /srv/npm-chroot
sudo mkdir -p /srv/npm-chroot/home/joe/projects
sudo chown -R joe:joe /srv/npm-chroot/home/joe/

In these examples, “joe” is my username.

The “s” in “schroot” stands for “securely,” but it might as well be “simple” because “schroot” handles fiddly bookkeeping tasks for setting up your environment, based on its config file.

Edit /etc/schroot/schroot.conf:

[npm]
description=npm projects
type=directory
directory=/srv/npm-chroot
root-users=joe
setup.fstab=joe-projects/fstab

Normally, schroot mounts /home from the host as /home in the chroot. I don’t want programs in jail to muck about with my home on the host though, so I edit the setup.fstab option. Its default lives in /etc/schroot/default/fstab.

For my purposes, the schroot’s default configuration is a good start, so:

sudo mkdir /etc/schroot/joe-projects
sudo cp /etc/schroot/default/fstab /etc/schroot/joe-projects/

Edit /etc/schroot/joe-projects/fstab, removing the /home line and adding instead:

/home/joe/projects /home/joe/projects none rw,bind 0 0

Finally, enter the chroot, as root.

schroot -c npm -u root

I like to install sudo so it feels like a normal Ubuntu:

# now in the schroot
apt update
apt install sudo
exit

Then log in as my normal user:

# in the host
schroot -c npm

From here I can install npm in relative isolation; this is not sufficient for isolating malicious software, but it’s a nice way to avoid inconsiderate programs from pooping all over your system.

Evolution

In the beginning, there was html


<form method=post action=signup>
  <label>Username       <input name=user></label>
  <label>Password       <input name=pw type=password></label>
  <label>Password again <input name=pwv type=password></label>
  <button>Submit</button>
</form>

Then came JavaScript

<form id=signup method=post action=signup>
  <label>Username       <input name=user></label><br>
  <label>Password       <input name=pw type=password></label><br>
  <label>Password again <input name=pwv type=password></label><br>
  <button>Submit</button>
</form>

<script>
<!--
document.forms.signup.onsubmit = function () {
  if (this.pw.value != this.pwv.value) {
    alert("Passwords don't match")
    return false
  }
}
// -->
</script>

And jQuery

<script src="https://code.jquery.com/jquery-1.1.4.js"></script>

<form id="signup" method="post" action="signup">
    <label>Username       <input name="user"></label><br/>
    <label>Password       <input name="pw" type="password"></label><br/>
    <label>Password again <input name="pwv" type="password"></label><br/>
    <button>Submit</button>
</form>

<script>
// <![CDATA[
$('#signup').submit(function () {
  if (this.pw.value != this.pwv.value) {
    alert("Passwords don't match")
    return false
  }
})
// ]]>
</script>

And json

<script src="https://code.jquery.com/jquery-1.5.js"></script>

<form id="signup">
    <label>Username       <input name="user"></label><br/>
    <label>Password       <input name="pw" type="password"></label><br/>
    <label>Password again <input name="pwv" type="password"></label><br/>
    <button>Submit</button>
</form>

<script>
/*jslint browser: true */
/*global $, alert, window */

$("#signup").submit(function (event) {
    "use strict";

    var form = $(event.target);
    var formData = {};

    form.find("input").each(function (ignore, el) {
        var input = $(el);
        var name = input.attr("name");
        formData[name] = input.val();
    });

    if (formData.pw !== formData.pwv) {
        alert("Passwords don't match");
    } else {
        $.ajax("signup", {
            type: "POST",
            contentType: "application/json",
            data: JSON.stringify(formData)
        }).done(function () {
            window.location = "signup";
        }).fail(function () {
            alert("Error");
        });
    }

    return false;
});
</script>

And Ecma 6

<form id="signup">
    <label>Username       <input name="user"></label><br>
    <label>Password       <input name="pw" type="password"></label><br>
    <label>Password again <input name="pwv" type="password"></label><br>
    <button>Submit</button>
</form>

<script>
/*jshint esversion: 6 */
document.querySelector("#signup").addEventListener("submit", submitEvent => {
    "use strict";
    
    const form = submitEvent.target;
    const formData = {};

    submitEvent.preventDefault();

    for (const input of form.querySelectorAll("input")) {
        var name = input.getAttribute("name");
        formData[name] = input.value;
    }

    if (formData.pw !== formData.pwv) {
        alert("Passwords don't match");
    } else {
        const xhr = new XMLHttpRequest();
        xhr.addEventListener("load", () => {
            if (xhr.status === 200) {
                window.location = "signup";
            } else {
                alert("Error");
            }
        });
        xhr.addEventListener('error', _ => {
            alert('Error');
        });
        xhr.open("POST", "signup");
        xhr.setRequestHeader("Content-Type", "application/json");
        xhr.send(JSON.stringify(formData));
    }
});
</script>

Progress?

Linting JavaScript considered harmful

I am in a minority of programmers so small that I might be the only member. I can’t find anyone on the Internet advocating my position: you should not lint your JavaScript.

I don’t mean you shouldn’t use this or that flavor of lint. I mean that you shouldn’t use any JavaScript linter. At all.

First, the arguments for linting…

Lint catches bugs?

How many bugs does it really catch? Only a few. Rules against unused variables can be useful when you’ve renamed something in one place and forgotten another place, for example. Trouble is, I’ve only noticed lint catching bugs in code that wasn’t complete or tested anyway, and therefore already broken by definition.

The major category of bugs caught by lint can be caught instead by a simple statement:

use strict

JavaScript strict mode really does find bugs, almost always results of accidentally overwriting globals. It’s tragic that we need lint to tell us about missing strict mode declarations when browsers could warn us.

But is it worth bringing in lint for the few bugs it can find?

I think not; the better approach is simple: learn to use strict reflexively. Then spend the effort you were going to use typing semicolons on testing instead.

Lint enforces consistency?

So what? Consistency for its own sake is a pursuit of feeble minds.

In writing, particularly writing computer programs, consistency is a proxy for something much more important: readability. And readability is not something that computers yet understand well.

It is perfectly possible, common, in fact, to write incomprehensible code that passes a linter.

But, you say, “lint rules can help a little, so we should use them. We just need to pick a ruleset.”

Lint reduces bikeshedding?

In the beginning, lint was an inflexible representation of one man’s own preferences. Next, everyone bought into the idea they should lint their JavaScript, then adopted Crockford Style and moved on argued endlessly about which rules were important.

We went from the ultra-rigid linter, to the ultra-configurable, to the ultra-pluggable. At each step, we introduced more and more time-consuming opportunities to argue about picayune issues.

To paraphrase Tim Harford:

Lint is such a temping distraction because it feels like work, but it isn’t. When you’re arguing about lint rules or fixing lint errors, you’re editing code, but you’re not getting things done.

So should I never lint?

All that said, I recommend lint for one purpose: it can be a useful way to learn about the idioms and pitfalls of a language. Run your code through a linter and learn why it complains.

This use of lint as a teaching tool is actively discouraged by the way most people advocate using it. Imagine you set up a hook that says “all code must pass lint before commit.” Those who actually could benefit from the lint suggestions are blocked by them, thus encouraged to “fix” the “problem” as quickly as possible: obey the tool and never learn why.

Copying antifeatures: quote style

The C language gave different meanings to single quotes and double quotes:

'c' // a byte
"c" // array of bytes

Decades later, a difference persisted but, in most languages, changed its character. Single and double quotes now often change how string interpolation works:

"#{foo}" # interpolate the string value of variable foo
'#{foo}' # literal, no interpolation

As far as I know, this comes from Bourne shell. In the weird world of shell scripts, it actually turns out to be useful, but in the more structured world of most programming languages, it has approximately zero real-world use cases.

Suppose that you do literally want to write “#{” in a string:

'#{foo}' # intentional, but looks like a mistake

Python got this right. It used to be right in JavaScript.

Single quotes should be interchangeable with double quotes. Interchangeability allows you to use the easier-to-type single quotes most of the time, but switch to double quotes for easy single-quote escaping.

Copying antifeatures: multiline strings

Multiline strings are a Good Thing but they usually end up implemented just a little bit wrong.

Delimiters

First, they require special delimiters. Most grammars could easily allow ordinary strings to break across lines, and many languages could even add this in a backward-compatible way. A string that includes line breaks is no harder to parse than a single-line string:

foo = "
Hello
world
"

To the great annoyance of programmers everywhere, languages generally require special delimiters for multiline strings:

foo = """
Hello
world
"""

Thus, when we find ourselves typing a string literal and the line gets uncomfortably long, the language makes us go back to the beginning and change delimiters. Vice versa when shortening strings.

Indentation

Most multiline strings end up being indented. Ruby gets multiline string quote style right, but fails on indentation:

def do_stuff()
   foo = "
     hello
     world
     "
   return foo
end

puts do_stuff() # "\n    hello\n    world\n    "

What are those leading spaces doing? I can’t think of a time when I’ve ever wanted a string literal to maintain its source indentation at runtime.

Almost always, multiline strings fall into two categories:

Indentation is irrelevant Needs trim and dedent
sql = "
  select *
  from foo
  where bar
    and baz "
help = "
   usage: frob [-nicate]
     frob the widget
   
   -n turn on the n
"

So there are many use cases where trimming and dedenting is warranted, but almost none where it hurts.

CoffeeScript almost got both aspects right, but it makes a distinction between “multiline strings”, that use ordinary quotes and “block strings,” which use triple-quotes. Ordinary strings collapse line breaks to single spaces, triple-quoted strings trim and dedent correctly.

Next time: the perils of such subtle distinction in quote style.

Painless Android releases revisited

Previously, I described a Gradle script that handily generates release version codes for Android apps. The generated version codes take the form [date][number].

I finished that article with a litany of Gradle bugs. Today: fresh Google bugs!

In May, Google added automatic crash reporting to the Google Play developer console. Before auto-reporting, users had to explicitly send reports when apps crashed. So far so good, but if you’re testing on a physical device, you might notice something alarming: reports of bugs you already fixed, or crashes you only saw in development.

Apparently, Google forgot to filter out reports from debug-mode applications. Perhaps Google would claim this is a feature, but it means that you can’t tell which crashes are actually happening in the wild.

Google says crash reporting is “opt-in.” This is meant ironically, since the option to turn it off doesn’t actually exist on, for example, the Samsung S8. (There is a different option, “report diagnostic information.” As far I can tell, it’s a placebo.)

To work around this, we need to make crash reports from the debug version look somehow different from the production version. Crash reports include the version code, so remember that suffix? We can use that. Instead of using one number per release, use two: one for the release, one for the next development version:

// Version code updates when released to a date-based format. Even-numbered version codes are
// release builds, odd-numbered version codes are debug builds. MAX five releases per day.
def releaseVersionCode = null
def writeVersionCode(versionCode) {
    def releaser = project.plugins[net.researchgate.release.ReleasePlugin]
    def propsFile = releaser.findPropertiesFile()
    def props = new Properties()
    propsFile.withInputStream { props.load(it) }
    props.versionCode = versionCode.toString()
    propsFile.withOutputStream { props.store(it, null) }
}

task nextDebugVersionCode { doLast {
    // Even though this runs after the release build, project.versionCode is still the version
    // code *before* release. The Release plugin runs the release build in a separate Gradle
    // invocation, so the release package picks up version changes in gradle.properties. When
    // control returns here though, it's the original Gradle invocation, and has *not* reloaded
    // gradle.properties.
    writeVersionCode(releaseVersionCode + 1)
}}
updateVersion.dependsOn nextDebugVersionCode

task setReleaseVersionCode { doLast {
    def current = project.versionCode.toInteger()
    releaseVersionCode = new Date().format('YYMMdd0', TimeZone.getTimeZone('UTC')).toInteger()
    if (releaseVersionCode &amp;amp;lt;= current) {
        // Should only happen when there is more than one release in a day
        releaseVersionCode = current + 1
    }
    writeVersionCode(releaseVersionCode)
}}
unSnapshotVersion.dependsOn setReleaseVersionCode

So, now the first release of the day gets suffix zero, the debug version that follows gets suffix one, and so on. I’m writing this on July 26, so if I cut two releases today, my version codes will be:

  • 1707260, production
  • 1707261, debug
  • 1707262, production
  • 1707263, debug

It’s subtle, but at least now we can tell which crashes actually happened to people using your app: they are even numbers.

Or are they?

It appears that Google stores the crash data on the phone and reports it only once per day. The version code it reports is the version running on the phone when it sends the report, not when the crash actually happened.

If the app updates in the interim, we can still get crash reports for bugs already fixed and they will seem to come from a version that includes the fix.

I don’t know of any workaround.

Why (not) pdf?

Text Collector lets you print text messages by converting them to pdf. What we call “text messages,” of course, includes messages with both text and pictures. Sometimes they include other types of attachments, like dirty .gif files, but that’s another article. For now, I’m just discussing images and text.

On the surface, pdf seems ideal: it’s universally viewable, supports pagination, and, unlike images, includes text in a searchable way. But I don’t really like pdf as an ediscovery interchange format.

Why? Pdf is too complicated. As file formats go, it’s far from the worst monster out there, but it’s also far from simple. As a consequence, the many different programs that generate pdf often get it slightly wrong; they’re not necessarily bad programs: they’re just dealing with a complicated problem and make mistakes.

Pdf viewers grapple with the resultant problems and show you something that looks correct, so everything seems fine at first.

When you want to edit pdf, however, things quickly go wrong. For a typical ediscovery operation like stamping Bates numbers on your pdf files, the small errors compound and you have a significant chance that the result will be illegibly damaged.

Assume some set of pdf files P , and an operation b that you want to do on them to produce an output set, O .

b:P \rightarrow O

You can visually check some number v for correctness.

When the size of O is larger than v there will be some subset E, larger than zero, that is terribly broken.

E \subset O \land |O| > v \Rightarrow |E| > 0

Ulfers’ Law of Batch Pdf Editing

Second, since it is complex, pdf allows all manner of invisible content. This makes redacting pdf hazardous and if you have a highly-developed sense of self-doubt, it’s hard to shake the feeling you’ve done something wrong that allows the redaction to be removed.

So, why does Text Collector convert messages to pdf and not something else?

There are no suitable alternatives. Html is universally viewable but has no notion of pagination and very limited image embedding. Microsoft Word format is too easy to edit, and comparable in complexity to pdf. Mhtml never got universal support and it lacks pagination anyway. Tiffs and text are too large and useless to average people. How about svg?

So pdf it is, for better or worse.