Stop Twiddling My Bits

Googling for how to compute checksums with Java might return insanity. Quick. What does this function do?

// Java
static String twiddleDee(byte[] data) {
  StringBuffer buf = new StringBuffer();
  for (int i = 0; i < data.length; i++) {
    int halfbyte = (data[i] >>> 4) & 0x0F;
    int two_halfs = 0;
    do {
      if ((0 <= halfbyte) && (halfbyte <= 9))
        buf.append((char) ('0' + halfbyte));
      else
        buf.append((char) ('a' + (halfbyte - 10)));
      halfbyte = data[i] & 0x0F;
    } while (two_halfs++ < 1);
  }
  return buf.toString();
}

Compute a SHA-1 and output a hex string. This is my first public service code donation:

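// Java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Formatter;

public static String sha1(byte[] itsAllBitsAfterAll) {
  MessageDigest digester = newSha1Digester();
  // Pass the input in; digest() with no arguments would hash nothing.
  return bytesAsHex(digester.digest(itsAllBitsAfterAll));
}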

// This might make a good future post about senseless
// factories
private static MessageDigest newSha1Digester() {
  try {
    return MessageDigest.getInstance("SHA-1");
  } catch (NoSuchAlgorithmException e) {
    throw new RuntimeException(
        "How many times must exceptions be thrown?", e);
  }
}

static String bytesAsHex(byte[] bytes) {
  Formatter result = new Formatter();
  for (byte nextByte : bytes) {
    result.format("%02x", nextByte);
  }
  return result.toString();
}

Like the countless optical illusions where the lines turn out to really be straight or the colors are actually the same, the first snippet matches the “bytesAsHex” method. The first is twenty times faster, but the second is twenty times clearer.
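If you would rather not take the optical illusion on faith, here is a minimal sketch that runs both versions over every possible byte value and compares the results. The class name HexComparison and the method agreeOnAllBytes are mine, invented for the check; the two methods under test are the ones from this post, with the loop braces filled in.

```java
import java.util.Formatter;

class HexComparison {
    // The bit-twiddling version from the first snippet.
    static String twiddleDee(byte[] data) {
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            int halfbyte = (data[i] >>> 4) & 0x0F;
            int twoHalves = 0;
            do {
                if (halfbyte <= 9) {
                    buf.append((char) ('0' + halfbyte));
                } else {
                    buf.append((char) ('a' + (halfbyte - 10)));
                }
                halfbyte = data[i] & 0x0F;
            } while (twoHalves++ < 1);
        }
        return buf.toString();
    }

    // The readable version.
    static String bytesAsHex(byte[] bytes) {
        Formatter result = new Formatter();
        for (byte nextByte : bytes) {
            result.format("%02x", nextByte);
        }
        return result.toString();
    }

    // Hypothetical helper: true when both methods agree on all 256 byte values.
    static boolean agreeOnAllBytes() {
        for (int value = Byte.MIN_VALUE; value <= Byte.MAX_VALUE; value++) {
            byte[] one = { (byte) value };
            if (!twiddleDee(one).equals(bytesAsHex(one))) {
                return false;
            }
        }
        return true;
    }
}
```

Calling HexComparison.agreeOnAllBytes() covers the negative bytes too, which is where sign extension usually bites hand-rolled hex code.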

Always write the second. If you really need to squeeze those few extra milliseconds out of your code, use a library. If you think you can improve the library, use something open source, write it better, benchmark, and contribute.

Update, seven years later: In the intervening time, I’ve become somewhat more comfortable with bitwise operations and much more wary of dependencies. Today, I would not include a library only to optimize a little bit of string formatting.
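For what it is worth, a dependency-free middle ground might look something like the sketch below: a lookup table and two shifts per byte. This is only an illustration of what I mean by being comfortable with bitwise operations; the class name Hex and the method encode are made up for this example, and I have not benchmarked it.

```java
class Hex {
    // One character per possible nibble value, 0 through 15.
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    // Hypothetical helper: lowercase hex, two characters per input byte.
    static String encode(byte[] bytes) {
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int value = bytes[i] & 0xFF;             // drop the sign extension
            out[2 * i] = DIGITS[value >>> 4];        // high nibble
            out[2 * i + 1] = DIGITS[value & 0x0F];   // low nibble
        }
        return new String(out);
    }
}
```

Masking with 0xFF first means the shift never has to worry about negative bytes, which keeps the loop body down to two table lookups.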

Nothing Exceeds Like Exceptions

As a bright-eyed computer science student, I fell in love with exceptions. Of course, growing up in C++ land, no one explained the idea of an exception type hierarchy, so when I threw them, they were usually ints. But even at that naive time, something in me recognized the value of losing all those “if errorlevel” statements. Something else recognized the tingle of a problem unanswered.

After uprooting my ratty recliner and other worldly belongings for a move to Java land, I discovered the magic of checked exceptions. Wow. The compiler tells me when I should handle an error? Brilliant. Fantastic. That must have been the itch; I could never really specify what I might throw. Then I started writing Java.

// Java
try {
    InputStream captain = new FileInputStream("kirk.txt");
} catch (FileNotFoundException e) {
    // TODO Auto-generated catch block
}

Well, that was obnoxious.

# Python
science = open('spock.txt')

Alright, Python is arguably a scripting language, ill-equipped to handle the rigors of large pieces of software, but that is not the point. Here, Python and Java do almost the same thing, but Python does it with only one line. The Java snippet silently suppresses an error unless you happen to be constantly watching your output stream. So do your todo and re-throw the FileNotFoundException as a RuntimeException. Make your code at least as fail-fast as if it were Python.
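Doing the todo might look roughly like this. It is a sketch, not a recommendation for a whole code base; the class name FailFast and the method open are placeholders I picked for the example.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;

class FailFast {
    // Hypothetical wrapper: opens the file or blows up immediately,
    // keeping the original exception as the cause for the stack trace.
    static InputStream open(String path) {
        try {
            return new FileInputStream(path);
        } catch (FileNotFoundException e) {
            throw new RuntimeException("Could not open " + path, e);
        }
    }
}
```

A caller can now write InputStream captain = FailFast.open("kirk.txt"); on one line, and a missing file stops the program right there instead of several calls later.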

The point is also not that Python beats Java. Although strong typing may be fascism, the real question is whether the checked exception added any value.

What could I do with that exception? If I were writing a “good-enough” utility, I would want to crash swiftly and furiously. If I were writing end-user worthy code, I would have checked for the file’s existence before trying to open it. In that case, the odds of someone removing the file between my existence check and open attempt would be low enough that the method could reasonably return null, simplify my code, and disintegrate later when I actually tried to read something. My file reading code would presumably be at least as careful as my file opening code, so it could own the error handling for that situation.

So maybe the itch was actually the feeling that excellent code is almost entirely error handling. “Have your functions return error values when things go wrong, and deal with these explicitly, no matter how verbose it might be,” said Joel Spolsky, about the time I was entering my first class on object orientation. His points are true, but he proposes no solution to the need for a heart-wrenchingly ugly fail-fast mechanism in a language. For language designers, I propose this half-baked idea:

Make exceptions uncatchable.

First, this implies that you no longer need the exception class hierarchy because any throw becomes an exit with an error code. So how is this better than System.exit(1)? It gives you a stack trace. Replace all throws with asserts; asserts that really work, that is.
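No language I use works this way, so the closest I can get in Java is a rough imitation. The sketch below assumes a made-up helper, Die.unless, that prints a stack trace and then halts the virtual machine. Unlike Java's built-in assert, it does not depend on the -ea flag, and because it calls Runtime.halt rather than throwing, no catch block further up the stack gets a chance to intervene.

```java
class Die {
    // Hypothetical stand-in for an uncatchable throw: if the condition
    // fails, print where we were and end the process with exit code 1.
    static void unless(boolean condition, String message) {
        if (condition) {
            return;
        }
        // Constructing the error captures the current stack trace.
        new AssertionError(message).printStackTrace();
        // halt() ends the VM at once, skipping finally blocks and shutdown hooks.
        Runtime.getRuntime().halt(1);
    }
}
```

Usage would be something like Die.unless(file.exists(), "kirk.txt is missing"); before opening the file. Whether skipping the shutdown hooks is a feature or a bug is exactly the trade-off discussed further down.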

Second, libraries would have to use assert carefully, but when they did, they could use it as a real teaching experience. You might think the file opening method that crashed your entire process ludicrous, but you might learn to check for the file’s existence before opening it. The generated empty catch block has probably never taught anyone this lesson.

On the down side, you now have no way to record these catastrophic errors. Even if the virtual machine printed the stack trace to standard error with a dying breath, you might have forwarded that to /dev/null. Allowing a shutdown hook to intercept the error might help, but you know that would tempt you to do too much with a program in an unknown state.

Oh well. It’s still a good idea because I just thought of it.

Clocks in SOX

Most people never read Sarbanes-Oxley, section 404, but plenty use it as an excuse for convoluted processes, mostly involving peculiar Chinese walls.

Something like “common sense,” at some companies, for example, says that those who deploy home-grown software must be different people from those who write it. The developers get annoyed because they have to explain to some moron in “change management” obvious things like that the Spring property configurator just needs the new name for the file where you put the password which is clearly different for the key store you need to get from the security people who must know which host you want it to deploy on. Programmers can spend endless days complaining about how those idiots could not figure this all out, because they definitely sent emails explaining that you need to configure acegi-context.xml.

In that respect, at least, the “segregation of duties” becomes helpful by encouraging developers to simplify the configuration their applications require.

It does not, however, encourage any extra rigor with regard to application quality. In fact, programmers become even more reluctant to fix their mistakes because they had such a difficult time getting the application deployed last time.

Just to make a dysfunctional system that little bit funnier, some people have created a process involving Rational ClearQuest. In this process, the developer creates a “deployment ticket.” The ticket specifies a human being who should perform the deployment, a time window, instructions, and some other information no one reads. A potential deployer then receives the ticket and prioritizes it among the other incoming tickets. In the fashion of true technological progress, the two parties never need to communicate except through the ticket.

This is, of course, a recipe for inaccurate execution, if not total disaster. The deployer has no control over what time the scheduler requests the ticket be executed. The scheduler has no access to the deployer’s calendar or any idea of what other schedulers might simultaneously schedule that same deployer for. The schedulers know the deployment only takes a few minutes, but they build in a half day window for potential backlogs in the deployer’s queue. The deployer sees an entire half day for completion, so feels no particular urgency.

Meanwhile, the testers wait for deployment to complete, and you have transmogrified five man-minutes of work into four to six wasted hours for several people.

Why Is Worse Than How

When we use software, we operate in a binary tree. At the root, we could be in success or failure mode, where we are either doing whatever we meant to do, or distracted by some idiosyncrasy of our tool. In success mode, our productivity flows. We could be CTRL+Spacing to auto-complete or ESC, colon, w, q-ing to save and exit, and doing what we meant to do.

In failure mode, we have another couple of options:

  1. Why does this suck?
  2. Where is the option to make a zero-based list in WordPress?

I want a zero-based list because it appeals to my binary metaphor. The first item, zero, equals bad software. It is the tool that you wish you could will into nonexistence. You imagine the deepest levels of hell holding those who inflicted this impediment on you.

While you use them, you usually do not think about good tools. Your drill lets you swap bits without thinking about the drill. You only wonder whether this bit will work for that material and do you want 5/16 or 1/4?

Sometimes your tool cannot meet the challenge. You want to bore a new dryer vent through your cinder block or put drywall around the entirely bare laundry room and your minuscule battery-powered appliance will fight valiantly and fail. This is where you leave the text editor and open the IDE, but it is not a failure mode; you have simply exceeded your tool’s capability, and you need to go get that spiffy hammer drill.

You encounter tool failure mode when you do not immediately know how to accomplish your immediate task with the tool you are using, and you do not immediately have another tool that does it. You now find yourself in one of two camps, but you have no idea which:

  1. Your tool cannot do it
  2. You do not know that ESC, y, y copies a line

So you have to feel your way through, based on your prior experience with that package. In the good programs, you, like the Little Engine, believe that you can.