Copying antifeatures: quote style

The C language gave different meanings to single quotes and double quotes:

'c' // a byte
"c" // array of bytes

Decades later, a difference persisted but, in most languages, changed its character. Single and double quotes now often change how string interpolation works:

"#{foo}" # interpolate the string value of variable foo
'#{foo}' # literal, no interpolation

As far as I know, this comes from Bourne shell. In the weird world of shell scripts, it actually turns out to be useful, but in the more structured world of most programming languages, it has approximately zero real-world use cases.

Suppose that you do literally want to write “#{” in a string:

'#{foo}' # intentional, but looks like a mistake

Python got this right. It used to be right in JavaScript.

Single quotes should be interchangeable with double quotes. Interchangeability allows you to use the easier-to-type single quotes most of the time, but switch to double quotes for easy single-quote escaping.
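
As a minimal sketch of what interchangeable quotes look like in practice, here is how Python behaves. The variable names are invented for this example.

```python
# Both quote styles produce identical strings in Python.
single = 'hello'
double = "hello"
assert single == double

# Switch delimiters to avoid escaping the other quote character.
contraction = "don't panic"     # no backslash needed for the apostrophe
dialogue = 'she said "hi"'      # no backslash needed for the double quotes
assert contraction == 'don\'t panic'
assert dialogue == "she said \"hi\""

# Neither style interpolates, so "#{" is literal text either way.
assert '#{foo}' == "#{foo}"
```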

Copying antifeatures: multiline strings

Multiline strings are a Good Thing but they usually end up implemented just a little bit wrong.

Delimiters

First, they require special delimiters. Most grammars could easily allow ordinary strings to break across lines, and many languages could even add this in a backward-compatible way. A string that includes line breaks is no harder to parse than a single-line string:

foo = "
Hello
world
"

To the great annoyance of programmers everywhere, languages generally require special delimiters for multiline strings:

foo = """
Hello
world
"""

Thus, when we find ourselves typing a string literal and the line gets uncomfortably long, the language makes us go back to the beginning and change delimiters. Vice versa when shortening strings.

Indentation

Most multiline strings end up being indented. Ruby gets multiline string quote style right, but fails on indentation:

def do_stuff()
   foo = "
     hello
     world
     "
   return foo
end

puts do_stuff() # "\n     hello\n     world\n     "

What are those leading spaces doing? I can’t think of a time when I’ve ever wanted a string literal to maintain its source indentation at runtime.

Almost always, multiline strings fall into two categories:

Indentation is irrelevant:

sql = "
  select *
  from foo
  where bar
    and baz "

Needs trim and dedent:

help = "
   usage: frob [-nicate]
     frob the widget
   
   -n turn on the n
"

So there are many use cases where trimming and dedenting is warranted, but almost none where it hurts.
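
To make "trim and dedent" concrete, here is a rough sketch in Python, where the standard library's textwrap.dedent does the dedenting and the programmer has to ask for it explicitly. The helper name trim_and_dedent and the usage function are made up for this example; a language that got this right would do the equivalent automatically.

```python
import textwrap

def trim_and_dedent(literal: str) -> str:
    """Remove the common leading whitespace, then strip surrounding blank space."""
    return textwrap.dedent(literal).strip()

def usage() -> str:
    # The literal is indented to match the surrounding code.
    help_text = """
        usage: frob [-nicate]
          frob the widget

        -n turn on the n
    """
    return trim_and_dedent(help_text)

print(usage())
```

The relative indentation of "frob the widget" survives, while the indentation that only exists because of the enclosing function disappears.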

CoffeeScript almost got both aspects right, but it makes a distinction between “multiline strings,” which use ordinary quotes, and “block strings,” which use triple quotes. Ordinary strings collapse line breaks to single spaces; triple-quoted strings trim and dedent correctly.

Next time: the perils of such subtle distinctions in quote style.

Painless Android releases revisited

Previously, I described a Gradle script that handily generates release version codes for Android apps. The generated version codes take the form [date][number].

I finished that article with a litany of Gradle bugs. Today: fresh Google bugs!

In May, Google added automatic crash reporting to the Google Play developer console. Before auto-reporting, users had to explicitly send reports when apps crashed. So far so good, but if you’re testing on a physical device, you might notice something alarming: reports of bugs you already fixed, or crashes you only saw in development.

Apparently, Google forgot to filter out reports from debug-mode applications. Perhaps Google would claim this is a feature, but it means that you can’t tell which crashes are actually happening in the wild.

Google says crash reporting is “opt-in.” This is meant ironically, since the option to turn it off doesn’t actually exist on, for example, the Samsung S8. (There is a different option, “report diagnostic information.” As far as I can tell, it’s a placebo.)

To work around this, we need to make crash reports from the debug version look somehow different from the production version. Crash reports include the version code, so remember that suffix? We can use that. Instead of using one number per release, use two: one for the release, one for the next development version:

// Version code updates when released to a date-based format. Even-numbered version codes are
// release builds, odd-numbered version codes are debug builds. MAX five releases per day.
def releaseVersionCode = null
def writeVersionCode(versionCode) {
    def releaser = project.plugins[net.researchgate.release.ReleasePlugin]
    def propsFile = releaser.findPropertiesFile()
    def props = new Properties()
    propsFile.withInputStream { props.load(it) }
    props.versionCode = versionCode.toString()
    propsFile.withOutputStream { props.store(it, null) }
}

task nextDebugVersionCode { doLast {
    // Even though this runs after the release build, project.versionCode is still the version
    // code *before* release. The Release plugin runs the release build in a separate Gradle
    // invocation, so the release package picks up version changes in gradle.properties. When
    // control returns here though, it's the original Gradle invocation, and has *not* reloaded
    // gradle.properties.
    writeVersionCode(releaseVersionCode + 1)
}}
updateVersion.dependsOn nextDebugVersionCode

task setReleaseVersionCode { doLast {
    def current = project.versionCode.toInteger()
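    // Lowercase 'yy' is the calendar year; uppercase 'YY' is the week year, which is wrong near New Year.
    releaseVersionCode = new Date().format('yyMMdd0', TimeZone.getTimeZone('UTC')).toInteger()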
    if (releaseVersionCode <= current) {
        // Should only happen when there is more than one release in a day
        releaseVersionCode = current + 1
    }
    writeVersionCode(releaseVersionCode)
}}
unSnapshotVersion.dependsOn setReleaseVersionCode

So, now the first release of the day gets suffix zero, the debug version that follows gets suffix one, and so on. I’m writing this on July 26, so if I cut two releases today, my version codes will be:

  • 1707260, production
  • 1707261, debug
  • 1707262, production
  • 1707263, debug
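
The arithmetic behind that list can be sketched outside Gradle. This is an illustrative Python model of the two tasks above, not the build script itself; the function names and the starting code 1707251 (a debug code left over from a hypothetical release the day before) are assumptions made for this example.

```python
from datetime import datetime, timezone

def release_version_code(current: int, now: datetime) -> int:
    """Model of setReleaseVersionCode: date plus suffix 0, or current + 1."""
    candidate = int(now.strftime("%y%m%d") + "0")
    # A later release on the same day continues from the debug code left behind.
    return candidate if candidate > current else current + 1

def next_debug_version_code(release_code: int) -> int:
    """Model of nextDebugVersionCode: the debug build is always release + 1."""
    return release_code + 1

def is_release(version_code: int) -> bool:
    """Even version codes are release builds; odd ones are debug builds."""
    return version_code % 2 == 0

today = datetime(2017, 7, 26, tzinfo=timezone.utc)
first = release_version_code(1707251, today)
debug1 = next_debug_version_code(first)
second = release_version_code(debug1, today)
debug2 = next_debug_version_code(second)

for code in (first, debug1, second, debug2):
    print(code, "production" if is_release(code) else "debug")
```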

It’s subtle, but at least now we can tell which crashes actually happened to people using our app: they are even numbers.

Or are they?

It appears that Google stores the crash data on the phone and reports it only once per day. The version code it reports is the version running on the phone when it sends the report, not when the crash actually happened.

If the app updates in the interim, we can still get crash reports for bugs already fixed and they will seem to come from a version that includes the fix.

I don’t know of any workaround.

Why (not) pdf?

Text Collector lets you print text messages by converting them to pdf. What we call “text messages,” of course, includes messages with both text and pictures. Sometimes they include other types of attachments, like dirty .gif files, but that’s another article. For now, I’m just discussing images and text.

On the surface, pdf seems ideal: it’s universally viewable, supports pagination, and, unlike images, includes text in a searchable way. But I don’t really like pdf as an ediscovery interchange format.

Why? Pdf is too complicated. As file formats go, it’s far from the worst monster out there, but it’s also far from simple. As a consequence, the many different programs that generate pdf often get it slightly wrong; they’re not necessarily bad programs: they’re just dealing with a complicated problem and make mistakes.

Pdf viewers grapple with the resultant problems and show you something that looks correct, so everything seems fine at first.

When you want to edit pdf, however, things quickly go wrong. For a typical ediscovery operation like stamping Bates numbers on your pdf files, the small errors compound and you have a significant chance that the result will be illegibly damaged.

Assume some set of pdf files P, and an operation b that you want to do on them to produce an output set, O.

$$b : P \rightarrow O$$

You can visually check some number v for correctness.

When the size of O is larger than v, there will be some non-empty subset E that is terribly broken.

$$E \subset O \land |O| > v \Rightarrow |E| > 0$$

Ulfers’ Law of Batch Pdf Editing

Second, since it is complex, pdf allows all manner of invisible content. This makes redacting pdf hazardous, and if you have a highly developed sense of self-doubt, it’s hard to shake the feeling you’ve done something wrong that allows the redaction to be removed.

So, why does Text Collector convert messages to pdf and not something else?

There are no suitable alternatives. Html is universally viewable but has no notion of pagination and very limited image embedding. Microsoft Word format is too easy to edit, and comparable in complexity to pdf. Mhtml never got universal support and it lacks pagination anyway. Tiffs and text are too large and useless to average people. How about svg?

So pdf it is, for better or worse.

Keyboard Savior Xtreme released

Tired of websites trapping your shortcut keys for their own ends? Me too. Usually, the problem is that they take over the Firefox slash-to-search shortcut. But what can we do about it?

First, we could tolerate it and just use ctrl+f instead. On some slash-abusing sites, like Bitbucket, this isn’t a terrible option: they rarely include long pages that require quick jumps. In api documentation, however, it’s nightmarish.

Second, we could try to stamp out the evil by filing bugs and fighting for justice. There is some hope for this: Github, for example, used to abuse the slash key and no longer does. In general, however, it devolves into Whac-A-Mole. Django rightly rejected slash abuse in 2008, only to have it sneak into the Django docs in 2015.

Finally, we could just fix it. This Greasemonkey script lets you list known abusers and prevent them from seeing slash keystrokes. After some time, however, I realized that my Greasemonkey approach did not go far enough: it only prevents abuse of one keystroke and only on selected sites.

In fact, there are only a handful of sensible reasons for any website to capture a keystroke, ever. Why not just stop them all?

So, I welcome Keyboard Savior Xtreme. Take back all your keystrokes.

Comefrom0x10 released

After a long hiatus while I built Text Collector, last week I finally returned to my paradigm-shifting language, Comefrom0x10. It now has a home page on Read the Docs that features a tutorial, standard library documentation and more.

Except for a couple minor bugs, its implementation was actually functionally complete eight months ago. I hesitated to release it, however, because of rather embarrassing performance problems.

Now, it wouldn’t be fair to say that Cf0x10 is just slow. It’s catastrophically slow. The brainfuck.cf0x10 program takes 10 seconds to run helloworld.b on the laptop I’m using to write this, and gets dramatically worse as the program gets longer.

What went wrong?

It’s not a fundamental problem with the comefrom paradigm, but a consequence of the twisted way the language took on a life of its own during implementation. I started with the idea that I was building a rather ordinary stack-based interpreter, but Cf0x10 would have none of it. As it evolved, the original idea became a disfigured mutant: I can demonstrate with tests that it works, but it’s too convoluted to allow necessary optimizations.

Oh well, as they say, first make it right, then ~~make it fast~~ release it.

Print text messages: video edition

Today I published my first YouTube video, “How to print text messages on Android.”

I already knew the obvious choices of software to use for some elements: Audacity to record and edit narration, Pixly to draw the hand pointer animation. I’m a novice at making videos, however, so I spent a good deal of time figuring out what program I should use to edit the video.

First, I tried Kdenlive, and managed to put together the entire video how I wanted it, only to run into a fatal error: I couldn’t export successfully.

Eagle-eyed viewers may notice that the part where I demonstrate a purchase doesn’t use an actual currency. There are actually several layers of compositing in this shot:

[Image: purchase screen with generic currency symbol]

When Kdenlive attempted to render this, it just produced glitches: it flashed images, like the sad face emoji, from completely unrelated parts of the video. No settings tweak I found solved the problem.

So, on to OpenShot, whose interface feels largely comparable to Kdenlive. I soon discovered, however, that it lacked some basic effects that I needed, particularly freeze frame. Apparently version 2 lost a number of effects that were present in version 1.

Finally, I moved to Blender. I guess, deep down, I always suspected it would come to this.

Blender is a ridiculously capable program, especially when you consider how lightweight it is. It manages to include 3D modeling, rigging, rendering, animation, compositing, video editing, a game engine and more in a download between 80 and 150 megabytes, depending on your operating system. Compare to, say, Maya, which can take days to download.

How Blender accomplishes this is surely black magic, but it’s not for fear of the dark side that I avoided it till last: it’s the user interface. To the uninitiated, Blender feels like learning how to use a computer for the first time.

Want to select something in the timeline, or, in Blender-speak, the “video sequence editor?” It’s right-click, not left-click. Want to move it? You can click and drag, but it won’t release when you let up on the mouse button. You need to click again to release. Scroll wheel zooms. Ctrl+scroll wheel scrolls. And so on.

In other words, Blender’s interface is comparable to Dwarf Fortress.

Nonetheless, it only took me about a day and a half to re-cut my video in Blender. On the bright side, if I ever need to add a 3D Text Collector mascot and some explosions, I’ll already be in the right program.

How Unicode can save math: part 2

It’s widely known that decimal – or “numbers” to most of us – is an inferior system. Decimal doesn’t work well for computers, which prefer base two, and it doesn’t work well for humans either, at least not when compared to dozenal.

Dozenal is also called “duodecimal,” or “base 10” (when writing in dozenal) and it is a much more natural system for humans than decimal. The usual example of why is a clock. Look how neat it is with the number 10 right at the top:

[Image: dozenal clock face, from the Dozenal Society of Great Britain]

Dozenal has a big problem though, as we can see from the clock. What number does 10 represent, when you see it out of context? You just don’t know.

For decades, we’ve solved this problem in computer programming with funny prefixes. To a programmer, decimal and dozenal might be “base 0xA” and “base 0xC”. Likewise, in a dozenal world, we might write “hexadecimal” as “base 0z14” or something. If we need to start writing all our numbers with warts to indicate the base, however, dozenal seems doomed.
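
For anyone who wants to check the arithmetic, here is a small Python sketch. The to_base helper and the use of A and B as stand-ins for ten and eleven are assumptions for this example only; Python has no dozenal prefix of its own.

```python
DIGITS = "0123456789AB"  # assumption: A and B stand in for ten and eleven

def to_base(n: int, base: int) -> str:
    """Render a non-negative integer in the given base (2 through 12)."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 12")
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))

# The same numeral "10" names a different quantity in each base.
assert int("10", 10) == 10
assert int("10", 12) == 12
assert int("10", 16) == 16

# Sixteen written in dozenal is "14", hence "base 0z14" for hexadecimal.
assert to_base(16, 12) == "14"
# Twelve written in dozenal is "10", hence "base 10" when writing in dozenal.
assert to_base(12, 12) == "10"
```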

But wait, there’s hope. Unicode already contains the digits for “dek” and “el.” (That’s ten and eleven, if you’re not a cool dozenal kid.) If your browser doesn’t have a suitable font, refer to the clock above. If it does, they look like this:

↊ ↋

Now all we need is nine more Unicode symbols for the rest of the digits. Zero is special: for zero, there need be only one.

Text Collector version one released

Today, Legal Text Collector graduated from beta to production, version 1.0. It’s been a long time coming: I started working on it seven months ago. I made my first notes on the idea about five years ago.

Exactly thirty years ago today, Reagan said, “open this gate. Mr. Gorbachev, tear down this wall,” and in a small way, I think I can sympathize with Gorbachev. The main change from beta to production is that now people can start posting reviews: I’m opening the gates of public criticism.

Google allows public reviews only after a production release. In the safety of beta, Google provides a private feedback option, which nobody used. Many people did, however, send me questions and bug reports via email.

It’s hard to overstate the value of that feedback. My friend Trish, my aunt Meg and my erstwhile colleague Jon of Sandline were especially helpful in testing the early alpha versions; strangers who stumbled upon later public betas kindly sent me information to resolve some of the final bugs. Through 14 alphas and 16 betas, Text Collector grew from 3.5 thousand lines to 5.5 thousand, a testament to how long the long tail can be.

So it is a dramatic moment. Much remains to do, but at some point I must tear down the wall to public criticism. Private feedback has been overwhelmingly positive, so this doesn’t worry me too much, but only time will tell.