How Unicode can save math: part 2

It’s widely known that decimal – or “numbers” to most of us – is an inferior system. Decimal doesn’t work well for computers, which prefer base two and it doesn’t work well for humans either, at least not when compared to dozenal.

Dozenal is also called “duodecimal,” or “base 10” (when writing in dozenal) and it is a much more natural system for humans than decimal. The usual example of why is a clock. Look how neat it is with the number 10 right at the top:

Dozenal clock face
From the Dozenal Society of Great Britain

Dozenal has a big problem though, as we can see from the clock. What number does 10 represent, when you see it out of context? You just don’t know.

For decades, we’ve solved this problem in computer programming with funny prefixes. To a programmer, dozenal and decimal might be “base 0xA” and “base 0xC”. Likewise, in a dozenal world, we might write “hexadecimal” as “base 0z14” or something. If we need to start writing all our numbers with warts to indicate the base, however, dozenal seems doomed.

But wait, there’s hope. Unicode already contains the digits for “dek” and “el.” (That’s ten and eleven, if you’re not a cool dozenal kid.) If your browser doesn’t have a suitable font, refer to the clock above. If it does, they look like this:

↊ ↋

Now all we need is nine more Unicode symbols for the rest of the digits. Zero is special: for zero, there need be only one.

Text Collector version one released

Today, Legal Text Collector graduated from beta to production, version 1.0. It’s been a long time coming: I started working on it seven months ago. I made my first notes on the idea about five years ago.

Exactly thirty years ago today, Reagan said, “open this gate. Mr. Gorbachev, tear down this wall,” and in a small way, I think I can sympathize with Gorbachev. The main change from beta to production is that now people can start posting reviews: I’m opening the gates of public criticism.

Google allows public reviews only after a production release. In the safety of beta, Google provides a private feedback option, which nobody used. Many people did, however, send me questions and bug reports via email.

It’s hard to overstate the the value of that feedback. My friend Trish, my aunt Meg and my erstwhile colleague Jon of Sandline were especially helpful in testing the early alpha versions; strangers who stumbled upon later public betas kindly sent me information to resolve some of the final bugs. Through 14 alphas and 16 betas, Text Collector grew from 3.5 thousand lines to 5.5 thousand, a testament to how long the long tail can be.

So it is a dramatic moment. Much remains to do, but at some point I must tear down the wall to public criticism. Private feedback has been overwhelmingly positive, so this doesn’t worry me too much, but only time will tell.

It’s a good day to print text messages

Even though my elevator pitch for Legal Text Collector is, “It lets you print your text messages,” in all the time that I’ve been writing it, I never actually printed any messages.

Text Collector doesn’t (yet) print text messages directly. Instead, it converts them to pdf, and you’re free to copy and print pdf as you please. In development, I just examined output pdf. Many, many times. Printing is a formality.

Still, this gives me an itchy feeling. I find it unsatisfying to leave the final step untried.

So I celebrated the beta 15 release by actually printing messages. On paper. In color and grayscale.

Why celebrate beta 15? I expect it to be the last beta release before 1.0. Every 1.0 feature is in and every 1.0 bug is fixed. In other words, beta 15 is Version One. All that remains is to take a deep breath and bless it with a “production” designation.

 

How Unicode can save math: part 1

Every casual math enthusiast has by now heard of the raging war between tau and pi.

The what?

Ok, I mean, tau’s gaining a little ground, but really, pi has the weight of history behind it, so “raging” and “war” might be overstating things a little. The point is that 3.14, et cetera, is a bad circle constant and there’s a more intuitive option, “tau.”

Take a 90-degree angle. In radians, it’s half pi, but one quarter tau makes much more sense:

A quarter cut out of a pizza
Half pi!

If we call 2 times pi “tau,” this slice is one quarter, and things just make more sense.

Doesn’t seem to be catching on.

Not really, no. And tau has some problems too: for example τ=2π, but the tau glyph has only one leg and pi has two, so it looks like pi is twice tau. Shouldn’t tau have four legs? Clearly, the problem is this symbol: we need something more familiar. How about we just redefine pi to be twice itself?

Madness. Only confusion and chaos can result.

But not if we change the spelling. We’ll say that pie = 2pi.

Mathematicians will never go for it.

Perhaps, but food-based math has rich history. If you’re worried that it lacks a succinct one-character symbol, well, that’s where Unicode comes into the picture. In 2017, we finally have pie emoji, 🥧.

Where have my emoji gone?

Bob, what were your six biggest challenges in developing pdf?

Dave, that’s easy. Fonts, fonts, fonts, fonts, fonts and fonts. In that order.

The most noticeable limitation of Text Collector is that when it converts your text messages to pdf, it drops emoticons. Instead of smiley faces, it displays diamond-question mark things, “replacement characters.”

As ever, fonts are the problem.

When you need to display something laid out just so, such that it prints on a page and everything fits correctly, text alone isn’t enough.

Text, nowadays, usually means some encoding of Unicode. Technically, it’s a “Unicode transformation format,” which is where we get the “utf” in utf-8. Impress your friends and family with that.

Unicode represents abstract concepts, like the idea of the letter A or a smile. Emoji, just like letters A through Z are Unicode text, but precisely how letter A or winky face actually gets written to the page depends on your font.

A A

If you want to ensure that your text fits on the page, doesn’t run over or under the things around it, or, indeed displays at all, you must also include a font. In pdf terms, this is called “font embedding,” and it is what Android’s built-in pdf library does.

So far so good: just embed the emoji font and smiles or piles of poo come along for free, right? Wrong.

Most fonts specify letter outlines and leave color of the letters up to the user. Thus, I can say that I want letter A in blue or orange and both are the same font:

A A

Not so, Google’s emoji font. It includes cats, dragons and piles of poo in glorious, and heavyweight, color. Including this font makes for a massive pdf: almost 10 megabytes, on Android 6. It only takes two or three such documents to blow past Gmail’s attachment size limit. Typical collections would run into hundreds of megabytes, or gigabytes. They could easily consume all available space on the device.

This isn’t a new problem, and there’s a well-known solution: font subsetting. That is, only include glyphs for the small handful of characters that you actually use. Android’s built-in pdf library can’t do that.

So, to get emoji, I have to bring in an entirely new pdf library… and I’ve been working on this long enough. Version 1 needs to ship. Emoji will have to wait till version 2.

Painless Android releases

Android apps require not one, but two version numbers:

  • Version code: an integer that Android uses to check whether one version is more recent than another
  • Version name: a friendly version to display to the user, conventionally something like 1.2.3

This means that when you want to build a new release of your app, you have two things to manually update, and that is two things too many. You will make mistakes.

Luckily, it’s not too hard to automate this away in your Gradle build script.

Gradle inherited much of its design from Apache Maven. Maven defined a standard release feature that automatically handles typical pitfalls and mindless details of making a release: tagging in source control and incrementing your version number. For Gradle, there is a nice third-party implementation, the gradle-release plugin. So long as you don’t fight Maven-style version conventions, it can make cutting releases almost entirely automatic, modulo prompting you to confirm that it guessed correct version numbers.

If your project only has one version number, you just apply the release plugin and you’re done, but Android’s two-version-number system takes some customization.

I only discuss version numbers here, but the release plugin also does several other useful sanity checks.

First, move the versions out of your app/build.gradle into app/gradle.properties. They should look like so:

app/gradle.properties

version=1.0-SNAPSHOT
versionCode=1

app/build.gradle

android {
    // ...
    defaultConfig {
        versionCode project.versionCode.toInteger()
        versionName project.version
        // ...

“SNAPSHOT” is Maven’s convention for “between releases”. Version 1.0-SNAPSHOT means the code leading up to version 1.0. This convention is how the release plugin guesses what version number you are releasing: it just lops off the suffix.

When you run ./gradlew release, the release plugin updates the version thus:

  1. Edits gradle.properties, removing the “snapshot” part
    1.0-SNAPSHOT becomes 1.0
  2. Commits the change and tags this as version 1.0 in source control
  3. Builds the release
  4. Edits gradle.properties again, to next dev version
    1.0 becomes 1.1-SNAPSHOT
  5. Commits so you can immediately start working on version 1.1

Thus, out of this box, this handles the user-friendly version number, but not the “version code.”

Updating the version code

When Android installs an update to an app, it knows by version code whether the update is newer than what it currently has installed. 3 is newer than 2 and so on.

Thus, the obvious strategy for updating your version code is to add one on every release. If using the release plugin, you might do this as a manual step after it finishes a release. If you forget, you’ll accidentally build your next release with the same version code as you just used. If you have other branches, you need to remember to update them as well. Ouch.

There is a better way. Version codes need not be sequential, so instead of incrementing 1,2,3…, we can derive it from the date. A format like [2-digit year][month][day][0-9] works nicely. A release today gets version code 1704080, tomorrow, 1704090.

This format will cover you for 82 years at up to ten releases a day. If that’s not enough for you, use a four-digit year and a two-digit suffix, but watch out for integer overflow in 130 years or so.

The date-based strategy, however, means that you have to set your “version code” immediately before you release, instead of after. To do this, add a Gradle task right before updating version name.

app/build.gradle

task setVersionCode { doLast {
    // Add a task that updates version code
    def current = project.versionCode.toInteger()
    def releaseAs = new Date().format('YYMMdd0', TimeZone.getTimeZone('UTC'))
    if (releaseAs.toInteger() <= current) {
        // More than one release today
        releaseAs = current + 1
    }
    def releaser = project.plugins[net.researchgate.release.ReleasePlugin]
    def propsFile = releaser.findPropertiesFile()
    def props = new Properties()
    propsFile.withInputStream { props.load(it) }
    props.versionCode = releaseAs.toString()
    propsFile.withOutputStream { props.store(it, null) }
}}
// Execute our task before unSnapshotVersion, provided by the release plugin:
unSnapshotVersion.dependsOn setVersionCode

With this simple build script change (plus applying the release plugin), a single command updates both version numbers:

./gradlew release

The release plugin also runs the “build” task at the point of release, so this single command leaves you with both a release .apk and your working directory updated to the tip (snapshot) code ready to start work on the next release. There’s still a problem though: if you haven’t configured your build script to sign the build, you won’t be able to publish the release .apk.

Signing the build

To make Gradle sign a build, you need to add a “signingConfig”:

android {
    // ...
    signingConfigs {
        release {
            storeFile file('/home/myname/.javakeys/mykeys.jks')
            keyAlias 'myappsigningkey'
            // These two lines make gradle believe that the signingConfigs
            // section is complete. Without them, tasks like installRelease
            // will not be available! (see http://stackoverflow.com/a/19350401)
            storePassword "notYourRealPassword"
            keyPassword "notYourRealPassword"
        }
    }
    buildTypes {
        release {
            signingConfig signingConfigs.release
            // ...

This fails, so you put your real password in the “password” config place and get pwned. Your wife leaves you, and your dog dies. You didn’t that, right?

So where should you put your password? The top-voted answer on Stack Overflow says ~/.gradle/gradle.properties, presumably protected by 600 permissions. I don’t see the point. If you’re relying on file system permissions to keep the password secure, why have the password at all? You could just protect the keystore with file system permissions.

What you need is a prompt for the password.

Thanks to bug 1251, Gradle running in daemon mode (the default) doesn’t let you use System.console().readPassword("Password:"). You can disable daemon mode, but then you run afoul of (orphaned?) bug 2357 because Android Studio generates a default gradle.properties that includes jvmargs. Once you remove that configuration, you find that prompts don’t display when you build not in daemon mode (bug 869). That’s a pain because you can’t see the version number confirmation prompts.

As a result of this epic adventure, you’ll eventually find that the only reliable way to prompt for password is via Swing. No, I’m not joking. It’s not as gruesome as it sounds, thanks to Groovy’s Swing builder, so pop over to where Tim Roes documented how to do it.

Why Android threading isn’t good enough

Text Collector exports your text messages organized by conversation. Conversations – also known as message threads or strings – are crucial to understanding the context of text messages. If Text Collector did something more naive: put every message in one big document according to date, perhaps, some messages would find themselves incomprehensibly woven together with other unrelated messages.

Apps on Android access messages through “content providers,” and since Android does include a content provider that lists messages according to conversation, the easy way to get message threads is to ask that provider. It’s not what Text Collector does.

The trouble with Android’s threading implementation is that it allows a message to be part of one, and only one, thread.

Android’s design is sensible enough when there are only two people on a conversation. In that case, messages intuitively belong to just one thread. Suppose Alice is having a conversation with Bob, and separate conversation with Charlie: she simply has two threads, one for Charlie, the other for Bob.

A more nuanced way of looking at message threads might also divide messages by topic. In email, for example, threads can be based on the subject line. For text messages however, a simple categorization by recipients is more intuitive.

Eventually, Alice finds herself discussing the same topic with both her friends, so she copies Bob and Charlie on a message that says, “Hey Bob, Charlie says you’re wrong.” This group message has to go into a thread, but Android doesn’t know whether to put it in the Bob thread or the Charlie thread. And lo, Android creates a third thread.

Intuitively, the group message actually continues both conversations, but since Android can only give it one thread identifier, it can’t represent that. People can join conversations, leave conversations and have related conversations on the side, but Android’s threading model represents none of this.

While using the phone, these broken threads aren’t too confusing, because you read the messages with context still fresh in your mind. They can be very confusing, however, when reading messages long after they were sent. If Text Collector used Android’s threading model, you would often need to cross-reference several documents to understand where messages really belong.

Instead, Text Collector puts the message in both the Bob document and the Charlie document. There’s a bit more to it, but if you want gory details, check the website.

No matter which document you read, Text Collector’s threading strategy makes the whole conversation understandable, and it has another happy effect for legal review: all Bob messages appear in a single document, and same for Charlie. Thus, in the common case where you want to collect Alice’s messages and review everything she discussed with Bob, you don’t have to worry that they might be scattered across a mess of different files.

Mind you, this is not magic. If Bob does something tricky, like change phone numbers, Text Collector won’t know about it.

Build one to throw away

Most software projects inadvertently plan to build a throwaway product for delivery to customers, instead of heeding Fred Brooks:

In most projects, the first system built is barely usable…

Plan one to throw away; you will, anyhow.

Luckily, when building Text Collector, only my own management expectations burdened me, so I had an opportunity to build a real pilot and chuck it.

I wrote the pilot in Java and it included all the major features I knew I would need. Thanks to Steve Yegg popping up for the first time in a few years, I heard about Kotlin, so I switched to Kotlin for the real program.

Now that it’s in alpha, I can reasonably compare the size of the two programs:

Java Kotlin
Lines  1.9k  3.5k
Words 6k  13k
Bytes 71k  146k

By many standards, this is small, and smaller yet when you consider that I’ve counted using simply the “wc” command, so the numbers include whitespace and comments.

The pilot included all major functionality, but ignored most edge cases. It featured mms and sms collection with:

  • Date filtering
  • Message preview (small collections only)
  • Pdf creation
  • Pdf sharing
  • Inline image rendering (no scaling)
  • Zooming (without panning or clamping)

For the real program, I kept all the pilot features except contact filtering, added handling for many edge cases, and added features:

  • Organization by conversation
  • Zip creation
  • Reporting usage statistics and errors
  • Failure diagnostics and debug features
  • Option to cancel collections in progress
  • Pre-calculated layout (allows preview for large collection)

So, the Kotlin app has roughly twice the features, with roughly twice the amount of code. At first glance, it doesn’t seem like that significant a difference, but breaking it down this way lets me count up the lines associated with new features only: 1.3 thousand. Thus, the part roughly equivalent to the pilot weighs in at 2.2k lines of Kotlin to Java’s 1.9.

In other words, Kotlin, handling edge cases, is only 15% larger than the equivalent Java that ignores edge cases with wild abandon.

My commit log shows that the Java version took about one month, while the Kotlin version took three. That seems pretty dismal: two thousand lines per month in Java and closer to one thousand in Kotlin, but…

The Java pilot had no tests, the real, Kotlin version adds 6.6 thousand lines worth of test code.

Secure email monitor

I segment my online accounts into two groups: valuable accounts and everything else. “Valuable” can vary a little by personal priorities, but for most of us, our most valuable accounts will be those with direct access to cash: banking and investments.

By transitivity, any account that can allow access to those accounts is also in the valuable and high-risk group. These include financial aggregators and any email address used for account recovery.

I would like to keep the only computer with access to the valuable accounts locked away in a dungeon guarded by trolls, but highly restricted access also makes it difficult to monitor activity. I want to be able to see notifications on all my devices while not actually allowing them access.

Enter Google Apps script.

Set up the script

You’ll need an everyday – normal – email address, one you access from anywhere. Then, you’ll need another that you only access from secured devices, the restricted – high risk – account.

Make a restricted Gmail account for yourself.

  • Don’t use an existing email address in for recovery
  • Do use a password manager and make sure you have a backup
  • Do set up two-factor auth

Go to https://script.google.com/

Untitled Google Apps script

Paste:

function digestAndArchive() {
  // CHANGE THIS TO YOUR NORMAL EMAIL ADDRESS:
  var monitor = "youreverydayemail@example.com"

  // Docs say that if you have many threads, for some unspecified value of "many", you
  // should use the paginated version of getInboxThreads, as the simple version will fail.
  //
  // It turns out that means "fail silently", returning at most some arbitrary number of
  // threads, and there is no obvious way to know there are more. I suspect the "correct"
  // way is to keep calling the paginated version with an increasing start index until
  // it returns nothing, but that seems ridiculous. For practical purposes, this function
  // returns more threads than you are likely to receive in a day.
  //
  // So, upon first installing this script on a long-ignored inbox, it might need to run
  // several times before it clears out the inbox, but that shouldn't hurt anyone.
  var threads = GmailApp.getInboxThreads()
  var bySender = {}
  for (var i = 0; i < threads.length; i++) {
    // I'm assuming this is a receive-only email address, so all messages in a thread
    // presumably have the same sender (or similar). Organizing by sender isn't
    // strictly necessary, but I think the final digest is more understandable.
    //
    // The docs don't say whether the first message is the most recent or not, but that
    // generally should not matter.
    var message = threads[i].getMessages()[0]
    var sender = message.getFrom()
    bySender[sender] = bySender[sender] || []
    bySender[sender].push(message.getSubject())
  }
  var body = ''
  var indent = '\n  - '
  for (var sender in bySender) {
    body += sender + indent + bySender[sender].join(indent) + '\n'
  }
  // Experimentally, it seems that GmailApp.sendEmail encodes the body as text/plain
  // so it should be safe to drop any old string in it. Would be nice to find
  // documentation to that effect. It munges astral plane characters, but for my
  // purposes here, I don't care.
  GmailApp.sendEmail(monitor, "Daily digest for " + new Date(), body)
  for (var i = 0; i < threads.length; i++) {
    // GmailApp.moveThreadsToArchive() can move multiple threads at once, but throws an
    // error and moves nothing for more than 100 threads. That's a pretty low limit when
    // you first run this on an inbox you haven't been regularly cleaning, so move one
    // by one.
    GmailApp.moveThreadToArchive(threads[i])
  }
}

Remember to change the email address in the script above to your normal, every day email. Save, as, for example “daily-digest”

To run by hand, go to the Run menu or just click the play button in the toolbar. At your normal address, you should see an email like this:

Subject: Daily digest for Fri Dec 02 2016 08:11:39 GMT-0500 (EST)
From: [your restricted address @gmail.com]

Andy from Google <andy-noreply@google.com>
– Josiah, welcome to your new Google Account

Run on a timer

To schedule, click Resources (menu) -> Current Project Triggers -> Click the link to add a trigger.

Set up to run daily on a timer. The time option is in your local time, as detected by Google.
Google apps time-driven trigger example

When you save the trigger, it will prompt you to authorize the trigger to run. Click “Review permissions”, which opens a new popup window, then allow.

Run the script by hand again after setting up the trigger. It should prompt for another required permission.

Daily summaries of your restricted account should now start to appear in your normal email.


If you’re paranoid – and if you’ve gotten this far, you probably are – delete the digest emails from time to time. Deleting the digests removes a possible attack vector on your high-value account because Google’s account recovery process asks the date you created an account as a form of identity verification.