Why Android threading isn’t good enough

Text Collector exports your text messages organized by conversation. Conversations – also known as message threads or strings – are crucial to understanding the context of text messages. If Text Collector did something more naive: put every message in one big document according to date, perhaps, some messages would find themselves incomprehensibly woven together with other unrelated messages.

Apps on Android access messages through “content providers,” and since Android does include a content provider that lists messages according to conversation, the easy way to get message threads is to ask that provider. It’s not what Text Collector does.

The trouble with Android’s threading implementation is that it allows a message to be part of one, and only one, thread.

Android’s design is sensible enough when there are only two people on a conversation. In that case, messages intuitively belong to just one thread. Suppose Alice is having a conversation with Bob, and separate conversation with Charlie: she simply has two threads, one for Charlie, the other for Bob.

A more nuanced way of looking at message threads might also divide messages by topic. In email, for example, threads can be based on the subject line. For text messages however, a simple categorization by recipients is more intuitive.

Eventually, Alice finds herself discussing the same topic with both her friends, so she copies Bob and Charlie on a message that says, “Hey Bob, Charlie says you’re wrong.” This group message has to go into a thread, but Android doesn’t know whether to put it in the Bob thread or the Charlie thread. And lo, Android creates a third thread.

Intuitively, the group message actually continues both conversations, but since Android can only give it one thread identifier, it can’t represent that. People can join conversations, leave conversations and have related conversations on the side, but Android’s threading model represents none of this.

While using the phone, these broken threads aren’t too confusing, because you read the messages with context still fresh in your mind. They can be very confusing, however, when reading messages long after they were sent. If Text Collector used Android’s threading model, you would often need to cross-reference several documents to understand where messages really belong.

Instead, Text Collector puts the message in both the Bob document and the Charlie document. There’s a bit more to it, but if you want gory details, check the website.

No matter which document you read, Text Collector’s threading strategy makes the whole conversation understandable, and it has another happy effect for legal review: all Bob messages appear in a single document, and same for Charlie. Thus, in the common case where you want to collect Alice’s messages and review everything she discussed with Bob, you don’t have to worry that they might be scattered across a mess of different files.

Mind you, this is not magic. If Bob does something tricky, like change phone numbers, Text Collector won’t know about it.

Build one to throw away

Most software projects inadvertently plan to build a throwaway product for delivery to customers, instead of heeding Fred Brooks:

In most projects, the first system built is barely usable…

Plan one to throw away; you will, anyhow.

Luckily, when building Text Collector, only my own management expectations burdened me, so I had an opportunity to build a real pilot and chuck it.

I wrote the pilot in Java and it included all the major features I knew I would need. Thanks to Steve Yegg popping up for the first time in a few years, I heard about Kotlin, so I switched to Kotlin for the real program.

Now that it’s in alpha, I can reasonably compare the size of the two programs:

Java Kotlin
Lines  1.9k  3.5k
Words 6k  13k
Bytes 71k  146k

By many standards, this is small, and smaller yet when you consider that I’ve counted using simply the “wc” command, so the numbers include whitespace and comments.

The pilot included all major functionality, but ignored most edge cases. It featured mms and sms collection with:

  • Date filtering
  • Message preview (small collections only)
  • Pdf creation
  • Pdf sharing
  • Inline image rendering (no scaling)
  • Zooming (without panning or clamping)

For the real program, I kept all the pilot features except contact filtering, added handling for many edge cases, and added features:

  • Organization by conversation
  • Zip creation
  • Reporting usage statistics and errors
  • Failure diagnostics and debug features
  • Option to cancel collections in progress
  • Pre-calculated layout (allows preview for large collection)

So, the Kotlin app has roughly twice the features, with roughly twice the amount of code. At first glance, it doesn’t seem like that significant a difference, but breaking it down this way lets me count up the lines associated with new features only: 1.3 thousand. Thus, the part roughly equivalent to the pilot weighs in at 2.2k lines of Kotlin to Java’s 1.9.

In other words, Kotlin, handling edge cases, is only 15% larger than the equivalent Java that ignores edge cases with wild abandon.

My commit log shows that the Java version took about one month, while the Kotlin version took three. That seems pretty dismal: two thousand lines per month in Java and closer to one thousand in Kotlin, but…

The Java pilot had no tests, the real, Kotlin version adds 6.6 thousand lines worth of test code.