October 2016 – Should Be Simple

Use the string literals

I think that some of my college classes took points off your grade for using literal strings instead of #define. Likewise, linters and coding standards typically want to prevent programmers from using literals.

Many indoctrinated programmers, therefore, insist that the correct way of writing a select statement is something like this:

"select " + COLUMN_NAME + ", " + COLUMN_EMAIL
+ " from " + TABLE_PEOPLE
+ " where " + COLUMN_ID + " = ? "

This jihad gains its supposed righteousness from the idea that literals make your code less maintainable. But, aside from thorough tests, what does make code maintainable? In order of importance:

Readability
Fewer dependencies
All else equal, shorter is better

So, assume for a moment that the most maintainable code is the most readable code, and compare:

"select name, email from people where id = ?"

I suggest that the second statement is far more readable, and therefore more maintainable. It is also shorter, even without including the constant declarations, and it removes a dependency on constants defined somewhere else.

Ha! You’re ignoring the dependency on the database structure. If I want to change a column or table name, the first code is dryer.

Perhaps, but structural database changes are complex and should not be undertaken with the flippant attitude that you can do them simply by changing the value of a constant somewhere. Consider, just for starters, that if you change column name “email” to, say, “primary_email”, you will probably want to change the name of the constant to match and you will still have to search for any places the string literal might have been used, just in case.

Ok, but at least I will spend less time tracking down bugs related to spelling errors.

Sure, but seriously, how much time do you spend on spelling errors? They are typically among the easiest bugs to fix. The cumulative effort of fixing typos is less than the effort involved in managing your constants.

Life without literals is burdensome. Where should the constant declaration go? Does a definition for the same value already exist in scope? If it does, and has the same name as what you want to use, does it refer to the same thing? You add a “department” table that also has a “name” column; should you (a) use the existing constant, (b) make a new constant with a non-conflicting name, (c) rename the existing constant or, (d) both b and c?

But we have a coding standard that says what to do, so I can just follow that.

That’s just one more thing you need to remember. Do you remember precisely what your coding standard says to do? Does it even cover every case?

Does your coding standard say that uppercase identifiers must never be built from external data? Even if it does, unless you can see the constant definition in the same screenful of text as where it’s used, your code is not obviously safe from injection.

Instead of taking on this travail, just funnel your energy into picking good names for your tables, urls, or whatever in the first place. You’ll need fewer structural changes later.

Good points, but constants can add meaning, instead of being “magic numbers” sprinkled about.

True, adding meaning (see “readability”) is the one good reason for a constant instead of a literal, but usually this applies only to numbers, not strings.

What about localization?

Well… there are some good arguments for externalizing strings that appear to users, but that’s another article.

Matias Ergo Pro: the review

Mechanical keyboards have gained popularity in recent years, but there is still a terribly under-served class: split mechanicals. The $200 Ergo Pro is a rare entrant that I very much want to like. First the good: it feels solid and the switches are very quiet for mechanicals.

Tragically, the rest is mostly bad.

Some faults are minor matters of taste: first, Matias may have sacrificed too much for quiet switches. I like my switches clickier and with a more noticeable point of engagement, but they feel soft on the Ergo Pro. Second, dedicated cut/copy/paste keys just waste space.

Layout

Slightly more problematic, it is difficult to adjust to some bizarre layout choices.

I don’t mind moving the home/end/etc. cluster but isolating the delete key makes the arrangement more unintuitive than necessary.

Next, right control migrated under the right index finger; it is easier to reach than normal and I almost like that choice.

By contrast, the escape key has gotten much more difficult to reach. The horizontal face of escape actually slopes down toward the back, away from you, and it moved far to the upper left. I guess we know what editor the designers prefer. This choice is extra perplexing because they seem to have shifted the function keys rightward to make room for a proper escape key placement, then, presumably, forgotten to move the key.

Pressing escape on this keyboard is the typing equivalent of outfitting an expedition to the north pole, crossing the key desert above the tilde mountains, trekking westward past the edge of where ordinary keys reside and finally pressing escape on your return journey.

Nonetheless, after a week of use, I was beginning to adjust to the strange layout. I had to abandon the keyboard for other reasons.

Ergonomics

Keyboard manufacturers habitually call any split or curved keyboard “ergonomic”, ignoring all but one factor in the dance that is ergonomics.

Despite being split and its name, the Ergo Pro is decidedly not ergonomic. It suffers a subtle but severe enter key misplacement: the enter key is in a straight line a full inch from the semicolon key.

Most split designs include curved rows of keys:

Microsoft Sculpt keyboard layout — Microsoft Sculpt uses curved rows

The Ergo Pro, however, does not curve the enter key toward to the right pinky, nor does it adjust the enter key inward. The Sculpt’s enter key is an eighth inch closer to semicolon, and closer the the palm by virtue of a curved row.

In total, this pushes the enter key half an inch further from where my palm rests on the Matias keyboard than on the Sculpt. Since my pinky is only two and a half inches long, this is a whopping 20% difference that causes, for me, severe strain after only a week’s use.

But isn’t this still better than the same difference on a straight keyboard?

No. On a straight keyboard, the unnaturally outward curved wrists incline you to rotate your hands inward when reaching for keys with the little finger. Consequently, your outside fingers naturally move forward when needed. This reduces finger-overextension at the cost of wrist strain. A split keyboard straightens the wrists, meaning that the pinky has no assistance when reaching for the outer keys.

Perhaps this does not matter if you rarely use the enter key, such as when typing prose, but I am a programmer so enter is my second most popular key. It probably didn’t help that I was using the Ergo Pro when developing Qwerty War. Though I wanted to love it, I had to abandon the Ergo Pro after a week due to crippling pain.

Beware string concatenation

The Owasp top 10 webapp vulnerability list is due to update soon; I wager its top three won’t change from the 2013 list:

Injection [mainly sql injection]

Broken Authentication and Session Management

Cross-Site Scripting (XSS)
…

What do two of the top three vulnerabilities have in common? Cross-site scripting is a form of injection attack that is so popular it deserves its own category; both cross-site scripting and “injection” often result from unsafe string concatenation:

"select name from students where " + ...
html = "<div>" + ...

I bet one simple rule would stop most webapp compromises:

Avoid string concatenation.

Production code should very rarely concatenate strings; when it must be done, it should be written to make it obvious that the concatenation is safe.

Assuming most programmers do not intend to write vulnerable code, it is safe to say that they consider the vulnerable code they are writing to be obviously safe. That is just a special case of “programmers don’t think about the bugs they are writing.” By extension, they don’t think about the vulnerabilities they are writing.

So, I dislike rigid coding standards, but if I were to implement one, it would be to say that any place string concatenation is used must be accompanied by comments to answer:

Where is the data coming from?
What could happen if a malicious actor provided the data?
What makes this safe?

If you have so much string concatenation in your webapp that this seems onerous, I guarantee your site is full of security holes.

Comefrom0x10: unary minus

In the original Comefrome0x10 grammar, I skipped unary minus. In retrospect, its presence has important consequences that I should have considered earlier, so unary minus made me reverse a couple decisions.

Line continuation

Comefrom0x10 uses newlines to delimit statements, which is fairly common in modern languages. What is not so common is that a standalone expression has a side effect: it writes to program output, as in PowerShell. This makes hello world succinct:

'hello world'

Prior to adding the unary minus operator, I allowed line continuation if an operator started a line, so these would be equivalent:

x = 2 + 3

x = 2
  + 3

But unary minus causes problems. These lines could be either “3 - 2” or “print 3; print -2“:

3
-2

I could continue to allow operators at the end of a line to indicate line continuation, but I’ve never liked end-of-line line markers for continuation as they are too hard to see. Comefrom0x10, therefore, now has no facility at all for line continuation. I will figure it out when I have a burning need.

Concatenation

Unary minus causes a second problem, when interacting with cf0x10’s concatenation behavior.

In cf0x10, space is the concatenation operator: “'foo' 'bar' is 'foobar' # true“. Originally, I allowed the concatenation space to be optional, so “name'@'domain” would be equivalent to “name '@' domain“.

Enter unary minus. In most languages, all three of these expressions mean “a minus b”, but in the original cf0x10 grammar, they could indicate either concatenation with negative b or subtraction:

a -b
a-b
a - b

Other languages do allow space concatenation, but normally only for literals. In Python, for example:

>>> 'a' 'b'
'ab'
>>> a, b = 'a', 'b'
>>> a b # ILLEGAL in Python, but LEGAL in cf0x10
SyntaxError: invalid syntax

In css, space is even an operator and “-” can appear at the beginning of an identifier. The css grammar, avoids ambiguity, however, by making “-” mathematical only when followed by a number, as in “-2px” versus “-moz-example“; “--foo” is illegal. Css gets away with this because it has no variables, making the operand’s type known during parsing.

Since the cf0x10 parser cannot know operand types, Comefrom0x10 has two options. It could decide whether to subtract or concatenate at runtime, or it could make the grammar a bit less forgiving of how you use spaces. Spacing is already significant, so, on balance, I think that making the spacing rules restrictive is the more clear:

a -b # concatenation
a-b # subtraction
a - b # subtraction
a- b # syntax error

The rules are that concatenation requires at least one space and unary minus cannot be followed by a space.

Comefrom0x10: day 5

A few days have passed building a brand new language. About a day and a half getting most of the grammar worked out and another day and a half building the guts of the interpreter. But now cf0x10 programs can basically run, albeit with a limited set of operators and without entirely correctly scope resolution.

Of course, I spent much of the time on the interpreter working out the ideal semantics for the comefrom statement, a much neglected field of computer science. The natural question is, “what happens when more than one statement comes from a single other statement?”

In Intercal, this is simply an error. Though that might seem like a cowardly evasion, it can simply be credited to Intercal’s being a lower-level language: the Intercal programmer needs to handle these situations explicitly by abstaining and so forth.

Comefrom0x10, on the other hand, is a high-level language designed to minimize runtime errors, so Comefrom0x10 has well-defined semantics for all variations of coming from a place.

First, eligible comefroms receive control in the order they appear in the source. Alone, however, this rule causes trouble breaking out of blocks. Consider this program that prints the Fibonacci sequence up to ten thousand:

fib
  b = 1
  comefrom fib
  a
  next_b = a + b
  a = b
  b = next_b

  comefrom fib if a > 10000

If the first comefrom took precedence, the loop would never end. Changing the first line to “comefrom fib if a < 10000” would fix the problem, but require reconfiguring how the variables a and b get swapped. I think the existing version is more clear: last-first wins allows a common idiom of placing your exit conditions at the end of a block.

The first exception to source order, therefore, is that when multiple comefroms are eligible within the same block, the last one wins.

There are other exceptions, but that’s another article.

Comefrom0x10: day 1

Today, I am finally creating a language.

Inspired by Come Here and Come From, Comefrom0x10, pronounced “Come from sixteen”, will be a modern language based entirely around the much-maligned “come from” control structure.

I am writing the first interpreter in Python. I was hoping to have a grammar worked out by the end of day one, but failed to make significant progress. Originally, I thought I would use a parsing expression grammar, but soon reverted to Backus-Naur. The trouble is that I want cf0x10 to include syntactic indentation (as it’s a modern, Pythonic language). This means that I need indent and dedent constructions, which parsing expression grammars can’t easily express.

So, I decide to switch to lalr. As a bonus, using lalr, will, of course, let people easily build million-line cf0x10 programs without worrying about slow compile times.

The Python parser generator landscape is surprisingly bleak. Many of the parser generators available are abandoned or half-assed. Ply seems to be the only real contender. There are two main problems with Ply: first, its mechanism of embedding grammar specification within docstrings makes the parser obnoxiously verbose; second, it doesn’t provide an obvious way to generate a parser that does not require installing Ply.

Ply has a very strong point though: it provides outstanding debug output for diagnosing problems with your grammar. I’ll just live with the verbosity; I might try PlyPlus if it gets too annoying. As far as removing the Ply dependency goes, I’ll wait until the Enterprise customers for cf0x10 want to fork out money to fund writing a build script so I can use the generated parse table without depending on Ply.

Ply’s scanner generator has the usual Lex push and pop state capabilities, plus it makes saving arbitrary lexer state easy. I, therefore, spend a good deal of day one trying to avoid hand-writing a lexer. As usual, this is a waste of time.

The insurmountable problem is that Ply helpfully stops you from providing regexes that match an empty string, presumably assuming they would cause infinite loops. In addition to syntactic indentation, Comefrom0x10 uses syntactically significant blank lines and without either a facility for pushback or empty-matching regex, it is very hard to write a sensible grammar. But that’s another article.

So finally, I just decide to hand-write the lexer and begin to make progress near the end of day one.

Qwerty War: day 5

On day five, I plan to get the leaderboard persistence built and deployed: the only part of the project that requires a database back end. I figure that I’ll build this with Django and deploy on Heroku.

So I switch from JavaScript to Python and naturally this requires that I instate my traditional key customization that maps shift+space to underscore. Previously, I’ve used xmodmap, but the going wisdom around the Internet is that xkb is the newer, better way to do things.

… hours pass …

I can finally type underscores without over-extending my pinky, so I turn back to the problem of Django. Django normally uses a great many subdirectories to organize things and make them portable, which is great for medium to large applications, but annoying for something as small as Qwerty War. The article Django as a micro-framework is a good start to really stripping Django down, but it’s significantly out of date. To bring it up to date with Django latest, the main thing to realize is the django-admin command line should now read thus:

django-admin runserver --settings settings --pythonpath .

From there, you’ll find there are a series of errors that are all pretty self-explanatory and easily fixed. I skip most middleware, but keep the csrf and clickjacking parts.

Soon enough, Django is serving up my static files: I haven’t built any database stuff yet, but I figure I should try deplying the static bits on Heroku. First hurdle: Heroku wants me to install its command-line tool by piping stuff from wget into a shell. The script would install Heroku’s apt repository and go from there.

Isn’t the whole deployment done with git anyway? Why do I need the heroku command line interface? Much less another dpkg repo. It turns out this is easy enough, if undocumented

Add an ssh public key to your account via the Web gui
Use the Web gui to create a project
Push to the project using the funky-looking location git@heroku.com:[appname].git

Mercurial, of course, does not speak Git natively, so I decide to try to the hg-git extension. Its homepage says to install via the old easy_install. Not an auspicious start, but ok, I just try pip instead, and in Pythonic fashion, it just works.

So far, so good. The first push via hg-git succeeds, but naturally, the app fails to start on Heroku, as I haven’t yet created a Procfile or wsgi.py. I add them, push again, and… nothing. Mercurial reports there are no changes to push. Did my first push really work at all? I figure I’ll try cloning the repo from Heroku to check, using actual Git…. time passes (but much less than spent on xkb) …I figure the problem is that either hg-git or Heroku or both dislike that I pushed some rather large files. In particular, I included a full history of my edits to the binary blobs that are audio files, plus the originals in .wav and .flac format.

Fair enough, pushing all that stuff is pretty wasteful anyway. I abandon the hg-git extension and opt for a simple shell script that deploys via Git proper. It works, so on day five, I at least have a deployable application, if still very much in development mode and without a database.

A practical web of trust?

In order of difficulty, there are three basic parts to using gpg:

Generate your keys
Keep your keys secure
Decide who to trust

Most people will never even get to step one, much less the far more difficult steps that follow. I’ve written about this problem before [article no longer online]:

The web of trust is based on the idea that you can reduce a complex human dynamic, namely trust, into a mathematical system.

Even if you more or less understand the levels of trust gpg offers, it’s hard to be certain how much trust to assign any given key, since you can never know how the owner might use a key.

But what if the goal were simplified? Instead of something complex like “trust”, we might say you have only one decision to make about a key: is this the person I think it is?

Such a system might, I think, gain large-scale acceptance if correctly implemented. Your email address already is your identity, so use email for key signing. A large email provider, say Google, could generate a key for every address. So when Alice signs up for a Gmail account, Gmail transparently signs her outgoing mail with the key held in escrow.

This signing should be implemented such that mail agents can validate it if they know how, but those that don’t transparently ignore it. Perhaps it can be done with some form of multipart/alternative.

When using a client that is aware of the signature, the mail client prompts the user, unobtrusively. So, when Bob gets a message from Alice he sees something like this off to the side:

Are you sure this message is from Alice?

Bob can either click “yes” or ignore the prompt. If Bob clicks yes, he signs Alice’s key under the covers with his own key, also held in escrow.

Implemented in a large enough community, this system could rapidly build a network of signed keys. It’s not clear how we might use this, but if a large scale network of reliably signed keys existed, who knows?