Comefrom0x10: day 1

Today, I am finally creating a language.

Inspired by Come Here and Come From, Comefrom0x10, pronounced “Come from sixteen”, will be a modern language based entirely around the much-maligned “come from” control structure.

I am writing the first interpreter in Python. I was hoping to have a grammar worked out by the end of day one, but failed to make significant progress. Originally, I thought I would use a parsing expression grammar, but soon reverted to Backus-Naur. The trouble is that I want cf0x10 to include syntactic indentation (as it’s a modern, Pythonic language). This means that I need indent and dedent constructions, which parsing expression grammars can’t easily express.

So, I decide to switch to lalr. As a bonus, using lalr, will, of course, let people easily build million-line cf0x10 programs without worrying about slow compile times.

The Python parser generator landscape is surprisingly bleak. Many of the parser generators available are abandoned or half-assed. Ply seems to be the only real contender. There are two main problems with Ply: first, its mechanism of embedding grammar specification within docstrings makes the parser obnoxiously verbose; second, it doesn’t provide an obvious way to generate a parser that does not require installing Ply.

Ply has a very strong point though: it provides outstanding debug output for diagnosing problems with your grammar. I’ll just live with the verbosity; I might try PlyPlus if it gets too annoying. As far as removing the Ply dependency goes, I’ll wait until the Enterprise customers for cf0x10 want to fork out money to fund writing a build script so I can use the generated parse table without depending on Ply.

Ply’s scanner generator has the usual Lex push and pop state capabilities, plus it makes saving arbitrary lexer state easy. I, therefore, spend a good deal of day one trying to avoid hand-writing a lexer. As usual, this is a waste of time.

The insurmountable problem is that Ply helpfully stops you from providing regexes that match an empty string, presumably assuming they would cause infinite loops. In addition to syntactic indentation, Comefrom0x10 uses syntactically significant blank lines and without either a facility for pushback or empty-matching regex, it is very hard to write a sensible grammar. But that’s another article.

So finally, I just decide to hand-write the lexer and begin to make progress near the end of day one.

Leave a comment