Comefrom0x10 released

After a long hiatus while I built Text Collector, last week I finally returned to my paradigm shifting language, Comefrom0x10. It now has a home page on Read the Docs that features a tutorial, standard library documentation and more.

Except for a couple minor bugs, its implementation was actually functionally complete eight months ago. I hesitated to release it, however, because of rather embarrassing performance problems.

Now, it wouldn’t be fair to say that Cf0x10 is just slow. It’s catastrophically slow. The brainfuck.cf0x10 program takes 10 seconds to run helloworld.b on the laptop I’m using to write this, and gets dramatically worse as the program gets longer.

What went wrong?

It’s not a fundamental problem with the comefrom paradigm, but a consequence of the twisted way the language took on a life of its own during implementation. I started with the idea that I was building a rather ordinary stack-based interpreter, but Cf0x10 would have none of it. As it evolved, the original idea became a disfigured mutant: I can demonstrate with tests that it works, but it’s too convoluted to allow necessary optimizations.

Oh well, as they say, first make it right, then make it fast release it.

Comefrom0x10 patterns: Ayarbtu

Blocks in Comefrom0x10 look like functions in other languages and they create scopes like functions, but they are not functions – they do not take parameters or return values. The scope thing aside, block names are analogous to statement labels. Blocks let you use meaningful names instead of line numbers.

Since cf0x10 block names really are just labels that happen to denote scope, you are free to assign a variable of the same name. For example:

foo = 'bar'
foo
  comefrom if foo
  'I am now in the block called foo.'
  'The block should not be confused with the variable ' foo

This variable is often both the signal for jumping into a block and a value the block modifies, as if the block “foo” were returning by assigning to the variable “foo”.

I call this pattern, “All your assignment are belong to us,” or Ayarbtu for short. The standard library block “car” uses it, so you can get the first character of a string like so:

car = "Hello"
car # prints "H"

Here is how you can roll your own Ayarbtu block:

salute
  comefrom if salute
  salute = "Hello, " salute
  comefrom salute if salute

salute = "world"
salute # prints "Hello, world"

Ayarbtu is just a convention, but assignment as return value has precedent – in QBasic, for example – and cooperative assignment is easily done in, say, Java. You might know it as “aspect-oriented” programming.

Comefrom0x10: unary minus

In the original Comefrome0x10 grammar, I skipped unary minus. In retrospect, its presence has important consequences that I should have considered earlier, so unary minus made me reverse a couple decisions.

Line continuation

Comefrom0x10 uses newlines to delimit statements, which is fairly common in modern languages. What is not so common is that a standalone expression has a side effect: it writes to program output, as in PowerShell. This makes hello world succinct:

'hello world'

Prior to adding the unary minus operator, I allowed line continuation if an operator started a line, so these would be equivalent:

x = 2 + 3

x = 2
  + 3

But unary minus causes problems. These lines could be either “3 - 2” or “print 3; print -2“:

3
-2

I could continue to allow operators at the end of a line to indicate line continuation, but I’ve never liked end-of-line line markers for continuation as they are too hard to see. Comefrom0x10, therefore, now has no facility at all for line continuation. I will figure it out when I have a burning need.

Concatenation

Unary minus causes a second problem, when interacting with cf0x10’s concatenation behavior.

In cf0x10, space is the concatenation operator: “'foo' 'bar' is 'foobar' # true“. Originally, I allowed the concatenation space to be optional, so “name'@'domain” would be equivalent to “name '@' domain“.

Enter unary minus. In most languages, all three of these expressions mean “a minus b”, but in the original cf0x10 grammar, they could indicate either concatenation with negative b or subtraction:

a -b
a-b
a - b

Other languages do allow space concatenation, but normally only for literals. In Python, for example:

>>> 'a' 'b'
'ab'
>>> a, b = 'a', 'b'
>>> a b # ILLEGAL in Python, but LEGAL in cf0x10
SyntaxError: invalid syntax

In css, space is even an operator and “-” can appear at the beginning of an identifier. The css grammar, avoids ambiguity, however, by making “-” mathematical only when followed by a number, as in “-2px” versus “-moz-example“; “--foo” is illegal. Css gets away with this because it has no variables, making the operand’s type known during parsing.

Since the cf0x10 parser cannot know operand types, Comefrom0x10 has two options. It could decide whether to subtract or concatenate at runtime, or it could make the grammar a bit less forgiving of how you use spaces. Spacing is already significant, so, on balance, I think that making the spacing rules restrictive is the more clear:

a -b # concatenation
a-b # subtraction
a - b # subtraction
a- b # syntax error

The rules are that concatenation requires at least on space and unary minus cannot be followed by a space.

Comefrom0x10: day 5

A few days have passed building a brand new language. About a day and a half getting most of the grammar worked out and another day and a half building the guts of the interpreter. But now cf0x10 programs can basically run, albeit with a limited set of operators and without entirely correctly scope resolution.

Of course, I spent much of the time on the interpreter working out the ideal semantics for the comefrom statement, a much neglected field of computer science. The natural question is, “what happens when more than one statement comes from a single other statement?”

In Intercal, this is simply an error. Though that might seem like a cowardly evasion, it can simply be credited to Intercal’s being a lower-level language: the Intercal programmer needs to handle these situations explicitly by abstaining and so forth.

Comefrom0x10, on the other hand, is a high-level language designed to minimize runtime errors, so Comefrom0x10 has well-defined semantics for all variations of coming from a place.

First, eligible comefroms receive control in the order they appear in the source. Alone, however, this rule causes trouble breaking out of blocks. Consider this program that prints the Fibonacci sequence up to ten thousand:

fib
  b = 1
  comefrom fib
  a
  next_b = a + b
  a = b
  b = next_b

  comefrom fib if a > 10000

If the first comefrom took precedence, the loop would never end. Changing the first line to “comefrom fib if a < 10000” would fix the problem, but require reconfiguring how the variables a and b get swapped. I think the existing version is more clear: last-first wins allows a common idiom of placing your exit conditions at the end of a block.

The first exception to source order, therefore, is that when multiple comefroms are eligible within the same block, the last one wins.

There are other exceptions, but that’s another article.

Comefrom0x10: day 1

Today, I am finally creating a language.

Inspired by Come Here and Come From, Comefrom0x10, pronounced “Come from sixteen”, will be a modern language based entirely around the much-maligned “come from” control structure.

I am writing the first interpreter in Python. I was hoping to have a grammar worked out by the end of day one, but failed to make significant progress. Originally, I thought I would use a parsing expression grammar, but soon reverted to Backus-Naur. The trouble is that I want cf0x10 to include syntactic indentation (as it’s a modern, Pythonic language). This means that I need indent and dedent constructions, which parsing expression grammars can’t easily express.

So, I decide to switch to lalr. As a bonus, using lalr, will, of course, let people easily build million-line cf0x10 programs without worrying about slow compile times.

The Python parser generator landscape is surprisingly bleak. Many of the parser generators available are abandoned or half-assed. Ply seems to be the only real contender. There are two main problems with Ply: first, its mechanism of embedding grammar specification within docstrings makes the parser obnoxiously verbose; second, it doesn’t provide an obvious way to generate a parser that does not require installing Ply.

Ply has a very strong point though: it provides outstanding debug output for diagnosing problems with your grammar. I’ll just live with the verbosity; I might try PlyPlus if it gets too annoying. As far as removing the Ply dependency goes, I’ll wait until the Enterprise customers for cf0x10 want to fork out money to fund writing a build script so I can use the generated parse table without depending on Ply.

Ply’s scanner generator has the usual Lex push and pop state capabilities, plus it makes saving arbitrary lexer state easy. I, therefore, spend a good deal of day one trying to avoid hand-writing a lexer. As usual, this is a waste of time.

The insurmountable problem is that Ply helpfully stops you from providing regexes that match an empty string, presumably assuming they would cause infinite loops. In addition to syntactic indentation, Comefrom0x10 uses syntactically significant blank lines and without either a facility for pushback or empty-matching regex, it is very hard to write a sensible grammar. But that’s another article.

So finally, I just decide to hand-write the lexer and begin to make progress near the end of day one.