Comefrom0x10 released

After a long hiatus while I built Text Collector, last week I finally returned to my paradigm-shifting language, Comefrom0x10. It now has a home page on Read the Docs that features a tutorial, standard library documentation and more.

Except for a couple minor bugs, its implementation was actually functionally complete eight months ago. I hesitated to release it, however, because of rather embarrassing performance problems.

Now, it wouldn’t be fair to say that Cf0x10 is just slow. It’s catastrophically slow. The brainfuck.cf0x10 program takes 10 seconds to run helloworld.b on the laptop I’m using to write this, and gets dramatically worse as the program gets longer.

What went wrong?

It’s not a fundamental problem with the comefrom paradigm, but a consequence of the twisted way the language took on a life of its own during implementation. I started with the idea that I was building a rather ordinary stack-based interpreter, but Cf0x10 would have none of it. As it evolved, the original idea became a disfigured mutant: I can demonstrate with tests that it works, but it’s too convoluted to allow necessary optimizations.

Oh well, as they say: first make it right, then make it fast… er, release it.

Comefrom0x10 patterns: Ayarbtu

Blocks in Comefrom0x10 look like functions in other languages and they create scopes like functions, but they are not functions – they do not take parameters or return values. The scope thing aside, block names are analogous to statement labels. Blocks let you use meaningful names instead of line numbers.

Since cf0x10 block names really are just labels that happen to denote scope, you are free to assign a variable of the same name. For example:

foo = 'bar'
foo
  comefrom if foo
  'I am now in the block called foo.'
  'The block should not be confused with the variable ' foo

This variable is often both the signal for jumping into a block and a value the block modifies, as if the block “foo” were returning by assigning to the variable “foo”.

I call this pattern, “All your assignment are belong to us,” or Ayarbtu for short. The standard library block “car” uses it, so you can get the first character of a string like so:

car = "Hello"
car # prints "H"

Here is how you can roll your own Ayarbtu block:

salute
  comefrom if salute
  salute = "Hello, " salute
  comefrom salute if salute

salute = "world"
salute # prints "Hello, world"

Ayarbtu is just a convention, but assignment as return value has precedent – in QBasic, for example – and cooperative assignment is easily done in, say, Java. You might know it as “aspect-oriented” programming.

Comefrom0x10: unary minus

In the original Comefrom0x10 grammar, I skipped unary minus. In retrospect, its presence has important consequences that I should have considered earlier, so unary minus made me reverse a couple of decisions.

Line continuation

Comefrom0x10 uses newlines to delimit statements, which is fairly common in modern languages. What is not so common is that a standalone expression has a side effect: it writes to program output, as in PowerShell. This makes hello world succinct:

'hello world'

Prior to adding the unary minus operator, I allowed line continuation if an operator started a line, so these would be equivalent:

x = 2 + 3

x = 2
  + 3

But unary minus causes problems. These lines could be either “3 - 2” or “print 3; print -2”:

3
-2

I could continue to allow operators at the end of a line to indicate line continuation, but I’ve never liked end-of-line continuation markers; they are too hard to see. Comefrom0x10, therefore, now has no facility at all for line continuation. I will figure it out when I have a burning need.

Concatenation

Unary minus causes a second problem when it interacts with cf0x10’s concatenation behavior.

In cf0x10, space is the concatenation operator: “'foo' 'bar' is 'foobar' # true”. Originally, I allowed the concatenation space to be optional, so “name'@'domain” would be equivalent to “name '@' domain”.

Enter unary minus. In most languages, all three of these expressions mean “a minus b”, but in the original cf0x10 grammar, they could indicate either concatenation with negative b or subtraction:

a -b
a-b
a - b

Other languages do allow space concatenation, but normally only for literals. In Python, for example:

>>> 'a' 'b'
'ab'
>>> a, b = 'a', 'b'
>>> a b # ILLEGAL in Python, but LEGAL in cf0x10
SyntaxError: invalid syntax

In CSS, space is even an operator and “-” can appear at the beginning of an identifier. The CSS grammar avoids ambiguity, however, by making “-” mathematical only when followed by a number, as in “-2px” versus “-moz-example”; “--foo” is illegal. CSS gets away with this because it has no variables, making the operand’s type known during parsing.

Since the cf0x10 parser cannot know operand types, Comefrom0x10 has two options. It could decide whether to subtract or concatenate at runtime, or it could make the grammar a bit less forgiving of how you use spaces. Spacing is already significant, so, on balance, I think that making the spacing rules restrictive is the clearer choice:

a -b # concatenation
a-b # subtraction
a - b # subtraction
a- b # syntax error

The rules are that concatenation requires at least one space and unary minus cannot be followed by a space.

Comefrom0x10: day 5

A few days have passed building a brand-new language: about a day and a half getting most of the grammar worked out and another day and a half building the guts of the interpreter. But now cf0x10 programs can basically run, albeit with a limited set of operators and without entirely correct scope resolution.

Of course, I spent much of the time on the interpreter working out the ideal semantics for the comefrom statement, a much neglected field of computer science. The natural question is, “what happens when more than one statement comes from a single other statement?”

In Intercal, this is simply an error. Though that might seem like a cowardly evasion, it can simply be credited to Intercal’s being a lower-level language: the Intercal programmer needs to handle these situations explicitly by abstaining and so forth.

Comefrom0x10, on the other hand, is a high-level language designed to minimize runtime errors, so Comefrom0x10 has well-defined semantics for all variations of coming from a place.

First, eligible comefroms receive control in the order they appear in the source. Alone, however, this rule causes trouble breaking out of blocks. Consider this program that prints the Fibonacci sequence up to ten thousand:

fib
  b = 1
  comefrom fib
  a
  next_b = a + b
  a = b
  b = next_b

  comefrom fib if a > 10000

If the first comefrom took precedence, the loop would never end. Changing the first line to “comefrom fib if a < 10000” would fix the problem, but would require reconfiguring how the variables a and b get swapped. I think the existing version is clearer: letting the last comefrom win allows the common idiom of placing your exit conditions at the end of a block.

The first exception to source order, therefore, is that when multiple comefroms are eligible within the same block, the last one wins.

There are other exceptions, but that’s another article.

Comefrom0x10: day 1

Today, I am finally creating a language.

Inspired by Come Here and Come From, Comefrom0x10, pronounced “Come from sixteen”, will be a modern language based entirely around the much-maligned “come from” control structure.

I am writing the first interpreter in Python. I was hoping to have a grammar worked out by the end of day one, but failed to make significant progress. Originally, I thought I would use a parsing expression grammar, but soon reverted to Backus-Naur. The trouble is that I want cf0x10 to include syntactic indentation (as it’s a modern, Pythonic language). This means that I need indent and dedent constructions, which parsing expression grammars can’t easily express.

So, I decide to switch to LALR. As a bonus, using LALR will, of course, let people easily build million-line cf0x10 programs without worrying about slow compile times.

The Python parser generator landscape is surprisingly bleak. Many of the parser generators available are abandoned or half-assed. Ply seems to be the only real contender. There are two main problems with Ply: first, its mechanism of embedding grammar specification within docstrings makes the parser obnoxiously verbose; second, it doesn’t provide an obvious way to generate a parser that does not require installing Ply.

Ply has a very strong point though: it provides outstanding debug output for diagnosing problems with your grammar. I’ll just live with the verbosity; I might try PlyPlus if it gets too annoying. As far as removing the Ply dependency goes, I’ll wait until the Enterprise customers for cf0x10 want to fork out money to fund writing a build script so I can use the generated parse table without depending on Ply.

Ply’s scanner generator has the usual Lex push and pop state capabilities, plus it makes saving arbitrary lexer state easy. I, therefore, spend a good deal of day one trying to avoid hand-writing a lexer. As usual, this is a waste of time.

The insurmountable problem is that Ply helpfully stops you from providing regexes that match an empty string, presumably assuming they would cause infinite loops. In addition to syntactic indentation, Comefrom0x10 uses syntactically significant blank lines, and without either a facility for pushback or an empty-matching regex, it is very hard to write a sensible grammar. But that’s another article.

So finally, I just decide to hand-write the lexer and begin to make progress near the end of day one.
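
For a taste of what the hand-written approach involves, here is a rough sketch, in Python, of the indent and dedent bookkeeping such a lexer does. It is not the actual cf0x10 lexer, and the token names are invented for the illustration.

# Python
def indentation_tokens(lines):
    levels = [0]                        # stack of currently open indent widths
    for line in lines:
        if not line.strip():
            yield ('BLANK', '')         # blank lines stay visible to the parser
            continue
        width = len(line) - len(line.lstrip(' '))
        if width > levels[-1]:
            levels.append(width)
            yield ('INDENT', width)
        while width < levels[-1]:
            levels.pop()
            yield ('DEDENT', width)
        yield ('LINE', line.strip())
    while len(levels) > 1:              # close blocks still open at end of input
        levels.pop()
        yield ('DEDENT', 0)

for token in indentation_tokens(['fib', '  b = 1', '', '  comefrom fib']):
    print(token)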

When All You Have Is a Class

I care little for static typing, but when a language gives you static types, use the type system.

Say we have a domain class like

public class User {
  public User(String firstName,
              String lastName,
              String email,
              long balanceInDollars) {
    ...
  }
}

I’m in agreement with Stephan that using String and primitive types here is wimping out. That is weak typing. Weak! Chuck Norris does not approve!

The post goes on to describe a more typesafe and relatively elegant implementation. The type system, though obnoxiously verbose, is not Java’s real problem. The actual deficit that makes Java less expressive than many other languages is that it supports only one modeling paradigm, the ubiquitous class.

Take an interview question as an example:

Given an input string, write a program that lists the top n words. So for the sentence, “Do or do not,” the top 2 tokens would be:

do = 2
not = 1

Sort first by frequency, then lexically by token.

A reasonable solution takes four conceptual steps:

  1. Tokenize
  2. Count and map tokens to frequency
  3. Sort
  4. Display the first n

In Java, sorting causes the most code. The most obvious way to maintain type safety demands an entirely new class: it contains two properties, token and frequency, and it implements Comparable, equals and hashCode. The implementation then sticks instances of this class into a sorted data structure.

An alternative implementation could eschew the new class, instead jamming data into two-element Object arrays and sorting them with a Comparator. This would likely be shorter but would lose type checking along the way.

The first approach feels unnatural because the data and the sorting behavior have no intuitive connection. Word frequency means nothing outside the sentence, while sort order means nothing outside the problem. The two should live apart, but the language foists that artificial marriage on the program. To make the error in this design clearer, suppose the problem specified that sort order should be interchangeable: for example, sort same-frequency tokens by length instead.

The second approach, though more conceptually pure, loses type safety. Using a value class containing token and frequency improves it, but once again that class lacks conceptual justification. The token itself is unrelated to its frequency outside the sentence.

Like many problems, this one involves only scraps of data and a way to sort them, yet both Java programs require class implementations for a problem with no natural need to model anything as a class.
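
For contrast, here is a sketch of the same four steps in Python, offered purely as an illustration of how little machinery the problem demands when tuples and a built-in sort are available; the function name is my own invention.

# Python
from collections import Counter

def top_words(text, n):
    counts = Counter(text.lower().split())             # tokenize and count
    ranked = sorted(counts.items(),
                    key=lambda kv: (-kv[1], kv[0]))     # by frequency, then token
    return ranked[:n]

for token, frequency in top_words('Do or do not', 2):
    print(token, '=', frequency)                        # do = 2, then not = 1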

Not only does Java make you think in classes even when inappropriate, it simply refuses to acknowledge that any other problem model exists. You want collection literals to group data? Forget type safety. First class functions? Better define an interface. Mixins? Sorry, not even multiple inheritance. Declarative domain specific languages? Good luck. When all you have is a class, everything looks like an object.

Groovy, The Gateway Drug

Once upon a time, the only languages I trusted used static typing. This now seems silly, but it is a fairly common attitude, and for good reason. To a developer trained in C++ and Java formality, using Ruby or Python feels like anarchy. And there is that pejorative term “scripting language.”

At my first job after college I built Java Enterprise Web Services on the gruesome Servicemix platform. My task involved deploying the xml beasts and synchronizing our Cvs repository with the Devil’s own source control. Just maybe, this was a job for a scripting language. Just a taste couldn’t hurt…

So I found Groovy and got myself a copy of Groovy In Action.

At first, my Groovy scripts looked like Java, semicolons and all, since almost any Java program is also valid Groovy. An established codebase full of closures and duck types would have been alien, but to the programmer tiptoeing at the edge of dynamic languages, Groovy felt very safe. In time, the same comforting safety made Groovy feel like lugging Java baggage, but I highly recommend it for the programmer monogamous with Java. Just start renaming your Java files with Groovy extensions.

Next, drop a semicolon here and there and those ridiculous “throws Exception” clauses. Then, discover the “each” method; it seems like syntactic sugar, but explore. It involves some wacky thing called a closure. Write a function that takes a closure argument.

Every once in a while, try writing “def foo” instead of “String foo.” Surely dangerous, but promise yourself it will just be this one more time. The examples all do it; how bad could it be?

Ask yourself again, “What’s so great about closures?” Google says they sprout like weeds among the Ruby community. That’s still not a real language, but maybe Groovy is; it works with Java, after all.

Soon enough, you will realize Java feels stifling. “Dicing all this xml sure would be easier in Groovy,” you will think. You will notice and understand dynamic language zealots. The cool kids mostly run Ruby and Python. Java virtual machine startup delay is a drag for scripts, so try one of those.

No variable declarations at all? No damn curly brackets? You will be hooked.

Object Disorientation

Ask not what you can do to your objects, but what your objects can do for you.

The object-oriented primer tells us that objects are collections of data and behaviors. Sadly, modern Java de-emphasizes the behaviors, telling us we just need a bunch of beans wired to a few business functions and… tada… object-oriented magic.

This sort of programming tends to not be object-oriented at all. It generally leads to reams of what Martin Fowler calls “getter confetti,” plus a few thousand-line methods that do all the real work in about as procedural a way as you can make an object-oriented language.

I nominate this pattern for the name “object-disorientation,” or “object-disoriented.” Yes, someone already called it the blob, but the blob differs in its cause. The blob results from “sloth” and “haste.” Object disorientation results from careful, intentional misapplication of object-oriented principles. Someone says to encapsulate data, and someone carefully encapsulates fields, missing the more subtle encapsulation and allocation of behavior.

I suspect it happens because people think of programs as recipes for solving problems rather than reflections of the problems themselves. We tend to ask the steps from here to there rather than heeding Hannibal Lecter and asking of each particular thing, “what is it in itself… what [is] its causal nature… and what is it doing in the world?”

Reluctantly, I admit that I have sometimes created getters, particularly in Groovy, which supports them syntactically, but only immutable properties need apply. In all cases, the class in question must mainly exist to do other useful things.

Instead of beans, if you want a collection of data, use a Collection. Pure mindless beans are not object-oriented; they existed in C too but we called them “structs.”

Nothing Exceeds Like Exceptions

As a bright-eyed computer science student, I fell in love with exceptions. Of course, growing up in C++ land, no one explained the idea of an exception type hierarchy, so when I threw them, they were usually ints. But even at that naive time, something in me recognized the value of losing all those “if errorlevel” statements. Something else recognized the tingle of a problem unanswered.

After uprooting my ratty recliner and other worldly belongings for a move to Java land, I discovered the magic of checked exceptions. Wow. The compiler tells me when I should handle an error? Brilliant. Fantastic. That must have been the itch; I could never really specify what I might throw. Then I started writing Java.

// Java
try {
    InputStream captain = new FileInputStream("kirk.txt");
} catch (FileNotFoundException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

Well, that was obnoxious.

# Python
science = open('spock.txt')

Alright, Python is arguably a scripting language, ill-equipped to handle the rigors of large pieces of software, but that is not the point. Here, Python and Java do almost the same thing, but Python does it with only one line. The Java snippet silently suppresses an error if you happen not to be constantly watching your output stream. So do your todo and re-throw FileNotFoundException as a RuntimeException. Make your code at least as fail-fast as if it were Python.

The point is also not that Python beats Java. Although strong typing may be fascism, the real question is whether the checked exception added any value.

What could I do with that exception? If I were writing a “good-enough” utility, I would want to crash swiftly and furiously. If I were writing end-user worthy code, I would have checked for the file’s existence before trying to open it. In that case, the odds of someone removing the file between my existence check and open attempt would be low enough that the method could reasonably return null, simplify my code, and disintegrate later when I actually tried to read something. My file reading code would presumably be at least as careful as my file opening code, so it could own the error handling for that situation.
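
In rough Python terms, the approach I am describing looks something like this sketch; the function name is invented for the illustration.

# Python
import os

def open_if_present(path):
    if not os.path.exists(path):
        return None        # a caller that ignores the None blows up later, on read
    return open(path)      # the small race window remains, as noted above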

So maybe the itch was actually the feeling that excellent code is almost entirely error handling. “Have your functions return error values when things go wrong, and deal with these explicitly, no matter how verbose it might be,” said Joel Spolsky, about the time I was entering my first class on object orientation. His points are true, but he proposes no solution to the need for a heart-wrenchingly ugly fail-fast mechanism in a language. For language designers, I propose this half-baked idea:

Make exceptions uncatchable.

First, this implies that you no longer need the exception class hierarchy because any throw becomes an exit with an error code. So how is this better than System.exit(1)? It gives you a stack trace. Replace all throws with asserts; asserts that really work, that is.

Second, libraries would have to use assert carefully, but when they did, they could use it as a real teaching experience. You might think the file opening method that crashed your entire process ludicrous, but you might learn to check for the file’s existence before opening it. The generated empty catch block has probably never taught anyone this lesson.

On the down side, you now have no way to record these catastrophic errors. Even if the virtual machine printed the stack trace to standard error with a dying breath, you might have forwarded that to /dev/null. Allowing a shutdown hook to intercept the error might help, but you know that would tempt you to do too much with a program in an unknown state.

Oh well. It’s still a good idea because I just thought of it.