A prefix notation programming language

Prefix notation?

Have you ever dreamed of a language which uses strictly prefix (a.k.a. polish, Łukasiewicz) notation?

No? Well, I can’t say I’m surprised. Lisp is often called a prefix notation language, but I’ll let you in on a secret, it’s not purely prefix notation. It uses another notation you’ve probably never heard of: outfix notation.

Outfix notation?

I’d say I made outfix notation up, but I found a reference to it on abstractmath.org, so I at least have something to back this claim up with. Basically, the parentheses are a function which says, “put these items into a list.”

Of course, Lisp uses lists for everything, so you can hardly call it a prefix notation language any more.

Real prefix notation

Now, how about making a real prefix notation language? A real prefix notation language needs no parentheses because it knows how many arguments each function takes, so it can simply pull in the next two expressions following the function name.

A real prefix notation language is a piece of cake to implement, as long as every function has a fixed arity and that arity is known at compile time. Of course, then how do we represent things such as lists with varying amounts of items. How do we pass a variable number of arguments to a function?

The same way Lisp does, we use a list.

Using lists without list syntax

Of course, we have to encode our list a bit differently. For example, take the following simple Lisp snippet:

(setq numbers ‘(1 2 3))

The set function takes a fixed number of arguments, so that’s not a problem. The list at the end, how do we do that? Well, if we assume we have standard lisp functions, we do this:

setq numbers cons 1 cons 2 cons 3 nil

Now that wouldn’t look half bad if it weren’t for the conses of doom at the end there. So what can we do? Well, Lisp has these handy things called macros. Compile-time functions, basically.

Compile time functions

Of course, since we don’t know ahead of time how many items are going to be in the list. Lisp hands macros a list of the arguments to the macro based on what was in the parentheses with the macro name, but once again, we don’t have (or really want, honestly. I’m trying to do something new here!) the parentheses, so this is where we must start deviating from Lisp a bit.

setq numbers list 1 2 3 .

What’s happening here, is that list is a compile-time function which simply requests items from the reader (the thing that takes characters and interprets them into symbols, numbers, and strings) until it gets back the ‘.’ symbol. If it’s anything else, it just has to tell the compiler to compile it as it would a normal expression. This means that we can call functions as we please. The difference between this an a “normal” outfix operator is that it is controlled by the code itself, rather than built into the compiler. The idea is that you could define your own compile-time functions to extend or change the “syntax” as you see fit.

For example, if you think the above list function is a bit too wordy:

def-compile ( :
  list-while /= setq token reader-next
                   ’)
    compile token

Assuming “list-while” is a function which will build a list with the value of each iteration, this code does the following:

  • Define a compile-time function bound to the symbol “(” which does the following:
    • Read the next token
    • If the token is not “)”, have the compiler compile the token as usual, returning the result
    • If the token is “)”, then we are done.

With this new compile time function, we can write nested lists as easily as we can in lisp:

setq nested-numbers ( 1 2 ( 3 4 5 ( 6 7 ) ( 8 ) ) )

Well, that’s almost as easy as lisp, there’s one problem: those spaces are mandatory. Since “(” and “)” are just plain old symbols, the whole thing would be read as one symbol if we took the spaces out. This can be solved by modifying the reader to recognize “(” and “)” as complete symbols.

Reprogramming the reader

Without going into too many specifics of how the reader is implemented, making it possible to leave out the spaces around parentheses would be something like the following:

push-reader (
  expr "(" ( add-char output-symbol
    push-reader (
      expr ")" ( add-char output-symbol pop-reader )
      symbol ")" ( output-symbol ) ) )

This makes the reader treat a single “(” as a symbol when it’s expecting an expression, and also pushes a new reader state including “)” onto the reader stack. This is popped off when a “)” is found. This means that “)” will stop being special once the one matching the first “(” is reached. This shows a bit of the power of the reader implementation, allowing the syntax to be extended for just a portion of the compilation.

For example, something like the following wouldn’t be too hard to implement using this system:

setq json-snippet json {
  "a": [1, 2, 3],
  "b": ["x", "y", "z"],
  "foo": "bar"
 }

It would simply have to modify the reader to recognize “{”, “}”, “[", "]“, “,”, and “:” as separate symbols with or without whitespace, and add functions for “{” and “[” which also take the commas and colons into account.

No longer lisp, but lisp-like

At this point what we’ve got probably can’t be called “a Lisp” anymore.

There are some things that this language will need that Lisp doesn’t. One thing is that it will need to keep track of functions as they are defined in the code, noting how many arguments they take so it can handle them being called later on. When a function is passed as an argument to another function, it will need to be called using something along the lines of funcall or be given a manual arity hint if you want to call it like other functions.

The “everything is a list” feature of Lisp is kind of lost, though the internal representation could end up being basically the same.

This language doesn’t exist … yet

This is something which is more concept in my brain and on paper at the moment. I’ve done some prototype implementations of the reader, some prototype implementations of the prefix notation function calling, and some prototype compile-time functions (implemented in the host language, however).

What’s left now is to put the whole thing together into a system in which every part is well defined and accessible from within the language itself.

I’m also thinking that I’d like to build this thing from the ground up with an underlying object-oriented system. Not class-oriented, mind you, just objects. Something prototype-based I’m thinking, like JavaScript, Io, Pepsi, and so on. In the end a user-mutable object-functional language is what I think I’ll end up with.

Any comments, suggestions, ridicules, etc… would be well appreciated. Also, if anyone has seen anything like this already in existence from which I could draw inspiration and ideas, please let me know. I’ve already looked at Cola/Coke/Pepsi and gotten some good ideas from there.

Tags: , , ,

13 Responses to “A prefix notation programming language”

  1. Brian Says:

    Generally it is called ‘circumfix’.

  2. John Wright Says:

    Have you checked out the Rebol language yet? Your language definitely seems to be in the same family. The fixed arity thing was first done in Logo, I think.

  3. Fabien Says:

    What I can’t find in your post is: what problem is there with outfix notation that would be fixed by Strict-Prefix-Fixed-Arity notation?

    I suspect that your issue is that Common Lisp, and to a slightly lesser extent Scheme, are exceedingly verbose; this is an issue specific to these languages, though, nothing intrinsic to S-exprs: look at modern Lisps such as Qi, Arc or Clojure (by far my favorite) for counter-examples.

    My biggest fear with SPFA notation is that it might significantly reduce readability, unless you enforce by convention regularity rules (which would mostly turn it back into an outfix notation). Notice that in dynamic languages, variable arity functions are used quite pervasively, and limiting them is a rather significant concession.

    However, the only right way to assess an idea is to try it :) You might want to have a look at Metalua, as a platform to rapidly get a prototype running on an efficient VM (http://metalua.luaforge.net)

  4. Mark Lee Smith Says:

    Check out the Rebol programming language. It’s intrinsically similar to the language you defined here, but without the forth-like compile-time functions. Perhaps importantly to you, it uses this “pure” prefix notation.

  5. sampo Says:

    Well, I don’t know if this is helpful but I’ll tell what kind of language I have been thinking about:

    1. Lispy & Smalltalkish

      • metaprogramming
      • reflection
      • “live” environment
      • everything is still an object (at least can be treated as one)
    2. MLish

      • optional static typing
      • type inference
      • first class functions
      • pattern matching

    Other important points: - parallelizable - good and easy to use FFI (for example C-interface) - good performance (very good at least with certain tricks)

    The syntactic approach you described sounds really good.. :-)

    How about implementing it in OCaml on LLVM (http://llvm.org/docs/tutorial/OCamlLangImpl1.html) I’d be very interested in contributing ;-))

  6. bob jones Says:

    So… You’re reinventing Forth? only… backwards. A queue-based language?

  7. pib Says:

    Fabien: There’s not anything wrong with outfix notation, and I’m not necessarily advocating pure fixed-arity prefix notation.

    My original idea was to make an esoteric language which was purely prefix notation, but in researching how to implement that, it occurred to me that something actually powerful and/or useful could be created with the simple parser and compile-time function system that I’d come up with.

    The prefix notation simplifies the parser to the point where it’s really easy to extend and add new syntax, including compile-time functions which have a variable arity. The difference is that it’s up to the compile-time function to decide what arguments it takes, if any. The built-in compiler code doesn’t have to know anything about arity or pulling arguments. Each function can do that on its own as well as temporarily modify the syntax for its own purposes.

    I’m already planning on list and dictionary/object literal “syntax” being a standard part of the language, or at least the standard library of the language, so it won’t be as unreadable as some may be imagining here.

  8. Sébastien Says:

    If you’re looking for an infrastructure to implement your language, you can have a look at LambdaFactory (git clone http://www.ivy.fr/lambdafactory/lambdafactory.git) it’s not publicly announced/released but it’s been in production use for 2 years.

    LF allows you to generate a program using a very simple API following the factory design pattern, this program can then be translated in Python, ActionScript and JavaScript. You basically just need your parser to invoke the factory to create the program (instead of creating an AST).

  9. pib Says:

    I actually looked at Rebol quite a while ago. At first it seemed like just what I was thinking, but then it occurred to me more and more that it was too much in the wrong direction for what I wanted to do.

    Rather than try to explain why I don’t like Rebol, just read this and assume that I pretty much agree with his arguments against even wanting to learn the whole language: http://arcanesentiment.blogspot.com/2008/08/why-i-havent-looked-at-rebol-much.html

  10. Konrad` Says:

    Hi,

    I don’t see why the commas nad colons should be necessary inside {} and [] leaving it at whitespace separation seems adequate. OK allowing colons for key value pairs might imporove readability a little.

    Also How would you handle conditionals and loops. I’ve had a play with it but couldn’t really come up with an intuitive way to express an if else if else structure. especially if each condition might have more then one statement inside it.

  11. pib Says:

    Konrad: The commas and colons are only there for the JSON example, since I was showing how you could support different formats.

    For the actual list and dictionary literals I was planning on leaving the commas and colons out.

    As for conditionals and loops, I was thinking something along the lines of how Ruby does blocks.

    As for if .. else if, that’s easy. You just make if always have an else clause, and then for the else clause you just put another if. That’s actually how else if works in languages like C. Perhaps define elseif to be an alias for if so it reads nicely.

    if something do
        do-some
        things here
      end
    elseif some other thing do
        do some
        more stuff
      end nil

    that nil at the end there is the else condition for the second if.

  12. Konrad Says:

    hmm… keywords instead of braces. I think I’d lean towards pythonic whitespace delimited block syntax. or just stick with the parentheses. adding explicit end keywords seems like 1 step forward and three steps back (at least to me anyway).

  13. pib Says:

    Konrad: Yeah, I guess I wasn’t clear enough. I’m not suggesting using keywords instead of braces, I was just emphasizing that it was a compile-time function implementing them rather than hard-coded syntax. Notice how later I went on to show how parentheses would be implemented…

    There will be braces in the language, for sure, but they will be more flexible than in your usual programming language.

    Python-type block syntax would also be possible. You’d just have to pull out the portion of the reader that skips whitespace and make it turn the whitespace into a symbol if it comes after a newline or something, then keep track of the current indentation.

Leave a Reply

What I'm Listening to

Loading...