Posts Tagged ‘Programming’

A prefix notation programming language

Sunday, November 16th, 2008

Prefix notation?

Have you ever dreamed of a language which uses strictly prefix (a.k.a. Polish, Łukasiewicz) notation?

No? Well, I can’t say I’m surprised. Lisp is often called a prefix notation language, but I’ll let you in on a secret: it’s not purely prefix notation. It uses another notation you’ve probably never heard of: outfix notation.

Outfix notation?

I’d say I made outfix notation up, but I found a reference to it on abstractmath.org, so I at least have something to back this claim up with. Basically, the parentheses are a function which says, “put these items into a list.”

Of course, Lisp uses lists for everything, so you can hardly call it a prefix notation language any more.

Real prefix notation

Now, how about making a real prefix notation language? A real prefix notation language needs no parentheses because it knows how many arguments each function takes, so it can simply pull in the right number of expressions following the function name.

A real prefix notation language is a piece of cake to implement, as long as every function has a fixed arity and that arity is known at compile time. Of course, how then do we represent things such as lists with varying numbers of items? How do we pass a variable number of arguments to a function?

The same way Lisp does: we use a list.
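To make the fixed-arity idea concrete, here is a minimal Python sketch of a parenthesis-free prefix evaluator. The operator table and the `evaluate` function are made up for this illustration and are not taken from any actual implementation; the sketch only covers functions of known arity and says nothing about how lists would be handled.

```python
import operator

# Hypothetical operator table: name -> (arity, implementation).
OPERATORS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "neg": (1, operator.neg),
}


def evaluate(source):
    """Evaluate a whitespace-separated prefix expression with no parentheses."""
    stream = iter(source.split())

    def next_expression():
        token = next(stream)
        if token in OPERATORS:
            arity, func = OPERATORS[token]
            # Because the arity is known up front, we know exactly how many
            # sub-expressions to pull in before applying the function.
            args = [next_expression() for _ in range(arity)]
            return func(*args)
        return int(token)  # anything that is not an operator is an integer

    result = next_expression()
    leftover = next(stream, None)
    if leftover is not None:
        raise ValueError(f"unexpected trailing token: {leftover!r}")
    return result


print(evaluate("+ 1 * 2 3"))       # 1 + (2 * 3)
print(evaluate("* neg 4 - 10 7"))  # (-4) * (10 - 7)
```

Note that no grouping symbols are needed anywhere: the arity table alone decides where each sub-expression ends.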

(more…)

Quick bash one-liner to find a rogue newline

Wednesday, July 16th, 2008

It’s been far too long since I’ve posted, so I’m writing a short post about a quick one-liner I just used to solve a problem.

The problem was a rogue newline appearing at the beginning of some generated XML files, which is against the rules for XML.

This problem, and a similar one involving data being sent before headers can be sent, often happens in PHP when an extra newline is included after the closing “?>”. One way to fix it is to just leave off the closing bit, since PHP is smart enough to realize the file has ended in PHP mode.

Anyway, we had to track down which file had this problem in it, and the solution ended up being this:

for i in $(find . -name '*.php'); do echo "$i:$(tail -n 1 "$i")" | grep -v '?>'; done

That finds each php file and checks its last line for “?>”, printing it out if it’s not there.

Of course, there will be some false positives for PHP files which have HTML after their PHP code or don’t have the closing “?>”, but it’s good enough to track down those potentially offending files.
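For comparison, roughly the same check can be sketched in Python. The function name `files_not_ending_with_close_tag` is made up for this example, and like the one-liner it only looks at the last line of each file, so it has the same false positives.

```python
from pathlib import Path


def files_not_ending_with_close_tag(root="."):
    """Return the .php files under root whose last line does not contain '?>'."""
    suspects = []
    for path in sorted(Path(root).rglob("*.php")):
        lines = path.read_text(errors="replace").splitlines()
        # An extra newline after the closing tag shows up as an empty last line.
        last_line = lines[-1] if lines else ""
        if "?>" not in last_line:
            suspects.append(path)
    return suspects
```

Calling `files_not_ending_with_close_tag(".")` from the project directory returns the list of files worth opening by hand.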

In Search of Perfect Software

Friday, April 11th, 2008

The Problem

There are many pieces of software in the world, but very few of them are anywhere near what one would consider “perfect.”

I don’t blame anyone in particular for this. Writing perfect software is quite difficult if the software is to accomplish any sort of non-trivial task. Some come close to the lofty goal of perfection.

Even still, there are certain areas where the mark has always been missed, if only enough to be slightly annoying. I’m being purposely vague here because there are lots of different things wrong with lots of different pieces of software. I’ll get more specific later.

Probably the biggest contributor to this problem is the fact that nobody really knows what perfect software is. I usually know what annoys me about a particular piece of software, but I’m not always sure what would be a better solution.

My solution

There are some types of software which have gotten closer than others to perfection. Web browsers, for example. I’ll readily admit that there are things which annoy me about Firefox, but on the whole it’s a good piece of software, and improving all the time. I have some ideas of my own about the way a browser should do certain things, and I’m not about to go hacking about in the monstrous labyrinth that is the Mozilla source code, so I’ve started my own browser project. It will probably never catch up with the other, “real” browsers, but it will keep me entertained and provide me a way to prototype various ideas I have for user interfaces.

There are also some good programming languages out there. Languages like Python, Erlang, and many others make programming “fun again.” Still, for every language there’s something that’s missing, or something that might be cool to be able to do, some area which has been left mostly unexplored. Once again, I’ve started some prototyping so I can play around with some new ideas and push some boundaries which I think should be pushed (or at least ones which look as though they may be fun to push).

So what’s the way to write perfect software?

(more…)

On productivity, flow, and the “big block”

Tuesday, January 15th, 2008

The “Big Block” Method

Joshua Clanton posted a guest post on Jarkko Laine’s blog yesterday, talking about how to get into a state of “flow” to increase your productivity.

I commented that I seem to be able to get into a state of flow better when I have a large chunk of time allocated to work on a task.

To me there’s nothing worse than coming back to something I had to stop in the middle of. It takes far too much time to get back up to speed on what I was doing, and often I’ll have forgotten about something essential, which results in bugs.

Joshua pointed out in another blog post that it helps flow to have not only big blocks of time but also to have your tasks divided into big blocks as well.

I definitely agree with Joshua on the “big block method,” though this is the first time I’ve put much conscious thought into it. Looking back, it seems that I’ve done better work when I’ve used this method, and I’ve always hated working in short bursts, since it feels like I’m wasting too much time in “context switching.”

How big of a block?

There’s one question I’m not really sure of the answer to yet: How big should a big block be?

The first answer that comes to mind is “Just big enough, but no bigger” (a modified Einstein quote).

So far my technique has been to just pick what seems to be a single functional unit, or perhaps a piece of functionality that all belongs together.

I suppose that different people will have different definitions of what a big block task is, but to me it works out that it’s something that is a complete unit. A complete unit can be a full module, or one step in the implementation of a full module. The key is, though, that there is a clear stopping point and something isn’t left unfinished.

For example, working on my browser, I have index cards (old business cards, actually) which each have an individual task on them. I sort these cards by the order I plan to do them in. These cards contain something like “Implement DOM 2 HTML” or “Rendering Engine support for inline elements.”

The former is a complete module which could be (and was) finished in one big block section of time.

The latter won’t result in the final version of the layout engine, but it will get it to a point where it is functioning, and then the next step would be to support block elements. Absolutely and relatively positioned elements would be yet another separate task.

I’d be very interested to hear more ideas and thoughts about the big block method. How about adding a comment or writing a blog post with a trackback?

Python Web Browser on the way

Sunday, December 30th, 2007

I’ve spent a good chunk of my vacation working on some of what will become the internals of a web browser written in Python.

Some of the goals of the browser include:

  • Full conformance to all DOM 2 Modules (and equivalent DOM 3 modules when they become recommendations). This goal is already about 60% done.
  • Full conformance to CSS2.1, and eventually CSS3.
  • JavaScript support.
  • SVG and Canvas support.
  • Little or no explicit support for deprecated standards and technologies (yes, this is a feature).

The most important features, and thus the ones getting the most attention, will be standards compliance and JavaScript support. Standards compliance is important because I want this browser to be an example of a browser which people should take seriously. I won’t, however, do extra work because somebody out there decided to not follow the standard when designing their webpage. JavaScript support is important for the same reason. Nobody is going to take a browser seriously (or be able to use it for any modern website) if it doesn’t support JavaScript.

So far I’ve got a complete (but also completely untested) implementation of DOM 2 HTML, which took me a good deal longer than expected.

I started with a good base: pxdom, a complete implementation of DOM 3 Core and LS (Load and Save), and implemented my additions on top of that. My code is still a separate module, but there are a few places where I rely on some of the implementation-specific details of pxdom. I have plans to remove the dependency at some point so that I can swap in other DOM core implementations.

On top of that I built my DOM HTML implementation, and laid a little bit of groundwork for DOM StyleSheets and CSS. I’ll be using cssutils, which is a mostly complete Python DOM CSS implementation. The cssutils version 0.9.4b1 was just released with mention of some sort of selector support being added in version 0.9.5, which will hopefully make it so I don’t have to do a full CSS cascade implementation myself.

I’m taking a break from working on DOM implementations now and moving back to something which will actually allow me to see results: the rendering engine. I’m starting with just a stub implementation of the ViewCSS interface which allows me to use the getComputedStyle function to get the default style for any given element. With that, I should be able to render any HTML document as if it had no style applied to it. Later on I will hopefully be able to use the upcoming selector support in cssutils to make getComputedStyle work as expected.

This browser is something that I’ve been wanting to do for a very long time. I even sort of started to implement the rendering engine a long time ago using pycairo as the backend. I’m going to stick with that because it seems to be an ideal rendering backend for webpages (which would explain why it will be the only thing used for rendering as of Firefox 3.0 :) ), and for eventual SVG and Canvas support. Once I get to the part where I’m actually working on the user interface portion, I’m planning on writing a Python binding for glitz, which will provide the browser with OpenGL accelerated rendering by default.

Doing it Right from the Beginning

Monday, December 17th, 2007

My coworker Scott Martin recently posted a list of things to do from the beginning of a project.

I had planned to post something along those same lines myself, but I guess he beat me to it.

I’ll throw in my two cents with a much shorter and vaguer list, with my comments and a few extra items as well. This will make a lot more sense if you read the list at the above link first.

  • Internationalization + Smarty

    These two items go together. Basically, there shouldn’t be any text which makes its way to the users which comes from anything but a template file. Seriously, don’t put text in your code.

  • Code standards

    Here’s a secret: what your standards are really doesn’t matter. It’s the fact that you have standards that matters. It’s really annoying to read code with two different layouts. Pick one and stick with it.

  • Code generation

    I disagree with Scott about code generation being bad. The one thing that needs to be done for it to be okay, however, is for it to be generated, and updated, automatically. The first code you should write isn’t the code to generate the code, but the code to call the code to generate the code when the source of the data used to generate the code is changed.

    For example, if you are reading in one file to generate another, then before you read in the file, run some code which checks the last modified time of the generated file and compares that with the last modified time of the source file. If the source file is newer, regenerate.

    If you feel so inclined, also check the timestamp on the code which generates the file, and if it is newer than the generated file, regenerate also. This will get rid of the instance where you change the generator code but don’t change the input file.

    Alternatively, if you have some sort of interface or tool which generates the input file, have that tool run the script to update the generated file.

  • URL rewriting

    As we all know, Cool URIs don’t change. So make some sensible decisions at the start of your site design, and stick with the URLs from there on out. All you need is a single script that every URL is routed to (except those for images, CSS, and the like), which then delegates the work out to some actual PHP files. This is also helpful in that you don’t have to keep all of your PHP files inside your document root, just the one that does the routing, which brings me to my first new rule…

  • Keep non-documents out of your document root.

    The fewer things you actually have in your document root, the fewer unexpected security holes there will be. If a file is not going to be directly accessed from a web browser, don’t put it where a web browser can get to it. All of your configuration files and almost all of your code should be kept out of your document root.

    A structure like the following works well:

    • application root
      • htdocs - static HTML files, images, CSS files, a single code file to load in other files based on the requested URL
      • src - all of your actual code. Set your include_path to here and keep everything neatly organized into subdirectories
      • templates - Smarty, or whatever templates you are using.
      • cache - whatever sort of disk cache you need. This includes Smarty cache, RSS feed cache, etc…
  • Don’t Repeat Yourself (DRY)

    This is basic stuff right here, but it can never be said enough: Don’t write code that does the same thing as code you’ve already written. Or if you have to do so, then combine the two pieces of code together. It is a huge pain when you have to figure out why something is working the old way when you already changed the code that does that thing to do it a new way.

    If you are copying and pasting code, then something is probably wrong.

  • Don’t Repeat Other People/Don’t Reinvent the Wheel

    This one is a bit different. What I mean by this is don’t write any code that you don’t have to. If somebody else has a library that already does what you want, use that, unless there is a very, very compelling reason for you not to. If a library has licensing issues, or is really slow, or doesn’t fit into your platform somehow, it might make sense to write it yourself, but first check to see if there’s already something else which meets your needs.

    Spending an hour searching for something is usually going to take less time than writing it yourself.
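Going back to the code generation item above, the timestamp check described there could look something like the following. This is only a sketch, written in Python rather than PHP, and the function names `needs_regeneration` and `ensure_fresh` are invented for the example.

```python
import os


def needs_regeneration(generated_path, *dependency_paths):
    """Return True if the generated file is missing or older than any dependency."""
    if not os.path.exists(generated_path):
        return True
    generated_mtime = os.path.getmtime(generated_path)
    return any(os.path.getmtime(dep) > generated_mtime for dep in dependency_paths)


def ensure_fresh(generated_path, source_path, generator_path, regenerate):
    """Call regenerate() only when the source file or the generator is newer.

    Checking the generator script as well covers the case where the
    generator code changed but the input file did not.
    """
    if needs_regeneration(generated_path, source_path, generator_path):
        regenerate()
        return True
    return False
```

The idea is to call `ensure_fresh` right before the generated file is read, so nobody has to remember to run the generator by hand.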

What guidelines do you follow from the start with every project that you do?

BooTer Reduced

Friday, November 30th, 2007

Edit: Apparently the way to prove a new esoteric language is Turing-complete is to implement a BF interpreter. This technique has been used to show that LOLCODE is Turing-complete, so I suppose I could target that as the first non-trivial program to write as soon as I get a working BooTer interpreter :D Or, I suppose I could implement a LOLCODE interpreter in BooTer in which I could run a BF interpreter…

Thinking about BooTer, I decided that I wasn’t making it esoteric enough, so I’ve slimmed down my specification for simpler implementation, and much more difficulty doing anything useful with it :)

The only thing allowed is simple expressions and boolean ternary expressions. No comma-separated lists of expressions, no assigning expressions to variables, no symbols, no quoting of expressions.

Maybe in the future I’ll re-expand it out into a more “real” language, but for now I just want to do something that I can do simply and easily without accidentally building an entire broken LISP.

So, with all that stuff gone, what’s left? What’s different now?

  • The boolean-ternary operator is in an order which won’t drive me insane to write programs in. The boolean expression comes first. If true, the expression evaluates to the second expression; if not, it evaluates to the third. This now works basically just like the ternary operator in most other languages…but with a different syntax and the possibility of using the re-eval operator.

  • A BooTer program is now a single boolean-ternary (booter) expression, with optional nested booter expressions.

  • The first portion of a booter expression (the “boo” part) must evaluate to either true or false. False is represented by the integer 0, or by null. Anything else is true.

  • null is represented by the lack of a value. This is useful for instances where the left or right portion of an expression will never be used, for example:

( *…do something here…* : x = 5 : )
  • There are only numbers and strings as primitive types.

  • Variable names can start with upper or lowercase letters.

  • There’s only a single array type. Array items are assigned and accessed using the standard [] subscript operator. Creation of an array is implicit with the assignment of the first item to that array. If you use the subscript operator to make an assignment to a previously existing variable, the old variable is overwritten with the new array. The previous array literal syntax still applies. Only primitive values and variable names can be used in the array literal syntax.

  • Math and boolean order of operations are the same as in most programming languages these days.

  • The re-eval operator, ^, still works as previously stated.

  • Comments start with a semicolon and continue to the end of a line.

The previous example programs are both a bit more complex than they were previously (but hey, the parser and interpreter will be much easier to implement!):

Infinite NOOP loop:

( ^ : : )

100 bottles of beer:

(n = 101 :
  (
    (n = n-1 : 
      print [n, "bottles of beer on the wall!\n"] : 
    ) : 
    ^ : 
  ) : 
)

In the beer example, assume that print prints the items in the array passed to it to stdout and then evaluates to the same string it printed out.

  • While n is greater than 0, the sub-expression will evaluate to the string returned by print, which by the rules above equates to true, thus causing the re-eval operator to be evaluated, looping back to the expression (n = n-1).

  • When n finally gets assigned 0, the innermost expression will evaluate to its third, empty, sub-expression, which equates to false. That means the condition of the enclosing expression now evaluates to false as well, and thus that expression’s third sub-expression is evaluated. It evaluates to null, which is the return of the program.

Here’s a BooTer “for loop” with more of an explanation of what’s going on:

(i = -1 :              ; initialize loop variable
  (
    (100 > i = i + 1 : ; increment loop variable, check it against condition
      ...              ; do things, must evaluate to true
    :
      ...              ; this one must evaluate to false (leaving it empty works)
    ) 
  :
    ^                  ; re-evaluate this expression, causing the loop
  :
    ...                ; whatever can go here, this will be evaluated after the loop is done
  )
: )                    ; done, this last sub-expression will never be run
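Since there is no reference implementation yet, here is a rough Python model of how the “for loop” above could be evaluated. Everything in it is an assumption made for illustration: the `Booter` class, the `REEVAL` sentinel standing in for ^, the `evaluate` and `is_true` helpers, and the loop body that sums the values of i. Plain expressions are modeled as Python functions that take the variable environment.

```python
REEVAL = object()  # stands in for the ^ operator


class Booter:
    """One (condition : if-true : if-false) expression; None is an empty slot."""

    def __init__(self, condition, if_true=None, if_false=None):
        self.condition = condition
        self.if_true = if_true
        self.if_false = if_false


def is_true(value):
    # 0 and null (None here) are false, anything else is true.
    return not (value is None or (isinstance(value, (int, float)) and value == 0))


def evaluate(node, env):
    if node is None:        # an empty slot evaluates to null
        return None
    if callable(node):      # a simple expression, modeled as a function of env
        return node(env)
    if isinstance(node, Booter):
        while True:
            chosen = node.if_true if is_true(evaluate(node.condition, env)) else node.if_false
            if chosen is REEVAL:  # ^ re-evaluates the expression it sits in
                continue
            return evaluate(chosen, env)
    return node             # a literal value


def init(env):
    env["i"] = -1
    return env["i"]         # -1 is non-zero, so the outer condition is true


def increment_and_test(env):
    env["i"] += 1           # i = i + 1 ...
    return 100 > env["i"]   # ... then compare against the limit


def body(env):
    env["total"] += env["i"]
    return "ok"             # must be true so that ^ is reached


loop = Booter(
    init,
    Booter(
        Booter(increment_and_test, body, None),  # empty third slot is false
        REEVAL,
        lambda env: env["total"],                # runs once, after the loop is done
    ),
    None,                                        # never evaluated
)

env = {"total": 0}
result = evaluate(loop, env)
print(result, env["i"])
```

The model also shows why the loop variable starts at -1: the initializer doubles as the outer condition, so it has to evaluate to something other than 0.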

Even with this new “simplified” syntax, it’s enough to make your head hurt. Now I just need to actually write a reference implementation with some standard library calls, then try to write something non-trivial with it. Who wants to try their hand at writing a BooTer web server?

Fun With Bookmarklets

Saturday, November 24th, 2007

I’ve got a project planned for the near future which will require some sort of interaction with webpages other than my own. There are two ways I’m considering doing this: browser plugin, or bookmarklet(s). The problem with doing a browser plugin is that I’d have to write a separate plugin for each browser or, worse, only support a single browser. So I’m leaning toward bookmarklets to begin with. As with most things, I figured I’d get a bit of experience with bookmarklets before trying to do anything too complicated.

My first bookmarklet

According to this page on bookmarklets, the longest I can make a bookmarklet and have it still run in every browser (read: IE6) is 488 characters. That’s hardly enough to do any sort of cool Web2.0 stuff, so I need to load in an external script. Despite what the previously linked site says, it is indeed possible to insert script tags into the header of more than just IE, so that’s what I’ll do:

(function() {
    var head = document.getElementsByTagName('head')[0];
    var newscript = document.createElement('script');
    newscript.type='text/javascript';
    newscript.src='http://blog.paulbonser.com/files/bm/js/google.js';
    head.appendChild(newscript);
})()

There, only 250 characters when I take out all the whitespace and add “javascript:” to the beginning. Notice that everything is wrapped in anonymous functions; this avoids putting unnecessary stuff into the global namespace, which should prevent this code from conflicting with any code already on the page it’s being inserted into. The one thing that I do put into the global namespace is the blog_paulbonser_com_close_google_box() function, which I named to avoid conflicts.

google.js:

// The only global: a deliberately long name to avoid conflicts with the host page.
function blog_paulbonser_com_close_google_box() {
    var gb = document.getElementById('blog_paulbonser_com_google_box');
    gb.parentNode.removeChild(gb);
}

(function() {
    var body = document.getElementsByTagName('body')[0];
    var head = document.getElementsByTagName('head')[0];

    // Load the stylesheet for the box.
    var newstyle = document.createElement('link');
    newstyle.href = "http://blog.paulbonser.com/files/bm/css/google.css";
    newstyle.rel = "stylesheet";
    head.appendChild(newstyle);

    // Append the box, with a close link and an iframe, to the page.
    body.innerHTML += '<div id="blog_paulbonser_com_google_box"><div><a href="#" onclick="blog_paulbonser_com_close_google_box(); return false;">Close</a></div><iframe src="http://google.com"></iframe></div>';
})();

Go ahead, give it a try.

Example Bookmarklet: open an iframe

There you go, Google popping up in the middle of my page. The real power of this, however, is that it should work on any page (except for pages with frames, since they don’t have a top-level body element). If you drag the link up to your bookmark toolbar, or right-click and select “Bookmark this link”, etc., you will then be able to go to any page, click on that bookmark, and have a Google search page pop up right in the middle.

My first “useful” bookmarklet

That’s a good start, but how about something that might actually be useful? Here’s an attempt at just that.

Lightboxify Bookmarklet: lightboxify page

This bookmarklet, plus a slightly modified version of the wonderful Lightbox plugin by Lokesh Dhakar, will take any links to images on the current page and “lightboxify” them. Give it a try on this page with the links below (some random images pulled from my computer). Before you run the bookmarklet, clicking a link will navigate away from this page to the image. After running it: instant fancy image gallery. (Okay, there are still no thumbnails, but that would require a bit of extra work.)

If you feel so inclined, you can also drag this bookmarklet to your bookmarks toolbar, go to the Apache-generated directory listing, and give it a whirl there, as well.

1 2 3 4 5 6 7 8

An Introduction to BooTer

Friday, November 23rd, 2007

Intro

For a long time I’ve had lots of ideas for a programming language with all sorts of advanced features. Of course, I’ve never written a programming language before, so I’d like to get a bit of experience before attempting to create the next killer language. So what I needed was a simpler language to implement first. BooTer (Boolean-Ternary) is the result of some brainstorming of a “simple” first language.

I’m calling it an Esoteric Programming Language, since it probably won’t be that useful for any sort of real programming. It will be Turing Complete, if a bit difficult to write anything non-trivial in.

I don’t have the whole language figured out yet, so any of the information here will be subject to change at any time.

Syntax

A BooTer program is a series of nested expressions. Every expression evaluates to a value. An expression can be one of the following:

  • A simple value.

    These are your standard primitive values, such as integers, floating point numbers, strings, symbols, or variables:

      1
      1.1
      "blah blah blah"
      fooSymbol
      BarVar

    A symbol is a symbolic constant, with no specified value. Symbols start with a lowercase letter, while variables start with an uppercase letter.

  • An array or hash.

    These are similar to JavaScript array and object literals:

      [1, 2, "blah blah", 4.2]
      {blah: "blah blah blah", foo: "foo foo foo", aNumber: 124}
    

    Both arrays and hashes can hold any expressions, including more arrays and hashes, of course. Hash keys can be any of the simple types, numbers, strings, or symbols.

  • A comma-separated list of expressions.

    This will evaluate to the value of the last expression in the list:

     1, 2, "blah blah", 4.2
    

    Each expression will be evaluated in order. This is useful for doing things like processing input and output or changing the value of a variable before evaluating another expression using it.

  • A variable assignment.

    This will evaluate to the value of the variable after it is assigned:

     Avariable = "something"
    

    The right side of the assignment can be any expression, optionally wrapped in a set of parentheses for disambiguation, in case the variable is being assigned the value of a comma-separated list of expressions.

    Avariable = 1, 2, 3, 4

    will assign Avariable the integer 1 and then go on to evaluate the rest of the expression, whereas

    Avariable = (1, 2, 3, 4)

    will assign Avariable the integer 4 after evaluating the rest of the expression.

  • A boolean expression.

    These are expressions with the standard operators ==, <, >, <=, >=, !=, !, &&, and ||. The result of any boolean expression is either “true” or “false”, and in the case of !, &&, and ||, those are the only valid values to pass in to them. This means that !Avar is not a valid boolean expression unless Avar’s value is currently one of the two boolean symbols.

  • A boolean-ternary expression.

    ("yes" : X < 42 : "no")
    

    This expression was the inspiration for the BooTer language (hence the name), and is the sole form of flow control that the language has. In the center is a boolean expression. If the boolean expression evaluates to true, then the expression evaluates to the evaluation of the subexpression on the left. Likewise, if the center expression evaluates to false, then the expression evaluates to the evaluation of the subexpression to the right.

    Why the emphasis on “the evaluation”? Because the subexpressions are not evaluated at all unless the center expression evaluates in their favor, at which point the value of the whole boolean-ternary expression becomes the value of the evaluated subexpression.

  • Special forms. For lack of a better name, everything else that is needed for a complete language will be called a special form.

    • built-in functions.

      These are functions built into the language which allow for things such as IO and Math. I haven’t fully made up my mind about the syntax of calling one of these built-in functions, but I’m thinking it will be the name of the function followed by a single expression, allowing for single values to be passed in for simple functions, or arrays or hashes to be passed into functions requiring more parameters. Passing in a single expression keeps the syntax simple.

    • ^ - The Re-eval operator.

      This operator causes the parent expression of the current expression to be re-evaluated, or the parent of a parent, etc. This, combined with the booter operator, provides the language with a means for looping. An endless loop could be achieved with the following complete BooTer program:

      print "Hello World!", ^
      

    The simplest possible BooTer program could consist of a single ^, thus looping forever doing nothing. A (slightly) more useful example would be a program to print the song 100 Bottles of Beer on the Wall:

      N = 100,
      (N = N - 1, ^ : print [N, " bottle(s) of beer on the wall!\n"], N > 0 : done)
      

    “done” has no meaning; it’s just used as something to evaluate to, since the expression has to evaluate to something. For going back up more than one parent-expression, simply put the appropriate number of re-eval operators next to each other, without a comma. For example, changing “done” to “^^” in the previous example would cause the program to loop endlessly, counting down from 100 beers again and again.

    • ' - The quote operator.

      This operator causes the following expression to evaluate to the expression itself, rather than to the evaluation of the expression. In other words, it defers the evaluation to a later time. This can be used to save expressions into variables or pass expressions into built-in functions. Most importantly, it allows expressions to be used as something like functions.

    • Temporary bind.

      I haven’t figured out how I want to do this yet, but I want to allow temporary binding of variables for a certain expression, something like let in Lisp.
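The point made above about “the evaluation” of the boolean-ternary expression, namely that a side is not evaluated at all unless the center picks it, can be modeled in a few lines of Python. This is just an illustration using zero-argument functions as deferred expressions; the `booter` and `branch` helpers are hypothetical and say nothing about how the real interpreter will work.

```python
def booter(if_true, condition, if_false):
    """Model of (if_true : condition : if_false) with lazily evaluated sides.

    All three arguments are zero-argument callables; only the condition
    and the chosen side are ever called.
    """
    return if_true() if condition() else if_false()


evaluated = []  # records which sides actually ran


def branch(label, value):
    def deferred():
        evaluated.append(label)
        return value
    return deferred


x = 7
result = booter(branch("left", "yes"), lambda: x < 42, branch("right", "no"))
print(result, evaluated)
```

Only the left side is recorded as evaluated; the right side never runs, which matters as soon as the sides do IO or change variables.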

There are still a few details that I need to work out in regards to how everything is going to work, but I do have a working lexer and a mostly-complete parser, thanks to Flex and Lemon. All I need to do now is decide how I want to represent the parse tree, build it, and then get a working interpreter going. Due to the way the language is set up, it seems like it wouldn’t be too hard to generate assembler from it and make compilable programs…but first I want to see the language running at all.

Extending Javascript: Tail Recursion

Sunday, November 18th, 2007

Javascript is a very powerful, yet underrated, programming language. Despite its name, it is neither Java-like nor “just a scripting language.” It uses a generally unfamiliar concept known as Prototype-based Programming, which is object-oriented, but without classes. A couple of common questions about Javascript are “How can I run multiple threads?” and “How do I pause execution of my program for X seconds?”

The common answer is “You can’t, but you can do something similar with setTimeout.”

If you want to wait for X milliseconds before doing something, you simply set up a callback with

   setTimeout('something', X);

However, any callbacks will have to wait until the current function has finished executing before they can run, because Javascript is single-threaded, and setTimeout simply puts its function call on a queue of things to be done later. After each function is finished running, the queue is checked for scheduled tasks, and any task whose timeout has expired will be run. Firefox even seems to defer updating of page content until after the currently running Javascript function returns. This is why, on some pages with poorly crafted Javascript, browsers will give warnings about scripts running for too long or consuming too much processor time.
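The queueing behaviour described above can be imitated with a toy model. The following Python sketch is not how any browser is actually implemented; the `EventLoop` class, its simulated millisecond clock, and the `busy` helper are all invented here purely to show why a timer can fire late but never early.

```python
import heapq
import itertools


class EventLoop:
    """Toy single-threaded timer queue with a simulated clock (in ms)."""

    def __init__(self):
        self.now = 0
        self._queue = []
        self._order = itertools.count()  # tie-breaker keeps insertion order

    def set_timeout(self, callback, delay_ms):
        heapq.heappush(self._queue, (self.now + delay_ms, next(self._order), callback))

    def busy(self, duration_ms):
        """Simulate the running callback doing duration_ms of work."""
        self.now += duration_ms

    def run(self):
        while self._queue:
            due, _, callback = heapq.heappop(self._queue)
            # A timer never fires early, but it fires late if earlier
            # work kept the single thread busy past its due time.
            self.now = max(self.now, due)
            callback()


loop = EventLoop()
log = []


def long_task():
    loop.busy(500)  # hogs the thread for 500 ms
    log.append(("long_task done", loop.now))


loop.set_timeout(long_task, 0)
loop.set_timeout(lambda: log.append(("timer fired", loop.now)), 100)
loop.run()
print(log)
```

In this model the callback scheduled for 100 ms does not run until the long task returns at 500 ms, which is the same effect a long-running function has on a page.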

So what do you do if you want to do large amounts of calculations, perhaps in loops that will run for many seconds or minutes? Once again, setTimeout is the way to go. As long as each function only runs for a very short period of time, you can run several different tasks “simultaneously” without killing your browser.

There is one thing, however, that bothers me about using setTimeout: It makes for some ugly code, and usually requires you to either wrap a string or anonymous function around your callbacks. Worse, it requires you to think about callbacks at all. So something as simple as running a loop that won’t hijack your browser becomes a pain. For example, here’s a simple function to count between two numbers within an element on the page:

(more…)
