Archive for April, 2008

Dead code in Python-generated bytecode

Tuesday, April 22nd, 2008

So I’ve made a couple of changes to Papaya (yeah, it’s called Papaya now):

  • As suggested by Phillip J. Eby, rather than generating the bytecode myself, I’m now using BytecodeAssembler, which has shortened and simplified my code a bit (though honestly not as much as I originally thought it would). I had already considered doing this before I wrote it all myself, but I wanted to get the educational benefit of doing it all from scratch.

  • I’ve changed the syntax for function definitions to match that of Python’s (minus the closing ‘:’), which also means that I’ve added support for *args and **kwargs parameters. Also, since I’m using BytecodeAssembler, you get the automatic parameter unpacking described here when using nested positional arguments. Of course, this will currently get duplicated if you decompile and then recompile code. I haven’t decided what to do about this yet.

  • You no longer need to specify a label for any of the SETUP_* instructions, since BytecodeAssembler handles this for you as well.

  • I added a setup.py file, which uses ez_setup and can build a .egg file and other fancy things. I will add this project into pypi as soon as I resolve the issue I’m about to talk about below.

  • You no longer need to specify the stack size of a given block of code, it will be calculated for you by BytecodeAssembler.

So, due to my use of BytecodeAssembler, I get free stack size calculations, but I get another feature which is somewhat annoying: dead-code prevention.

Why is this annoying? Because the Python compiler generates dead code all the time.

What this means is, if you decompile any non-trivial (and some quite-trivial) .pyc files created by Python, and then try to recompile then, then it will fail with an “AssertionError: Unknown stack size at this location” message.

For example, take the following, very simple .py file:

while True:
        if True:
                continue
        break

This is disassembled into the following:

   SETUP_LOOP
  label0:
    LOAD_NAME True
    JUMP_IF_FALSE label3
    POP_TOP
    LOAD_NAME True
    JUMP_IF_FALSE label1
    POP_TOP
    JUMP_ABSOLUTE label0
    JUMP_FORWARD label2
  label1:
    POP_TOP
  label2:
    BREAK_LOOP
    JUMP_ABSOLUTE label0
  label3:
    POP_TOP
    POP_BLOCK
    LOAD_CONST None
    RETURN_VALUE

Note the double JUMP. This is generated any time you have a continue statement, despite the fact that the second jump cannot ever be run. Also unecessary is the JUMP_ABSOLUTE after BREAK_LOOP.

Both of these cause an error in BytecodeAssembler because it has no context from which to determine the stack size at that point. Of course, that doesn’t really matter since the code will never be run.

I’m currently stumped as to the best way to solve this issue, and I’m tired and don’t want to think about it any more. :(

PPyA: Python Assembler

Friday, April 18th, 2008

Over the last few of days I’ve hacked together a Python Assembler/Disassembler. I’ve called it PPya (pronounced like “papaya,” the fruit) Paul’s Python assembler. The ‘a’ is left lowercase because it looks better that way.

Each of those days I started to write up this blog post but then got distracted working on it some more

It’s at the point now where it is fairly usable, both as a learning tool and as a tool for writing Python modules in assembly if you feel so inclined.

If you want to check it out, the gitweb project page is here: http://git.paulbonser.com/?p=ppya.git;a=summary

or you can git clone it:

git clone git://git.paulbonser.com/git/ppya.git/

or, if you’re behind a firewall or something

git clone http://git.paulbonser.com/git/ppya.git/

PPya Overview

A .pya file consists of a series of bytecodes (well, strings representing them, anyway) followed by parameters for those instructions which take parameters. When assembled, these parameters are converted to indices into a tuple in a python Code object, one of co_names, co_consts, co_varnames, co_cellvars, or co_freevars.

(more…)

Python IMPORT_NAME bytecode mystery

Monday, April 14th, 2008

I’ve been messing around with Python bytecode this weekend, which is why there was no Sunday post (as I’m writing this it’s very early Monday morning…).

There’s some fun stuff to wrap your head around when it comes to Python bytecode. The structure of the virtual machine, the order of bytes in bytecode arguments, the instructions which require magical numbers to be pushed onto the stack before being called.

I’ll go into more detail about what I’m doing mucking about with the Python internals tomorrow (later today, ugh! I need to go to bed!), but for now I’ll share one of the little mysteries that I’ve run into and managed to figure out.

IMPORT_NAME

IMPORT_NAME is the opcode you use to import another module. The description from the Python bytecode docs is this:

IMPORT_NAME namei

Imports the module co_names[namei]. The module object is pushed onto the stack. The current namespace is not affected: for a proper import statement, a subsequent STORE_FAST instruction modifies the namespace.

Basically, each code object has a tuple of names, namei is an index into that list of names, pointing to the name of the module you want to import. When this instruction is run you end up with the module on the stack, and you can then bind it to a name and access items within it.

Not mentioned here is the fact that you have to push two extra parameters onto the stack before calling IMPORT_NAME, otherwise the Python interpreter segfaults when it hits that instruction.

import sys

is compiled by the python compiler into the following (as output by dis.disassemble):

   0 LOAD_CONST               1 (-1)
   3 LOAD_CONST               0 (None)
   6 IMPORT_NAME              0 (sys)
   9 STORE_FAST               0 (sys)

I tried changing the -1 to a different number and I got “ValueError: Attempted relative import in non-package”

Looking into Python/ceval.c in the Python interpreter sourcecode, I see that these two extra parameters are passed in as the last two parameters to the builtin __import__ function, fromlist and level.

According to help(__import__) the -1 for level indicates it should try both absolute and relative imports, and the fromlist is empty because this isn’t a “from sys import …” statement.

from sys import argv, subversion, byteorder

compiles into

    0 LOAD_CONST               1 (-1)
    3 LOAD_CONST               2 ((‘argv’, ’subversion’, ‘byteorder’))
    6 IMPORT_NAME              0 (sys)
    9 IMPORT_FROM              1 (argv)
   12 STORE_FAST               0 (argv)
   15 IMPORT_FROM              2 (subversion)
   18 STORE_FAST               1 (subversion)
   21 IMPORT_FROM              3 (byteorder)
   24 STORE_FAST               2 (byteorder)
   27 POP_TOP            

It’s not immediately obvious why it needs to do the extra calls to IMPORT_FROM, but the reason seems to be that __import__ doesn’t actually do anything with the fromlist argument.

Anyway, I need to sleep now.

Note to self: submit a Python bugtracker issue about the undocumented required parameters. done, issue 2631

Update:

I also posted this as a response to Thomas Lee’s response, but duplicated it here for easier finding.

After looking at the docs for __import__, it’s interesting to see that the values in fromlist aren’t actually used, but it is significant whether the fromlist is empty or non-empty.

When the name variable is of the form package.module, normally, the top-level package (the name up till the first dot) is returned, not the module named by name. However, when a non-empty fromlist argument is given, the module named by name is returned. This is done for compatibility with the bytecode generated for the different kinds of import statement; when using “import spam.ham.eggs”, the top-level package spam must be placed in the importing namespace, but when using “from spam.ham import eggs”, the spam.ham subpackage must be used to find the eggs variable.

I suppose the reason that fromlist isn’t just changed to a boolean is that it might be significant in a setup where __import__ is redefined for custom importing, like Thomas said.

In Search of The Perfect Editor

Saturday, April 12th, 2008

I’ve been searching for the perfect editor ever since I started programming. I’ve found many along the way which have worked to get the job done, but I’ve never really found “The One True Editor.”

Yes, I have used Emacs, and no, it is not it. There are many things I enjoy about Emacs, but there are many things I dislike, and recently Emacs got to the point for me where the bad outweighed the good. I could write a whole post on why I’ve forsaken Emacs, but I’m not going to.

I’ve found a couple of editors which meet my minimum requirements of being small, simple, and easily extendable. They also have some cool features built in, which always helps.

(more…)

In Search of Perfect Software

Friday, April 11th, 2008

The Problem

There are many pieces of software in the world, but very few of them are anywhere near what one would consider “perfect.”

I don’t blame anyone in particular for this. Writing perfect software is quite difficult if the software is to accomplish any sort of non-trivial task. Some come close to the lofty goal of perfection.

Even still, there are certain areas where the mark has always been missed, if only enough to be slightly annoying. I’m being purposely vague here because there are lots of different things wrong with lots of different pieces of software. I’ll get more specific later.

Probably the biggest contributer to this problem is the fact that nobody really knows what perfect software is. I usually know what annoys me about a particular piece of software, but I’m not always sure what would be a better solution.

My solution

There are some types of software which have gotten closer than others to perfection. Web browsers, for example. I’ll readily admit that there are things which annoy me about Firefox, but on the whole it’s a good piece of software, and improving all the time. I have some ideas of my own about the was a browser should do certain things, and I’m not about to go hacking about in the monstrous labyrinth that is the Mozilla source code, so I’ve [started my own browser project][mybrowser]. It will probably never catch up with the other, “real” browsers, but it will keep me entertained and provide me a way to prototype various ideas I have for user interfaces.

There are also some good programming languages out there. Languages like Python, Erlang, and many others make programming “fun again.” Still, for every language there’s something that’s missing, or something that might be cool to be able to do, some area which has been left mostly unexplored. Once again, I’ve started [some][booter] [prototyping][stacklang] so I can play around with some new ideas and push some boundaries which I think should be pushed (or at least ones which look as though they may be fun to push).

So what’s the way to write perfect software?

(more…)

Mini Stack Language in Python

Thursday, April 10th, 2008

Are you tired of the lack of challenge in programming? Do you pine for a syntax reminiscent of your old HP RPN calculator? Well then do I have the programming language for you!

Over the last few days I’ve been working on a little stack-based language in Python.

I’ve got lots of idea for programming languages, but a lot of them are pretty complex. I don’t really have any experience implementing programming languages, so I figured I’d start small and work my way up.

In this case, small means 135 lines of code (not counting the GPL notice) without many docstrings or comments. It’s small, so you should be able to figure it all out by the code (I hope you can, because I didn’t comment it that much).

Here’s an outline of some of the features of the language:

Programs are written in Reverse Polish Notation. Datum, made up of strings, numbers, or code blocks are pushed onto the stack as they are encountered. When a ‘word’ is encountered, it is executed, and may do stuff to the stack.

There is a consistent syntax, or rather, mostly a lack thereof. The only reserved characters are single and double quotes and curly brackets (’{’ and ‘}’), after whitespace. Quote characters can be used as parts of words, even, just not as the first character.

All other language functionality is handled through the calling of words. There is no distinction made between built-in words and user_defined ones.

Examples

The ever-popular Hello world program:

‘Hello, World!’ echo_nl

Pretty straightforward. Push the string onto the stack and then print it out (with a newline). The echo and echo_nl commands both pop the top item from the stack and print it out.

Defining new words is as simple as pushing a name and block onto the stack and then calling the add_word word.

’square’ {dup *} add_word

This example calls dup, which pushes a copy of the top item on the stack. It then calls * which pops to top two items from the stack and pushes the result of their multiplication.

Some user input and if and else:

’story’ {
    ‘do you want to hear a story?’ echo_nl
    read_string
    ‘yes’
    =
        {‘Once upon a time…TODO: write story’ echo_nl}
    if
        {"Aww, you don’t want to hear my story." echo_nl}
    else
} add_word
story

Note that since it’s postfix notation that the words ‘if’ and ‘else’ appear after the block which will be run. Also, these are not keywords, they are just like any other built-in word. ‘if’ pushes a 1 to the stack after running its block, or a 0 if it doesn’t run its block. Then ‘else’ (once I actually write it) will run its block if the top value on the stack is a 0, otherwise it won’t run it’s block. I might also add an ‘elif’ function which will do the same as else while also pushing a result back onto the stack as well.

(I just realized that I didn’t actually implement the else function..I’ll fix that later.)

Check out the code for more details. Really, it’s not so hard to read..I think.

Work in progress

This language is somewhat usable at the moment, but it’s still in progress. I’ll probably change the names of all the words, and probably just start calling them functions rather than words, since that name is confusing (shame on you, FORTH, for inspiring me to call them that).

I’ll probably add in more words..err…functions to enable functional programming a-la Joy.

I don’t yet have a specific goal for this language, since I’m writing it for my own education. It might dawn on me that there is a perfect application for this language, which will then shape the future direction. If nothing else, it will be a fun toy for those who would like to create their own simple programming language, since it’s such a small piece of code at the moment.

I might also try to write a compiler to compile it down into Python bytecode, again as an exercise in personal education.

If you’re interested, the code can be gotten with git:

git clone http://paulbonser.com/code/stacklang.git

or the current version (0.0.1.1) is available here: stacklang.py

Got any great ideas of what I could do with this language? Using the code for something cool yourself? Got a patch for me? Let me know!

NSLU2 Almost Set up

Sunday, April 6th, 2008

I got myself an NSLU2 and set NSLU2 Debian up on it. It’s working alright except that the clock battery is dead so it refuses to boot up with NTP turned on.

Because of that, I get to deal with some fun stuff like this:

pib@home ~]sudo fsck /dev/sda2
fsck 1.40.5 (27-Jan-2008)
e2fsck 1.40.5 (27-Jan-2008)
/dev/sda2: recovering journal
/dev/sda2 has gone 13975 days without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/sda2: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sda2: 23941/3541440 files (0.9% non-contiguous), 441602/7080640 blocks

So apparently this partition, which I created 2 days ago, hasn’t been checked in over 38 years.

Also, there’s stuff like this this:

pib@home ~]ssh slug                      
pib@slug's password: 
Linux slug 2.6.18-6-ixp4xx #1 Tue Feb 12 00:57:53 UTC 2008 armv5tel

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
No mail.
Last login: Wed Dec 31 18:02:18 1969 from 192.168.15.177
pib@slug:~$

Yeah, I guess I’ve got to go find a new battery.

We’ve been Street-Viewed!

Thursday, April 3rd, 2008

Sometime fairly recently Google added Austin, TX and the surrounding area to their Google Street View.

Part of that surrounding area included my street, and therefore, my house!

Sadly it appears that both Angie and I were at work when this happened, so neither of our cars are visible and you can’t see me sitting at the computer I’m at right now through the window.

Well, you wouldn’t have been able to see that anyway, but whatever.

Anyway, check it out:

New Blog Policy: Three Posts a Day

Tuesday, April 1st, 2008

As you may have noticed, this is my third post of the day.

This is part of my new policy to post thrice daily.

It has occurred to me that when it comes to blogs, quantity is of much higher importance than quality.

I think everyone will agree that three posts each day is much better than a good post every few days or weeks.

Don’t you think so?

If you don’t you are tottaly wrong and stupid.

Site Style Updated

Tuesday, April 1st, 2008
Update: As you probably guessed, this was an April Fools’ joke. For those who missed it, you can check it out here: PIBlog on April 1st

As you may have noticed, I’ve updated the design of my blog a bit.

Using my superior web design and CSS skillz.

I think everyone will agree that this is much better.

Don’t you think so?

If you don’t you are tottaly wrong and stupid.

What I'm Listening to

Loading...