Archive for the ‘Programming’ Category

Quick bash one-liner to find a rogue newline

Wednesday, July 16th, 2008

It’s been far too long since I’ve posted, so I’m writing a short post about a quick one-liner I just used to solve a problem.

The problem was a rogue newline appearing at the beginning of some generated XML files, which is against the rules for XML: if a document has an XML declaration, nothing, not even whitespace, may come before it.

This problem, and a similar one involving output being sent before headers can be sent, often happens in PHP when an extra newline is included after the closing “?>”. One way to fix it is to just leave off the closing tag, since PHP is smart enough to realize the file has ended in PHP mode.

Anyway, we had to track down which file had this problem in it, and the solution ended up being this:

for i in $(find . -name '*.php'); do echo "$i:$(tail -n 1 "$i")" | grep -v '?>'; done

That finds each PHP file and checks its last line for “?>”, printing the file name and that line if it’s not there.

Of course, there will be some false positives for PHP files which have HTML after their PHP code or don’t have the closing “?>”, but it’s good enough to track down those potentially offending files.
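Since the one-liner only looks at the last line, it flags every file that doesn't end in “?>”. A slightly stricter check, sketched here in Python under a few assumptions, reports only files that have something after their final “?>” beyond the single newline PHP itself swallows after a closing tag. The function names are hypothetical, it assumes UTF-8 (or close enough) source files, and like the one-liner it will still flag files that deliberately end in HTML.

```python
import os

def trailing_output(path):
    """Return whatever follows the last '?>' in a PHP file, minus the one
    newline PHP swallows after a closing tag; None if there is no '?>'."""
    # Text mode normalizes '\r\n' and '\r' line endings to '\n'.
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    idx = text.rfind("?>")
    if idx == -1:
        return None  # no closing tag, so nothing can leak out after one
    tail = text[idx + 2:]
    if tail.startswith("\n"):
        tail = tail[1:]
    return tail

def find_suspects(root="."):
    """List .php files under root that would send output after their final '?>'."""
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".php"):
                path = os.path.join(dirpath, name)
                if trailing_output(path):  # non-empty string: stray output
                    suspects.append(path)
    return sorted(suspects)
```

Calling find_suspects(".") from the top of the project gives the same kind of list as the one-liner, minus the files whose only trailing character is the newline PHP already ignores.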

Storing Hierarchical Data in CouchDB

Friday, July 4th, 2008

Much to my surprise, my last post generated more traffic in a single day than my blog has ever gotten in a single month. Apparently people are quite interested in making web applications with Python. I’ve started on part two, but since so many people showed interest I want to spend more time on it than I spent on the last one. So instead, you get this post.

So I’ve been fiddling around with CouchDB lately. Since it’s common to store tree-based data, and it’s kind of a pain to do so in your standard relational DB, I thought it would be a good exercise to see how hard it is to store hierarchical data in CouchDB.

Turns out it’s pretty easy.
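The details are in the full post, so this is not necessarily the approach it takes; it's a sketch of one common technique, the materialized path, where each document stores the list of its ancestors' ids plus its own. In CouchDB, a view whose map function is something like `function(doc) { emit(doc.path, null); }` returns rows sorted by that array key, which yields a depth-first ordering of the tree, and a subtree can be fetched with `startkey=path` and `endkey=path + [{}]` (objects collate after strings). The Python below only simulates that sorting on hypothetical documents, and it assumes plain lowercase ASCII ids, where Python's list ordering agrees with CouchDB's collation.

```python
# Hypothetical documents: each stores its full ancestry in "path".
docs = [
    {"_id": "b", "path": ["a", "b"]},
    {"_id": "a", "path": ["a"]},
    {"_id": "d", "path": ["a", "c", "d"]},
    {"_id": "c", "path": ["a", "c"]},
    {"_id": "e", "path": ["e"]},
]

def view_rows(docs):
    """Mimic a view that emits doc.path as its key: rows sorted by key.
    Python's element-wise list ordering stands in for CouchDB collation."""
    return sorted(((d["path"], d["_id"]) for d in docs), key=lambda row: row[0])

def subtree(docs, root_path):
    """Ids at or below root_path, like a startkey/endkey range query."""
    n = len(root_path)
    return [doc_id for key, doc_id in view_rows(docs) if key[:n] == root_path]

def render(docs):
    """Indent each id by its depth to show the depth-first order."""
    return "\n".join("  " * (len(key) - 1) + doc_id for key, doc_id in view_rows(docs))

print(render(docs))
print(subtree(docs, ["a", "c"]))
```

The appeal of this layout is that the database's own key ordering does the tree traversal, at the cost of rewriting the paths of every descendant when a node moves.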

(more…)

Building a Python Web Application, Part 1

Thursday, June 26th, 2008
Edit: I’ve cleaned up the longer example, using Python’s string.Template module for the templates. I’ve also set up a git repo for the source that will go along with the posts in this series: Python Webapp Gitweb

Recently, I’ve been interested in writing web applications in Python, and one of the fun things that I discovered was the Python Web Server Gateway Interface (WSGI), which is a standard interface for Python web servers, web applications, and something called middleware which can sit between the two.

One of the coolest things about WSGI is that you don’t have to decide on a specific web server before you start coding. In fact, the Python wsgiref module comes with a simple built-in web server, which allows you to start coding up your web application with nothing but a bare install of Python 2.5 (or higher, of course)!

There are plenty of overviews of WSGI out there, so I won’t bother creating yet another in-depth explanation. What I will do, though, is show you how easy it is to get started.

Your basic “Hello, World!” application can be accomplished, server and all, with as little as the following:
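The original listing is behind the link; as a stand-in, here is a minimal sketch of a WSGI “Hello, World!” using wsgiref. It's written for Python 3 rather than the Python 2.5 this post targets, so the response body is bytes, and the port number (8000) and the names hello_app and serve are arbitrary choices.

```python
from wsgiref.simple_server import make_server

def hello_app(environ, start_response):
    """A WSGI application: it receives the request environ and a
    start_response callable, and returns an iterable of bytes."""
    body = b"Hello, World!\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

def serve(port=8000):
    """Serve hello_app with wsgiref's built-in reference server (blocks forever)."""
    with make_server("", port, hello_app) as httpd:
        httpd.serve_forever()
```

Calling serve() and pointing a browser at http://localhost:8000/ should show the greeting; because hello_app is just a callable, it can later be handed to any other WSGI server unchanged.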

(more…)

CouchDB looks Awesome, my Slug is borked

Tuesday, June 24th, 2008

In lieu of a real blog post, and as a way to break the 3-week silence that has recently hung over this blog, I’d just like to say this:

I’ve been looking at CouchDB, and it looks awesome.

For those who don’t know, CouchDB is a RESTful, distributed, schema-free, document-oriented database.

It looks like it’s the answer to all those times when I was thinking to myself “Man, this really doesn’t need a relational database, it needs…something else.”

Well, from what I’ve seen so far, this just might be the something else.

I would have more to say about it, some examples even, except I’ve spent my last two evenings fiddling with my NSLU2, trying to get it up and working again after several hard drive issues. I’ve given up on one of my hard drives, but now the Debian install won’t finish; it dies halfway through, saying it failed to finish the configuration or some such thing.

Anyway, I may just have to give up on it sooner or later and start doing more interesting things, and once I do that, I’ll have something more to say about CouchDB, with some example code even.

Wordpress 2.5.1 Borked Maintenance Mode

Monday, May 19th, 2008

I was going to write a nice blog post during lunch today, but I ran into some issues which I had to spend the time fixing instead.

Before I did the upgrade, I enabled Maintenance Mode.

Then I did the upgrade.

Then I couldn’t do anything.

None of the pages which I tried to go to, including upgrade.php and wp-admin, would load. All of them gave me Maintenance Mode messages. Now that I look at the plugin page, it seems it was only compatible up to version 2.3 of WordPress…

Whoops.

So I had to figure out how to disable a WordPress plugin manually. A bit of MySQL command line and PHP unserialize() and serialize() later, I was back up, with my lunch hour all but completely consumed.
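For anyone curious what that unserialize/serialize step amounts to: WordPress keeps the list of active plugins in the active_plugins row of its options table as a PHP-serialized array. The sketch below does the same edit in Python, assuming the usual shape of that value (integer keys, plugin file names as strings); it doesn't handle any other serialized types, and the plugin file names in the example are hypothetical. Note that the s:N: prefix counts bytes, which is why hand-editing the string in the mysql client is easy to get wrong.

```python
import re

ITEM = re.compile(rb'i:(\d+);s:(\d+):"')

def unserialize_string_list(data: bytes) -> list:
    """Parse a PHP-serialized array of strings such as a:1:{i:0;s:3:"foo";}."""
    header = re.match(rb'a:(\d+):\{', data)
    if header is None:
        raise ValueError("expected a serialized PHP array")
    count, pos = int(header.group(1)), header.end()
    values = []
    for _ in range(count):
        m = ITEM.match(data, pos)
        if m is None:
            raise ValueError("unsupported element at byte %d" % pos)
        length, start = int(m.group(2)), m.end()  # length is in bytes
        values.append(data[start:start + length].decode("utf-8"))
        pos = start + length
        if data[pos:pos + 2] != b'";':
            raise ValueError("malformed string at byte %d" % start)
        pos += 2
    if data[pos:] != b"}":
        raise ValueError("unexpected data after the array")
    return values

def serialize_string_list(values) -> bytes:
    """Inverse of the parser above, numbering the keys from 0."""
    parts = []
    for i, value in enumerate(values):
        raw = value.encode("utf-8")
        parts.append(b'i:%d;s:%d:"%s";' % (i, len(raw), raw))
    return b'a:%d:{%s}' % (len(values), b"".join(parts))

def disable_plugin(serialized: bytes, plugin: str) -> bytes:
    """Drop one plugin from the serialized active-plugin list."""
    remaining = [p for p in unserialize_string_list(serialized) if p != plugin]
    return serialize_string_list(remaining)

# Hypothetical value of the active_plugins option.
before = b'a:2:{i:0;s:19:"akismet/akismet.php";i:1;s:20:"maintenance-mode.php";}'
after = disable_plugin(before, "maintenance-mode.php")
```

The resulting bytes would then be written back to the same row; the keys get renumbered from 0, which is what PHP's array_values() would give you too.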

So here’s my post for the day, much later and probably less interesting.

Dead code in Python-generated bytecode

Tuesday, April 22nd, 2008

So I’ve made a few changes to Papaya (yeah, it’s called Papaya now):

  • As suggested by Phillip J. Eby, rather than generating the bytecode myself, I’m now using BytecodeAssembler, which has shortened and simplified my code a bit (though honestly not as much as I originally thought it would). I had already considered doing this before I wrote it all myself, but I wanted to get the educational benefit of doing it all from scratch.

  • I’ve changed the syntax for function definitions to match Python’s (minus the closing ‘:’), which also means that I’ve added support for *args and **kwargs parameters. Also, since I’m using BytecodeAssembler, you get the automatic parameter unpacking described here when using nested positional arguments. Of course, this unpacking code will currently get duplicated if you decompile and then recompile code. I haven’t decided what to do about this yet.

  • You no longer need to specify a label for any of the SETUP_* instructions, since BytecodeAssembler handles this for you as well.

  • I added a setup.py file, which uses ez_setup and can build a .egg file and other fancy things. I will add this project to PyPI as soon as I resolve the issue I’m about to talk about below.

  • You no longer need to specify the stack size of a given block of code; BytecodeAssembler calculates it for you.

So, due to my use of BytecodeAssembler, I get free stack size calculations, but I get another feature which is somewhat annoying: dead-code prevention.

Why is this annoying? Because the Python compiler generates dead code all the time.

What this means is that if you decompile any non-trivial (and some quite trivial) .pyc files created by Python and then try to recompile them, the assembler will fail with an “AssertionError: Unknown stack size at this location” message.

For example, take the following, very simple .py file:

while True:
        if True:
                continue
        break

This is disassembled into the following:

   SETUP_LOOP
  label0:
    LOAD_NAME True
    JUMP_IF_FALSE label3
    POP_TOP
    LOAD_NAME True
    JUMP_IF_FALSE label1
    POP_TOP
    JUMP_ABSOLUTE label0
    JUMP_FORWARD label2
  label1:
    POP_TOP
  label2:
    BREAK_LOOP
    JUMP_ABSOLUTE label0
  label3:
    POP_TOP
    POP_BLOCK
    LOAD_CONST None
    RETURN_VALUE

Note the double jump: JUMP_ABSOLUTE immediately followed by JUMP_FORWARD. This is generated any time you have a continue statement, even though the second jump can never be run. Also unnecessary is the JUMP_ABSOLUTE after BREAK_LOOP.

Both of these cause an error in BytecodeAssembler because it has no context from which to determine the stack size at that point. Of course, that doesn’t really matter since the code will never be run.

I’m currently stumped as to the best way to solve this issue, and I’m tired and don’t want to think about it any more. :(
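One possible direction, sketched here rather than anything Papaya or BytecodeAssembler actually does, is a reachability pass that drops unreachable instructions before the listing is handed to the assembler: start at the first instruction, follow fall-through and jump edges, and discard whatever is never visited. The instruction representation (a list of (opcode, argument) tuples with "label" pseudo-instructions) is hypothetical, and so is the explicit end label for the SETUP_LOOP exit, which the listing above leaves implicit. Only the opcodes that appear in that listing are modelled.

```python
# Successor rules for the opcodes in the listing above (an illustrative subset).
UNCONDITIONAL = {"JUMP_ABSOLUTE", "JUMP_FORWARD"}
CONDITIONAL = {"JUMP_IF_FALSE", "JUMP_IF_TRUE"}
# BREAK_LOOP leaves via the loop block; that exit is covered by SETUP_LOOP's target edge.
NO_FALLTHROUGH = {"RETURN_VALUE", "BREAK_LOOP"}
BLOCK_SETUP = {"SETUP_LOOP"}  # falls into the loop body; its target is the loop exit

def find_dead_code(program):
    """Return (index, opcode, arg) for every instruction unreachable from the start."""
    labels = {arg: i for i, (op, arg) in enumerate(program) if op == "label"}
    reachable = set()
    pending = [0]
    while pending:
        i = pending.pop()
        if i >= len(program) or i in reachable:
            continue
        reachable.add(i)
        op, arg = program[i]
        if op in UNCONDITIONAL:
            pending.append(labels[arg])
        elif op in CONDITIONAL or op in BLOCK_SETUP:
            pending.extend([i + 1, labels[arg]])
        elif op not in NO_FALLTHROUGH:
            pending.append(i + 1)  # ordinary instructions and labels fall through
    return [(i, op, arg) for i, (op, arg) in enumerate(program)
            if op != "label" and i not in reachable]

def remove_dead_code(program):
    dead = {i for i, _op, _arg in find_dead_code(program)}
    return [ins for i, ins in enumerate(program) if i not in dead]

# The listing above, with a hypothetical "end" label marking the SETUP_LOOP exit.
program = [
    ("SETUP_LOOP", "end"),
    ("label", "label0"),
    ("LOAD_NAME", "True"),
    ("JUMP_IF_FALSE", "label3"),
    ("POP_TOP", None),
    ("LOAD_NAME", "True"),
    ("JUMP_IF_FALSE", "label1"),
    ("POP_TOP", None),
    ("JUMP_ABSOLUTE", "label0"),
    ("JUMP_FORWARD", "label2"),
    ("label", "label1"),
    ("POP_TOP", None),
    ("label", "label2"),
    ("BREAK_LOOP", None),
    ("JUMP_ABSOLUTE", "label0"),
    ("label", "label3"),
    ("POP_TOP", None),
    ("POP_BLOCK", None),
    ("label", "end"),
    ("LOAD_CONST", "None"),
    ("RETURN_VALUE", None),
]
print(find_dead_code(program))
```

On this listing the pass picks out exactly the two jumps discussed above: the JUMP_FORWARD after the continue and the JUMP_ABSOLUTE after BREAK_LOOP. The catch is that a round trip would no longer reproduce the original bytecode byte for byte.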

PPyA: Python Assembler

Friday, April 18th, 2008

Over the last few days I’ve hacked together a Python assembler/disassembler. I’ve called it PPya (pronounced like “papaya,” the fruit), short for Paul’s Python assembler. The ‘a’ is left lowercase because it looks better that way.

Each of those days I started to write up this blog post, but then got distracted working on it some more.

It’s at the point now where it is fairly usable, both as a learning tool and as a tool for writing Python modules in assembly if you feel so inclined.

If you want to check it out, the gitweb project page is here: http://git.paulbonser.com/?p=ppya.git;a=summary

or you can git clone it:

git clone git://git.paulbonser.com/git/ppya.git/

or, if you’re behind a firewall or something:

git clone http://git.paulbonser.com/git/ppya.git/

PPya Overview

A .pya file consists of a series of bytecode instructions (well, strings representing them, anyway), each followed by a parameter if that instruction takes one. When assembled, these parameters are converted to indices into one of the tuples in a Python code object: co_names, co_consts, co_varnames, co_cellvars, or co_freevars.
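To make that conversion concrete, here is a small sketch of the interning step an assembler like this needs: each operand is looked up in (or appended to) the tuple its opcode refers to, and replaced by its index. The opcode-to-table mapping is a hypothetical subset covering only the opcodes in the example, and the function name is made up; it is not PPya's actual code, and the real compiler may order constants differently.

```python
# Hypothetical opcode-to-table mapping covering only the opcodes used below.
TABLE_FOR_OP = {
    "LOAD_CONST": "co_consts",
    "LOAD_NAME": "co_names",
    "STORE_NAME": "co_names",
    "IMPORT_NAME": "co_names",
    "IMPORT_FROM": "co_names",
    "LOAD_FAST": "co_varnames",
    "STORE_FAST": "co_varnames",
}

def intern_operands(listing):
    """Replace each operand with its index into the matching tuple,
    adding it to that tuple the first time it is seen."""
    tables = {"co_consts": [], "co_names": [], "co_varnames": []}
    seen = {name: {} for name in tables}  # (type, value) -> index
    encoded = []
    for op, operand in listing:
        table_name = TABLE_FOR_OP.get(op)
        if table_name is None:
            encoded.append((op, operand))  # no table: leave the operand alone
            continue
        # Key on the type too, so that 1 and True (which compare equal) stay distinct.
        key = (type(operand), operand)
        index = seen[table_name].setdefault(key, len(tables[table_name]))
        if index == len(tables[table_name]):
            tables[table_name].append(operand)
        encoded.append((op, index))
    return encoded, {name: tuple(values) for name, values in tables.items()}

listing = [
    ("LOAD_CONST", -1),
    ("LOAD_CONST", None),
    ("IMPORT_NAME", "sys"),
    ("STORE_NAME", "sys"),
    ("LOAD_CONST", None),
    ("RETURN_VALUE", None),
]
encoded, tables = intern_operands(listing)
print(encoded)
print(tables)
```

Repeated operands share one slot (both loads of None become index 1), which is also how the compiler keeps these tuples small.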

(more…)

Python IMPORT_NAME bytecode mystery

Monday, April 14th, 2008

I’ve been messing around with Python bytecode this weekend, which is why there was no Sunday post (as I’m writing this it’s very early Monday morning…).

There’s some fun stuff to wrap your head around when it comes to Python bytecode: the structure of the virtual machine, the order of bytes in bytecode arguments, and the instructions which require magic numbers to be pushed onto the stack before being executed.

I’ll go into more detail about what I’m doing mucking about with the Python internals tomorrow (later today, ugh! I need to go to bed!), but for now I’ll share one of the little mysteries that I’ve run into and managed to figure out.

IMPORT_NAME

IMPORT_NAME is the opcode you use to import another module. The description from the Python bytecode docs is this:

IMPORT_NAME namei

Imports the module co_names[namei]. The module object is pushed onto the stack. The current namespace is not affected: for a proper import statement, a subsequent STORE_FAST instruction modifies the namespace.

Basically, each code object has a tuple of names, and namei is an index into that tuple, pointing to the name of the module you want to import. When this instruction is run, you end up with the module on the stack, and you can then bind it to a name and access items within it.

Not mentioned here is the fact that you have to push two extra parameters onto the stack before executing IMPORT_NAME; otherwise, the Python interpreter segfaults when it hits that instruction.

import sys

is compiled by the python compiler into the following (as output by dis.disassemble):

   0 LOAD_CONST               1 (-1)
   3 LOAD_CONST               0 (None)
   6 IMPORT_NAME              0 (sys)
   9 STORE_FAST               0 (sys)

I tried changing the -1 to a different number and got “ValueError: Attempted relative import in non-package”.

Looking into Python/ceval.c in the Python interpreter source code, I see that these two extra parameters are passed in as the last two arguments to the builtin __import__ function: fromlist and level.

According to help(__import__), the -1 for level indicates it should try both absolute and relative imports, and the fromlist is empty because this isn’t a “from sys import …” statement.
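The two stack arguments are still there in CPython 3, which makes them easy to check with the dis module. The sketch below, with a hypothetical helper name, compiles a statement and reads the values loaded by the two instructions just before IMPORT_NAME. It assumes those two instructions are the pushes, which holds for simple import statements, though the exact opcodes vary between versions. Because Python 3 dropped implicit relative imports, the level you'll see is 0 rather than -1.

```python
import dis

def import_operands(source):
    """Return (level, fromlist, module_name) for the first IMPORT_NAME in source,
    read from the two instructions that push its stack arguments."""
    instructions = list(dis.get_instructions(compile(source, "<demo>", "exec")))
    for i, ins in enumerate(instructions):
        if ins.opname == "IMPORT_NAME":
            level = instructions[i - 2].argval
            fromlist = instructions[i - 1].argval
            return level, fromlist, ins.argval
    raise ValueError("no IMPORT_NAME instruction found")

print(import_operands("import sys"))
print(import_operands("from sys import argv, byteorder"))
```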

from sys import argv, subversion, byteorder

compiles into

    0 LOAD_CONST               1 (-1)
    3 LOAD_CONST               2 (('argv', 'subversion', 'byteorder'))
    6 IMPORT_NAME              0 (sys)
    9 IMPORT_FROM              1 (argv)
   12 STORE_FAST               0 (argv)
   15 IMPORT_FROM              2 (subversion)
   18 STORE_FAST               1 (subversion)
   21 IMPORT_FROM              3 (byteorder)
   24 STORE_FAST               2 (byteorder)
   27 POP_TOP            

It’s not immediately obvious why it needs to do the extra calls to IMPORT_FROM, but the reason seems to be that __import__ doesn’t actually do anything with the fromlist argument.

Anyway, I need to sleep now.

Note to self: submit a Python bug tracker issue about the undocumented required parameters. Done: issue 2631.

Update:

I also posted this as a reply to Thomas Lee’s response, but I’ve duplicated it here so it’s easier to find.

Looking at the docs for __import__, it’s interesting to see that the values in fromlist aren’t actually used, but whether fromlist is empty or non-empty is significant.

When the name variable is of the form package.module, normally, the top-level package (the name up till the first dot) is returned, not the module named by name. However, when a non-empty fromlist argument is given, the module named by name is returned. This is done for compatibility with the bytecode generated for the different kinds of import statement; when using “import spam.ham.eggs”, the top-level package spam must be placed in the importing namespace, but when using “from spam.ham import eggs”, the spam.ham subpackage must be used to find the eggs variable.

I suppose the reason that fromlist isn’t just changed to a boolean is that it might be significant in a setup where __import__ is redefined for custom importing, as Thomas said.
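The empty versus non-empty distinction is easy to see from a Python 3 prompt. One nuance worth adding: for a plain module the fromlist entries really are ignored, but when the module is a package, entries that aren't already attributes get imported as submodules (that's how “from xml import dom” works). Also note that Python 3's __import__ defaults level to 0 and rejects negative values, so the -1 above only applies to Python 2.

```python
import os
import sys

# Empty fromlist: the top-level package comes back, which is what
# "import os.path" needs in order to bind the name "os".
top = __import__("os.path")

# Non-empty fromlist: the named module itself comes back, ready for
# IMPORT_FROM to pull "join" out of it ("from os.path import join").
leaf = __import__("os.path", fromlist=["join"])

# For a plain module (not a package), the fromlist entries are not checked...
same = __import__("os.path", fromlist=["no_such_name"])

# ...but for a package, entries that aren't attributes yet are imported as submodules.
xml_pkg = __import__("xml", fromlist=["dom"])

print(top is os, leaf is os.path, same is os.path, "xml.dom" in sys.modules)
```

Each of the four checks should come out True.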

In Search of The Perfect Editor

Saturday, April 12th, 2008

I’ve been searching for the perfect editor ever since I started programming. I’ve found many along the way which have worked to get the job done, but I’ve never really found “The One True Editor.”

Yes, I have used Emacs, and no, it is not it. There are many things I enjoy about Emacs, but there are many things I dislike, and recently Emacs got to the point for me where the bad outweighed the good. I could write a whole post on why I’ve forsaken Emacs, but I’m not going to.

I’ve found a couple of editors which meet my minimum requirements of being small, simple, and easily extendable. They also have some cool features built in, which always helps.

(more…)

In Search of Perfect Software

Friday, April 11th, 2008

The Problem

There are many pieces of software in the world, but very few of them are anywhere near what one would consider “perfect.”

I don’t blame anyone in particular for this. Writing perfect software is quite difficult if the software is to accomplish any sort of non-trivial task. Some come close to the lofty goal of perfection.

Even still, there are certain areas where the mark has always been missed, if only enough to be slightly annoying. I’m being purposely vague here because there are lots of different things wrong with lots of different pieces of software. I’ll get more specific later.

Probably the biggest contributor to this problem is the fact that nobody really knows what perfect software is. I usually know what annoys me about a particular piece of software, but I’m not always sure what would be a better solution.

My solution

There are some types of software which have gotten closer than others to perfection. Web browsers, for example. I’ll readily admit that there are things which annoy me about Firefox, but on the whole it’s a good piece of software, and improving all the time. I have some ideas of my own about the way a browser should do certain things, and I’m not about to go hacking about in the monstrous labyrinth that is the Mozilla source code, so I’ve started my own browser project. It will probably never catch up with the other, “real” browsers, but it will keep me entertained and provide me a way to prototype various ideas I have for user interfaces.

There are also some good programming languages out there. Languages like Python, Erlang, and many others make programming “fun again.” Still, for every language there’s something that’s missing, or something that might be cool to be able to do, some area which has been left mostly unexplored. Once again, I’ve started some prototyping so I can play around with some new ideas and push some boundaries which I think should be pushed (or at least ones which look as though they may be fun to push).

So what’s the way to write perfect software?

(more…)
