Posts Tagged ‘Python’

Python HTML Layout Engine Progress

Saturday, September 13th, 2008

I’ve made some progress on my Python web browser. It’s nothing earth-shattering at the moment, but it does take all the text from a web page and render it.

It currently treats each element (including the ones in the head, actually, I need to fix that) as an inline text element. It doesn’t quite do proper whitespace compression between elements, either, leading to some multiple spaces in certain places. What it does do nicely is the splitting on lines in reasonable places.
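To make the whitespace and line-splitting behaviour concrete, here is a minimal sketch of one way to do it. It is not the code from the repo: the function names are made up for illustration, and it measures width in characters rather than rendered pixels. Joining the text of adjacent elements before collapsing is what compresses whitespace across element boundaries.

```python
import re

def collapse_whitespace(text):
    """Replace every run of whitespace with a single space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()

def wrap_words(text, max_width):
    """Greedy line breaking: start a new line when the next word will not fit."""
    lines, current = [], ""
    for word in collapse_whitespace(text).split(" "):
        candidate = word if not current else current + " " + word
        if current and len(candidate) > max_width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines

# Text from two adjacent inline elements, joined before collapsing so that
# whitespace at the element boundary is compressed as well.
fragments = ["Hello,   ", "  world!\n  This is   a test."]
print(wrap_words("".join(fragments), 12))
# ['Hello,', 'world! This', 'is a test.']
```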

The part I’m working on next is the application of CSS rules to the document. I’m considering a couple of different methods of walking the DOM tree and cascading the rules into each element. Either way, it boils down to walking the tree, matching CSS selectors against each element, and applying the rules for those elements which match, taking into account the specificity of the matching selector to make sure the proper rule ends up taking precedence.
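A rough sketch of the "walk the tree, match, apply by specificity" idea might look like the following. Everything here is an assumption for illustration: the Element and Selector classes are hypothetical, only simple selectors (tag, id, classes) are handled, and inheritance and !important are ignored.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    tag: str
    id: str = ""
    classes: tuple = ()
    children: list = field(default_factory=list)
    style: dict = field(default_factory=dict)

@dataclass(frozen=True)
class Selector:
    tag: str = ""        # empty string matches any tag
    id: str = ""
    classes: tuple = ()

    def matches(self, el):
        return ((not self.tag or self.tag == el.tag)
                and (not self.id or self.id == el.id)
                and all(c in el.classes for c in self.classes))

    def specificity(self):
        # (id count, class count, type count), compared as a tuple
        return (1 if self.id else 0, len(self.classes), 1 if self.tag else 0)

def apply_rules(root, rules):
    """rules: list of (Selector, declarations dict) in source order."""
    # sorted() is stable, so among equal specificities the later rule wins.
    ordered = sorted(rules, key=lambda rule: rule[0].specificity())
    stack = [root]
    while stack:
        el = stack.pop()
        for selector, declarations in ordered:
            if selector.matches(el):
                el.style.update(declarations)
        stack.extend(el.children)

p = Element("p", id="intro", classes=("note",))
body = Element("body", children=[p])
rules = [
    (Selector(id="intro"), {"color": "red"}),
    (Selector(tag="p"), {"color": "blue", "display": "block"}),
    (Selector(classes=("note",)), {"color": "green"}),
]
apply_rules(body, rules)
print(p.style)  # {'color': 'red', 'display': 'block'}
```

This version matches every rule against every element; the alternative is to index rules by tag, id, or class up front, which trades memory for speed.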

I haven’t sat down and figured out the big O of them yet, but I think it’s going to be a memory vs. speed decision.

Once I have styles applying to elements, I’ll probably work on getting all the standard HTML4 CSS rules rendering properly. After that, I’ll move on to more thorough block element handling, replaced elements (images, form elements), and probably pseudo-classes and pseudo-selectors.

After that, I dunno… Acid2?

Anyway, that’s me getting ahead of myself. I mean, it doesn’t even render block elements yet (since it has no way of setting an element to be a block element, due to the whole no-CSS-being-applied-yet thing).

If you are curious, you can check it out from my public git repo.

I warn you, the code is not really commented too much (except for in the layout section, there’s a whole outline of how that’s all supposed to work in there). If you want to see the magic of how it renders now, go ahead and run getgoogle.py (which, ironically, doesn’t even get Google at this point, since Google has some JavaScript which it wants to render as text…). You’ll need pygame installed to run it.

I’ll save you some time, though. It looks like this:

But hopefully not for long :)

Storing Hierarchical Data in CouchDB

Friday, July 4th, 2008

Much to my surprise, my last post generated more traffic in a single day than my blog has ever gotten in a single month. Apparently people are quite interested in making web applications with Python. I’ve started on part two, but since so many people showed interest I want to spend more time on it than I spent on the last one. So instead, you get this post.

So I’ve been fiddling around with CouchDB lately. Since it’s common to store tree-based data, and it’s kind of a pain to do so in your standard relational DB, I thought it would be a good exercise to see how hard it is to store hierarchical data in CouchDB.

Turns out it’s pretty easy.

(more…)

Building a Python Web Application, Part 1

Thursday, June 26th, 2008
Edit: I’ve cleaned up the longer example, using Python’s string.Template class for the templates. I’ve also set up a git repo for the source that will go along with posts to this series: Python Webapp Gitweb

Recently, I’ve been interested in writing web applications in Python, and one of the fun things that I discovered was the Python Web Server Gateway Interface, which is a standard interface for Python web servers, web applications, and something called middleware which can sit between the two.

One of the coolest things about WSGI is the fact that you now don’t have to decide on a specific web server before you start coding. In fact, the Python wsgiref module comes with a built-in simple web server which allows you to start coding up your web application with nothing but a bare install of Python 2.5 (or higher, of course)!
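As a rough illustration of how little is needed, here is a minimal sketch of a WSGI application served by wsgiref. It is written for Python 3 rather than the Python 2.5 of the time, and the names application and serve, the host, and the port are all arbitrary choices for this example.

```python
from wsgiref.simple_server import make_server

def application(environ, start_response):
    """A WSGI app: takes the request environ, returns an iterable of bytes."""
    body = b"Hello, World!\n"
    headers = [("Content-Type", "text/plain; charset=utf-8"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]

def serve(host="localhost", port=8000):
    """Run the app on wsgiref's built-in development server until interrupted."""
    with make_server(host, port, application) as server:
        server.serve_forever()

# Call serve() and visit http://localhost:8000/ to try it out.
```

Because the application is just a callable, it can be handed to any other WSGI server unchanged.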

There are plenty of overviews of WSGI out there, so I won’t bother creating yet another in-depth explanation. What I will do, though, is show you how easy it is to get started.

Your basic “Hello, World!” application can be accomplished, server and all, with as little as the following:

(more…)

Dead code in Python-generated bytecode

Tuesday, April 22nd, 2008

So I’ve made a couple of changes to Papaya (yeah, it’s called Papaya now):

  • As suggested by Phillip J. Eby, rather than generating the bytecode myself, I’m now using BytecodeAssembler, which has shortened and simplified my code a bit (though honestly not as much as I originally thought it would). I had already considered doing this before I wrote it all myself, but I wanted to get the educational benefit of doing it all from scratch.

  • I’ve changed the syntax for function definitions to match that of Python’s (minus the closing ‘:’), which also means that I’ve added support for *args and **kwargs parameters. Also, since I’m using BytecodeAssembler, you get the automatic parameter unpacking described here when using nested positional arguments. Of course, this will currently get duplicated if you decompile and then recompile code. I haven’t decided what to do about this yet.

  • You no longer need to specify a label for any of the SETUP_* instructions, since BytecodeAssembler handles this for you as well.

  • I added a setup.py file, which uses ez_setup and can build a .egg file and other fancy things. I will add this project into pypi as soon as I resolve the issue I’m about to talk about below.

  • You no longer need to specify the stack size of a given block of code, it will be calculated for you by BytecodeAssembler.

So, due to my use of BytecodeAssembler, I get free stack size calculations, but I get another feature which is somewhat annoying: dead-code prevention.

Why is this annoying? Because the Python compiler generates dead code all the time.

What this means is, if you decompile any non-trivial (and some quite-trivial) .pyc files created by Python, and then try to recompile them, it will fail with an “AssertionError: Unknown stack size at this location” message.

For example, take the following, very simple .py file:

while True:
    if True:
        continue
    break

This is disassembled into the following:

   SETUP_LOOP
  label0:
    LOAD_NAME True
    JUMP_IF_FALSE label3
    POP_TOP
    LOAD_NAME True
    JUMP_IF_FALSE label1
    POP_TOP
    JUMP_ABSOLUTE label0
    JUMP_FORWARD label2
  label1:
    POP_TOP
  label2:
    BREAK_LOOP
    JUMP_ABSOLUTE label0
  label3:
    POP_TOP
    POP_BLOCK
    LOAD_CONST None
    RETURN_VALUE

Note the double JUMP. This is generated any time you have a continue statement, despite the fact that the second jump can never be run. Also unnecessary is the JUMP_ABSOLUTE after BREAK_LOOP.

Both of these cause an error in BytecodeAssembler because it has no context from which to determine the stack size at that point. Of course, that doesn’t really matter since the code will never be run.
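One way to see which instructions are dead is a simple reachability walk over the listing. The sketch below is only an illustration, not how BytecodeAssembler or Papaya work: the tuple format is invented for this example, and BREAK_LOOP is treated as simply ending the path (its real target, the end of the loop block, is not modelled).

```python
# Opcodes after which execution never falls through to the next instruction.
NO_FALLTHROUGH = {"JUMP_ABSOLUTE", "JUMP_FORWARD", "RETURN_VALUE", "BREAK_LOOP"}

def find_dead_code(program):
    """program: list of (label, opcode, argument); returns unreachable indices."""
    labels = {label: i for i, (label, _, _) in enumerate(program) if label}
    reachable, pending = set(), [0]
    while pending:
        i = pending.pop()
        if i in reachable or i >= len(program):
            continue
        reachable.add(i)
        _, opcode, argument = program[i]
        if argument in labels:              # jump target
            pending.append(labels[argument])
        if opcode not in NO_FALLTHROUGH:    # next instruction in sequence
            pending.append(i + 1)
    return [i for i in range(len(program)) if i not in reachable]

PROGRAM = [
    (None, "SETUP_LOOP", None),
    ("label0", "LOAD_NAME", "True"),
    (None, "JUMP_IF_FALSE", "label3"),
    (None, "POP_TOP", None),
    (None, "LOAD_NAME", "True"),
    (None, "JUMP_IF_FALSE", "label1"),
    (None, "POP_TOP", None),
    (None, "JUMP_ABSOLUTE", "label0"),
    (None, "JUMP_FORWARD", "label2"),
    ("label1", "POP_TOP", None),
    ("label2", "BREAK_LOOP", None),
    (None, "JUMP_ABSOLUTE", "label0"),
    ("label3", "POP_TOP", None),
    (None, "POP_BLOCK", None),
    (None, "LOAD_CONST", "None"),
    (None, "RETURN_VALUE", None),
]

print(find_dead_code(PROGRAM))  # [8, 11]
```

Indices 8 and 11 are the JUMP_FORWARD after the continue and the JUMP_ABSOLUTE after BREAK_LOOP. A pass like this could, in principle, be used to drop such instructions before assembling, though whether that is the right fix here is an open question.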

I’m currently stumped as to the best way to solve this issue, and I’m tired and don’t want to think about it any more. :(

PPya: Python Assembler

Friday, April 18th, 2008

Over the last few days I’ve hacked together a Python Assembler/Disassembler. I’ve called it PPya (pronounced like “papaya,” the fruit), short for Paul’s Python assembler. The ‘a’ is left lowercase because it looks better that way.

Each of those days I started to write up this blog post, but then got distracted working on it some more.

It’s at the point now where it is fairly usable, both as a learning tool and as a tool for writing Python modules in assembly if you feel so inclined.

If you want to check it out, the gitweb project page is here: http://git.paulbonser.com/?p=ppya.git;a=summary

or you can git clone it:

git clone git://git.paulbonser.com/git/ppya.git/

or, if you’re behind a firewall or something

git clone http://git.paulbonser.com/git/ppya.git/

PPya Overview

A .pya file consists of a series of bytecodes (well, strings representing them, anyway) followed by parameters for those instructions which take parameters. When assembled, these parameters are converted to indices into a tuple in a python Code object, one of co_names, co_consts, co_varnames, co_cellvars, or co_freevars.
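To see the tuples those indices point into, you can inspect a code object directly. This is a small sketch using only the built-in compile function and the standard code object attributes; the source string and the add function are arbitrary examples. The exact contents and ordering of co_names and co_consts vary between Python versions, so only the local variable names are shown as expected output.

```python
source = "import math\nroot = math.sqrt(123456)\n"
module_code = compile(source, "<example>", "exec")

# Names referenced by the module-level code (globals, attributes, imports).
print(module_code.co_names)
# Constants used by the code, such as 123456.
print(module_code.co_consts)

def add(a, b):
    total = a + b
    return total

# Arguments first, then other local variables, in order of first use.
print(add.__code__.co_varnames)  # ('a', 'b', 'total')
```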

(more…)

Python IMPORT_NAME bytecode mystery

Monday, April 14th, 2008

I’ve been messing around with Python bytecode this weekend, which is why there was no Sunday post (as I’m writing this it’s very early Monday morning…).

There’s some fun stuff to wrap your head around when it comes to Python bytecode: the structure of the virtual machine, the order of bytes in bytecode arguments, and the instructions which require magical numbers to be pushed onto the stack before being called.

I’ll go into more detail about what I’m doing mucking about with the Python internals tomorrow (later today, ugh! I need to go to bed!), but for now I’ll share one of the little mysteries that I’ve run into and managed to figure out.

IMPORT_NAME

IMPORT_NAME is the opcode you use to import another module. The description from the Python bytecode docs is this:

IMPORT_NAME namei

Imports the module co_names[namei]. The module object is pushed onto the stack. The current namespace is not affected: for a proper import statement, a subsequent STORE_FAST instruction modifies the namespace.

Basically, each code object has a tuple of names, and namei is an index into that tuple, pointing to the name of the module you want to import. When this instruction is run you end up with the module on the stack, and you can then bind it to a name and access items within it.

Not mentioned here is the fact that you have to push two extra parameters onto the stack before calling IMPORT_NAME, otherwise the Python interpreter segfaults when it hits that instruction.

import sys

is compiled by the python compiler into the following (as output by dis.disassemble):

   0 LOAD_CONST               1 (-1)
   3 LOAD_CONST               0 (None)
   6 IMPORT_NAME              0 (sys)
   9 STORE_FAST               0 (sys)

I tried changing the -1 to a different number and I got “ValueError: Attempted relative import in non-package”

Looking into Python/ceval.c in the Python interpreter source code, I see that these two extra parameters are passed in as the last two parameters to the builtin __import__ function, fromlist and level.

According to help(__import__) the -1 for level indicates it should try both absolute and relative imports, and the fromlist is empty because this isn’t a “from sys import …” statement.

from sys import argv, subversion, byteorder

compiles into

    0 LOAD_CONST               1 (-1)
    3 LOAD_CONST               2 (('argv', 'subversion', 'byteorder'))
    6 IMPORT_NAME              0 (sys)
    9 IMPORT_FROM              1 (argv)
   12 STORE_FAST               0 (argv)
   15 IMPORT_FROM              2 (subversion)
   18 STORE_FAST               1 (subversion)
   21 IMPORT_FROM              3 (byteorder)
   24 STORE_FAST               2 (byteorder)
   27 POP_TOP            

It’s not immediately obvious why it needs to do the extra calls to IMPORT_FROM, but the reason seems to be that __import__ doesn’t actually do anything with the fromlist argument.

Anyway, I need to sleep now.

Note to self: submit a Python bugtracker issue about the undocumented required parameters. (Done: issue 2631.)

Update:

I also posted this as a response to Thomas Lee’s response, but I’ve duplicated it here for easier finding.

After looking at the docs for __import__, it’s interesting to see that the values in fromlist aren’t actually used, but it is significant whether the fromlist is empty or non-empty.

When the name variable is of the form package.module, normally, the top-level package (the name up till the first dot) is returned, not the module named by name. However, when a non-empty fromlist argument is given, the module named by name is returned. This is done for compatibility with the bytecode generated for the different kinds of import statement; when using “import spam.ham.eggs”, the top-level package spam must be placed in the importing namespace, but when using “from spam.ham import eggs”, the spam.ham subpackage must be used to find the eggs variable.
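The empty versus non-empty fromlist behaviour is easy to check by calling __import__ directly. This sketch uses Python 3, where level defaults to 0 (absolute imports only) and the old -1 value is no longer accepted; the xml.dom package is just a convenient standard-library example.

```python
# Empty fromlist: the top-level package is returned, as for "import xml.dom".
top = __import__("xml.dom")
print(top.__name__)  # xml

# Non-empty fromlist: the module named by the first argument is returned,
# as needed for "from xml.dom import minidom".
sub = __import__("xml.dom", fromlist=["minidom"])
print(sub.__name__)  # xml.dom
```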

I suppose the reason that fromlist isn’t just changed to a boolean is that it might be significant in a setup where __import__ is redefined for custom importing, like Thomas said.

Mini Stack Language in Python

Thursday, April 10th, 2008

Are you tired of the lack of challenge in programming? Do you pine for a syntax reminiscent of your old HP RPN calculator? Well then do I have the programming language for you!

Over the last few days I’ve been working on a little stack-based language in Python.

I’ve got lots of ideas for programming languages, but a lot of them are pretty complex. I don’t really have any experience implementing programming languages, so I figured I’d start small and work my way up.

In this case, small means 135 lines of code (not counting the GPL notice) without many docstrings or comments. It’s small, so you should be able to figure it all out by the code (I hope you can, because I didn’t comment it that much).

Here’s an outline of some of the features of the language:

Programs are written in Reverse Polish Notation. Data (strings, numbers, or code blocks) are pushed onto the stack as they are encountered. When a ‘word’ is encountered, it is executed, and may do stuff to the stack.

There is a consistent syntax, or rather, mostly a lack thereof. The only reserved characters are single and double quotes and curly brackets (‘{’ and ‘}’), and only when they appear after whitespace. Quote characters can even be used as parts of words, just not as the first character.

All other language functionality is handled through the calling of words. There is no distinction made between built-in words and user-defined ones.

Examples

The ever-popular Hello world program:

'Hello, World!' echo_nl

Pretty straightforward. Push the string onto the stack and then print it out (with a newline). The echo and echo_nl commands both pop the top item from the stack and print it out.

Defining new words is as simple as pushing a name and block onto the stack and then calling the add_word word.

'square' {dup *} add_word

This example calls dup, which pushes a copy of the top item on the stack. It then calls *, which pops the top two items from the stack and pushes the result of their multiplication.
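For a feel of how little machinery this takes, here is a tiny interpreter sketch that handles just the pieces used so far: strings, numbers, blocks, dup, * and add_word. It is not the actual stacklang.py; the function names and the choice to store blocks as unevaluated source text are assumptions made for this example.

```python
def parse_number(token):
    """Return token as an int or float, or None if it is not numeric."""
    for convert in (int, float):
        try:
            return convert(token)
        except ValueError:
            pass
    return None

def tokenize(source):
    """Yield (kind, value) pairs: 'str', 'block', 'num' or 'word'."""
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch in "'\"":                      # quoted string
            end = source.index(ch, i + 1)
            yield "str", source[i + 1:end]
            i = end + 1
        elif ch == "{":                        # code block, may be nested
            depth, j = 1, i + 1
            while depth:
                depth += {"{": 1, "}": -1}.get(source[j], 0)
                j += 1
            yield "block", source[i + 1:j - 1]
            i = j
        else:                                  # number or word
            j = i
            while j < len(source) and not source[j].isspace():
                j += 1
            number = parse_number(source[i:j])
            yield ("word", source[i:j]) if number is None else ("num", number)
            i = j

def run(source, stack, words):
    """Push data as it is encountered; execute words immediately."""
    for kind, value in tokenize(source):
        if kind == "word":
            words[value](stack, words)
        else:
            stack.append(value)    # blocks are kept as unevaluated source text

def dup(stack, words):
    stack.append(stack[-1])

def multiply(stack, words):
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)

def add_word(stack, words):
    block, name = stack.pop(), stack.pop()
    words[name] = lambda s, w: run(block, s, w)

WORDS = {"dup": dup, "*": multiply, "add_word": add_word}
stack = []
run("'square' {dup *} add_word 7 square", stack, WORDS)
print(stack)  # [49]
```

User-defined words end up in the same dictionary as the built-in ones, which is one simple way to get the "no distinction" property described above.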

Some user input and if and else:

'story' {
    'do you want to hear a story?' echo_nl
    read_string
    'yes'
    =
        {'Once upon a time…TODO: write story' echo_nl}
    if
        {"Aww, you don't want to hear my story." echo_nl}
    else
} add_word
story

Note that since this is postfix notation, the words ‘if’ and ‘else’ appear after the blocks they run. Also, these are not keywords; they are just like any other built-in word. ‘if’ pushes a 1 onto the stack after running its block, or a 0 if it doesn’t run its block. Then ‘else’ (once I actually write it) will run its block if the top value on the stack is a 0; otherwise it won’t run its block. I might also add an ‘elif’ function, which will do the same as else while also pushing a result back onto the stack.
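The flag-passing protocol between ‘if’ and ‘else’ can be sketched in a few lines of Python. This is an illustration of the description above, not the real implementation: blocks are modelled as plain Python callables that take the stack, and word_if, word_else and story_reply are hypothetical names.

```python
def word_if(stack):
    """Pop a block and a condition; run the block if true, then push 1 or 0."""
    block = stack.pop()
    condition = stack.pop()
    if condition:
        block(stack)
        stack.append(1)
    else:
        stack.append(0)

def word_else(stack):
    """Pop a block and the flag left by 'if'; run the block only if it is 0."""
    block = stack.pop()
    ran = stack.pop()
    if ran == 0:
        block(stack)

def story_reply(answer):
    """Mimic the story example: compare, then 'if' block, then 'else' block."""
    output = []
    stack = [answer == "yes"]
    stack.append(lambda s: output.append("Once upon a time…"))
    word_if(stack)
    stack.append(lambda s: output.append("Aww, you don't want to hear my story."))
    word_else(stack)
    return output, stack

print(story_reply("yes"))  # (['Once upon a time…'], [])
print(story_reply("no"))   # (["Aww, you don't want to hear my story."], [])
```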

(I just realized that I didn’t actually implement the else function. I’ll fix that later.)

Check out the code for more details. Really, it’s not so hard to read, I think.

Work in progress

This language is somewhat usable at the moment, but it’s still in progress. I’ll probably change the names of all the words, and probably just start calling them functions rather than words, since that name is confusing (shame on you, FORTH, for inspiring me to call them that).

I’ll probably add in more words, err… functions, to enable functional programming à la Joy.

I don’t yet have a specific goal for this language, since I’m writing it for my own education. It might dawn on me that there is a perfect application for this language, which will then shape the future direction. If nothing else, it will be a fun toy for those who would like to create their own simple programming language, since it’s such a small piece of code at the moment.

I might also try to write a compiler to compile it down into Python bytecode, again as an exercise in personal education.

If you’re interested, the code can be gotten with git:

git clone http://paulbonser.com/code/stacklang.git

or the current version (0.0.1.1) is available here: stacklang.py

Got any great ideas of what I could do with this language? Using the code for something cool yourself? Got a patch for me? Let me know!

Python DOM HTML functional

Sunday, January 13th, 2008

DOM HTML Progress

Well, I’ve made some progress on the HTML layout engine, but it still isn’t complete enough to run yet.

When I got to the point where I needed to call ViewCSS.getComputedStyle from the DOM, I stopped to actually implement it, and decided that it was a good time to actually see if the DOM HTML code I had written would run.

It didn’t, of course.

So I spent some time fixing all the little bugs here and there, and set up some test code to pull a page from the internet and parse it into a full HTML DOM tree.

Since I’m using pxdom’s parsing functions and pxdom only knows how to parse proper XML, I also run the HTML through the python tidy lib first to ensure that it’s proper XHTML. Without doing that I couldn’t even parse the Google home page.
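The reason for the tidy step is easy to demonstrate with any strict XML parser. The sketch below uses the standard library's xml.dom.minidom as a stand-in for pxdom (an assumption, since pxdom may not be installed), and the try_parse helper is made up for this example.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

def try_parse(markup):
    """Return a DOM document, or None if the markup is not well-formed XML."""
    try:
        return minidom.parseString(markup)
    except ExpatError:
        return None

# Typical HTML: the <br> is never closed, so an XML parser rejects it.
html_soup = "<html><body><p>line one<br>line two</p></body></html>"
# The same content as XHTML, which is what a tidy pass would produce.
xhtml = "<html><body><p>line one<br/>line two</p></body></html>"

print(try_parse(html_soup) is None)  # True
doc = try_parse(xhtml)
print(doc.getElementsByTagName("p")[0].firstChild.data)  # line one
```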

Here it is if you’d like to check it out. It needs pxdom to work. The parseString function will take a string containing HTML and return an HTMLDocument.

Right now it’s usable for manipulating and traversing the DOM with all the attributes you are used to being able to access from JavaScript, and if you combine it with this simpleget module (requires domhtml and utidy from above), you can use it for some basic web scraping purposes.

Remember, it is basically pre-alpha code, since I haven’t tested everything yet. I might get around to writing up some unit tests at some point, but until then I can’t guarantee that there are no errors.

JavaScript update

I didn’t spend all my time in the last couple of days just fixing up my DOM HTML implementation. I also did some research on JavaScript interpreters, and I think I’ve decided that I’m going to wrap SEE, a JavaScript interpreter written in C, with a Python module and use it for the scripting engine for my browser.

I considered using Spidermonkey, but after looking through the documentation for each, SEE seems like it will be much easier to wrap, and as far as I can tell, it supports JavaScript exceptions in an easier-to-use (and easier-to-integrate-with-Python-exceptions) way than Spidermonkey does.

SEE also handles memory management for you, and you can fully separate interpreter instances so you don’t have to worry about thread safety, which makes two fewer things to worry about.

Quick Python Browser update

Friday, January 11th, 2008

A quick update since I’ve not posted anything for over a week now.

I’ve written some, and outlined a bunch more, of the code for the browser layout engine.

I don’t have a name for the browser yet, but there are a few ideas floating around.

I’m considering my options for a JavaScript Engine. There are several implementations in C or some such similar language. There’s also the possibility of a translator into Python, which could then be bytecode compiled and run inside a sandbox environment. Or a pure Python implementation, but I think that would be too much work.

I’m going to be sure to keep the layout engine separate from the rendering engine (or as much as possible, at least), so that I can change my mind later about what I want to use if I need to.

I will probably have more to talk about tomorrow or Saturday, but I needed to break the silence.

As an added bonus, and to make this post longer, here’s the output of the really crappy, incomplete HTML rendering engine that I hacked together in a single night for a class project (click the image for the full page render):

Old Renderer Test Thumbnail

Notice the lack of padding and the failure of most lines to wrap properly. Hey, I wrote it in a very short period of time. The point was to write a program which was object-oriented, not one that did anything useful :)

In case you were wondering, that’s a render of my old blog, which was hosted on the Oregon State servers (still is, actually, but they should be taking it down any day now…)

So hopefully by tomorrow night I should have a similar, but better rendered picture to post. Does anyone have suggestions for the best font engine for Python? I know there are some FreeType wrappers written, but is there anything which adds more Python-y functionality as well?

Python Web Browser on the way

Sunday, December 30th, 2007

I’ve spent a good chunk of my vacation working on some of what will become the internals of a web browser written in Python.

Some of the goals of the browser include:

  • Full conformance to all DOM 2 Modules (and equivalent DOM 3 modules when they become recommendations). This goal is already about 60% done.
  • Full conformance to CSS2.1, and eventually CSS3
  • JavaScript support.
  • SVG and Canvas support
  • Little or no explicit support for deprecated standards and technologies (yes, this is a feature).

The most important features, and thus the ones getting the most attention, will be standards compliance and JavaScript support. Standards compliance is important because I want this browser to be an example of a browser which people should take seriously. I won’t, however, do extra work because somebody out there decided to not follow the standard when designing their webpage. JavaScript support is important for the same reason. Nobody is going to take a browser seriously (or be able to use it for any modern website) if it doesn’t support JavaScript.

So far I’ve got a complete (but also completely untested) implementation of the DOM 2 HTML, which took me a good deal longer than expected.

I started with a good base: pxdom, a complete implementation of DOM 3 Core and LS (Load and Save), and implemented my additions on top of that. My code is still a separate module, but there are a few places where I rely on some of the implementation-specific details of pxdom. I have plans to remove the dependency at some point so that I can swap in other DOM core implementations.

On top of that I built my DOM HTML implementation, and laid a little bit of groundwork for DOM StyleSheets and CSS. I’ll be using cssutils, which is a mostly complete python DOM CSS implementation. The cssutils version 0.9.4b1 was just released with mention of some sort of selector support being added in version 0.9.5, which will hopefully make it so I don’t have to do a full CSS cascade implementation myself.

I’m taking a break from working on DOM implementations now and moving back to something which will actually allow me to see results: the rendering engine. I’m starting with just a stub implementation of the ViewCSS interface which allows me to use the getComputedStyle function to get the default style for any given element. With that, I should be able to render any HTML document as if it had no style applied to it. Later on I will hopefully be able to use the upcoming selector support in cssutils to make getComputedStyle work as expected.
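A stub along those lines could be as simple as a per-tag table of defaults. The sketch below is only a guess at the shape of such a stub: the real ViewCSS.getComputedStyle takes an element and a pseudo-element argument, whereas this hypothetical get_computed_style takes a tag name, and the property values are illustrative rather than a complete default style sheet.

```python
# Fallback values used for any element without a more specific default.
DEFAULT_STYLE = {"display": "inline", "font-weight": "normal", "margin-top": "0"}

# Per-tag overrides, standing in for a browser's built-in style sheet.
TAG_DEFAULTS = {
    "p": {"display": "block", "margin-top": "1em"},
    "div": {"display": "block"},
    "b": {"font-weight": "bold"},
    "head": {"display": "none"},
}

def get_computed_style(tag_name):
    """Return the default style for a tag, ignoring any author style sheets."""
    style = dict(DEFAULT_STYLE)
    style.update(TAG_DEFAULTS.get(tag_name.lower(), {}))
    return style

print(get_computed_style("P")["display"])     # block
print(get_computed_style("span")["display"])  # inline
```

Once real selector matching is available, the same function could start from these defaults and layer the matched rules on top.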

This browser is something that I’ve been wanting to do for a very long time. I even sort of started to implement the rendering engine a long time ago using pycairo as the backend. I’m going to stick with that because it seems to be an ideal rendering backend for webpages (which would explain why it will be the only thing used for rendering as of Firefox 3.0 :) ), and for eventual SVG and Canvas support. Once I get to the part where I’m actually working on the user interface portion, I’m planning on writing a Python binding for glitz, which will provide the browser with OpenGL accelerated rendering by default.
