Archive for the ‘Browser’ Category

Python HTML Layout Engine Progress

Saturday, September 13th, 2008

I’ve made some progress on my Python web browser. It’s nothing earth-shattering at the moment, but it does take all the text from a web page and render it.

It currently treats each element (including the ones in the head, actually, I need to fix that) as an inline text element. It doesn’t quite do proper whitespace compression between elements, either, leading to some multiple spaces in certain places. What it does do nicely is the splitting on lines in reasonable places.
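As a rough illustration of those two steps (not the code in the repo), here is a minimal sketch of whitespace collapsing followed by greedy line breaking. The function names are made up for this example, and `len()` stands in for real font metrics, which is an assumption; an actual renderer would measure text with its font engine.

```python
import re


def collapse_whitespace(fragments):
    """Concatenate inline text fragments and collapse each whitespace run to one space."""
    joined = "".join(fragments)
    return re.sub(r"\s+", " ", joined).strip()


def break_lines(text, max_width, measure=len):
    """Greedy line breaking: put whole words on a line until the next one would not fit.

    `measure` returns the width of a string; len() is a placeholder for font metrics.
    """
    lines = []
    current = ""
    for word in text.split(" "):
        candidate = word if not current else current + " " + word
        if current and measure(candidate) > max_width:
            # The word does not fit, so close the current line and start a new one.
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


text = collapse_whitespace(["the  quick ", " brown\n", "fox jumps"])
wrapped = break_lines(text, max_width=10)
```

Collapsing after concatenation, rather than per element, is what avoids the doubled spaces at element boundaries.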

The part I’m working on next is the application of CSS rules to the document. I’m considering a couple of different ways of walking the DOM tree and cascading the rules into each element. Either way, it boils down to walking the tree, matching CSS selectors against each element, and applying the rules for the elements that match, taking into account the specificity of the matching selector so that the proper rule takes precedence.
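To make the idea concrete, here is a minimal sketch of one such approach. It is not the browser’s code: `Element` is a hypothetical stand-in for a DOM element, and only compound selectors made of a tag name, classes, and an id (such as `p.note#intro`) are handled. Combinators, attribute selectors, `!important`, and inheritance are all left out.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical stand-in for a DOM element."""
    tag: str
    id: str = ""
    classes: tuple = ()
    children: list = field(default_factory=list)
    style: dict = field(default_factory=dict)


_PART = re.compile(r"([#.]?)([\w-]+)")


def parse_selector(selector):
    """Split a compound selector such as 'p.note#intro' into (kind, name) pairs."""
    return _PART.findall(selector)


def specificity(selector):
    """Return (ids, classes, types); tuples compare in the order the cascade needs."""
    parts = parse_selector(selector)
    ids = sum(1 for kind, _ in parts if kind == "#")
    classes = sum(1 for kind, _ in parts if kind == ".")
    types = sum(1 for kind, _ in parts if kind == "")
    return (ids, classes, types)


def matches(element, selector):
    """True if every part of the compound selector matches the element."""
    for kind, name in parse_selector(selector):
        if kind == "#" and element.id != name:
            return False
        if kind == "." and name not in element.classes:
            return False
        if kind == "" and element.tag != name:
            return False
    return True


def apply_rules(root, rules):
    """Apply (selector, declarations) rules, given in source order, to a whole tree."""
    # sorted() is stable, so among equal specificities the later rule still wins.
    ordered = sorted(rules, key=lambda rule: specificity(rule[0]))
    stack = [root]
    while stack:
        element = stack.pop()
        for selector, declarations in ordered:
            if matches(element, selector):
                element.style.update(declarations)
        stack.extend(element.children)


intro = Element("p", id="intro", classes=("note",))
plain = Element("p")
body = Element("body", children=[intro, plain])
apply_rules(body, [
    ("#intro", {"color": "red"}),
    ("p", {"color": "black", "margin": "1em"}),
    ("p.note", {"color": "blue"}),
])
```

This version tests every rule against every element, so it uses almost no extra memory but does the most matching work. Indexing the rules by tag, class, or id first would trade memory for speed.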

I haven’t sat down and figured out the big O of each yet, but I think it’s going to be a memory vs. speed decision.

Once I have styles applying to elements, I’ll probably work on getting all the standard HTML4 CSS rules rendering properly. After that, I’ll move on to more thorough block element handling, replaced elements (images, form elements), and probably pseudo-classes and pseudo-elements.

After that, I dunno… Acid2?

Anyway, that’s me getting ahead of myself. I mean, it doesn’t even render block elements yet (since it has no way of setting an element to be a block element, due to the whole no CSS being applied yet thing).

If you are curious, you can check it out from my public git repo.

I warn you, the code is not really commented too much (except for in the layout section, there’s a whole outline of how that’s all supposed to work in there). If you want to see the magic of how it renders now, go ahead and run getgoogle.py (which, ironically, doesn’t even get Google at this point, since Google has some JavaScript which it wants to render as text…). You’ll need pygame installed to run it.

I’ll save you some time, though. It looks like this:

But hopefully not for long :)

On productivity, flow, and the “big block”

Tuesday, January 15th, 2008

The “Big Block” Method

Joshua Clanton posted a guest post on Jarkko Laine’s blog yesterday, talking about how to get into a state of “flow” to increase your productivity.

I commented that I seem to be able to get into a state of flow better when I have a large chunk of time allocated to work on a task.

To me there’s nothing worse than coming back to something I had to stop in the middle of. It takes far too much time to get back up to speed on what I was doing, and often I’ll have forgotten about something essential, which results in bugs.

Joshua pointed out in another blog post that it helps flow to have not only big blocks of time but also to have your tasks divided into big blocks as well.

I definitely agree with Joshua on the “big block method,” though this is the first time I’ve put much conscious thought into it. Looking back, it seems that I’ve done better work when I’ve used this method, and I’ve always hated working in short bursts, since it feels like I’m wasting too much time in “context switching.”

How big of a block?

There’s one question I’m not really sure of the answer to yet: How big should a big block be?

The first answer that comes to mind is “Just big enough, but no bigger” (a modified Einstein quote)

So far my technique has been to just pick what seems to be a single functional unit, or perhaps a piece of functionality that all belongs together.

I suppose that different people will have different definitions of what a big block task is, but to me it works out that it’s something that is a complete unit. A complete unit can be a full module, or one step in the implementation of a full module. The key is, though, that there is a clear stopping point and something isn’t left unfinished.

For example, working on my browser, I have index cards (old business cards, actually) which each have an individual task on them. I sort these cards by the order I plan to do them in. These cards contain something like “Implement DOM 2 HTML” or “Rendering Engine support for inline elements.”

The former is a complete module which could be (and was) finished in one big block of time.

The latter won’t result in the final version of the layout engine, but it will get it to a point where it is functioning, and then the next step would be to support block elements. Absolutely and relatively positioned elements would be yet another separate task.

I’d be very interested to hear more ideas and thoughts about the big block method. How about adding a comment or writing a blog post with a trackback?

Python DOM HTML functional

Sunday, January 13th, 2008

DOM HTML Progress

Well, I’ve made some progress on the HTML layout engine, but it still isn’t complete enough to run yet.

When I got to the point where I needed to call ViewCSS.getComputedStyle from the DOM, I stopped to actually implement it, and decided that it was a good time to actually see if the DOM HTML code I had written would run.

It didn’t, of course.

So I spent some time fixing all the little bugs here and there, and set up some test code to pull a page from the internet and parse it into a full HTML DOM tree.

Since I’m using pxdom’s parsing functions and pxdom only knows how to parse proper XML, I also run the HTML through the python tidy lib first to ensure that it’s proper XHTML. Without doing that I couldn’t even parse the Google home page.
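To show why the tidy step matters, here is a small sketch using the standard library’s `xml.dom.minidom` as a stand-in for pxdom (a substitution made for this example; pxdom’s and tidy’s own APIs are not shown). A strict XML parser rejects ordinary tag-soup HTML but accepts the same content once it is well-formed XHTML.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parse_strict(markup):
    """Parse markup with a strict XML parser; return None if it is not well-formed."""
    try:
        return minidom.parseString(markup)
    except ExpatError:
        return None


# Typical HTML: <br> is never closed and </p> is omitted.
tag_soup = "<html><body><p>Hello<br>world</body></html>"
# The same content as well-formed XHTML, written by hand here in place of tidy's output.
xhtml = "<html><body><p>Hello<br/>world</p></body></html>"

rejected = parse_strict(tag_soup)
document = parse_strict(xhtml)
first_text = document.getElementsByTagName("p")[0].firstChild.data
```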

Here it is if you’d like to check it out. It needs pxdom to work. The parseString function will take a string containing HTML and return an HTMLDocument.

Right now it’s usable for manipulating and traversing the DOM with all the attributes you are used to being able to access from JavaScript, and if you combine it with this simpleget module (requires domhtml and utidy from above), you can use it for some basic web scraping.

Remember, it is basically pre-alpha code, since I haven’t tested everything yet. I might get around to writing up some unit tests at some point, but until then I can’t guarantee that there are no errors.

JavaScript update

I didn’t spend all my time in the last couple of days just fixing up my DOM HTML implementation. I also did some research on JavaScript interpreters, and I think I’ve decided that I’m going to wrap SEE, a JavaScript interpreter written in C, with a Python module and use it for the scripting engine for my browser.

I considered using SpiderMonkey, but after looking through the documentation for each, SEE seems like it will be much easier to wrap, and as far as I can tell, it supports JavaScript exceptions in an easier-to-use (and easier-to-integrate-with-Python-exceptions) way than SpiderMonkey does.

SEE also handles memory management for you, and you can fully separate interpreter instances so you don’t have to worry about thread safety, which means two fewer things to worry about.

Quick Python Browser update

Friday, January 11th, 2008

A quick update since I’ve not posted anything for over a week now.

I’ve written some, and outlined a bunch more, of the code for the browser layout engine.

I don’t have a name for the browser yet, but there are a few ideas floating around.

I’m considering my options for a JavaScript engine. There are several implementations in C or some similar language. There’s also the possibility of a translator into Python, which could then be bytecode compiled and run inside a sandbox environment. Or a pure Python implementation, but I think that would be too much work.

I’m going to be sure to keep the layout engine separate from the rendering engine (or as much as possible, at least), so that I can change my mind later about what I want to use if I need to.
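Here is a minimal sketch of what that separation could look like. All of the names (`TextBox`, `Renderer`, `layout`, `paint`, `RecordingRenderer`) are hypothetical and not taken from the actual code: the layout step only produces positioned boxes, and anything with a `draw_text` method can act as the rendering backend.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class TextBox:
    """Output of layout: a piece of text and where it goes."""
    x: int
    y: int
    text: str


class Renderer(Protocol):
    """The only thing the rest of the code needs to know about a backend."""
    def draw_text(self, x: int, y: int, text: str) -> None: ...


def layout(lines, line_height=16):
    """Layout step: assign a position to each line without drawing anything."""
    return [TextBox(0, index * line_height, line) for index, line in enumerate(lines)]


def paint(boxes, renderer):
    """Rendering step: hand each positioned box to whichever backend was chosen."""
    for box in boxes:
        renderer.draw_text(box.x, box.y, box.text)


class RecordingRenderer:
    """A backend that only records draw calls, handy for testing layout."""
    def __init__(self):
        self.calls = []

    def draw_text(self, x, y, text):
        self.calls.append((x, y, text))


recorder = RecordingRenderer()
paint(layout(["first line", "second line"]), recorder)
```

Swapping the backend later would then mean writing one more class with the same `draw_text` method, while `layout` stays untouched.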

I will probably have more to talk about tomorrow or Saturday, but I needed to break the silence.

As an added bonus, and to make this post longer, here’s the output of the really crappy, incomplete HTML rendering engine that I hacked together in a single night for a class project (click the image for the full page render):

Old Renderer Test Thumbnail

Notice the lack of padding and the failure of most lines to wrap properly. Hey, I wrote it in a very short period of time. The point was to write a program which was object-oriented, not one that did anything useful :)

In case you were wondering, that’s a render of my old blog, which was hosted on the Oregon State servers (still is, actually, but they should be taking it down any day now…)

So hopefully by tomorrow night I should have a similar, but better rendered picture to post. Does anyone have suggestions for the best font engine for Python? I know there are some FreeType wrappers written, but is there anything which adds more Python-y functionality as well?

Python Web Browser on the way

Sunday, December 30th, 2007

I’ve spent a good chunk of my vacation working on some of what will become the internals of a web browser written in Python.

Some of the goals of the browser include:

  • Full conformance to all DOM 2 Modules (and equivalent DOM 3 modules when they become recommendations). This goal is already about 60% done.
  • Full conformance to CSS2.1, and eventually CSS3
  • JavaScript support.
  • SVG and Canvas support
  • Little or no explicit support for deprecated standards and technologies (yes, this is a feature).

The most important features, and thus the ones getting the most attention, will be standards compliance and JavaScript support. Standards compliance is important because I want this browser to be an example of a browser which people should take seriously. I won’t, however, do extra work because somebody out there decided not to follow the standard when designing their webpage. JavaScript support is important for the same reason. Nobody is going to take a browser seriously (or be able to use it for any modern website) if it doesn’t support JavaScript.

So far I’ve got a complete (but also completely untested) implementation of DOM 2 HTML, which took me a good deal longer than expected.

I started with a good base: pxdom, a complete implementation of DOM 3 Core and LS (Load and Save), and implemented my additions on top of that. My code is still a separate module, though there are a few places where I rely on some of the implementation-specific details of pxdom. I have plans to remove the dependency at some point so that I can swap in other DOM core implementations.

On top of that I built my DOM HTML implementation, and laid a little bit of groundwork for DOM StyleSheets and CSS. I’ll be using cssutils, which is a mostly complete python DOM CSS implementation. The cssutils version 0.9.4b1 was just released with mention of some sort of selector support being added in version 0.9.5, which will hopefully make it so I don’t have to do a full CSS cascade implementation myself.

I’m taking a break from working on DOM implementations now and moving back to something which will actually allow me to see results: the rendering engine. I’m starting with just a stub implementation of the ViewCSS interface which allows me to use the getComputedStyle function to get the default style for any given element. With that, I should be able to render any HTML document as if it had no style applied to it. Later on I will hopefully be able to use the upcoming selector support in cssutils to make getComputedStyle work as expected.
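A stub along those lines might look like the sketch below. It is only an illustration: `StubViewCSS` and both tables are invented for this example, the result is a plain dict rather than a real CSSStyleDeclaration, and the handful of defaults is loosely based on the sample HTML 4 style sheet in the CSS 2.1 specification rather than being a complete or resolved set of computed values.

```python
from types import SimpleNamespace

# Fallback values for properties no default rule sets (a tiny, illustrative subset).
INITIAL_VALUES = {
    "display": "inline",
    "font-weight": "normal",
    "font-style": "normal",
}

# Per-tag defaults, loosely following the CSS 2.1 sample style sheet for HTML 4.
DEFAULT_STYLES = {
    "head": {"display": "none"},
    "body": {"display": "block"},
    "div": {"display": "block"},
    "p": {"display": "block"},
    "h1": {"display": "block", "font-weight": "bolder"},
    "b": {"font-weight": "bolder"},
    "strong": {"font-weight": "bolder"},
    "i": {"font-style": "italic"},
    "em": {"font-style": "italic"},
}


class StubViewCSS:
    """Stub view that ignores author style sheets and reports only default styles."""

    def getComputedStyle(self, element, pseudo_element=None):
        # Start from the initial values, then layer the tag's defaults on top.
        style = dict(INITIAL_VALUES)
        style.update(DEFAULT_STYLES.get(element.tagName.lower(), {}))
        return style


view = StubViewCSS()
# SimpleNamespace stands in for a DOM element; only tagName is consulted.
paragraph_style = view.getComputedStyle(SimpleNamespace(tagName="P"))
span_style = view.getComputedStyle(SimpleNamespace(tagName="span"))
```

Because callers only ever go through `getComputedStyle`, the stub could later be replaced by a version that runs the real cascade without changing the layout code.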

This browser is something that I’ve been wanting to do for a very long time. I even sort of started to implement the rendering engine a long time ago using pycairo as the backend. I’m going to stick with that because it seems to be an ideal rendering backend for webpages (which would explain why it will be the only thing used for rendering as of Firefox 3.0 :) ), and for eventual SVG and Canvas support. Once I get to the part where I’m actually working on the user interface portion, I’m planning on writing a Python binding for glitz, which will provide the browser with OpenGL accelerated rendering by default.
