Python IMPORT_NAME bytecode mystery
I’ve been messing around with Python bytecode this weekend, which is why there was no Sunday post (as I’m writing this it’s very early Monday morning…).
There’s some fun stuff to wrap your head around when it comes to Python bytecode: the structure of the virtual machine, the order of bytes in bytecode arguments, and the instructions that require magic numbers to be pushed onto the stack before being called.
I’ll go into more detail about what I’m doing mucking about with the Python internals tomorrow (later today, ugh! I need to go to bed!), but for now I’ll share one of the little mysteries that I’ve run into and managed to figure out.
IMPORT_NAME
IMPORT_NAME is the opcode you use to import another module. The description from the Python bytecode docs is this:
IMPORT_NAME namei
Imports the module co_names[namei]. The module object is pushed onto the stack. The current namespace is not affected: for a proper import statement, a subsequent STORE_FAST instruction modifies the namespace.
Basically, each code object has a tuple of names, and namei is an index into that tuple, pointing to the name of the module you want to import. When this instruction is run you end up with the module on the stack, and you can then bind it to a name and access items within it.
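Here's a minimal sketch of that lookup, assuming a modern Python 3 interpreter (dis.get_instructions is not available on the 2.x interpreter this post was written against). The "<example>" filename is just a placeholder.

```python
import dis

# Compile a one-line module; "<example>" is only a placeholder filename.
code = compile("import sys", "<example>", "exec")

# Find the IMPORT_NAME instruction in the compiled code.
import_instr = next(
    instr for instr in dis.get_instructions(code) if instr.opname == "IMPORT_NAME"
)

# Its argument (namei) is an index into the code object's tuple of names.
module_name = code.co_names[import_instr.arg]

print(code.co_names)
print(import_instr.arg, module_name)
```

The same name is also available as import_instr.argval, which is dis doing the co_names lookup for you.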
Not mentioned here is the fact that you have to push two extra parameters onto the stack before calling IMPORT_NAME, otherwise the Python interpreter segfaults when it hits that instruction.
For example, "import sys" is compiled by the Python compiler into the following (as output by dis.disassemble):
0 LOAD_CONST 1 (-1)
3 LOAD_CONST 0 (None)
6 IMPORT_NAME 0 (sys)
9 STORE_FAST 0 (sys)
I tried changing the -1 to a different number and I got “ValueError: Attempted relative import in non-package”
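If you want to see which two values your own interpreter pushes ahead of IMPORT_NAME, something like the sketch below should do it. It assumes Python 3, where I'd expect the level to come out as 0 rather than the -1 seen here; import_operands is a helper name I made up for this example.

```python
import dis

def import_operands(source):
    """Return the values loaded by the two instructions just before IMPORT_NAME."""
    instrs = list(dis.get_instructions(compile(source, "<example>", "exec")))
    idx = next(i for i, ins in enumerate(instrs) if ins.opname == "IMPORT_NAME")
    # The first value pushed is the level, the second is the fromlist.
    return instrs[idx - 2].argval, instrs[idx - 1].argval

level, fromlist = import_operands("import sys")
from_level, from_fromlist = import_operands("from sys import argv, byteorder")

print(level, fromlist)
print(from_level, from_fromlist)
```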
Looking into Python/ceval.c in the Python interpreter source code, I see that these two extra parameters are passed in as the last two parameters to the builtin __import__ function, fromlist and level.
According to help(__import__) the -1 for level indicates it should try both absolute and relative imports, and the fromlist is empty because this isn’t a “from sys import …” statement.
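In other words, a plain import boils down to roughly the call sketched below. This is an approximation, not the interpreter's actual code, and it assumes Python 3, where a level of -1 is no longer accepted and 0 (absolute import only) is what gets passed.

```python
import sys

# Roughly what IMPORT_NAME does for "import sys": call __import__ with the
# fromlist and level it popped off the stack, and push the result.
# level=0 is the Python 3 assumption; the -1 in the listing is Python 2 only.
module = __import__("sys", globals(), locals(), None, 0)

# The following STORE_* instruction is what actually binds the name.
my_sys = module
print(my_sys is sys)
```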
Similarly, "from sys import argv, subversion, byteorder" compiles into
0 LOAD_CONST 1 (-1)
3 LOAD_CONST 2 (('argv', 'subversion', 'byteorder'))
6 IMPORT_NAME 0 (sys)
9 IMPORT_FROM 1 (argv)
12 STORE_FAST 0 (argv)
15 IMPORT_FROM 2 (subversion)
18 STORE_FAST 1 (subversion)
21 IMPORT_FROM 3 (byteorder)
24 STORE_FAST 2 (byteorder)
27 POP_TOP
It’s not immediately obvious why it needs to do the extra calls to IMPORT_FROM, but the reason seems to be that __import__ doesn’t actually do anything with the fromlist argument.
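A rough Python-level equivalent of that sequence, assuming Python 3 (so I've left out sys.subversion, which no longer exists, and used level 0):

```python
import sys

# IMPORT_NAME: the fromlist is passed along, but what comes back is still
# just the module object, not the requested names.
module = __import__("sys", globals(), locals(), ("argv", "byteorder"), 0)

# Each IMPORT_FROM / STORE_FAST pair is then roughly an attribute lookup
# followed by a name binding.
argv = getattr(module, "argv")
byteorder = getattr(module, "byteorder")

# POP_TOP finally discards the module object itself.
del module
print(byteorder)
```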
Anyway, I need to sleep now.
Note to self: submit a Python bugtracker issue about the undocumented required parameters. done, issue 2631
Update:
I also posted this as a response to Thomas Lee’s response, but duplicated it here for easier finding.
After looking at the docs for __import__, it’s interesting to see that the values in fromlist aren’t actually used, but it is significant whether the fromlist is empty or non-empty.
When the name variable is of the form package.module, normally, the top-level package (the name up till the first dot) is returned, not the module named by name. However, when a non-empty fromlist argument is given, the module named by name is returned. This is done for compatibility with the bytecode generated for the different kinds of import statement; when using “import spam.ham.eggs”, the top-level package spam must be placed in the importing namespace, but when using “from spam.ham import eggs”, the spam.ham subpackage must be used to find the eggs variable.
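This is easy to check by calling __import__ directly. The sketch below assumes Python 3 and uses json.decoder purely as a convenient standard-library example; "no_such_name" is a deliberately bogus entry.

```python
# Empty fromlist: the top-level package is returned.
top = __import__("json.decoder")

# Non-empty fromlist: the module named by the dotted name is returned.
sub = __import__("json.decoder", fromlist=["JSONDecoder"])

# For a plain module like this one, the contents of fromlist don't seem to
# matter, only that it is non-empty.
bogus = __import__("json.decoder", fromlist=["no_such_name"])

print(top.__name__, sub.__name__, bogus.__name__)
```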
I suppose the reason that fromlist isn’t just changed to a boolean is that it might be significant in a setup where __import__ is redefined for custom importing, like Thomas said.
April 14th, 2008 at 4:50 am
The documentation for __import__ solves this mystery (well, sort of!):
http://docs.python.org/lib/built-in-funcs.html
“Note that even though locals() and ['eggs'] are passed in as arguments, the __import__() function does not set the local variable named eggs; this is done by subsequent code that is generated for the import statement. (In fact, the standard implementation does not use its locals argument at all, and uses its globals only to determine the package context of the import statement.)”
Essentially, when the IMPORT_NAME is executed for ‘from foo import bar, baz’, the __import__ builtin is called with the fromlist (which is converted to a tuple of strings). The __import__ builtin can be redefined to provide custom import handling for your Python programs. For example, you may want to prevent users of your program from writing scripts that import certain modules.
The code in Python/ceval.c for IMPORT_NAME seems to back this up (I’ve annotated it with a few comments …):
(hope the code looks okay - you need a “preview” feature :P)
April 14th, 2008 at 5:18 am
[...] was originally planned as a response to this post by Paul Bonser, but grew a little unwieldy (and his comment submission form seems to be [...]