One Very Confused Python

I’m at the stage in my career where interesting problems at work are almost as rare as guitar solos on St. Anger. I recently spent a bit of time delving into a juicy one that, on the surface, shouldn’t have been too difficult. I’ve written code before to take a Python source script, run it through an AST-processing mechanism of some sort (originally ast, but more recently, astroid because of its great inference mechanism), and do something with the resulting syntax tree. This is always a lot of fun, and you learn a lot about how code is structured when you examine it at this level.

This time though, I needed to do something a bit different. My previous efforts in AST processing were all about inspecting code and doing some form of reporting on it. This time, I needed to change the code on the fly when it was imported. This was the AST-equivalent of moving from purple belt to brown belt[1]. While both ast and astroid make it easy to manipulate a loaded syntax tree for a file using a node transformer, doing it at import time meant that I needed to learn the answers to the seemingly simple questions “How do Python imports work?” and “How can I hook into them?”

So How Do Python Imports Work?

That’s a great question, thanks for asking! The simple answer is “it’s more complicated than you think.”

Python is able to import all sorts of different things at runtime. I’d never really given it much thought, but the things that can be loaded are scripts, eggs, wheels and more things I hadn’t thought of. These are all kind of different - scripts are just text files, wheels and eggs are packaged source libraries, and the other things that I can’t think of are whatever they are. How are all of these turned into something common that can be run by your friendly local interpreter?

The answer is importlib. A lot of folks will know this as being that thing you use when you want to import something dynamically:

import importlib

my_thing_to_import = 'mything'

importlib.import_module(my_thing_to_import)

This might be useful in a factory-like situation, where you may not know what it is you actually need to import until runtime.
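As a minimal sketch of that factory idea (the load_serializer helper is a name I made up for illustration; json and pickle are just convenient standard-library modules that both happen to expose dumps and loads):

```python
import importlib


def load_serializer(name):
    """Import a serializer module whose name is only known at runtime."""
    # import_module takes the dotted module name as a plain string.
    return importlib.import_module(name)


# The name could just as easily come from a config file or a CLI argument.
serializer = load_serializer("json")
print(serializer.dumps({"answer": 42}))  # prints: {"answer": 42}
```

Swapping "json" for "pickle" gives you a different module with the same two functions, which is about as much of a factory as this example needs.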

Unfortunately, importlib’s documentation on some of the more subtle parts of how it all works assumes that you already know how it all works! I also found it quite hard to find much in the way of good explanations for it online, with the original PEP being pretty dry and a bit high level overall. This is my attempt at putting something out so that others can understand the dark arts too[2].

At a high level, importing some kind of file into Python is done by the import statement, which ends up calling into the __import__ builtin. It is possible to monkey patch this with your own implementation, but that’s strongly discouraged. Instead, it’s suggested to use import hooks and importlib. These use three main components:

  • A finder. The job of a finder is to tell Python’s import mechanism if it knows about the type of file being imported or not. If it does, it returns…
  • A spec. This specification will give the import mechanism details on how to load the file in question, using something called…
  • A loader. Given a path for something to load, the loader will turn that original something into a chunk of compiled code that you can call from your Python script.
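You can poke at these three pieces without writing any hooks. The sketch below asks importlib.util.find_spec for the spec of a standard-library module (json is an arbitrary choice) and prints what the finder handed back. Note that the loader type you see depends on your interpreter and how it was installed.

```python
import importlib.util

# find_spec runs the finders for us and returns the spec without us having
# to execute an import statement for the module ourselves.
spec = importlib.util.find_spec("json")

print(spec.name)                   # the module name the spec was built for
print(type(spec.loader).__name__)  # the loader the finder picked
print(spec.origin)                 # where the loader will read the module from
```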

But how are all these things defined? How can I hook into these? Should I be using Rust instead? More good questions! Read on to find out more…[3]

Finding The Finders

The import process is controlled via a couple of lists in the sys module. sys.meta_path is a list of finders that will have their find_spec function called, in order, with the params fullname, path, and target. If the finder is a bit clueless about how the file should be loaded, find_spec will return None, and the import mechanism will continue down the list. If it does know how to handle the file though, the finder will return a spec object with path details and a reference to the loader to be used for loading the object. We then stop looking through the finder list. (If all of the finders return None from find_spec, we’re out of options and can complain that we don’t know how to import the file properly and move on with our lives.)
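The search described above can be imitated in a few lines. This is only a rough sketch under my own assumptions, not the real implementation: the actual machinery also checks sys.modules first, imports parent packages, and takes locks, none of which is shown here. The first_spec function is a hypothetical name.

```python
import sys


def first_spec(fullname, path=None, target=None):
    """Walk sys.meta_path and return the first finder that produces a spec."""
    for finder in sys.meta_path:
        find_spec = getattr(finder, "find_spec", None)
        if find_spec is None:
            # Not every object on the list is guaranteed to have find_spec.
            continue

        spec = find_spec(fullname, path, target)
        if spec is not None:
            return finder, spec

    # Every finder said None, so nobody knows how to import this.
    raise ModuleNotFoundError(f"No module named {fullname!r}", name=fullname)


finder, spec = first_spec("json")
print(finder, spec.loader)
```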

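sys.path_hooks comes into play one step further down. It contains a list of callables that the PathFinder (one of the default finders in sys.meta_path) tries against each entry in sys.path. A hook that recognises the entry returns a path entry finder for it, and that finder in turn picks the loader that does the actual loading. There are a few different ones shown below, which should give you a rough idea of what kinds of things can be loaded.

Note that the entries in both lists can be either a type or an instantiated object. For sys.meta_path, a class works because the built-in finders define find_spec as a class method. For sys.path_hooks, a class works because calling it with a path entry creates the finder. Either way, you don’t have to lift a claw.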
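To see what the path hooks actually do, here is a small sketch that feeds a directory to each hook by hand, which is roughly what the PathFinder does for each sys.path entry it has not seen before. The temporary folder and the hypothetical_mod.py file are made up for the example.

```python
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Drop a tiny module into the folder so there is something to find.
    with open(os.path.join(folder, "hypothetical_mod.py"), "w") as f:
        f.write("VALUE = 1\n")

    entry_finder = None
    for hook in sys.path_hooks:
        try:
            # Each hook is called with a path entry, not with a module name.
            entry_finder = hook(folder)
            break
        except ImportError:
            # This hook does not handle this kind of path entry; try the next.
            continue

    # The path entry finder is what finally produces the spec (and loader).
    spec = entry_finder.find_spec("hypothetical_mod")
    print(type(entry_finder).__name__, spec.origin)
```

On a stock CPython, the zip hook turns down a plain directory and the file-system hook accepts it, but other tools can register hooks of their own.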

Picking one of my Python projects at random, I have the following sys entries:

>>> import sys

>>> sys.meta_path
[
    <_distutils_hack.DistutilsMetaFinder object at 0x7f552d283b50>, 
    <_virtualenv._Finder object at 0x7f552d24a980>, 
    <class '_frozen_importlib.BuiltinImporter'>, 
    <class '_frozen_importlib.FrozenImporter'>, 
    <class '_frozen_importlib_external.PathFinder'>, 
    <pkg_resources.extern.VendorImporter object at 0x7f552c2196f0>,
]

>>> sys.path_hooks
[
    <class 'zipimport.zipimporter'>, 
    <function FileFinder.path_hook.<locals>.path_hook_for_FileFinder at 0x7f552d3a4af0>,
]

I pulled this from a running PyCharm instance with a virtualenv setup, so you may have some different entries, but there should be a bit of overlap here. Which one of these would be used for an actual real-world import? Let’s set up a fake finder to find out!

Here’s a test library script that we’re importing (helpfully named lib.py):

def do_thing():
    print('Hello')

And here’s the find-the-finder script:

import sys


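class MyFinder:
    def find_spec(self, fullname, path, *args, **kwargs):
        # Ask every finder after this one what it would do, and print the answer.
        for finder in sys.meta_path[1:]:
            print(f"{fullname}, {path}, {finder}: {finder.find_spec(fullname, path)}")

        # We never claim the import ourselves, so the normal search carries on.
        return None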


if __name__ == "__main__":
    sys.meta_path.insert(0, MyFinder())

    from lib import do_thing

    do_thing()

Running this gives the following output:

lib, None, <_distutils_hack.DistutilsMetaFinder object at 0x7f1832c60dc0>: None
lib, None, <_virtualenv._Finder object at 0x7f1832c2a7d0>: None
lib, None, <class '_frozen_importlib.BuiltinImporter'>: None
lib, None, <class '_frozen_importlib.FrozenImporter'>: None
lib, None, <class '_frozen_importlib_external.PathFinder'>: ModuleSpec(name='lib', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7f1832c96c20>, origin='/home/yournamegoeshere/Development/Python/importers/lib.py')
Hello

So, it’s the last finder, the PathFinder, that decides it can confidently tell the interpreter which loader to use: importlib’s SourceFileLoader. This is a concrete implementation of the SourceLoader base class, which is described as “An abstract base class for implementing source (and optionally bytecode) file loading.” Sounds about right! And given the cheery output of “Hello” at the end of the script, it looks like it’s worked properly. All good so far.

What A Load Of…

As mentioned above, there are a number of different loaders to choose from. Looking at the available base types defined in importlib, the two that seem the most promising are:

  • FileLoader - imports a file from a specific path (provided, along with the module name, when the file loader is instantiated). Useful for loading things from, say, a resource folder of some sort (which is presumably why it inherits from the now-deprecated ResourceLoader).
  • SourceLoader - imports a source file. This also inherits from ResourceLoader, but also from ExecutionLoader, which forces the loader to work with file-based resources.

They’re fairly similar, but we’ll stick with the SourceLoader for this example. A SourceLoader needs to provide implementations for the following functions:

  • get_filename - given an import statement, return the fully-qualified filename that the source should come from.
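  • get_data - given the filename produced by get_filename, return the source data so that a Python module can be compiled from it. The documented return type is bytes, but for a plain import a string or an AST also works, because whatever comes back is handed to compile().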

(Other loaders will have slightly different requirements for what they need to implement, but this one is nice and easy to start off with.)
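Before involving any finders, it can help to see those two functions working in isolation. The sketch below is a hypothetical loader (InMemoryLoader, the made-up "<memory>/" path prefix, and the hypothetical_greeter module are all invented for the example) that serves source from a dict, and it is driven by hand with spec_from_loader, module_from_spec, and exec_module rather than through an import statement.

```python
import importlib.abc
import importlib.util


class InMemoryLoader(importlib.abc.SourceLoader):
    """Hypothetical loader that serves source from a dict instead of disk."""

    SOURCES = {
        "<memory>/hypothetical_greeter.py": b"def greet():\n    return 'Hello'\n",
    }

    def get_filename(self, fullname):
        # Map the module name onto our made-up "path".
        return f"<memory>/{fullname}.py"

    def get_data(self, path):
        # Called with whatever get_filename returned; must hand back the source.
        try:
            return self.SOURCES[path]
        except KeyError:
            raise OSError(f"no in-memory source for {path}")


spec = importlib.util.spec_from_loader("hypothetical_greeter", InMemoryLoader())
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)  # compiles and runs the source in the module

print(module.greet())  # prints: Hello
```

Nothing here touches sys.meta_path, so the module is not importable by name elsewhere; that is the finder's job.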

For our toy example, let’s try to import… a CSV file! I created a file called data.csv in the root of my Python project that looks like this:

1,Fred,42,Programmer
2,Alice,39,Cryptographer
3,Bob,39,Cryptographer
4,Bruce,68,Singer
5,Nicko,72,Octopus

We now set up a finder that, if it finds a CSV file with the same name as the import name, will return a spec whose loader builds a module with an entries list containing the CSV rows. (Super contrived, I know, I know.) Shoving all the code into a single Python script gives us:

import csv
import importlib.abc
import importlib.util
import os
import sys


class CSVLoader(importlib.abc.SourceLoader):
    def get_data(self, path):
        # Create some source code based on what we read from the CSV.
        source = "entries = ["

        # The csv module asks for files to be opened with newline="".
        with open(path, "r", newline="") as f:
            csv_data = csv.reader(f)

            for csv_row in csv_data:
                # Being lazy for this example, just using repr instead of building it up properly, hey-ho.
                source += f"{repr(csv_row)}, "

        source += "]"

        return source

    def get_filename(self, fullname):
        return f"{fullname}.csv"


class CSVFinder:
    def find_spec(self, fullname, path, target=None):
        potential_name = f"{fullname}.csv"

        if os.path.exists(potential_name):
            # Note the use of spec_from_loader. This is a helper function that, when given a loader object, will construct a spec for you. Helps a
            # lot if you're lazy like me.
            return importlib.util.spec_from_loader(
                fullname, CSVLoader(), origin=potential_name
            )

        return None


if __name__ == "__main__":
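    # Only sys.meta_path needs to know about our finder. sys.path_hooks expects
    # callables that build path entry finders, not loaders, so we leave it alone.
    sys.meta_path.insert(0, CSVFinder())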

    import data

    print(data.entries)
    print(data.entries[2])

We can now run this and get:

[['1', 'Fred', '42', 'Programmer'], ['2', 'Alice', '39', 'Cryptographer'], ['3', 'Bob', '39', 'Cryptographer'], ['4', 'Bruce', '68', 'Singer'], ['5', 'Nicko', '72', 'Octopus']]
['3', 'Bob', '39', 'Cryptographer']

If you comment out the sys.meta_path insertion in the main block, you’ll get a ModuleNotFoundError instead, as nothing will know how to find a file to import. (Only the finder registration matters here; sys.path_hooks plays no part in this example.) Amazing, astounding, etc. etc.

More Than Meets The Eye

A more complex example is the one hinted at in the opening paragraph - taking in source code and modifying it on the fly. We’ll use the SourceLoader base class again. Let’s start with a file called things.py:

def do_thing_one():
    print("Doing the first thing")


def do_thing_two():
    print("Doing the second thing")


def do_thing_end():
    print("Doing the last thing")


def do_things():
    do_thing_one()
    do_thing_two()
    do_thing_end()

Highly impressive cutting-edge stuff.

Then our main piece of magic, using the ast.NodeTransformer functionality:

import ast
import importlib.abc
import importlib.util
import os
import sys


class Grumpifier(ast.NodeTransformer):
    def visit_FunctionDef(self, node):
        try:
            # Make it grumpy. We know that the 3 functions that print all have the same structure, so just hack into the ast.Constant node with the
            # string in it just to prove we can do this.
            node.body[0].value.args[0].value += " GRUMP"
        except (AttributeError, IndexError):
            # Too lazy to do a fully worked example, so ignoring this bit, as it doesn't matter
            pass

        return node


class GrumpyFinder:
    def find_spec(self, fullname, path, target=None):
        potential_name = f"{fullname}.py"

        if os.path.exists(potential_name):
            return importlib.util.spec_from_loader(
                fullname, GrumpyLoader(), origin=potential_name
            )

        return None


class GrumpyLoader(importlib.abc.SourceLoader):
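    def get_data(self, path):
        # Returning an AST rather than bytes is off-label, but the result goes
        # straight to compile(), which is happy to take a syntax tree.
        with open(path, "r") as f:
            source_tree = ast.parse(f.read(), filename=path)

        transformer = Grumpifier()
        transformer.visit(source_tree)

        return source_tree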

    def get_filename(self, fullname):
        return f"{fullname}.py"


if __name__ == "__main__":
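    # As before, registering the finder on sys.meta_path is all that is needed;
    # sys.path_hooks is for path entry finder factories, not loaders.
    sys.meta_path.insert(0, GrumpyFinder())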

    import things

    things.do_things()

The output of which is:

Doing the first thing GRUMP
Doing the second thing GRUMP
Doing the last thing GRUMP

Obviously, if you’re trying to do this kind of thing, you’ll want to do something more complex than what I did in this example, but it shows you the kind of thing that you can do if you put your mind to it…
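As one possible direction, here is a sketch of a slightly less fragile transformer. Rather than assuming the first statement of every function is a print, it visits every call and only touches print calls whose first argument is a string literal. PrintGrumpifier and the inline source are made up for the example, and the tree is compiled directly rather than going through a loader.

```python
import ast


class PrintGrumpifier(ast.NodeTransformer):
    """Hypothetical variant: only touches print("...") calls, wherever they are."""

    def visit_Call(self, node):
        # Visit nested calls first, e.g. print(str("x")).
        self.generic_visit(node)

        is_print = isinstance(node.func, ast.Name) and node.func.id == "print"
        if is_print and node.args:
            first = node.args[0]
            if isinstance(first, ast.Constant) and isinstance(first.value, str):
                first.value += " GRUMP"

        return node


source = '''
def helper():
    return 1


def f():
    helper()
    print("Doing a thing")
'''

tree = PrintGrumpifier().visit(ast.parse(source))

# Compile and run the modified tree in a scratch namespace.
namespace = {}
exec(compile(tree, "<grumpy>", "exec"), namespace)
namespace["f"]()  # prints: Doing a thing GRUMP
```

No new nodes are created here, only an existing constant is edited, so there is no need to call ast.fix_missing_locations before compiling.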

Dark Arts - Mastered

So, if you made it this far, you’re either Guido van Rossum, Severus Snape, or someone facing a headache in trying to work out how the import system works in Python. Hopefully this helped a bit. There are a lot of more specialised loaders and other facilities that I haven’t gotten into (mainly because I never needed them), but this will hopefully help shine a relatively low-power light into the darkness.

A big shout out to GJB for discussion and thoughts on this, as well as some helpful pointers on how this stuff works!


  1. Black belt here is for those people that delve into metaclass stuff. ↩︎

  2. Grumpy Metal Snape indeed. ↩︎

  3. Apart from the bit about Rust. You know the answer to that already, you clever thing you. ↩︎