.. warning::

   The language reference for import [10]_ and importlib documentation
   [11]_ now supersede this PEP. This document is no longer updated
   and is provided for historical purposes only.

Abstract
========
This PEP proposes to add a new set of import hooks that offer better
customization of the Python import mechanism. Contrary to the current
``__import__`` hook, a new-style hook can be injected into the existing
scheme, allowing for a finer grained control of how modules are found and how
they are loaded.
Motivation
==========
The only way to customize the import mechanism is currently to override the
built-in ``__import__`` function. However, overriding ``__import__`` has many
problems. To begin with:
The situation gets worse when you need to extend the import mechanism from C:
it's currently impossible, apart from hacking Python's ``import.c`` or
reimplementing much of ``import.c`` from scratch.
There is a fairly long history of tools written in Python that allow extending
the import mechanism in various ways, based on the ``__import__`` hook. The
Standard Library includes two such tools: ``ihooks.py`` (by GvR) and
``imputil.py`` [1]_ (Greg Stein), but perhaps the most famous is ``iu.py`` by
Gordon McMillan, available as part of his Installer package. Their usefulness
is somewhat limited because they are written in Python; bootstrapping issues
need to be worked around as you can't load the module containing the hook with
the hook itself. So if you want the entire Standard Library to be loadable
from an import hook, the hook must be written in C.
Use cases
=========
This section lists several existing applications that depend on import hooks.
Among these, a lot of duplicate work was done that could have been saved if
there had been a more flexible import hook at the time. This PEP should make
life a lot easier for similar projects in the future.
Extending the import mechanism is needed when you want to load modules that
are stored in a non-standard way. Examples include modules that are bundled
together in an archive; byte code that is not stored in a ``pyc`` formatted
file; modules that are loaded from a database over a network.
The work on this PEP was partly triggered by the implementation of PEP 273,
which adds imports from Zip archives as a built-in feature to Python. While
the PEP itself was widely accepted as a must-have feature, the implementation
left a few things to be desired. For one thing, it went to great lengths to
integrate itself with ``import.c``, adding lots of code that was either
specific for Zip file imports or *not* specific to Zip imports, yet was not
generally useful (or even desirable) either. Yet the PEP 273 implementation
can hardly be blamed for this: it is simply extremely hard to do, given the
current state of ``import.c``.
Packaging applications for end users is a typical use case for import hooks,
if not *the* typical use case. Distributing lots of source or ``pyc`` files
around is not always appropriate (let alone a separate Python installation),
so there is a frequent desire to package all needed modules in a single file.
So frequent in fact that multiple solutions have been implemented over the
years.
The oldest one is included with the Python source code: Freeze [2]_. It puts
marshalled byte code into static objects in C source code. Freeze's "import
hook" is hard wired into ``import.c``, and has a couple of issues. Later
solutions include Fredrik Lundh's Squeeze, Gordon McMillan's Installer, and
Thomas Heller's py2exe [3]_. MacPython ships with a tool called
``BuildApplication``.
Before work on the design and implementation of this PEP was started, a new
``BuildApplication``-like tool for Mac OS X prompted one of the authors of
this PEP (JvR) to expose the table of frozen modules to Python, in the ``imp``
module. The main reason was to be able to use the freeze import hook
(avoiding fancy ``__import__`` support), yet to also be able to supply a set
of modules at runtime. This resulted in issue #642578 [4]_, which was
mysteriously accepted (mostly because nobody seemed to care either way ;-).
Yet it is completely superfluous when this PEP gets accepted, as it offers a
much nicer and more general way to do the same thing.
Rationale
=========
Traversing ``sys.path_hooks`` for each path item for each new import can be
expensive, so the results are cached in another new object in the ``sys``
module: ``sys.path_importer_cache``. It maps ``sys.path`` entries to importer
objects.
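A minimal sketch of how such a cache could work, assuming the path hook
convention that a hook is a callable which raises ``ImportError`` for path
items it cannot handle. The function name ``get_importer``, the
``ZipLikeImporter`` class and the choice to cache ``None`` for unhandled
entries are illustrative assumptions, and plain local objects stand in for
``sys.path_hooks`` and ``sys.path_importer_cache``:

```python
def get_importer(path_item, path_hooks, cache):
    """Return the importer for path_item, consulting the cache first."""
    try:
        return cache[path_item]
    except KeyError:
        pass
    importer = None
    for hook in path_hooks:
        try:
            importer = hook(path_item)
            break
        except ImportError:
            # This hook cannot handle the path item; try the next one.
            continue
    # Remember the outcome (even a miss) so the hooks are not asked again.
    cache[path_item] = importer
    return importer


class ZipLikeImporter:
    """Hypothetical importer that only accepts entries ending in '.zip'."""

    def __init__(self, path_item):
        if not path_item.endswith(".zip"):
            raise ImportError(path_item)
        self.path_item = path_item


calls = []


def counting_hook(path_item):
    # Record every invocation so the effect of the cache is visible.
    calls.append(path_item)
    return ZipLikeImporter(path_item)


cache = {}
first = get_importer("/tmp/app.zip", [counting_hook], cache)
second = get_importer("/tmp/app.zip", [counting_hook], cache)
```

The second lookup is answered from ``cache``, so ``counting_hook`` runs only
once for that path entry.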
A question was raised: what about importers that don't need *any* entry on
``sys.path``? (Built-in and frozen modules fall into that category.) Again,
Gordon McMillan to the rescue: ``iu.py`` contains a thing he calls the
*metapath*. In this PEP's implementation, it's a list of importer objects
that is traversed *before* ``sys.path``. This list is yet another new object
in the ``sys`` module: ``sys.meta_path``. Currently, this list is empty by
default, and frozen and built-in module imports are done after traversing
``sys.meta_path``, but still before ``sys.path``.
The Importer Protocol operates at this level of *individual* imports. By the
time an importer gets a request for "spam.ham", module "spam" has already been
imported.
The protocol involves two objects: a *finder* and a *loader*. A finder object
has a single method::

    finder.find_module(fullname, path=None)

This method will be called with the fully qualified name of the module. If
the finder is installed on ``sys.meta_path``, it will receive a second
argument, which is ``None`` for a top-level module, or ``package.__path__``
for submodules or subpackages [5]_. It should return a loader object if the
module was found, or ``None`` if it wasn't. If ``find_module()`` raises an
exception, it will be propagated to the caller, aborting the import.
A loader object also has a single method::

    loader.load_module(fullname)

In many cases the finder and loader can be one and the same object:
``finder.find_module()`` would just return ``self``.
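As an illustration, the following sketch combines finder and loader in one
object. The class ``DictImporter``, its ``sources`` mapping and the module
name ``pep302_demo_spam`` are hypothetical, and the two methods are called by
hand rather than through an ``import`` statement, since whether the
interpreter itself calls ``find_module()`` depends on the Python version:

```python
import sys
import types


class DictImporter:
    """Hypothetical finder and loader in one, backed by a dict of sources."""

    def __init__(self, sources):
        self.sources = sources  # maps fully qualified names to source text

    def find_module(self, fullname, path=None):
        # Return a loader (here: this same object) or None if not found.
        return self if fullname in self.sources else None

    def load_module(self, fullname):
        # Reuse an existing module object, otherwise register a fresh one
        # before executing any code. Error handling is omitted for brevity.
        module = sys.modules.setdefault(fullname, types.ModuleType(fullname))
        module.__file__ = "<dict>"
        module.__loader__ = self
        exec(self.sources[fullname], module.__dict__)
        return module


importer = DictImporter({"pep302_demo_spam": "answer = 42\n"})
loader = importer.find_module("pep302_demo_spam")
module = loader.load_module("pep302_demo_spam")
```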
The ``fullname`` argument of both methods is the fully qualified module name,
for example "spam.eggs.ham". As explained above, when
``finder.find_module("spam.eggs.ham")`` is called, "spam.eggs" has already
been imported and added to ``sys.modules``. However, the ``find_module()``
method isn't necessarily always called during an actual import: meta tools
that analyze import dependencies (such as freeze, Installer or py2exe) don't
actually load modules, so a finder shouldn't *depend* on the parent package
being available in ``sys.modules``.
Note that the module object *must* be in ``sys.modules`` before the loader
executes the module code. This is crucial because the module code may
(directly or indirectly) import itself; adding it to ``sys.modules``
beforehand prevents unbounded recursion in the worst case and multiple
loading in the best.
If the load fails, the loader needs to remove any module it may have
inserted into ``sys.modules``. If the module was already in ``sys.modules``
then the loader should leave it alone.
* The ``__file__`` attribute must be set. This must be a string, but it may
  be a dummy value, for example "<frozen>". The privilege of not having a
  ``__file__`` attribute at all is reserved for built-in modules.

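The ``sys.modules`` and ``__file__`` rules above can be sketched as a helper
that a ``load_module()`` implementation might delegate to. The function name
``load_from_source`` and its parameters are assumptions made for this
example, not part of the protocol:

```python
import sys
import types


def load_from_source(fullname, source, filename="<string>"):
    """Execute `source` as module `fullname`, honouring the rules above."""
    already_present = fullname in sys.modules
    if already_present:
        module = sys.modules[fullname]
    else:
        module = types.ModuleType(fullname)
        # Register before running module code, so a self-import finds it.
        sys.modules[fullname] = module
    module.__file__ = filename  # a dummy string is acceptable
    try:
        exec(compile(source, filename, "exec"), module.__dict__)
    except BaseException:
        # Remove only what this call inserted; leave older entries alone.
        if not already_present:
            sys.modules.pop(fullname, None)
        raise
    return module


# The module code can look itself up while it is still being executed.
good = load_from_source(
    "pep302_demo_good", "import sys\nme = sys.modules[__name__]\n"
)
```

Registering the module first is what makes the self-lookup in the usage
example succeed; a failing load of a new module leaves no trace in
``sys.modules``.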
There are two types of import hooks: *Meta hooks* and *Path hooks*. Meta
hooks are called at the start of import processing, before any other import
processing (so that meta hooks can override ``sys.path`` processing, frozen
modules, or even built-in modules). To register a meta hook, simply add the
finder object to ``sys.meta_path`` (the list of registered meta hooks).
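A simplified sketch of that ordering follows. A local list stands in for
``sys.meta_path`` so the example has no side effects on the running
interpreter; ``find_loader``, ``PrefixFinder`` and the ``fallback`` callable
(representing all later import processing) are hypothetical names:

```python
def find_loader(fullname, path, meta_path, fallback):
    """Ask each meta hook in order; use `fallback` only if none claims it."""
    for finder in meta_path:
        loader = finder.find_module(fullname, path)
        if loader is not None:
            return loader
    return fallback(fullname, path)


class PrefixFinder:
    """Hypothetical meta hook claiming modules whose name has a prefix."""

    def __init__(self, prefix):
        self.prefix = prefix

    def find_module(self, fullname, path=None):
        return self if fullname.startswith(self.prefix) else None


def default_machinery(fullname, path):
    # Placeholder for built-in, frozen and sys.path based processing.
    return "default machinery"


meta_path = []             # stand-in for sys.meta_path
hook = PrefixFinder("virtual_")
meta_path.append(hook)     # registering a meta hook is a plain list append

claimed = find_loader("virtual_spam", None, meta_path, default_machinery)
passed_on = find_loader("os", None, meta_path, default_machinery)
```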
Just like ``sys.path`` itself, the new ``sys`` variables must have specific
types:
To retrieve the data for arbitrary "files" from the underlying storage
backend, loader objects may supply a method named ``get_data()``::

    loader.get_data(path)

This method returns the data as a string, or raises ``IOError`` if the "file"
wasn't found. The data is always returned as if "binary" mode was used -
there is no CRLF translation of text files, for example. It is meant for
importers that have some file-system-like properties. The 'path' argument is
a path that can be constructed by munging ``module.__file__`` (or
``pkg.__path__`` items) with the ``os.path.*`` functions, for example::

    import os
    d = os.path.dirname(__file__)
    data = __loader__.get_data(os.path.join(d, "logo.gif"))

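For illustration, a loader with file-system-like storage could implement
``get_data()`` roughly as follows. ``MemoryLoader`` and its ``files``
dictionary are hypothetical, and the data is returned as ``bytes``, which is
an assumption about how the "string" above maps onto Python 3:

```python
import os


class MemoryLoader:
    """Hypothetical loader whose "files" live in a dict keyed by path."""

    def __init__(self, files):
        self.files = files

    def get_data(self, path):
        try:
            # Raw bytes, with no newline translation of any kind.
            return self.files[path]
        except KeyError:
            raise IOError("no such file: %r" % (path,))


package_dir = os.path.join("archive.zip", "pkg")
loader = MemoryLoader({os.path.join(package_dir, "logo.gif"): b"GIF89a"})

# Derive the data path from the module's __file__, as described above.
module_file = os.path.join(package_dir, "__init__.py")
data = loader.get_data(os.path.join(os.path.dirname(module_file), "logo.gif"))
```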
The following set of methods may be implemented if support for (for example)
Freeze-like tools is desirable. It consists of three additional methods; to
make things easier for the caller, either all three should be implemented or
none at all::

    loader.is_package(fullname)
    loader.get_code(fullname)
    loader.get_source(fullname)

All three methods should raise ``ImportError`` if the module wasn't found.
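A minimal sketch of a loader offering all three methods, so that a tool can
inspect modules without importing them. The class ``SourceLoader``, its
constructor arguments and the ``_require`` helper are assumptions made for
this example:

```python
class SourceLoader:
    """Hypothetical loader exposing module details without importing."""

    def __init__(self, sources, packages=()):
        self.sources = sources          # fully qualified name -> source text
        self.packages = set(packages)   # names that are packages

    def _require(self, fullname):
        if fullname not in self.sources:
            raise ImportError("no module named %r" % (fullname,))

    def is_package(self, fullname):
        self._require(fullname)
        return fullname in self.packages

    def get_source(self, fullname):
        self._require(fullname)
        return self.sources[fullname]

    def get_code(self, fullname):
        # get_source() already raises ImportError for unknown modules.
        return compile(self.get_source(fullname), "<%s>" % fullname, "exec")


loader = SourceLoader({"pkg": "", "pkg.mod": "x = 6 * 7\n"}, packages=["pkg"])
namespace = {}
exec(loader.get_code("pkg.mod"), namespace)
```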
To support execution of modules as scripts [6]_, the above three methods for
finding the code associated with a module must be implemented. In addition to
those methods, the following method may be provided in order to allow the
``runpy`` module to correctly set the ``__file__`` attribute::

    loader.get_filename(fullname)

This method should return the value that ``__file__`` would be set to if the
named module was loaded. If the module is not found, then ``ImportError``
should be raised.
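The sketch below shows how a ``runpy``-style caller might combine
``get_code()`` with an optional ``get_filename()``. It is a rough analogue
written for this example, not the actual ``runpy`` implementation;
``run_as_script`` and ``OneModuleLoader`` are hypothetical names:

```python
def run_as_script(loader, fullname):
    """Execute a module's code as __main__, setting __file__ if possible."""
    code = loader.get_code(fullname)
    namespace = {"__name__": "__main__", "__loader__": loader}
    # get_filename() is optional, so probe for it instead of assuming it.
    get_filename = getattr(loader, "get_filename", None)
    if get_filename is not None:
        namespace["__file__"] = get_filename(fullname)
    exec(code, namespace)
    return namespace


class OneModuleLoader:
    """Hypothetical loader that knows a single module named 'tool'."""

    SOURCE = "result = (__name__, __file__)\n"

    def get_filename(self, fullname):
        if fullname != "tool":
            raise ImportError(fullname)
        return "<memory>/tool.py"

    def get_code(self, fullname):
        # get_filename() raises ImportError for unknown modules.
        return compile(self.SOURCE, self.get_filename(fullname), "exec")


ns = run_as_script(OneModuleLoader(), "tool")
```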
The new import hooks are not easily integrated in the existing
``imp.find_module()`` and ``imp.load_module()`` calls. It's questionable
whether it's possible at all without breaking code; it is better to simply add
a new function to the ``imp`` module. The meaning of the existing
``imp.find_module()`` and ``imp.load_module()`` calls changes from: "they
expose the built-in import mechanism" to "they expose the basic *unhooked*
built-in import mechanism". They simply won't invoke any import hooks. A new
``imp`` module function is proposed (but not yet implemented) under the name
``get_loader()``, which is used as in the following pattern::
Note that this wrapper is currently not yet implemented, although a Python
prototype exists in the ``test_importhooks.py`` script (the ``ImpWrapper``
class) included with the patch.
Forward Compatibility
=====================
Existing ``__import__`` hooks will not invoke new-style hooks by magic, unless
they call the original ``__import__`` function as a fallback. For example,
``ihooks.py``, ``iu.py`` and ``imputil.py`` are in this sense not forward
compatible with this PEP.
Open Issues
===========
Modules often need supporting data files to do their job, particularly in the
case of complex packages or full applications. Current practice is generally
to locate such files via ``sys.path`` (or a ``package.__path__`` attribute).
This approach will not work, in general, for modules loaded via an import
hook.
* Locate data files from a standard location, rather than relative to the
  module file. A relatively simple approach (which is supported by
  distutils) would be to locate data files based on ``sys.prefix`` (or
  ``sys.exec_prefix``). For example, looking in
  ``os.path.join(sys.prefix, "data", package_name)``.

* Import hooks could offer a standard way of getting at data files relative
  to the module file. The standard ``zipimport`` object provides a method
  ``get_data(name)`` which returns the content of the "file" called ``name``,
  as a string. To allow modules to get at the importer object, ``zipimport``
  also adds an attribute ``__loader__`` to the module, containing the
  ``zipimport`` object used to load the module. If such an approach is used,
  it is important that client code takes care not to break if the
  ``get_data()`` method is not available, so it is not clear that this
  approach offers a general answer to the problem.

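Client code that combines both options might look like the sketch below: it
prefers ``__loader__.get_data()`` when available and otherwise falls back to
the ``sys.prefix`` location. The helper ``read_package_data``, the
``FakeLoader`` class and the file names are assumptions for illustration
only:

```python
import os
import sys
import types


def read_package_data(module, name, package_name):
    """Read a data file via the module's loader, else from sys.prefix."""
    loader = getattr(module, "__loader__", None)
    get_data = getattr(loader, "get_data", None)
    if get_data is not None and getattr(module, "__file__", None):
        # Path relative to the module "file", resolved by the import hook.
        path = os.path.join(os.path.dirname(module.__file__), name)
        return get_data(path)
    # No usable loader: fall back to a standard location.
    fallback = os.path.join(sys.prefix, "data", package_name, name)
    with open(fallback, "rb") as f:
        return f.read()


class FakeLoader:
    """Hypothetical loader that fabricates content from the file name."""

    def get_data(self, path):
        return b"data for " + os.path.basename(path).encode("ascii")


hooked = types.ModuleType("hooked")
hooked.__file__ = os.path.join("bundle.zip", "hooked.py")
hooked.__loader__ = FakeLoader()
payload = read_package_data(hooked, "logo.gif", "hooked")
```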
There is no specific support within this PEP for "stacking" hooks. For
example, it is not obvious how to write a hook to load modules from ``tar.gz``
files by combining separate hooks to load modules from ``.tar`` and ``.gz``
files. However, there is no support for such stacking in the existing hook
mechanisms (either the basic "replace ``__import__``" method, or any of the
existing import hook modules) and so this functionality is not an obvious
requirement of the new mechanism. It may be worth considering as a future
enhancement, however.
Implementation
==============
The PEP 302 implementation has been integrated with Python as of 2.3a1. An
earlier version is available as patch #652586 [9]_, but more interestingly,
the issue contains a fairly detailed history of the development and design.
PEP 273 has been implemented using PEP 302's import hooks.
Copyright
=========
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: