The space of Python software packagers and installers is so crowded and fragmented that it has been the subject of its own XKCD comic.
The Python Packaging Authority (PyPA) is a working group that maintains a core set of projects used in Python packaging. The software developed through the PyPA is used to package, share, and install Python software and to interact with indexes of downloadable Python software such as PyPI, the Python Package Index. The PyPA publishes the Python Packaging User Guide as the authoritative resource on how to package, publish, and install Python projects using current tools.
While reading the packaging user guide is a good idea, here I have summarized my experience of dealing with packaging in Python as a developer, and tried to make sense of the many tools and practices that have emerged in this area through the long life of Python.
Packaging Format
A Python distribution is a set of one or more Python packages, modules or extensions, packaged in a way that allows it to be easily distributed and installed on other systems. A distribution is pure if it contains only Python modules and packages, or non-pure if it includes at least one Python extension, and it can be shipped in source or built (binary) form. It declares what packages it provides, what packages it obsoletes, and which other Python packages it requires as dependencies. Distribution name, version, author, URL and license are also specified, among other metadata.
Python extensions are software written to interface with the low-level native implementation language of the Python interpreter (C for CPython, Java for Jython, etc.) and therefore need a separate build step, either before being made into a binary distribution or after a source distribution is unpacked. They are typically contained in a single dynamically loadable pre-compiled file, like `.so` on Unix-like systems or `.pyd` (a DLL) on Windows.
Build instructions for Python extensions are another responsibility of the distribution description.
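As a rough illustration, the file suffixes that the running interpreter accepts for compiled extension modules can be listed with the standard `importlib.machinery` module. The exact values depend on the platform and on how Python was built, so no particular output is assumed here.

```python
import importlib.machinery

# Filename suffixes the running interpreter accepts for compiled extension
# modules; the values differ between platforms and Python builds.
extension_suffixes = list(importlib.machinery.EXTENSION_SUFFIXES)

for suffix in extension_suffixes:
    print(suffix)
```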
There are several packaging formats for Python software:
- source archive: An archive containing the raw source code for a release.
- sdist: A distribution format that provides metadata and the essential source files needed for installing the software, or generating a build distribution.
- bdist: A built distribution format introduced by `distutils` that contains compiled extensions.
- egg: A built distribution format introduced by `setuptools`. Python Eggs are a way of bundling additional information with a Python project, which allows the project’s dependencies to be checked and satisfied at runtime, and allows projects to provide plugins for other projects. Python eggs are being replaced by the wheel format.
- wheel: The currently preferred format for binary distributions. It was introduced as a replacement for egg packages and is produced by the `bdist_wheel` command used with `setuptools`.
- zipapp: Since Python 3.5, the `zipapp` standard library module provides tools to manage the creation of zip files containing Python code, which can be executed directly by the Python interpreter. The module provides both a command-line interface and a Python API.
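To make the zipapp format more concrete, here is a minimal sketch of the Python API. The folder name `demo_app` and the helper `build_and_run_demo_app` are made up for this example; the equivalent command line would be along the lines of `python -m zipapp demo_app -o demo_app.pyz`.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_and_run_demo_app() -> str:
    """Build a tiny zipapp in a temporary directory and return its output."""
    with tempfile.TemporaryDirectory() as tmp:
        source_dir = Path(tmp) / "demo_app"  # hypothetical project folder
        source_dir.mkdir()
        # A zipapp needs a __main__.py (or an explicit main= entry point).
        (source_dir / "__main__.py").write_text('print("hello from a zipapp")\n')

        archive = Path(tmp) / "demo_app.pyz"
        zipapp.create_archive(source_dir, target=archive)

        # The interpreter can execute the archive directly.
        result = subprocess.run(
            [sys.executable, str(archive)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(build_and_run_demo_app())
```

Note that this sketch bundles no third-party dependencies; the archive only contains the single `__main__.py` file.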
These are some of the tools which have been used to create and publish Python distributions in chronological order:
`distutils` laid the initial groundwork for what Python distributions should look like. It depends on a mandatory `setup.py` file, which is an executable configuration file written in Python, and an optional `setup.cfg` file, which provides default values for the supported `distutils` commands.
`distutils`, which used to be a separate project, became part of the standard library in Python 1.6. While there have been substantial improvements since then, the distribution format it introduced and its configuration files have been kept more or less backward compatible. It was deprecated in Python 3.10 and removed from the standard library in Python 3.12 (PEP 632).
It can perform the following tasks on a set of Python packages, modules, or extensions:
- `build`: only applicable to Python extensions
- `clean`
- `check`
- `install`
- package (`sdist` and `bdist` for source and binary distributions in various formats)
- `register`: submit distribution metadata to the PyPI index server
- `upload`: upload the actual distribution file to PyPI
`distutils2` is the packaging library that was planned to supersede `distutils`, but the efforts have been merged and development has stopped.
`setuptools` (which includes `easy_install`) is a collection of enhancements to the Python `distutils` that allow you to more easily build and distribute Python distributions, especially ones that have dependencies on other packages. This tool has mostly superseded `distutils` with superior functionality, but is completely backward-compatible. It has been part of a Python installation since 2.3. In 2013, `distribute`, a fork of `setuptools`, was merged back into it, thereby making `setuptools` the default choice for packaging.
Additionally, `setuptools` offers the option to create Python Eggs, which is another format for Python software distribution.
Before creating Python distributions using `setuptools`, make sure that the latest versions of the necessary tools are installed:
python3 -m pip install --user --upgrade setuptools wheel
Then run this command from the same directory where `setup.py` is located:
python3 setup.py sdist bdist_wheel
Primarily, the wheel project offers the `bdist_wheel` setuptools extension for creating wheel distributions. Additionally, it offers its own command line utility for creating and installing wheels.
Wheel produces a cross platform binary packaging format (called wheels or wheel files and defined in PEP 427) that allows Python libraries, even those including binary extensions, to be installed on a system without needing to be built locally.
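Part of what makes wheels installable without a local build is the file naming convention in PEP 427, which encodes the Python, ABI and platform compatibility in the file name itself. The following is a simplified sketch of that convention; `parse_wheel_filename` and `WheelName` are illustrative names written for this post and the file names are made up. For real projects, the third-party `packaging` library is a better choice.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into the components described in PEP 427.

    Layout: {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
    """
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(distribution, version, build, python_tag, abi_tag, platform_tag)


# Hypothetical file names: a pure-Python wheel and a platform-specific one.
print(parse_wheel_filename("example_pkg-1.0.0-py3-none-any.whl"))
print(parse_wheel_filename("example_ext-2.1-cp311-cp311-win_amd64.whl"))
```

A wheel tagged `py3-none-any` contains no compiled code and installs on any platform, while the second name is tied to CPython 3.11 on 64-bit Windows.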
Twine is the tool currently preferred by the PyPI infrastructure for uploading distributions. It improves the interaction with PyPI by providing more security (verified HTTPS communication and digital signing of the uploaded file) and testability. Note that this tool only handles uploading; building the actual distribution file still has to be done with the tools above.
PyPI has a test instance at TestPyPI where you can register for an account and start testing and experimenting with your distribution before uploading to the main instance.
python3 -m pip install --user --upgrade twine
python3 -m twine upload --repository-url https://test.pypi.org/legacy/ dist/*
python3 -m pip install --index-url https://test.pypi.org/simple/ --no-deps <name>
There have also been many projects that have tried to create directly executable installers from Python software, mostly to match the traditional way software is delivered on Windows and Mac systems. What these tools have in common is that they do not depend on an existing Python installation on the system and bring their own. Some also offer code obfuscation and tamper resistance.
- bbfreeze
- cx_freeze
- py2exe
- py2app
- PyInstaller
- pyarmor
- PyOxidizer
Installation
In addition to the aforementioned packaging tools, which also offer install functionality among other things, there are tools created specifically to install packaged Python distributions. They provide the added value of being able to fetch the distribution file from software repositories on the Internet.
All these tools require an existing Python installation.
easy_install
`easy_install` was released in 2004, as part of `setuptools`. It was notable at the time for installing packages from PyPI using requirement specifiers, and automatically installing dependencies. It lets you automatically download, build, install, upgrade and manage Python packages.
There are several methods to install `setuptools`:
- on Linux and other systems with a software repository, use the
appropriate package manager like `apt`, `yum`, `brew`, etc.
For example:
apt-get install python-setuptools
- download the distribution from PyPI for your matching Python version,
then extract and run the install script.
- run `ez_setup.py`, which is a Python script available on the Internet
made to bootstrap `setuptools` installation or upgrade:
curl https://bootstrap.pypa.io/ez_setup.py | python
`easy_install` has been deprecated in favor of `pip`. So if `pip` is not available and `easy_install` is, this is the only command you need:
easy_install pip
pip
`pip` arrived in 2008 as a replacement for `easy_install`, although still largely built on top of `setuptools` components. It was notable at the time for not installing packages as Eggs or from Eggs (but rather simply as flat packages from sdists), and for introducing the idea of Requirements Files, which gave users the power to easily replicate environments. `pip` can also install from wheels and uninstall Python distributions. Newer `pip` versions preferentially try to download and install a built distribution for the target platform first, and fall back to source archives otherwise.
`pip` has been part of a standard Python installation since Python 3.4, via the `ensurepip` standard library module. The command `python -m ensurepip` can be run to bootstrap the installation of the `pip` command into an existing Python installation or virtual environment. This command does not hit the Internet and uses a bundled `pip` package.
In most cases, end users of Python shouldn’t need to invoke this module directly (as `pip` should be bootstrapped by default during Python installation), but it may be needed if installing `pip` was skipped when installing Python (or when creating a virtual environment), or after `pip` was corrupted or explicitly uninstalled.
By default, `ensurepip` installs the scripts `pipX` and `pipX.Y` (where X.Y stands for the current version of Python). If the `--default-pip` command line option is set, the script will also be installed under the plain name `pip`, in addition to the two regular scripts. This means that on Python 3, if an external `pip` has not been explicitly installed (via `easy_install` or an existing `pip`), the actual command name would be `pip3` or `pip3.Y` and there would be no `pip` command.
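The naming rule above can be sketched in a few lines. This is only an illustration: `default_pip_script_names` is a hypothetical helper, not part of `ensurepip`, and whether the scripts are actually found depends on how Python was installed on your system.

```python
import shutil
import sys
from typing import List


def default_pip_script_names(default_pip: bool = False) -> List[str]:
    """Return the script names ensurepip is documented to install.

    Hypothetical helper for illustration; it only builds the names and
    does not install anything.
    """
    major, minor = sys.version_info[:2]
    names = [f"pip{major}", f"pip{major}.{minor}"]
    if default_pip:
        # --default-pip additionally installs the unversioned name.
        names.insert(0, "pip")
    return names


# Check which of the candidate names are available on the executable search path.
for name in default_pip_script_names(default_pip=True):
    location = shutil.which(name)
    print(f"{name}: {location or 'not found on PATH'}")
```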
An alternative method for installing `pip` is by downloading and running the `get-pip.py` script.
curl https://bootstrap.pypa.io/get-pip.py | python
`pip` can also be installed using the operating system package manager:
apt-get install python-pip
`pip` introduced support for the `requirements.txt` file, which contains the names and versions of all Python distributions required by a software project, so they can be installed or upgraded in one go.
Example command:
pip freeze > requirements.txt
and then later:
pip install --upgrade -r requirements.txt
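To get a feel for what ends up in such a file, the pinned `name==version` lines can be approximated with the standard `importlib.metadata` module (Python 3.8+). This is a rough sketch, not how `pip freeze` is implemented: the real command also handles editable installs, URLs and other cases, and `frozen_requirements` is a name made up for this example.

```python
from importlib import metadata
from typing import List


def frozen_requirements() -> List[str]:
    """Return 'name==version' lines for distributions visible to this interpreter."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata has no name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


for line in frozen_requirements():
    print(line)
```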
Read this post for a simple and effective method to keep the requirements file up to date.
Repositories
Although PyPI has been the ubiquitous software repository for Python developers for a long time, the public nature of it might not meet everyone’s needs.
Note that the original Python Package Index implementation (previously hosted at [pypi.python.org]) has been phased out in favour of an updated implementation hosted at [pypi.org].
These tools provide alternatives for creating Python software repositories:
Warehouse is the software that currently runs the PyPI platform. It might be possible to host it locally to be used as an internal repository.
`devpi` features a powerful PyPI-compatible server and PyPI proxy cache with a complementary command line tool to drive packaging, testing and release activities with Python.
Virtual Environments
While system-wide and per-user Python distribution installation has been supported from the early days, per-project installation is the preferred installation method as of late. To meet this demand, virtual Python environments were introduced, which are basically a complete Python installation along with its installed set of third-party Python packages and modules and extensions confined in a separate directory, which can be activated and deactivated as needed. The virtual environment directory can be created and removed as needed without any impact on any other Python installation on the system. Virtual Python environments can be built upon different Python versions and even different Python interpreter implementations.
Following is a review of various tools that I’ve encountered for managing Python virtual environments. Most of these tools require an existing Python installation.
`virtualenv` is an innovative hack on the Python interpreter search path that enables creating separate and isolated directories to install Python packages and executables in. It means not having to worry about other Python programs when installing, removing or updating Python packages. Getting rid of all Python packages and executables that were installed for a project is possible by simply removing the `virtualenv` directory for that project.
Combined with the ease of package installation provided by `pip` and a `requirements.txt` file, `virtualenv` became an invaluable part of a Python developer’s workflow. Does anyone remember when Python packages could only be installed by running `python setup.py install` in the package source directory?
`requirements.txt` is a text file that lists all Python package dependencies for a project along with their exact or minimum and maximum required version numbers, and it is usually shipped as part of the Python package. `pip freeze` is the command usually used to create this file.
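virtualenvs are normally “activate”d before running the Python interpreter. Activation puts the environment’s executables first on the OS’s executable search path, so the environment’s own interpreter, with its own module search path, is the one that runs.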
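A quick way to see this from inside Python is to compare `sys.prefix` with `sys.base_prefix`. The sketch below assumes Python 3; the `real_prefix` fallback is there because older `virtualenv` releases set that attribute instead, and `in_virtual_environment` is just an illustrative helper name.

```python
import os
import sys


def in_virtual_environment() -> bool:
    """Report whether the running interpreter belongs to a virtual environment."""
    # venv and current virtualenv releases keep sys.base_prefix pointing at
    # the base interpreter; legacy virtualenv set sys.real_prefix instead.
    base_prefix = getattr(sys, "real_prefix", None) or sys.base_prefix
    return sys.prefix != base_prefix


print("interpreter prefix:", sys.prefix)
print("inside a virtual environment:", in_virtual_environment())
# After activation, the environment's script directory usually comes first here.
print("first PATH entry:", os.environ.get("PATH", "").split(os.pathsep)[0])
```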
virtualenvwrapper
Simply a set of shell helpers to make working with virtualenv easier. These helpers include the `mkvirtualenv`, `rmvirtualenv` and `workon` scripts.
`virtualenvwrapper` also creates a separate hidden directory in the user’s home directory and uses it as a central place to put virtualenv directories in. This reduces the developer confusion caused by `venv` directories being scattered all over the file system.
pew
Another wrapper on top of `virtualenv`.
pyvenv
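A wrapper script that shipped with Python 3.3 through 3.7 for creating virtual environments with the built-in `venv` module. It was deprecated in Python 3.6 in favor of `python -m venv`.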
venv
virtualenvs have since become such a ubiquitous part of Python development that for Python 3.3, the community decided to include the basic functionality provided by `virtualenv` in a standard library module named `venv`. When invoked from the command line (using `python -m venv`) this module can create virtualenvs just as `virtualenv` previously did.
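The same module can also be driven from Python code. Here is a minimal sketch that creates a throwaway environment in a temporary directory and prints its `pyvenv.cfg` file; the directory name `demo-env` and the helper `create_demo_environment` are made up for this example.

```python
import tempfile
import venv
from pathlib import Path


def create_demo_environment(parent: Path) -> Path:
    """Create a virtual environment under `parent` and return its path."""
    env_dir = parent / "demo-env"  # hypothetical directory name
    # with_pip=False keeps the sketch fast; pass True to bootstrap pip
    # into the new environment via ensurepip.
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_demo_environment(Path(tmp))
    # pyvenv.cfg records which base interpreter the environment was built from.
    config_text = (env_dir / "pyvenv.cfg").read_text()

print(config_text)
```

Removing the environment is just a matter of deleting its directory, which is what leaving the `with` block does here.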
pyenv
It allows not only creating virtual environments, but also installing separate Python interpreter instances on the same system, which makes it possible to test code on different Python versions and implementations.
pipenv
`pipenv` is the Python community’s answer to what `npm` does for JavaScript. This new tool is so promising that it has rapidly become the recommended way to manage Python environments in Python documentation.
`pipenv` allows separate lists of development and production requirements, and keeps a record of which package was installed explicitly by the user and which one was just a dependency (which `pip freeze` cannot do). virtualenvs created by `pipenv` are an essential part of the workflow and do not require activation by the user, since you are expected to run Python package commands via `pipenv run`. If needed, `pipenv shell` starts a shell with the virtualenv activated.
Pipenv aims to bring the best of all packaging worlds to the Python world. It harnesses Pipfile, `pip`, and `virtualenv` into one single toolchain.
`Pipfile` and its sister `Pipfile.lock` are a higher-level application-centric alternative to pip’s lower-level `requirements.txt` file.
Poetry helps you declare, manage and install dependencies of Python projects, ensuring you have the right stack everywhere.
A modern project, package, and virtual env manager for Python. Hatch is a productivity tool designed to make your workflow easier and more efficient, while also reducing the number of other tools you need to know.
pipsi
`pip` script installer
`pipsi` is a wrapper around `virtualenv` and `pip` which installs scripts provided by Python packages into isolated virtualenvs so they do not pollute your system’s Python packages.
Development on `pipsi` has stopped and moved to the `pipx` project.
pip-tools is a set of tools to keep your pinned Python dependencies fresh.
`conda` is the package management tool for Anaconda Python installations. Anaconda Python is a distribution specifically aimed at the scientific community, and in particular at Windows users, where the installation of binary extensions is often difficult.
Conda is a completely separate tool from `pip`, `virtualenv` and `wheel`, but provides many of their combined features in terms of package management, virtual environment management and deployment of binary extensions.
Conda does not install packages from PyPI; it installs from conda channels, such as the official Anaconda repositories.
Other tools
Flit is a simple way to put Python packages and modules on PyPI.
Shiv is a command line utility for building fully self contained Python zipapps as outlined in PEP 441 but with all their dependencies included! Shiv’s primary goal is making distributing Python applications fast & easy.
`docker`, and application container managers in general, are the latest trend in application packaging, distribution and deployment. They provide application images that are independent of the programming language, and even the operating system. An image is the entire isolated file system that the application can access, so everything from the application code, Python dependency modules and compiled dependencies to the Python interpreter and its system dependencies like libc and OpenSSL must be included.
Creating a docker container just to use it as a Python package virtual environment might be overkill, but considering the other goodies offered by docker that might come in handy in the near future, it is more than worth the time to create a `Dockerfile` or docker-compose config. One catch is that you might still need to create the virtualenv locally (outside the container) to enable the IDE to offer its autocomplete features.
I hope these lists help you (and future me!) make a quick decision about which tool to use when working on Python projects. I’ll try to keep this guide updated as the Python software distribution ecosystem evolves further. To read more, see the Python Packaging User Guide.