How virtual environment libraries work in Python
Have you ever wondered what happens when you activate a virtual environment and how it works internally? Here is a quick overview of the internals behind popular virtual environment tools, e.g., virtualenv, virtualenvwrapper, conda, pipenv.
Initially, Python didn't have built-in support for virtual environments, and the feature was implemented as a hack. As it turns out, this hack is based on a simple concept.
When the Python interpreter starts, it searches for the site-specific directory where all packages are stored. The search starts at the parent directory of the Python executable's location and continues by walking up the path (i.e., looking at the parent directories) until it reaches the root directory. To decide whether a candidate is the site-specific directory, Python looks for the os.py module, which the interpreter requires in order to work.
Let's suppose our Python binary is located at /usr/dev/bin/python. The search pattern will look as follows:
/usr/dev/lib/python3.7/os.py
/usr/lib/python3.7/os.py
/lib/python3.7/os.py
As you can see, Python appends a fixed suffix (lib/python$VERSION/os.py) to each candidate directory. When the interpreter finds the first occurrence of the os module, it sets sys.prefix and sys.exec_prefix to the found location with that suffix removed from the path. If none is found, Python falls back to a hardcoded prefix.
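The search described above can be sketched in a few lines of Python. This is only an illustration of the article's description, not CPython's actual startup code; the function names landmark_candidates and find_prefix, and the exists callback, are hypothetical.

```python
from pathlib import PurePosixPath


def landmark_candidates(executable, version="3.7"):
    """Yield the os.py locations probed, following the description above."""
    # Start at the parent of the directory that holds the executable.
    directory = PurePosixPath(executable).parent.parent
    while True:
        yield str(directory / "lib" / f"python{version}" / "os.py")
        if directory == directory.parent:  # the root is its own parent
            break
        directory = directory.parent


def find_prefix(executable, exists, version="3.7"):
    """Return the directory of the first landmark for which exists() is true.

    `exists` is a callback (hypothetical, for illustration) so the sketch
    can be tried without touching the real file system.
    """
    for landmark in landmark_candidates(executable, version):
        if exists(landmark):
            # Strip the three trailing components: lib/pythonX.Y/os.py
            return str(PurePosixPath(landmark).parents[2])
    return None  # the real interpreter would fall back to a hardcoded prefix


candidates = list(landmark_candidates("/usr/dev/bin/python"))
prefix = find_prefix(
    "/usr/dev/bin/python",
    exists=lambda path: path == "/usr/lib/python3.7/os.py",
)
```

With the example binary, candidates holds the same three paths listed above, and prefix is /usr when only /usr/lib/python3.7/os.py is assumed to exist.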
Now, let's see how the old and well-known virtualenv library creates its virtual environments:
user@arb:/usr/home/test# virtualenv ENV
Running virtualenv with interpreter /usr/bin/python3
New python executable in /usr/home/test/ENV/bin/python3
Also creating executable in /usr/home/test/ENV/bin/python
Installing setuptools, pkg_resources, pip, wheel...done.
After execution, it creates an additional directory:
user@arb:/usr/home/test/ENV# tree -L 3
.
├── bin
│   ├── activate
│   ├── activate.csh
│   ├── activate.fish
│   ├── activate_this.py
│   ├── easy_install
│   ├── easy_install-3.7
│   ├── pip
│   ├── pip3
│   ├── pip3.7
│   ├── python
│   ├── python-config
│   ├── python3 -> python
│   ├── python3.7 -> python
│   └── wheel
├── include
│   └── python3.7m -> /usr/include/python3.7m
├── lib
│   └── python3.7
│       ├── __future__.py -> /usr/lib/python3.7/__future__.py
│       ├── __pycache__
│       ├── _bootlocale.py -> /usr/lib/python3.7/_bootlocale.py
│       ├── _collections_abc.py -> /usr/lib/python3.7/_collections_abc.py
│       ├── _dummy_thread.py -> /usr/lib/python3.7/_dummy_thread.py
│       ├── _weakrefset.py -> /usr/lib/python3.7/_weakrefset.py
│       ├── abc.py -> /usr/lib/python3.7/abc.py
│       ├── base64.py -> /usr/lib/python3.7/base64.py
│       ├── bisect.py -> /usr/lib/python3.7/bisect.py
│       ├── codecs.py -> /usr/lib/python3.7/codecs.py
│       ├── collections -> /usr/lib/python3.7/collections
│       ├── config-3.7m-darwin -> /usr/lib/python3.7/config-3.7m-darwin
│       ├── copy.py -> /usr/lib/python3.7/copy.py
│       ├── copyreg.py -> /usr/lib/python3.7/copyreg.py
│       ├── distutils
│       ├── encodings -> /usr/lib/python3.7/encodings
│       ├── enum.py -> /usr/lib/python3.7/enum.py
│       ├── fnmatch.py -> /usr/lib/python3.7/fnmatch.py
│       ├── functools.py -> /usr/lib/python3.7/functools.py
│       ├── genericpath.py -> /usr/lib/python3.7/genericpath.py
│       ├── hashlib.py -> /usr/lib/python3.7/hashlib.py
│       ├── heapq.py -> /usr/lib/python3.7/heapq.py
│       ├── hmac.py -> /usr/lib/python3.7/hmac.py
│       ├── imp.py -> /usr/lib/python3.7/imp.py
│       ├── importlib -> /usr/lib/python3.7/importlib
│       ├── io.py -> /usr/lib/python3.7/io.py
│       ├── keyword.py -> /usr/lib/python3.7/keyword.py
│       ├── lib-dynload -> /usr/lib/python3.7/lib-dynload
│       ├── linecache.py -> /usr/lib/python3.7/linecache.py
│       ├── locale.py -> /usr/lib/python3.7/locale.py
│       ├── no-global-site-packages.txt
│       ├── ntpath.py -> /usr/lib/python3.7/ntpath.py
│       ├── operator.py -> /usr/lib/python3.7/operator.py
│       ├── orig-prefix.txt
│       ├── os.py -> /usr/lib/python3.7/os.py
│       ├── posixpath.py -> /usr/lib/python3.7/posixpath.py
│       ├── random.py -> /usr/lib/python3.7/random.py
│       ├── re.py -> /usr/lib/python3.7/re.py
│       ├── readline.so -> /usr/lib/python3.7/lib-dynload/readline.cpython-37m-darwin.so
│       ├── reprlib.py -> /usr/lib/python3.7/reprlib.py
│       ├── rlcompleter.py -> /usr/lib/python3.7/rlcompleter.py
│       ├── shutil.py -> /usr/lib/python3.7/shutil.py
│       ├── site-packages
│       ├── site.py
│       ├── sre_compile.py -> /usr/lib/python3.7/sre_compile.py
│       ├── sre_constants.py -> /usr/lib/python3.7/sre_constants.py
│       ├── sre_parse.py -> /usr/lib/python3.7/sre_parse.py
│       ├── stat.py -> /usr/lib/python3.7/stat.py
│       ├── struct.py -> /usr/lib/python3.7/struct.py
│       ├── tarfile.py -> /usr/lib/python3.7/tarfile.py
│       ├── tempfile.py -> /usr/lib/python3.7/tempfile.py
│       ├── token.py -> /usr/lib/python3.7/token.py
│       ├── tokenize.py -> /usr/lib/python3.7/tokenize.py
│       ├── types.py -> /usr/lib/python3.7/types.py
│       ├── warnings.py -> /usr/lib/python3.7/warnings.py
│       └── weakref.py -> /usr/lib/python3.7/weakref.py
└── pip-selfcheck.json
As you can see, the environment was created by copying the Python binary to a local directory (ENV/bin/python). Next to it, the parent directory contains a lib folder, which stores a collection of symlinks to standard library files. We can't simply create a symlink to the executable, because the interpreter would dereference it and start the search from the original location instead of from ENV.
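The symlink problem can be illustrated with a small experiment. This is a hedged sketch, assuming a POSIX system where creating symlinks is allowed; the directory names system and ENV are made up for the demo, and os.path.realpath stands in for the dereferencing the interpreter performs.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    root = os.path.realpath(root)  # normalise, e.g. /var -> /private/var on macOS

    # A stand-in for the system interpreter and for the environment's bin directory.
    original = os.path.join(root, "system", "bin", "python")
    link = os.path.join(root, "ENV", "bin", "python")
    os.makedirs(os.path.dirname(original))
    os.makedirs(os.path.dirname(link))
    open(original, "w").close()

    # ENV/bin/python is only a symlink to the "system" binary.
    os.symlink(original, link)

    # Dereferencing the link leads back to the original location,
    # so a search starting there would never look inside ENV.
    resolved = os.path.realpath(link)
    starts_in_env = resolved.startswith(os.path.join(root, "ENV"))
```

Here resolved equals the original path and starts_in_env is False, which is why the old approach copies the binary instead of linking to it.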
Now, let's activate our environment:
user@arb:/usr/home/test# source ENV/bin/activate
This command changes $PATH (a shell environment variable) so that the "python" command points to our local version. Basically, it prepends the path of our local bin directory, so it takes priority over all other locations:
export PATH="/usr/home/test/ENV/bin:$PATH"
echo $PATH
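The effect of prepending a directory to the search path can be reproduced from Python with shutil.which, which performs the same kind of lookup the shell does. This is a minimal sketch, assuming a POSIX system; the helper name resolve_with_prepended_bin and the fake executable are illustrative only.

```python
import os
import shutil
import tempfile


def resolve_with_prepended_bin(command, bin_dir, current_path):
    """Look up `command` as if `bin_dir` had been prepended to PATH."""
    new_path = bin_dir + os.pathsep + current_path
    return shutil.which(command, path=new_path)


with tempfile.TemporaryDirectory() as env_bin:
    # Create a fake, executable "python" inside the pretend environment.
    fake_python = os.path.join(env_bin, "python")
    with open(fake_python, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(fake_python, 0o755)

    resolved = resolve_with_prepended_bin(
        "python", env_bin, os.environ.get("PATH", "")
    )
    # The first match wins, so the environment's copy shadows any other python.
    found_in_env = os.path.dirname(resolved) == env_bin
```

Because the lookup stops at the first match, the copy in the prepended directory is chosen even when another python exists later on the path.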
If you run a Python script in such an environment, the Python process will be executed using the /usr/home/test/ENV/bin/python executable. Thus, the interpreter will use this location as the starting point for its package finder. In our case, the site-specific directory will be found at /usr/home/test/ENV/lib/python3.7/.
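To check which locations a running interpreter actually settled on, you can inspect a few attributes of the sys module. The helper name describe_interpreter is made up for this sketch; the attributes it reads are standard.

```python
import os
import sys


def describe_interpreter():
    """Collect the locations the running interpreter resolved at startup."""
    return {
        "executable": sys.executable,    # the binary that was started
        "prefix": sys.prefix,            # where the search for os.py ended
        "exec_prefix": sys.exec_prefix,  # the platform-specific counterpart
        # Location of the os module itself, if the interpreter exposes it.
        "os_module": getattr(os, "__file__", None),
    }


info = describe_interpreter()
for name, value in info.items():
    print(f"{name}: {value}")
```

Running this inside and outside an activated environment should show different executable and prefix values.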
That is the main idea of the hack, which most virtual environment libraries use under the hood.
Improvements in Python 3
Python 3.3 implemented PEP 405, which introduces a mechanism for lightweight virtual environments.
This PEP adds a new step to the search process. Instead of copying the Python binary and its modules, you create a pyvenv.cfg file and specify their location in it. Python looks for this file next to the executable or one directory above it.
That is how the standard venv module works:
user@arb:/usr/home/test2# python3 -m venv ENV
user@arb:/usr/home/test2# tree -L 3
.
└── ENV
    ├── bin
    │   ├── activate
    │   ├── activate.csh
    │   ├── activate.fish
    │   ├── easy_install
    │   ├── easy_install-3.7
    │   ├── pip
    │   ├── pip3
    │   ├── pip3.5
    │   ├── python -> python3
    │   └── python3 -> /usr/bin/python3
    ├── include
    ├── lib
    │   └── python3.7
    ├── lib64 -> lib
    ├── pyvenv.cfg
    └── share
        └── python-wheels
user@arb:/usr/home/test2# cat ENV/pyvenv.cfg
home = /usr/bin
include-system-site-packages = false
version = 3.7.0
user@arb:/usr/home/test2# readlink ENV/bin/python3
/usr/bin/python3
Thanks to the config file, venv uses a symbolic link to the executable instead of a copy of it. If include-system-site-packages is set to true, all system-installed packages become importable from the environment, because the system site-packages directory is appended to sys.path (after the environment's own site-packages).
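The config file is a plain list of key = value lines, so it is easy to read by hand. The following is a rough sketch of such a reader, not the parser Python itself uses; the function name parse_pyvenv_cfg is hypothetical, and the sample text mirrors the file shown above. The final line uses sys.base_prefix, available since Python 3.3, to tell whether the current interpreter runs inside a venv-style environment.

```python
import sys


def parse_pyvenv_cfg(text):
    """Parse `key = value` lines into a dict (illustrative, not the real parser)."""
    settings = {}
    for line in text.splitlines():
        key, separator, value = line.partition("=")
        if separator:  # skip lines without an equals sign
            settings[key.strip().lower()] = value.strip()
    return settings


sample_cfg = """home = /usr/bin
include-system-site-packages = false
version = 3.7.0
"""

settings = parse_pyvenv_cfg(sample_cfg)
use_system_packages = (
    settings.get("include-system-site-packages", "false").lower() == "true"
)

# Inside a PEP 405 environment, sys.prefix points at the environment,
# while sys.base_prefix still points at the original installation.
in_venv = sys.prefix != sys.base_prefix
```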
Despite these improvements, most third-party virtual environment libraries still use the old approach.