pip-tools and pipdeptree

Introduction

I was looking for a way to update python dependencies that I'd installed with pip when I stumbled upon pip-tools. I'm not particularly good about keeping everything in sync and up-to-date so I'm hoping that this will make it easier to do and thus more likely that I'll do it. It's been a little while since I first used it and I had to look it up, so these are my notes to my future self.

First pipdeptree

pip-tools installs a command named pip-compile which will use either the requirements you put in your setup.py file or a special file named requirements.in (if you call it this you don't have to pass in the filename, otherwise you have to tell it where to look). Unless there's only a few requirements I prefer to use a separate file, rather than setup.py, since it makes it clearer and more likely that I'll keep it up to date. The requirements.in file is a list of your dependencies but unlike the requirements.txt file, it doesn't have version numbers, the version numbers are added when you call the pip-compile command.

So where does the requirements.in file come from? You have to make it. But if you're editing things by hand, doesn't this kind of make it less likely you'll maintain it? Yes, which is where pipdeptree comes in. pipdeptree will list all the python dependencies you installed as well as everything those dependencies pulled in as their dependencies. It's usefull to take a look at how a dependency you didn't directly install got into your virtual environment. You can install it from pypi.

pip install pipdeptree

Here's its help output.

pipdeptree -h
usage: pipdeptree [-h] [-v] [-f] [-a] [-l] [-u] [-w [{silence,suppress,fail}]]
                  [-r] [-p PACKAGES] [-j] [--json-tree]
                  [--graph-output OUTPUT_FORMAT]

Dependency tree of the installed python packages

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -f, --freeze          Print names so as to write freeze files
  -a, --all             list all deps at top level
  -l, --local-only      If in a virtualenv that has global access do not show
                        globally installed packages
  -u, --user-only       Only show installations in the user site dir
  -w [{silence,suppress,fail}], --warn [{silence,suppress,fail}]
                        Warning control. "suppress" will show warnings but
                        return 0 whether or not they are present. "silence"
                        will not show warnings at all and always return 0.
                        "fail" will show warnings and return 1 if any are
                        present. The default is "suppress".
  -r, --reverse         Shows the dependency tree in the reverse fashion ie.
                        the sub-dependencies are listed with the list of
                        packages that need them under them.
  -p PACKAGES, --packages PACKAGES
                        Comma separated list of select packages to show in the
                        output. If set, --all will be ignored.
  -j, --json            Display dependency tree as json. This will yield "raw"
                        output that may be used by external tools. This option
                        overrides all other options.
  --json-tree           Display dependency tree as json which is nested the
                        same way as the plain text output printed by default.
                        This option overrides all other options (except
                        --json).
  --graph-output OUTPUT_FORMAT
                        Print a dependency graph in the specified output
                        format. Available are all formats supported by
                        GraphViz, e.g.: dot, jpeg, pdf, png, svg

If you look at the options you can see that there's a --freeze option, that's what we'll be using. Let's look at some of what that looks like.

pipdeptree --freeze | head

So it looks like the output of pip freeze except it puts the packages you installed flush-left and then uses indentation to indicate what that package installed. In the example above, I installed Nikola, then Nikola installed doit, and doit installed cloudpickle and pyinotify. I kind of remember needing to install pyinotify myself, but maybe pydeptree caught that it was a dependency that doit is using. Anyway.

For our requirements.in file we only want the names, and although there might be a reason to keep the entire tree, I think it makes it easier to understand what I'm using if the file only holds the dependencies at the top-level (the ones that I'm using directly, rather than being a dependency of a dependency). So, we'll use a little grep. First, since I'm a python-programmer I'm going to give it the -P flag to use perl escape codes. Next, we want to only match the lines that have alpha-numeric characters as the first character in the line.

grep Description
-P, --perl-regexp Use perl regular expression syntax
^ Match the beggining of a line
\w Match alpha-numeric character and underscores
+ Match one or more

First, let's see how many total dependencies there are.

pipdeptree --freeze | wc -l
: 160

So there are 160 dependencies total. How many did I install?

pipdeptree --freeze | grep --perl-regexp "^\w+" | wc -l

Out of the 160 only 11 were directly installed by me.

So we're done, right? Not yet, we need to get rid of the == and version numbers. I hadn't known that grep had this feature, since I normally use python instead of grep, but grep has an --only-matching option that will discard the parts of the line that don't match.

grep Description
-o, --only-matching Only show the parts of the line that match
pipdeptree --freeze | grep --only-matching --perl-regexp "^\w+"

If you look at the first entry you'll notice it says ghp, but the actual name of the package is ghp-import, but the hyphen isn't part of the alpha-numeric set, so we'll have to add it.

grep Description
[] Match one or the entries in the brackets
pipdeptree --freeze | grep -oP "^[\w\-]+"

This looks like what we want, but there's a couple of things that we should take care of that would happen if this were for an installed package.

  • there's a bug in ubuntu that causes pip to include pkg-resources, which isn't something you can install.
  • it will add an extra entry for your python egg, something like this:
-e git+git@github.com:russell-n/iperflexer.git@65f4d3ca72670591f584efa6fa9bfd64c18a925b#egg=iperflexer

So we should filter those out.

grep Description
-v, --invert-match Return lines that don't match
pipdeptree --freeze | grep --only-matching --perl-regexp "^[\w\-]+" | grep --invert-match "\-e\|pkg"
ghp-import2
graphviz
Nikola
notebook
pip-tools
pipdeptree
virtualfish
watchdog
webassets
wheel
ws4py

There are probaby other exceptions that have to be added for other installations, but this looks like enough for us. Now we can redirect this to a requirements.in file and we're ready for pip-tools.

pipdeptree --freeze | grep --only-matching --perl-regexp "^[\w\-]+" | grep --invert-match "\-e\|pkg" > requirements.in

pip-compile

pip-compile will read in the requirements.in file and add the version numbers and can create a requirements.txt file. It will automatically look for the requirements.in file or you can explicitly pass in the filename.

pip-compile | head
#
# This file is autogenerated by pip-compile
# To update, run:
#
#    pip-compile --output-file requirements.txt requirements.in
#
argh==0.26.2              # via watchdog
backcall==0.1.0           # via ipython
bleach==2.1.3             # via nbconvert
blinker==1.4              # via nikola

You'll notice it adds in the dependencies of the dependencies and shows what requries them.

Well, that was a lot of work just for that.

If we stopped at this point we'd have:

  • a way to check who installed what using pipdeptree (as well as a way to plot the dependencies as a graph)
  • a way to separate out our dependencies into a separate file (requirements.in) to make it easier to read
  • a way to create our requirements.txt file using our requirements.in file

I think that's kind of nice already, especially if you end up with a lot of dependencies. Try working with sphinx and scikit-learn and you'll see things start to explode. But of course, there's always more.

Upgrade

You can run pip-compile with the --upgrade option to try and update dependencies whenever you want to make sure you have the latest of everything (you can do it per-package too, but nah).

pip-compile --upgrade | head
#
# This file is autogenerated by pip-compile
# To update, run:
#
#    pip-compile --output-file requirements.txt requirements.in
#
argh==0.26.2              # via watchdog
backcall==0.1.0           # via ipython
bleach==2.1.3             # via nbconvert
blinker==1.4              # via nikola

This will upgrade your installation but not update the requirements.txt file, so you can test it out and see if everything works before updating the requirements.txt. If things don't work out, you could reinstall from the requirements.txt file, but see the next section for another way.

Sync

pip-tools also installed a command called pip-sync which will keep you in sync with what is in the requirements file, so as long as requirements.txt is always a working version, you can sync up with it to avoid problems with changes in any of the dependencies. This is different from the --upgrade option in that it will only install the exact version in the requirements file.

pip-sync
Collecting backcall==0.1.0
Collecting bleach==2.1.3
  Using cached https://files.pythonhosted.org/packages/30/b6/a8cffbb9ab4b62b557c22703163735210e9cd857d533740c64e1467d228e/bleach-2.1.3-py2.py3-none-any.whl
Collecting certifi==2018.4.16
  Using cached https://files.pythonhosted.org/packages/7c/e6/92ad559b7192d846975fc916b65f667c7b8c3a32bea7372340bfe9a15fa5/certifi-2018.4.16-py2.py3-none-any.whl
Collecting cloudpickle==0.5.3
  Using cached https://files.pythonhosted.org/packages/e7/bf/60ae7ec1e8c6742d2abbb6819c39a48ee796793bcdb7e1d5e41a3e379ddd/cloudpickle-0.5.3-py2.py3-none-any.whl
Successfully installed backcall-0.1.0 bleach-2.1.3 certifi-2018.4.16 cloudpickle-0.5.3 decorator-4.3.0 doit-0.31.1 ipykernel-4.8.2 ipython-6.4.0 jedi-0.12.0 jupyter-client-5.2.3 logbook-1.4.0 lxml-4.2.1 natsort-5.3.2 nikola-7.8.15 notebook-5.5.0 parso-0.2.1 pexpect-4.6.0 pillow-5.1.0 python-dateutil-2.7.3 send2trash-1.5.0 tornado-5.0.2 virtualenv-16.0.0 virtualfish-1.0.6 wheel-0.31.1 ws4py-0.5.1

Since I upgraded the installation the requirements.txt file is now behind the latests versions so by syncing I undid the upgrade. This time I'll upgrade again and save the output.

pip-compile --upgrade

So now the file and my installation should be in sync.

pip-sync
: Everything up-to-date

Conclusion

So there you have it, how to keep dependencies synced. The README for pip-tools is much briefer, but I thought I'd add a little more detail to the part of it that I plan to use the most.