What happens when you ‘pip install’ a package with Python?

So you’ve had a play around with the core modules that come with Python, and now you want to move beyond that. One of Python’s main strengths is the incomprehensibly large number of additional packages you can install to add all sorts of extra functionality to your environment.

You’ve heard about a package you’d like to use, and now you’ve seen an instruction that looks like:

pip install openpyxl

What does this mean? Where do you enter that and run it, and what is it doing?

What is a Python package?

IMPORTANT

The term ‘package’ refers to two different, though related, things in Python, depending on the context in which it’s used. Both are concerned with the same bits of extra code that add extra functions to Python, but they represent that code at different stages and in different locations. To distinguish between them, we’ll call one a ‘PyPI distribution package’, and the other a ‘Python import package’. Let’s start with the basic ‘Python import package’.

When programming in Python, we most commonly refer to installed Python import packages as simply ‘packages’, which consist of one or more ‘modules’ (files) that add new classes, functions and variables to our code. When your code says something like “import pprint”, it is referring to a Python import package. (pprint happens to ship with Python, so it needs no installing, but packages you install yourself are imported in exactly the same way.)

That functionality might sit in one extra file, or it might be split across several.
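If you’re curious which case you’re dealing with, you can ask Python. The following is a minimal sketch using two names that ship with Python (pprint and json, picked purely for illustration); the helper names is_multi_file and submodule_names are made up for this example.

```python
import json
import pkgutil
import pprint


def is_multi_file(imported):
    """Return True if the import is backed by a folder of several files."""
    # Only folder-based packages carry a __path__ attribute;
    # single-file modules do not.
    return hasattr(imported, "__path__")


def submodule_names(imported):
    """List the files (modules) found inside a folder-based package."""
    if not is_multi_file(imported):
        return []
    return sorted(info.name for info in pkgutil.iter_modules(imported.__path__))


print("pprint is split across several files:", is_multi_file(pprint))
print("json is split across several files:", is_multi_file(json))
print("Files inside json:", submodule_names(json))
```

Either way, you use it the same way in your own code: a single import line.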

A kitchen tool equivalent of a Python import package could be a blender. It has several pieces that connect together, and comes with different functionality (e.g. pulse, blend, crush ice).

If your Python environment is your kitchen, then “import blender” is the equivalent of getting your blender out of the cupboard so that it’s ready to use!
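In code, “checking the cupboard” before reaching for the blender might look something like the sketch below. It assumes nothing is installed beyond Python itself, so it uses json (which ships with Python) alongside openpyxl (which may or may not be installed on your machine). The helper names is_installed and load_if_available are hypothetical.

```python
import importlib
import importlib.util


def is_installed(import_name):
    """Return True if Python can find a top-level module or package with this name."""
    return importlib.util.find_spec(import_name) is not None


def load_if_available(import_name):
    """Import and return the named module if it can be found, otherwise None."""
    if not is_installed(import_name):
        return None
    return importlib.import_module(import_name)


# 'json' ships with Python; 'openpyxl' is only found once it has been installed.
for name in ("json", "openpyxl"):
    status = "ready to import" if is_installed(name) else "not installed yet"
    print(name, "is", status)
```

In everyday scripts you would simply write “import json” at the top of the file; the check above is only there to show that an import can succeed only if the package is already in the cupboard.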

Where do Python packages come from?

Anyone can write a Python import package. Like Python itself, most packages are open source, which means they’re generally free to copy and use. Often individual developers will write a package and share it; sometimes companies do, and sometimes hobbyists. You can even write your own package!

Assuming someone else has created the package you want to install, how do you go about it?

There are two things you need to consider:

  1. Where can you find the package?
  2. Once you’ve found it, where do you put it?

1. As I’ve said, packages are usually just text files and folders. In theory, anyone could host these on their own website, and you could manually download them and put them where they need to go. Fortunately, there’s a central place to look instead.

The Python community has a website called the Python Package Index (PyPI, https://pypi.org/), which is basically one big database where people can upload the packages they’ve written. You can search this database, see what’s new and trending, and manually download the files you need.

Often, Python packages allow Python to access other programs and programming languages. These packages turn Python commands into formats that other programs can understand (and vice versa), bridging them together.

Once a package is installed, each time you run Python and want to use its functions, you tell Python to include the package with an import statement, and from that point on your script has access to the new functions.

Thankfully, there’s a much easier way, which will be discussed in the next section: using ‘pip’.

One thing that’s worth bearing in mind: packages found on PyPI could be written by anyone – they might not be very good, or they might be out of date. Python updates fairly often, and a package might have been written with an older version in mind. In rare cases, a package you want might not work with the latest version of Python, in which case, it will need to have access to an earlier version to be able to run.
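To make that concrete, here is a small sketch of the kind of check involved: comparing the Python you’re running against the range a package supports. The function name supports_current_python and the version ranges are invented for illustration. Real packages usually declare their supported range in their metadata rather than in code like this.

```python
import sys


def supports_current_python(minimum, below=None):
    """Return True if the running Python falls inside the supported range.

    minimum and below are (major, minor) tuples; below is exclusive.
    """
    current = sys.version_info[:2]
    if current < minimum:
        return False
    if below is not None and current >= below:
        return False
    return True


print("Running Python", ".".join(str(part) for part in sys.version_info[:3]))
# A hypothetical package written for Python 3.0 or newer:
print("Needs 3.0 or newer:", supports_current_python((3, 0)))
# A hypothetical package that was only ever written for Python 2:
print("Needs Python 2 only:", supports_current_python((2, 0), below=(3, 0)))
```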

In the worst-case scenario, a Python package might be malicious. Fortunately, this is pretty uncommon – these sit on a site for developers, and developers are pretty good at looking through source code and weeding out the bad eggs. Still, since there are hundreds of thousands of packages available, of very mixed quality, it’s a good idea to stick with popular ones, or ones with recognizable authors.

What is a PyPI distribution package?

When we’re talking about the process of adding this new functionality, though, ‘package’ can also refer to a PyPI distribution package: the source file for the installation (there can be several such files, depending on the version of the project, the version of Python it’s meant for, and the platform it’s designed for). When you write “pip install requests”, pip goes looking for a PyPI distribution package.
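Python keeps a record of which distribution packages have been installed, and you can look one up by the same name you would give to pip. The sketch below assumes Python 3.8 or newer (where importlib.metadata is part of the standard library); the helper name installed_version is made up, and the two names in the loop may or may not be installed on your machine.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version of a distribution package, or None if absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


for name in ("requests", "openpyxl"):
    version = installed_version(name)
    print(name, "->", version if version else "not installed")
```

Note that the distribution name and the import name don’t have to match. A well-known example is the distribution ‘beautifulsoup4’, which you import as ‘bs4’.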

Here’s an example from the ‘filehash’ project:

As you can see, the filehash project has several packages (PyPI distribution packages, i.e. versions) – one built for Python 2, one for Python 3, and another with the raw code. The ones listed are for version 0.1.dev5 – there may be previous versions (e.g. v0.05, v0.09, etc.).

When you install this with pip, pip chooses the most appropriate PyPI distribution package, and creates a local Python import package (the collection) with several modules.
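To give a feel for what “most appropriate” means, here is a heavily simplified sketch of that choice. It is not how pip is actually implemented: pip’s real rules also weigh things like platform and exact version numbers. The file names are hypothetical, modelled on the three kinds of file described above, and the functions describe and choose are invented for this example. The only assumption borrowed from the real world is the naming convention for pre-built files (‘wheels’), whose names end in a Python tag, an ABI tag and a platform tag.

```python
import sys


def describe(filename):
    """Classify a file as a wheel (pre-built) or an sdist (raw source code)."""
    if filename.endswith(".whl"):
        parts = filename[: -len(".whl")].split("-")
        # The last three dash-separated fields are the Python, ABI and
        # platform tags; a Python tag such as "py2.py3" covers both versions.
        python_tag = parts[-3]
        return {"file": filename, "kind": "wheel", "python_tags": python_tag.split(".")}
    return {"file": filename, "kind": "sdist", "python_tags": []}


def choose(filenames, major=sys.version_info[0]):
    """Prefer a wheel tagged for the given major Python version, else the raw source."""
    described = [describe(name) for name in filenames]
    for item in described:
        if item["kind"] == "wheel" and f"py{major}" in item["python_tags"]:
            return item["file"]
    for item in described:
        if item["kind"] == "sdist":
            return item["file"]
    return None


# Hypothetical file names for an imaginary project called example_pkg.
files = [
    "example_pkg-0.1-py2-none-any.whl",
    "example_pkg-0.1-py3-none-any.whl",
    "example_pkg-0.1.tar.gz",
]
print("Chosen for this Python:", choose(files))
```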

To go back to our kitchen blender analogy, ‘pip install blender’ would be the equivalent of buying the blender, opening the box it comes in, putting the pieces together (base, blade, jug, lid), then storing it in your kitchen cupboard, ready to be brought out when you need the ‘blend’ function.

In this analogy, the word ‘package’ could refer to either a) the unassembled blender parts in the box, or b) the fully assembled blender that’s waiting to be brought out of the cupboard and used.
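The ‘cupboard’ itself is just an ordinary folder on your computer, usually called site-packages. If you want to see where yours is, something like the following sketch should tell you. The function names site_packages_dir and import_search_path are made up for this example, and the folders printed will differ from machine to machine.

```python
import sys
import sysconfig


def site_packages_dir():
    """Return the folder where this Python keeps installed pure-Python packages."""
    return sysconfig.get_paths()["purelib"]


def import_search_path():
    """Return the folders Python searches, in order, when it meets an import."""
    return list(sys.path)


print("Installed packages are stored in:", site_packages_dir())
print("Python looks for imports in:")
for folder in import_search_path():
    print("  ", folder)
```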

To be continued in Part II…