How to Use pip install in Python: The Definitive Guide

If you're getting started with Python development, one of the first things you'll need to learn is how to install external packages that provide additional functionality not included in the Python standard library. That's where pip comes in.

What is pip?

pip is the standard package manager for Python. It lets you install, update, and remove Python packages. The name is a recursive acronym for "pip installs packages".

Python actually comes with a large standard library that provides a wide range of functionality. However, the real power of Python lies in the hundreds of thousands of additional packages created by the community. These packages allow you to do things like:

  • Access web APIs and process data from the internet
  • Perform data analysis and machine learning
  • Create web applications and APIs
  • Interact with databases
  • Build desktop GUIs
  • And much, much more

To get a sense of the scale of packages available, let's take a look at the Python Package Index (PyPI). PyPI is the central repository where anyone can upload and distribute their Python packages. As of June 2023, PyPI hosts over 458,000 projects and has seen over 817 billion downloads total since its inception. The growth has been exponential: PyPI went from 1 billion downloads in 2009 to 100 billion in 2019 to over 800 billion just a few years later.

Figure: PyPI cumulative download statistics (source: https://packaging.python.org/en/latest/guides/analyzing-pypi-package-downloads/)

This massive ecosystem is both a blessing and a curse. The upside is that if you need to do something in Python, chances are there's already a package for it. The downside is that managing all these dependencies can quickly become overwhelming without the right tool. pip is that tool: it makes installing and managing Python packages as easy as running a single command.

Installing pip

First, let's cover how to install pip. There's a good chance pip is already installed on your system, especially if you're using Python 2.7.9+ or Python 3.4+ downloaded from python.org. You can check if pip is installed by running one of the following commands in a terminal:

pip --version
pip3 --version

If pip is installed, you'll see output showing the pip version number and the Python installation it belongs to. If you get an error such as "command not found", pip either isn't installed or isn't on your PATH.
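The same check can be done from Python itself, which is handy inside scripts or notebooks. This is a minimal sketch using only the standard library (importlib.metadata needs Python 3.8+); the helper name pip_version is our own, not part of pip. It asks whether pip is importable by the interpreter running the code, which is what python -m pip --version checks, rather than whether a pip script happens to be on your PATH.

```python
import importlib.metadata
import importlib.util
from typing import Optional


def pip_version() -> Optional[str]:
    """Return the version of pip installed for *this* interpreter, or None.

    Mirrors `python -m pip --version`: it looks for pip in the running
    interpreter's own packages, not for a `pip` command on PATH.
    """
    if importlib.util.find_spec("pip") is None:
        return None  # pip is not importable, so `python -m pip` would fail
    try:
        return importlib.metadata.version("pip")
    except importlib.metadata.PackageNotFoundError:
        return None  # module present but its install metadata is missing


if __name__ == "__main__":
    version = pip_version()
    print(f"pip {version}" if version else "pip is not installed for this Python")
```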

To install pip, first download the installation script, get-pip.py, by running the following command:

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py

Then run the script with Python:

python get-pip.py

This will download and install the latest version of pip. Alternatively, if you installed Python from your Linux distribution's package manager, you can usually install pip the same way, for example on Debian or Ubuntu:

sudo apt-get install python3-pip 

Basic pip commands

Now let's go over the basic pip commands that allow you to install, update, and remove packages.

To install the latest version of a package:

pip install package_name

To install a specific version:

pip install package_name==1.0.0

To upgrade a package to the latest version:

pip install --upgrade package_name

To uninstall a package:

pip uninstall package_name

To see details about an installed package, including its version, location, and dependencies:

pip show package_name

To list all installed packages:

pip list
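pip list and pip show read the metadata that every installed package carries, and the standard library's importlib.metadata module (Python 3.8+) can read that same metadata. The sketch below approximates both commands from Python; list_installed and show are hypothetical helper names, and their output is simpler than pip's (no location field, no "Required-by" reverse dependencies).

```python
import importlib.metadata
from typing import Dict, List, Tuple


def list_installed() -> List[Tuple[str, str]]:
    """Return (name, version) pairs sorted by name, roughly like `pip list`."""
    packages: Dict[str, str] = {}
    for dist in importlib.metadata.distributions():
        try:
            name, version = dist.metadata["Name"], dist.version
        except KeyError:  # some Python versions raise for missing fields
            continue
        if name and version:  # others return None for missing fields
            packages.setdefault(name, version)  # first match on sys.path wins
    return sorted(packages.items(), key=lambda item: item[0].lower())


def show(name: str) -> dict:
    """Return a few of the fields `pip show <name>` reports.

    Raises importlib.metadata.PackageNotFoundError if *name* is not installed.
    """
    dist = importlib.metadata.distribution(name)
    return {
        "Name": dist.metadata["Name"],
        "Version": dist.version,
        # Raw requirement strings; they may include extras and markers.
        "Requires": dist.requires or [],
    }


if __name__ == "__main__":
    for pkg_name, pkg_version in list_installed():
        print(f"{pkg_name:30} {pkg_version}")
```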

These commands form the core of pip's functionality that you'll use on a daily basis. Note that you may need to use pip3 instead of pip if you have both Python 2 and Python 3 installed on your system.
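If you need to run pip from a Python program, pip's documentation recommends calling it in a subprocess rather than importing its internals. Running it as python -m pip with a specific interpreter also sidesteps the pip versus pip3 question, because the packages go to exactly that Python. A minimal sketch; pip_command and run_pip are hypothetical helper names.

```python
import subprocess
import sys
from typing import List


def pip_command(*args: str) -> List[str]:
    """Build a pip command line bound to the interpreter running this code.

    `sys.executable -m pip` always targets this Python, unlike a bare
    `pip` or `pip3`, which depend on whatever is first on PATH.
    """
    return [sys.executable, "-m", "pip", *args]


def run_pip(*args: str) -> str:
    """Run pip in a subprocess and return its standard output.

    Raises subprocess.CalledProcessError if pip exits with a non-zero status.
    """
    result = subprocess.run(
        pip_command(*args), check=True, capture_output=True, text=True
    )
    return result.stdout
```

For example, run_pip("install", "--upgrade", "requests") upgrades requests for the interpreter running the script, and run_pip("list") returns the same table as pip list. Both need pip to be installed for that interpreter, and installs also need network access to PyPI.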

Binary vs source distributions

When you run pip install, pip will attempt to install a wheel file first (binary distribution), and if that's not available, will fall back to installing from source (source distribution). Wheels are pre-built packages that install faster than source distributions, which require the package to be built on your machine.

pip looks on PyPI for wheels that match your platform (operating system and CPU architecture) and Python version. If a compatible wheel is found, it's downloaded and installed directly. If no wheel is found, pip downloads the source distribution (usually a .tar.gz file), builds a wheel from it on your machine using the project's build backend (for older projects, this means running their setup.py), and then installs that wheel.
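How does pip decide that a wheel matches your system? Each wheel's filename encodes compatibility tags, following the pattern {name}-{version}(-{build tag})-{python tag}-{abi tag}-{platform tag}.whl from the wheel specification. The sketch below is a simplified, hypothetical parser that splits a filename into those parts; for real code, the third-party packaging library's packaging.utils.parse_wheel_filename is the robust option. The cpython_tag helper assumes you are running CPython.

```python
import sys
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into name, version and compatibility tags.

    Hyphens inside the project name are written as underscores in wheel
    filenames, so splitting on "-" works for well-formed names.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:  # optional build tag after the version
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"not a valid wheel filename: {filename}")
    return WheelName(name, version, build, py, abi, plat)


def cpython_tag() -> str:
    """Python tag of the running interpreter, e.g. 'cp311' (assumes CPython)."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


if __name__ == "__main__":
    print(parse_wheel_filename(
        "numpy-1.24.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
    ))
    print("this interpreter:", cpython_tag())
```

A wheel tagged cp311-cp311-manylinux_2_17_x86_64 targets CPython 3.11 on 64-bit Linux, so pip running on Python 3.12 or on Windows skips it, while a py3-none-any wheel is pure Python and installs on any Python 3 interpreter.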

You can explicitly request only binary distributions with the --only-binary flag:

pip install --only-binary=:all: package_name

Or only source distributions with --no-binary:

pip install --no-binary=:all: package_name

To download the package files without installing them, use pip download:

pip download package_name

And to install from a local or remote zip, tar.gz, or wheel file:

pip install path/to/package_name-1.0.4-py3-none-any.whl
pip install path/to/package_name.tar.gz  
pip install http://my.package.repo/package_name-1.0.4.tar.gz

Using requirements files

Often you'll have a bunch of packages that you need to install for a given project. Instead of installing each one individually, you can list them in a requirements file and pip can install all of them at once.

A requirements file is just a text file listing packages, one per line. You can optionally add version specifiers: == pins an exact version, while >= sets a minimum. For example:

requests==2.28.2
numpy==1.24.3 
matplotlib>=3.7.1

Save this file with a name like requirements.txt. Then to install all the packages, just run:

pip install -r requirements.txt

This makes it really easy to set up a new development environment with a single command. You can also use this method to deploy your application – just specify the production dependencies in the requirements file.
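Before running your application, it can be useful to confirm that the current environment actually matches the exact pins in requirements.txt. This is a deliberately simplified sketch; check_pins is a hypothetical helper, not a pip feature. It only understands name==version lines, skips everything else (other specifiers, extras, markers, pip options), and compares version strings literally rather than with full PEP 440 rules. Passing a fake get_version function lets you try it without installing anything.

```python
import importlib.metadata
from typing import Callable, Iterable, List, Optional


def _installed_version(name: str) -> Optional[str]:
    """Installed version of *name*, or None if it is not installed."""
    try:
        return importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        return None


def check_pins(
    lines: Iterable[str],
    get_version: Callable[[str], Optional[str]] = _installed_version,
) -> List[str]:
    """List exact `name==version` pins that the environment does not satisfy."""
    problems = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-") or "==" not in line:
            continue  # blank line, pip option such as -r, or not an exact pin
        if ";" in line or "[" in line:
            continue  # environment markers / extras: outside this sketch
        name, _, wanted = line.partition("==")
        name, wanted = name.strip(), wanted.strip()
        have = get_version(name)
        if have is None:
            problems.append(f"{name}: not installed (wanted {wanted})")
        elif have != wanted:  # literal string comparison, not PEP 440
            problems.append(f"{name}: installed {have}, wanted {wanted}")
    return problems
```

Typical use is with open("requirements.txt") as f: print(check_pins(f)). A fully pinned file of this kind is what pip freeze > requirements.txt produces from an existing environment.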

Virtual environments

By default, pip installs packages into the site-packages directory of the Python interpreter it runs under, so they are shared by every project that uses that interpreter. However, different projects sometimes require different versions of the same package. That's where virtual environments come in.

A virtual environment is an isolated Python environment with its own packages that won't interfere with packages in other environments or the global environment.

Python 3 actually comes with a built-in module for creating virtual environments called venv. To create a new virtual environment:

python -m venv myenv

This creates a new virtual environment in a folder named myenv. To activate the virtual environment on macOS or Linux:

source myenv/bin/activate

On Windows, run myenv\Scripts\activate instead.

Your shell prompt should change to indicate the active environment. Now when you run pip install, the packages will get installed in the virtual environment folder only. To deactivate the environment, just run:

deactivate
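The python -m venv step can also be done from Python with the standard library's venv module. A minimal sketch; create_env and in_virtualenv are hypothetical helper names, and the interpreter paths assume the standard venv layout (bin/ on macOS and Linux, Scripts\ on Windows).

```python
import os
import sys
import venv
from pathlib import Path


def create_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment and return the path to its interpreter.

    Roughly equivalent to `python -m venv env_dir`; with_pip=True
    bootstraps pip into the new environment via ensurepip.
    """
    # The venv command line uses symlinks except on Windows; mirror that.
    venv.create(env_dir, with_pip=with_pip, symlinks=(os.name != "nt"))
    if os.name == "nt":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv or virtualenv."""
    return sys.prefix != sys.base_prefix
```

Activation is only a shell convenience: calling the environment's interpreter directly, for example subprocess.run([str(create_env("myenv")), "-m", "pip", "install", "requests"], check=True), installs into myenv without activating it.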

Another popular tool for creating virtual environments is virtualenv, a third-party package that you can install with pip:

pip install virtualenv

The usage is very similar to venv:

virtualenv myenv
source myenv/bin/activate

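Historically, the main difference was that virtualenv could also create Python 2 environments, while venv is Python 3 only; recent virtualenv releases have dropped support for creating Python 2 environments. virtualenv also has a few more configuration options.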

Whichever tool you choose, virtual environments are a best practice for Python development to avoid conflicts between projects.

pip configuration

You can configure pip settings through command-line options, environment variables, or a configuration file. Some common configuration options include:

  • Specifying a PyPI mirror or private repository
  • Setting a proxy server
  • Configuring trusted hosts
  • Changing the cache directory
  • Setting timeouts and retry limits
  • Requiring an active virtual environment (require-virtualenv)

To use a configuration file, create it at $HOME/.config/pip/pip.conf on Linux, $HOME/Library/Application Support/pip/pip.conf on macOS, or %APPDATA%\pip\pip.ini on Windows. Here's an example configuration file:

[global]
timeout = 60
trusted-host = pypi.python.org
               pypi.org
               files.pythonhosted.org

For the full list of configuration options, check the pip configuration documentation: https://pip.pypa.io/en/stable/topics/configuration/
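pip.conf uses INI syntax, where indented lines continue the previous value; that is how trusted-host in the example above lists three hosts. As a sketch, Python's configparser module reads the example the same way. read_pip_options and env_var_name are hypothetical helpers; the environment variable naming follows pip's documented rule: PIP_ plus the upper-cased long option name, with dashes replaced by underscores (multiple values are space-separated).

```python
import configparser
from typing import Dict, List

# The example pip.conf from above, as a string (normally read from disk).
PIP_CONF = """\
[global]
timeout = 60
trusted-host = pypi.python.org
               pypi.org
               files.pythonhosted.org
"""


def read_pip_options(text: str) -> Dict[str, List[str]]:
    """Parse the [global] section of pip.conf-style text.

    Each value is returned as a list of whitespace-separated items, so a
    multi-line option such as trusted-host becomes a list of hosts.
    """
    parser = configparser.RawConfigParser()  # no %-interpolation of values
    parser.read_string(text)
    return {key: value.split() for key, value in parser["global"].items()}


def env_var_name(option: str) -> str:
    """Environment variable equivalent of a pip option, e.g. PIP_TRUSTED_HOST."""
    return "PIP_" + option.upper().replace("-", "_")


if __name__ == "__main__":
    for option, values in read_pip_options(PIP_CONF).items():
        print(f'{env_var_name(option)}="{" ".join(values)}"')
```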

pip in CI/CD pipelines

pip plays a crucial role in continuous integration and deployment (CI/CD) pipelines for Python projects. When you push code changes to version control, you typically want your CI system (like Jenkins, GitLab CI, or CircleCI) to automatically build, test and deploy your application.

Part of that process involves installing your application's dependencies in the CI environment. That's usually done by having pip install the packages listed in the requirements file.

Here's a simplified example of what that might look like in a .gitlab-ci.yml file:

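image: python:3.9

stages:
  - build
  - test
  - deploy

build:
  stage: build
  script:
    - pip install -r requirements.txt
    # Build the sdist and wheel into dist/ via the project's build backend
    # (replaces the deprecated `python setup.py sdist bdist_wheel`).
    - pip install build
    - python -m build
  artifacts:
    paths:
      - dist/

test:
  stage: test
  script:
    # Assumes pytest is listed in requirements.txt.
    - pip install -r requirements.txt
    - pip install -e .
    - pytest

deploy:
  stage: deploy
  script:
    # twine reads credentials from TWINE_USERNAME / TWINE_PASSWORD;
    # set them as masked CI/CD variables (e.g. __token__ and a PyPI API token).
    - pip install twine
    - twine upload dist/*
  only:
    - tags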

This defines a three-stage pipeline: build, test and deploy. In the build stage, the dependencies are installed and the package is built. The test stage installs the dependencies again and runs the tests. Finally, the deploy stage, which runs only for tagged commits, uses twine (installed by pip) to publish the package to PyPI.

By using pip with a requirements file that pins exact versions, you ensure that your application is built and tested with the same package versions it will use in production.

Dependency resolution

One of pip's most important jobs is resolving dependencies. Many packages depend on other packages, which may in turn have their own dependencies. pip needs to figure out a set of package versions that satisfy all the constraints.

This is a genuinely hard problem (in general, dependency resolution is NP-complete), especially when constraints conflict. Let's say package A requires package X version 1.x, while package B requires package X version 2.x. There's no version of X that can satisfy both those constraints simultaneously.

Before pip 20.3, pip's dependency resolution algorithm was fairly naive. It essentially resolved dependencies in a single pass, choosing the first version that satisfied the constraints it had seen so far, without any backtracking. This meant it could fail to find a solution even when one existed. It also meant that installation order mattered: depending on which packages were processed first, you could end up with different, and sometimes mutually incompatible, sets of packages.

Since pip 20.3 (released in late 2020), pip uses a new backtracking resolver by default. When a chosen version leads to a conflict, it backtracks and tries other candidate versions, and if no compatible set exists it stops with a ResolutionImpossible error instead of silently installing conflicting packages. The trade-off is that resolution can take longer when many candidate versions have to be explored.

You can read more about how pip's resolver works, and how to deal with dependency conflicts, here: https://pip.pypa.io/en/latest/topics/dependency-resolution/
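The difference between single-pass and backtracking resolution is easiest to see on a toy problem. The sketch below is a simplified depth-first search with backtracking over a made-up index (packages A, B and X and their versions are hypothetical). pip's real resolver is built on the resolvelib library and also handles extras, markers, pre-releases and many performance heuristics, so treat this only as an illustration of the core idea.

```python
from typing import Dict, List, Optional, Set, Tuple

# Toy index (hypothetical data): name -> version -> {dependency: allowed versions}.
Index = Dict[str, Dict[str, Dict[str, Set[str]]]]
# A requirement is (package name, set of acceptable versions).
Requirement = Tuple[str, Set[str]]


def _version_key(version: str) -> Tuple[int, ...]:
    """Sort key for simple dotted versions such as '1.5' or '2.0'."""
    return tuple(int(part) for part in version.split("."))


def resolve(
    index: Index,
    requirements: List[Requirement],
    chosen: Optional[Dict[str, str]] = None,
) -> Optional[Dict[str, str]]:
    """Pick one version per package so that every requirement holds.

    Returns {package: version}, or None when no consistent choice exists.
    """
    chosen = dict(chosen or {})
    if not requirements:
        return chosen  # nothing left to satisfy
    (name, allowed), rest = requirements[0], requirements[1:]
    if name in chosen:
        # Decided earlier: keep going if compatible, otherwise this branch fails.
        return resolve(index, rest, chosen) if chosen[name] in allowed else None
    # Try candidates newest first, as pip prefers the latest allowed version.
    for version in sorted(index.get(name, {}), key=_version_key, reverse=True):
        if version not in allowed:
            continue
        deps = list(index[name][version].items())
        result = resolve(index, rest + deps, {**chosen, name: version})
        if result is not None:
            return result
        # This candidate led to a conflict: backtrack and try the next one.
    return None


if __name__ == "__main__":
    # A needs X 1.x and B needs X 2.x: no version of X satisfies both.
    conflict = {
        "A": {"1.0": {"X": {"1.0", "1.5"}}},
        "B": {"1.0": {"X": {"2.0"}}},
        "X": {"1.0": {}, "1.5": {}, "2.0": {}},
    }
    print(resolve(conflict, [("A", {"1.0"}), ("B", {"1.0"})]))  # None

    # The newest A (2.0) needs X 2.0, which clashes with B; backtracking
    # falls back to A 1.0, which works with X 1.0.
    recoverable = {
        "A": {"2.0": {"X": {"2.0"}}, "1.0": {"X": {"1.0"}}},
        "B": {"1.0": {"X": {"1.0"}}},
        "X": {"1.0": {}, "2.0": {}},
    }
    print(resolve(recoverable, [("A", {"1.0", "2.0"}), ("B", {"1.0"})]))
    # {'A': '1.0', 'B': '1.0', 'X': '1.0'}
```

In the first case the search returns None, the toy equivalent of pip's ResolutionImpossible. In the second, a single-pass resolver that committed to A 2.0 would be stuck, while backtracking recovers. The price is that the search can grow exponentially in the worst case.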

Staying secure with pip

When you're installing packages from PyPI, you're essentially running code written by strangers on your machine. While the vast majority of packages are created with good intentions, there's always the potential for malware or unintended vulnerabilities.

Here are a few best practices to stay secure when using pip:

  • Only install packages from trusted sources, primarily PyPI
  • Pin your dependencies to specific versions in your requirements file
  • Periodically audit your dependencies for known security vulnerabilities
  • Use virtual environments to isolate your projects
  • Be cautious about installing packages globally with root privileges (for example, with sudo pip install)
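Pinning a version fixes a name and a number, not the exact file you download. pip's hash-checking mode goes one step further: you append --hash=sha256:<digest> to each pinned line in requirements.txt (for example, requests==2.28.2 --hash=sha256:<digest>), and pip refuses any downloaded file whose digest does not match. pip's own pip hash command prints that option for a file; the sketch below reproduces it with hashlib, and pip_hash is a hypothetical helper name.

```python
import hashlib


def pip_hash(path: str, algorithm: str = "sha256") -> str:
    """Return a --hash option for a downloaded file, in the form `pip hash` prints.

    The file is read in chunks so large wheels need not fit in memory.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return f"--hash={algorithm}:{digest.hexdigest()}"
```

Once any requirement carries a --hash option, pip switches to hash-checking mode for the whole install (you can also force it with pip install --require-hashes -r requirements.txt). In that mode every requirement, including transitive dependencies, must be pinned with == and have a hash.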

There are a couple of tools that can help you audit your Python dependencies for known vulnerabilities:

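pip-audit is a separate tool, maintained under the Python Packaging Authority (PyPA), that scans your installed packages (or a requirements file) and compares them against databases of known vulnerabilities. Despite the name, it is not built into pip: install it first, then run it as pip-audit:

pip install pip-audit
pip-audit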

safety is another tool that checks your installed packages against a curated database of insecure packages. To use it:

pip install safety
safety check

Both these tools are helpful for identifying potential issues, but they're not foolproof. It's still important to keep your dependencies up-to-date and pay attention to security announcements from the packages you use.

Staying up-to-date

The Python packaging ecosystem is constantly evolving, with new pip releases and changes to PyPI arriving regularly. To stay informed about the latest developments, here are a few resources to follow:

Conclusion

In this definitive guide, we've taken a comprehensive look at pip, the standard package manager for Python. We've learned how to:

  • Install and update pip
  • Install, upgrade and uninstall packages
  • Manage binary and source distributions
  • Use requirements files
  • Create and use virtual environments
  • Configure pip settings
  • Integrate pip into CI/CD pipelines
  • Think about dependency resolution challenges
  • Audit packages for security vulnerabilities
  • Stay up-to-date with the Python packaging ecosystem

We've seen how pip is an indispensable tool for Python developers, enabling us to easily tap into the vast ecosystem of third-party packages. A single pip install command is the gateway to hundreds of thousands of powerful libraries that can supercharge your Python projects.

At the same time, we've seen how the scale and complexity of the packaging ecosystem brings challenges. Dependency conflicts, security risks, and rapidly evolving tools are all part of the landscape.

But equipped with the knowledge from this guide and a commitment to following best practices, you're now prepared to navigate this ecosystem effectively. You can leverage the power of pip and PyPI while avoiding the common pitfalls.

So go forth and pip install with confidence! The world of Python packaging awaits.
