
Using Pip in a Conda Environment

Unfortunately, issues can arise when conda and pip are used together to create an environment, especially when the tools are used back-to-back multiple times, establishing a state that can be hard to reproduce. Most of these issues stem from the fact that conda, like other package managers, has limited ability to control packages it did not install. Running conda after pip has the potential to overwrite, and potentially break, packages installed via pip. Similarly, pip may upgrade or remove a package that a conda-installed package requires. In some cases these breakages are cosmetic, with a few stray files left behind that should have been removed, but in other cases the environment may evolve into an unusable state.
There are a few steps which can be used to avoid broken environments when using conda and pip together. One surefire method is to use only conda packages. If software is needed that is not available as a conda package, conda build can be used to create packages for it. For projects available on PyPI, the conda skeleton command (which is part of conda-build) frequently produces a recipe that can be used to create a conda package with little or no modification.
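The skeleton-and-build workflow can be sketched as a short shell session. The package name here is purely illustrative; substitute the PyPI project you actually need:

```shell
# Generate a conda recipe from the package's PyPI metadata
# ("somepackage" is an example name, not a real requirement)
conda skeleton pypi somepackage

# Build a conda package from the generated recipe directory
conda build somepackage

# Install the locally built package into the active environment
conda install --use-local somepackage
```

The recipe directory produced by conda skeleton can be edited by hand if the automatic conversion misses a dependency or a build step.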
Creating conda packages for all additional software needed is a reliably safe method for putting together a data science environment, but it can be a burden if the environment involves a large number of packages which are only available on PyPI. In these cases, using pip only after all other requirements have been installed via conda is the safest practice. Additionally, pip should be run with the --upgrade-strategy only-if-needed argument to prevent packages installed via conda from being upgraded unnecessarily. This is the default behavior when running pip, and it should not be changed.
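The safe ordering described above can be sketched as follows. The environment name and package names are illustrative assumptions, not prescriptions:

```shell
# Install everything that is available as a conda package first
conda create --name myenv python=3.7 numpy pandas

# Activate the environment before running pip inside it
conda activate myenv

# Only then install PyPI-only software, leaving conda-installed
# dependencies alone unless an upgrade is strictly required
pip install --upgrade-strategy only-if-needed some-pypi-only-package
```

Because only-if-needed is pip's default, the flag is redundant in recent pip versions, but spelling it out documents the intent.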
If there is an expectation to install software using pip alongside conda packages, it is good practice to do this installation into a purpose-built conda environment to protect other environments from any modifications that pip might make. Conda environments are isolated from each other and allow different versions of packages to be installed. Within conda environments, hard links are used when possible rather than copies of files, to save space. If a similar set of packages is installed, each new conda environment will require only a small amount of additional disk space. Many users rely on just the “root” conda environment that is created by installing either Anaconda or Miniconda. If this environment becomes cluttered with a mix of pip and conda installs, it is much harder to recover. Creating separate conda environments, on the other hand, allows you to delete and recreate environments readily, without risking your core conda functionality.
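A disposable, purpose-built environment of this kind can be created and thrown away freely; the name "sandbox" here is just an example:

```shell
# Work in a dedicated environment rather than the "root" one
conda create --name sandbox python=3.7
conda activate sandbox

# ...experiment with conda and pip installs here...

# If the environment ends up in a bad state, discard and rebuild it
conda deactivate
conda env remove --name sandbox
```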
Once pip is used to install software into a conda environment, conda will be unaware of these changes and may make modifications that would break the environment. Rather than running conda, pip, and then conda again, a more reliable method is to create a new environment with the combined conda requirements and then run pip. This new environment can be tested before removing the old one. Again, it is primarily the “statefulness” of pip that causes problems: the more state that exists because of the order in which packages were installed, the harder it will be to keep things working.
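The recreate-rather-than-layer approach can be sketched like this, with illustrative environment and file names:

```shell
# Instead of running "conda install" in an environment that pip has
# already modified, build a fresh one with the full set of conda
# requirements (the old list plus the new additions)...
conda create --name myenv2 python=3.7 numpy pandas scipy
conda activate myenv2
pip install -r pip-requirements.txt

# ...verify the new environment works, then drop the old one
conda env remove --name myenv
```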
For environments that will be recreated often, it is good practice to store the conda and pip package requirements in text files. Package requirements can be provided to conda via the --file argument and to pip via -r or --requirement. A single file containing both conda and pip requirements can be exported from, or provided to, the conda env command to control an environment. Both of these methods have the benefit that the files describing the environment can be checked into a version control system and shared with others.
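Both file-based workflows can be sketched as follows; the file names are conventional but arbitrary:

```shell
# conda requirements listed one per line in a plain text file
conda create --name myenv --file conda-requirements.txt
conda activate myenv

# pip requirements in a standard requirements file
pip install -r pip-requirements.txt

# Alternatively, a single environment.yml (which may contain a
# "pip:" subsection) can describe both sets of requirements:
conda env export > environment.yml
conda env create --file environment.yml
```

Either style of file can be committed to version control, so collaborators can rebuild an identical environment with a single command.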
In summary, when combining conda and pip, it is best to use an isolated conda environment. Only after conda has been used to install as many packages as possible should pip be used to install any remaining software. If modifications are needed to the environment, it is best to create a new environment rather than running conda after pip. When appropriate, conda and pip requirements should be stored in text files.
We at Anaconda are keenly aware of the difficulties in combining pip and conda. We want the process of setting up data science environments to be as easy as possible. That is why we have been adding new features to the next version of conda to simplify this process. While still in beta, conda 4.6.0 allows conda to consider pip installed packages and either replace these packages as needed or fulfill dependencies with the existing package. We are still testing these new features but expect the interactions between conda and pip to be greatly improved in the near future.

Best Practices Checklist

Use pip only after conda
  • install as many requirements as possible with conda, then use pip
  • pip should be run with --upgrade-strategy only-if-needed (the default)
  • do not use pip with the --user argument; avoid all “user” installs
Use conda environments for isolation
  • create a conda environment to isolate any changes pip makes
  • environments take up little space thanks to hard links
  • care should be taken to avoid running pip in the “root” environment
Recreate the environment if changes are needed
  • once pip has been used conda will be unaware of the changes
  • to install additional conda packages it is best to recreate the environment
Store conda and pip requirements in text files
  • package requirements can be passed to conda via the --file argument
  • pip accepts a list of Python packages with -r or --requirement
  • conda env will export or create environments based on a file with conda and pip requirements
