Advanced Conda: Installing, Building, and Uploading Packages, Adding Channels, and More

As many of our readers might know, conda is a package manager for the numerical python stack that solves many of the issues where pip falls short. While pip is great for pure Python packages (ones written exclusively in Python code), most data science packages need to rely on C code for performance. This, unfortunately, makes installation highly system-dependent. Conda alleviates this problem by managing compiled binaries such that a user does not the need to have a full suite of local build tools (for example, building NumPy from source no longer requires a FORTRAN 77 compiler). Additionally, conda is moving to include more than Python. For example, it’s also supporting managing packages in the R language. With such an ambitious scope, it’s not surprising that package coverage is incomplete and, if you’re a power user, you’ll often see yourself wanting to contribute missing packages. Here’s a primer on how:

Installing Using Pip / PyPI in Conda as a Fallback

The first thing to notice is that you don’t necessarily need to jump to building or uploading packages. As a simple fallback, you can tell conda to build Python Packages directly from pip/PyPI. For example, take a look at this simple conda environment.yaml file:

# environment.yaml
dependencies:
- numpy
- scipy
- pip:
  - requests

This installs numpy and scipy from anaconda but installs requests using pip.  You can invoke it by running:

conda env update -f environment.yaml

Adding New Channels to Conda for more Packages

While the core maintainers has fairly good coverage, the coverage isn’t complete. Also, because conda packages are version and system dependent, the odds of a specific version not existing for your operating system is fairly high. For example, as of this writing, scrapy, a popular web scraping software, lives in the popular conda-forge channel (package). Similarly, many r packages are under the r channel, for example the r-dplyr package. R packages are, by convention, prefaced with a “r-” prefix to their CRAN name. You can find the channel that supports it by Googling “scrapy conda”. To install it, we’ll need to add conda-forge as a channel:

# environment.yaml
channels:
- conda-forge
dependencies:
- scrapy

Building PyPI Conda Packages

But sometimes, even with extra channels, the packages simply don’t exist (as evidenced by a Google query) . In this case, we just have to build our own packages. It’s fairly easy. The first step is to create a user account on anaconda.org. The username will be your channel name.

First, you’ll want to make sure you have conda build and anaconda client installed:

conda install conda-build
conda install anaconda-client

and you’ll want to authenticate your anaconda client with your new anaconda.org credentials:

anaconda login

Finally, it’s easiest to configure anaconda to automatically upload all successful builds to your channel:

conda config --set anaconda_upload yes

With this setup, it’s easy! Below are the instructions for uploading the pyinstrument package:

# download pypi data to ./pyinstrument
conda skeleton pypi pyinstrument
# build the package
conda build ./pyinstrument
# you'll see that the package is automatically uploaded

Building R Conda Packages

Finally, we’ll want to do this with R packages. Fortunately, there’s conda support for building from CRAN, R’s package manager. For example, glm2 is (surprisingly enough) not on anaconda.  We’ll run the following commands to build and auto upload it:

conda skeleton cran glm2
conda build ./r-glm2

Of course, glm2 now has OSX and Linux packages on The Data Incubator’s Anaconda channel at r-glm2 so you can directly include our channel:

# environment.yaml
channels:
- thedataincubator
dependencies:
- r-glm2

Related Blog Posts

Moving From Mechanical Engineering to Data Science

Moving From Mechanical Engineering to Data Science

Mechanical engineering and data science may appear vastly different on the surface. Mechanical engineers create physical machines, while data scientists deal with abstract concepts like algorithms and machine learning. Nonetheless, transitioning from mechanical engineering to data science is a feasible path, as explained in this blog.

Read More »
Data Engineering Project

What Does a Data Engineering Project Look Like?

It’s time to talk about the different data engineering projects you might work on as you enter the exciting world of data. You can add these projects to your portfolio and show the best ones to future employers. Remember, the world’s most successful engineers all started where you are now.

Read More »
open ai

AI Prompt Examples for Data Scientists to Use in 2023

Artificial intelligence (AI) isn’t going to steal your data scientist job! Instead, AI tools like ChatGPT can automate some of the more mundane tasks in your future career, saving you time and energy. To make life easier, here are some data science prompts to get you started.

Read More »