Conda

Conda is an open source package management system and environment management system. Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments. It was created for Python programs, but it can package and distribute software for any language.

Conda as a package manager helps you find and install packages. If you need a package that requires a different version of Python, you do not need to switch to a different environment manager, because conda is also an environment manager. With just a few commands, you can set up a totally separate environment to run that different version of Python, while continuing to run your usual version of Python in your normal environment.

There are two installers for conda, Anaconda and Miniconda.

Anaconda vs Miniconda

Anaconda is a free, open source, optimized Python and R distribution. It includes conda, conda-build, Python, and more than 100 automatically installed open source scientific packages and their dependencies that have been tested to work well together, including SciPy, NumPy and many others. Anaconda is better suited to a desktop, because the installation gives you a fairly complete environment for scientific computing.

On the other hand, Miniconda is a free minimal installer for conda. It is a small, bootstrap version of Anaconda that includes only conda, Python, the packages they depend on and a small number of other useful packages, such as pip and zlib. Miniconda is better suited to HPC environments, where a minimal installation is all that is needed and users create their own environments as required.

Activating Conda on Spruce and Thorny Flat

On Spruce you can activate conda with:

source /shared/software/miniconda3/etc/profile.d/conda.sh

On Thorny Flat the command to activate conda is:

source /shared/software/conda/conda_init.sh

After activation you are in the base environment.
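A login or job script may run on a machine where the init script is not present. The following is a minimal sketch of a guard around the source command, not part of the cluster setup. The function name load_conda is hypothetical; the paths in the comments are the ones given above.

```shell
# Hypothetical helper: source a conda init script only if it is readable.
# Returns 1 (and prints a message to stderr) when the script is missing.
load_conda() {
    init_script="$1"
    if [ ! -r "$init_script" ]; then
        echo "conda init script not found: $init_script" >&2
        return 1
    fi
    # Load conda's shell functions into the current shell.
    . "$init_script"
}

# Example on Spruce:
#   load_conda /shared/software/miniconda3/etc/profile.d/conda.sh
# Example on Thorny Flat:
#   load_conda /shared/software/conda/conda_init.sh
```

Because the script is sourced rather than executed, the conda command it defines stays available in the calling shell.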

Conda Environments

Conda allows you to create separate environments containing files, packages and their dependencies that will not interact with other environments.

When you begin using conda, you already have a default environment named base. You cannot install packages in the base environment, so you should create new environments for installing packages. Try to keep separate environments for different packages or groups of packages; that reduces the chance of incompatibilities between them.

Knowing which environments are available

At the time of writing, Spruce offers the following centrally installed environments:

$ conda info --envs

# conda environments:
#
                         /scratch/gufranco/conda-intel
base                     /shared/software/miniconda3
genomics-core            /shared/software/miniconda3/envs/genomics-core
jupyter                  /shared/software/miniconda3/envs/jupyter
python27                 /shared/software/miniconda3/envs/python27
python35                 /shared/software/miniconda3/envs/python35
python36                 /shared/software/miniconda3/envs/python36
python37                 /shared/software/miniconda3/envs/python37
qiime2-2018.8            /shared/software/miniconda3/envs/qiime2-2018.8
snowflakes               /shared/software/miniconda3/envs/snowflakes
tpd0001                  /shared/software/miniconda3/envs/tpd0001

Activating an existing environment

To use the environment called “tpd0001”, execute:

conda activate tpd0001

Deactivating the current environment

The current environment can be deactivated with:

conda deactivate

If you are in the base environment, deactivation has no effect: you are always at least in the base environment.
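In scripts it can be useful to check which environment is active without reading the prompt. The sketch below relies on the CONDA_DEFAULT_ENV variable, which conda sets when an environment is activated; the function name active_conda_env is hypothetical.

```shell
# Print the active conda environment, or "none" if no environment
# has been activated in this shell.
active_conda_env() {
    echo "${CONDA_DEFAULT_ENV:-none}"
}

# Example:
#   conda activate tpd0001
#   active_conda_env
```

For environments activated by path rather than by name, the variable holds the path instead.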

Create a new environment

As an example, we will create an environment named snowflakes and install the package Biopython in it. To create an empty environment, type the following in your terminal:

conda create --name snowflakes

or, to install a package at the same time, execute:

conda create --name snowflakes biopython

Conda collects metadata about the package and its dependencies and produces an installation plan:

## Package Plan ##

environment location: /shared/software/conda/envs/snowflakes

added / updated specs:
  - biopython


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    _libgcc_mutex-0.1          |             main           3 KB
    biopython-1.74             |   py37h7b6447c_0         2.0 MB
    blas-1.0                   |              mkl           6 KB
    ca-certificates-2019.8.28  |                0         132 KB
    certifi-2019.9.11          |           py37_0         154 KB
    intel-openmp-2019.4        |              243         729 KB
    libedit-3.1.20181209       |       hc058e9b_0         163 KB
    libffi-3.2.1               |       hd88cf55_4          40 KB
    libgcc-ng-9.1.0            |       hdf63c60_0         5.1 MB
    libgfortran-ng-7.3.0       |       hdf63c60_0        1006 KB
    libstdcxx-ng-9.1.0         |       hdf63c60_0         3.1 MB
    mkl-2019.4                 |              243       131.2 MB
    mkl-service-2.3.0          |   py37he904b0f_0         218 KB
    mkl_fft-1.0.14             |   py37ha843d7b_0         155 KB
    mkl_random-1.1.0           |   py37hd6b4f25_0         321 KB
    ncurses-6.1                |       he6710b0_1         777 KB
    numpy-1.17.2               |   py37haad9e8e_0           4 KB
    numpy-base-1.17.2          |   py37hde5b4d6_0         4.2 MB
    openssl-1.1.1d             |       h7b6447c_1         3.7 MB
    pip-19.2.3                 |           py37_0         1.9 MB
    python-3.7.4               |       h265db76_1        32.1 MB
    readline-7.0               |       h7b6447c_5         324 KB
    setuptools-41.2.0          |           py37_0         630 KB
    six-1.12.0                 |           py37_0          23 KB
    sqlite-3.29.0              |       h7b6447c_0         1.1 MB
    tk-8.6.8                   |       hbc83047_0         2.8 MB
    wheel-0.33.6               |           py37_0          40 KB
    xz-5.2.4                   |       h14c3975_4         283 KB
    zlib-1.2.11                |       h7b6447c_3         103 KB
    ------------------------------------------------------------
                                           Total:       192.2 MB

Conda asks if you want to proceed with the plan:

Proceed ([y]/n)? y
Type "y" and press Enter to proceed.

After that, conda downloads and installs the packages, creating a new environment for you. The final message shows how to activate and deactivate the environment:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate snowflakes
#
# To deactivate an active environment, use
#
#     $ conda deactivate

Each environment is isolated from other conda environments, which allows you to keep several environments with different packages or different versions of the same packages. As the message shows, you activate the environment with:

conda activate snowflakes

Notice that when you activate a new environment, the prompt changes: a prefix in parentheses indicates which conda environment you are using at that moment. To check the available environments, execute:

conda env list

or:

conda info --envs

Conda and Python

When you create a new environment, conda installs the same Python version that was used to install conda on Spruce and Thorny Flat (3.7). If you want to use a different version of Python, for example Python 2.7, simply create a new environment and specify the version of Python that you want:

conda create --name python27 python=2.7

You activate the environment:

conda activate python27

And verify the Python version:

$ python --version
Python 2.7.16 :: Anaconda, Inc.

At the time of writing, conda has packages for Python versions 2.7, 3.5, 3.6 and 3.7.

Managing packages and channels

New packages can be installed to existing conda environments. First search for packages with:

conda search mkl

Packages are stored in repositories called channels. By default, conda searches only the pkgs/main channel. However, many more packages are available on other channels.

The most prominent channels to search for packages are intel, conda-forge and bioconda. To search for packages there, execute:

conda search -c intel mkl

conda search -c conda-forge nitime

conda search -c bioconda blast

Packages can be installed in the current environment with:

conda install -c conda-forge nitime

In this case conda will pick the most recent version of the package that is compatible with the packages already present in the current environment. You can also select the exact version and build that you want. First get the list of versions and builds for the package:

$ conda search -c intel mkl
Loading channels: done
# Name                       Version           Build  Channel
mkl                         2017.0.3         intel_6  intel
mkl                         2017.0.4      h4c4d0af_0  pkgs/main
mkl                         2018.0.0      hb491cac_4  pkgs/main
mkl                         2018.0.0         intel_4  intel
mkl                         2018.0.1      h19d6760_4  pkgs/main
mkl                         2018.0.1         intel_4  intel
mkl                         2018.0.2               1  pkgs/main
mkl                         2018.0.2         intel_1  intel
mkl                         2018.0.3               1  pkgs/main
mkl                         2018.0.3         intel_1  intel
mkl                           2019.0             117  pkgs/main
mkl                           2019.0             118  pkgs/main
mkl                           2019.0       intel_117  intel
mkl                           2019.1             144  pkgs/main
mkl                           2019.1       intel_144  intel
mkl                           2019.2       intel_187  intel
mkl                           2019.3             199  pkgs/main
mkl                           2019.3       intel_199  intel
mkl                           2019.4             243  pkgs/main
mkl                           2019.4       intel_243  intel
mkl                           2019.5       intel_281  intel

Now, install the package declaring the version and build:

$ conda install -c intel mkl=2019.4=intel_243
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /users/gufranco/.conda/envs/test

  added / updated specs:
    - mkl==2019.4=intel_243


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    intel-openmp-2019.5        |        intel_281         888 KB  intel
    mkl-2019.4                 |        intel_243       204.1 MB  intel
    tbb-2019.8                 |        intel_281         874 KB  intel
    ------------------------------------------------------------
                                           Total:       205.8 MB

The following NEW packages will be INSTALLED:

  intel-openmp       intel/linux-64::intel-openmp-2019.5-intel_281
  mkl                intel/linux-64::mkl-2019.4-intel_243
  tbb                intel/linux-64::tbb-2019.8-intel_281


Proceed ([y]/n)?


Downloading and Extracting Packages
tbb-2019.8           | 874 KB    | #################################################################################################################################### | 100%
mkl-2019.4           | 204.1 MB  | #################################################################################################################################### | 100%
intel-openmp-2019.5  | 888 KB    | #################################################################################################################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
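When a search returns many rows, the name=version=build specs for one channel can be extracted with a small filter. This is only a sketch: it assumes the four-column layout shown in the search output above (name, version, build, channel), and the function name specs_for_channel is hypothetical.

```shell
# Read "conda search" output on standard input and print one
# name=version=build spec per row that belongs to the given channel.
# Header lines (starting with "#") and status lines are skipped.
specs_for_channel() {
    awk -v channel="$1" \
        '$1 !~ /^#/ && NF == 4 && $4 == channel { print $1 "=" $2 "=" $3 }'
}

# Example:
#   conda search -c intel mkl | specs_for_channel intel
```

Each printed line can be passed directly to conda install, as in the mkl example above.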

Creating a new environment from a YML file

You can create your own environment. One easy way of doing that is with a YML file that describes the channels and packages that you want in your environment. For a simple case, where you want one environment for bowtie2, the YML file (bowtie2.yml) looks like this:

name: spruce-bowtie2
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - bowtie2

Another example is this YML file (biocore.yml), which installs a curated set of basic genomics codes that require just a few dependencies:

name: biocode
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - bamtools
  - bcftools
  - bedtools
  - hmmer
  - muscle
  - raxml
  - samtools
  - sga
  - soapdenovo-trans
  - soapdenovo2
  - sra-tools
  - vcftools
  - velvet

To create an environment from one of those YML files, you can choose a location in your scratch folder:

conda env create -p $SCRATCH/bowtie2 -f bowtie2.yml

or, for biocore.yml:

conda env create -p $SCRATCH/biocore -f biocore.yml

By default, when no location is given, new environments are created inside your $HOME folder, under $HOME/.conda/envs.
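YML files like the ones above can also be generated from a script, for example to pin a package version. The following is a sketch only: the function name write_env_yml and its arguments are hypothetical, and the channel list simply repeats the one used in the examples above.

```shell
# Write a minimal environment file.
# Arguments: environment name, package spec, output file.
write_env_yml() {
    cat > "$3" <<EOF
name: $1
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - $2
EOF
}

# Example: pin bowtie2 to one version, then create the environment.
#   write_env_yml spruce-bowtie2 bowtie2=2.3.4.2 bowtie2.yml
#   conda env create -p "$SCRATCH/bowtie2" -f bowtie2.yml
```

Pinning a version in the file makes it easier to rebuild the same environment later.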

Listing the packages inside one environment

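Use conda list to see the packages installed in the active environment. Bowtie2 has a number of dependencies: the environment below contains 19 packages, 18 of which are dependencies of the single package requested. Notice that only bowtie2 comes from the bioconda channel. All other packages come from conda-forge, a lower level channel.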

$ conda activate $SCRATCH/bowtie2
$ conda list
# packages in environment at /scratch/gufranco/bowtie2:
#
# Name                    Version                   Build  Channel
bowtie2                   2.3.4.2          py36h2d50403_0    bioconda
bzip2                     1.0.6                h470a237_2    conda-forge
ca-certificates           2018.8.24            ha4d7672_0    conda-forge
certifi                   2018.8.24                py36_1    conda-forge
libffi                    3.2.1                hfc679d8_5    conda-forge
libgcc-ng                 7.2.0                hdf63c60_3    conda-forge
libstdcxx-ng              7.2.0                hdf63c60_3    conda-forge
ncurses                   6.1                  hfc679d8_1    conda-forge
openssl                   1.0.2p               h470a237_0    conda-forge
perl                      5.26.2               h470a237_0    conda-forge
pip                       18.0                     py36_1    conda-forge
python                    3.6.6                h5001a0f_0    conda-forge
readline                  7.0                  haf1bffa_1    conda-forge
setuptools                40.2.0                   py36_0    conda-forge
sqlite                    3.24.0               h2f33b56_1    conda-forge
tk                        8.6.8                         0    conda-forge
wheel                     0.31.1                   py36_1    conda-forge
xz                        5.2.4                h470a237_1    conda-forge
zlib                      1.2.11               h470a237_3    conda-forge

Using a conda environment in a submission script

To execute software in a non-interactive job you need to source the main script, activate the environment that contains the software you need, execute the scientific code and deactivate the environment. This is a simple example for bowtie2:

#!/bin/bash

#PBS -N MY_JOB
#PBS -q standby
#PBS -j oe
#PBS -l nodes=1:ppn=2

source /shared/software/miniconda3/etc/profile.d/conda.sh
conda activate $SCRATCH/bowtie2

bowtie2 .....

conda deactivate
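The activate, run and deactivate steps of the script above can be wrapped in one function, so that the job stops early if activation fails and the exit status of the scientific code is preserved. This is a sketch under the assumption that conda's shell functions have already been loaded by sourcing the init script; the function name run_in_env is hypothetical.

```shell
# Run a command inside a conda environment and deactivate afterwards.
# Arguments: environment name or path, then the command and its arguments.
# Returns 1 if activation fails, otherwise the command's exit status.
run_in_env() {
    local env_path="$1"
    shift
    conda activate "$env_path" || return 1
    "$@"
    local rc=$?
    conda deactivate
    return "$rc"
}

# Example inside a job script, after sourcing conda.sh:
#   run_in_env "$SCRATCH/bowtie2" bowtie2 --version
```

Returning the command's own exit status lets the scheduler report a failed run as failed, even though conda deactivate itself succeeded.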

Deleting an environment

To remove an environment, execute this command:

conda remove --all -p $SCRATCH/bowtie2
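Before removing an environment by path, it can be worth checking that the path really is a conda environment. A minimal sketch, relying on the conda-meta directory that conda creates inside every environment; the function name is_conda_env is hypothetical.

```shell
# Succeed only if the given directory looks like a conda environment,
# that is, it contains a conda-meta subdirectory.
is_conda_env() {
    [ -d "$1/conda-meta" ]
}

# Example: remove the environment only if the check passes.
#   is_conda_env "$SCRATCH/bowtie2" && conda remove --all -p "$SCRATCH/bowtie2"
```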

More documentation

Conda Documentation

Managing environments: https://conda.io/docs/user-guide/tasks/manage-environments.html#

Using Bioconda — Bioconda documentation

Available packages — Bioconda documentation

Downloading Miniconda

Miniconda can be downloaded and installed with:

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh Miniconda3-latest-Linux-x86_64.sh

Without installing anything

Just load the module:

module purge
module load genomics/qiime

This module loads Python 2.7.3 with QIIME on top of it.