Index
1. What is Conda?
2. Scope of the project
3. Using Conda
4. Conda environments
5. Basic management operations
6. Using Mamba
7. Script examples
8. Best practices
What is Conda?
✓ Originally: Anaconda, a distribution of Python including common
scientific packages.
https://www.anaconda.com/
✓ Extended to include R and R packages, scientific libraries, other
software, etc.
✓ Conda is the core package manager for the Anaconda project.
✓ Well established within the statistics, bioinformatics, data science,
artificial intelligence communities, amongst others.
What is Conda?
✓ Conda installs and updates binary versions of Python and R packages
from its own (or third party) repositories.
✓ It is an alternative to other repository systems, like Pip for Python or
CRAN for R.
✓ It is also a convenient way to manage dependencies for Python and
R packages.
✓ Particularly useful for fields in which there are complex analysis
pipelineswhich require many different packages.
What is Conda?
✓ But Conda isn't...
- A repository of system software packages.
- A repository of source code.
- A replacement for environment modules.
- Perfect, infinite or infallible.
✓ Be mindful of its limitations.
✓ You can find full documentation at:
https://conda.io
Scope of the project
✓ Conda has grown over the past few years to encompass a large
interconnected ecosystem of software, including:
✓ Python and Python packages
✓ R and R packages
✓ IDEs: Jupyter, Spyder, Rstudio...
✓ Scientific packages: Numpy, SciPy, Pandas, Numba, Dask...
✓ Machine learning packages: Scikit-learn, TensorFlow, Theano...
✓ Data visualisation packages: Matplotlib, Bokeh, Datashader,
Holoviews...
✓ External libraries and tools: compilers, libraries, programs...
Scope of the project
✓ Channels are thematic collections of packages, equivalent to
repositories, which are useful to avoid version conflicts.
✓ Examples:
- main: default channel
- free: software with licences that restrict commercial use
- conda-forge: large collection of third-party packages
- bioconda: software for bioinformatics
- r: tailored to R users
✓ Companies and reference research groups often create their own
channel, to ensure that pre-compiled, compatible versions of certain
packages are portable between machines within the organisation.
How to use Conda
✓ All we need to do to use Conda on our cluster is to load an
environment module:
module load conda/current
✓ This will put the Conda executable in our path and will configure
everything else for us.
How to use Conda
✓ Then we use the command conda (+ action) to run it:
conda list
conda activate
conda deactivate
conda create
conda search
conda install
conda remove
conda update
conda clean
conda info
Conda environments
✓ Inside a given installation of Conda, there can be any number of
environments.
✓ Environments are like Python profiles: each will be an independent
ecosystem with a different list of packages and versions installed.
✓ On our machines, Conda is always used via private environments,
owned by a specific user.
✓ Users within the same group can easily share their environments with
each other through a simple configuration step.
Conda environments
✓ To see a list of environments: conda env list
✓ To load an environment: conda activate <env_name>
✓ To unload: conda deactivate
Conda environments
✓ To see the contents of an env: conda list [-n env_name]
✓ If no environment is specified, actions typically apply to the currently
active one.
✓ Note: source activate and source deactivate are obsolete.
Basic management operations
✓ Your environments are stored at /scratch/$USER/.conda/envs
✓ To create a new empty environment: conda create -n <env_name>
✓ To create a new environment with packages already installed in it:
conda create –n <env_name> [list of packages]
✓ To install packages in an existing environment: conda install [-n
env_name] <list of packages>
✓ Version and channel can also be specified: conda install [-n
env_name] [-C channel] <package=version>
Basic management operations
✓ To update packages in an environment: conda update [-n
env_name] <packages> or conda update [-n env_name] --all
✓ To uninstall packages: conda remove [-n env_name] <packages>
✓ To completely delete an environment: conda env remove -n
<env_name>
✓ conda keeps track of modifications to each environment; you can see
a history with conda list [-n env_name] --revisions
Basic management operations
✓ Although not ideal, sometimes you will need software that isn't
packaged for Conda inside a Conda environment.
✓ It is possible to install Python packages using Pip, or R packages
using BioConductor or CRAN, within a Conda environment, but it
requires manually configuring a proxy.
✓ Those packages won't be managed by Conda, so be careful.
✓ Let us know if you need to do this so that we can set it up for you.
Using Mamba
Mamba is a reimplementation of Conda with fast execution in mind.
✓ It is rewritten in C++, with multithreading support for downloads,
installations and searches.
✓ It is fully integrated with Conda and can be used as an alternative on
an execution-by-execution basis.
Using Mamba
✓ To use it, simply replace conda with mamba in your command:
mamba list
mamba create
mamba search
mamba install
mamba remove
mamba update
mamba clean
mamba info
✓ Note: to activate or deactivate an environment, you should still use
conda activate/deactivate instead
Using Mamba
✓ Mamba will take the same options and arguments as Conda.
✓ The additional performance brought by Mamba will be most useful
when managing large numbers of packages at once, such as when:
- Installing many packages at once
- Updating a large environment
- Searching caches
- Solving dependency clashes
✓ Conda will still be a fallback for the rare functionalities not yet
implemented in Mamba.
Script examples: R
✓ Minimal example running R through Conda:
#!/bin/bash
#SBATCH –p std
#SBATCH –N 1
#SBATCH –n 1
#SBATCH –t 1:00:00
module load conda/current
conda activate my_R_environment
Rscript example.R > out.log
Best practices
✓ For software used across a group, it's typically more convenient to
designate one user as the owner and manager of an environment and
have everyone else access it.
✓ Avoid clutter; it's better to create multiple single-purpose environments
than one large environment with too many packages.
✓ Be mindful of version clashes when updating environments; if you
don't need to update, don't.
✓ When in doubt, contact us – we can do it for you.