Python
Versions
The default version of Python on Rocky Linux 8 is Python 3.6. Python 3.6.0 was released in 2016, and the 3.6 series reached "end-of-life" in December 2021, so it no longer receives updates from the Python community.
There are two modules you can load on Palmetto to access a newer Python version: anaconda3 and miniforge3.
| Module Version | Python Version |
|---|---|
| anaconda3/2023.09-0 | 3.11.5 |
| miniforge3/24.3.0-0 | 3.10.14 |
Loading either of these modules will allow you to create Python virtual environments or Conda environments.
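After loading one of these modules, it can be useful to confirm which interpreter is actually on your path. The following is a minimal sketch using only the standard library; the helper name `is_newer_than_system_default` is hypothetical and chosen for illustration.

```python
import sys


def is_newer_than_system_default(version_info=None):
    """Return True if the interpreter is newer than Python 3.6."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); 3.6 is the Rocky Linux 8 default.
    return tuple(version_info[:2]) > (3, 6)


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Version:", ".".join(str(part) for part in sys.version_info[:3]))
    print("Newer than system default:", is_newer_than_system_default())
```

If the reported version is still 3.6, the module was probably not loaded in the current shell.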
Do not install Python packages or create virtual environments or Conda environments on the login nodes.
Python Virtual Environments
Virtual environments (venv) allow you to create independent sets of Python packages.
A virtual environment is created on top of an existing Python installation, known as the virtual environment’s “base” Python, and may optionally be isolated from the packages in the base environment, so only those explicitly installed in the virtual environment are available.
Source: venv documentation
A virtual environment uses the same Python version as the interpreter that created it, so you do not get to choose a specific version of Python. If you need to create environments with a specific Python version, use Conda environments.
Create a virtual environment
python -m venv <environment name>
Replace <environment name> with a name of your choosing. This will create a
directory with the same name as the environment, so the name should not contain spaces.
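To see what the created directory contains and how it is tied to its base Python, the sketch below builds a throwaway environment with the standard library `venv` module and reads its `pyvenv.cfg` file. The function name `create_env_and_read_config` and the environment name `demo-env` are hypothetical. The sketch passes `with_pip=False` to keep it quick, whereas `python -m venv` installs pip by default.

```python
import tempfile
import venv
from pathlib import Path


def create_env_and_read_config(parent_dir, name="demo-env"):
    """Create a virtual environment and return its path and pyvenv.cfg values."""
    env_dir = Path(parent_dir) / name
    # Assumption for this sketch: skip pip so creation stays fast.
    venv.create(str(env_dir), with_pip=False)
    config = {}
    for line in (env_dir / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            config[key.strip()] = value.strip()
    return env_dir, config


if __name__ == "__main__":
    # A temporary directory is used so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        env_dir, config = create_env_and_read_config(tmp)
        print("Environment directory:", env_dir.name)
        print("Base Python location:", config.get("home"))
        print("Python version:", config.get("version"))
```

The `home` entry points at the base Python installation, which is why the environment shares that interpreter's version.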
Virtual environment activation
source <environment name>/bin/activate
This command assumes the environment is in the current working directory. Otherwise, you will need to specify the full path.
Once the environment is activated, you can move on to installing packages.
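If you are unsure whether activation worked, one way to check from inside Python is to compare `sys.prefix` with `sys.base_prefix`. This is a minimal sketch; the helper name `in_virtual_environment` is hypothetical.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    # This check is specific to venv; it does not detect conda environments.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Environment prefix:", sys.prefix)
    print("Base prefix:", sys.base_prefix)
    print("Inside a virtual environment:", in_virtual_environment())
```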
Conda Environments
Conda provides a package management system for more than just Python. Conda environments provide the same isolation of packages as virtual environments with some added features.
Some packages are only available for installation with conda.
Conda uses channels, which are the locations where packages are stored. Some of
these channels (like the default anaconda channel) may have restrictions on
their use. The miniforge3 module on Palmetto installs packages from the
conda-forge channel by default; conda-forge is a community project that is not
subject to license restrictions.
Other channels also exist, such as bioconda, a repository of bioinformatics
software.
Conda environment creation
conda create -n <environment name>
The above command will create an empty environment in your home directory. By
default, conda environments are stored in ~/.conda.
During environment creation, you can choose to install Python. If you do not specify a version, conda will install the latest Python available. To get a particular Python version in the environment, specify it in the command:
conda create -n <environment name> python=3.9
Conda environment activation
source activate <environment name>
Conda environments are activated with "source activate". Once the environment is activated, you can choose to install packages with pip or conda.
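To confirm which conda environment is active from within Python, you can inspect the `CONDA_DEFAULT_ENV` and `CONDA_PREFIX` environment variables, which conda sets on activation. The sketch below is illustrative; the function name `active_conda_environment` is hypothetical.

```python
import os


def active_conda_environment(environ=None):
    """Return (name, prefix) of the active conda environment, or None."""
    if environ is None:
        environ = os.environ
    prefix = environ.get("CONDA_PREFIX")
    if not prefix:
        # No conda environment has been activated in this shell.
        return None
    name = environ.get("CONDA_DEFAULT_ENV", "")
    return name, prefix


if __name__ == "__main__":
    active = active_conda_environment()
    if active is None:
        print("No conda environment is active.")
    else:
        print("Active environment:", active[0])
        print("Located at:", active[1])
```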
Python package installation with pip
pip is the standard package installer for Python. It can install packages in
either virtual environments or conda environments, and it installs them from
the Python Package Index, also known as PyPI.
pip install some-package-name
You can install a specific package version by adding ==<version> to the
installation command:
pip install some-package-name==0.1.0
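After installing, you may want to verify which version ended up in the active environment. A minimal sketch using the standard library `importlib.metadata` module (available in Python 3.8 and later, so not in the system Python 3.6) is shown below; the helper name `installed_version` is hypothetical, and `some-package-name` is the same placeholder used in the commands above.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Replace the placeholder with the package you installed.
    name = "some-package-name"
    version = installed_version(name)
    if version is None:
        print(name, "is not installed in this environment")
    else:
        print(name, "version", version)
```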
Python package installation with conda
The conda command installs packages from a channel, and it can only install
them into a conda environment.
conda install some-package-name
You can install a specific package version by adding =<version> to the
installation command:
conda install some-package-name=0.1.0
Conda downloads each package only once, into your home directory. If multiple
conda environments use the same version of a package, the data is not
downloaded again, which can save space in your home directory. Packages
installed with pip in a conda environment do not save space in this way.
Bioconda
Bioconda is a repository of bioinformatics software made available via the conda package manager. RCD maintains a local mirror of Bioconda packages for Palmetto.
When using the anaconda3 or miniforge3 modules, installing Bioconda packages
will automatically pull from the RCD mirror.
Installing Bioconda packages
To install a Bioconda package (e.g. samtools) into a conda environment, either
of the following commands works:
conda install -c bioconda samtools
conda install bioconda::samtools
Configure conda to use Bioconda automatically
If you would like the Bioconda channel to be searched automatically, you can add
the channel to your ~/.condarc file:
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict