Python Language

Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python’s design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.

Due to Python’s extensive third-party libraries such as NumPy, SciPy and many others, make Python a programming language for scientific computing. Python servers as an important tool for several areas in science and engineering and packages for plotting offer publication-ready plotting capabilities.

Among external libraries NumPy provides the foundation for numerical operation, in particular operations related to vectors and matrices. SciPy extends NumPy to include special functions, numerical integration and other scientific general purpose operations. Matplotlib is the basic library for plotting in 2D and some functions for 3D plots.

Several specialized packages have been written such as Biopython and Astropy providing domain-specific functionality in Astronomy and Biology. Python is commonly used in artificial intelligence projects with the help of libraries like TensorFlow, Keras and Scikit-learn. As a scripting language with modular architecture, simple syntax and rich text processing tools, Python is often used for natural language processing.

Both clusters, Spruce and Thorny offer a variety of Python implementations and versions, including the reference Python implementation written in C and called CPython, the MKL optimized Intel Python Distribution and PyPy, a fast, compliant alternative implementation of the Python thanks to its Just-in-Time compiler.

Accessing the Python interpreter

The Python interpreter is the basic command that allows you to interact with Python interactively and is the same command used for scripting. There are two versions of Python. Python 2.x is now deprecated but still used in some cases. Python 3.x is the current version and has been around 2008 more than a decade now. You should always prefer Python 3.x over Python 2.x for scientific applications. From January 1, 2020, the 2.x branch of the Python programming language is no longer supported by the Python Software Foundation. Many packages have been abandoning compatibility with Python 2.x and as such there is no much reason to continue using it for scientific applications.

There are four ways of accessing the different Python installations on our clusters: Using a Python version installed by default from the OS repositories, using an environment module, creating a conda environment or using a singularity container that provides Python inside. We will explore those options in the contexts of our two clusters, Spruce Knob and Thorny Flat.

Spruce Knob

The Operating System on Spruce comes with Python 2.6.6 preinstalled:

$ python --version
Python 2.6.6

No scientific packages are installed along with this version of Python making it unsuitable for most scientific purposes.

On Spruce there is a variety of environment modules for Python. These modules include a good variety of scientific computing packages preinstalled for the corresponding version of Python. The modules available on Spruce are:

lang/python/cpython_2.7.15_gcc82
lang/python/cpython_3.6.9_gcc82
lang/python/cpython_3.7.4_gcc82
lang/python/intelpython_2.7.16
lang/python/intelpython_3.6.3
lang/python/intelpython_3.6.9
lang/python/pypy2.7-7.1.1-portable
lang/python/pypy3.6-7.1.1-portable

The modules with suffix _gcc82 where compiled with GCC 8.2.0 so the module lang/gcc/8.2.0 must be preloaded to activate the corresponding Python module. For example to load lang/python/cpython_3.7.4_gcc82 you should execute the following command to activate Python:

module load lang/gcc/8.2.0 lang/python/cpython_3.7.4_gcc82

There are basically three implementations of Python to choose. The versions with prefix cpython_ are compilations of the reference implementation written in C and Python from https://www.python.org/downloads also called CPython.

The modules with prefix intelpython_ are the Intel Distribution for Python. They Intel-optimized versions of the reference CPython using MKL and some other optimized libraries for use in Intel processors. In particular Intel Python includes MKL optimized versions of Numpy, Scipy and Scikit-Learn.

The modules with prefix pypy offers an alternative implementation of the Python programming language. PyPy often runs faster than CPython, because PyPy is a just-in-time compiler, while CPython is uses a more traditional approach. Most Python code runs well on PyPy, except for code that depends on CPython extensions, which either does not work or incurs some overhead when run in PyPy.

Thorny Flat

The Operating System on Thorny Flat comes with Python 2.7.5 preinstalled:

$ python --version
  Python 2.7.5

No scientific packages are installed along with this version of Python making it unsuitable for most scientific purposes.

On Thorny there are a several environment modules for Python. These modules include a good variety of scientific computing packages preinstalled for the corresponding version of Python. The modules available on Thorny Flat are:

lang/python/cpython_3.10.5_gcc112
lang/python/cpython_3.10.5_gcc93
lang/python/cpython_3.8.13_gcc112
lang/python/cpython_3.8.13_gcc93
lang/python/cpython_3.9.13_gcc112
lang/python/cpython_3.9.13_gcc93
lang/python/intelpython2_2019.5
lang/python/intelpython_2.7.16
lang/python/intelpython_3.9
lang/python/pypy2.7-v7.3.9-linux64
lang/python/pypy3.9-v7.3.9-linux64

Packages installed with CPython modules

In the particular case of CPython modules a number of scientific packages were included. The following table shows the list and version of the packages included on the CPython modules.

Package

Version

appdirs

1.4.4

argon2-cffi

20.1.0

asv

0.4.2

async-generator

1.10

atomicwrites

1.4.0

attrs

20.3.0

backcall

0.2.0

bleach

3.3.0

cached-property

1.5.2

cffi

1.14.5

cloudpickle

1.6.0

cycler

0.10.0

Cython

0.29.22

dask

2021.3.0

decorator

4.4.2

defusedxml

0.7.1

distlib

0.3.1

entrypoints

0.3

filelock

3.0.12

h5py

3.1.0

imageio

2.9.0

importlib-metadata

3.7.3

importlib-resources

5.1.2

iniconfig

1.1.1

ipykernel

5.5.0

ipyparallel

6.3.0

ipython

7.16.1

ipython-genutils

0.2.0

ipywidgets

7.6.3

jedi

0.18.0

Jinja2

2.11.3

joblib

1.0.1

joblib

1.0.1

jsonschema

3.2.0

jupyter

1.0.0

jupyter-client

6.1.12

jupyter-console

6.4.0

jupyter-core

4.7.1

jupyterlab-pygments

0.1.2

jupyterlab-widgets

1.0.0

kiwisolver

1.3.1

MarkupSafe

1.1.1

matplotlib

3.3.4

mistune

0.8.4

more-itertools

8.7.0

mpmath

1.2.1

nbclient

0.5.3

nbconvert

6.0.7

nbformat

5.1.2

nest-asyncio

1.5.1

networkx

2.5

notebook

6.3.0

numpy

1.19.5

packaging

20.9

pandas

1.1.5

pandocfilters

1.4.3

parso

0.8.1

pexpect

4.8.0

pickleshare

0.7.5

Pillow

8.1.2

pip

21.0.1

pluggy

0.13.1

prometheus-client

0.9.0

prompt-toolkit

3.0.18

ptyprocess

0.7.0

ptyprocess

0.7.0

py

1.10.0

pycparser

2.20

Pygments

2.8.1

pymongo

3.11.3

pyparsing

2.4.7

pyrsistent

0.17.3

pytest

6.2.2

python-dateutil

2.8.1

pytz

2021.1

PyWavelets

1.1.1

PyYAML

5.4.1

pyzmq

22.0.3

qtconsole

5.0.3

QtPy

1.9.0

scikit-image

0.17.2

scikit-learn

0.24.1

scipy

1.5.4

seaborn

0.11.1

Send2Trash

1.5.0

setuptools

54.2.0

six

1.15.0

sympy

1.7.1

terminado

0.9.3

testpath

0.4.4

threadpoolctl

2.1.0

tifffile

2020.9.3

toml

0.10.2

toolz

0.11.1

tornado

6.1

traitlets

4.3.3

typing-extensions

3.7.4.3

virtualenv

20.4.3

wcwidth

0.2.5

webencodings

0.5.1

widgetsnbextension

3.5.1

xlrd

2.0.1

zipp

3.4.1

The modules for Pypy and Intel Python include their own list of preinstalled packages.

Another alternative to get Python is creating a conda environment. Load conda with the command:

source /shared/software/conda/etc/profile.d/conda.sh

This will activate the command conda and you can create conda environments for the version of Python of your choice. This is particularly useful if you want a very specific version of Python, as new as 3.9.2 or as old as 2.7.13. You can search for all the versions available with:

conda search python

Or including specific channels with:

conda search -c intel python
conda search -c conda-forge python

Both intel and conda-forge are popular channels for general purpose scientific packages.

For example to create a conda environment called python392 installing insider Python version 3.9.2 execute:

$> conda create -n python392 python==3.9.2

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /users/gufranco/.conda/envs/python392

  added / updated specs:
    - python==3.9.2


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2021.1.19  |       h06a4308_1         118 KB
    certifi-2020.12.5          |   py39h06a4308_0         140 KB
    openssl-1.1.1j             |       h27cfd23_0         2.5 MB
    pip-21.0.1                 |   py39h06a4308_0         1.8 MB
    python-3.9.2               |       hdb3f193_0        18.2 MB
    setuptools-52.0.0          |   py39h06a4308_0         724 KB
    sqlite-3.35.2              |       hdfb4753_0         983 KB
    tzdata-2020f               |       h52ac0ba_0         113 KB
    ------------------------------------------------------------
                                           Total:        24.5 MB

The following NEW packages will be INSTALLED:

  _libgcc_mutex      pkgs/main/linux-64::_libgcc_mutex-0.1-main
  ca-certificates    pkgs/main/linux-64::ca-certificates-2021.1.19-h06a4308_1
  certifi            pkgs/main/linux-64::certifi-2020.12.5-py39h06a4308_0
  ld_impl_linux-64   pkgs/main/linux-64::ld_impl_linux-64-2.33.1-h53a641e_7

  libffi             pkgs/main/linux-64::libffi-3.3-he6710b0_2
  libgcc-ng          pkgs/main/linux-64::libgcc-ng-9.1.0-hdf63c60_0
  libstdcxx-ng       pkgs/main/linux-64::libstdcxx-ng-9.1.0-hdf63c60_0
  ncurses            pkgs/main/linux-64::ncurses-6.2-he6710b0_1
  openssl            pkgs/main/linux-64::openssl-1.1.1j-h27cfd23_0
  pip                pkgs/main/linux-64::pip-21.0.1-py39h06a4308_0
  python             pkgs/main/linux-64::python-3.9.2-hdb3f193_0
  readline           pkgs/main/linux-64::readline-8.1-h27cfd23_0
  setuptools         pkgs/main/linux-64::setuptools-52.0.0-py39h06a4308_0
  sqlite             pkgs/main/linux-64::sqlite-3.35.2-hdfb4753_0
  tk                 pkgs/main/linux-64::tk-8.6.10-hbc83047_0
  tzdata             pkgs/main/noarch::tzdata-2020f-h52ac0ba_0
  wheel              pkgs/main/noarch::wheel-0.36.2-pyhd3eb1b0_0
  xz                 pkgs/main/linux-64::xz-5.2.5-h7b6447c_0
  zlib               pkgs/main/linux-64::zlib-1.2.11-h7b6447c_3

Proceed ([y]/n)? y

Downloading and Extracting Packages
python-3.9.2         | 18.2 MB   | ###################################################### | 100%
tzdata-2020f         | 113 KB    | ###################################################### | 100%
setuptools-52.0.0    | 724 KB    | ###################################################### | 100%
pip-21.0.1           | 1.8 MB    | ###################################################### | 100%
openssl-1.1.1j       | 2.5 MB    | ###################################################### | 100%
sqlite-3.35.2        | 983 KB    | ###################################################### | 100%
certifi-2020.12.5    | 140 KB    | ###################################################### | 100%
ca-certificates-2021 | 118 KB    | ###################################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate python392
#
# To deactivate an active environment, use
#
#     $ conda deactivate

Finally, all that you have to do is activate the environment with:

conda activate python392

And deactivate the environment with:

conda deactivate

Installing python packages with pip

The package pip is a popular way of installing python packages. You cannot install packages on system-wide locations, but you still can install them on your $HOME folder for personal use.

To install Python modules locally (within your user $HOME directory) is by adding --user as argument for the command pip install. It is important to notice that in cases where you have Python 2.x and Python 3.x you need to pay attention to which pip command to use. The command pip usually refers to Python 2.x and there is an equivalent command pip3 for Python 3.x. As we are using Python 3.x, pip3 is the command that we will be using.

There are two ways of using the command pip3. One is calling the command directly:

pip3 install --user <package_name>

Another is using pip indirectly as a module:

python3 -m pip install --user <package_name>

The --user flag directs python to install the package in a user location rather than a system-wide location where you are not allow to alter files.

The user location for python packages is $HOME/.local/lib/pythonX.Y/site-packages, This is generally the preferred method of locally installing new python packages. There is no disadvantage on using a user installation other that the package is only available to you.

Installing python packages with a custom prefix

Another way to install Python modules locally is by using the --target flag:

pip3 install --target <dir> <package_name>

representing the directory location you want the package installed into. These flags essentially do the same thing by directing Python to install the module in the specified directory. These directories will not be searched by default with Python. Therefore, in order to use these modules in your Python scripts you will have to modify the $PYTHONPATH environment variable to include the specified directory. Or alternatively, modify sys.path from within your python script (for this method, consult python documentation.

export PYTHONPATH=<dir>

Using Python virtualenv

The installing with pip install --user or pip install --target <dir> those locations are all searched secondary to the system-wide site packages.

This is could be an issue if you are trying to install locally a different version of a module already installed system-wide. A way to get around this is by using Python Virtual Environments.

Python virtual environments are used to build completely isolated python workflows. Primarily they are used to solve the need for multiple versions within python modules. Often, you might have the need to use pkgA which needs pkgC version 1.24, but you also need pkgB which needs pkgC version 2.1. If you use setuptools to install the packages (i.e. pip or easy_install), you will create a dependency issue since both versions of pkbC will be installed to the same location.

To resolve this, you can create python virtual environments that all isolation of package dependencies, so you can successfully have different versions of packages installed and tied to separate python interpreters. Setting up python virtual environments is easy, and using them is no different than using python it’s self.

Using Virtual Environments with python2

First, load which version of python2 you would like to use as your base python interpreter. For instance, if you want python 2.7.10, then load the 2.7.10 python modefule. If you want to use the default system python (v. 2.6), then you do not need to load a python modulefile. However, you do need to load the virtualenv modulefile:

module load lang/gcc/8.2.0 lang/python/2.7.15_gcc82

Then create a virtualenv directory with the ‘virtualenv’ command:

virtualenv workflow1

You should now have a directory called ‘workflow1’. You can use whatever name you want for the virtualenv, so long as you remember what directory corresponds with what environment. You now need to simply activate the virtualenv:

source workflow1/bin/activate

Your command prompt will now be pre-emptied by (workflow1) to remind you that you have an activate virtualenv. You can now proceed to use python, pip, and easy_install just as you would regularly.

Using Virtual Environments with python3

First, load the python3 modulefile. The python3 modulefile comes with it’s own virtual environment utility, so you do not need to load the virtualenv modulefile:

module load lang/gcc/8.2.0 lang/python/cpython_3.7.4_gcc82

Then create a virtualenv directry with the ‘pyvenv’ command:

pyvenv workflow1

You should now have a directory called ‘workflow1’. You can use whatever name you want for the virtualenv, so long as you remember what directory corresponds with what environment. You now need to simply activate the virtualenv:

source workflow1/bin/activate

Your command prompt will now be pre-emptied by (workflow1) to remind you that you have an activate virtualenv. You can now proceed to use python, pip, and easy_install just as you would regularly.

Activating virtual environments using the C shell

If you are using the shells csh or tcsh, you will not be able to source the ‘activate’ file. Instead, you need to source the activate.csh file.

source workflow1/bin/activate.csh

Using site-wide system packages

The centrally installed python interpreters (python loaded with modulefiles), have some common scientific packages installed with them by default. To have your virtualenv keep using these packages you do not need to install them in your virtualenv, using the –system-site-packages option.

virtualenv --system-site-packages

or

pyvenv --systems-site-packages