Installing software packages locally with Anaconda

Anaconda is a popular open-source platform designed for managing and deploying software and environments, particularly for data science and machine learning applications. It simplifies package management and deployment by providing a convenient way to install, update, and manage software packages and their dependencies. Anaconda is used to create isolated environments, which helps avoid conflicts between different software versions and ensures that projects have the specific libraries they need.

This KB Article References: High Performance Computing
This Information is Intended for: Instructors, Researchers, Staff, Students
Created: 10/23/2017 Last Updated: 07/03/2024

Installing Software on SeaWulf

When you need software that isn't currently available on SeaWulf, you have two main options:

  1. Request Installation by HPC Support: If the software is widely used or could benefit multiple users, you can submit a ticket to the HPC support staff to request its installation. This approach is ideal for programs that might be of general interest.
  2. Install Locally in Your Home or Project Directory: Alternatively, you may install the program locally in your home or project directory. The easiest way to install many software packages is by using the Anaconda package manager.

     

Installing Software Locally with Anaconda

As noted above, Anaconda handles package installation and dependency management. The steps below show how to use it on SeaWulf.

 
Loading Anaconda

Before installing software, load the Anaconda module:

module load anaconda/3
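
To confirm that the module loaded correctly, you can check that the conda command is now on your PATH. The sketch below is illustrative only; the helper name have_cmd is hypothetical and not part of SeaWulf or Anaconda.

```shell
# have_cmd is a hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd conda; then
    conda --version        # prints the installed conda version
else
    echo "conda not found; run 'module load anaconda/3' first" >&2
fi
```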

 

Creating a Custom Anaconda Environment

To prevent conflicts with existing software, it's best to create a custom environment. You can create an environment either by specifying a name or a directory:

 

By Name:

conda create --name env-name

This creates the environment in:

/gpfs/home/NETID/.conda/envs/env-name

 

By Directory:

conda create --prefix /path-to-env/env-name

This creates the environment in:

/path-to-env/env-name

Note: The --prefix and --name flags cannot be combined; choose one or the other.
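
If you rerun setup commands, for example from a script, it can help to create the environment only when it does not already exist. The following is a minimal sketch, assuming that a directory containing a conda-meta subdirectory is an existing conda environment; the function names is_conda_env and create_env_if_missing are hypothetical.

```shell
# is_conda_env (hypothetical helper): conda writes a conda-meta directory
# into every environment, so its presence is used here as the marker.
is_conda_env() {
    [ -d "$1/conda-meta" ]
}

# create_env_if_missing (hypothetical helper): create an environment at the
# given prefix unless one is already there.
create_env_if_missing() {
    if is_conda_env "$1"; then
        echo "Environment already exists at $1"
    else
        conda create --yes --prefix "$1"
    fi
}
```

With the Anaconda module loaded, it could be called as: create_env_if_missing /path-to-env/env-name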

 

Activating the Environment

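Activate your newly created environment by name, or by its full path if you created it with --prefix:

conda activate env-name                  # Environment created with --name
conda activate /path-to-env/env-name     # Environment created with --prefix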

Activating the environment sets the environment variables associated with it, including the PATH entry for its executable files.
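
One way to confirm which environment is active is to inspect the CONDA_PREFIX variable, which conda sets on activation. A small sketch follows; the function name active_env is hypothetical.

```shell
# active_env (hypothetical helper): print the path of the active conda
# environment, or "none" if CONDA_PREFIX is unset or empty.
active_env() {
    if [ -n "${CONDA_PREFIX:-}" ]; then
        echo "$CONDA_PREFIX"
    else
        echo "none"
    fi
}

active_env
```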

 

Installing Software Packages

With your environment active, you can install packages using conda install. For example, to install the scipy package, use:

conda install scipy

After installation, the package’s executable files are placed in the bin directory within your environment. While the environment is active, this directory is at the front of your PATH, so you can run the executables directly from the command line.

Additionally, any libraries installed with Anaconda will be located in the lib directory of your environment. You can find these directories as follows:

  • Executable files: .../env-name/bin/
  • Libraries: .../env-name/lib/

These directories ensure that your environment remains self-contained and manageable, avoiding conflicts with other software on the system.
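
To check that an environment's bin directory is actually on your PATH, a test along these lines may help. This is a sketch only; env_bin_on_path is a hypothetical helper name, and /path-to-env/env-name is the placeholder path used earlier.

```shell
# env_bin_on_path (hypothetical helper): succeeds if <env>/bin is one of the
# colon-separated entries in PATH.
env_bin_on_path() {
    case ":$PATH:" in
        *":$1/bin:"*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Use the active environment if there is one, else the placeholder path.
if env_bin_on_path "${CONDA_PREFIX:-/path-to-env/env-name}"; then
    echo "environment bin directory is on PATH"
else
    echo "environment bin directory is not on PATH"
fi
```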

 

Deactivating the Environment

Once you’re finished, return to the default environment by typing:

conda deactivate

 

Managing Storage with Anaconda

Managing storage is crucial when working with Anaconda, especially if you encounter file system quota issues. Here’s how to handle and optimize storage within your Anaconda environment:

 
Understanding File System Quotas

SeaWulf enforces storage quotas to ensure equitable resource allocation among all users. Exceeding your quota may result in errors when attempting to install new packages or create environments. Therefore, it is crucial to regularly monitor your storage usage and manage your files accordingly.

For detailed information about the file system, refer to the SeaWulf File System Overview.

To keep track of your available disk space, use the following commands:

df -h /gpfs/home/$USER                              # Disk space usage of the file system that holds your home directory
df -hi /gpfs/home/$USER                             # Inode usage of the same file system
du -ah /gpfs/home/$USER | sort -rh | head -n 20     # The 20 largest files and directories under your home directory

Additionally, you can use the following command to check both your disk usage and your file count against your quota:

/usr/lpp/mmfs/bin/mmlsquota --block-size auto -j ${USER}-home -v mmfs1
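
Because both disk usage and file count matter here, it can be useful to see how many files a given directory contributes. The sketch below counts regular files under a directory; count_files is a hypothetical helper, and the .conda location is taken from the default environment path shown earlier.

```shell
# count_files (hypothetical helper): number of regular files under a directory.
count_files() {
    find "$1" -type f 2>/dev/null | wc -l | tr -d ' '
}

# Report the file count for your conda directory, if it exists.
CONDA_DIR="/gpfs/home/$USER/.conda"
if [ -d "$CONDA_DIR" ]; then
    echo "Files under $CONDA_DIR: $(count_files "$CONDA_DIR")"
fi
```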

 

Cleaning Up Unused Packages and Environments

To free up space, you can remove unused packages and environments. Here’s how:

To remove unused packages:

conda activate env-name       # Activate the environment you wish to clean
conda list                    # List all installed packages
conda remove package-name     # Replace 'package-name' with the package you want to remove

To remove unused environments:

conda env list                         # List all environments
conda env remove --name env-name       # Replace 'env-name' with the environment you want to remove
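
Environments created with --prefix are removed by path rather than by name, using the --prefix flag of conda env remove. A minimal sketch that picks the right flag is shown below; env_selector is a hypothetical helper that treats any argument containing a slash as a path.

```shell
# env_selector (hypothetical helper): print "--prefix <path>" for arguments
# that contain a slash, and "--name <name>" otherwise.
env_selector() {
    case "$1" in
        */*) echo "--prefix $1" ;;
        *)   echo "--name $1" ;;
    esac
}

env_selector env-name
env_selector /path-to-env/env-name
```

The output can then be passed to conda, for example: conda env remove $(env_selector /path-to-env/env-name). Paths containing spaces would need different handling.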
 
Managing Anaconda Cache

Conda keeps a cache of downloaded package archives and extracted packages to speed up future installations. However, this cache can consume significant storage over time. To clean it up, use:

conda clean --all

This command removes the index cache, package tarballs, and cached packages that no environment uses, helping to free up space.
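
Before deleting anything, you may want to see how much space the package cache uses and preview what would be removed. The sketch below assumes the cache lives in ~/.conda/pkgs, which is a common location when the base installation is not writable but may differ on your system; dir_size_kb is a hypothetical helper.

```shell
# dir_size_kb (hypothetical helper): size of a directory in kilobytes,
# or 0 if the directory does not exist.
dir_size_kb() {
    if [ -d "$1" ]; then
        du -sk "$1" | awk '{print $1}'
    else
        echo 0
    fi
}

PKGS_DIR="$HOME/.conda/pkgs"    # assumed cache location; adjust if yours differs
echo "Package cache size: $(dir_size_kb "$PKGS_DIR") KB"

# Preview what conda would remove, without deleting anything.
if command -v conda >/dev/null 2>&1; then
    conda clean --all --dry-run
fi
```

Running conda config --show pkgs_dirs lists the cache directories conda actually uses, which is a way to verify the assumed path.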

 

Managing pip Cache

If you use pip for package management alongside conda, it also maintains a cache that can consume disk space. To clear the pip cache, use:

pip cache purge

This command removes all items from pip's cache of downloaded and locally built packages, freeing up additional space.
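
To see where pip keeps its cache and how large it is before purging, recent versions of pip provide the pip cache dir and pip cache info subcommands. The sketch below is illustrative; pip_cache_report is a hypothetical helper name.

```shell
# pip_cache_report (hypothetical helper): show pip's cache location and a
# summary of its contents, or a message if pip is not available.
pip_cache_report() {
    if command -v pip >/dev/null 2>&1; then
        pip cache dir     # location of the cache
        pip cache info    # summary of cache contents and size
    else
        echo "pip not found on PATH"
    fi
}
```

Call pip_cache_report with your environment active so that the report reflects the pip installed in that environment.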

 

For more detailed guidance on managing storage with Anaconda and pip, refer to the conda documentation and pip documentation.

For More Information Contact


IACS Support System
