How to use the GPU Nodes on SeaWulf

Audience: Faculty, Postdocs, Researchers, Staff and Students

This KB Article References: High Performance Computing
This Information is Intended for: Faculty, Postdocs, Researchers, Staff, Students
Last Updated: January 03, 2024

SeaWulf has 8 nodes containing 4 Tesla K80 GPUs each. One node with 2 Tesla P100 GPUs and one node with 2 Tesla V100 GPUs are also available. In addition, 11 nodes, each with 4 NVIDIA A100 GPUs, are available via the milan login nodes.

Please note that there are no GPUs available on the login nodes, since they are not meant for computational workloads. For example, running the command nvidia-smi on a login node will produce an error. An interactive job or an sbatch script is required to access the GPU nodes.

 

To access the GPU nodes, you can submit a job to one of the GPU queues using the SLURM workload manager.

module load slurm/17.11.12
sbatch [...]
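
For example, a minimal batch script for the gpu queue might look like the following (the job name, output file, and program to run are placeholders; adjust them for your own work):

#!/bin/bash
#
#SBATCH --job-name=gpu_test
#SBATCH --output=gpu_test.out
#SBATCH -N 1
#SBATCH -p gpu
#SBATCH --ntasks-per-node=28

# Load any modules your program needs, then run it:
./my_gpu_program

Submit the script by passing its filename to sbatch.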

You can open an interactive shell on a GPU node with the following:

srun -J [job_name] -N 1 -p gpu --ntasks-per-node=28 --pty bash

 

Note: If you are using the a100 queues, you will not get any GPU allocations by default - you must request them explicitly. For example, to request an interactive session with one GPU:

srun -J [job_name] -N 1 -p a100 --gpus=1 --pty bash

Similarly, in a SLURM job script, you will need to add the --gpus flag:

#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --output=res.txt
#SBATCH -p a100
#SBATCH --gpus=1
...
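
The rest of the script is simply whatever commands your job needs to run. Continuing the sketch above with a placeholder program name, it might end with:

module load cuda120/toolkit/12.0
./my_gpu_program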

 

If you want to use CUDA to take advantage of GPU acceleration, you will need to load the appropriate CUDA module and then compile with nvcc:

module load cuda113/toolkit/11.3    # or: module load cuda120/toolkit/12.0
nvcc INFILE -o OUTFILE

cuda113/toolkit/11.3 can be used on the gpu (K80), P100, and V100 nodes, while cuda120/toolkit/12.0 can be used on the A100 nodes.

In the above, INFILE is the CUDA source file to be compiled, and OUTFILE is the name of the binary that will be produced.

For a sample CUDA program, see:

 /gpfs/projects/samples/cuda/test.cu
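
If you would like a small program to test the toolchain with, a minimal CUDA example along the following lines can be compiled the same way (this is an illustrative sketch, not the contents of test.cu):

#include <cstdio>

// Each thread writes its global index into the output array.
__global__ void fill_indices(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = i;
}

int main()
{
    const int n = 8;
    int host[n];
    int *dev = nullptr;

    // Allocate device memory, launch the kernel, and copy the results back.
    cudaMalloc((void **)&dev, n * sizeof(int));
    fill_indices<<<1, n>>>(dev, n);
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < n; ++i)
        printf("%d ", host[i]);
    printf("\n");

    return 0;
}

Compile it with, for example, nvcc hello.cu -o hello (the filename is a placeholder) and run the resulting binary from within a GPU job.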

 

The GPU queues have the following attributes:

Queue        Default run time    Max run time    Max # of nodes
gpu          1 hour              8 hours         2
gpu-long     8 hours             48 hours        1
gpu-large    1 hour              8 hours         4
p100         1 hour              24 hours        1
v100         1 hour              24 hours        1
a100         1 hour              8 hours         2
a100-long    8 hours             48 hours        1
a100-large   1 hour              8 hours         4
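
For example, to run a job that needs more than 8 hours on the K80 nodes, you could submit to the gpu-long queue with an explicit time limit within the 48-hour maximum (the script name below is a placeholder):

sbatch -p gpu-long -t 24:00:00 my_gpu_script.sh

The same partition and time limit can also be set with #SBATCH -p and #SBATCH -t directives inside the script.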


Getting Help


The Division of Information Technology provides support on all of our services. If you require assistance please submit a support ticket through the IT Service Management system.
