Getting Started Guide

Audience: Faculty, Postdocs, Researchers, Staff and Students

This KB Article References: High Performance Computing
Last Updated: July 14, 2020

Introduction

This guide assumes you have already received access to SeaWulf, and that you are able to log in.  It serves to get you acquainted with the environment you will be interacting with once on the system.

 

Basic Linux Commands

SeaWulf uses CentOS as its operating system, one of the many variants of Linux.  Unlike a desktop, you interact with this operating system through the terminal, sometimes referred to as the command line.  Windows and OS X both have their own version of the terminal, even though most users choose not to use them.  Here, the use of the terminal is mandatory, so it is important that you know your way around it.

mkdir

When you first log in you will arrive in your home directory.  This is your own private folder to store things related to your work.  You can make subdirectories, files, and even install software here.  Making a subdirectory is simple.  Use the mkdir command:

mkdir <directory name>

Here, <directory name> is the name you want to give the folder.
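As a small illustration, the sketch below creates a directory and then a nested subdirectory.  The names project and results/run1 are placeholders chosen for this example.  The commands run inside a scratch directory created by mktemp -d, so nothing in your home directory is touched.

```shell
# Work in a throwaway directory so this example cannot clutter your home directory.
workdir=$(mktemp -d)
cd "$workdir"

# Create a single directory (the name "project" is just an example).
mkdir project

# -p creates any missing parent directories and does not fail if they already exist.
mkdir -p project/results/run1
```

Without -p, mkdir reports an error if the parent directory does not exist yet.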

ls and pwd

After you have done this, you can use the ls command to verify that the directory has been created without issue.  Typing in ls will result in a list of files and subdirectories being printed back to you, all of which are located in your present working directory.  Your working directory is the command line equivalent of your current folder in Windows Explorer or Finder - it's the directory that you're currently looking at.  When you type the pwd command, your working directory will be printed out to you:

/gpfs/home/<my username>

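The top-level directory of the file system, roughly equivalent to C: on Windows, is the root directory, written as a single forward-slash (/).  On SeaWulf, the /gpfs directory sits directly under the root, and the home subdirectory of /gpfs contains all users' home directories.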
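To see pwd and ls working together, here is a short sketch that can be run as-is.  The directory names data and scripts are made up for the example, and everything happens in a scratch directory created by mktemp -d.

```shell
# Scratch directory with two example subdirectories (names are placeholders).
workdir=$(mktemp -d)
cd "$workdir"
mkdir data scripts

# Print the present working directory.
pwd

# List the contents of the working directory.
ls

# -l adds details such as permissions and modification time;
# -a also shows hidden entries, whose names start with a dot.
ls -la
```

ls also accepts a path, so ls /gpfs/home lists that directory without changing your working directory.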

cd

To change your present working directory, you can use the cd command, which stands for change directory.

cd <path>

You can change your directory using either an absolute or relative path.  An absolute path begins with a forward-slash and specifies each level of subdirectories, starting from the root folder (which contains gpfs).  A relative path does not start with a forward-slash and is interpreted starting from the directory you're currently in.  For example, if you are in /gpfs/home/<your username> and want to move to a subdirectory in that folder, just give cd the subdirectory name.
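The difference between the two kinds of path is easiest to see side by side.  The sketch below uses a scratch directory and the made-up names project and results.

```shell
# Build a small example tree inside a scratch directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/project/results"

# Absolute path: starts with / and is spelled out from the root directory.
cd "$workdir/project"

# Relative path: interpreted starting from the current working directory.
cd results
pwd

# .. refers to the parent directory, so this moves back up to "project".
cd ..
pwd
```

Typing cd with no argument at all returns you to your home directory.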

touch

If instead of a folder you would rather create a blank text file, you can use the touch command:

touch <new filename>

You can then edit this file with a text editor of your choice (e.g., nano, vim, or emacs).

rm

If you want to delete a file or folder, you can use the rm command (short for remove).  This command will permanently delete anything you tell it to (no trash bin!).  You will pass this command different options, depending on what it is you want to remove.  For a regular file, you can choose not to pass it any options at all:

rm <file to remove>

However, if you want to remove an entire directory (even if it's empty), you will have to pass it the -r option (short for recursive):

rm -r <folder to remove>

This will remove everything in that directory, files and subdirectories included.  The recursive option is called such because it recursively deletes everything it finds.  A word of warning - it is very easy to accidentally delete important information.  Be very careful when using this command.
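One cautious habit is to list what a recursive delete would touch before running it.  The sketch below shows that pattern in a scratch directory; the file and directory names are invented for the example.

```shell
# Set up example content inside a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p old_results/run1
touch old_results/run1/output.txt scratch.txt

# Remove a single regular file.
rm scratch.txt

# Before a recursive delete, list exactly what would be removed.
ls -R old_results

# Remove the directory and everything inside it.
rm -r old_results
```

For extra safety, rm -i asks for confirmation before each removal, and rmdir removes a directory only if it is already empty.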

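Most of these commands have a --help option.  If you forget how to use a command, type that command followed by --help (for example, ls --help) to get a description of it, or type man followed by the command name to read its manual page.  Note that a short -h option is not a reliable substitute: for ls, for example, -h changes how file sizes are displayed instead of printing help.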

 

Modules

Some of the commands described above (cd, for example) are not separate programs, but functionality built into the shell; others (such as ls and rm) are small programs that the shell runs for you.  The shell is the program you're interacting with whenever you type something into the terminal, and is always running.  In addition to these commands, the shell has a few helpful features, one of which is the existence of environment variables.  These are little bits of data that all programs can access, but which go away any time you log out.  Typically they are used for storing paths to directories so that programs know where to look for the files they need.

Yet another command is the env command, which lists all of your environment variables.  When you type this command, you will see something like this printed to your screen:

...
COLLECTION_DATA=/data/collection
XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session0
rvm_path=/home/austin/.rvm
XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat0
SSH_AUTH_SOCK=/run/user/1000/keyring/ssh
DEFAULTS_PATH=/usr/share/gconf/ubuntu.default.path
XDG_CONFIG_DIRS=/etc/xdg/xdg-ubuntu:/usr/share/upstart/xdg:/etc/xdg
rvm_prefix=/home/austin
...

Each line is an individual environment variable.  The name of the environment variable is to the left of the equals sign (by convention it is usually in all caps, e.g.  COLLECTION_DATA), and its value is to the right.
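To make this concrete, the sketch below defines an environment variable by hand and reads it back.  The name MY_DATA_DIR and its value are made up for this example; they are not variables that SeaWulf defines.

```shell
# Define a variable and export it so that programs started from this shell can see it.
# MY_DATA_DIR and its value are hypothetical.
export MY_DATA_DIR=/gpfs/home/example_user/data

# Read a variable by prefixing its name with $.
echo "$MY_DATA_DIR"

# env lists every exported variable; grep narrows the output to one name.
env | grep '^MY_DATA_DIR='

# A child process inherits the exported value.
child_value=$(sh -c 'echo "$MY_DATA_DIR"')
echo "$child_value"

# PATH is the list of directories the shell searches for programs.
echo "$PATH"
```

A variable set this way lasts only for the current login session.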

Defining these by hand every time you log in is cumbersome, which is why we have installed a software package to simplify the process.  Using the module command, you can load and unload environment variables that you commonly need, depending on the software you use.  The module command has several subcommands that perform different functions.  The most common subcommands are:

module avail
module load <some module>
module list
module unload <some module>

The load subcommand will load a module.  This will make a certain software package callable from the terminal.  If, for example, you load the matlab/2018a module, you will be able to start MATLAB 2018a by typing in the command matlab.

The list subcommand will show you a list of all the modules you have loaded since logging in.

The avail subcommand will list all of the modules that are available to be loaded.  When you first log in, only a limited selection of local modules will be displayed.  To view all of the software that is installed globally on SeaWulf, you must first load the shared module with:

module load shared

Special requests to install software globally (outside of a home directory) can be made through the ticketing system.  Requests are reviewed based on how widely used the software in question is.

If you accidentally load the wrong software package or want to switch to a different version of the same software, you should use the unload subcommand to erase the environment variables associated with that software.  If, for example, you decide that MATLAB 2018a is insufficient and want to switch to the 2019a release, you would first unload the matlab/2018a module, then load the matlab/2019a module.

Other subcommands exist.  To see a list of these subcommands and how to use them, type module help.
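The unload-then-load sequence described above can be wrapped in a small shell function.  This is only a sketch: switch_module is a hypothetical helper name, not something installed on SeaWulf, and it assumes the module command is available in your shell.

```shell
# Swap one loaded module for another version of the same software.
# switch_module is a hypothetical helper; both module names are passed as arguments.
switch_module() {
    old_module=$1
    new_module=$2
    module unload "$old_module"
    module load "$new_module"
    # Show what is loaded now, to confirm the swap.
    module list
}

# Example, using the module names mentioned above:
#   switch_module matlab/2018a matlab/2019a
```

Running module list at the end is a quick way to check that only the intended version remains loaded.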

 

Slurm

Now that you know the basic ways of interacting with the cluster, the next step is to understand how to use it to run computational software.  SeaWulf has what is called the login node.  Each node on SeaWulf is an individual computer that is networked to all the other nodes, forming a computing cluster.  The login node is the entry point to the cluster, and only exists as an interface to use the other nodes.  Since the beginning of this guide you have been interacting with this node.  Because everybody will be on this node, it shouldn't be used for heavy computation - otherwise, the system would slow down and become unusable.  To actually run heavy computation, you will have to run your software on the compute nodes.

To manage demand, we use a scheduling system called Slurm to grant you access to the compute nodes and run your job when nodes become available.  Slurm commands can be used only after loading the slurm module:

module load slurm

Running an interactive job

Loading the Slurm module gives you access to several commands, one of which is srun.  There are several different ways to use this command.  To start off, we will begin an interactive job which asks for one compute node with 28 cores:

srun -N 1 -n 28 -p short-28core --pty bash

The --pty bash option indicates that we want to manually control a node through the terminal.  The -N flag specifies the number of nodes the job needs, and the -n flag specifies the total number of tasks for the job; with a single node and one core per task, this requests 28 cores on that node.  The -p flag specifies which queue you want to wait in.  A list of queues and their resource limits can be found here.  Slurm documentation uses the word "partition" instead of "queue"; our FAQ pages will use these terms interchangeably.

After running this command you will either be waiting in the short-28core queue or given a node immediately.  This depends on demand at the time.  You can use the squeue command to show a list of jobs and their status to estimate how long you may be waiting in the queue, if at all.

Once granted access, your terminal will be interacting with the compute node instead of the login node.  Here you can test software you have installed, as you are the only user on this node and have access to all its resources. 

To end the interactive job session and return to the login node, type exit.
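Once an interactive session starts, a few quick commands can confirm that the shell is really running on a compute node.  This is a minimal sketch, assuming nproc (part of GNU coreutils) is installed, as it is on typical CentOS systems.  SLURM_JOB_ID is a variable that Slurm sets inside a job.

```shell
# Name of the machine this shell is running on (the compute node, once the job starts).
node_name=$(uname -n)
echo "Running on: $node_name"

# Number of processing units available to this shell.
core_count=$(nproc)
echo "Cores available: $core_count"

# Slurm sets SLURM_JOB_ID inside a job; outside a job the variable is unset.
echo "Job ID: ${SLURM_JOB_ID:-not inside a Slurm job}"
```

If the last line reports that you are not inside a Slurm job, the shell is still on the login node.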

Running an automated job with Slurm

Interactive jobs are good for testing your code or installed software, but should not be used for long-running computational jobs, since your job will end once you log off.  An automated job will run until finished, and with it you won't have to retype commands all the time.

To run an automated job with Slurm, you will need to write a job script.  A job script is a text file that contains all of the information needed to run your job.  Your job script will contain special Slurm directives starting with #SBATCH that specify job options, like the number of nodes desired and the expected completion time.  For jobs that span more than one node, the script also launches your program through the Message Passing Interface (MPI), which enables programs to synchronize across nodes.  MPI is the standard method for communication across more than one node in a computer cluster.  In order to utilize multiple nodes for a single job, your software must be built with MPI, and you must use an MPI command (e.g., mpirun) when you execute your job.

Here is an example Slurm script:

#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --output=res.txt
#SBATCH --ntasks-per-node=40
#SBATCH --nodes=2
#SBATCH --time=05:00
#SBATCH -p short-40core
#SBATCH --mail-type=BEGIN,END
#SBATCH --mail-user=jane.smith@stonybrook.edu

module load shared
module load intel/compiler/64/2017/17.0.0
module load intel/mkl/64/2017/0.098
module load intel/mpi/64/2017/0.098

cd /gpfs/projects/samples/intel_mpi_hello/
mpiicc mpi_hello.c -o intel_mpi_hello

mpirun ./intel_mpi_hello

The --job-name option gives the job a name so that it can be easily found in the list of queued and running jobs.  The next three lines specify the file where output will be written, the number of tasks to run on each node (typically one per CPU core), and the number of nodes to request.  In addition, we've specified a wall time limit of five minutes in the --time option.  The --mail-type and --mail-user options are not required, but they control whether the user is notified via email when the job state changes (in this case, when the job starts and finishes).  Emails will only be sent to "stonybrook.edu" addresses.

The next four lines load the modules required to find the software run by the script.  The shared module will make available all of the modules for globally installed software on SeaWulf.  The intel/mpi/64/2017/0.098 module is an implementation of MPI, needed for the mpirun command.

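The script then changes the present working directory to a directory containing Intel MPI samples, compiles mpi_hello.c with mpiicc, and runs the resulting program with mpirun.  Without the cd command, the job would start in the directory where the sbatch command was run, which is Slurm's default working directory.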

To start the job, use the sbatch command with the filename of the script as the only argument.  Your job will be placed in the specified queue and will run without your involvement.  If you want to cancel the job at any point, you can use the scancel command, providing the number at the beginning of the job id found in the first column of the squeue printout.
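Submitting a job and looking it up can be combined into one step.  The sketch below is an illustration, not a SeaWulf-provided tool: submit_job is a hypothetical helper name, and it assumes the slurm module is already loaded.  It relies on the --parsable option of sbatch, which prints only the job ID.

```shell
# Submit a job script and print the job's entry in the queue.
# submit_job is a hypothetical helper; the script path is passed as an argument.
submit_job() {
    script=$1
    # --parsable makes sbatch print just the job ID
    # (followed by ";cluster" on multi-cluster setups).
    job_id=$(sbatch --parsable "$script") || return 1
    job_id=${job_id%%;*}    # drop a trailing ";cluster" suffix if present
    echo "Submitted job $job_id"
    squeue -j "$job_id"
}

# Example (the script name is hypothetical):
#   submit_job my_job.slurm
#   scancel "$job_id"
```

When called directly, as in the example, the function leaves the job ID in the job_id variable, so it can be passed to scancel later.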

Checking Job Status

First, make sure you have loaded the slurm module:

module load slurm

After you've submitted a job, you can check the status of your job in the queue using the squeue command.  Issuing this command alone will return the status of every job currently managed by the scheduler.  As a result we recommend narrowing the results by user name or job number:

squeue -j <your_job_number>

or

squeue -u <your_user_name>

Or, for a full list of options available to the squeue command, issue:

man squeue

The documentation for all Slurm commands can be found here.
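If you would rather not rerun squeue by hand, a small loop can wait until a job has left the queue.  This is a sketch under stated assumptions: wait_for_job is a hypothetical helper name, the 30-second default interval is an arbitrary choice, and the slurm module must already be loaded.  It uses the -h option of squeue, which suppresses the header line.

```shell
# Wait until a job no longer appears in the queue.
# wait_for_job is a hypothetical helper: first argument is the job number,
# optional second argument is the polling interval in seconds (default 30).
wait_for_job() {
    job_id=$1
    interval=${2:-30}
    # With -h there is no header, so the output is empty once the job has left the queue.
    while [ -n "$(squeue -h -j "$job_id" 2>/dev/null)" ]; do
        sleep "$interval"
    done
    echo "Job $job_id is no longer in the queue"
}

# Example:
#   wait_for_job <your_job_number>
```

Avoid very short intervals, since every squeue call sends a query to the scheduler that all users share.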

DUO Two Factor Authentication

If you tried logging into SeaWulf recently, you may have noticed that you are required to use DUO security to authenticate.  DUO provides an additional layer of security on the SeaWulf cluster by asking you to confirm your login attempt by accepting a push notification to your smartphone.

Please check your email for a personalized invitation allowing you to enroll with DUO. Please click the link in your email and follow this article on the DUO enrollment process.  It is recommended to enroll two devices in DUO.

The Division of Information Technology offers the DUO service page, which can be referred to for additional information regarding this service.

 


 

 


Getting Help


The Division of Information Technology provides support on all of our services. If you require assistance please submit a support ticket through the IT Service Management system.
