Getting Started Guide

Audience: Faculty, Postdocs, Researchers, Staff and Students

This KB Article References: High Performance Computing
Last Updated: March 20, 2018

Introduction

This guide assumes you have already received access to SeaWulf, and that you are able to log in.  It serves to get you acquainted with the environment you will be interacting with once on the system.

 

Basic Linux Commands

SeaWulf uses CentOS as its operating system, one of the many variants of Linux.  Unlike a desktop, you interact with this operating system through the terminal, sometimes referred to as the command line.  Windows and OS X both have their own version of the terminal, even though most users choose not to use them.  Here, the use of the terminal is mandatory, so it is important that you know your way around it.

mkdir

When you first log in you will arrive in your home directory.  This is your own private folder to store things related to your work.  You can make subdirectories, files, and even install software here.  Making a subdirectory is simple.  Use the mkdir command:

mkdir <directory name>

Here, <directory name> is the name you want to give the folder.

ls and pwd

After you have done this, you can use the ls command to verify that the directory has been created without issue.  Typing in ls will result in a list of files and subdirectories being printed back to you, all of which are located in your present working directory.  Your working directory is the command line equivalent of your current folder in Windows Explorer or Finder - it's the directory that you're currently looking at.  When you type the pwd command, your working directory will be printed out to you:

/gpfs/home/<my username>

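The top-level directory, equivalent to C: on Windows, is called the root directory and is written as a single forward slash (/).  On SeaWulf, home directories live under /gpfs: the home subdirectory of /gpfs contains all users' home directories.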
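As a quick illustration of the commands so far, here is a minimal sketch that creates a directory and then confirms it exists.  The name my_project is only a placeholder, and the -p option (an addition for this sketch) simply keeps mkdir from complaining if the directory already exists.

```shell
# Create a subdirectory in the present working directory.
# my_project is a placeholder name; -p avoids an error if it already exists.
mkdir -p my_project

# List the contents of the present working directory; my_project should appear.
ls

# Print the absolute path of the present working directory.
pwd
```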

cd

To change your present working directory, you can use the cd command, which stands for change directory.

cd <path>

You can change your directory using either an absolute or relative path.  An absolute path begins with a forward slash and specifies each level of subdirectories, starting from the root folder (which contains gpfs).  A relative path does not start with a forward slash, and is interpreted starting from the directory you're currently in.  For example, if you are in /gpfs/home/<your username> and want to move to a subdirectory in that folder, just give cd the subdirectory name.
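The difference between the two kinds of path can be sketched as follows.  The directory names project and data are placeholders, and the example builds them in a temporary location (an assumption made here so that it can be run anywhere without touching your real files).

```shell
# Build a small directory tree in a temporary location.
# (Assumption for this sketch; on SeaWulf you would start in your home directory.)
base=$(mktemp -d)
mkdir -p "$base/project/data"

# Absolute path: starts with a forward slash and works from any directory.
cd "$base/project"
pwd

# Relative path: interpreted from the present working directory.
cd data
pwd

# ".." is a relative path that means "the parent directory".
cd ..
pwd
```

After the last command you are back in the project directory.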

touch

If instead of a folder you would rather create a blank text file, you can use the touch command:

touch <new filename>

You can then edit this file with a text editor, which will be covered in a later section.

rm

If you want to delete a file or folder, you can use the rm command (short for remove).  This command will permanently delete anything you tell it to (no trash bin!).  You will pass this command different options, depending on what it is you want to remove.  For a regular file, you can choose not to pass it any options at all:

rm <file to remove>

However, if you want to remove an entire directory (even if it's empty), you will have to pass it the -r option (short for recursive):

rm -r <folder to remove>

This will remove everything in that directory, files and subdirectories included.  The recursive option is called such because it recursively deletes everything it finds.  A word of warning - it is very easy to accidentally delete important information.  Be very careful when using this command.
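A safe way to get comfortable with touch and rm is to practice in a throwaway directory.  The sketch below does this; the file and folder names are placeholders, and the temporary directory is an assumption made so that nothing in your home directory is affected.

```shell
# Practice in a throwaway directory so no real files are at risk.
scratch=$(mktemp -d)
cd "$scratch"

# Create an empty file, a folder, and an empty file inside that folder.
touch notes.txt
mkdir old_results
touch old_results/run1.txt

# Remove a single file: no options needed.
rm notes.txt

# Remove a folder and everything inside it: -r is required.
rm -r old_results

# List everything that is left, including hidden files; this should print nothing.
ls -A
```

If you want rm to ask for confirmation before each deletion, you can add the -i option (for example, rm -ri <folder to remove>).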

Most of these commands accept a --help option.  If you forget how to use a command, type that command followed by --help to get a description of it, or type man followed by the command name to read its full manual page.

 

Modules

Some of the commands described above, such as cd, are not programs, but functionality built into the shell.  Others, such as ls and mkdir, are small programs that the shell runs for you.  The shell is the program you're interacting with whenever you type something into the terminal, and is always running.  In addition to these commands, the shell has a few helpful features, one of which is the existence of environment variables.  These are little bits of data that all programs can access, but which go away any time you log out.  Typically they are used for storing paths to directories so that programs know where to look for the files they need.

Yet another command is the env command, which lists all of your environment variables.  When you type this command, you will see something like this printed to your screen:

...
COLLECTION_DATA=/data/collection
XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session0
rvm_path=/home/austin/.rvm
XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat0
SSH_AUTH_SOCK=/run/user/1000/keyring/ssh
DEFAULTS_PATH=/usr/share/gconf/ubuntu.default.path
XDG_CONFIG_DIRS=/etc/xdg/xdg-ubuntu:/usr/share/upstart/xdg:/etc/xdg
rvm_prefix=/home/austin
...

Each line is an individual environment variable.  The name of the environment variable is to the left of the equals sign (by convention usually in all caps, e.g. COLLECTION_DATA), and its value is to the right.
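To see how an environment variable is defined and read by hand, consider the following minimal sketch.  The variable name MY_DATA_DIR and its value are made up for illustration and are not used by any software on the cluster.

```shell
# Define an environment variable. export makes it visible to programs
# started from this shell. MY_DATA_DIR is a hypothetical name.
export MY_DATA_DIR="$HOME/data"

# Read the value back; the $ prefix substitutes the variable's value.
echo "$MY_DATA_DIR"

# Find the variable in the full list printed by env.
env | grep '^MY_DATA_DIR='
```

A variable defined this way disappears when you log out, so it would have to be defined again in every new session.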

Dealing with defining these every time you log in is cumbersome, which is why we have installed a software package to simplify the process.  Using the module command (a program, this time), you can load and unload environment variables that you commonly need, depending on the software you use.  The module command has several subcommands that perform different functions.  The most common subcommands are:

module avail
module load <some module>
module list
module unload <some module>

The avail subcommand will list all of the modules we have available on the cluster of your choosing.  You can use this to find out what software is already installed globally.  Special requests to install software globally (outside of a home directory) can be made through the ticketing system and are reviewed based on how widely used the software in question is.

The load subcommand will load a module.  This will make a certain software package callable from the terminal.  If, for example, you load the matlab/2016 module, you will be able to start MATLAB 2016 by typing in the command matlab.

The list subcommand will show you a list of all the modules you currently have loaded.

If you accidentally load the wrong software package or want to switch to a different version of the same software, you should use the unload subcommand to erase the environment variables associated with that software.  If, for example, you decide that MATLAB 2016 is insufficient and want to switch to the 2017 release, you would first unload the matlab/2016 module, then load the matlab/2017b module.

Other subcommands exist.  To see a list of these subcommands and how to use them, type module help.
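The version switch described above can be sketched as a short sequence of module subcommands.  The helper name switch_module is hypothetical and exists only to group the steps; the module names are the ones used in the example above and may not match what module avail shows on your cluster.  Because the module command only exists where the module software is installed, the sketch checks for it first.

```shell
# Hypothetical helper: replace one loaded module with another.
#   $1 = module to unload, $2 = module to load
switch_module() {
    module unload "$1"
    module load "$2"
    module list        # confirm what is loaded now
}

# Only call module if it is available in this shell.
if command -v module >/dev/null 2>&1; then
    switch_module matlab/2016 matlab/2017b
else
    echo "module command not found; run this on the cluster"
fi
```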

 

Torque and PBS

Now that you know the basic ways of interacting with the cluster, the next step is to understand how to use it to run computational software. SeaWulf has what is called the login node.  Since the beginning of this guide you have been interacting with this node.  The login node is the entry point to the cluster, and only exists as an interface to use the other nodes.  Since everybody will be on this node, it can't be used for heavy computation - otherwise, the system would slow down and become unusable.  To actually run heavy computation, you will have to run your software on the compute nodes.

To manage demand, we have a scheduling system which will grant you access to the compute nodes to run your job when they become available.  This system is called Torque, and is made available through a module:

module load torque

Running an interactive job

Loading the torque module gives you access to several commands, the most important of which is qsub.  There are several different ways to use this command.  To start off, we will begin an interactive job which asks for one compute node with 28 cores:

qsub -I -l nodes=1:ppn=28 -q debug-28core

The -I flag indicates that we want to manually control a node through the terminal.  The -l flag is followed by a specification of the job, namely the number of nodes we want access to and the processors per node we need.  Depending on the queue you will use, ppn should be 24 or 28.  The -q flag specifies which queue you want to wait in.  A list of queues and their resource limits can be found here.

After running this command you will either be waiting in the debug queue or given a node immediately.  This depends on demand at the time.  You can use the qstat command to show a list of jobs and their status to estimate how long you may be waiting in the queue, if at all.

Once granted access, your terminal will be interacting with the compute node instead of the login node.  Here you can test software you have installed, as you are the only user on this node and have access to all its resources.  Note that your environment variables on this machine are not loaded like they are on the login node - this is because you have just logged in to the compute node, and your shell on the node has just started running.  You will have to reload any modules you loaded on the login node.
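If you are unsure whether your terminal is on a compute node or still on the login node, a few commands can help you check.  This is a sketch, not an official procedure: it relies on the PBS_JOBID and PBS_NODEFILE environment variables that Torque sets inside a job, and it falls back to a message when they are not set.

```shell
# Name of the machine this shell is running on. Inside an interactive job
# this should be a compute node rather than the login node.
uname -n

# Torque sets PBS_JOBID inside a job; outside a job it is normally unset.
job_label="${PBS_JOBID:-not inside a Torque job}"
echo "Job id: $job_label"

# PBS_NODEFILE points to a file listing the allocated nodes,
# one line per requested processor. Print each node name once.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    sort -u "$PBS_NODEFILE"
fi
```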

To end the interactive job session and return to the login node, type exit.

Running an automated job with PBS

Interactive jobs are good for testing your code or installed software, but it would be rather inconvenient to have to do everything manually on a compute node, especially since your job will end if you log off.  An automated job will run until finished, and with it you won't have to retype commands all the time.

Automated jobs are written as PBS scripts (PBS stands for Portable Batch System).  A PBS script is a small extension of the scripting language built into the shell, referred to as shell script.  The difference between the two is that a PBS script can be used with Torque to specify job options, like the number of nodes desired and the expected completion time.  It also communicates with MPI to enable MPI-built programs to synchronize across nodes.

Here is an example PBS script:

#PBS -l nodes=2:ppn=28,walltime=00:05:00
#PBS -N my_example_job
#PBS -q short-28core

module load shared
module load mvapich2/gcc/64/2.2rc1

cd $HOME

mpirun ./myexec /gpfs/home/<my username>/output.txt

The first three lines pass options to qsub automatically, so that you don't have to manually specify them when you run the script.  The new -N option gives the job a name so that it can be easily found in the list of queued and running jobs.  In addition, we've specified an expected wall time of five minutes (written as hours:minutes:seconds) in the -l option.

The next two lines load the modules required to find the software run by the script.  The shared module allows your environment to find other modules, and should always be loaded at the beginning of the script.  The mvapich2/gcc/64/2.2rc1 module is an implementation of MPI, needed for the mpirun command.

The script then sets the present working directory to the user's home directory (stored in the environment variable HOME).  Finally, on the last line, the computational software is run.

To start the job, use the qsub command, this time with the filename of the script as the only argument.  Your job will be placed in the specified queue and will run without your involvement.  If you want to cancel the job at any point, you can use the qdel command, providing the number at the beginning of the job id found in the first column of the qstat printout.
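The submit, check, and cancel steps can be sketched as follows.  The script filename my_job.pbs and the helper job_number are hypothetical, introduced only for this illustration.  The sketch assumes that qsub prints the new job id when the job is accepted, and it only calls qsub if the command is available.

```shell
# Hypothetical helper: keep only the number at the beginning of a job id,
# i.e. everything before the first dot.
job_number() {
    printf '%s\n' "${1%%.*}"
}

if command -v qsub >/dev/null 2>&1; then
    # Submit the script; my_job.pbs is a placeholder filename.
    job_id=$(qsub my_job.pbs)

    # The new job should now appear in the list of queued and running jobs.
    qstat

    # Show the command that would cancel the job, without running it.
    echo "To cancel: qdel $(job_number "$job_id")"
else
    echo "qsub not found; load the torque module on the cluster first"
fi
```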

Getting Help


The Division of Information Technology provides support on all of our services. If you require assistance please submit a support ticket through the IT Service Management system.