SeaWulf Queues

SeaWulf's queues, also referred to as partitions, are designed to optimize both runtime and resource allocation across its computing nodes, accommodating a wide range of computational needs and hardware configurations. It's essential for users to align their job submissions with the capabilities of each partition, considering factors such as core counts, GPU specifications, and memory capacities, to ensure maximum cluster efficiency and effective utilization of allocated resources.

This KB Article References: High Performance Computing
This Information is Intended for: Guests, Instructors, Researchers, Staff, Students
Created: 10/21/2016
Last Updated: 07/01/2024

Expect a Trade-off Between Resource Usage and Wait Times

Expect larger resource requests (more nodes or longer job durations) to come with longer queue wait times. Balancing the two minimizes both wasted resources and delays, so users should assess their computational requirements carefully to avoid underutilizing or overloading the system.


Optimize Resource Usage with Test Jobs

Not all applications benefit from using multiple nodes or all cores on a single node. Starting with smaller test jobs allows users to gauge the necessary computational resources accurately before scaling up to larger jobs.
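
One way to read the results of such test jobs is to compare a small run against a larger one and compute the parallel efficiency. The sketch below is only an illustration: the function name and the timings are hypothetical, and the point at which efficiency counts as "too low" depends on the application.

```python
def parallel_efficiency(baseline_cores, baseline_seconds, scaled_cores, scaled_seconds):
    """Return (speedup, efficiency) of a larger test run relative to a smaller one."""
    if min(baseline_cores, scaled_cores) <= 0 or min(baseline_seconds, scaled_seconds) <= 0:
        raise ValueError("core counts and runtimes must be positive")
    # Observed speedup: how much faster the larger run finished.
    speedup = baseline_seconds / scaled_seconds
    # Ideal speedup: what perfect scaling with the extra cores would give.
    ideal_speedup = scaled_cores / baseline_cores
    return speedup, speedup / ideal_speedup


# Hypothetical timings from two test jobs: 24 cores took 1000 s, 96 cores took 500 s.
speedup, efficiency = parallel_efficiency(24, 1000.0, 96, 500.0)
print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
# prints "speedup: 2.00x, efficiency: 50%"
```

In this made-up case, four times the cores only halves the runtime, which suggests the job may not justify a full 96-core node.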


Understand Hardware Specifics for Optimal Performance

SeaWulf offers queues with diverse CPU architectures (e.g., Haswell, Skylake, AMD EPYC Milan) and varying core counts. Understanding these specifics is crucial for matching the computational requirements of applications effectively. Users should select a queue that aligns with their application's CPU architecture requirements to achieve optimal performance.


GPU Usage: Maximize Efficiency

Queues equipped with GPUs should be used exclusively for applications that require GPU acceleration. Before submitting jobs, users must verify compatibility with the available GPU types and ensure that their software is configured to utilize GPUs effectively. 


Consider Shared Queues for Smaller Jobs

SeaWulf offers "shared" queues such as 96-core or 40-core configurations where multiple jobs can run concurrently on the same node. These queues are ideal for jobs that require modest computational resources and maximize resource efficiency by utilizing node capacities fully. 


Use Specialized Queues for Testing and Debugging

For tasks like interactive sessions, brief testing, or code debugging, SeaWulf provides specialized queues such as debug-28core or short queues with rapid turnaround times. These queues are designed to prioritize quick job execution, making them suitable for initial code development and testing phases where rapid feedback is essential.
 

Use Long Queues When Uncertain of Job Duration

If unsure about the runtime required for a job, opting for long queues initially allows flexibility. Users can assess actual job durations from test runs or previous executions and then adjust to more suitable queues accordingly. 
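
Once a test run has given a measured runtime, the adjustment to a more suitable queue can be sketched as a simple lookup. The limits below are copied from the 40-core rows of the queue table in this article; the function name, the 1.5x safety factor, and the treatment of an n/a minimum as 1 node are assumptions made for illustration.

```python
# (max runtime in hours, min nodes, max nodes) for the non-shared 40-core
# queues, copied from the queue table in this article. A "Min Nodes" entry of
# n/a is treated as 1 here, which is an assumption.
QUEUE_LIMITS = {
    "short-40core": (4, 1, 8),
    "medium-40core": (12, 6, 16),
    "long-40core": (48, 1, 6),
    "extended-40core": (7 * 24, 1, 2),
}


def pick_queue(measured_hours, nodes, safety_factor=1.5):
    """Return the queue with the shortest max runtime that still fits the job."""
    # Pad the measured runtime so normal run-to-run variation does not hit the limit.
    needed_hours = measured_hours * safety_factor
    # Try queues from shortest to longest max runtime.
    for name, (max_hours, min_nodes, max_nodes) in sorted(
        QUEUE_LIMITS.items(), key=lambda item: item[1][0]
    ):
        if needed_hours <= max_hours and min_nodes <= nodes <= max_nodes:
            return name
    raise ValueError("no 40-core queue fits this runtime and node count")


print(pick_queue(measured_hours=5.0, nodes=1))  # prints "long-40core"
```

A single-node job measured at 5 hours skips medium-40core here because that queue requires at least 6 nodes.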


Handle Memory-Intensive Jobs Appropriately

For applications demanding significant memory resources, SeaWulf provides specialized queues like hbm-1tb-long-96core, equipped with nodes featuring large memory capacities tailored for memory-intensive tasks.
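
As a rough illustration of matching a job to node memory, the sketch below picks the smallest node type that fits a per-node memory requirement. The memory sizes come from the queue table in this article. The function name and the 90% usable fraction are assumptions: the article says only that a small subset of node memory is reserved for the OS and file system, without giving a figure.

```python
# Node memory in GB for the "long" queues reachable from milan1/milan2,
# copied from the queue table in this article.
NODE_MEMORY_GB = {
    "long-40core": 192,
    "long-96core": 256,
    "hbm-long-96core": 384,       # 256 GB DDR5 + 128 GB HBM
    "hbm-1tb-long-96core": 1000,  # 1 TB DDR5, HBM configured as cache
}


def smallest_fitting_queue(required_gb_per_node, usable_fraction=0.9):
    """Return the queue with the least node memory that still fits the job.

    usable_fraction is an assumed allowance for memory reserved by the OS
    and file system; the real reserved amount is not stated in this article.
    """
    for name, total_gb in sorted(NODE_MEMORY_GB.items(), key=lambda item: item[1]):
        if required_gb_per_node <= total_gb * usable_fraction:
            return name
    raise ValueError("no listed node type has enough memory per node")


print(smallest_fitting_queue(300))  # prints "hbm-long-96core"
```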


Ensure Software/Hardware Compatibility

Ensuring software compatibility with the hardware configurations available in each queue is essential for maximizing job performance. Users should verify that their applications are configured to leverage the specific CPU architectures, core counts, and GPU types available in their chosen queue.

By following these guidelines, users can manage job submissions on SeaWulf effectively, optimizing resource usage and minimizing queue wait times.

 

Available Queues

The queues available to you depend on which login node you submit from. There are two sets of login nodes: login1/login2, which provide access to one set of queues, and milan1/milan2, which provide access to another.

 

Queues accessed from login1 and login2:

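| Queue | CPU Architecture | Vector/Matrix Extension | CPU Cores per Node | GPUs per Node | Node Memory [1] | Default Runtime | Max Runtime | Max Nodes | Min Nodes | Max Simultaneous Jobs per User |
|---|---|---|---|---|---|---|---|---|---|---|
| debug-28core | Intel Haswell | AVX2 | 28 | 0 | 128 GB | 1 hour | 1 hour | 8 | n/a | n/a |
| short-28core | Intel Haswell | AVX2 | 28 | 0 | 128 GB | 1 hour | 4 hours | 12 | n/a | 8 |
| medium-28core | Intel Haswell | AVX2 | 28 | 0 | 128 GB | 4 hours | 12 hours | 24 | 8 | 2 |
| long-28core | Intel Haswell | AVX2 | 28 | 0 | 128 GB | 8 hours | 48 hours | 8 | n/a | 6 |
| extended-28core | Intel Haswell | AVX2 | 28 | 0 | 128 GB | 8 hours | 7 days | 2 | n/a | 6 |
| large-28core | Intel Haswell | AVX2 | 28 | 0 | 128 GB | 4 hours | 8 hours | 80 | 24 | 1 |
| gpu | Intel Haswell | AVX2 | 28 | 4 | 128 GB | 1 hour | 8 hours | 2 | n/a | 2 |
| gpu-long | Intel Haswell | AVX2 | 28 | 4 | 128 GB | 8 hours | 48 hours | 1 | n/a | 2 |
| gpu-large | Intel Haswell | AVX2 | 28 | 4 | 128 GB | 1 hour | 8 hours | 4 | n/a | 1 |
| p100 | Intel Haswell | AVX2 | 12 | 2 | 64 GB | 1 hour | 24 hours | 1 | n/a | 1 |
| v100 | Intel Haswell | AVX2 | 28 | 2 | 128 GB | 1 hour | 24 hours | 1 | n/a | 1 |

[1] A small subset of node memory is reserved for the OS and file system and is not available for user applications.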

 

Queues accessed from milan1 and milan2:

| Queue | CPU Architecture | Vector/Matrix Extension | CPU Cores per Node | GPUs per Node | Node Memory [1] | Default Runtime | Max Runtime | Max Nodes | Min Nodes | Max Simultaneous Jobs per User | Multiple Users per Node |
|---|---|---|---|---|---|---|---|---|---|---|---|
| debug-40core | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 1 hour | 1 hour | 8 | n/a | n/a | No |
| short-40core | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 1 hour | 4 hours | 8 | n/a | 4 | No |
| short-40core-shared | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 1 hour | 4 hours | 4 | n/a | n/a | Yes |
| medium-40core | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 4 hours | 12 hours | 16 | 6 | 1 | No |
| long-40core | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 8 hours | 48 hours | 6 | n/a | 3 | No |
| long-40core-shared | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 8 hours | 24 hours | 3 | n/a | n/a | Yes |
| extended-40core | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 8 hours | 7 days | 2 | n/a | 3 | No |
| extended-40core-shared | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 8 hours | 3.5 days | 1 | n/a | n/a | Yes |
| large-40core | Intel Skylake | AVX512 | 40 | 0 | 192 GB | 4 hours | 8 hours | 50 | 16 | 1 | No |
| short-96core | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 1 hour | 4 hours | 8 | n/a | 4 | No |
| short-96core-shared | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 1 hour | 4 hours | 4 | n/a | n/a | Yes |
| medium-96core | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 4 hours | 12 hours | 16 | 6 | 1 | No |
| long-96core | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 8 hours | 48 hours | 6 | n/a | 3 | No |
| long-96core-shared | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 8 hours | 24 hours | 3 | n/a | n/a | Yes |
| extended-96core | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 8 hours | 7 days | 2 | n/a | 3 | No |
| extended-96core-shared | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 8 hours | 7 days | 1 | n/a | n/a | Yes |
| large-96core | AMD EPYC Milan | AVX2 | 96 | 0 | 256 GB | 4 hours | 8 hours | 38 | 16 | 1 | No |
| hbm-short-96core | Intel Sapphire Rapids | AMX, AVX512 & Intel DL Boost | 96 | 0 | 384 GB (256 GB DDR5 + 128 GB HBM) | 1 hour | 4 hours | 8 | n/a | 4 | No |
| hbm-medium-96core | Intel Sapphire Rapids | AMX, AVX512 & Intel DL Boost | 96 | 0 | 384 GB (256 GB DDR5 + 128 GB HBM) | 4 hours | 12 hours | 16 | 6 | 1 | No |
| hbm-long-96core | Intel Sapphire Rapids | AMX, AVX512 & Intel DL Boost | 96 | 0 | 384 GB (256 GB DDR5 + 128 GB HBM) | 8 hours | 48 hours | 6 | n/a | 3 | No |
| hbm-1tb-long-96core | Intel Sapphire Rapids | AMX, AVX512 & Intel DL Boost | 96 | 0 | 1000 GB (1 TB DDR5 + 128 GB HBM configured as level 4 cache) | 8 hours | 48 hours | 1 | n/a | 1 | No |
| hbm-extended-96core | Intel Sapphire Rapids | AMX, AVX512 & Intel DL Boost | 96 | 0 | 384 GB (256 GB DDR5 + 128 GB HBM) | 8 hours | 7 days | 2 | n/a | 3 | No |
| hbm-large-96core | Intel Sapphire Rapids | AMX, AVX512 & Intel DL Boost | 96 | 0 | 384 GB (256 GB DDR5 + 128 GB HBM) | 4 hours | 8 hours | 38 | 16 | 1 | No |
| a100 | Intel Ice Lake | AVX512 & Intel DL Boost | 64 | 4 | 256 GB | 1 hour | 8 hours | 2 | n/a | 2 | Yes |
| a100-long | Intel Ice Lake | AVX512 & Intel DL Boost | 64 | 4 | 256 GB | 8 hours | 48 hours | 1 | n/a | 2 | Yes |
| a100-large | Intel Ice Lake | AVX512 & Intel DL Boost | 64 | 4 | 256 GB | 1 hour | 8 hours | 4 | n/a | 1 | Yes |

[1] A small subset of node memory is reserved for the OS and file system and is not available for user applications.

 

In addition to the limits in the tables above, users cannot use more than 32 nodes at one time unless running jobs in one of the large queues, and the maximum number of jobs that a user can have queued at any given time is 100.
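
These two account-wide limits can be expressed as a small pre-submission check. This is a minimal sketch, not a tool provided by SeaWulf: the function names are hypothetical, "large queues" is assumed to mean any queue with "large" in its name, and nodes_in_use is assumed to count only nodes held by jobs outside the large queues.

```python
MAX_NODES_OUTSIDE_LARGE = 32  # per-user node cap stated in this article
MAX_QUEUED_JOBS = 100         # per-user queued-job cap stated in this article


def is_large_queue(queue):
    # Assumption: "large queues" are those whose name has "large" as a
    # hyphen-separated part, e.g. large-28core or hbm-large-96core.
    return "large" in queue.split("-")


def submission_problems(queue, nodes_requested, nodes_in_use, jobs_queued):
    """Return a list of limit violations; an empty list means none were found."""
    problems = []
    # The new job itself adds one to the number of queued jobs.
    if jobs_queued + 1 > MAX_QUEUED_JOBS:
        problems.append(f"would exceed {MAX_QUEUED_JOBS} queued jobs")
    # The 32-node cap applies only to jobs outside the large queues.
    if (not is_large_queue(queue)
            and nodes_in_use + nodes_requested > MAX_NODES_OUTSIDE_LARGE):
        problems.append(f"would exceed {MAX_NODES_OUTSIDE_LARGE} nodes in use at once")
    return problems


print(submission_problems("long-96core", nodes_requested=6, nodes_in_use=30, jobs_queued=12))
# prints "['would exceed 32 nodes in use at once']"
```

The per-queue limits from the tables (max nodes, min nodes, max simultaneous jobs) still apply on top of these checks.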
 

Hardware Configurations Across SeaWulf Queues


SeaWulf's queues offer a variety of hardware configurations tailored to different computational needs. Here’s a detailed breakdown of the hardware specifications across various queues:

  • The debug-28core, short-28core, medium-28core, long-28core, extended-28core, and large-28core queues share a set of identical nodes that each have 28 Intel Haswell cores.
  • The debug-40core, short-40core, medium-40core, long-40core, extended-40core, and large-40core queues share a set of identical nodes that each have 40 Intel Skylake cores.
  • The short-96core, medium-96core, long-96core, extended-96core, and large-96core queues share a set of identical nodes that each have 96 AMD EPYC Milan cores.
  • The short-96core-shared, long-96core-shared, and extended-96core-shared queues use the same set of 96-core AMD EPYC Milan nodes, but multiple jobs are allowed to run on the same node simultaneously.
  • The hbm-short-96core, hbm-medium-96core, hbm-long-96core, hbm-extended-96core, and hbm-large-96core queues share a set of identical nodes that each have 96 Intel Sapphire Rapids cores.
  • The hbm-1tb-long-96core queue allocates jobs to 4 identical nodes that each have 96 Intel Sapphire Rapids cores. These nodes differ from the other hbm nodes in that they are configured in Cache mode and have 1 TB of DDR5 memory.
  • The gpu and gpu-long queues share a set of identical nodes that are similar to the 28-core Haswell nodes described above but have 4x K80 24GB GPUs each.
  • The p100 and v100 queues each allocate jobs to a single node that has 2x P100 16GB or 2x V100 32GB GPUs, respectively.
  • The a100, a100-long, and a100-large queues share nodes that each have 4x A100 80GB GPUs and 64 Intel Xeon Ice Lake cores.
     

Users must ensure that their applications are compatible with the specific hardware configurations available in each queue. This involves optimizing software usage to effectively utilize CPU architectures, GPU capabilities, and memory configurations.

For More Information Contact


IACS Support System
