GPUs on Quest#

GPUs are accelerators that can deliver significant performance improvements in everything from machine learning and AI to computational simulations and traditional numerical calculations. Demand for GPUs continues to grow in almost every sector.

When using GPU resources in a high-performance computing environment, it’s important to understand that wait times may be longer due to the resource demand. It’s also important to ensure that your code is using the GPUs as expected.

This guide provides information on how you can leverage the GPU resources that are available to you in our systems.

Attention

For technical details on the available GPUs in the General Access partitions, please see Quest Storage and Compute Resources.

Note

NVIDIA driver version 570.86.15 is installed on all Quest GPU nodes; it is compatible with applications built with CUDA Toolkit 12.8 or earlier.

Requesting General Access GPUs#

A common use case for researchers is the need to submit batch jobs to GPUs. The guide below walks through submitting a simple GPU-enabled batch job to the gengpu partition.

Jobs on the General Access GPU nodes can run for a maximum of 48 hours. To submit jobs to General Access GPU nodes, set the --partition option to gengpu and use the --gres option to request the number of GPU cards required.

Example batch job submission script header:

#!/bin/bash
#SBATCH --account=<account>  ## Required: your Slurm account name, i.e. eXXXX, pXXXX or bXXXX
#SBATCH --partition=<partition> ## Required: buyin, short, normal, long, gengpu, genhimem, etc.
#SBATCH --time=<HH:MM:SS>       ## Required: How long will the job need to run?  Limits vary by partition
#SBATCH --gres=gpu:1            ## Required: How many GPU cards do you need? Optionally include the GPU type
#SBATCH --nodes=<#>             ## How many computers/nodes do you need? Usually 1
#SBATCH --ntasks=<#>            ## How many CPUs or processors do you need? (default value 1)
#SBATCH --mem=<#G>              ## How much RAM do you need per computer/node? G = gigabytes
#SBATCH --job-name=<name>       ## Used to identify the job

# clear your environment
module purge

# load any modules needed
module load matlab/r2023b

# List the available GPU devices
matlab -batch 'gpuDeviceTable'
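
Whichever application you run, a quick application-agnostic way to confirm that the job can actually see the GPU that Slurm allocated is to call nvidia-smi from within the job script, for example:

# Print the GPU(s) visible to this job, the driver version, and current utilization
nvidia-smi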

The specific type of GPU card can also be specified with the --gres option. For example, to request one A100 GPU card, use --gres=gpu:a100:1. To schedule an H100 card, change the option to --gres=gpu:h100:1. Specifying the GPU type can matter because memory and capabilities differ by model. Because older models are often more readily available, requesting them can sometimes mean shorter wait times, even if their performance is not as cutting-edge as that of newer cards.

The --gres option should also be used to request GPU resources for interactive jobs.
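
For example, a minimal sketch of launching an interactive session with a single GPU on the gengpu partition (the account name, time, and memory values here are placeholders to adjust for your work):

# Start an interactive shell on a General Access GPU node with one GPU card
srun --account=<account> --partition=gengpu --gres=gpu:1 \
     --time=01:00:00 --mem=16G --ntasks=1 --pty bash -l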

Note

Slurm memory settings, such as --mem, refer to general (host) memory, not GPU memory. Jobs automatically have access to all of the memory on the GPU cards they request. Note that GPU jobs may need as much general memory as, or more than, the memory available on the GPU card (which is 40 GB or 80 GB, as indicated in the technical specifications).

Note

There are some limits on the General Access GPU partitions. The total number of GPUs any individual user can use across all running jobs is eight. If you submit additional jobs that would put you above this limit, they will be set to pending (PD) with a reason of (QOSMaxGRESPerUser). As your running jobs complete and the total number of GPUs you are using drops below eight, the reason for your pending jobs will change to the usual (RESOURCES) or (PRIORITY).
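
To see whether a job is being held by this limit, check the reason column that squeue reports for your pending jobs, for example:

# The last column shows (QOSMaxGRESPerUser) for jobs held by the per-user GPU limit
squeue -u $USER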

GPUs for Interactive Sessions#

You can request GPUs for applications like JupyterHub via Quest OnDemand. This can be done by selecting the gengpu partition in the SLURM Partition option for the application. After that, additional configuration options, such as the type and number of GPUs required, should appear.

As mentioned above, GPUs are often in high demand. Because of this, we don’t recommend requesting a large amount of GPU resources for interactive jobs. If you do, your job may wait a long time before it becomes active and available to you. If you have a unique use case, we recommend reaching out to our team at quest-help@northwestern.edu to discuss further.

Note

GPUs are not available as part of the Quest Analytics service.

Important Considerations When Requesting GPU Resources#

There are several factors to consider before determining the appropriate GPU request.

  • Does the job require a specific type of NVIDIA GPU Card?

  • How much GPU memory does the job need?

  • Does the job require the use of more than 1 GPU card?

    • If so, is the data transfer speed between the cards important for the job?

What to Consider with Specific NVIDIA GPU Cards#

Tip

For shorter wait times, it is best not to request a specific GPU card unless your job requires it.

#SBATCH --gres=gpu:1  ## This will schedule a job on the first available GPU regardless of GPU type.

There are two types of General Access GPUs on Quest: the NVIDIA A100 and the NVIDIA H100.

When you are compiling your own software, it’s important to pay attention to the version of the CUDA Toolkit you use for the compilation. If a GPU-enabled application has been compiled with a CUDA Toolkit older than 11.8, it will only run on the A100 cards. If it was compiled with CUDA Toolkit 11.8 or newer, it should run on both the A100 and H100 cards. In the rare case that the application still does not run on the H100, it is possible that it was not compiled with the correct CUDA architecture option: sm_90.
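
If you are unsure which card a job actually landed on, one quick check from inside the job is to query the GPU name and compute capability with nvidia-smi (the compute_cap query field is available on reasonably recent drivers, including the version installed on Quest):

# Print the GPU model and its CUDA compute capability (8.0 = A100, 9.0 = H100)
nvidia-smi --query-gpu=name,compute_cap --format=csv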

In some cases, a particular job (perhaps the training of a model) may run faster on the H100 than on the A100.

If the job requires a specific GPU card type for one of the reasons above, use

  • #SBATCH --gres=gpu:h100:1 to request the H100 specifically.

  • #SBATCH --gres=gpu:a100:1 to request the A100 specifically.

What to consider with GPU Memory#

There are two GPU memory tiers for General Access GPUs: cards with 40 GB of memory and cards with 80 GB. All H100 cards have 80 GB of GPU memory, while the A100s come in both 40 GB and 80 GB variants.

If the job requires a GPU card with more than 40 GB of GPU memory, use the --constraint option:

  • To request 80 GB A100 or H100 GPU cards, add the line: #SBATCH --constraint=sxm in addition to --gres=gpu:1
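
Put together, the relevant lines of a job header requesting a single 80 GB card might look like the following (a minimal sketch; other required options such as --account and --time are omitted here):

#SBATCH --partition=gengpu
#SBATCH --gres=gpu:1            ## one GPU card of any type
#SBATCH --constraint=sxm        ## restrict to the 80 GB (SXM) A100 and H100 cards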

What to Consider with Multi-GPU Jobs#

Similar to the type of GPU, it’s important to consider the type of GPU interconnect for your job as well. This is especially important for jobs that use more than one GPU.

Quest General Access GPUs have two different types of GPU interconnect: SXM and PCIe. SXM GPUs use NVLink, which offers much higher bandwidth for inter-GPU communication (up to 900 GB/s for the H100 SXM5) than PCIe (up to 128 GB/s for PCIe Gen 5). Multi-GPU jobs that require significant communication between GPU cards (for instance, training or tuning an LLM) will benefit from running on multiple GPUs with the SXM interconnect.

  • To request SXM configured A100 or H100 GPU cards, add the line: #SBATCH --constraint=sxm in addition to --gres=gpu:1
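
For example, a sketch of a job header for a multi-GPU job that keeps all GPUs on one node and on the NVLink (SXM) fabric (the GPU count and memory values here are illustrative, not recommendations):

#SBATCH --partition=gengpu
#SBATCH --nodes=1               ## keep all GPUs on a single node
#SBATCH --gres=gpu:4            ## four GPU cards
#SBATCH --constraint=sxm        ## require SXM cards so inter-GPU traffic uses NVLink
#SBATCH --mem=64G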

Installing and Validating GPU Aware Applications#

Compiling CUDA Code#

Some applications will require building CUDA code from source. The NVIDIA CUDA Toolkit provides a development environment for creating high-performance, GPU-accelerated applications.

To see which versions of CUDA are available on Quest, run the command:

$ module spider cuda

Using CUDA Toolkit 11.8 or greater and compiling for both the A100 (sm_80) and H100 (sm_90) architectures will allow the application to run on both A100 and H100 GPUs. The recommended CUDA Toolkit for applications leveraging the H100 is CUDA Toolkit 12.2.
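
As an illustration, a minimal sketch of compiling a single CUDA source file for both architectures (the module version and file names are placeholders; check module spider cuda for what is actually installed):

# Load a CUDA Toolkit module (version shown is an example)
module load cuda/12.2

# Build code for both the A100 (sm_80) and the H100 (sm_90)
nvcc -gencode arch=compute_80,code=sm_80 \
     -gencode arch=compute_90,code=sm_90 \
     -o my_program my_program.cu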

GPU Aware Application List#

Other examples of GPU-aware code compilation include: