Slurm Job Scheduler#
Overview#
Slurm is the software on Quest that manages and allocates requests for compute resources. Users submit computational “jobs” to Quest requesting the CPUs, nodes, memory, and other resources they need. Slurm manages these requests and allocates resources to jobs as they are available.
Jobs that run without user supervision are called batch jobs. Jobs where the user will actively use an application with the requested compute resources are called interactive jobs.
Slurm decides which jobs to run when based on the requested resources and your Fairshare score, which takes into account the priority of your Slurm account and your past usage of Quest.
Tip
A few common misconceptions about batch jobs:
- We don’t actually aim for 100% efficiency. Job resources can fluctuate and we don’t want the job to run out of memory. We often aim for 75% efficiency. 
- More memory doesn’t always mean a faster job. There are many factors that influence how long a job runs for. Our team is happy to help if you have any questions. 
The Job Submission Script#
Slurm requires users to write a submission script to run a batch job. It is a Bash script that specifies what resources the job needs to run, how to handle output and errors, and what commands to run as part of the job.
Example Submission Script#
Create a .sh file for the submission script.  The file will have #SBATCH statements at the top that give Slurm the information it needs to schedule and run the job.  After that, you can enter commands to load data, run code files, or do the other work of the job.
In all examples, <> denotes a value that you need to fill in.  Fill in these values and remove the <>.
Example: jobscript.sh
#!/bin/bash
#SBATCH --account=<account>  ## Required: your Slurm account name, i.e. eXXXX, pXXXX or bXXXX
#SBATCH --partition=<partition> ## Required: buyin, short, normal, long, gengpu, genhimem, etc.
#SBATCH --time=<HH:MM:SS>       ## Required: How long will the job need to run?  Limits vary by partition
#SBATCH --nodes=<#>             ## How many computers/nodes do you need? Usually 1
#SBATCH --ntasks=<#>            ## How many CPUs or processors do you need? (default value 1)
#SBATCH --mem=<#G>              ## How much RAM do you need per computer/node? G = gigabytes
#SBATCH --job-name=<name>       ## Used to identify the job 
# load any modules needed
module load mamba/24.3.0
# set or change your working directory if needed
cd ~/myscripts
# run any commands or code files
date
python --version
python -c "print('hello')"
The first line of the script, #!/bin/bash, loads the Bash shell and is required. Only the lines that begin with #SBATCH are interpreted by Slurm at the time of job submission. Normally in Bash, # is a comment character, meaning anything written after a # is ignored by the Bash interpreter. When writing a submission script, however, the Slurm interpreter recognizes #SBATCH as a command.  Any words following ## on the #SBATCH lines are treated as comments and ignored by the Slurm interpreter.
Once Slurm places the job on a compute node, the remainder of the script (everything after the last #SBATCH line) is run. After the Slurm commands, the rest of the script works like a regular Bash script. You can modify environment variables, load modules, change directories, and execute program commands. Lines in the second half of the script that start with # are comments, such as # load any modules needed in the example above.
Example values for #SBATCH options:
#SBATCH --account=p00000      ## use your Slurm account name
#SBATCH --partition=short
#SBATCH --time=01:00:00       ## one hour
#SBATCH --nodes=1             ## 1 node
#SBATCH --ntasks=1            ## 1 processor
#SBATCH --mem=2G              ## 2 GB of RAM
#SBATCH --job-name=sample_job  
More information on how to choose the values of the Slurm options is in Slurm Configuration Settings.
Note
The compute and memory resources you request affect your Fairshare score. Request only what you need for your job.
Tip
There are additional example job submission scripts in the RCDS Example Job Repository on GitHub .
Submitting A Batch Job#
After you have written and saved your submission script, you can submit your job. At the command line type
$ sbatch <name_of_script>
where, in the example above, <name_of_script> would be jobscript.sh.  If your submission script is not in your current working directory, either change to that directory or specify the path to the submission script as part of the command.
Upon submission the scheduler will return your job number:
Submitted batch job 549005
If you have a workflow that accepts or needs the jobid as an input for job monitoring or job dependencies, then you may prefer the return value of your job submission be just the job number. To do this, pass the --parsable argument:
$ sbatch --parsable <name_of_script>
549005
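If you capture the --parsable output in a shell variable, you can reuse the job ID later, for example when building the dependent jobs described later on this page (jobscript.sh is the example script from above; the second script name is a placeholder):
jobid=$(sbatch --parsable jobscript.sh)
echo "Submitted batch job ${jobid}"
sbatch --dependency=afterok:${jobid} <name_of_next_script>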
If there is an error in your job submission script, the job will not be accepted by the scheduler and you will receive an error message right away, for example:
sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
If your job submission receives an error, you will need to address the issue and resubmit your job. If no error is received, your job has entered the queue and will start when resources are available. See Slurm Commands and Job Management to learn how to get information about your job.
Partitions#
All jobs must specify a partition. Partitions determine which compute nodes the job can be scheduled on.
General Access Partitions#
Partitions that run on regular compute nodes without special resources.
| Partition | Minimum Wall Time | Maximum Wall Time | Notes | 
|---|---|---|---|
| short | 00:00:00 | 04:00:00 | For jobs that will run in 4 hours or less. The short partition has access to most compute nodes on Quest. | 
| normal | 04:00:00 | 48:00:00 | For jobs that will run between 4 hours and 2 days. The normal partition has access to more compute nodes than the long partition but fewer than the short partition. | 
| long | 48:00:00 | 168:00:00 | The long partition is for jobs that will run between 2 days and 7 days. The long partition has access to fewer compute nodes than the short and normal partitions. | 
Jobs scheduled on the short partition typically start the soonest due to the greater number of compute nodes available and the shorter time required.  Consider splitting up jobs to run in less than 4 hours when possible.
If you have a General Access allocation and need to run jobs longer than one week, contact quest-help@northwestern.edu for a consultation. Some special accommodations can be made for jobs requiring the resources of up to a single node for a month or less.
Partitions that run on specialty compute nodes with GPU or high-memory resources.
| Partition | Maximum Wall Time | Notes | 
|---|---|---|
| gengpu | 48:00:00 | Only for jobs requiring GPUs. In addition to entering gengpu as the partition in the submission script, the script must also request the GPUs themselves (see Request GPUs under Additional Options and the GPU page). | 
| genhimem | 48:00:00 | Only for jobs requiring more than 473 GB memory per node. This partition has access to a 52-core node with 1480 GB of schedulable memory. | 
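As a sketch only (the GPU type and count shown with --gres are placeholders; see the GPU page for the cards available on Quest), the resource-request portion of a gengpu submission script might look like:
#SBATCH --account=<account>
#SBATCH --partition=gengpu
#SBATCH --gres=gpu:<type>:<count>   ## request GPUs; see the GPU page for valid types
#SBATCH --time=01:00:00
#SBATCH --mem=16G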
Priority Access Partitions#
Priority Access partitions are available to users who are part of allocations containing purchased compute resources . The resources available and any limits on jobs are governed by the specific policies of the Priority Access allocation.
In addition to setting the -A/--account value to the Slurm account name, set the partition to either the Slurm account name or “buyin”.
Example:
#SBATCH -A b1234
#SBATCH -p b1234
or
#SBATCH -A b1234
#SBATCH -p buyin
The wall time limits vary by partition. Priority resource allocations have different wall time limits as well.
Some Slurm accounts, such as the GCC b1042 account, have additional partition names. If your account has account-specific partitions, use those partition names instead of the account name or “buyin”.
Warning
Priority Access allocations cannot use the General Access partitions.
Slurm Configuration Settings#
There are many options and settings available when submitting a job beyond the #SBATCH options shown in the example submission script.  Common options, their possible values, and considerations for choosing appropriate values are listed here.
Tip
Slurm offers two methods, one short (such as -A) and one long/verbose (--account), to indicate most settings. For the long option name, an = sign is required after the flag (e.g., --account=p00000).
Account
The -A/--account option is required in submission scripts.
#SBATCH --account=<slurm_account>
To submit jobs to Quest, you must be part of an active General Access or Priority Access allocation. To use a Priority Access allocation to submit jobs, it must include compute resources; storage only allocations cannot be used to submit jobs. To determine the names of the allocations that you are part of on Quest, run the groups command.
To check if an allocation is active and has access to compute resources, run checkproject <account-name>.  See Checking Allocation Resources for details.
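For example, to list your allocations and then check one of them (p12345 is a placeholder account name):
$ groups                 ## lists the allocations (Slurm accounts) you belong to
$ checkproject p12345    ## shows whether the allocation is active and its compute resources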
Quest Partitions
The -p/--partition option is required in submission scripts.  Omitting it will result in the error: sbatch: error: Batch job submission failed: No partition specified or system default partition.
#SBATCH --partition=<partition>
Choose the partition based on the resources and time required for the job.
Time
The -t/--time option is required in submission scripts.  Specify the time in terms of hours (HH), minutes (MM), and seconds (SS).
#SBATCH --time=<HH:MM:SS>
The time is referred to as “wall time,” as in the amount of time that passes on the clock on the wall, not computing cycles.
There are two important considerations when selecting the time: the partition that you choose and how long your job is expected to run. Although the partition controls the maximum time that can be requested, do not simply request the maximum allowable time for the partition unless it is truly needed. Jobs requesting longer times can take longer to start running.
Specifying too short of a time can result in the job failing, as there is no way to extend the time of a running job. If you have multiple similar jobs to run, the best practice is to submit a single, representative job to estimate the time required.
Different partitions have different time limits; requesting a time outside of the allowed range for a partition will result in an error.
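One way to estimate wall time is to check how long a representative test job actually ran; for example, with sacct the Elapsed field shows the run time of a finished job:
$ sacct -X -j <jobid> --format=JobID,JobName,Elapsed,State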
Number of Nodes
The -N/--nodes option specifies how many nodes (computers) are needed.  This is an optional but strongly recommended setting. It should be set to 1 unless your code is specifically designed to use multinode protocols such as MPI.
#SBATCH --nodes=<number_of_nodes>
Warning
The vast majority of software can only run on a single node and cannot run across multiple nodes. If --nodes is not set, but --ntasks (number of cores) is greater than 1, Slurm may match the job with cores on different nodes.  Only applications specifically designed to use multiple nodes can successfully use cores from different nodes.
Number of Cores
The -n/--ntasks option specifies how many cores are needed. Only request more than 1 core if your application can make use of them through parallelization. There are two predominant types of parallelization and depending on which method your application uses, you will either request cores with the -N/--nodes option or request cores without the -N/--nodes option.
Applications using shared memory parallelization (OpenMP, R’s doParallel, Python’s multiprocessing, MATLAB local parpool, etc.) can only utilize CPUs within a single node/computer and CPUs allocated across multiple computers will go unused. In this situation, --nodes=1 must be set along with --ntasks.
#SBATCH --nodes=1
#SBATCH --ntasks=<number_of_cores>
Applications using Message Passing Interface (MPI) can utilize cores (CPUs) allocated across nodes/computers. In this situation, -n/--ntasks should be used without setting the -N/--nodes option.
#SBATCH --ntasks=<number_of_cores>
Warning
--ntasks can allocate cores on different nodes.  Only applications using Message Passing Interface (MPI) can use cores and share memory across multiple nodes.
Before requesting a given number of cores on a single node, please consider how many cores are available on each of the different generations/families of compute nodes that make up Quest. For example, quest12 nodes have 64 cores per node, while quest13 nodes have 128 cores per node.  Requesting a large number of cores per node can limit the number of compute nodes on which the job can be scheduled.
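If your program takes the number of workers as an argument, the SLURM_NTASKS environment variable (set by Slurm when the job starts) can be used inside the submission script so the program never tries to use more cores than were allocated. The script name and its --workers option below are hypothetical:
#SBATCH --nodes=1
#SBATCH --ntasks=8
python my_parallel_script.py --workers "$SLURM_NTASKS"   ## hypothetical script and option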
Memory (RAM)
There are two methods for specifying the amount of memory/RAM:
- --mem specifies how much memory per node
- --mem-per-cpu specifies how much memory per core; use this option if specifying --ntasks
In both cases, specify memory with a number followed by G for gigabytes or M for megabytes.
Example:
#SBATCH --mem-per-cpu=3G
If your job submission script does not specify how much memory your job requires, then the default setting is 3110 MB of memory per core. For a job specifying 10 cores and not specifying the memory requirements, Slurm will allocate 31100 MB (~30.3 GB) in total.
The memory that is allocated to a job via this setting creates a hard upper limit; an application cannot access memory beyond what Slurm reserves. Jobs that try to access more memory than allocated will be terminated. To determine the amount of memory needed, run a test job with higher memory limits, and then set your memory requirements to approximately 110% of the memory used by the test job to account for variation across jobs.
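For example (hypothetical numbers), if seff reports that a test job used 54 GB of memory, requesting roughly 110% of that leaves headroom for variation across similar jobs:
#SBATCH --mem=60G   ## ~110% of the 54 GB used by the test job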
There is a special setting to request the entire memory of the computer.
#SBATCH --mem=0
How much memory this ends up being will depend on what generation of compute node the job is assigned to.
If a job requests more memory than is available on any of the compute nodes available in that partition, then the job will be rejected with the following message:
srun: error: Unable to allocate resources: Requested node configuration is not available
Jobs requesting high amounts of memory may only be able to run on more recent generations of compute node, which may mean it takes longer for the required resources for the job to become available.
Standard Output/Error
There are two output streams from a job: standard output and standard error. These output streams can be written to separate files or the same file.
To send the output to separate files, use:
#SBATCH --output=<name of file>
#SBATCH --error=<name of file>
To send both the standard output  and standard error to a single file, use only -o/--output and omit --error:
#SBATCH --output=<name of file>
The files will be created in the directory from which you submit the job. To direct the output to a different, already created, directory, include the path as part of the filename. A filename must be specified; a directory alone will not work.
If you include neither --output nor --error, Slurm will write both the standard output and standard error from your job in a file called
slurm-<slurm jobid>.out
where <slurm jobid> is the ID given to your job by Slurm. You can replicate this default naming scheme yourself by providing the following option:
#SBATCH --output=slurm-%j.out
In addition to %j, which will add the job id to the name of the output file, there is also %x which will add the job name to the name of the output file.
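For example, combining the two patterns names the log file after both the job name and the job ID, producing a file such as sample_job-549005.out:
#SBATCH --job-name=sample_job
#SBATCH --output=%x-%j.out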
Job Name
The -J/--job-name option assigns a name to the job to help manage and identify it.
#SBATCH --job-name=<jobname>
If not specified, the name of the job submission file will be used as the job name.
Note that filename patterns such as %j are not expanded in the job name itself; the job ID is available separately through monitoring commands and the $SLURM_JOB_ID environment variable, and the job name can be referenced with %x when naming output files.
Email Notifications
To receive emails regarding the status of your Slurm jobs, include both the --mail-type option and the --mail-user option in the job submission script:
#SBATCH --mail-type=<job state that triggers email> ## BEGIN, END, FAIL, or ALL
#SBATCH --mail-user=<email address>
Both options must be specified to receive emails from Slurm. Any combination of BEGIN, END, and FAIL can be used; for example: --mail-type=END,FAIL.
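For example, these two lines together will send email when the job ends or fails (replace the address placeholder with your own):
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=<your email address>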
Compute Node Generation
To specify a specific generation of compute nodes for your job, include a -C/--constraint option in the job submission script:
#SBATCH --constraint=<name of compute node generation>
Compute node generations on Quest have names like quest13; see generations of compute nodes for the current options.
If the job is requesting multiple nodes, and you would like all the compute nodes to be of the same generation and not a combination of generations, then you can specify this with a command such as:
#SBATCH --constraint="[quest10|quest11|quest12|quest13]"
This would allow all of the nodes to be of any of the four specified generations, but require all of them to be of the same generation. This is recommended for jobs that are parallelized using MPI.
--constraint is also used to specify specific types of GPU cards. See the GPU page for more information.
Additional Options
| Option | Slurm (sbatch) | Description | 
|---|---|---|
| Request GPUs | --gres=<name>:<number> | The "name" field will always be gpu, for example --gres=gpu:1. See the GPU page for how to request specific GPU types. | 
| Job array | --array=<indexes> | Submit a job array, a type of submission which will launch multiple jobs to be executed with identical parameters. The indexes specification identifies what array index values should be used. Multiple values may be specified using a comma-separated list and/or a range of values with a "-" separator. For example, --array=0-9 or --array=1,3,5-7. | 
| Copy environment | --export=ALL | Optional: the default is to export ALL environmental settings from the submission environment to the runtime environment. | 
| Copy environment variable | --export=<variable(s)> | Example: --export=VAR1=value1,VAR2=value2 | 
| Job dependency | --dependency=after:<jobid> | After the specified jobs start or are cancelled. Other dependency types such as afterok:<jobid> (after the specified jobs complete successfully) and afterany:<jobid> (after the specified jobs terminate) are also available. | 
| Defer job until the specified time | --begin=<time> | Submit the batch script to the Slurm controller immediately, like normal, but tell the controller to defer the scheduling of the job until the specified time. | 
| Node exclusive job | --exclusive | The job is allocated all CPUs and GRES on all requested nodes, but is only allocated as much memory as it requested. To request all the memory on the allocated nodes as well, use --mem=0. | 
| Request a specific set of compute nodes | --nodelist=<node names> | Instead of specifying how many nodes you want, you can request a specific list of hosts. The job will contain all of these hosts and possibly additional hosts as needed to satisfy resource requirements. | 
Environmental Variables Set by Slurm
Multiple variables are set by Slurm and are accessible in the environment of a job after the job has started running.
| Info | Variable Name | 
|---|---|
| Job name | SLURM_JOB_NAME | 
| Job ID | SLURM_JOB_ID | 
| Submission directory | SLURM_SUBMIT_DIR | 
| Node list | SLURM_JOB_NODELIST | 
| Job array index | SLURM_ARRAY_TASK_ID | 
| Partition name | SLURM_JOB_PARTITION | 
| Number of nodes allocated | SLURM_JOB_NUM_NODES | 
| Number of processes | SLURM_NTASKS | 
| Number of processes per node | SLURM_TASKS_PER_NODE | 
| Requested tasks per node | SLURM_NTASKS_PER_NODE | 
| Requested CPUs per task | SLURM_CPUS_PER_TASK | 
| Scheduling priority | SLURM_PRIO_PROCESS | 
| Job user | SLURM_JOB_USER | 
| Login node from which the job was submitted | SLURM_SUBMIT_HOST | 
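These variables can be used in the body of a submission script, for example to record where and with what resources a job ran:
echo "Job ${SLURM_JOB_ID} (${SLURM_JOB_NAME}) running on ${SLURM_JOB_NODELIST}"
echo "Allocated ${SLURM_NTASKS} task(s) on ${SLURM_JOB_NUM_NODES} node(s)"
cd "${SLURM_SUBMIT_DIR}"   ## start from the directory the job was submitted from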
Slurm Schedulable Resources#
The Quest Storage and Compute Resources page details the raw resources available on Quest. In practice, not all of these resources are available for user jobs. Memory on each compute node is dedicated for use by the Quest file system and Operating System to improve the stability and performance of Quest. The amount of schedulable memory is dependent on the architecture/generation of the node. A table is provided below which summarizes the amount of schedulable resources per generation and the Slurm constraint options associated with them.
| Node Family Name | Number of Schedulable CPUs | Amount of Schedulable Memory/RAM | Constraints | 
|---|---|---|---|
| quest10 | 52 | 166 GB | quest10 | 
| quest10 GPU | 52 | 166 GB | quest10 | 
| quest11 | 64 | 221 GB | quest11 | 
| quest12 | 64 | 221 GB | quest12 | 
| quest12 GPU | 64 | 473 GB | quest12 | 
| quest13 | 128 | 473 GB | quest13 | 
| quest13 GPU | 64 | 976 GB | quest13 | 
Slurm Commands and Job Management#
Once a job has been submitted, additional Slurm commands are available to monitor and manage the job.
Common Slurm Commands
| Action | Slurm Command | 
|---|---|
| Delete a job | scancel <jobid> | 
| Job status (by job) | squeue -j <jobid> | 
| Job status (by user) | squeue -u <netid> | 
| Job status (detailed) | checkjob <jobid> | 
| Show expected start time | squeue --start -j <jobid> | 
| Queue list / info | squeue | 
| Hold a job | scontrol hold <jobid> | 
| Release a job | scontrol release <jobid> | 
| Monitor or review a job's resource usage | seff <jobid> | 
| View job batch script | sacct -j <jobid> -B | 
Further details and options for these commands are in the sections below.
List Current Jobs with squeue
The squeue command can be used to display information about current jobs on Quest.  squeue alone will show all jobs across all users.  Use the -u option to limit the output to your NetID.
| Command | Description | 
|---|---|
| squeue -u <netid> | Show only jobs belonging to the user specified | 
| squeue -A <account> | Show only jobs belonging to the Slurm account specified | 
| squeue -j <jobid> | Display the status of the specified job | 
| squeue -u <netid> -t RUNNING | Show running jobs for the specified user | 
| squeue -u <netid> -t PENDING | Show pending jobs for the specified user | 
| squeue --help | See documentation and additional options | 
List Current and Past Jobs with sacct
The sacct command can be used to display information about your past and current jobs on Quest.  By default, sacct will only display information about jobs from today. Unlike the squeue command, there is no need to supply -u <netid> to sacct, as it will do this by default. The default output includes just a few fields: job ID, job name, partition, account, allocated CPUs (AllocCPUS), state, and exit code.
Example:
$ sacct -X 
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1453894            bash      short      p1234          1  COMPLETED      0:0
1454434      sample_job      short      p1234         52     FAILED      6:0
We strongly recommend including the -X flag when using sacct. This flag will suppress the output to only show statistics relevant to the job itself and will exclude the individual job steps which are not relevant for the vast majority of situations.
The lone exception to this is when you would like to display information related to the GPU utilization of the job. This information is contained in the "batch" step of the job, and can be displayed for a given job via the following command:
$ sacct -j <slurm-jobid>.batch --format=jobid,tresusagein%120
For example,
$ sacct -j 3547592.batch --format=jobid,tresusagein%120
JobID                                                                                                                  TRESUsageInAve 
------------ ------------------------------------------------------------------------------------------------------------------------ 
3547592.bat+          cpu=00:04:56,energy=0,fs/disk=13439086206,gres/gpumem=6218M,gres/gpuutil=95,mem=5825476K,pages=0,vmem=21553268K 
The state codes are available in the Slurm documentation.  The exit codes do not have standard meanings across all jobs, but they can be used in troubleshooting.  A normal exit code with no errors is 0:0.
To include additional information in the output of sacct, add the --format option with a comma-separated list of values for the additional fields. The values are not case-sensitive.
$ sacct --format=var_1,var_2, ... ,var_N
Example:
$ sacct -X --format=jobid,priority
See the Slurm documentation for the full list of additional fields that can be included.
To retrieve the submission script used by a job, use:
$ sacct -j <job_num> -B
To display jobs from a shorter or longer time period than the default (today), use the --starttime and/or --endtime options:
$ sacct -X --starttime=03/14/25 --format=jobname,nnodes,ncpus,elapsed
Job records are kept for about a year.
Job Resource Use with seff
To see the resources used by your job, including the maximum amount of memory, run the command:
$ seff <job_id>
This returns output similar to:
Job ID: 767731
    Cluster: quest
    User/Group: abc123/abc123
    State: COMPLETED (exit code 0)
    Cores: 1
    CPU Utilized: 00:10:00
    CPU Efficiency: 100.00% of 00:10:00 core-walltime
    Job Wall-clock time: 00:10:00
    Memory Utilized: 60.00 GB
    Memory Efficiency: 50.00% of 120.00 GB
Check the job state reported in the 4th line. If it is “COMPLETED (exit code 0)”, look at the last two lines. “Memory Utilized” is the amount of memory your job used, in this case 60 GB.
If the job State is FAILED or CANCELLED, the Memory Efficiency percentage reported by seff will be extremely inaccurate; seff only gives reliable results for jobs that have COMPLETED successfully.
Detailed Job Information with checkjob
The checkjob command displays detailed information about a submitted job’s status and diagnostic information that can be useful for troubleshooting submission issues. It can also be used to obtain useful information about completed jobs such as the allocated nodes, resources used, and exit codes.
$ checkjob <jobid>
where you can get the <jobid> using the squeue command.
Example for a Successfully Running Job
$ checkjob 548867
--------------------------------------------------------------------------------------------------------------------
JOB INFORMATION
--------------------------------------------------------------------------------------------------------------------
JobId=548867 JobName=high-throughput-cpu_000094
    UserId=abc123(123123) GroupId=abc123(123) MCS_label=N/A
    Priority=1315 Nice=0 Account=p12345 QOS=normal
    JobState=RUNNING Reason=None Dependency=(null)
    Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
    RunTime=00:13:13 TimeLimit=00:40:00 TimeMin=N/A
    SubmitTime=2019-01-22T12:51:42 EligibleTime=2019-01-22T12:51:43
    AccrueTime=2019-01-22T12:51:43
    StartTime=2019-01-22T15:52:20 EndTime=2019-01-22T16:32:20 Deadline=N/A
    PreemptTime=None SuspendTime=None SecsPreSuspend=0
    LastSchedEval=2019-01-22T15:52:20
    Partition=short AllocNode:Sid=quser21:15454
    ReqNodeList=(null) ExcNodeList=(null)
    NodeList=qnode[5056-5060]
    BatchHost=qnode5056
    NumNodes=5 NumCPUs=120 NumTasks=120 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
    TRES=cpu=120,mem=360G,node=5,billing=780
    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
    MinCPUsNode=1 MinMemoryCPU=3G MinTmpDiskNode=0
    Features=(null) DelayBoot=00:00:00
    OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
    Command=(null)
    WorkDir=/projects/p12345/high-throughput
    StdErr=/projects/p12345/high-throughput/lammps.error
    StdIn=/dev/null
    StdOut=/projects/p12345/high-throughput/lammps.output
    Power=
--------------------------------------------------------------------------------------------------------------------
JOB SCRIPT
--------------------------------------------------------------------------------------------------------------------
#!/bin/bash
#SBATCH --account=p12345
#SBATCH --partition=normal
#SBATCH --job-name=high-throughput-cpu
#SBATCH --ntasks=120
#SBATCH --mem-per-cpu=3G
#SBATCH --time=00:40:00
#SBATCH --error=lammps.error
#SBATCH --output=lammps.output
module purge
module load lammps/lammps-22Aug18
mpirun -n 120 lmp -in in.fcc
Note in the output above that:
- The JobState is listed as RUNNING. 
- The time passed since job start (RunTime) and the total walltime requested (TimeLimit) are listed. 
- The node name(s) are listed after NodeList. 
- The paths to job’s working directory (WorkDir), standard error (StdErr) and output (StdOut) files are given. 
- If a batch job script is used for submission, the script is presented at the end. 
Cancelling Jobs with scancel
You can cancel one or all of your jobs with scancel. Proceed with caution, as this cannot be undone, and you will not be prompted for confirmation after issuing the command.
| Command | Description | 
|---|---|
| scancel <jobid> | Cancel the job with the given job ID | 
| scancel -u <netid> | Cancel all the jobs of the user | 
Holding, Releasing, or Modifying Jobs with scontrol
Users can place their jobs in a “JobHeldUser” state while submitting the job or after the job has been queued. Running jobs cannot be placed on hold. Placing a job on hold means that the system will set its priority to 0 and not attempt to schedule it until the hold is removed.
| Command | Description | 
|---|---|
| #SBATCH -H | Place hold within the job script | 
| sbatch -H <job_script> | Place hold while submitting from the command line | 
| scontrol hold <jobid> | Place hold on a queued job from the command line | 
The job status will be shown in the output of monitoring commands such as squeue or checkjob.
To release a job from user hold state:
$ scontrol release <jobid>
The job control command (scontrol) can also be used for changing the parameters of a submitted job before it starts running. The following parameters can be modified safely:
- Job dependency (change to “none”) 
- Partition 
- Job name 
- Wall clock limit 
- Slurm Account 
Examples of using scontrol to change a job’s parameters:
| Command | Description | 
|---|---|
| scontrol update jobid=<jobid> dependency=afterok:1000 | Change the job to depend on the successful completion of job 1000 | 
| scontrol update jobid=<jobid> partition=short | Change the partition to short | 
| scontrol update jobid=<jobid> jobname=myjob | Change the job name to myjob | 
| scontrol update jobid=<jobid> timelimit=02:00:00 | Set the job time limit to 2 hours | 
| scontrol update jobid=<jobid> account=p12345 | Change the account to p12345 | 
For a complete listing of scontrol options, see the official scontrol documentation .
Probing Priority with sprio
Slurm implements a multi-factor priority scheme for ordering the queue of jobs waiting to be run. The sprio command is used to see the contribution of different factors to a pending job's scheduling priority.
The sprio command can be helpful to get an idea of where your jobs might be in the overall queue of pending jobs. However, we caution against using it as a tool to estimate specific wait times. High-performance computing systems are complex and have many different components that impact when jobs will run, which leads to variation in wait time. Similar to reservations at a restaurant, sprio can tell you that you're one of the next 10 groups to get a table, but it doesn't necessarily mean that you'll be seated in the next 5 minutes.
| Command | Description | 
|---|---|
| sprio -u <netid> | Show scheduling priority for all pending jobs for the user | 
| sprio -j <jobid> | Show scheduling priority of the specified job | 
For running jobs, you can see the starting priority using the checkjob <jobid> command.
Special Types of Job Submissions#
In this section, we provide details and examples of how to use Slurm to run:
- Interactive jobs where the user has access to the allocated compute resources to run commands interactively 
- Job arrays that submit multiple jobs using the same job submission script 
- Jobs that depend on other jobs completing first 
Interactive Jobs
This section explains how to start interactive jobs from the command line on the Quest login nodes.
Tip
Quest OnDemand provides a great alternative for launching interactive jobs through your web browser.
Jobs without GUIs
To launch an interactive job in order to run an application without a GUI, use either the srun or salloc command instead of sbatch.
srun
If you use srun to run an interactive job, Slurm will automatically launch a terminal session on the compute node after it schedules the job and the job starts.
Warning
When using srun, if you lose your connection to Quest, the interactive job will terminate.
Instead of writing a job submission script as you would for a batch job, for an interactive job you can specify the key options directly as part of the srun command.  The same options used with sbatch can be used with srun.
Example:
[quser41 ~]$ srun --nodes=1 --ntasks=1 --account=<account> --mem=<memory> \
> --partition=<partition> --time=<hh:mm:ss> --pty bash -l
srun: job 3201233 queued and waiting for resources
srun: job 3201233 has been allocated resources
----------------------------------------
srun job start: Mon Mar 14 13:25:41 CDT 2022
Job ID: 3201233
Username: abc123
Queue: short
Account: pXXXXX
----------------------------------------
The following variables are not
guaranteed to be the same in
prologue and the job run script
----------------------------------------
PATH (in prologue) : /usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/lpp/mmfs/bin:/opt/ibutils/bin
WORKDIR is: /home/<netid>
----------------------------------------
[qnode0114 ~]$
In the example above, the initial command has been split across two lines due to space constraints in the documentation.  The command can be run on a single line.  quser41 is a login node, while qnode0114 is a compute node.  --pty bash indicates to start a bash shell on the compute node. Adding the option -l to the bash command will start a login bash shell on the compute node. This means that the environment for the bash shell will be the same as the one you have on the login nodes.
You can stop your interactive session by entering the command exit.
[qnode0114 ~]$ exit
[quser41 ~]$
salloc
Unlike srun, salloc does not automatically launch a terminal session on the compute node. Instead, after it schedules your job, it will tell you the name of the compute node the job has been scheduled on. You can then run ssh qnodeXXXX to connect directly to the compute node. If you lose connection to an interactive session started with salloc, the interactive job will not terminate.
Example:
[quser41 ~]$ salloc --nodes=1 --ntasks=1 --account=<account> --mem=<memory> --partition=<partition> --time=<hh:mm:ss>
salloc: Pending job allocation 276305
salloc: job 276305 queued and waiting for resources
salloc: job 276305 has been allocated resources
salloc: Granted job allocation 276305
salloc: Waiting for resource configuration
salloc: Nodes qnode0114 are ready for job
[quser41 ~]$ ssh qnode0114
Warning: Permanently added 'qnode0114,172.20.134.29' (ECDSA) to the list of known hosts.
[qnode0114 ~]$
In the example above, quser41 is a login node, and qnode0114 is a compute node.
You can stop your interactive session by entering the command scancel <slurm-jobid>.
[qnode0114 ~]$ scancel 276305
Jobs with GUIs
To launch an interactive job in order to run an application with a GUI, first connect to Quest using an application with X11 forwarding support. We recommend using FastX. Once you have connected to Quest with X11 forwarding enabled, you can use either the srun or salloc command with the --x11 option added.
Examples:
$ srun --x11 --nodes=1 --ntasks=1 --account=<account> --mem=<memory> --partition=<partition> --time=<hh:mm:ss> --pty bash -l
$ salloc --x11 --nodes=1 --ntasks=1 --account=<account> --mem=<memory> --partition=<partition> --time=<hh:mm:ss>
Job Arrays
Job arrays can be used to submit multiple jobs at once that use the same job submission script. This can be useful if you want to run the same script multiple times with different input parameters.
A job array is created with the addition of the --array option to a job submission script, and using the $SLURM_ARRAY_TASK_ID environment variable to keep track of which job in the array is running.  It is useful to update the job name and output files to incorporate the array ID in the filenames so that a separate log file is created for each job.
Example submission file: jobsubmission.sh
#!/bin/bash
#SBATCH --account=<account>  ## Required: your Slurm account name, i.e. eXXXX, pXXXX or bXXXX
#SBATCH --partition=<partition> ## Required: buyin, short, normal, long, gengpu, genhimem, etc.
#SBATCH --time=<HH:MM:SS>       ## Required: How long will the job need to run?  Limits vary by partition
#SBATCH --nodes=<#>             ## How many computers/nodes do you need? Usually 1
#SBATCH --ntasks-per-node=<#>   ## How many CPUs or processors do you need per computer/node? (default value 1)
#SBATCH --mem=<#G>              ## How much RAM do you need per computer/node? G = gigabytes
#SBATCH --array=0-9             ## number of jobs to run: here, 10 jobs, labelled 0 through 9 
#SBATCH --job-name="sample_job_\${SLURM_ARRAY_TASK_ID}"   ## use the array id in the name of the job
#SBATCH --output=sample_job.%A_%a.out                     ## use the jobid (%A) and the array index (%a) to name the log files
module purge all
module load python-anaconda3
source activate /projects/intro/envs/slurm-py37-test
# Read in the different input arguments from a file input_args.txt
IFS=$'\n' read -d '' -r -a input_args < input_args.txt
python slurm_test.py --filename ${input_args[$SLURM_ARRAY_TASK_ID]}
This script will create 10 jobs, labelled with the job array indices 0 through 9.  Each job runs the same Python script, slurm_test.py using different input arguments from the file input_args.txt.
input_args.txt contains:
filename1.txt
filename2.txt
filename3.txt
filename4.txt
filename5.txt
filename6.txt
filename7.txt
filename8.txt
filename9.txt
filename10.txt
Tip
Make sure that the number of lines in the input file matches the number of jobs specified by --array.
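A quick way to check, assuming each input is on its own line, is to count the lines in the file and compare the result with the size of the --array range (here, 10 jobs):
$ wc -l < input_args.txt
10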
slurm_test.py contains the following code to read a --filename argument from the command line:
import argparse
import time
def parse_commandline():
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--filename",
                        help="Name of file",
                        default=None)
    args = parser.parse_args()
    return args
if __name__ == '__main__':
    args = parse_commandline()
    print(args.filename)
slurm_test.py will receive a value from input_args.txt as an argument.  The first job in the array will receive "filename1.txt" as the input, the second job will receive "filename2.txt" as the input, etc.
Submit the script as normal with sbatch:
$ sbatch jobsubmission.sh
The job array will then be submitted to the scheduler with each array element requesting the same resources (such as number of cores, time, memory etc.) per job.
Dependent Jobs
Dependent jobs are a series of jobs which run or wait to run conditional on the state of another job. For instance, you may submit two jobs and you want the first job to complete successfully before the second job runs.  This is helpful if one job needs to produce a data file or other output file that is an input to another job.  In order to submit this type of workflow, you pass sbatch the jobid of the job that needs to finish before this job starts via the command line argument:
--dependency=afterok:<jobid>
You can manually submit a series of jobs, but it is helpful to write all of your sbatch submission commands in a bash script and pass the job IDs programmatically.  In order to be able to capture and pass the job ID of one job to the next, save the output of a call to sbatch in a variable.
Here is an example submitting 3 jobs in sequence, where each job depends on the previous job completing before it runs.  This example uses the same job submission script, example_submit.sh for each job, but this is not required.  You can use different submission scripts with different resource requests for each job.
Example: wrapper_script.sh
#!/bin/bash
# submit the first job: no special options because it's the first job
jid0=$(sbatch --parsable example_submit.sh)
# submit the second job: dependent on the first job with ID stored in jid0
jid1=$(sbatch --parsable --dependency=afterok:${jid0} --export=DEPENDENTJOB=${jid0} example_submit.sh)
# submit the third job: dependent on the second job with ID stored in jid1
jid2=$(sbatch --parsable --dependency=afterok:${jid1} --export=DEPENDENTJOB=${jid1} example_submit.sh)
The variables jid0, jid1, and jid2 will contain the job ID that Slurm assigns each job.
Tip
Anything you can tell Slurm via #SBATCH in the submission script itself you can also pass to sbatch via the command line.
example_submit.sh
#!/bin/bash
#SBATCH --account=<account>  ## Required: your Slurm account name, i.e. eXXXX, pXXXX or bXXXX
#SBATCH --partition=<partition> ## Required: buyin, short, normal, long, gengpu, genhimem, etc.
#SBATCH --time=<HH:MM:SS>       ## Required: How long will the job need to run?  Limits vary by partition
#SBATCH --nodes=<#>             ## How many computers/nodes do you need? Usually 1
#SBATCH --ntasks-per-node=<#>   ## How many CPUs or processors do you need per computer/node? (default value 1)
#SBATCH --mem=<#G>              ## How much RAM do you need per computer/node? G = gigabytes
#SBATCH --output=job_%A.out     ## include the job ID in the output file name
# very simple job to just print information to the output file
# print the date and time to the output file
date
# print out the ID of the job this one was dependent on
if [[ -z "${DEPENDENTJOB}" ]]; then
    echo "First job in workflow"
else
    echo "Job started after " $DEPENDENTJOB
fi
Run the wrapper script with
$ bash wrapper_script.sh
Alternatively, make wrapper_script.sh executable with chmod +x wrapper_script.sh and invoke it with ./wrapper_script.sh.
This will submit the three jobs in sequence.  Using squeue -u <netid> after running the above command, you should see jobs 2 and 3 pending for reason DEPENDENCY.
Troubleshooting#
Debugging a Job Submission Script Rejected By The Scheduler#
If your job submission script generates an error when you submit it with the sbatch command, the problem in your script is in one or more of the lines that begin with #SBATCH.
Errors can be difficult to identify, and often require careful reading of your #SBATCH lines.  To debug job scripts that generate error messages:
- Look up the error message in the section below to identify the most likely reason your script received that error message. 
- Once you have identified the issue with your script, edit the script to correct it and resubmit your job. 
- If you receive the same error message again, examine the error message and the mistake in your script more closely. Sometimes the same error message can be generated by two different issues in the same script, meaning it is possible that you may resolve the first issue but need to correct a second issue to clear that particular error message. 
- When you resubmit your job you may receive a new error message. This means the issue that generated the first error message has been resolved, and now you need to fix another issue. 
When Slurm encounters a problem in your job submission script, it does not read the rest of your script that comes after the error. Slurm returns up to two distinct error messages at a time. If your submission script has more than two problems, you will need to resubmit your job multiple times to identify and fix all of them.
Common Error Messages#
The errors listed below may also be generated by interactive job submissions using srun or salloc. In those cases, the error messages will begin with “srun error” or “salloc error.”
sbatch: error: --account option required
sbatch: error: Unable to allocate resources: Invalid account or account/partition combination specified
Location of error: 
#SBATCH --account=<account> 
or 
#SBATCH -A <account>
Example of correct syntax: 
#SBATCH --account=p12345 
or 
#SBATCH -A p12345
Possible issue: The script doesn’t have an #SBATCH line specifying account
Fix: Confirm that #SBATCH --account=<account> is in the script
Possible issue: A typo in the --account= or -A part of this #SBATCH line
Fix: Examine this line closely to make sure the syntax is correct
Possible issue: You are not a member of the Slurm account specified in your job submission script
Fix: Confirm you are a member of the allocation by typing groups at
the command line on Quest. If the allocation you have specified in your
job submission script is not listed, you are not a member of this
allocation. Use an allocation that you are a member of in your job
submission script.
Possible issue: The error is on a line earlier in your job submission script which causes Slurm to stop reading your script before it reaches the #SBATCH --account=<account> line
Fix: Move the #SBATCH --account=<account> line to be immediately
after the line #!/bin/bash and submit your job again. If this
generates a new error referencing a different line of your script, the
account line is correct and the mistake is elsewhere in your submission
script. To resolve the new error, follow the debugging suggestions for
the new error message.
sbatch: error: Your allocation has expired
sbatch: error: Unable to allocate resources: Invalid qos specification
Location of error: 
#SBATCH --account=<account> 
or 
#SBATCH -A <account>
The allocation specified in your job submission script is no longer active.
Possible issue: Your allocation has expired
Fix: If you are a member of more than one allocation, you may wish to submit your job to an alternate allocation. To see a list of your allocations, type groups at the command line on Quest.  Otherwise, renew your allocation or request a new one.
srun: error: --partition option required
srun: error: Unable to allocate resources: Access/permission denied
Location of error: 
#SBATCH --partition=<partition> 
or 
#SBATCH -p <partition>
Example of correct syntax for General Access allocations (“p” Slurm account name): 
#SBATCH --partition=short  
or 
#SBATCH -p short
Example of correct syntax for Priority Access allocations (“b” Slurm account name): 
#SBATCH --partition=buyin 
or 
#SBATCH -p buyin
Possible issue: The script doesn’t have an #SBATCH line specifying partition
Fix: Confirm that #SBATCH --partition=<partition> or #SBATCH -p <partition> is in the script.
Possible issue: A typo in the --partition= or -p part of this #SBATCH line
Fix: Examine this line closely to make sure the syntax is correct
Possible issue: The error is on a line earlier in the job submission script which causes Slurm to stop reading the script before it reaches the #SBATCH --partition=<partition> line
Fix: Move the #SBATCH --partition=<partition> line to be immediately after the line #!/bin/bash and submit your job again. If this generates a new error referencing a different line of your script, the partition line is correct and the mistake is elsewhere in your submission script. To resolve the new error, follow the debugging suggestions for the new error message.
sbatch: error: Unable to allocate resources: Invalid qos specification
Location of error: 
#SBATCH --partition=<partition> 
or 
#SBATCH -p <partition>
Meaning: The partition name specified is not associated with the account in the line #SBATCH --account=<account>.
Possible issue: The script specifies a Priority Access allocation for the account (account name starts with a “b”), but you’ve entered “short”, “normal” or “long” as the partition.
Fix: Priority Access allocations with purchased compute resources should use the “buyin” partition or partitions specific to their account, not “short”, “normal”, or “long”. Change the partition in the script.
Possible issue: Your script specifies a Slurm account and partition combination which do not belong together.
Fix: Specify the correct partition for your account. To see the accounts and partitions you have access to, use this version of the sinfo command:
$ sinfo -o "%g %.10R %.20l"
GROUPS      PARTITION         TIMELIMIT
b1234       buyin             168:00:00
Note that “GROUPS” are Slurm accounts on Quest.
In this example, valid lines in your job submission script that relate to account, partition and time would be:
#SBATCH --account=b1234
#SBATCH --partition=buyin
#SBATCH --time=168:00:00  ## maximum value; shorter times are OK
sbatch: error: invalid partition specified: <partition_name>
sbatch: error: Unable to allocate resources: Invalid partition name specified
Location of error: 
#SBATCH --partition=<partition> 
or 
#SBATCH -p <partition>
Example of correct syntax for General Access allocations (“p” Slurm account name): 
#SBATCH --partition=short  
or 
#SBATCH -p short
Example of correct syntax for Priority Access allocations (“b” Slurm account name): 
#SBATCH --partition=buyin 
or 
#SBATCH -p buyin
Possible issue: A typo in the --partition= or -p part of this #SBATCH line
Fix: Examine this line closely to make sure the syntax is correct
Possible issue: The script specifies a General Access allocation (“p” Slurm account) with a partition that isn’t “short”, “normal” or “long”
Fix: Change the partition to be “short”, “normal” or “long”
sbatch: error: Unable to allocate resources: Invalid account or account/partition combination specified
sbatch: error: Unable to allocate resources: User’s group not permitted to use this partition
This message can refer to mistakes on the #SBATCH lines specifying account or partition.
Possible location of error specifying account: 
#SBATCH --account=<account> 
or 
#SBATCH -A <account>
Possible location of error specifying partition: 
#SBATCH --partition=<partition> 
or 
#SBATCH -p <partition>
Possible issue: The syntax in the #SBATCH line specifying the account is incorrect
Fix: Examine the account line closely to confirm the syntax is exactly correct. Example of correct account syntax: 
#SBATCH --account=p12345 
or 
#SBATCH -A p12345
Possible issue: You are trying to run in a partition that belongs to one allocation (Slurm account), while specifying a different allocation (Slurm account).
Fix: Specify the correct partition for your account. To see the accounts and partitions you have access to, use this version of the sinfo command:
$ sinfo -o "%g %.10R %.20l"
GROUPS      PARTITION         TIMELIMIT
b1234       buyin             168:00:00
Note that “GROUPS” are Slurm accounts on Quest.
In this example, valid lines in your job submission script that relate to account, partition and time would be:
#SBATCH --account=b1234
#SBATCH --partition=buyin
#SBATCH --time=168:00:00  ## maximum value; shorter times are OK
Possible issue: The error is on a line earlier in your job submission script which causes Slurm to stop reading your script before it reaches the #SBATCH --account=<account> line
Fix: Move the #SBATCH --account=<account> line to be immediately after the line #!/bin/bash and submit your job again. If this generates a new error referencing a different line of your script, the account line is correct and the mistake is elsewhere in your submission script. To resolve the new error, follow the debugging suggestions for the new error message.
sbatch: error: --time limit option required
sbatch: error: Unable to allocate resources: Requested time limit is invalid (missing or exceeds some limit)
Location of error: 
#SBATCH --time=<hours:minutes:seconds>  
or 
#SBATCH -t <hours:minutes:seconds>
Example of correct syntax: 
#SBATCH --time=10:00:00  
or 
#SBATCH -t 10:00:00
Possible issue: The script doesn’t have an #SBATCH line specifying time
Fix: Confirm that #SBATCH --time=<hh:mm:ss> is in the script
Possible issue: A typo in the --time= or -t part of this #SBATCH line
Fix: Examine this line closely to make sure the syntax is correct
Possible issue: The time request is too long for the partition
Fix: Review the time limits of your partition and adjust the amount of time requested by your script. Priority Access Slurm accounts that begin with a “b” have their own wall time limits. For information on the wall time of your partition, use the sinfo command:
$ sinfo -o "%g %.10R %.20l"
GROUPS      PARTITION         TIMELIMIT
b1234       buyin             168:00:00
To fix this error, set your wall time to be less than the time limit of your partition and resubmit your job.
Possible issue: The error is on a line earlier in your job submission script which causes Slurm to stop reading your script before it reaches the #SBATCH --time=<HH:MM:SS> line
Fix: Move the #SBATCH --time=<HH:MM:SS> line to be immediately after the line #!/bin/bash and submit your job again. If this generates a new error referencing a different line of your script, the time line is correct and the mistake is elsewhere in your submission script. To resolve the new error, follow the debugging suggestions for the new error message.
sbatch: unrecognized option <option>
Example:
Line in script: #SBATCH --n-tasks-per-node=1
Error generated: sbatch: unrecognized option '--n-tasks-per-node=1'
With an “unrecognized option” error, Slurm correctly read the first part of the #SBATCH line but the option that follows it has generated the error. In this example, the option has a dash between “n” and “tasks” that should not be there. The correct option does not have a dash in that location. This line should be corrected to:
#SBATCH --ntasks-per-node=1
To fix this error, locate the option specified in the error message and examine it carefully for errors. Reference Slurm Configuration Settings for the correct option names.
sbatch: error: CPU count per node can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available
Location of error: 
#SBATCH --ntasks-per-node=<CPU count> 
Example of mistake: 
#SBATCH --ntasks-per-node=10000
This error is generated if your job requests more CPUs/cores than are available on the nodes in the partition your job submission script specified. CPU count is the number of cores requested by your job submission script. Cores are also called processors or CPUs.
To fix this error, use the sinfo command to get the maximum number of cores available in the partitions you have access to:
$ sinfo -o "%g %.10R %.20l %.10c"
GROUPS      PARTITION       TIMELIMIT       CPUS
b1234       buyin           2-00:00:00      20+
Then ensure that --ntasks-per-node does not exceed the limit (ignore the “+” in the sinfo output).
#SBATCH --ntasks-per-node=20
For details on the number of cores per node for different Quest compute node generations, see Quest Storage and Compute Resources.
sbatch: error: Batch script contains DOS line breaks (\r\n)
sbatch: error: instead of expected UNIX line breaks (\n)
This error is caused by hidden characters in your job submission script.
The job submission script was likely created on a Windows computer and copied to Quest without converting the file to use UNIX encoded line endings.
Fix: From the command line on Quest run the command dos2unix <submission_script> to correct your job submission script and resubmit your job to the scheduler.
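For example, assuming the submission script is named jobscript.sh:
$ dos2unix jobscript.sh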
Debugging a Job Accepted by the Scheduler#
Once your job has been accepted, the Slurm scheduler will return a job ID number. After waiting in the queue, your job will run. To see the status of your job, use the command sacct -X.
Not all job submission errors generate error messages; if the output from your job is unexpected or incorrect, there may be an issue with the submission script. If your script’s required elements (account, partition, nodes, cores, and wall time) have been read successfully before Slurm encounters the error, your job will still be accepted by the scheduler and run, just not the way you expect it to. Scripts with issues that don’t generate errors still need to be debugged since the scheduler has ignored some of your #SBATCH lines.
For jobs with mistakes that do not give error messages, you will need to investigate if you notice something is wrong with how the job runs.
Common problems include:
Job runs very slowly or dies after starting
Possible issue: Job script omits or misspecifies the #SBATCH --mem=<amount> or other memory directive.
Fix: All job submission scripts should specify the amount of memory your job needs to run. If your job runs very slowly or dies, investigate if it requests enough memory with the Slurm utility seff.
Job name is the name of the job submission script instead of the job name specified in the submission script
To see the name of your job, run sacct -X. If the JobName listed in the output is the first eight characters of the name of your submission script, Slurm has not read the #SBATCH line for the job name.
Possible issue: Job script omits or misspecifies the #SBATCH --job-name=<jobname> directive.
Possible issue: A typo in the --job-name= or -J part of this #SBATCH line
Fix: Examine this line closely to make sure the syntax is correct
Possible issue: The error is on a line earlier in your job submission script which causes Slurm to stop reading your script before it reaches the #SBATCH --job-name=<jobname> line
Fix: Move the #SBATCH --job-name=<jobname> line to be immediately after the line #!/bin/bash and submit your job again. If this generates a new error referencing a different line of your script, the job name line is correct and the mistake is elsewhere in your submission script. To resolve the new error, follow the debugging suggestions for the new error message.
Modules or environment variables are inherited from the login session by a running job
Possible issue: The job submission script is not purging modules before starting the job on the compute node
Fix: After the #SBATCH directives in your job submission script, add the line
module purge all
This will clear any modules inherited from your login session and begin your job in a clean environment. You will need to load any necessary modules into your job submission script after this line.
Job immediately fails and generates no output or error file
This happens when the job can't write to the output and/or error files, so it dies immediately.
Possible issue: The job script specifies a directory that does not exist
Fix: Check the output and error files specified in the job submission script.
#SBATCH --output=/path/to/file/file_name
or
#SBATCH --error=/path/to/file/file_name
Tip
Remember that file paths are relative to your current working directory when you run sbatch to submit the job.  Use absolute file paths starting with a / when in doubt.
Possible issue: A typo in the --output= or --error= part of the #SBATCH line
Fix: Examine these lines closely to make sure the syntax is correct
Possible issue: Providing a directory, but not a file name, for output and/or error files
Fix: Add a file name at the end of the specified path. For a file name in the format <job_name>.o<job_id>, use
#SBATCH --output=/path/to/file/"%x.o%j"
Note that if a separate error file is not specified, errors and output will both be written to the output file. To generate a separate error file, include the line:
#SBATCH --error=/path/to/file/"%x.e%j"
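If the directory in the path does not exist yet, create it before submitting the job; the path below is only an example:
$ mkdir -p /projects/<account>/logs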
Troubleshooting Failed Jobs#
There are two common reasons for job failure outside of errors in the code being executed:
Job Exceeded Request Time or Memory
Besides errors in your script or hardware failure, your job may be aborted by the system if it is still running when the wall time limit you requested (or the upper wall time limit for the partition) is reached. You will see a TIMEOUT state for these jobs when running sacct -X.
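To check whether a recent job hit its time limit, you can ask sacct for the elapsed time alongside the limit; JobID, JobName, State, Elapsed, and Timelimit are standard sacct field names:
$ sacct -X --format=JobID,JobName,State,Elapsed,Timelimit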
If you use more cores than you requested, the system will stop the job. This can happen with programs that are multi-threaded. Similarly, if the job exceeds the requested memory, it will be terminated. For this reason, it is important to profile your code's memory requirements.
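For multi-threaded programs, one way to keep the thread count in line with your request is to reserve cores with --cpus-per-task and pass that value to the program. This sketch assumes an application that honors the OMP_NUM_THREADS environment variable:
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4    ## reserve 4 cores for the threads
# later, in the body of the script:
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK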
Note
If you do not set the number of nodes/cores, memory or time in your job submission script, the default values will be assigned by the scheduler.
Out of Disk Space
Your job could fail if you exceed your storage quota (limit) in your home or projects directory.
Check how much space you are using in your home directory with
$ homedu
or
$ du -h --max-depth=0 ~
Check how much space is used in your projects directory with
$ checkproject <account-name>
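If you are over quota and unsure where the space has gone, listing your largest subdirectories can help, for example in your home directory:
$ du -h --max-depth=1 ~ | sort -rh | head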
Troubleshooting Memory Requests#
How can I tell if my job needs more memory to run successfully?
Use the sacct -X command to see information about your recent jobs, for example:
$ sacct -X
JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
1273539      lammps-te+      short     p1234          40  COMPLETED      0:0 
1273543      vasp-open+      short     p1234          40 OUT_OF_ME+    0:125
The “State” field is the status of your job when it finished. Jobs with a “COMPLETED” state have run without system errors. Jobs with an “OUT_OF_ME+” state have run out of memory and failed. “OUT_OF_ME+” jobs need to request more memory in their job submission scripts to complete successfully.
If the job you’re investigating is not recent enough to be listed by sacct -X, add date fields to the command to see jobs between specific start and end dates. For example, to see all jobs between September 15, 2019 and September 16, 2019:
$ sacct -X --starttime=091519 --endtime=091619
Specify the dates in MMDDYY format. More information on sacct is available in the Slurm documentation.
My job ran out of memory and failed, now what?
A common challenge researchers face is that the amount of memory required to run a job may not be known in advance. Running out of memory will cause the job to fail with an out of memory (OOM) error.
A good strategy is to run a test job to determine how much memory your job needs, and then request that amount of memory plus roughly 10% when submitting your full job. In the example below, you'll see that we start by requesting all of the memory on a node. This may be much more memory than we need, but it gives us a starting point to ensure that the job completes. After the job completes, we'll use the seff utility to see how much memory was actually used by the job and adjust accordingly.
To do this:
- Create a test job by editing your job’s submission script to reserve all of the memory of the node it runs on 
- Run your test job 
- Confirm your test job has completed successfully 
- Use seff to see how much memory your job actually used
- Submit your full job with new memory limits 
1. Create a test job
To profile your job’s memory usage, create a test job by modifying your job’s submission script to include the lines:
#SBATCH --mem=0
#SBATCH --nodes=1
Setting --mem=0 reserves all of the memory on the node for your job; if you already have a --mem= directive in your job submission script, comment it out. Now your job will not run out of memory unless your job needs more memory than is available on the node.
Setting --nodes=1 reserves a single node for your job. For jobs that run on multiple nodes, such as MPI-based programs, request the number of nodes your job uses. Be sure to specify a value for #SBATCH --nodes=; otherwise, the cores your job submission script reserves could be spread across as many nodes as there are cores. Be aware that by setting --mem=0, you will be reserving all the memory on every node your cores land on.
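Putting these directives together, a minimal test-job script might look like the sketch below; the account, time limit, and script name are placeholders to fill in:
#!/bin/bash
#SBATCH --account=<account>
#SBATCH --partition=short
#SBATCH --time=<HH:MM:SS>
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=0                 ## reserve all memory on the node for the test run
#SBATCH --job-name=memory-test
module purge all
module load mamba/24.3.0        ## load whatever modules your job needs
python <your_script>.py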
2. Run your test job
Submit your test job to the cluster with the sbatch command. For interactive jobs, use srun or salloc.
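For example, if the test script is saved as test_jobscript.sh (a name chosen here only for illustration):
$ sbatch test_jobscript.sh
Submitted batch job <job_id>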
3. Did your test job complete successfully?
When your job has stopped running, use the sacct -X command to confirm your job finished with state “COMPLETED”. If your test job finishes with an “OUT_OF_ME+” state, confirm that you are submitting the modified job submission script that requests all of the memory on the node. If the “OUT_OF_ME+” errors persist, your job may require more memory than is available on the compute node it ran on. In this case, please email quest-help@northwestern.edu for assistance.
4. How much memory did your job actually use?
To see how much memory it used, run the command seff <test_job_id_number>. This returns output similar to:
Job ID: 767731
    Cluster: quest
    User/Group: abc123/abc123
    State: COMPLETED (exit code 0)
    Cores: 1
    CPU Utilized: 00:10:00
    CPU Efficiency: 100.00% of 00:10:00 core-walltime
    Job Wall-clock time: 00:10:00
    Memory Utilized: 60.00 GB
    Memory Efficiency: 50.00% of 120.00 GB
Check the job State reported in the 4th line. If it is “COMPLETED (exit code 0)”, look at the last two lines. “Memory Utilized” is the amount of memory your job used, in this case 60 GB.
If the job State is FAILED or CANCELLED, the Memory Utilized and Memory Efficiency values reported by seff are unreliable; seff only gives accurate results for jobs that COMPLETED successfully.
How much memory should I reserve in my job script?
This question builds on the process outlined in the “My job ran out of memory and failed, now what?” section above. Once you have an idea of how much memory your job needs to run successfully, you can adjust your memory request to find the best balance between resource needs and wait time.
It’s a good idea to reserve slightly more memory than your job utilized, since the same job may require slightly different amounts of memory depending on variations in the data it processes in each run. To correctly reserve memory for this job, edit your test job submission script and modify the #SBATCH --mem= directive to reserve 10% more than the 60 GB measured above:
#SBATCH --mem=66G
For jobs that use MPI, remove the #SBATCH --mem= directive from your job submission script and specify the amount of memory to reserve per core instead. For example, if your job uses 100 GB of memory in total and runs on 10 cores, reserve 10 GB plus a safety factor per CPU:
#SBATCH --mem-per-cpu=11G
If it doesn’t matter how many nodes your cores are distributed on, you may remove the #SBATCH --nodes= directive as well.
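For the MPI example above (100 GB spread across 10 cores), the resource portion of the script might look like this sketch; the module name and launch line depend on your MPI installation and are only illustrative:
#SBATCH --ntasks=10              ## 10 MPI ranks
#SBATCH --mem-per-cpu=11G        ## ~10 GB per rank plus a safety margin
module load mpi                  ## load your MPI module (name varies)
mpirun -np $SLURM_NTASKS <your_mpi_program>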
Be careful not to reserve significantly more memory than your job requires, as this will increase your job’s wait time. Reserving excessive memory also wastes shared resources that could be used by other researchers.
