Slurm Basic Commands

The Slurm Workload Manager, or simply Slurm, is what Research Computing uses to schedule jobs on our clusters, SPORC and the Ocho. Slurm makes it easy to allocate resources and keep tabs on the progress of your jobs. This documentation covers some of the basic commands you will need to know to start running your jobs.

To run jobs you need to connect to sporcsubmit.rc.rit.edu using either SSH or FastX.
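
For example, to connect with SSH from a terminal on your own machine (abc1234 and localhost are placeholders for your RIT username and your computer):

[abc1234@localhost ~]$ ssh abc1234@sporcsubmit.rc.rit.edu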

Commands Overview

All commands have a --help option available, which describes the command in more depth and lists all of its available options.
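
For example, to see every option for a command such as sbatch:

[abc1234@sporcsubmit ~]$ sbatch --help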

sinfo

Reports the state of the partitions and nodes managed by Slurm.

[abc1234@sporcsubmit ~]$ sinfo
PARTITION      AVAIL   TIMELIMIT  NODES  STATE  NODELIST
tier1           up    10-00:00:0      1  down*  skl-a-08
tier1           up    10-00:00:0      1    mix  skl-a-60
tier1           up    10-00:00:0     12  alloc  skl-a-[01-04,07,09-15]
tier1           up    10-00:00:0     20   idle  skl-a-[05-06,16-32,61]
tier2           up    10-00:00:0      1  down*  skl-a-08
...
onboard         up    10-00:00:0     27   idle  skl-a-[33-59]
interactive     up    2-00:00:00      1    mix  theocho

  • PARTITION: the name of the partition
  • AVAIL: whether the partition is up or down
  • TIMELIMIT: the maximum length a job will run, in the format Days-Hours:Minutes:Seconds
  • NODES: the number of nodes of that configuration
  • STATE: down* if jobs cannot run on the node, idle if it is available for jobs, alloc if all the CPUs on the node are allocated to jobs, or mix if some CPUs on the node are allocated and others are idle.
  • NODELIST: specific nodes associated with that partition.
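
To limit the report to a single partition, sinfo accepts a -p flag (tier1 here is one of the partitions from the output above):

[abc1234@sporcsubmit ~]$ sinfo -p tier1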

sbatch

Submits a script to Slurm so a job can be scheduled. A job will wait in the pending state until the requested resources are available.

[abc1234@sporcsubmit ~]$ sbatch myscript.sh
Submitted batch job 2914
  • If no filename is specified, then sbatch will read the script from standard input
  • The number printed after "Submitted batch job" is the job_id
  • In your script file you can specify options with #SBATCH [option]. For example:
    • #SBATCH -J job_name will specify the name of the job
    • #SBATCH -t Days-Hours:Minutes:Seconds will set the time limit. Other acceptable time formats include: Minutes, Minutes:Seconds, Hours:Minutes:Seconds, Days-Hours, and Days-Hours:Minutes.
    • #SBATCH -p partition -c cpus_per_task will specify which partition to run the job on, as well as the number of processors to use for each task.
    • #SBATCH -A accountName changes the account the job is run under
    • There are many more options; see the official Slurm sbatch documentation.

Example Bash Script:

More examples can be found by running grab-examples on sporcsubmit.rc.rit.edu. 

#!/bin/bash -l
#NOTE the -l flag!
#Name of the job
#SBATCH -J my_job
#Standard out and Standard Error output files; if not specified,
#both will go to slurm-job_id.out
#SBATCH -o job_out.out
#SBATCH -e job_errors.err
#To send mail for updates on the job
#SBATCH --mail-user abc1234@rit.edu
#notify state changes: BEGIN, END, FAIL, or ALL
#SBATCH --mail-type=ALL
#Request the maximum run time; anything over will be KILLED. In this case, 2 hours
#SBATCH -t 2:0:0
#Put the job in the tier3 partition and request 3 processors per task
#SBATCH -p tier3 -c 3
#Job memory requirements in MB
#SBATCH --mem=220000
#Job script goes below this line
#
my script....
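
Array jobs (which appear later in the squeue and scancel examples as IDs like 2714_1) can be requested the same way. As a minimal sketch, adding a directive like the following to the script above would run it as two tasks, each able to read its own index from the SLURM_ARRAY_TASK_ID environment variable:

#Run this script as an array of 2 tasks, numbered 1 and 2
#SBATCH -a 1-2
#Inside the script, $SLURM_ARRAY_TASK_ID holds the task's index
echo "Running array task $SLURM_ARRAY_TASK_ID"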

squeue

Lists the state of all jobs being run or scheduled to run.

[abc1234@sporcsubmit ~]$ squeue
 JOBID  PARTITION      NAME     USER  ST        TIME  NODES  NODELIST(REASON)
2714_1      tier3     myjob  abc1234  PD        0:00      1  (JobHeldAdmin)
2714_2      tier3     myjob  abc1234  PD        0:00      1  (JobHeldAdmin)
...
   384      tier1   new_job  def5678   R  2-09:14:40      1  skl-a-18
  1492  interacti  _interac  aaa0000   R     1:24:23      1  theocho

  • JOBID: number id associated with the job
  • PARTITION: name of partition running the job
  • NAME: name of the job run with sbatch or sinteractive
  • USER: the user who submitted the job
  • ST: State of the job, PD for pending, R for running
  • TIME: how long the job has been running in the format Days-Hours:Minutes:Seconds
  • NODES: number of nodes allocated to the job
  • NODELIST(REASON): either the name of the node running the job or the reason the job is not running, such as JobHeldAdmin (the job is prevented from running by an administrator). Other reasons and their explanations can be found in the official Slurm documentation for squeue
  • Use squeue -u username to view only the jobs from a specific user
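
For example, to list only your own jobs (abc1234 again standing in for your username):

[abc1234@sporcsubmit ~]$ squeue -u abc1234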

scancel

Signals or cancels a job. One or more job IDs, separated by spaces, may be specified.

[abc1234@sporcsubmit ~]$ scancel job_id[_array_id]
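
For example, using the job IDs from the squeue output above, you could cancel a single job, one task of an array job, or (with the -u flag) all of your own jobs:

[abc1234@sporcsubmit ~]$ scancel 384
[abc1234@sporcsubmit ~]$ scancel 2714_1
[abc1234@sporcsubmit ~]$ scancel -u abc1234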

sacct

Lists the jobs that are running or have been run.

[abc1234@sporcsubmit ~]$ sacct
      JobID      JobName    Partition      Account    AllocCPUS        State    ExitCode
-----------    ---------    ---------   ----------    ---------    ---------    --------
2912           job_tests        tier3   job_tester            2    COMPLETED         0:0
2912.batch         batch                job_tester            2    COMPLETED         0:0
2912.extern       extern                job_tester            2    COMPLETED         0:0
2913               jobs2        tier3   job_tester            1       FAILED         1:0
2913.batch         batch                job_tester            1       FAILED         1:0
2913.extern       extern                job_tester            1    COMPLETED         0:0

  • sacct -j <job_id> will display only the one or more comma-separated jobs listed (see the example below)
  • sacct -A <accountName> will display only the jobs run under the one or more comma-separated accounts
  • Failed jobs will have an exit code other than 0; 1 is used for general failures. Some exit codes have special meanings, which can be looked up online
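
For example, to look more closely at the failed job 2913 from the output above, the --format option can select specific columns (Elapsed and MaxRSS are standard sacct fields reporting run time and peak memory use):

[abc1234@sporcsubmit ~]$ sacct -j 2913 --format=JobID,JobName,Elapsed,MaxRSS,State,ExitCode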

my-accounts

Although not a part of Slurm, my-accounts allows you to see all the accounts associated with your username, which is helpful when you want to charge resource allocation to certain accounts.

[abc1234@sporcsubmit ~]$ my-accounts
  Account Name      Expired  Allowed Partitions
- ------------      -------  ------------------
* my_acct           false    tier3,debug,interactive
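
Combining this with the -A option described under sbatch, a job could then be charged to one of the listed accounts, for example:

[abc1234@sporcsubmit ~]$ sbatch -A my_acct -p tier3 myscript.sh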

If there are any further questions, or there is an issue with the documentation, please contact rc-help@rit.edu for additional assistance.