A job script has the same format as a shell script. It consists of an option area, which describes the Slurm job submission options, and a user program area, which describes the program to be executed. For the environment variables that are set automatically when a job is executed, please refer to here.
An example job script is shown below to give an overview. Each part is described in more detail in the following sections.
#!/bin/bash
#============ Slurm Options ===========
#SBATCH -p gr19999b # Specify the job queue (partition). Change this to the name of the queue you want to submit to.
#SBATCH -t 1:0:0 # Specify the elapsed time limit (1 hour in this example)
#SBATCH --rsc p=4:t=8:c=8:m=8G # Specify requested resources
#SBATCH -o %x.%j.out # Specify the standard output file for the job
## %x is replaced by the job name and %j is replaced by the job ID.
#============ Shell Script ============
# (Optional) Specify set -x to keep track of the execution progress of the job script.
set -x
# Environment variables such as the number of MPI processes and OMP_NUM_THREADS are automatically set based on the value of the --rsc option.
# If necessary, these values can be overridden, within the range of the allocated resources, with srun command arguments or environment variables.
srun ./a.out
## Notes on job scripts ##
# A line beginning with '#', or the text after '#' on a line, is treated as a comment. As an exception, lines beginning with #SBATCH are recognised as Slurm option specifications.
# When the job runs, the working directory is automatically set to the directory from which the job was submitted.
# Environment variables set at the job submission are inherited during job execution.
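To make the comment rule above concrete, here is a minimal sketch that prints only the option part of each #SBATCH line in a script. The helper name `list_sbatch_options` is hypothetical and is not part of Slurm. It is a plain text filter, not a reimplementation of how sbatch parses the file, and it assumes that option values do not themselves contain '#'.

```shell
#!/bin/bash
# Hypothetical helper (not a Slurm command): print the options found on
# "#SBATCH" lines of a job script, one line per directive.
list_sbatch_options() {
    # Keep lines that start with "#SBATCH", then drop the prefix and any
    # trailing "# comment". Assumes option values contain no '#'.
    grep '^#SBATCH' "$1" |
        sed -e 's/^#SBATCH[[:space:]]*//' -e 's/[[:space:]]*#.*$//'
}

# Small example script written to a temporary file.
tmp_script="$(mktemp)"
cat > "$tmp_script" <<'EOF'
#!/bin/bash
#SBATCH -p gr19999b   # queue name
#SBATCH -t 1:0:0
## %x is replaced by the job name
# an ordinary comment
srun ./a.out
EOF

list_sbatch_options "$tmp_script"   # prints "-p gr19999b" and "-t 1:0:0"
rm -f "$tmp_script"
```

Lines that start with '##' or '# ' are skipped because they do not begin with the exact string #SBATCH.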
Sample job scripts are available for your reference.
Execution Type | Sample File |
---|---|
Non-parallelism | Download |
Thread parallelism | Download |
Process parallelism (Intel MPI) | Download |
Hybrid parallelism | Download |
Specify job submission options in the Slurm Options part of the job script, on lines beginning with "#SBATCH".
Option | Meaning | Example |
---|---|---|
-p QUEUE | Specify the queue (required item) | -p gr19999b |
-t HOUR:MINUTES:SECONDS | Set the upper limit of execution time | -t 24:0:0 |
--rsc p=PROCS:t=THREADS:c=CORES:m=MEMORY or --rsc g=GPU | Specify the resources. For more details click here | --rsc p=4:t=8:c=8:m=8G or --rsc g=1 |
-o FILENAME | Specify the file to which standard output is saved. Refer to the Official Manual for the available special characters. | -o result.out |
-e FILENAME | Specify the file to which standard error output is saved. Refer to the Official Manual for the available special characters. | -e result.err |
-J JOBNAME | Specify the job name. | -J ReplaceJobName |
--comment=Comment | Specify a comment for the job. | --comment=ThisIsComment |
-a ARRAY_SPEC | Specify the array job. For more details click here | -a 1-5 |
-d TYPE:JOBID | Specify the order of job execution. For more details click here | -d afterok:999999 |
--no-requeue | Declare that the batch job must not be re-run (requeued) | --no-requeue |
--mail-user=MAILADDR | Specify the e-mail address | --mail-user=bar@sample.com |
--mail-type=TYPE | Specify the events for which e-mail notification is sent. Specify BEGIN, END, FAIL, REQUEUE, or ALL as necessary. | --mail-type=BEGIN,END |
For other options and further details, please refer to the Official Manual. Also, please check "Options not available when submitting jobs" if necessary.
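As a quick sanity check on the values written in the table above, the following sketch converts an HOUR:MINUTES:SECONDS limit to seconds and splits a `--rsc` value into its KEY=VALUE parts. Both function names are hypothetical helpers, not Slurm commands. The sketch assumes the time is always given as three colon-separated fields, as in this document's examples, and it does not validate the meaning of the `--rsc` keys.

```shell
#!/bin/bash
# Hypothetical helper: convert HOUR:MINUTES:SECONDS (the -t format used in
# this document) to a number of seconds. Assumes exactly three fields.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Hypothetical helper: print each KEY=VALUE part of a --rsc value on its
# own line, e.g. p=4:t=8:c=8:m=8G -> p=4, t=8, c=8, m=8G.
split_rsc() {
    echo "$1" | tr ':' '\n'
}

hms_to_seconds 1:0:0      # prints 3600
hms_to_seconds 24:0:0     # prints 86400
split_rsc p=4:t=8:c=8:m=8G
```

Seeing the limit in seconds makes it easier to compare a requested `-t` value with the measured run time of a program.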
To execute a program on a compute node, you must launch it with the srun command in the job script, whether it is a sequential program or an MPI program.
The following are typical options for the srun command. For other options and further details, please refer to the Official Manual.
Option | Function |
---|---|
-n PROCS | Specify the number of processes to be started. If not specified, the value of p in the --rsc option is used. |
-c CORES | Specify the number of CPU cores allocated to each process. If not specified, the value of c in the --rsc option is used. |
--ntasks-per-node=PROCS_PER_NODE | Specify the number of processes per node. If not specified, the processes are scheduled to run on a small number of nodes. |
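The sketch below shows how the three options in the table combine on one srun command line. `build_srun_command` is a hypothetical helper that only prints the command it would run, so it can be tried outside a job. An empty argument means "leave the option out", in which case the value from `--rsc` applies as described above.

```shell
#!/bin/bash
# Hypothetical helper: assemble (but do not execute) an srun command line.
# Arguments: PROCS CORES PROCS_PER_NODE PROGRAM; pass "" to omit an option.
build_srun_command() {
    procs="$1"; cores="$2"; per_node="$3"; program="$4"
    cmd="srun"
    [ -n "$procs" ]    && cmd="$cmd -n $procs"
    [ -n "$cores" ]    && cmd="$cmd -c $cores"
    [ -n "$per_node" ] && cmd="$cmd --ntasks-per-node=$per_node"
    echo "$cmd $program"
}

build_srun_command 4 8 2 ./a.out      # prints: srun -n 4 -c 8 --ntasks-per-node=2 ./a.out
build_srun_command "" "" "" ./a.out   # prints: srun ./a.out
```

Whatever values are chosen must stay within the resources requested with `--rsc`.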
To check the queues to which you can submit jobs, use the spartition command.
It displays the queue name, Rmin/Rstd/Rmax, and the standard and maximum elapsed time values for each queue to which you can submit jobs. For more information on Rmin/Rstd/Rmax, see the linked page.
$ spartition
Partition State Rmin Rstd Rmax DefTime MaxTime
gr19999g UP 0 64 64 01:00:00 1-00:00:00
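If the queue name is needed in a script, the output above can be filtered with awk. The sketch below is a minimal example that assumes the column layout shown in the sample (queue name in column 1, State in column 2, one header line). The function name `list_up_partitions` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read spartition-style output on standard input and
# print the names of the queues whose State column is "UP".
list_up_partitions() {
    awk 'NR > 1 && $2 == "UP" { print $1 }'
}

# Sample text copied from the output shown above.
list_up_partitions <<'EOF'
Partition State Rmin Rstd Rmax DefTime MaxTime
gr19999g UP 0 64 64 01:00:00 1-00:00:00
EOF
# prints: gr19999g
```

On the system itself, the same filter would be used as `spartition | list_up_partitions`.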
To submit a job to the queue, use the sbatch command.
$ sbatch sample.sh
Submitted batch job 20
$ sbatch sample.sh
sbatch: cli_filter/accms_resource_req: convert_rsc_option: Updated the number of cores from c=10 to c=40 based on memory size request
Submitted batch job 21
The message above indicates that the number of cores was changed based on the requested memory size. It is not an error, and the job is submitted.
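Scripts that submit a job and later query or cancel it need the job ID. A minimal sketch, assuming the "Submitted batch job N" line has exactly the form shown above, is to pick the fourth word of that line and ignore everything else, including the informational cli_filter message. The function name `extract_job_id` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read sbatch output on standard input and print the
# job ID taken from the "Submitted batch job N" line.
extract_job_id() {
    awk '/^Submitted batch job / { print $4 }'
}

# Sample text copied from the output shown above.
extract_job_id <<'EOF'
sbatch: cli_filter/accms_resource_req: convert_rsc_option: Updated the number of cores from c=10 to c=40 based on memory size request
Submitted batch job 21
EOF
# prints: 21
```

On the system itself, this could be used as `jobid="$(sbatch sample.sh | extract_job_id)"`.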
To display information on submitted jobs, use the squeue or sacct command.
The squeue command displays information about jobs currently registered in the queue.
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1 gr19999b interact b59999 R 0:33 1 no0001
If a job is waiting for execution, the reason is shown in NODELIST(REASON). Typical examples are listed below. For other reasons, please refer to the Official Manual.
REASON | Meaning |
---|---|
Resources | There are no available resources at this time. |
QOSJobLimit | The maximum number of concurrent executions has been reached. |
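To see at a glance why jobs are waiting, the squeue output can be reduced to job ID and reason. The sketch below assumes the default column layout shown above, that pending jobs carry the state code PD, that the reason appears in parentheses in the last column, and that job names contain no spaces. The function name and the two pending sample rows are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: read squeue-style output on standard input and print
# "JOBID REASON" for each job whose ST column is PD (pending).
list_pending_reasons() {
    awk 'NR > 1 && $5 == "PD" {
        reason = $NF
        gsub(/[()]/, "", reason)   # strip the surrounding parentheses
        print $1, reason
    }'
}

# Illustrative input: job 1 is from the sample above, jobs 2 and 3 are invented.
list_pending_reasons <<'EOF'
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1 gr19999b interact b59999 R 0:33 1 no0001
2 gr19999b test.sh b59999 PD 0:00 1 (Resources)
3 gr19999b test.sh b59999 PD 0:00 1 (QOSJobLimit)
EOF
# prints:
# 2 Resources
# 3 QOSJobLimit
```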
The sacct command displays information about jobs in the accounting database, including past jobs. Please refrain from running the command repeatedly in an automated loop, because doing so overloads the system.
$ sacct -X
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1 test.sh gr19999b b59999 8 COMPLETED 0:0
2 test.sh gr19999b b59999 8 COMPLETED 0:0
Option | Meaning | Examples |
---|---|---|
-j | Display statistics for the specified job. | sacct -j 1234556 |
-X | Display only statistics related to the job assignment itself, without considering the job steps. | sacct -X |
-l | Display all information about the job. | sacct -l / sacct -Xl |
-S | Display jobs after the specified time. | sacct -S 2022-10-01 |
For other options, please refer to the Official Manual.
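A single sacct call can be summarised offline instead of querying repeatedly. The sketch below counts jobs per State from `sacct -X` style output. It assumes the default columns shown above (State in column 6, two header lines) and that no column is empty, since an empty column would shift the fields. The function name `count_job_states` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read "sacct -X" style output on standard input and
# print "STATE COUNT" lines, sorted by state name.
count_job_states() {
    awk 'NR > 2 { count[$6]++ } END { for (s in count) print s, count[s] }' | sort
}

# Sample text copied from the output shown above.
count_job_states <<'EOF'
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1 test.sh gr19999b b59999 8 COMPLETED 0:0
2 test.sh gr19999b b59999 8 COMPLETED 0:0
EOF
# prints: COMPLETED 2
```

Saving the sacct output to a file once and running the filter on that file avoids extra load on the accounting database.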
To cancel a submitted job, use the scancel command.
$ scancel 20
You can cancel multiple submitted jobs at once with the commands below.
## Cancel all jobs of the specified user.
$ scancel -u b59999
## Cancel all jobs in the specified queue.
$ scancel -p gr19999b
## Cancel all jobs in the specified job state.
$ scancel -t pending