A job script has the same format as a shell script. It consists of an option area, which describes the Slurm job submission options, and a user program area, which describes the program to be executed. For the environment variables that are set automatically when a job is executed, please refer to the separate page on environment variables.
An example job script is shown below to give an overview of its structure. Each part is described in more detail in the following sections.
```shell
#!/bin/bash
#============ Slurm Options ===========
#SBATCH -p gr19999b # Specify the job queue (partition). Change this to the name of the queue you submit to.
#SBATCH -t 1:0:0 # Specify the elapsed time limit (1 hour in this example)
#SBATCH --rsc p=4:t=8:c=8:m=8G # Specify the requested resources
#SBATCH -o %x.%j.out # Specify the standard output file for the job
## %x is replaced by the job name and %j by the job ID.
#============ Shell Script ============
# (Optional) set -x prints each command as it runs, which helps you track the progress of the job script.
set -x
# Environment variables such as the number of MPI processes and OMP_NUM_THREADS are set automatically from the --rsc option.
# If necessary, these values can be overridden, within the allocated resources, with srun options or environment variables.
srun ./a.out
## Notes on job scripts ##
# Lines beginning with '#', and text following '#' on a line, are treated as comments. As an exception, lines beginning with #SBATCH are recognised as Slurm options (only when they appear before the first executable command).
# When the job starts, the current directory is set to the directory from which the job was submitted.
# Environment variables set at submission time are inherited by the job.
```
Sample job scripts are available for reference.
Execution Type | Sample File |
---|---|
Sequential (no parallelism) | Download |
Thread parallelism | Download |
Process parallelism (Intel MPI) | Download |
Hybrid parallelism | Download |
Specify these options in the Slurm Options part of the job script, each on a line beginning with "#SBATCH".
Option | Meaning | Example |
---|---|---|
-p QUEUE | Specify the queue (required) | -p gr19999b |
-t HOUR:MINUTES:SECONDS | Set the upper limit of the elapsed time | -t 24:0:0 |
--rsc p=PROCS:t=THREADS:c=CORES:m=MEMORY or --rsc g=GPU | Specify the resources. See the separate page for details. | --rsc p=4:t=8:c=8:m=8G or --rsc g=1 |
-o FILENAME | Specify the file to which standard output is written. Refer to the Official Manual for the available special characters. | -o result.out |
-e FILENAME | Specify the file to which standard error output is written. Refer to the Official Manual for the available special characters. | -e result.err |
-J JOBNAME | Specify the job name. | -J ReplaceJobName |
--comment=COMMENT | Attach a comment to the job. | --comment=ThisIsComment |
-a ARRAY_SPEC | Submit an array job. See the separate page for details. | -a 1-5 |
-d TYPE:JOBID | Specify a dependency on another job (controls the order of execution). See the separate page for details. | -d afterok:999999 |
--no-requeue | Declare that the job must never be requeued (re-run) | --no-requeue |
--mail-user=MAILADDR | Specify the e-mail address for notifications | --mail-user=bar@sample.com |
--mail-type=TYPE | Specify the events that trigger an e-mail notification: BEGIN, END, FAIL, REQUEUE, or ALL (comma-separated) | --mail-type=BEGIN,END |
For other options and further details, refer to the Official Manual. If necessary, also check the list of options that are not available when submitting jobs.
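The `-d` option is often combined with sbatch's `--parsable` flag, which prints only the job ID instead of the usual "Submitted batch job" message, so that a dependent job can be submitted from a script. The function below is a minimal sketch: `submit_chain` and the script names are hypothetical, and it assumes `--parsable` is not among the options restricted on this system.

```shell
#!/bin/bash
# Hypothetical helper: submit FIRST, then submit SECOND so that it starts
# only after FIRST finishes successfully (-d afterok:JOBID).
submit_chain() {
    local first_script=$1 second_script=$2
    local first_id
    # --parsable prints "JOBID" or "JOBID;CLUSTER" instead of the usual message.
    first_id=$(sbatch --parsable "$first_script") || return 1
    first_id=${first_id%%;*}   # drop an optional ";CLUSTER" suffix
    # Submit the dependent job and print its job ID.
    sbatch --parsable -d "afterok:${first_id}" "$second_script"
}
```

For example, `submit_chain pre.sh main.sh` submits `pre.sh` and then `main.sh`; until `pre.sh` completes, squeue typically shows the second job as pending with the reason `Dependency`.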
To run a program on the compute nodes, launch it with the srun command in the job script, whether it is a sequential program or an MPI program.
The following table lists typical srun options. For other options and details, refer to the Official Manual.
Option | Function |
---|---|
-n PROCS | Specify the number of processes to be started. If not specified, the value of p in the --rsc option is used. |
-c CORES | Specify the number of CPU cores allocated per process. If not specified, the value of c in the --rsc option is used. |
--ntasks-per-node=PROCS_PER_NODE | Specify the number of processes per node. If not specified, processes are packed onto as few nodes as possible. |
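As a sketch of overriding the defaults inside a job, the job script below requests 4 processes with 8 cores each through `--rsc`, then starts only 2 MPI processes with 4 OpenMP threads each, which stays within the allocation. The queue name and values are illustrative, and the `echo` line assumes the standard Slurm variables `SLURM_NTASKS` and `SLURM_CPUS_PER_TASK` are set from `--rsc`; check the environment variable page for what this system actually sets.

```shell
#!/bin/bash
#============ Slurm Options ===========
#SBATCH -p gr19999b                # replace with your own queue name
#SBATCH -t 1:0:0
#SBATCH --rsc p=4:t=8:c=8:m=8G     # 4 processes x 8 cores allocated
#SBATCH -o %x.%j.out
#============ Shell Script ============
set -x
# Show the values derived from --rsc ("?" if a variable is not set).
echo "tasks=${SLURM_NTASKS:-?} cpus/task=${SLURM_CPUS_PER_TASK:-?} omp=${OMP_NUM_THREADS:-?}"

# Override within the allocation: 2 processes with 8 cores each,
# but only 4 OpenMP threads per process.
export OMP_NUM_THREADS=4
srun -n 2 -c 8 ./a.out
```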
To check the queues to which you can submit jobs, use the spartition command.
It displays the queue name, Rmin/Rstd/Rmax, and the default and maximum elapsed times for each queue you can submit to. For more information on Rmin/Rstd/Rmax, refer to the separate page.
```
$ spartition
Partition State Rmin Rstd Rmax DefTime MaxTime
gr19999g UP 0 64 64 01:00:00 1-00:00:00
```
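Because the value given with `-t` should not exceed a queue's maximum elapsed time, it can be handy to look the limit up from a script. The sketch below assumes the column layout shown above (MaxTime in the seventh column, one line per queue); `max_time_of` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print the MaxTime column of spartition for one queue.
# Assumes the layout: Partition State Rmin Rstd Rmax DefTime MaxTime
max_time_of() {
    local queue=$1
    spartition | awk -v q="$queue" '$1 == q { print $7 }'
}
```

With the output above, `max_time_of gr19999g` prints `1-00:00:00` (one day); it prints nothing for a queue that is not listed.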
To submit a job to the queue, use the sbatch command.
```
$ sbatch sample.sh
Submitted batch job 20
```
Depending on the requested resources, sbatch may also print an informational message:
```
$ sbatch sample.sh
sbatch: cli_filter/accms_resource_req: convert_rsc_option: Updated the number of cores from c=10 to c=40 based on memory size request
Submitted batch job 21
```
The message above indicates that the number of cores was increased automatically to match the requested memory size. It is not an error, and the job is still submitted.
To display information on submitted jobs, use the squeue or sacct command.
The squeue command displays information about jobs currently registered in the queue.
```
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1 gr19999b interact b59999 R 0:33 1 no0001
```
If a job is waiting to run, the reason is shown in the NODELIST(REASON) column. Typical reasons are listed below; for other reasons, refer to the Official Manual.
REASON | Meaning |
---|---|
Resources | There are no available resources at this time. |
QOSJobLimit | The maximum number of concurrent executions has been reached. |
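To see at a glance why your jobs are waiting, the pending reasons can be counted. This is a minimal sketch using standard squeue options (`-h` no header, `-t PENDING`, `-o '%r'` for the reason); the helper name `pending_reasons` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count your pending jobs grouped by REASON.
# -h drops the header, -t PENDING keeps waiting jobs, -o '%r' prints the reason.
pending_reasons() {
    local user=${1:-$USER}
    squeue -h -u "$user" -t PENDING -o '%r' | sort | uniq -c |
        awk '{ print $2 ": " $1 }'
}
```

For example, two jobs waiting for `Resources` and one for `QOSJobLimit` produce the lines `QOSJobLimit: 1` and `Resources: 2`.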
The sacct command displays information about jobs recorded in the accounting database, including jobs that have already finished. Please do not run it repeatedly or mechanically (for example, in a tight loop), because this places a heavy load on the system.
```
$ sacct -X
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1 test.sh gr19999b b59999 8 COMPLETED 0:0
2 test.sh gr19999b b59999 8 COMPLETED 0:0
```
Option | Meaning | Examples |
---|---|---|
-j | Display statistics for the specified job. | sacct -j 1234556 |
-X | Display only the job allocation itself, without its individual job steps. | sacct -X |
-l | Display job information in long format (many more fields). | sacct -l / sacct -Xl |
-S | Display jobs from the specified time onward. | sacct -S 2022-10-01 |
For other options, please refer to the Official Manual.
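These options can be combined to pick out jobs that need attention. The sketch below (hypothetical helper name) calls sacct once with `-n` (no header), `-P` (`|`-delimited output) and `-o` (column selection), and prints every job since the given date whose state is anything other than COMPLETED, including jobs that are still running. In keeping with the note above, run it by hand rather than in a loop.

```shell
#!/bin/bash
# Hypothetical helper: list jobs since a date whose state is not COMPLETED.
# -X: allocations only, -n: no header, -P: '|'-delimited output,
# -S: start time, -o: choose the columns.
not_completed_since() {
    local since=$1
    sacct -X -n -P -S "$since" -o JobID,JobName,State,ExitCode |
        awk -F'|' '$3 != "COMPLETED"'
}
```

For example, `not_completed_since 2022-10-01` lists failed, cancelled, timed-out, and still-running jobs since that date.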
The qs command displays job information.
```
$ qs
QUEUE USER JOBID STATUS PROC CORE MEM ELAPSE( limit)
gr19999b b59999 1 RUN 4 1 4570M 00:00:07( 01:00:00)
```
Header | Summary |
---|---|
QUEUE | queue name |
USER | user name |
JOBID | job ID |
STATUS | Job status |
PROC | Number of processes |
CORE | Number of cores per process |
MEM | Amount of memory per process |
ELAPSE | Job elapsed time |
limit | Elapsed time limit of the job |
To check the queue status of a group, use the qgroup command.
It displays information on the group's queue as a whole, together with statistics for each user.
```
$ qgroup
QUEUE SYS | RUN PEND OTHER | ALLOC ( MIN/ STD/ MAX)
----------------------------------------------------------------
gr19999b B | 0 0 0 | 0 ( 0/224/224)
QUEUE USER | RUN(ALLOC) PEND(REQUEST) OTHER(REQUEST)
----------------------------------------------------------------
gr19999b b59999 | 1( 4) 0( 0) 0( 0)
```
With the -l option, per-job information is displayed in addition to the overall queue information for the group and the per-user statistics.
```
$ qgroup -l
QUEUE SYS | RUN PEND OTHER | ALLOC ( MIN/ STD/ MAX)
----------------------------------------------------------------
gr19999b B | 0 0 0 | 0 ( 0/224/224)
QUEUE USER | RUN(ALLOC) PEND(REQUEST) OTHER(REQUEST)
----------------------------------------------------------------
gr19999b b59999 | 1( 4) 0( 0) 0( 0)
QUEUE USER JOBID | STAT SUBMIT_AT | RSC:core | PROC CORE MEM ELAPSE
------------------------------------------------------------------------------------------------
gr19999b b59999 1 | RUN 2024-06-06 16:51 | 4 | 4 1 4570M 01:00:00
```
Queue-wide columns:

Header | Summary |
---|---|
QUEUE | Queue name |
SYS | System name |
RUN, PEND, OTHER | Number of running, pending, and other jobs |
ALLOC | Number of cores allocated |
MIN | Minimum guaranteed number of cores for the queue |
STD | Standard number of cores for the queue |
MAX | Maximum number of cores for the queue |

Per-user columns:

Header | Summary |
---|---|
RUN(ALLOC) | Number of running jobs and the resources allocated to them (in cores) |
PEND(REQUEST) | Number of pending jobs and the resources they request (in cores) |
OTHER(REQUEST) | Number of jobs in other states and the resources they request (in cores) |

Per-job columns (shown with -l):

Header | Summary |
---|---|
STAT | Job status |
SUBMIT_AT | Job submission date and time |
RSC:core | Amount of resources (in cores) |
PROC | Number of processes |
CORE | Number of cores per process |
MEM | Amount of memory per process |
ELAPSE | Elapsed time limit of the job |
To cancel a submitted job, use the scancel command.
```
$ scancel 20
```
You can also cancel several jobs at once by specifying a filter, as shown below.
```
## Cancel jobs by specifying a user name.
$ scancel -u b59999
## Cancel jobs by specifying a queue name.
$ scancel -p gr19999b
## Cancel jobs by specifying a job state (here, pending jobs).
$ scancel -t pending
```
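When several filters are given, scancel only signals jobs that match all of them, so the filters above can be combined to narrow the selection. The sketch below (hypothetical helper name) first lists your pending jobs in one queue with squeue and then cancels exactly that set; limiting it to `-u "$USER"` is an assumption made for safety.

```shell
#!/bin/bash
# Hypothetical helper: cancel only your own pending jobs in one queue.
# scancel combines its filters, so a job must match -u, -p and -t to be cancelled.
cancel_my_pending() {
    local queue=$1
    # Show the job IDs and names that are about to be cancelled.
    squeue -h -u "$USER" -p "$queue" -t PENDING -o '%i %j'
    scancel -u "$USER" -p "$queue" -t pending
}
```

For example, `cancel_my_pending gr19999b` leaves your running jobs in gr19999b untouched.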