---
title: 'Batch Processing'
---

[toc]

## Flow of Job Execution{#execute_flow}

1. [Creation of Job script](#jobscript)
2. [Job Submission](#submit)
3. (If necessary) [Confirmation of the job status](#squeue)
4. (If necessary) [Job Cancellation](#jobcancel)

## Job script{#jobscript}

A job script has the same format as a shell script. It consists of an option area, which describes the Slurm job submission options, and a user program area, which describes the program to be executed.

Please refer to [this page](/run/tips#env_val) for the environment variables that are automatically set when a job is executed.

### Example of hybrid parallelism with 4 processes x 8 threads

An example job script is shown below to give an overview of the whole script. Each part is explained in more detail in the following sections.

```nohighlight
#!/bin/bash
#============ Slurm Options ===========
#SBATCH -p gr19999b             # Specify the job queue (partition). Change this to the name of the queue you want to submit to.
#SBATCH -t 1:0:0                # Specify the elapsed time limit (1 hour in this example)
#SBATCH --rsc p=4:t=8:c=8:m=8G  # Specify the requested resources
#SBATCH -o %x.%j.out            # Specify the standard output file for the job
                                ## %x is replaced by the job name and %j is replaced by the job ID.
#============ Shell Script ============
# (Optional) Specify set -x to keep track of the execution progress of the job script.
set -x

# Environment variables such as the number of MPI processes and OMP_NUM_THREADS are automatically set based on the value of the --rsc option.
# If necessary, these values can be overridden, within the range of allocated resources, with srun command arguments or environment variables.
srun ./a.out

## Notes on job scripts ##
# Text after '#' on a line is treated as a comment. As an exception, lines beginning with #SBATCH are recognised as Slurm option specifications.
# The current directory at job execution is automatically set to the current directory at job submission.
# Environment variables set at job submission are inherited during job execution.
```

### Sample script

Sample job scripts are available for your reference.

| Execution Type | Sample File |
| ------------------------- | --------------- |
| Non-parallelism | [Download](./sample_normal.txt) |
| Thread parallelism | [Download](./sample_thread.txt) |
| Process parallelism (Intel MPI) | [Download](./sample_process.txt) |
| Hybrid parallelism | [Download](./sample_hybrid.txt) |

### Slurm Option

Specify options in the Slurm Options part of the job script, on lines beginning with "#SBATCH".

#### Major options

| Option | Meaning | Example |
| --- | --- | --- |
| -p _QUEUE_ | Specify the queue **(required item)** | -p gr19999b |
| -t _HOUR:MINUTES:SECONDS_ | Set the upper limit of execution time | -t 24:0:0 |
| --rsc p=_PROCS_:t=_THREADS_:c=_CORES_:m=_MEMORY_ <br> or <br> --rsc g=_GPU_ | Specify the resources. For more details [click here](/run/resource#resource) | --rsc p=4:t=8:c=8:m=8G <br> or <br> --rsc g=1 |
| -o _FILENAME_ | Specify the file in which to save standard output. Refer to the [Official Manual](https://slurm.schedmd.com/sbatch.html#SECTION_%3CB%3Efilename-pattern%3C/B%3E) for the available special characters. | -o result.out |
| -e _FILENAME_ | Specify the file in which to save standard error output. Refer to the [Official Manual](https://slurm.schedmd.com/sbatch.html#SECTION_%3CB%3Efilename-pattern%3C/B%3E) for the available special characters. | -e result.err |
| -J _JOBNAME_ | Specify the job name. | -J ReplaceJobName |
| --comment=_Comment_ | Specify a comment. | --comment=ThisIsComment |
| -a _ARRAY\_SPEC_ | Specify an array job. For more details [click here](/run/tips/#arrayjob) | -a 1-5 |
| -d _TYPE:JOBID_ | Specify the order of job execution. For more details [click here](/run/tips/#dependency) | -d afterok:999999 |
| --no-requeue | Declare that the batch job is not re-runnable | --no-requeue |
| --mail-user=_MAILADDR_ | Specify the e-mail address | --mail-user=bar@sample.com |
| --mail-type=_TYPE_ | Specify the events to be notified by e-mail<br>Specify BEGIN, END, FAIL, REQUEUE, or ALL as necessary | --mail-type=BEGIN,END |

Please refer to the [Official Manual](https://slurm.schedmd.com/sbatch.html) for other options and further details. Also, please check [Options not available when submitting jobs](/run/tips#sbatch_ignore) if necessary.

### How to execute the program (important){#srun}

To execute a program on a computing node, you **must** use the _**srun**_ command at the point in the job script where the program is executed, whether it is a sequential program or an MPI program.

Typical options for the srun command are listed below. Please refer to the [Official Manual](https://slurm.schedmd.com/srun.html#SECTION_OPTIONS) for other options and further details.

#### **Basic Options** {#basic_options}

Option |Function
:--------------------:|-----------------------
-n _PROCS_ | Specify the number of processes to be started. If not specified, the value of p in the --rsc option is used.
-c _CORES_ | Specify the number of CPU cores to be reserved per process. If not specified, the value of c in the --rsc option is used.
--ntasks-per-node=_PROCS\_PER\_NODE_ | Specify the number of processes per node. If not specified, the processes are scheduled so as to use a small number of nodes.

## Confirmation of available queues{#queue}

To check the queues to which you can submit jobs, use the spartition command.

### spartition command {#spartition}

Displays the queue name, Rmin/Rstd/Rmax, and the standard and maximum elapsed time values for each queue to which jobs can be submitted. For more information, see [Rmin/Rstd/Rmax](/run/resource#resources).

```nohighlight
$ spartition
Partition State   Rmin  Rstd  Rmax  DefTime     MaxTime
gr19999g  UP         0    64    64  01:00:00  1-00:00:00
```

## Job Submission{#submit}

To submit a job to the queue, use the sbatch command.

### sbatch Command{#sbatch}

```nohighlight
$ sbatch sample.sh
Submitted batch job 20
```

* Specify the job script file you created at the end of the command.
* The execution of the job script is requested to the system and the job ID is displayed.
* If an option is specified on the command line when a job is submitted, it overrides the same option in the job script.

#### Message when a job is submitted{#message}

```nohighlight
$ sbatch sample.sh
sbatch: cli_filter/accms_resource_req: convert_rsc_option: Updated the number of cores from c=10 to c=40 based on memory size request
Submitted batch job 21
```

The message above indicates that the number of cores has been changed based on the requested memory size. This is not an error, and the job is submitted.

## Confirmation of the job status{#squeue}

To display information on submitted jobs, use the squeue or sacct command.

### squeue Command

Displays information about jobs currently registered in the queue.

```nohighlight
$ squeue
JOBID PARTITION     NAME   USER ST  TIME NODES NODELIST(REASON)
    1  gr19999b interact b59999  R  0:33     1 no0001
```

* Information is displayed only while a job is in the queue (waiting or running); completed jobs are not shown.
* Please refer to the [Official Manual](https://slurm.schedmd.com/squeue.html#SECTION_OPTIONS) for options.

#### Notes on job execution(REASON)

If a job is waiting for execution, the reason is indicated in NODELIST(REASON). Typical examples are shown below. For other reasons, please refer to the [Official Manual](https://slurm.schedmd.com/squeue.html#SECTION_JOB-REASON-CODES).

| REASON | Meaning |
| ------ | ----- |
| Resources | There are no available resources at this time. |
| QOSJobLimit | The maximum number of concurrent executions has been reached. |

### sacct Command

Displays information about jobs in the accounting database. Information about past jobs can also be displayed.
Please refrain from running this command repeatedly in an automated loop, because it overloads the system.

```nohighlight
$ sacct -X
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1               test.sh   gr19999b     b59999          8  COMPLETED      0:0
2               test.sh   gr19999b     b59999          8  COMPLETED      0:0
```

* By default, a list of jobs executed on the day the command was issued is displayed.

#### Main options

| Option | Meaning | Examples |
| ---------- | ----- | -------- |
| -j | Display statistics for the specified job. | sacct -j 1234556 |
| -X | Display only statistics related to the job allocation itself, without considering the job steps. | sacct -X |
| -l | Display all information about the job. | sacct -l / sacct -Xl |
| -S | Display jobs after the specified time. | sacct -S 2022-10-01 |

For other options, please refer to the [Official Manual](https://slurm.schedmd.com/sacct.html#SECTION_OPTIONS).

## Job Cancellation{#jobcancel}

To cancel a submitted job, use the scancel command.

### scancel Command{#scancel}

```nohighlight
$ scancel 20
```

* Specify the job ID as an argument.
* Please refer to the [Official Manual](https://slurm.schedmd.com/scancel.html#SECTION_OPTIONS) for options.

### How to cancel all submitted jobs

You can cancel multiple submitted jobs at once with the commands below.

```nohighlight
## Cancel by specifying a user name.
$ scancel -u b59999

## Cancel by specifying a queue name.
$ scancel -p gr19999b

## Cancel by specifying a job state.
$ scancel -t pending
```
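
### Example: capturing the job ID in a script

The flow described on this page (submit with sbatch, check with squeue, cancel with scancel) depends on the job ID that sbatch prints. The following is a minimal sketch of how that ID might be captured in a shell script. The function names `parse_job_id` and `submit_job` are hypothetical helpers written for this illustration and are not part of Slurm. The sketch assumes that sbatch prints a line of the form `Submitted batch job N`, as in the examples above, and it skips any informational lines that precede it.

```shell
#!/bin/bash
# Sketch only: parse_job_id and submit_job are hypothetical helper names.

# Read sbatch output on stdin and print the job ID taken from the
# "Submitted batch job N" line. Other lines are ignored.
parse_job_id() {
    sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p' | head -n 1
}

# Submit a job script with sbatch and print only the job ID.
# Returns non-zero if sbatch fails or no job ID is found in its output.
submit_job() {
    local output jobid
    output=$(sbatch "$@") || return 1
    jobid=$(printf '%s\n' "$output" | parse_job_id)
    [ -n "$jobid" ] || return 1
    printf '%s\n' "$jobid"
}
```

With these helpers sourced, `jobid=$(submit_job sample.sh)` stores the ID, which can then be passed to `squeue -j "$jobid"` or `scancel "$jobid"`. The sketch deliberately contains no polling loop, in line with the request above not to run status commands mechanically.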