---
title: 'Tips for Job Execution'
taxonomy:
    category:
        - docs
external_links:
    process: true
    title: false
    no_follow: true
    target: _blank
    mode: active
---

[toc]

## Environment Variables{#env_val}

### Setting and Referencing Environment Variables

Use the export command to set an environment variable, and prefix the variable name with "$" to reference it.

* Setting an environment variable

```nohighlight
# Format: Environment Variable Name=Value; export Environment Variable Name
LANG=en_US.UTF-8; export LANG
```

* Referencing an environment variable

```nohighlight
echo $LANG
```

### Environment variables set when executing jobs

|Environment Variable Name|Meaning|
| --- | --- |
|$SLURM_CPUS_ON_NODE|Number of cores per node|
|$SLURM_DPC_CPUS|Number of physical cores per task|
|$SLURM_CPUS_PER_TASK|Number of logical cores per task (twice the number of physical cores)|
|$SLURM_JOB_ID|Job ID (use $SLURM_ARRAY_JOB_ID for array jobs)|
|$SLURM_ARRAY_JOB_ID|Parent job ID when executing an array job|
|$SLURM_ARRAY_TASK_ID|Task ID when executing an array job|
|$SLURM_JOB_NAME|Job name|
|$SLURM_JOB_NODELIST|Names of the nodes allocated to the job|
|$SLURM_JOB_NUM_NODES|Number of nodes allocated to the job|
|$SLURM_LOCALID|Node-local index of the executing task|
|$SLURM_NODEID|Index of the executing node among the nodes allocated to the job|
|$SLURM_NTASKS|Number of processes for the job|
|$SLURM_PROCID|Index (rank) of the task within the job|
|$SLURM_SUBMIT_DIR|Directory from which the job was submitted|
|$SLURM_SUBMIT_HOST|Host from which the job was submitted|

## Options not available when submitting jobs{#sbatch_ignore}

Please note that the following options cannot be specified after #SBATCH in a job script.

|Option | | | | |
| --- | --- | --- | --- | --- |
| --batch | --clusters(-M) | --constraint(-C) | --contiguous | --core-spec(-S) |
| --cores-per-socket | --cpus-per-gpu | --cpus-per-task(-c) | --distribution(-m) | --exclude(-x) |
| --exclusive | --gpu-bind | --gpus(-G) | --gpus-per-node | --gpus-per-socket |
| --gres | --gres-flags | --mem | --mem-bind | --mem-per-cpu |
| --mem-per-gpu | --mincpus | --nodefile(-F) | --nodelist(-w) | --nodes(-N) |
| --ntasks(-n) | --ntasks-per-core | --ntasks-per-gpu | --ntasks-per-node | --ntasks-per-socket |
| --overcommit(-O) | --oversubscribe(-s) | --qos(-q) | --sockets-per-node | --spread-job |
| --switches | --thread-spec | --threads-per-core | --use-min-nodes | |
| --get-user-env | --gid | --priority | --reboot | --uid |

## /tmp Area

The /tmp area can be used as a temporary data write destination on our supercomputer system. Programs with intensive file I/O may run faster when they use /tmp. /tmp is a private area for each job, so its files are never mixed with those of other jobs.

Please note that the /tmp area is automatically deleted at the end of the job. To keep files written to /tmp, your job script must copy them to /home, /LARGE0, or /LARGE1 before the job ends. Deleted files cannot be retrieved later.

### Specific Examples of using /tmp

* Specify /tmp as the write destination of output files if the program allows the destination to be specified.
* Place files that are read repeatedly in /tmp before program execution starts.
* Place programs and input files that access files by relative path in /tmp and execute them there.
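As an illustration, the following job script sketch stages an input file into /tmp, runs a program there, and copies the result back before the job ends. The queue name, program name (./a.out), and file names (input.data, result.data) are placeholders; adjust them to your environment.

```nohighlight
#!/bin/bash
#============ SBATCH Directives =======
#SBATCH -p gr19999b
#SBATCH -t 2:0:0
#SBATCH --rsc p=1:t=1:c=1:m=8G
#SBATCH -o %x.%j.out
#============ Shell Script ============
# Stage the input file into the job-private /tmp area (hypothetical file names)
cp input.data /tmp/
# Run the program with /tmp as the working directory
cd /tmp
srun ${SLURM_SUBMIT_DIR}/a.out input.data
# Copy the result back before the job ends; /tmp is deleted automatically afterwards
cp result.data ${SLURM_SUBMIT_DIR}/
```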
### Available /tmp capacity

The /tmp capacity available to each job is calculated as `number of processes x number of cores per process x capacity per core for the system (see the following table)`.

|System Name|Capacity per core|
| --- | --- |
|System A|2.4G|
|System B|8.9G|
|System C|8.9G|
|System G|15.6G|
|Cloud|94G|

For example, if a job with 4 processes (8 cores per process) is submitted on System B, then from `4 x 8 x 8.9`, **284.8GB** of /tmp capacity will be allocated.

## Specify the order of the job execution{#dependency}

You can control when a submitted job starts according to the execution status of jobs that have already been submitted. Specifically, add the following to the job submission options section.

```nohighlight
#SBATCH -d <command specifying the execution order>:<job ID of an already submitted job>
```

For example, if you wish to execute job script B only after job script A has completed successfully, use the following procedure.

1. Submit job script A.
2. If the job ID assigned in step 1 is "200", write job script B as follows.

    ```nohighlight
    #!/bin/bash
    #============ SBATCH Directives =======
    #SBATCH -p gr19999b
    #SBATCH -t 2:0:0
    #SBATCH --rsc p=4:t=8:c=8:m=8G
    #SBATCH -d afterok:200
    #SBATCH -o %x.%j.out
    #============ Shell Script ============
    srun ./a.out
    ```

3. Submit job script B.

You can choose the command specifying the execution order from the following four options.

|Command specifying the execution order|Meaning|
| --- | --- |
|after|Execute the job after the specified job starts.|
|afterany|Execute the job after the specified job ends.|
|afterok|Execute the job if the specified job ends normally. If the specified job ends abnormally, the job ends without being executed.|
|afternotok|Execute the job if the specified job ends abnormally. If the specified job ends normally, the job ends without being executed.|

## Execution of array jobs{#arrayjob}

The array job function lets you execute multiple jobs with different parameters from a single job script. Specifically, add the following to the job submission options section.

```nohighlight
#SBATCH -a <start_num>-<end_num>[option]
```

For example, the following job script passes 1.data, 2.data, and 3.data (placed in the same directory as the job script) to ./a.out for analysis.

```nohighlight
#!/bin/bash
#============ SBATCH Directives =======
#SBATCH -p gr19999b
#SBATCH -t 2:0:0
#SBATCH --rsc p=4:t=8:c=8:m=8G
#SBATCH -a 1-3
#SBATCH -o %x.%A_%a.out
## %x is replaced by the job name, %A by the array job ID, and %a by the array task ID.
#============ Shell Script ============
srun ./a.out ${SLURM_ARRAY_TASK_ID}.data
```

The following options can be set in the [option] field.

|Description in [option]|Execution details|
| --- | --- |
|:[0-9]|Execute every specified number of steps. For example, if you specify "1-5:2", tasks 1, 3, and 5 are executed.|
|%[0-9]|Set the specified number as the maximum number of simultaneous executions. For example, if you specify "%2", at most 2 tasks run at the same time.|
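The two options can also be combined. For example, the following directive (a hypothetical setting based on the table above) would execute tasks 1, 3, 5, 7, and 9, with at most two tasks running at the same time.

```nohighlight
#SBATCH -a 1-9:2%2
```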
## How to execute multiple programs simultaneously on a single computing node{#multirun}

You can use computing resources effectively by executing multiple programs simultaneously as a single job. To do so, describe the execution commands in a shell script and execute that shell script from the job script.

※This method is for sequential programs or programs that are thread-parallelized with OpenMP or automatic parallelization.
※To execute multiple MPI programs simultaneously as a single job, please refer to [How to execute multiple programs simultaneously (MPMD)](#mpmd).

An example of each script is shown below.

#### Sequential Execution (1 thread per program)

**Job script to be executed by the sbatch command (sequential)**

※The following script is a sample that executes four programs simultaneously.

```nohighlight
#!/bin/bash
#============ SBATCH Directives =======
#SBATCH -p gr19999b
#SBATCH --rsc p=4:t=1:c=1:m=3G
#SBATCH -o %x.%j.out
#============ Shell Script ============
srun ./multiprocess.sh
```

* Specify the queue with the -p option.
* Specify the resources to be used with the --rsc option as follows.
    * The argument p is the number of processes to be used.
        * Specify the number of programs you wish to execute simultaneously. In this case, 4.
    * The argument c is the number of cores to be used per process (program).
        * Since each program is single-threaded, specify 1.
    * The argument t is the number of threads to be used per process (program).
        * Since each program is single-threaded, specify 1.
    * The argument m is the amount of memory per process (program).
* Specify the path of the **shell script** to be executed after the srun command.

**Shell script to be executed in the job script (sequential)**

```nohighlight
#!/bin/bash
case $SLURM_PROCID in
    0) ./a1.out ;;
    1) ./a2.out ;;
    2) ./a3.out ;;
    3) ./a4.out ;;
esac
```

* When resources are allocated, a rank number is assigned to each process.
* When the process executes, it can read its assigned rank number from the environment variable SLURM_PROCID.
* By writing a shell script that branches on the rank number, you can run a different program in each process. This allows multiple programs to be executed simultaneously as a single job. A variation that passes a per-rank argument to a single program is sketched below.
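If the simultaneous runs differ only in their input, the same branching technique is not even needed: the rank number can select the input directly. A minimal sketch, assuming a hypothetical program ./a.out that takes an input file name:

```nohighlight
#!/bin/bash
# Each rank analyzes its own input file (0.data, 1.data, ...)
./a.out ${SLURM_PROCID}.data
```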
#### OpenMP Execution

**Job script to be executed by the sbatch command (OpenMP)**

※The following script is a sample that executes eight 4-thread programs simultaneously.

```nohighlight
#!/bin/bash
#============ SBATCH Directives =======
#SBATCH -p gr19999b
#SBATCH -t 2:0:0
#SBATCH --rsc p=8:t=4:c=4:m=3G
#SBATCH -o %x.%j.out
#============ Shell Script ============
srun ./multiprocess.sh
```

* Specify the queue with the -p option.
* Specify the resources to be used with the --rsc option as follows.
    * The argument p is the number of processes to be used.
        * Specify the number of programs you wish to execute simultaneously. In this case, 8.
    * The argument c is the number of cores to be used per process (program).
        * Since each program runs with 4 threads, specify 4.
    * The argument t is the number of threads to be used per process (program).
        * Since each program runs with 4 threads, specify 4.
    * The argument m is the amount of memory per process (program).
* Specify the path of the **shell script** to be executed after the srun command.

**Shell script to be executed in the job script (OpenMP)**

```nohighlight
#!/bin/bash
case $SLURM_PROCID in
    0) ./aaa.out ;;
    1) ./bbb.out ;;
    2) ./ccc.out ;;
    3) ./ddd.out ;;
    4) ./eee.out ;;
    5) ./fff.out ;;
    6) ./ggg.out ;;
    7) ./hhh.out ;;
esac
```

* When resources are allocated, a rank number is assigned to each process.
* When the process executes, it can read its assigned rank number from the environment variable SLURM_PROCID.
* By writing a shell script that branches on the rank number, you can run a different program in each process. This allows multiple programs to be executed simultaneously as a single job.

By specifying "\#SBATCH --rsc t=4" in the job script, the job scheduler automatically configures each process to use 4 threads. Therefore, in the above shell script, each program runs with 4 threads. If you want to vary the number of threads per process, define **OMP_NUM_THREADS={number of threads you want to use}** for each process as follows.

```nohighlight
#!/bin/bash
case $SLURM_PROCID in
    0) export OMP_NUM_THREADS=1; ./aaa.out ;;
    1) export OMP_NUM_THREADS=2; ./bbb.out ;;
    2) export OMP_NUM_THREADS=2; ./ccc.out ;;
    3) export OMP_NUM_THREADS=3; ./ddd.out ;;
    4) export OMP_NUM_THREADS=3; ./eee.out ;;
    5) export OMP_NUM_THREADS=4; ./fff.out ;;
    6) export OMP_NUM_THREADS=4; ./ggg.out ;;
    7) export OMP_NUM_THREADS=4; ./hhh.out ;;
esac
```
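In both the sequential and OpenMP examples, srun launches the shell script directly, so the script must have execute permission. If it does not, grant it before submitting the job:

```nohighlight
chmod +x multiprocess.sh
```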
## How to execute multiple programs simultaneously (MPMD){#mpmd}

**Job script to be executed by the sbatch command (MPI)**

※The following script is a sample that simultaneously executes three MPI programs, each requiring 4 processes (4 cores and 4 threads per process).

```nohighlight
#!/bin/bash
#============ SBATCH Directives =======
#SBATCH -p gr19999b
#SBATCH -t 2:0:0
#SBATCH --rsc p=12:t=4:c=4:m=3G
#SBATCH -o %x.%j.out
#============ Shell Script ============
srun --multi-prog multiprocess.conf
```

* Specify the queue with the -p option.
* Specify the resources to be used with the --rsc option as follows.
    * The argument p is the number of processes to be used.
        * Specify the total number of processes of the programs you want to execute concurrently. (To execute three MPI programs that each use 4 processes, specify 12, from 4 x 3.)
    * The argument c is the number of cores to be used per process (program).
        * Since each program runs with 4 threads, specify 4.
    * The argument t is the number of threads to be used per process (program).
        * Since each program runs with 4 threads, specify 4.
    * The argument m is the amount of memory per process (program).
* Specify the srun command with the following option.
    * In --multi-prog, enter the path to the configuration file (multiprocess.conf, created below) that describes how the MPI programs are executed.

**Configuration file (multiprocess.conf)**

```nohighlight
## Execute as a 4-process MPI program
0-3 ./aaa.out
4-7 ./bbb.out
8-11 ./ccc.out
```

* In the first column, enter the range of process IDs used by the MPI program. (Note that process IDs must begin with zero.)
* In the second column, enter the path (absolute or relative) of the MPI program to be executed by the process IDs specified in the first column.

The number of processes can differ between MPI programs. For example, if you want aaa.out to use 4 processes and bbb.out to use 8 processes, create the following configuration file. However, the number of cores, threads, and memory per process is automatically set to the values specified by the --rsc option, so it cannot be customized for each MPI program.

```nohighlight
## Execute as a 4-process MPI program
0-3 ./aaa.out
## Execute as an 8-process MPI program
4-11 ./bbb.out
```
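For the configuration file above, the p argument of the --rsc option must again equal the total number of processes: 4 + 8 = 12. A sketch of the matching directives, reusing the sample queue and resource values from the examples above:

```nohighlight
#SBATCH -p gr19999b
#SBATCH --rsc p=12:t=4:c=4:m=3G
```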