Batch Processing (For System A)

A program run through batch processing is called a “job”. On System A, jobs are managed by the PBS job scheduler.

PBS maintains queues as pools of jobs. You request batch processing from the system by submitting a job to a queue. Note that our Supercomputer Systems provide a queue for each service course, so you execute jobs through the queue provided for your course.

System A reserves compute nodes exclusively at job execution time and then starts batch processing. Each job on System A is therefore guaranteed at least 68 CPU cores.

The syntax of a job script is essentially that of a shell script. A job script consists of an option area containing PBS job submission options, and a user program area containing the programs to execute.

$ cat sample.sh
#!/bin/bash
#============ PBS Options ============
#QSUB -q gr19999a
#QSUB -ug gr19999
#QSUB -W 2:00
#QSUB -A p=4:t=8:c=8:m=1355M
#============ Shell Script ============
aprun -n $QSUB_PROCS -d $QSUB_THREADS -N $QSUB_PPN ./a.out

In the option area of a job script, begin each line with “#QSUB” followed by a PBS option. For the available options, see PBS option.

The job script is submitted for execution with the qsub command.

Note that setup files such as .bashrc are not read automatically when a batch job is submitted.
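Because of this, any environment setup a program needs must be done explicitly inside the job script. A minimal sketch of the pattern (the settings file and variable are illustrative stand-ins, not real System A paths):

```shell
#!/bin/bash
# Batch jobs do not read setup files such as .bashrc automatically,
# so source the settings you need explicitly in the job script.
# Demo: create a stand-in settings file, source it, and use its variable.
settings=$(mktemp)
echo 'export MY_TOOL_HOME=/opt/mytool' > "$settings"  # stand-in for .bashrc
. "$settings"                                         # explicit sourcing
echo "MY_TOOL_HOME=$MY_TOOL_HOME"
rm -f "$settings"
```

In a real job script the sourcing line would typically be `. ~/.bashrc`, placed before the aprun command.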

Sample scripts for PBS jobs are available for download below.

Execution method Sample file
Non-parallel Download
Thread-parallel Download
Process-parallel Download
Hybrid-parallel Download

The lsf2pbs command converts LSF job scripts to PBS job scripts.

Syntax:

lsf2pbs lsf_script [pbs_script]

Example Usage:

[b59999@camphor1 script]$ cat lsf.sh
#!/bin/bash
#QSUB -q gr19999b
#QSUB -A p=1:t=1:c=1:m=1G
#QSUB -W 12:00
#QSUB -rn
#QSUB -u kyodai.taro.1a@kyoto-u.ac.jp
#QSUB -B
#QSUB -N
#====command====
aprun -n $LSB_PROCS -d $LSB_THREADS -N $LSB_PPN ./a.out
#====command====

[b59999@camphor1 script]$ lsf2pbs lsf.sh pbs.sh

[b59999@camphor1 script]$ cat pbs.sh
#!/bin/bash
#QSUB -q gr19999b
#QSUB -A p=1:t=1:c=1:m=1G
#QSUB -W 12:00
#QSUB -r n
#QSUB -M kyodai.taro.1a@kyoto-u.ac.jp
#QSUB -m be
#====command====
aprun -n $QSUB_PROCS -d $QSUB_THREADS -N $QSUB_PPN ./a.out
#====command====

By specifying PBS options in the option area of a job script, you can set job properties. If the upper limit of the elapsed time is not set with the -W option, the job is forcibly terminated one hour after its start. The PBS options are customized for the supercomputers of Kyoto University. See Distinctions from original PBS for details.

  • Main options

The main options work in the same way as those of LSF on the former system.

Option Description Example
-q QUEUENAME Specifying the queue for submitting a job. -q gr19999a
-ug GROUPNAME Specifying the effective group. -ug gr19999
-W HOUR:MINUTE Specifying the upper limit of the elapsed time. (Hour:Minute) -W 6:0
-A p=procs:t=threads:c=cores:m=memory Specifying resources for allocating to a job. -A p=4:t=8:c=8:m=2G


Argument of -A Description
p=procs Allocated number of processes when executing jobs.
t=threads Allocated number of threads per process when executing jobs. The environment variable “OMP_NUM_THREADS” is automatically set.
c=cores Allocated number of CPU cores per process when executing jobs. Generally set to the same value as t.
m=memory Upper limit of allocated memory per process when executing jobs (unit: M, G, T).
  • Default and max values for -A option in system A
Option Default value Max value Note
p 1 the standard amount of resources of your queue (in case of c=1:m=1355M)
t 1 68 in hyper-threading, max 272
c 1 68
m 1355M 92160M (90G)
  • How -A option works

    • p is the number of processes
    • c is the number of cores per one process
    • t is the number of threads per one process
    • p × c is the number of consumed cores (when m is at its default value)
    • If you increase m value larger than the default value, you will consume cores accordingly. See "Notes about Computing Resources Allocated for Job Execution."


  • Notes about Computing Resources Allocated for Job Execution

Under the PBS job scheduler, any CPU cores allocated for job execution are regarded as used, regardless of whether they were actually used. For example, when you execute a job with “-A p=8:t=1:c=1:m=23040M” on System A, each process uses one-fourth of the per-process memory limit (92160M), so only four processes per node can be placed under the memory-capacity limit. The job then actually uses eight CPU cores, but it is regarded as using 136 CPU cores because two nodes are allocated for its execution.

For example, for a job with p=68 to be regarded as using exactly 68 CPU cores, the value of m must be at most 90G/68 ≈ 1355M. This is the default value of m.

On System A, a single job occupies compute nodes exclusively. Therefore, even if only 34 CPU cores (equivalent to 0.5 nodes) are requested in the job script, 68 CPU cores (equivalent to one node) are regarded as used during job execution.
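The accounting described above can be reproduced with shell arithmetic. This is a sketch using the per-node figures from the text (68 cores and 92160M of memory per node); the variable names are illustrative:

```shell
#!/bin/bash
# How many cores a job is *regarded* as using on System A,
# for -A p=8:t=1:c=1:m=23040M (the example from the text).
p=8; m=23040                  # processes and memory (MB) per process
node_cores=68; node_mem=92160 # per-node figures for System A

procs_per_node=$(( node_mem / m ))                      # 92160/23040 = 4
nodes=$(( (p + procs_per_node - 1) / procs_per_node ))  # ceil(8/4)   = 2
counted=$(( nodes * node_cores ))                       # 2 * 68      = 136
echo "counted cores: $counted"

default_m=$(( node_mem / node_cores ))                  # 92160/68 = 1355
echo "default m: ${default_m}M"
```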

  • Other options
Option Description Example
-o FILENAME Specifying a filename for saving standard output. -o result.out
-e FILENAME Specifying a filename for saving standard error output. -e result.err
-M MAILADDR Specifying the email address for notifications. -M bar@sample.com
-m b Receiving a notification email when a job starts. -m b
-m e Receiving a notification email when a job completes. -m e
-m be Receiving notification emails at both the start and the end of a job. -m be
-r n Disabling re-execution of the job when a failure occurs. -r n
-N JOBNAME Specifying the job name displayed during execution. -N sample_job

  • Replaced and added options

The following options have been replaced or added relative to the original PBS for the convenience of users.

Option Type Meaning in our system Original meaning
-ug Addition Specifying the execution group. -
-W Replacement Specifying the upper limit of the elapsed time. configuration of job attributes
-A Replacement Specifying resources for allocating to a job. Adding an arbitrary explanatory text in the job
  • Prohibited options

The following options are invalid in our system.

Options Meaning in original PBS Notes
-S Specifying a shell for execution In our system, the execution shell is fixed to our customized bash.
-C Changing the directive prefix The directive prefix in our system is #QSUB and cannot be changed.
-a Deferring execution
-A Specifying accounting string In our system -A specifies resources for allocating to a job.
-h Holding job
-k Keeping output and error files on execution host
-u Specifying job username
-c Setting checkpoint
-G Submitting interactive GUI Jobs on Windows
-p Setting priority of job
-P Setting belonging project of job

You can set and reference environment variables in the user program area. The PBS environment variables are customized for the supercomputers of Kyoto University. See Distinctions from original PBS for details.

To set an environment variable, use the export command. To reference an environment variable, prefix the variable name with “$”.

  • Setting an environment variable

    #Format variablename=value; export variablename
    LANG=en_US.UTF-8; export LANG
  • Referencing an environment variable

    echo $LANG

PBS automatically sets some variables when executing a job. Representative environment variables are shown below.

Environment Variable Description
QSUB_JOBID Current job ID
QSUB_QUEUE Name of the queue the job was submitted to
QSUB_WORKDIR Directory from which the job was submitted
QSUB_PROCS Allocated number of processes when executing jobs
QSUB_THREADS Allocated number of threads per process when executing jobs
QSUB_CPUS Allocated number of CPU cores per process when executing jobs
QSUB_CPUS_ALLOC The number of CPU cores after converting memory capacity to cores
QSUB_MEMORY Upper limit of allocated memory per process when executing jobs
QSUB_PPN The number of placed processes per node when executing jobs
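A job script can reference these variables directly, for example to log the allocation at the start of a run. A sketch (the fallback defaults are illustrative so the snippet also runs outside a PBS job, where the variables are unset):

```shell
#!/bin/bash
# Log the allocation PBS recorded for this job. Outside a PBS job these
# variables are unset, so fall back to defaults for the demo.
echo "job id : ${QSUB_JOBID:-none}"
echo "queue  : ${QSUB_QUEUE:-none}"
echo "procs  : ${QSUB_PROCS:-1}"
echo "threads: ${QSUB_THREADS:-1}"
echo "memory : ${QSUB_MEMORY:-1355M}"
```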

The following environment variables have been replaced or added relative to the original PBS for the convenience of users.

Environment variables Type Original variable name
QSUB_JOBID Replacement PBS_JOBID
QSUB_QUEUE Replacement PBS_O_QUEUE
QSUB_WORKDIR Replacement PBS_JOBDIR
QSUB_PROCS Addition
QSUB_THREADS Addition
QSUB_CPUS Addition
QSUB_CPUS_ALLOC Addition
QSUB_MEMORY Addition
QSUB_PPN Addition

System A uses two different types of nodes: gateway nodes, which run job scripts, and compute nodes, which run programs. Therefore, to run a program on a compute node, regardless of whether it is a non-parallelized program or an MPI program, you MUST invoke it with the aprun command at the point of program execution in the job script.

Gateway nodes have fewer and less powerful resources than compute nodes. Therefore, when you execute anything without the aprun command, make sure it is a low-load program or command.

Using the PBS environment variables as in the example below, you can run both non-parallelized programs and MPI programs.

aprun -n $QSUB_PROCS -d $QSUB_THREADS -N $QSUB_PPN ./a.out

Frequently used aprun options are listed below. See the man aprun manual for details.

  • Basic options
Option Description
-n procs Specifying the number of processes
-d cores Specifying the number of CPU cores per process
-N procs_per_node Specifying the number of processes per node
  • Notes about CPU Time of Gateway Nodes

Gateway nodes handle the job scripts of all users and limit the CPU time of each job. If a program writes a large amount of data to standard output, the CPU time limit may be exceeded; at that point, the job is stopped. Therefore, when you output a large amount of data, redirect it to a file instead of writing it to standard output.
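For example, rather than letting the program print bulk data to standard output, redirect it to a file on the aprun line (result.dat is an illustrative file name):

```shell
# In a job script, redirect bulk program output to a file:
#   aprun -n $QSUB_PROCS -d $QSUB_THREADS -N $QSUB_PPN ./a.out > result.dat
# The same shell redirection, demonstrated with a plain command:
printf 'step 1 done\nstep 2 done\n' > result.dat
wc -l < result.dat   # the data went to the file, not the job's stdout
```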

  • Core assignment for user program

TBD

The KNL clustering mode is fixed to Quadrant mode.

Please specify one of the following options to change the MCDRAM mode. Note that a job may wait around 20 minutes to start when you specify an MCDRAM mode. A waiting job cannot be deleted with the qdel command.

Mode Option Note
Flat -mm flat
Cache -mm cache default mode

KNL has a structure in which tiles, each containing two cores, are arranged in a grid pattern.

("Knights Landing (KNL): 2nd Generation Intel(R) Xeon Phi(TM) Processor", Hot Chips 2016)

MCDRAM is a high-speed memory in the KNL chip, whose size is 16 GiB. Users can choose from the following modes.

  1. Flat mode

    The compute node maps MCDRAM and main memory to different address ranges, and programs can access either directly. By default, the MCDRAM address range follows the end of the main DRAM address range. By placing frequently used data in MCDRAM, program execution may be accelerated. Data can be placed in MCDRAM either through the Intel MKL library or with the numactl command. The MKL library attempts to use MCDRAM by default when MKL functions such as mkl_malloc are used in the program. For details about the MKL library, see here. The code below is an example of using numactl. Note that with this method the job can use only MCDRAM as its memory space, and if the job uses more than 16GB of memory it may be killed.

    #!/bin/bash
    
    #QSUB -q gr19999a
    #QSUB -ug gr19999
    #QSUB -A p=1:t=16:c=16:m=16G
    
    aprun -n $QSUB_PROCS -d $QSUB_THREADS -N $QSUB_PPN numactl -m 1 ./a.out
  2. Cache mode

    Uses the MCDRAM as a last-level cache. As with an ordinary cache, the system uses it automatically and users cannot access it directly. This mode is selected if you specify the -mm cache option or omit the -mm option in the job script.

The execution order of a job submitted to a queue is determined by the job-scheduling policy set for each queue.

In System A, the policies are fixed as shown below. (We plan to implement a function to change the policies in the future.)

Policy Setting
SCHEDULING_POLICY (policy for determining the execution order of submitted jobs) fcfs (Submitted jobs are executed on a first-come, first-served basis.)
PASSING_POLICY (policy for executing a job ahead of waiting jobs) pass (If the computing resources are sufficient to run a particular job, that job is executed ahead of jobs waiting to be executed.)

Job report
A job report is created when a job completes. The report includes the following two pieces of information:

  • Standard output of a job
  • Standard error output of a job

Files created by default
If no option related to report output is used, files are created following the naming convention shown below.

Description File name
Standard output Ammddhh.oXXXXXX
Standard error output Ammddhh.eXXXXXX
  • Two files are automatically created for each job.
  • In a file name, “mmddhh” is the month, day, and hour of the job execution time, and “XXXXXX” is the job ID.

If a job terminates abnormally, the reason for the abnormal termination is shown in the standard error output of the job report.

  • The message shown when a job is terminated by the elapsed-time limit (-W option).

    =>> PBS: job killed: walltime 3602 exceeded limit 3600
    aprun: Apid 379565: Caught signal Terminated, sending to application
  • The message shown when a job is terminated by the system due to the memory limit.

    [NID 00181] 2016-10-07 16:04:40 Apid 391976: OOM killer terminated this process.
    Application 392007 exit signals: Killed

You can check the status of available queues using the qstat command with the -q option.

Example

$ qstat -q
Queue            Memory CPU Time Walltime Node   Run   Que   Lm  State
---------------- ------ -------- -------- ---- ----- ----- ----  -----
ta                 --      --       --     --      0     0   --   E R
pa                 --      --       --     --      0     0   --   E R
gr19999a           --      --       --     --      0     0   --   E R
                                               ----- -----
                                                   0     0
  • You can submit jobs to the queue by specifying the queue using the -q option in the job script.
  • Available queues differ for each service course. The queue names are shown below.
Service Course Queue Name
Personal course pa
Group course Groupname+a
  • When you use the group course queue, you need to specify the group using the -ug option. Such specification is not required for the entry course and personal course.

  • The queue named “ta” is displayed when executing the qstat command, but batch processing jobs cannot be submitted to it.

  • In the sixth and seventh columns from the left, Run shows the number of running jobs and Que shows the number of pending jobs.

You can submit jobs to the queue using the qsub command.

Example

$ qsub sample.sh
11.pbs
  • Enter a job script file name after the command name.
  • The job script is submitted to the system, and the job ID is displayed.
  • When you use the group course queue, you need to specify the group using #QSUB -ug option in the job script.

You can check the status of the submitted jobs using the qstat command.

Example 1

$ qstat
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
86.pbs            qsubtest.sh      b59999            00:00:00 R workq
  • Both pending jobs and running jobs among your submitted jobs are displayed.
  • The status is displayed in the S column, the fifth column from the left. “Q” means pending; “R” means running.
  • "H" means the "hold" status, which is displayed when the job has encountered an error or similar. When "H" is displayed, cancel the job with the qdel command.

To execute multiple programs on a single computation node, see "Running Multiple Programs on a Single Computation Node".

Example 2

[b59999 ~]$ qstat -f 10
Job Id: 10.pbs
Job_Name = jobscript.sh
Job_Owner = b59999@pbs.kudpc.kyoto-u.ac.jp
job_state = Q
queue = gr19999x
server = pbs
Checkpoint = u
ctime = Fri Jun 10 14:55:26 2016
Error_Path = camphor1.kudpc.kyoto-u.ac.jp:/usr/home/b59999/jobscript.sh.e10
Hold_Types = n
Join_Path = n
Keep_Files = n
Mail_Points = a
mtime = Fri Jun 10 14:55:26 2016
Output_Path = camphor1.kudpc.kyoto-u.ac.jp:/usr/home/b59999/jobscript.sh.o10
Priority = 0
qtime = Fri Jun 10 14:55:26 2016
Rerunable = True
Resource_List.ncpus = 1
Resource_List.nodect = 1
Resource_List.place = pack
Resource_List.select = 1:ncpus=1
Resource_List.walltime = 01:00:00
substate = 10
Variable_List = PBS_O_SYSTEM=Linux,PBS_O_SHELL=/bin/bash,
    PBS_O_HOME=/usr/home/b59999,PBS_O_LOGNAME=b59999,
    PBS_O_WORKDIR=/usr/home/b59999,PBS_O_LANG=ja_JP.UTF-8,
    PBS_O_PATH=/usr/lib64/qt-3.3/bin:/usr/home/b59999/perl5/bin:/opt/pbs/
    default/bin:/opt/java/jdk1.7.0_45/bin:/usr/local/emacs-23.4/bin:/opt/ap
    p/intel/impi/5.0.3.049/bin64:/opt/app/intel/composer_xe_2015.6.233/bin/
    intel64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibutils/
    bin:/usr/home/b59999/.local/bin:/usr/home/b59999/bin,
    PBS_O_MAIL=/var/spool/mail/b59999,PBS_O_QUEUE=gr19999x,
    PBS_O_HOST=pbs.kudpc.kyoto-u.ac.jp
comment = Not Running: Not enough free nodes available
etime = Fri Jun 10 14:55:26 2016
Submit_arguments = jobscript.sh
project = _pbs_project_default
  • You can check the job information details using the -f option.

You can cancel the submitted jobs using the qdel command.

(Example)

$ qdel 44
  • Specify the job ID to the argument.
  • You can specify both pending jobs and running jobs.

You can check the job information details using the qs command.

(Example)

$ qs
 QUEUE     USER     JOBID   STATUS PROC THRD CORE   MEM ELAPSE( limit)
 gr19999a  w00001   5610    RUN       4   32   32   10G  01:12( 20:00)
 gr19999a  w00001   5611    PEND      1   32   32   20G  00:00( 01:00)

You can check the progress of the running job using the qcat command.
The -o option is used for displaying the standard output of jobs, the -e option is used for displaying the standard error output.

Example 1

$ qcat -o 5610
Tue May  1 00:00:01 JST 2012
Subroutine A step1 finished
Subroutine A step2 finished
Subroutine A step3 finished
  • Specify the job ID to the argument.
  • The standard output of the corresponding job is displayed.

Example 2

    $ qcat -e 5610
    Tue May  1 00:00:01 JST 2012
    STDERR 1
  • Specify the job ID to the argument.
  • The standard error output of the corresponding job is displayed.

You can check the job status of your group using the qgroup command, and the detailed information of each job using the -l option.

(Example 1)

$ qgroup
 QUEUE    SYS |   RUN  PEND OTHER | ALLOC ( MIN/ STD/ MAX) | READY
----------------------------------------------------------------
 gr19999a  A  |     1     0     0 |    64 (  32/ 128/ 256) | 64
 gr19999a  A  |     0     0     0 |     0 (  64/ 256/ 512) | 192
 
 QUEUE    USER     |   RUN(ALLOC)  PEND(REQUEST) OTHER(REQUEST)
----------------------------------------------------------------
 gr19999a b59999   |     1(   64)     0(     0)     0(     0)
  • The queue information of the group course and the statistics of each user are displayed.

Example 2

$ qgroup -l
 QUEUE    SYS |   RUN  PEND OTHER | ALLOC ( MIN/ STD/ MAX) | READY
----------------------------------------------------------------
 gr19999a  A  |     1     1     0 |    64 (  32/ 128/ 256) | 64
 gr19999a  A  |     0     0     0 |     0 (  64/ 256/ 512) | 192
 
 QUEUE    USER           |   RUN(ALLOC)  PEND(REQUEST) OTHER(REQUEST)
 ----------------------------------------------------------------
 gr19999a taro123kyoto   |     1(   64)     0(     0)     0(     0)
 gr19999a b59999         |     0(    0)     1(    32)     0(     0)
 
 QUEUE    USER          JOBID   | STAT  SUBMIT_AT        | ALC/REQ | PROC THRD CORE    MEM  ELAPSE
------------------------------------------------------------------------------------------------
 gr19999a taro123kyoto  104545  | RUN   2012-06-06 15:10 |      64 |   32    1    1  2000M   00:30
 gr19999a b59999        104546  | PEND  2012-06-06 15:10 |      32 |    1    1    1   500M   00:30
  • With the -l option, the queue information of the group course, the statistics of each user, and information on each job are displayed.
  • The number of cores in the “ALC/REQ” column shows the amount of resources used by the jobs, including cores added according to memory usage.

Header Overview
RUN, PEND, OTHER The number of jobs
ALLOC The number of assigned cores (cores are incremented according to the amount of memory used; System A: one core per 1355MB)
MIN The guaranteed minimum number of cores for the queue.
STD The standard number of cores for the queue.
MAX The maximum number of cores for the queue.
READY The number of cores that are available immediately.

Header Overview
RUN (ALLOC) The number of running jobs and the amount of assigned resources. (in the number of cores)
PEND (REQUEST) The number of waiting jobs and the amount of their requested resources. (in the number of cores)
OTHER (REQUEST) The number of jobs in states other than above and the amount of their requested resources. (in the number of cores)

Header Overview
STAT Batch request state.
SUBMIT_AT Batch request queuing date and time.
ALC/REQ The number of assigned cores.
PROC Allocated number of processes when executing jobs.
THRD Allocated number of threads per process when executing jobs.
CORE Allocated number of CPU cores per process when executing jobs.
MEM Upper limit of allocated memory per process when executing jobs.
ELAPSE Upper limit of the elapsed time.

For the differences from LSF that is supported on the previous Supercomputer System, see For Users of the Previous System.

By running multiple programs at the same time in one job, you can use computation resources effectively.
To run multiple programs simultaneously as one job, write multiple execution commands in a shell script and execute that shell script with the aprun command.

  • This method applies to sequential programs and to thread-parallelized programs using OpenMP or automatic parallelization.
  • Multiple MPI programs cannot be executed simultaneously as one job.

An example of each script is shown below.

When executing programs compiled with the Cray compiler, you need to set the following environment variables. ※If these settings are missing, problems arise in the calculation results.

export PMI_NO_FORK=1
export PMI_NO_PREINITIALIZE=1

Job script to be executed with the qsub command (sequential)

※The following script is a sample for executing 4 programs simultaneously.

#!/bin/bash
#QSUB -q gr19999a
#QSUB -ug gr19999
#QSUB -A p=4:t=1:c=1:m=3G
aprun -d $QSUB_THREADS -n $QSUB_PROCS -N $QSUB_PPN -b sh ./multiprocess.sh
  • Specify the queue with the -q option.
  • Specify the user group with the -ug option.
  • Specify the resource to be used with the -A option as shown below.
    • For argument p, the number of processes to use.
      • Please specify the number of programs you want to execute at the same time. Here it is 4.
    • For argument c, the number of cores used per process (program) .
      • Since the program is single threaded execution, specify 1.
    • For argument t, the number of threads used per process (program).
      • Since the program is single threaded execution, specify 1.
    • For argument m, the amount of memory used per process (program).
  • Specify "-d $QSUB_THREADS", "-n $QSUB_PROCS", and "-N $QSUB_PPN" after the aprun command, and give the path of the shell script to execute after "-b sh".

Shell script to be executed in the job script (sequential)

#!/bin/bash
case $ALPS_APP_PE in
    0) ./a1.out ;;
    1) ./a2.out ;;
    2) ./a3.out ;;
    3) ./a4.out ;;

esac
  • When resources are secured, a rank number is assigned to each process.
  • When a process runs, it can read its assigned rank number from the environment variable ALPS_APP_PE.
  • By creating a shell script that branches on the rank number, you can assign the program to run on each core. This makes it possible to execute multiple programs simultaneously as one job.
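The branching can be tried locally by setting ALPS_APP_PE by hand. A sketch in which echo stands in for the real programs (a1.out .. a4.out):

```shell
#!/bin/bash
# Local sketch of multiprocess.sh: each copy that aprun launches reads
# its rank from ALPS_APP_PE and runs a different program (echo stands
# in for ./a1.out .. ./a4.out here).
run_for_rank() {
    case $ALPS_APP_PE in
        0) echo "rank 0 runs a1.out" ;;
        1) echo "rank 1 runs a2.out" ;;
        2) echo "rank 2 runs a3.out" ;;
        3) echo "rank 3 runs a4.out" ;;
    esac
}
# Simulate the four ranks that aprun would start in parallel:
for r in 0 1 2 3; do
    ALPS_APP_PE=$r run_for_rank
done
```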

Job script to be executed with the qsub command (OpenMP)

※The following script is a sample for executing eight 4-thread programs simultaneously.

#!/bin/bash
#QSUB -q gr19999a
#QSUB -ug gr19999
#QSUB -A p=8:t=4:c=4:m=3G
aprun -d $QSUB_THREADS -n $QSUB_PROCS -N $QSUB_PPN -cc none -b sh ./multiprocess.sh
  • Specify the queue with the -q option.
  • Specify the user group with the -ug option.
  • Specify the resource to be used with the -A option as shown below.
    • For argument p, the number of processes to use.
      • Please specify the number of programs you want to execute at the same time. Here it is 8.
    • For argument c, the number of cores used per process (program) .
      • Since each program uses 4 threads, specify 4.
    • For argument t, the number of threads used per process (program).
      • Since each program uses 4 threads, specify 4.
    • For argument m, the amount of memory used per process (program).
  • Specify "-d $QSUB_THREADS", "-n $QSUB_PROCS", "-N $QSUB_PPN", and "-cc none" after the aprun command, and give the path of the shell script to execute after "-b sh".

Shell script to be executed in the job script(OpenMP)

#!/bin/bash
case $ALPS_APP_PE in
    0)  ./aaa.out ;;
    1)  ./bbb.out ;;
    2)  ./ccc.out ;;
    3)  ./ddd.out ;;
    4)  ./eee.out ;;
    5)  ./fff.out ;;
    6)  ./ggg.out ;;
    7)  ./hhh.out ;;

esac
  • When resources are secured, a rank number is assigned to each process.
  • When a process runs, it can read its assigned rank number from the environment variable ALPS_APP_PE.
  • By creating a shell script that branches on the rank number, you can assign the program to run to each process. This makes it possible to execute multiple programs simultaneously as one job.

By specifying "#QSUB -A t=4" in the job script, the qsub system automatically sets 4 threads per process. Therefore, in the above shell script, each program runs with 4 threads.

If you want to change the number of threads for each process, set OMP_NUM_THREADS={number of threads you want to use} for each process as shown below.

#!/bin/bash
case $ALPS_APP_PE in
    0) export OMP_NUM_THREADS=1; ./aaa.out ;;
    1) export OMP_NUM_THREADS=2; ./bbb.out ;;
    2) export OMP_NUM_THREADS=2; ./ccc.out ;;
    3) export OMP_NUM_THREADS=3; ./ddd.out ;;
    4) export OMP_NUM_THREADS=3; ./eee.out ;;
    5) export OMP_NUM_THREADS=4; ./fff.out ;;
    6) export OMP_NUM_THREADS=4; ./ggg.out ;;
    7) export OMP_NUM_THREADS=4; ./hhh.out ;;

esac

By running multiple programs in a single job, the resources of a single compute node can be used more effectively. However, multiple MPI programs cannot be run in a single job at the same time, so this method applies to non-parallelized programs or thread-parallelized programs using OpenMP or automatic parallelization.

To run multiple programs in a single job at the same time, include multiple execution commands in a shell script, and then execute the shell script using the aprun command. Examples of each script are shown below.

Note that this method cannot be used for a program compiled with the Cray compiler.

Job script executed by the qsub command

#!/bin/bash
#QSUB -q gr19999a
#QSUB -A p=1:t=1:c=32:m=61440M
aprun -d $QSUB_CPUS -b sh ./multirun.sh
  • Specify the resources you use with the -A option. Specify the number of cores you use in argument c, and the total amount of memory you use in argument m.
  • Specify “-d $QSUB_CPUS” and “-b sh” in the aprun command’s option.
  • Specify the shell script not the execution file in the aprun command’s argument.

Shell program executed in a job script (non-parallelized)

#!/bin/bash
./a01.out &
./a02.out &
(Omitted)
./a32.out &
wait
  • List the non-parallelized programs that you want to execute in a single job. Append & to the end of each execution command to run the program in the background.
  • Include the wait command at the end of the shell script to synchronize.
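The background-and-wait pattern itself can be tried with stand-in commands (sleep and echo replace the a01.out .. a32.out programs):

```shell
#!/bin/bash
# Sketch of the background-and-wait pattern with stand-in commands.
( sleep 0.1; echo "program 1 done" ) > p1.log &
( sleep 0.1; echo "program 2 done" ) > p2.log &
wait                 # block until every background program has finished
cat p1.log p2.log
```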

Shell program executed in a job script (OpenMP)

#!/bin/bash
export OMP_NUM_THREADS=4
./aaa.out &
export OMP_NUM_THREADS=12
./bbb.out &
export OMP_NUM_THREADS=16
./ccc.out &
wait
  • List the OpenMP programs that you want to execute in a single job. Append & to the end of each execution command to run the program in the background.
  • The number of OpenMP threads can be set with the environment variable OMP_NUM_THREADS, so each program can run with a different degree of parallelism.
  • Include the wait command at the end of the shell script to synchronize.

In addition to the default 4KB virtual-memory page size, the compute nodes in System A support multiple page sizes as below:

Page size Modulefile
2MB craype-hugepages2M
8MB craype-hugepages8M
16MB craype-hugepages16M
64MB craype-hugepages64M
128MB craype-hugepages128M
256MB craype-hugepages256M
512MB craype-hugepages512M

To change the virtual-memory page size, you MUST load the module file of the desired size listed above and then compile the program. You MUST also load the same module file when executing the job.

When submitting a job, you MUST specify the amount of hugepage memory (MemorySize) needed per process using the aprun option below. If the specified MemorySize is close to the maximum memory capacity per node (90GB), allocation of hugepage memory may fail, so specify only the required amount.

Option Description
-m MemorySizehs Amount of hugepage memory needed per process (unit: MB)

(1) Before compilation, load the module file of the desired page size.
$ module load craype-hugepages2M
$ ftn test.f90
(2) Before execution of the job, specify the amount of hugepage memory (MemorySize)
$ cat jobscript.sh
(Omitted)
module load craype-hugepages2M
aprun -n $QSUB_PROCS -d $QSUB_THREADS -N $QSUB_PPN -m20000hs ./a.out
$ qsub jobscript.sh

For details, see man intro_hugepages, man libhugetlbfs, and man aprun.

Hyper-threading is available on the processors in System A. A single physical CPU core can be used as up to 4 virtual cores. In the PBS -A option, specify the number of threads (“t”) and cores (“c”) per process so that “t” is N times “c”; N threads will then run on each CPU core (N = 1, 2, 3, 4).

(Example)

$ tssrun -A p=8:t=4:c=2 ./a.out
  • In the example above, two CPU cores are allocated per process, and four threads are activated (two threads per core).
  • A total of 16 CPU cores will be reserved for the 8 processes as a whole, and 32 threads will become available.

You can control the execution order of jobs with the -W option, which is specified at submission time.

-W depend=afterok:<jobid>

For example, suppose you want to submit b.sh so that it runs after a.sh ends. If a.sh has already been submitted and its job ID is 10000.ja, the following command submits b.sh to run after a.sh ends.

$ qsub -W depend=afterok:10000.ja b.sh

When you type qstat or qs, you will see that b.sh remains in the "HOLD" state until a.sh ends.

To submit a.sh and b.sh at once, you can use a script like the following.

e.g.) using bash

  • script (assume its name is depend.sh)
    #!/bin/bash
    JOBID=`qsub a.sh`
    qsub -W depend=afterok:$JOBID b.sh
  • submission
    $ sh depend.sh
  • result of qs command
    QUEUE     USER     JOBID      STATUS  PROC THRD CORE    MEM ELAPSE( limit)
    gr19999a  w00001   116316     RUN        1    1    1  1355M  00:01( 01:00)
    gr19999a  w00001   116317     HOLD       1    1    1  1355M  00:00( 01:00)

To use array jobs, which submit a set of similar jobs at once, specify the -J option in your job script.

#QSUB -J <start number>-<end number>[:step]

For example, if you specify -J 1-4, the job script is submitted four times in succession. With the -J option, the counter $PBS_ARRAY_INDEX is set and incremented for each submitted instance of the job script.

e.g.) if you want to submit four jobs: ./a.out < 1.data, ./a.out < 2.data, ..., ./a.out < 4.data

  • jobscript (assume its name is array.sh)

    #!/bin/bash
    
    #QSUB -q gr19999a
    #QSUB -ug gr19999
    #QSUB -A p=68:t=1:c=1:m=1355M
    #QSUB -J 1-4
    
    aprun  -n $QSUB_PROCS -d $QSUB_THREADS -N $QSUB_PPN  ./a.out < ${PBS_ARRAY_INDEX}.data
  • submission
    $ qsub array.sh
  • result of qs command
    $ qs
    QUEUE     USER     JOBID        STATUS  PROC THRD CORE    MEM ELAPSE( limit)
    gr19999a  b59999   3023275[1]   PEND       68    1    1  1355M  00:00( 01:00)
    gr19999a  b59999   3023275[2]   PEND       68    1    1  1355M  00:00( 01:00)
    gr19999a  b59999   3023275[3]   PEND       68    1    1  1355M  00:00( 01:00)
    gr19999a  b59999   3023275[4]   PEND       68    1    1  1355M  00:00( 01:00)

./a.out < 1.data is executed by job 3023275[1], ./a.out < 2.data by job 3023275[2], and so on.

Intel® Xeon Phi™ Processor Software Optimization Guide


Copyright © Academic Center for Computing and Media Studies, Kyoto University, All Rights Reserved.