Batch Processing (For Systems B and C)

A program run via batch processing is called a "job." Jobs are managed by the PBS job scheduler on Systems B and C.

PBS maintains queues as pools of jobs. You request batch processing from the System by submitting a job to a queue. Note that our Supercomputer System provides a dedicated queue for each service course; you execute jobs using the queue provided for your course.

In System B, a batch job starts after the CPU cores and memory specified with the PBS options have been reserved. When a job uses multiple nodes, those nodes are reserved exclusively before the batch processing starts.

In System C, nodes are not reserved exclusively, even when a job runs on multiple nodes.

You can switch dynamically between the System B and System C environments. Type module list to check the current system. The following example shows that System B (pbs/SystemB) is loaded.

$ module list
Currently Loaded Modulefiles:
  1) pbs/SystemB        3) impi/2017.1
  2) intel/17.0.1.132   4) PrgEnv-intel/1.0

Type one of the following commands to switch between systems.

  • from system B to C
    module switch pbs/SystemB pbs/SystemC
  • from system C to B
    module switch pbs/SystemC pbs/SystemB

The syntax of a job script is essentially that of a shell script. A job script consists of an option area containing PBS job submission options, and a user program area containing the programs to execute.

$ cat sample.sh
#!/bin/bash
#============ PBS Options ============
#QSUB -q gr19999b
#QSUB -ug gr19999
#QSUB -W 2:00
#QSUB -A p=4:t=8:c=8:m=3413M
#============ Shell Script ============
cd $QSUB_WORKDIR

mpiexec.hydra ./a.out

In the option area of a job script, write "#QSUB" at the beginning of each line, followed by a PBS option. For the available options, see PBS Options.

Note that setup files such as .bashrc are NOT read automatically.
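If your program relies on settings from .bashrc, you can source it explicitly at the top of the user program area. A minimal sketch, reusing the sample queue name from above (./a.out is a placeholder program):

```shell
#!/bin/bash
#============ PBS Options ============
#QSUB -q gr19999b
#============ Shell Script ============
# .bashrc is not read automatically in PBS jobs, so load it by hand
# if the program depends on its settings (PATH, aliases, etc.).
if [ -f "$HOME/.bashrc" ]; then
    . "$HOME/.bashrc"
fi
cd $QSUB_WORKDIR
./a.out
```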

Sample scripts for PBS jobs are available for download for the following execution methods:

  • Non-parallel
  • Thread parallelism
  • Process parallelism (Intel MPI)
  • Hybrid parallelism

The lsf2pbs command converts job scripts written for LSF into scripts for PBS.

Syntax:

lsf2pbs lsf_script [pbs_script]

Example Usage:

[b59999@laurel1 script]$ cat lsf.sh
#!/bin/bash
#QSUB -q gr19999b
#QSUB -A p=1:t=1:c=1:m=1G
#QSUB -W 12:00
#QSUB -rn
#QSUB -u kyodai.taro.1a@kyoto-u.ac.jp
#QSUB -B
#QSUB -N
#====command====
mpiexec.hydra ./a.out
#====command====

[b59999@laurel1 script]$ lsf2pbs lsf.sh pbs.sh

[b59999@laurel1 script]$ cat pbs.sh
#!/bin/bash
#QSUB -q gr19999b
#QSUB -A p=1:t=1:c=1:m=1G
#QSUB -W 12:00
#QSUB -r n
#QSUB -M kyodai.taro.1a@kyoto-u.ac.jp
#QSUB -m be
#====command====
mpiexec.hydra ./a.out
#====command====

By specifying PBS options in the option area of a job script, you can set job properties. If no elapsed-time limit is set with the -W option, the job is forcibly terminated one hour after it starts. The PBS options are customized for the Kyoto University supercomputers. See Distinctions from original PBS for details.

  • Main options
Option Description Example
-q QUEUENAME Specifies the queue to submit the job to. -q gr19999b
-ug GROUPNAME Specifies the effective group. -ug gr19999
-W HOUR:MINUTE Specifies the elapsed-time limit (hours:minutes). -W 6:0
-A p=procs:t=threads:c=cores:m=memory Specifies the resources to allocate to the job. -A p=4:t=8:c=8:m=3413M


Argument of -A Description
p=procs Number of processes allocated to the job.
t=threads Number of threads per process. The environment variable OMP_NUM_THREADS is set automatically.
c=cores Number of CPU cores per process. Usually set to the same value as t.
m=memory Upper limit of memory per process (units: M, G, T).
  • Default and max values for -A option in system B
Option Default value Max value Note
p 1 the standard amount of resources of your queue
t 1 36 (72 with hyper-threading)
c 1 36
m 3413M 122880M (120G)
  • Default and max values for -A option in system C
Option Default value Max value Note
p 1 the standard amount of resources of your queue
t 1 72 (144 with hyper-threading)
c 1 72
m 42666M 3000G
  • How -A option works

    • p is the number of processes
    • c is the number of cores per process
    • t is the number of threads per process
    • p × c is the number of consumed cores (when m is left at its default)
    • If you increase m above the default, additional cores are consumed accordingly. See "Notes about Computing Resources Allocated for Job Execution."


  • Notes about Computing Resources Allocated for Job Execution

Under the PBS job scheduler, all CPU cores allocated to a job are counted as used, whether or not the job actually uses them. For example, when you run a job with "-A p=8:t=1:c=1:m=30G" on System B, each process requests one quarter of the per-node memory limit (120G), so at most four processes fit on a node and the job is spread over two nodes. The job actually uses only eight CPU cores, but it is counted as using 72 CPU cores because two whole nodes are allocated for its execution.

For example, on System B, for a job with p=36 to be counted as using exactly 36 CPU cores, m must be no more than 120G/36 ≈ 3413M. This is the default value of m.

On System B, when a job requires more than one node, the required number of whole nodes is allocated. For example, even if a job script requests 54 CPU cores (1.5 nodes), 72 CPU cores (2 nodes) are counted as used when the job runs. When a job requires less than one node, several jobs may share a node.
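As a rough illustration of this accounting (this is not an official PBS tool, just shell arithmetic using the System B figures above), the core count for the p=8, m=30G example works out as follows:

```shell
node_cores=36; node_mem_mb=122880        # one System B node: 36 cores, 120G memory
p=8; m_mb=$((30 * 1024))                 # the job requests -A p=8:t=1:c=1:m=30G
ppn=$(( node_mem_mb / m_mb ))            # processes that fit per node by memory: 4
nodes=$(( (p + ppn - 1) / ppn ))         # whole nodes needed: 2
echo $(( nodes * node_cores ))           # cores counted as used: 72
```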

On System C, nodes are not allocated exclusively, so only the requested number of cores is counted as used, regardless of how many nodes the job spans.

  • Other options
Option Description Example
-o FILENAME Specifies the file name for saving standard output. -o sample.out
-e FILENAME Specifies the file name for saving standard error output. -e sample.err
-M MAILADDR Specifies the email address for notifications. -M bar@sample.com
-m b Sends a notification email when the job starts. -m b
-m e Sends a notification email when the job completes. -m e
-m be Sends notification emails at both the start and the end of the job. -m be
-r n Disables re-execution of the job after a failure. -r n

  • Replacement and addition of the options

The following options have been replaced or added relative to the original PBS for users' convenience.

Option Type Meaning in our system Original meaning
-ug Addition Specifies the execution group. -
-W Replacement Specifies the elapsed-time limit. Configuration of job attributes
-A Replacement Specifies the resources to allocate to the job. Adds arbitrary explanatory text to the job
  • Prohibited options

The following options are invalid on our system.

Options Meaning in original PBS Notes
-S Specifying a shell for execution In our system the execution shell is fixed to our customized bash.
-C Changing the directive prefix The directive prefix in our system is #QSUB and cannot be changed.
-a Deferring execution
-A Specifying an accounting string In our system, -A specifies the resources to allocate to a job.
-h Holding job
-k Keeping output and error files on execution host
-u Specifying job username
-c Setting checkpoint
-G Submitting interactive GUI Jobs on Windows
-p Setting priority of job
-P Setting belonging project of job

You can set and reference environment variables in the user program area. The PBS environment variables are customized for the Kyoto University supercomputers. See Distinctions from original PBS for details.

To set an environment variable, use the export command. To reference an environment variable, put "$" at the beginning of the variable name.

  • Setting up the environment variable.

    #Format variablename=value; export variablename
    LANG=en_US.UTF-8; export LANG
  • Referring to the environment variable.

    echo $LANG

PBS automatically sets some environment variables when executing a job. Representative variables are shown below.

Environment Variable Description
QSUB_JOBID Current job ID
QSUB_QUEUE Name of the queue the job was submitted to
QSUB_WORKDIR Directory from which the job was submitted
QSUB_PROCS Number of processes allocated to the job
QSUB_THREADS Number of threads per process
QSUB_CPUS Number of CPU cores per process
QSUB_CPUS_ALLOC Number of CPU cores counted after converting the memory request into cores
QSUB_MEMORY Upper limit of memory per process
QSUB_PPN Number of processes placed per node

The following environment variables have been replaced or added relative to the original PBS for users' convenience.

Environment variables Type Original variable name
QSUB_JOBID Replacement PBS_JOBID
QSUB_QUEUE Replacement PBS_O_QUEUE
QSUB_WORKDIR Replacement PBS_JOBDIR
QSUB_PROCS Addition
QSUB_THREADS Addition
QSUB_CPUS Addition
QSUB_CPUS_ALLOC Addition
QSUB_MEMORY Addition
QSUB_PPN Addition

Use mpiexec.hydra to execute MPI programs (Intel MPI) on System B or C. The mpiexec.hydra command works in conjunction with PBS and automatically starts the number of processes specified by p in the -A option.

mpiexec.hydra ./a.out

For a job submitted to a queue, its execution order is determined by the job-scheduling policy set for that queue.

On Systems B and C, the policies are fixed as shown below. (We plan to make them configurable in the future.)

Policy Setting
SCHEDULING_POLICY
(policy for determining the execution order of submitted jobs)
fcfs
(Submitted jobs are executed on a first-come, first-served basis.)
PASSING_POLICY
(policy for executing a job ahead of waiting jobs)
pass
(If enough computing resources are available for a particular job, it is executed before jobs that are still waiting.)

Job Report

A job report is created when a job completes. The report includes the following three pieces of information:

  • Standard output of a job
  • Standard error output of a job
  • Job information

Files created by default

If no report-output options are used, files are created according to the naming convention shown below.

System Description File name
B Standard output/Job information Bmmddhh.oXXXXXX
Standard error output Bmmddhh.eXXXXXX
C Standard output/Job information Cmmddhh.oXXXXXX
Standard error output Cmmddhh.eXXXXXX
  • Two files are created automatically for each job.
  • In a file name, "mmddhh" is the month, day, and hour when the job was executed, and "XXXXXX" is the job ID.
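For instance, the date and job ID can be picked out of such a file name with plain shell parameter expansion (the file name B052714.o123456 below is made up for illustration):

```shell
f="B052714.o123456"          # hypothetical System B report file name
name=${f%%.*}                # part before the extension: B052714
mm=${name:1:2}               # month: 05
dd=${name:3:2}               # day:   27
hh=${name:5:2}               # hour:  14
jobid=${f#*.o}               # job ID: 123456
echo "$mm/$dd $hh:00 job $jobid"
```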

The job information in a PBS job report summarizes the job's execution.

JobId: 3943.ja01
        Job_Name = p1t1c1m90g.sh
        resources_used.cpupercent = 1
        resources_used.cput = 00:00:01
        resources_used.mem = 3708kb
        resources_used.ncpus = 272
        resources_used.vmem = 336264kb
        resources_used.walltime = 00:00:13
        queue = gr19999b

Each summary item is described below.

Item Description
JobId Job ID
Job_Name Job name
resources_used.cpupercent Peak CPU utilization (percent) of the job.
resources_used.cput Total CPU time used by the job.
resources_used.mem Peak physical memory usage of the job.
resources_used.ncpus Number of CPU cores counted as used by the job.
resources_used.vmem Peak virtual memory usage of the job.
resources_used.walltime Elapsed (wall-clock) time of the job.
queue Queue specified when the job was executed.

If a job terminates abnormally, the reason for the abnormal termination is shown in the job information section of the job report.

  • Message shown when the job is terminated for exceeding the elapsed-time limit (-W option):

    =>> PBS: job killed: walltime 3602 exceeded limit 3600
    aprun: Apid 379565: Caught signal Terminated, sending to application
  • Message shown when the job is terminated by the system due to the memory limit (when memory on the whole node is about to run out):

    [NID 00181] 2016-10-07 16:04:40 Apid 391976: OOM killer terminated this process.
    Application 392007 exit signals: Killed
  • Message shown when the job is terminated by the system due to the memory limit (when a process's memory usage exceeds the limit specified in the job script):

    /var/spool/PBS/mom_priv/jobs/1105396.jb.SC: line 28: 179454 Killed 

    See also FAQ

You can check the status of available queues using the qstat -q command.

(Example)

$ qstat -q
Queue            Memory CPU Time Walltime Node   Run   Que   Lm  State
---------------- ------ -------- -------- ---- ----- ----- ----  -----
eb                 --      --       --     --      0     0   --   E R
tb                 --      --       --     --      0     0   --   E R
tc                 --      --       --     --      0     0   --   E R
pb                 --      --       --     --      0     0   --   E R
pc                 --      --       --     --      0     0   --   E R
gr19999b           --      --       --     --      0     0   --   E R
                                               ----- -----
                                                   0     0
  • You can submit jobs to a queue by specifying it with the -q option in the job script.
  • Available queues differ for each service course. The queue names are as follows.
Service Course Queue Name
Entry course eb
Personal course pb,pc
Group course Groupname+b/c
  • When you use a group course queue, you must specify the group with the -ug option. This is not required for the entry and personal courses.

  • The queues named "tb" and "tc" appear in the qstat output, but batch jobs cannot be submitted to them.

  • In the sixth and seventh columns from the left, Run is the number of running jobs and Que is the number of queued (pending) jobs.

You can submit jobs to the queue using the qsub command.

(Example)

$ qsub sample.sh
1100.jb
  • Enter the job script file name after the command name.
  • The job script is submitted to the system, and its job ID is displayed.
  • When you use a group course queue, specify the group with the #QSUB -ug option in the job script.

You can check the status of the submitted jobs using the qstat command.

(Example 1)

$ qstat
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
8622.jb             qsubtest.sh      b59999            00:00:00 R gr19999b
  • Pending and running jobs among the submitted jobs are displayed.
  • The status is shown in the S column, the fifth column from the left: "Q" means pending, "R" means running.

(Example 2)

[b59999 ~]$ qstat -f 1033
Job Id: 1033.jb
Job_Name = jobscript.sh
Job_Owner = b59999@laurel.kudpc.kyoto-u.ac.jp
job_state = Q
queue = gr19999b
server = jb01
Checkpoint = u
ctime = Fri Jun 10 14:55:26 2016
Error_Path = laurel.kudpc.kyoto-u.ac.jp:/usr/home/b59999/jobscript.sh.e1033
Hold_Types = n
Join_Path = n
Keep_Files = n
Mail_Points = a
mtime = Fri Jun 10 14:55:26 2016
Output_Path = laurel.kudpc.kyoto-u.ac.jp:/usr/home/b59999/jobscript.sh.o1033
Priority = 0
qtime = Fri Jun 10 14:55:26 2016
Rerunable = True
Resource_List.ncpus = 1
Resource_List.nodect = 1
Resource_List.place = pack
Resource_List.select = 1:ncpus=1
Resource_List.walltime = 01:00:00
substate = 10
Variable_List = PBS_O_SYSTEM=Linux,PBS_O_SHELL=/bin/bash,
    PBS_O_HOME=/usr/home/b59999,PBS_O_LOGNAME=b59999,
    PBS_O_WORKDIR=/usr/home/b59999,PBS_O_LANG=ja_JP.UTF-8,
    PBS_O_PATH=/usr/lib64/qt-3.3/bin:/usr/home/b59999/perl5/bin:/opt/pbs/
    default/bin:/opt/java/jdk1.7.0_45/bin:/usr/local/emacs-23.4/bin:/opt/ap
    p/intel/impi/5.0.3.049/bin64:/opt/app/intel/composer_xe_2015.6.233/bin/
    intel64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibutils/
    bin:/usr/home/b59999/.local/bin:/usr/home/b59999/bin,
    PBS_O_MAIL=/var/spool/mail/b59999,PBS_O_QUEUE=gr19999x,
    PBS_O_HOST=laurel.kudpc.kyoto-u.ac.jp
comment = Not Running: Not enough free nodes available
etime = Fri Jun 10 14:55:26 2016
Submit_arguments = jobscript.sh
project = _pbs_project_default
  • The -f option displays detailed job information.

You can cancel the submitted jobs using the qdel command.

(Example)

$ qdel 4431
qdel: Job <4431> has finished
  • Specify the job ID as the argument.
  • Both pending and running jobs can be specified.
  • Group managers can cancel any job in the queues they manage, even jobs they did not submit themselves.

You can check detailed job information using the qs command.

(Example)

$ qs
 QUEUE     USER     JOBID   STATUS PROC THRD CORE   MEM ELAPSE( limit)
 gr19999b  b59999   5610    RUN       4   32   32   10G  01:12( 20:00)
 gr19999b  b59999   5611    PEND      1   32   32   20G  00:00( 01:00)

You can check the progress of a running job using the qcat command.
The -o option displays the job's standard output; the -e option displays its standard error output.

(Example 1)

$ qcat -o 5610
Tue May  1 00:00:01 JST 2016
Subroutine A step1 finished
Subroutine A step2 finished
Subroutine A step3 finished
  • Specify the job ID as the argument.
  • The standard output of the corresponding job is displayed.

(Example 2)

    $ qcat -e 5610
    Tue May  1 00:00:01 JST 2016
    STDERR 1
  • Specify the job ID as the argument.
  • The standard error output of the corresponding job is displayed.

You can check the job status of your group using the qgroup command. The -l option displays detailed information for each job.

(Example 1)

$ qgroup
 QUEUE    SYS |   RUN  PEND OTHER | ALLOC ( MIN/ STD/ MAX) | READY
----------------------------------------------------------------
 gr19999b  B  |     1     0     0 |    72 (  36/  72/ 144) | 0
 gr19999b  B  |     0     0     0 |     0 (  144/ 288/ 576) | 288
 
 QUEUE    USER     |   RUN(ALLOC)  PEND(REQUEST) OTHER(REQUEST)
----------------------------------------------------------------
 gr19999b b59999   |     1(   72)     0(     0)     0(     0)
  • The queue information of the group course and per-user statistics are displayed.

(Example 2)

$ qgroup -l 
 QUEUE    SYS |   RUN  PEND OTHER | ALLOC ( MIN/ STD/ MAX) | READY
----------------------------------------------------------------
 gr19999b  B  |     1     0     0 |    72 (  36/  72/ 144) | 0
 gr19999b  B  |     0     0     0 |     0 (  144/ 288/ 576) | 288
 
 QUEUE    USER           |   RUN(ALLOC)  PEND(REQUEST) OTHER(REQUEST)
----------------------------------------------------------------
 gr19999b taro123kyoto   |     1(   72)     0(     0)     0(     0)
 gr19999b b59999         |     0(    0)     1(     1)     0(     0)
 
 QUEUE    USER          JOBID   | STAT  SUBMIT_AT        | ALC/REQ | PROC THRD CORE    MEM  ELAPSE
------------------------------------------------------------------------------------------------
 gr19999b taro123kyoto  104545  | RUN   2012-06-06 15:10 |      72 |   33    1    1  4000M   00:30
 gr19999b b59999        104546  | PEND  2012-06-06 15:10 |       1 |    1    1    1  3800M   00:30
  • With the -l option, the queue information of the group course, per-user statistics, and per-job information are displayed.
  • The number of cores in the "ALC/REQ" column shows the resources used by the job, including cores added to account for its memory usage.

Header Overview
RUN, PEND, OTHER Number of jobs in each state.
ALLOC Number of assigned cores. (The count is incremented according to the amount of memory used: System B, one core per 3413MB; System C, one core per 42666MB.)
MIN Guaranteed minimum number of cores for the queue.
STD Standard number of cores for the queue.
MAX Maximum number of cores for the queue.
READY Number of cores available immediately.

Header Overview
RUN (ALLOC) Number of running jobs and the amount of resources assigned to them (in cores).
PEND (REQUEST) Number of waiting jobs and the amount of resources they request (in cores).
OTHER (REQUEST) Number of jobs in other states and the amount of resources they request (in cores).

Header Overview
STAT Job state (e.g., RUN, PEND).
SUBMIT_AT Date and time the job was queued.
ALC/REQ Number of assigned or requested cores.
PROC Number of processes allocated to the job.
THRD Number of threads per process.
CORE Number of CPU cores per process.
MEM Upper limit of memory per process.
ELAPSE Upper limit of the elapsed time.

For the differences from LSF, which was supported on the previous Supercomputer System, see For Users of the Previous System.

Hyper-threading is available on the System B processors: a single physical CPU core can be used as two virtual cores. In the -A option of PBS, set the number of threads ("t") per process to twice the number of cores ("c") to activate two threads per CPU core.

(Example)

$ tssrun -A p=8:t=4:c=2 mpiexec.hydra ./a.out
  • In the example above, two CPU cores are allocated per process, and four threads are activated (2 threads per core).
  • In total, 16 CPU cores are reserved for the 8 processes, and 32 threads become available.

You can control the execution order of jobs with the -W option, specified at submission time.

-W depend=afterok:<jobid>

For example, suppose you want to run b.sh after a.sh ends. If a.sh has been submitted with job ID 10000.jb, the following command submits b.sh so that it starts after a.sh ends.

$ qsub -W depend=afterok:10000.jb b.sh

When you run qstat or qs, you will see that b.sh remains in the "HOLD" state until a.sh ends.

To submit a.sh and b.sh at once, you can use a script like the following.

e.g.) using bash

  • script (assume its name is depend.sh)
    #!/bin/bash
    JOBID=`qsub a.sh`
    qsub -W depend=afterok:$JOBID b.sh
  • submission
    $ sh depend.sh
  • result of qs command
    QUEUE     USER     JOBID      STATUS  PROC THRD CORE    MEM ELAPSE( limit)
    gr19999b  w00001   116316     RUN        1    1    1  3413M  00:01( 01:00)
    gr19999b  w00001   116317     HOLD       1    1    1  3413M  00:00( 01:00)

To use array jobs, which submit a set of similar jobs at once, specify the -J option in your job script.

#QSUB -J <start number>-<end number>[:step]

For example, if you specify -J 1-4, the job script is submitted four times in succession. The -J option also sets the counter $PBS_ARRAY_INDEX, which is incremented for each submitted instance.

e.g.) to submit four jobs, ./a.out < 1.data, ./a.out < 2.data, ..., ./a.out < 4.data

  • jobscript (assume its name is array.sh)

    #!/bin/bash
    
    #QSUB -q gr19999b
    #QSUB -ug gr19999
    #QSUB -A p=36:t=1:c=1:m=3413M
    #QSUB -J 1-4
    
    mpiexec.hydra  ./a.out < ${PBS_ARRAY_INDEX}.data
  • submission
    $ qsub array.sh
  • result of qs command
    $ qs
    QUEUE     USER     JOBID        STATUS  PROC THRD CORE    MEM ELAPSE( limit)
    gr19999b  b59999   3023275[1]   PEND      36    1    1  3413M  00:00( 01:00)
    gr19999b  b59999   3023275[2]   PEND      36    1    1  3413M  00:00( 01:00)
    gr19999b  b59999   3023275[3]   PEND      36    1    1  3413M  00:00( 01:00)
    gr19999b  b59999   3023275[4]   PEND      36    1    1  3413M  00:00( 01:00)

./a.out < 1.data is executed by the job 3023275[1], ./a.out < 2.data by the job 3023275[2], and so on.

Compute nodes can connect directly to the internet; users can access the internet during batch or interactive execution. The IP address used for outbound internet connections is 133.3.51.9. Note that you cannot connect from the internet to the compute nodes.


Copyright © Academic Center for Computing and Media Studies, Kyoto University, All Rights Reserved.