A program run via batch processing is called a “job.” On Systems B and C, jobs are managed by the PBS job scheduler.
PBS maintains queues as pools of jobs. You request batch processing from the system by submitting a job to a queue. Note that our Supercomputer System provides a dedicated queue for each service course; you execute jobs using the queue provided for your course.
In System B: Batch processing starts after the CPU cores and memory specified with the PBS options have been secured. When a job uses multiple nodes, those nodes are allocated exclusively to the job before batch processing starts.
In System C: Nodes are not allocated exclusively, even when a job runs on multiple nodes.
You can switch dynamically between the System B and System C environments.
Type module list to check the current system.
The following example shows that System B (pbs/SystemB) is loaded.
$ module list
Currently Loaded Modulefiles:
1) pbs/SystemB 3) impi/2017.1
2) intel/17.0.1.132 4) PrgEnv-intel/1.0
Type one of the following commands to switch between the systems.
module switch pbs/SystemB pbs/SystemC
module switch pbs/SystemC pbs/SystemB
The syntax of a job script is essentially the same as that of a shell script. A job script consists of an option area containing PBS job submission options, and a user program area containing the programs to execute.
$ cat sample.sh
#!/bin/bash
#============ PBS Options ============
#QSUB -q gr19999b
#QSUB -ug gr19999
#QSUB -W 2:00
#QSUB -A p=4:t=8:c=8:m=3413M
#============ Shell Script ============
cd $QSUB_WORKDIR
mpiexec.hydra ./a.out
In the option area of a job script, write “#QSUB” at the beginning of each line, followed by a PBS option. For the available options, see PBS Options.
Note that setup files such as .bashrc are NOT sourced automatically.
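Because .bashrc is not sourced automatically, a job that depends on those settings must load them explicitly. A minimal sketch of lines to place at the top of the shell-script area (the guard is a defensive assumption, in case the file is absent on a compute node):

```shell
# Explicitly source per-user settings, since PBS does not do so
# automatically. The guard avoids an error if the file is missing.
if [ -f "$HOME/.bashrc" ]; then
    . "$HOME/.bashrc"
fi
```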
Sample scripts for PBS jobs are available for download below.
Execution Method | Sample File |
---|---|
Non-parallelism | Download |
Thread parallelism | Download |
Process parallelism (Intel MPI) | Download |
Hybrid parallelism | Download |
The lsf2pbs command converts job scripts written for LSF into scripts for PBS.
Syntax:
lsf2pbs lsf_script [pbs_script]
Example Usage:
[b59999@laurel1 script]$ cat lsf.sh
#!/bin/bash
#QSUB -q gr19999b
#QSUB -A p=1:t=1:c=1:m=1G
#QSUB -W 12:00
#QSUB -rn
#QSUB -u kyodai.taro.1a@kyoto-u.ac.jp
#QSUB -B
#QSUB -N
#====command====
mpiexec.hydra ./a.out
#====command====
[b59999@laurel1 script]$ lsf2pbs lsf.sh pbs.sh
[b59999@laurel1 script]$ cat pbs.sh
#!/bin/bash
#QSUB -q gr19999b
#QSUB -A p=1:t=1:c=1:m=1G
#QSUB -W 12:00
#QSUB -r n
#QSUB -M kyodai.taro.1a@kyoto-u.ac.jp
#QSUB -m be
#====command====
mpiexec.hydra ./a.out
#====command====
By specifying PBS options in the option area of a job script, you can set job properties. If the upper limit of elapsed time is not set with the -W option, the job is forcibly terminated one hour after it starts. The PBS options are customized for the supercomputers of Kyoto University; see Distinctions from original PBS for details.
Option | Description | Example |
---|---|---|
-q QUEUENAME | Specifying the queue for submitting a job. | -q gr19999b |
-ug GROUPNAME | Specifying the effective group. | -ug gr19999 |
-W HOUR : MINUTE | Specifying the upper limit of the elapsed time.(Hour:Minute) | -W 6:0 |
-A p=procs:t=threads:c=cores:m=memory | Specifying resources for allocating to a job. | -A p=4:t=8:c=8:m=3413M |
Argument of -A | Description |
---|---|
p=procs | Allocated number of processes when executing jobs. |
t=threads | Allocated number of threads per process when executing jobs. The environment variable “OMP_NUM_THREADS” is automatically set. |
c=cores | Allocated number of CPU cores per process when executing jobs. Generally set to the same value as t. |
m=memory | Upper limit of allocated memory per process when executing jobs (unit: M, G, T). |
Default and maximum values of the -A arguments on System B:
Option | Default value | Max value | Note |
---|---|---|---|
p | 1 | the standard amount of resources of your queue | |
t | 1 | 36 | in hyper-threading, max 72 |
c | 1 | 36 | |
m | 3413M | 122880M (120G) | |
Default and maximum values of the -A arguments on System C:
Option | Default value | Max value | Note |
---|---|---|---|
p | 1 | the standard amount of resources of your queue | |
t | 1 | 72 | in hyper-threading, max 144 |
c | 1 | 72 | |
m | 42666M | 3000G | |
How the -A option works
With the PBS job scheduler, any CPU cores allocated to a job are counted as used, whether or not the job actually uses them. For example, when you run a job with “-A p=8:t=1:c=1:m=30G” on System B, each process requests one-fourth of the per-node memory limit (120G), so at most four processes fit on one node. The job actually uses only eight CPU cores, but it is counted as using 72 CPU cores, because two whole nodes are allocated for its execution.
For example, on System B, for a job with p=36 to be counted as using exactly 36 CPU cores, the value of m must be at most 120G/36 ≈ 3413M. This is the default value of m.
On System B, when a job requires more than one node, whole nodes are allocated to it. For example, even if the job script requests 54 CPU cores (equivalent to 1.5 nodes), 72 CPU cores (2 nodes) are counted as used during execution. When a job requires less than one node, several jobs may share a node.
On System C, only the requested number of cores is counted as used, regardless of how the request maps onto nodes.
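The accounting described above can be sketched as shell arithmetic. This is only an illustration of the stated rule (memory requests rounded up to whole cores, whole nodes charged on System B); the constants 36 cores and 122880M per node come from the System B tables above.

```shell
# Sketch of System B core accounting for "-A p=8:t=1:c=1:m=30G"
# (assumed node shape from the tables above: 36 cores, 122880M memory).
node_cores=36
node_mem=122880              # MiB per node on System B
p=8                          # requested processes
m=30720                      # requested memory per process (30G in MiB)

# cores charged per process: memory share rounded up to whole cores
cores_per_proc=$(( (m * node_cores + node_mem - 1) / node_mem ))
charged=$(( cores_per_proc * p ))
# System B charges whole nodes
nodes=$(( (charged + node_cores - 1) / node_cores ))
echo "$cores_per_proc cores/process, $nodes nodes, $(( nodes * node_cores )) cores charged"
```

For this request the sketch yields 9 cores per process, 2 nodes, and 72 cores charged, matching the example in the text.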
Option | Description | Example |
---|---|---|
-o FILENAME | Specifying a filename for saving standard output. | -o sample.out |
-e FILENAME | Specifying a filename for saving standard error output. | -e sample.err |
-M MAILADDR | Specifying the email address. | -M bar@sample.com |
-m b | Receiving the notification email when starting a job. | -m b |
-m e | Receiving the notification email when completed a job. | -m e |
-m be | Receiving the notification email both at the start and the end of job. | -m be |
-r n | Stopping the re-execution of the job when the failure occurs. | -r n |
The following options have been replaced or added, relative to the original PBS, for the convenience of users.
Option | Type | Meaning in our system | Original meaning |
---|---|---|---|
-ug | Addition | Specifying the execution group. | - |
-W | Replacement | Specifying the upper limit of the elapsed time. | configuration of job attributes |
-A | Replacement | Specifying resources for allocating to a job. | Adding an arbitrary explanatory text in the job |
The following options are invalid in our system.
Options | Meaning in original PBS | Notes |
---|---|---|
-S | Specifying a shell for execution | In our system, the execution shell is fixed to our customized bash. |
-C | Changing the directive prefix | The directive prefix in our system is #QSUB and cannot be changed. |
-a | Deferring execution | |
-A | Specifying accounting string | In our system -A specifies resources for allocating to a job. |
-h | Holding job | |
-k | Keeping output and error files on execution host | |
-u | Specifying job username | |
-c | Setting checkpoint | |
-G | Submitting interactive GUI Jobs on Windows | |
-p | Setting priority of job | |
-P | Setting belonging project of job |
You can set and refer to environment variables in the user program area. The PBS environment variables are customized for the supercomputers of Kyoto University; see Distinctions from original PBS for details.
To set an environment variable, use the export command. To refer to one, prefix the variable name with “$”.
Setting up the environment variable.
#Format variablename=value; export variablename
LANG=en_US.UTF-8; export LANG
Referring to the environment variable.
echo $LANG
PBS automatically sets some environment variables when executing a job. Representative ones are shown below.
Environment Variable | Description |
---|---|
QSUB_JOBID | Current job ID |
QSUB_QUEUE | Name of the queue the job was submitted to |
QSUB_WORKDIR | Directory from which the job was submitted |
QSUB_PROCS | Allocated number of processes when executing jobs |
QSUB_THREADS | Allocated number of threads per process when executing jobs |
QSUB_CPUS | Allocated number of CPU cores per process when executing jobs |
QSUB_CPUS_ALLOC | The number of CPU cores charged after converting the memory request into cores |
QSUB_MEMORY | Upper limit of allocated memory per process when executing jobs |
QSUB_PPN | The number of placed processes per node when executing jobs |
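These variables can be read directly in the user program area. A minimal sketch that records where and how a job ran; the ${VAR:-default} fallbacks are an addition for illustration, so the snippet also runs outside a PBS job where the variables are unset:

```shell
# Sketch: reading the PBS-set variables from inside a job script.
# Outside a PBS job these variables are unset, so defaults are shown.
echo "job id:    ${QSUB_JOBID:-not-in-a-job}"
echo "queue:     ${QSUB_QUEUE:-not-in-a-job}"
echo "processes: ${QSUB_PROCS:-1}"
echo "threads:   ${QSUB_THREADS:-1}"
cd "${QSUB_WORKDIR:-$PWD}"    # move to the submission directory
```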
The following environment variables have been replaced or added, relative to the original PBS, for the convenience of users.
Environment variables | Type | Original variable name |
---|---|---|
QSUB_JOBID | Replacement | PBS_JOBID |
QSUB_QUEUE | Replacement | PBS_O_QUEUE |
QSUB_WORKDIR | Replacement | PBS_JOBDIR |
QSUB_PROCS | Addition | |
QSUB_THREADS | Addition | |
QSUB_CPUS | Addition | |
QSUB_CPUS_ALLOC | Addition | |
QSUB_MEMORY | Addition | |
QSUB_PPN | Addition |
Use mpiexec.hydra to execute MPI programs (Intel MPI) on System B or C. The mpiexec.hydra command works in conjunction with PBS and automatically starts the number of processes specified by p in the -A option.
mpiexec.hydra ./a.out
The execution order of jobs submitted to a queue is determined by the job-scheduling policy set for each queue.
On Systems B and C, the policies are fixed as shown below. (We plan to make the policies configurable in the future.)
Policy | Setting |
---|---|
SCHEDULING_POLICY (policy that determines the execution order of submitted jobs) | fcfs (jobs are executed on a first-come, first-served basis) |
PASSING_POLICY (policy for executing a job ahead of waiting jobs) | pass (if sufficient computing resources are available for a particular job, it may be executed before jobs that are still waiting) |
Jobs Report
A job report is created when a job completes. The report includes three kinds of information: standard output, standard error output, and job information.
Files created by default
If no report-related output option is used, files are created according to the naming convention shown below.
System | Description | File name |
---|---|---|
B | Standard output/Job information | Bmmddhh.oXXXXXX |
〃 | Standard error output | Bmmddhh.eXXXXXX |
C | Standard output/Job information | Cmmddhh.oXXXXXX |
〃 | Standard error output | Cmmddhh.eXXXXXX |
The job information in the PBS job report summarizes the job execution.
JobId: 3943.ja01
Job_Name = p1t1c1m90g.sh
resources_used.cpupercent = 1
resources_used.cput = 00:00:01
resources_used.mem = 3708kb
resources_used.ncpus = 272
resources_used.vmem = 336264kb
resources_used.walltime = 00:00:13
queue = gr19999b
Each summary item is described below.
Item | Description |
---|---|
JobId | Job Id |
Job_Name | Job Name |
resources_used.cpupercent | The instantaneous maximum of the job's real CPU usage (percent). |
resources_used.cput | The total CPU time used by the job. |
resources_used.mem | The instantaneous maximum of the job's real memory usage. |
resources_used.ncpus | The number of CPU cores used by the job. |
resources_used.vmem | The instantaneous maximum of the job's virtual memory usage. |
resources_used.walltime | The elapsed (wall-clock) time of the job. |
queue | queue name specified when executing jobs |
If a job terminates abnormally, the reason is shown in the job information section of the job report.
Message shown when a job is terminated for exceeding the elapsed time limit (-W option):
=>> PBS: job killed: walltime 3602 exceeded limit 3600
aprun: Apid 379565: Caught signal Terminated, sending to application
Message shown when a job is terminated by the system due to the memory limit (memory on the whole node is about to run out):
[NID 00181] 2016-10-07 16:04:40 Apid 391976: OOM killer terminated this process.
Application 392007 exit signals: Killed
Message shown when a job is terminated by the system due to the memory limit (a process's memory usage exceeded the limit specified in the job script):
/var/spool/PBS/mom_priv/jobs/1105396.jb.SC: line 28: 179454 Killed
See also the FAQ.
You can check the status of available queues using the qstat -q command.
(Example)
$ qstat -q
Queue Memory CPU Time Walltime Node Run Que Lm State
---------------- ------ -------- -------- ---- ----- ----- ---- -----
eb -- -- -- -- 0 0 -- E R
tb -- -- -- -- 0 0 -- E R
tc -- -- -- -- 0 0 -- E R
pb -- -- -- -- 0 0 -- E R
pc -- -- -- -- 0 0 -- E R
gr19999b -- -- -- -- 0 0 -- E R
----- -----
0 0
Service Course | Queue Name |
---|---|
Entry course | eb |
Personal course | pb,pc |
Group course | Groupname+b/c |
When you use a group course queue, you must specify the group with the -ug option. This is not required for the entry course and personal course queues.
The queues named “tb” and “tc” are displayed by the qstat command, but batch jobs cannot be submitted to them.
In the sixth and seventh columns from the left, Run is the number of running jobs and Que is the number of pending jobs.
You can submit jobs to the queue using the qsub command.
(Example)
$ qsub sample.sh
1100.jb
You can check the status of the submitted jobs using the qstat command.
(Example 1)
$ qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
8622.jb qsubtest.sh b59999 00:00:00 R gr19999b
(Example 2)
[b59999 ~]$ qstat -f 1033
Job Id: 1033.jb
Job_Name = jobscript.sh
Job_Owner = b59999@laurel.kudpc.kyoto-u.ac.jp
job_state = Q
queue = gr19999b
server = jb01
Checkpoint = u
ctime = Fri Jun 10 14:55:26 2016
Error_Path = laurel.kudpc.kyoto-u.ac.jp:/usr/home/b59999/jobscript.sh.e1033
Hold_Types = n
Join_Path = n
Keep_Files = n
Mail_Points = a
mtime = Fri Jun 10 14:55:26 2016
Output_Path = laurel.kudpc.kyoto-u.ac.jp:/usr/home/b59999/jobscript.sh.o1033
Priority = 0
qtime = Fri Jun 10 14:55:26 2016
Rerunable = True
Resource_List.ncpus = 1
Resource_List.nodect = 1
Resource_List.place = pack
Resource_List.select = 1:ncpus=1
Resource_List.walltime = 01:00:00
substate = 10
Variable_List = PBS_O_SYSTEM=Linux,PBS_O_SHELL=/bin/bash,
PBS_O_HOME=/usr/home/b59999,PBS_O_LOGNAME=b59999,
PBS_O_WORKDIR=/usr/home/b59999,PBS_O_LANG=ja_JP.UTF-8,
PBS_O_PATH=/usr/lib64/qt-3.3/bin:/usr/home/b59999/perl5/bin:/opt/pbs/
default/bin:/opt/java/jdk1.7.0_45/bin:/usr/local/emacs-23.4/bin:/opt/ap
p/intel/impi/5.0.3.049/bin64:/opt/app/intel/composer_xe_2015.6.233/bin/
intel64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibutils/
bin:/usr/home/b59999/.local/bin:/usr/home/b59999/bin,
PBS_O_MAIL=/var/spool/mail/b59999,PBS_O_QUEUE=gr19999x,
PBS_O_HOST=laurel.kudpc.kyoto-u.ac.jp
comment = Not Running: Not enough free nodes available
etime = Fri Jun 10 14:55:26 2016
Submit_arguments = jobscript.sh
project = _pbs_project_default
You can cancel the submitted jobs using the qdel command.
(Example)
$ qdel 4431
qdel: Job <4431> has finished
You can check the job information details using the qs command.
(Example)
$ qs
QUEUE USER JOBID STATUS PROC THRD CORE MEM ELAPSE( limit)
gr19999b b59999 5610 RUN 4 32 32 10G 01:12( 20:00)
gr19999b b59999 5611 PEND 1 32 32 20G 00:00( 01:00)
You can check the progress of the running job using the qcat command.
The -o option displays the standard output of a job; the -e option displays the standard error output.
(Example 1)
$ qcat -o 5610
Tue May 1 00:00:01 JST 2016
Subroutine A step1 finished
Subroutine A step2 finished
Subroutine A step3 finished
(Example 2)
$ qcat -e 5610
Tue May 1 00:00:01 JST 2016
STDERR 1
You can check the job status of your group using the qgroup command. With the -l option, you can see detailed information on each job.
(Example 1)
$ qgroup
QUEUE SYS | RUN PEND OTHER | ALLOC ( MIN/ STD/ MAX) | READY
----------------------------------------------------------------
gr19999b B | 1 0 0 | 72 ( 36/ 72/ 144) | 0
gr19999b B | 0 0 0 | 0 ( 144/ 288/ 576) | 288
QUEUE USER | RUN(ALLOC) PEND(REQUEST) OTHER(REQUEST)
----------------------------------------------------------------
gr19999b b59999 | 1( 72) 0( 0) 0( 0)
(Example 2)
$ qgroup -l
QUEUE SYS | RUN PEND OTHER | ALLOC ( MIN/ STD/ MAX) | READY
----------------------------------------------------------------
gr19999b B | 1 0 0 | 72 ( 36/ 72/ 144) | 0
gr19999b B | 0 0 0 | 0 ( 144/ 288/ 576) | 288
QUEUE USER | RUN(ALLOC) PEND(REQUEST) OTHER(REQUEST)
----------------------------------------------------------------
gr19999b taro123kyoto | 1( 72) 0( 0) 0( 0)
gr19999b b59999 | 0( 0) 1( 1) 0( 0)
QUEUE USER JOBID | STAT SUBMIT_AT | ALC/REQ | PROC THRD CORE MEM ELAPSE
------------------------------------------------------------------------------------------------
gr19999b taro123kyoto 104545 | RUN 2012-06-06 15:10 | 72 | 33 1 1 4000M 00:30
gr19999b b59999 104546 | PEND 2012-06-06 15:10 | 1 | 1 1 1 3800M 00:30
Header | Overview |
---|---|
RUN, PEND, OTHER | The number of jobs |
ALLOC | The number of assigned cores. (The core count is increased according to the amount of memory used: one core per 3413MB on System B, and per 42666MB on System C.) |
MIN | The guaranteed minimum number of cores for the queue. |
STD | The standard number of cores for the queue. |
MAX | The maximum number of cores for the queue. |
READY | The number of cores that are available immediately. |
Header | Overview |
---|---|
RUN (ALLOC) | The number of running jobs and the amount of assigned resources. (in the number of cores) |
PEND (REQUEST) | The number of waiting jobs and the amount of their requested resources. (in the number of cores) |
OTHER (REQUEST) | The number of jobs in states other than above and the amount of their requested resources. (in the number of cores) |
Header | Overview |
---|---|
STAT | Batch request state (e.g. RUN). |
SUBMIT_AT | Batch request queuing date and time. |
ALC/REQ | The number of assigned cores. |
PROC | Allocated number of processes when executing jobs. |
THRD | Allocated number of threads per process when executing jobs. |
CORE | Allocated number of CPU cores per process when executing jobs. |
MEM | Upper limit of allocated memory per process when executing jobs. |
ELAPSE | Upper limit of the elapsed time. |
For the differences from LSF that is supported on the previous Supercomputer System, see For Users of the Previous System.
Hyper-threading is available on the System B processors: a single physical CPU core can be used as two virtual cores. In the -A option of PBS, specify the number of threads (t) and cores (c) per process so that t is twice c; two threads will then run on each CPU core.
(Example)
$ tssrun -A p=8:t=4:c=2 mpiexec.hydra ./a.out
You can control the execution order of jobs with the -W option, specified at submission time:
-W depend=afterok:<jobid>
For example, assume you want b.sh to run after a.sh finishes. If a.sh has been submitted and its job ID is 10000.jb, the following command submits b.sh so that it starts after a.sh finishes.
$ qsub -W depend=afterok:10000.jb b.sh
When you run qstat or qs, you will see that b.sh remains in the “HOLD” state until a.sh finishes.
To submit a.sh and b.sh at once, you can use a script like the following (here named depend.sh).
e.g.) using bash
#!/bin/bash
JOBID=`qsub a.sh`
qsub -W depend=afterok:$JOBID b.sh
$ sh depend.sh
QUEUE USER JOBID STATUS PROC THRD CORE MEM ELAPSE( limit)
gr19999b w00001 116316 RUN 1 1 1 3413M 00:01( 01:00)
gr19999b w00001 116317 HOLD 1 1 1 3413M 00:00( 01:00)
To use array jobs, which submit many similar jobs at once, specify the -J option in your job script.
#QSUB -J <start number>-<end number>[:step]
For example, if you specify -J 1-4, the job script is submitted four times in succession.
With the -J option, the counter $PBS_ARRAY_INDEX is set and incremented for each submission of the job script.
e.g.) to submit four jobs that run ./a.out < 1.data, ./a.out < 2.data, ..., ./a.out < 4.data
Job script (assume it is named array.sh):
#!/bin/bash
#QSUB -q gr19999b
#QSUB -ug gr19999
#QSUB -A p=36:t=1:c=1:m=3413M
#QSUB -J 1-4
mpiexec.hydra ./a.out < ${PBS_ARRAY_INDEX}.data
$ qsub array.sh
$ qs
QUEUE USER JOBID STATUS PROC THRD CORE MEM ELAPSE( limit)
gr19999b b59999 3023275[1] PEND 36 1 1 3413M 00:00( 01:00)
gr19999b b59999 3023275[2] PEND 36 1 1 3413M 00:00( 01:00)
gr19999b b59999 3023275[3] PEND 36 1 1 3413M 00:00( 01:00)
gr19999b b59999 3023275[4] PEND 36 1 1 3413M 00:00( 01:00)
./a.out < 1.data is executed by job 3023275[1], ./a.out < 2.data by job 3023275[2], and so on.
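The expansion each array element performs can be sketched as follows. Inside a real job, PBS sets PBS_ARRAY_INDEX itself; the loop below only simulates the four submissions produced by "-J 1-4":

```shell
# Sketch: what each array-job element executes. PBS sets
# PBS_ARRAY_INDEX inside the job; the loop simulates -J 1-4.
for PBS_ARRAY_INDEX in 1 2 3 4; do
    echo "job [${PBS_ARRAY_INDEX}] would run: ./a.out < ${PBS_ARRAY_INDEX}.data"
done
```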
Direct connection from the computing nodes to the internet is available. Users can connect to the internet during batch job execution or interactive execution.
The IP address used for the internet connection is 133.3.51.9.
Please note that you cannot connect from the internet to the compute nodes.