How to Use IME

Systems A, B, and C are equipped with a burst buffer to speed up file access. A burst buffer is fast, small-capacity temporary storage placed between the compute nodes and the file system. When a program on a compute node writes to the file system, the data is first written to the burst buffer, which makes the write fast. When a file is read from the file system, a copy is placed in the burst buffer so that the next read of the same file is fast. Files in the burst buffer are automatically written back to the file system, so data consistency is preserved. System A uses Cray DataWarp as its burst buffer, and systems B and C use DDN IME. The burst buffer for system A is currently being prepared.

Because the burst buffer is built from SSDs (solid state drives), it is effective for random access, which SSDs handle well, and when many processes access a single shared file. It does not help every program, however, so measure the performance with your own program to confirm the benefit.

An application is required to use the burst buffer. Users of the personal course and the group course are eligible to apply; users of the entry course and private clusters are not. To apply, please send the following information through the inquiry form or by e-mail to the online consultation office.

Title:
Request to use burst buffer

Text:
To Academic Center for Computing and Media Studies, Kyoto University

We apply for the use of the burst buffer as follows.

User number:
Name:
Affiliation:
System:
Group name (for group course):

IME can be used as a burst buffer for the large-capacity disk area /LARGE0. It cannot be used for /LARGE1, /LARGE2, or /LARGE3. If your large-capacity disk area is in /LARGE2 or /LARGE3 and you want to use the burst buffer, please contact us. We will offer to move your area to /LARGE0 or /LARGE1.

Declare the use of the burst buffer by specifying the -bb option when you submit the batch job. IME supports the POSIX interface, so you can access it like a regular file system. Because IME shares metadata (file information) with the Lustre file system, it can access /LARGE0 files transparently. As shown below, the absolute PATH of a directory differs, but the file tree is the same except that /IME is added to the beginning of the IME PATH. If the program refers to files by relative PATH, you can use it without changing the program or its input files.

  • Lustre: /LARGE0/gr19999/file
  • IME: /IME/LARGE0/gr19999/file
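The mapping between the two PATHs can be sketched as a small shell function. This is only an illustration: to_ime_path is a hypothetical helper name, not a command provided on the system.

```shell
#!/bin/bash
# Map a Lustre PATH under /LARGE0 to the corresponding IME PATH by
# prefixing /IME. to_ime_path is a hypothetical helper, not a system command.
to_ime_path() {
    case "$1" in
        /LARGE0|/LARGE0/*) printf '%s\n' "/IME$1" ;;
        *) echo "not under /LARGE0: $1" >&2; return 1 ;;
    esac
}

to_ime_path /LARGE0/gr19999/file   # prints /IME/LARGE0/gr19999/file
```

A program that uses relative PATHs does not need this conversion, because the job already starts inside the IME area.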

To use the burst buffer, write the -bb option in the job script.

#QSUB -bb capacity=XXXX
or
#QSUB -bb capacity=XXXX:pfs=YYYY
capacity
  Meaning: The upper limit of the temporary files held in the burst buffer.
  Remarks: Specify the value in G (giga). Example: -bb capacity=2000G

pfs
  Meaning: The directory to which the burst buffer is applied.
  Remarks: Specify a directory under /LARGE0. If omitted, the current directory is used. The job automatically moves (cd) to the directory specified by pfs. The directory PATH is set in the environment variable $QSUB_BB_DIR.

Each service course has an upper limit on the value that can be set for capacity, as shown below. If you set a value higher than the upper limit in the job script, you will get an error when you submit the job. You will also get an error if you set the -bb option without having applied for IME use.

Service course    Upper limit of capacity
Personal          800GB
Group             200GB x number of application nodes
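As a worked example of the limits above, the following sketch computes the largest capacity value for a course. max_bb_capacity_gb is a hypothetical helper name, and the second argument is assumed to be the number of nodes the group applied for.

```shell
#!/bin/bash
# Upper limit of the -bb capacity value in GB, following the limits above.
# max_bb_capacity_gb is a hypothetical helper; "nodes" is assumed to be
# the number of nodes the group applied for.
max_bb_capacity_gb() {
    local course=$1 nodes=${2:-1}
    case "$course" in
        personal) echo 800 ;;
        group)    echo $(( 200 * nodes )) ;;
        *) echo "burst buffer not available for course: $course" >&2; return 1 ;;
    esac
}

max_bb_capacity_gb personal    # prints 800
max_bb_capacity_gb group 8     # prints 1600
```

For a group with 8 application nodes, for instance, any value up to -bb capacity=1600G would be within the limit.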

#!/bin/bash
#============ PBS Options ============ 
#QSUB -q gr19999b
#QSUB -ug gr19999
#QSUB -W 2:00:00
#QSUB -A p=8:t=4:c=4:m=3413M
#QSUB -bb capacity=100G
#============ Shell Script ============
mpiexec.hydra ./a.out

IME reads files directly from Lustre unless they have been cached in advance. Caching in advance requires a dedicated command, and it brings no benefit unless the same file is read repeatedly. For that reason, this manual does not explain how to cache files for reading.

Please contact us if you find a useful way to use it.

Data written through the IME is cached on the IME. At the end of the job, the area specified at job submission (pfs of -bb) is automatically exported to Lustre. If the area is not specified, the current directory at the time of job submission is used. You can also export files explicitly by running a dedicated command. If you write a file to an IME area not specified in the job script, you need to export the file yourself.

The IME directory path is the /LARGE0 path with "/IME" added at the beginning. For example, if you specify pfs in the job script as "#QSUB -bb pfs=/LARGE0/gr19999/b59999/data", the directory path on the IME is "/IME/LARGE0/gr19999/b59999/data".

Since the IME is not mounted on the login node, it is not possible to refer to the IME area from the login node.

A normal job automatically moves to the directory where the qsub command was executed and then starts the job script. When the use of a burst buffer is declared with -bb, the job moves to the IME area (the directory starting with /IME) corresponding to the directory specified by pfs and then starts the job script. If you omit pfs, the job starts in the IME area corresponding to the current directory where qsub was executed. As a result, all files accessed with a relative PATH are accessed via the IME.

The directory at job submission is saved in the environment variable $QSUB_WORK_DIR. The directory specified with pfs of the -bb option is stored in $QSUB_BB_DIR.
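To see where a job actually runs, a few lines like the following could be placed in the shell part of a job script. This is a minimal sketch: show_job_dirs is a hypothetical function name, and the "(not set)" fallback is there only so that the snippet also runs outside a batch job.

```shell
#!/bin/bash
# Print the directories involved in a job that uses -bb.
# show_job_dirs is a hypothetical helper. QSUB_WORK_DIR and QSUB_BB_DIR are
# set by the batch system inside a job; outside a job they are reported
# as "(not set)".
show_job_dirs() {
    echo "submitted from: ${QSUB_WORK_DIR:-(not set)}"
    echo "pfs directory:  ${QSUB_BB_DIR:-(not set)}"
    echo "running in:     $PWD"
}

show_job_dirs
```

Inside a job submitted with -bb, the last line is expected to show a directory starting with /IME.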

The IME refers to the same metadata server as /LARGE0, so the same file can be read and written via the IME. However, metadata access is slower on IME than on Lustre, because IME refers to the metadata of the remote /LARGE0. Please note that IME is therefore not suited to access patterns that open and close a large number of files.

Keeping the number of files in a single directory small improves performance. This is also true for /LARGE0, but the effect is stronger on IME.

When you write a file to the IME, the file information is also written to the metadata server of /LARGE0, so you can confirm immediately that the file exists. However, while the file is only cached in the IME, its data is not yet in /LARGE0, so ls -l on /LARGE0 shows a size of 0.

You cannot read the contents from /LARGE0 until the file is synchronized from /IME to /LARGE0 at the end of the job.

Example:

Create data in the IME directory
$ dd if=/dev/urandom of=/IME/LARGE0/gr19999/b59999/data/512mfile-2 bs=1M count=512

Execute ls -l
$ ls -l /IME/LARGE0/gr19999/b59999/data
-rw-r--r-- 1 b59999 gr19999 536870912 10 Oct 13:53 /IME/LARGE0/gr19999/b59999/data/512mfile-2

Execute ls -l on the directory of the same name on the large-capacity disk
$ ls -l /LARGE0/gr19999/b59999/data
-rw-r--r-- 1 b59999 gr19999 0 10 Oct 13:53 /LARGE0/gr19999/b59999/data/512mfile-2
                              ↑The size is zero.
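Based on the listing above, files that appear with size zero on the /LARGE0 side can be found with find. This is a sketch only: list_zero_size is a hypothetical helper name, and a zero size is merely a hint, since a genuinely empty file looks the same.

```shell
#!/bin/bash
# List regular files of size zero directly under a directory.
# list_zero_size is a hypothetical helper. On /LARGE0, a zero size only
# hints that the data may still be cached on IME.
list_zero_size() {
    find "$1" -maxdepth 1 -type f -size 0
}

# Example (directory taken from the listing above):
# list_zero_size /LARGE0/gr19999/b59999/data
```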

At the end of the job, files are synchronized and released automatically. They are also synchronized and released automatically when the IME capacity you specified is about to be exhausted.

You can also synchronize explicitly by running commands in the job script. If synchronization fails because /LARGE0 has reached its capacity limit, you need to run the synchronization yourself on the login node.

Usage: ime-sync [-b] [-V] <file path of LARGE0>
-b: The command does not return until synchronization is complete.
     If not specified, the command returns after only requesting synchronization.
-V: Displays the progress of synchronization.
Note: Specify the absolute PATH of the LARGE area, starting with /LARGE0.
       Specifying an IME PATH beginning with /IME results in an error.

Synchronization exports the file from /IME to /LARGE0, but the file remains on /IME. To free the space, release the file on /IME with the release command.

Usage: ime-release [-k] <file path of LARGE0>
-k: Files whose synchronization is not complete are not released.
    Note that without this option, any file that is not synchronized is released and lost.
Note: Specify the absolute PATH of the LARGE area, starting with /LARGE0.
      Specifying an IME PATH beginning with /IME results in an error.
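The two commands above can be combined in a job script: synchronize first and wait for completion, then release only what has been synchronized. The following is a minimal sketch. sync_then_release and the DRY_RUN variable are hypothetical names introduced for illustration; with DRY_RUN=1 the commands are only printed, so the snippet can be tried where the ime-* commands are not installed.

```shell
#!/bin/bash
# Synchronize a /LARGE0 PATH and then release its IME cache.
# sync_then_release and DRY_RUN are hypothetical names for this sketch.
sync_then_release() {
    local target=$1
    # Both commands require an absolute PATH starting with /LARGE0.
    case "$target" in
        /LARGE0/*) ;;
        *) echo "need an absolute PATH starting with /LARGE0: $target" >&2; return 1 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "ime-sync -b -V $target"
        echo "ime-release -k $target"
        return 0
    fi
    # -b: do not return until synchronization is complete.
    ime-sync -b -V "$target" || return 1
    # -k: do not release files whose synchronization is not complete.
    ime-release -k "$target"
}

DRY_RUN=1 sync_then_release /LARGE0/gr19999/b59999/data/512mfile-2
```

Using -k on the release step is the cautious choice here, because releasing without it discards any data that has not yet reached /LARGE0.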

Currently we omit it because it is not assumed to be used. If you have a good use case and request to try it, please contact us.

Even while using the IME, you can read a file that already exists in /LARGE0 through its /LARGE0 PATH. However, please avoid writing files on the /LARGE0 side, because the file may be corrupted.

Because IME has less capacity than LARGE, usage is monitored against the capacity declared with -bb. Note that jobs exceeding the capacity are killed automatically. Monitoring totals the usage under the directory in IME, so when multiple jobs declare the same directory, all of those jobs are killed if their combined usage exceeds the declared capacity of the directory, even if no single job exceeds it on its own.

Considering that the metadata performance is not high as mentioned above, it is recommended to separate the directories for each job.
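One way to keep jobs apart is to create a fresh directory before each submission and put it in pfs. The sketch below assumes this is done by hand before running qsub; new_job_dir and the "job." name prefix are hypothetical and not part of the batch system.

```shell
#!/bin/bash
# Create a fresh directory for one job under a base directory and print
# the matching -bb line for the job script.
# new_job_dir and the "job." prefix are hypothetical names for this sketch.
new_job_dir() {
    local base=$1 capacity=$2 dir
    dir=$(mktemp -d "$base/job.XXXXXX") || return 1
    echo "#QSUB -bb capacity=$capacity:pfs=$dir"
}

# Example, run before submitting the job:
# new_job_dir /LARGE0/gr19999/b59999/data 100G
```

Note that mktemp creates the directory with permissions for the owner only; adjust them if group members need access.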

The area starting with /IME is subject to the usual user and group permissions, so it is also possible to read and write files in areas not specified with -bb. However, only the area specified by pfs of -bb (or, if pfs is omitted, the directory from which the job was submitted) is synchronized automatically at the end of the job, so please avoid writing outside the specified area. Reading causes no particular problem.

All files remaining in the IME area will be deleted. Before maintenance, when a running job is put in the Hold state, files in the directory specified with the pfs of its -bb option are automatically written out to Lustre.

Any other files will be lost, so please synchronize them yourself before maintenance.

The commands for controlling IME are as follows.

Output information of cached files in IME.

Usage: ime-stat [OPTION] ABSOLUTE_FILE_PATH
  -r, --enable-recursive     Enable Recursive

Synchronize data cached in IME to Lustre.

Usage: ime-sync [OPTION] ABSOLUTE_FILE_PATH
  -b, --block                Wait for command completion
  -r, --enable-recursive     Enable Recursive
  -V, --verbose              Verbose output

Delete data cached in IME.

Usage: ime-release [OPTION] ABSOLUTE_FILE_PATH
  -k, --keep-unsync          Remove only synchronized data
  -r, --enable-recursive     Enable Recursive
  -V, --verbose              Verbose output

List all the files cached in IME.

Usage: ime-lsfile ABSOLUTE_DIRECTORY_PATH

Concatenate files cached in IME and output them to the standard output.

Usage: ime-cat ABSOLUTE_FILE_PATH

Not provided

The vendor-supplied ime-* commands are inconvenient because they require an absolute PATH, so wrapper commands are provided.

Common behavior of the commands:
・Convert a relative PATH to an absolute PATH and execute the original command.
・Convert a PATH on IME that begins with /IME to the /LARGE0 PATH and execute the original command.
・Ignore the -c, -l, and -L options.
Command name                Action
imeutil-stat                Execute ime-stat.
imeutil-sync                Execute ime-sync.
imeutil-release             Execute ime-release -k.
imeutil-force-release       Execute ime-release.
imeutil-cat                 Execute ime-cat.
imeutil-lsfiles             Execute ime-lsfiles.
imeutil-sync-and-release    After ime-sync -V -b is executed, ime-release -V -k is executed.
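The PATH handling common to the wrapper commands can be illustrated as follows. This is not the actual wrapper implementation, only a sketch of the two conversions described above; normalize_for_ime is a hypothetical name.

```shell
#!/bin/bash
# Illustration of the PATH conversions described above, not the actual wrapper.
# normalize_for_ime is a hypothetical name.
normalize_for_ime() {
    local path=$1
    # Relative PATH -> absolute PATH, based on the current directory.
    case "$path" in
        /*) ;;
        *) path="$PWD/$path" ;;
    esac
    # /IME/LARGE0/... -> /LARGE0/...
    case "$path" in
        /IME/*) path=${path#/IME} ;;
    esac
    printf '%s\n' "$path"
}

normalize_for_ime /IME/LARGE0/gr19999/b59999/data   # prints /LARGE0/gr19999/b59999/data
```

Because a job using -bb starts inside the IME area, a relative PATH given there first becomes an /IME PATH and is then converted to the /LARGE0 PATH that the original commands expect.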

We are publishing tutorial materials. (Japanese version only)

IME Tutorial


Copyright © Academic Center for Computing and Media Studies, Kyoto University, All Rights Reserved.