Use of Cloud System

This page explains how to use the Cloud System.

The cloud system consists of a small computing cluster built on a commercial cloud service. Because it is connected to the storage and job scheduler of the supercomputer system installed at our Center, it can execute jobs in the same way as the on-premise supercomputer system.

Anyone who has a user ID for the supercomputer system can use the cloud system.

The specifications of the computers used in the cloud system are as follows. The number of nodes in the cloud system varies depending on the demand.

The following bare metal instances are in use for the time being from April 1, 2023. A maximum of 30 nodes are expected to be available until around October 2023, when system A is scheduled to start operation.

Item                            Content
Number of nodes                 Variable
Processor name                  Intel Xeon Gold 6154, 3.0 GHz, 18 cores
Number of processors (cores)    2 (36 cores/node)
Architecture                    x86-64
Performance                     3.45 TFlops/node
Memory capacity                 384 GB
Network                         25 Gbps Ethernet

The following bare metal instances were used from November 2022 to the end of March 2023. Their use has been temporarily suspended because the cloud provider's supply of this instance type has been tight. We will return to these instances once we have confirmed that availability has improved.

Item                            Content
Number of nodes                 Variable
Processor name                  Intel Xeon Gold 6354, 3.0 GHz, 18 cores
Number of processors (cores)    2 (36 cores/node)
Architecture                    x86-64
Performance                     3.45 TFlops/node
Memory capacity                 512 GB
Network                         50 Gbps Ethernet

After logging in to a login node of the supercomputer, you can submit jobs to the cloud system by switching to the cloud system environment with the following module command.

$ module switch SysCL

You can use the same programs without recompilation, since they are executed on nodes with the same Xeon processors as systems A/B/C. Usage is also the same, except for differences in the job queue (partition) configuration, the number of cores per node, and the memory capacity.

For details, please refer to Program Execution.
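For reference, a typical session might look like the following minimal sketch; job.sh is a hypothetical name for your own job script, and sbatch and squeue are the standard commands of Slurm, the job scheduler used in the scripts below.

## Switch to the cloud system environment
$ module switch SysCL

## Submit a job script (job.sh is a placeholder for your own script)
$ sbatch job.sh

## Check the status of your submitted jobs
$ squeue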

The cloud system has the following queue configuration. The eo queue is limited to short jobs for debugging. The so queue can run small jobs. Larger queues may be offered depending on the situation. The two job scripts below are examples of a serial job (one process) and an eight-process job.

#!/bin/bash
#============ Slurm Options ===========
#SBATCH -p eo              # Specify the job queue (partition). Change this to the name of the queue you want to submit to.
#SBATCH -t 1:00:00         # Specify the elapsed time limit (e.g. one hour).
#SBATCH --rsc p=1:t=1:c=1  # Specify the requested resources (p: processes, t: threads per process, c: cores per process).
#SBATCH -o %x.%j.out       # Specify the standard output file for the job.
#============ Shell Script ============
set -x

srun ./a.out

#!/bin/bash
#============ Slurm Options ===========
#SBATCH -p eo              # Specify the job queue (partition). Change this to the name of the queue you want to submit to.
#SBATCH -t 1:00:00         # Specify the elapsed time limit (e.g. one hour).
#SBATCH --rsc p=8:t=1:c=1  # Specify the requested resources (8 processes, 1 thread and 1 core each).
#SBATCH -o %x.%j.out       # Specify the standard output file for the job.
#============ Shell Script ============
set -x

srun ./a.out
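If your program is multi-threaded (for example with OpenMP), the t and c values of --rsc can be increased within the limits shown in the table below. The following is a minimal sketch; it assumes the thread count is not set automatically and therefore exports OMP_NUM_THREADS explicitly to match t.

#!/bin/bash
#============ Slurm Options ===========
#SBATCH -p eo              # Specify the job queue (partition).
#SBATCH -t 1:00:00         # Specify the elapsed time limit.
#SBATCH --rsc p=1:t=8:c=8  # One process with 8 threads on 8 cores (illustrative values).
#SBATCH -o %x.%j.out       # Specify the standard output file for the job.
#============ Shell Script ============
set -x

export OMP_NUM_THREADS=8   # Assumption: set the thread count explicitly to match t=8.
srun ./a.out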

Queue name   Available users
eo           All users
so           Users belonging to personal courses, group courses, and private cluster courses

Description                                                eo queue               so queue
                                                           Initial   Maximum      Initial   Maximum
Number of processes ( --rsc p=X )                          1         36           1         36
Number of threads per process ( --rsc t=X )                1         36           1         36
Number of cores per process ( --rsc c=X )                  1         36           1         36
Amount of memory per process ( --rsc m=X ) (unit: M, G)    13G       500G         13G       500G
Elapsed time ( -t )                                        1 hour    1 hour       1 hour    7 days
Number of concurrent executions per user                   1         1            1         1
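For example, if a process needs more than the default 13G of memory, the m value can be added to the resource request. The combined form below (appending :m= to the p/t/c string) is an assumption based on the options listed above; if it differs on this system, use the form described in Program Execution.

#!/bin/bash
#============ Slurm Options ===========
#SBATCH -p eo                    # Specify the job queue (partition).
#SBATCH -t 1:00:00               # Specify the elapsed time limit.
#SBATCH --rsc p=1:t=1:c=1:m=64G  # Assumed syntax: one process with 64G of memory requested via m=.
#SBATCH -o %x.%j.out             # Specify the standard output file for the job.
#============ Shell Script ============
set -x

srun ./a.out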

The /tmp area can be used as a temporary destination for data written on our supercomputer system. In some cases, programs that perform a lot of file I/O can be processed faster by using /tmp.

/tmp is a private area for each job, so that files are not mixed with those of other jobs. Please take advantage of this feature.

Please note that the /tmp area is automatically deleted at the end of the job. To keep files written to /tmp, the job script must copy them to /home, /LARGE0, or /LARGE1 before the job ends. Deleted files cannot be retrieved later.

  • If the output file path can be specified in the program, specify /tmp as the write destination.
  • Place files that are read repeatedly in /tmp before program execution starts.
  • Place programs and input files that access other files via relative paths in /tmp before running them (a job script sketch of this pattern follows this list).
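The following job script is a minimal sketch of this pattern; the program, file, and directory names (a.out, input.dat, output.dat, /LARGE0/gr19999) are placeholders. Input data is staged into /tmp, the program runs there, and the results are copied back before the job ends and /tmp is deleted.

#!/bin/bash
#============ Slurm Options ===========
#SBATCH -p eo              # Specify the job queue (partition).
#SBATCH -t 1:00:00         # Specify the elapsed time limit.
#SBATCH --rsc p=1:t=1:c=1  # Specify the requested resources.
#SBATCH -o %x.%j.out       # Specify the standard output file for the job.
#============ Shell Script ============
set -x

## Stage the program and input data into the job-private /tmp area (placeholder names)
cp $HOME/a.out /LARGE0/gr19999/input.dat /tmp/
cd /tmp

## Run the program against the local copies
srun ./a.out input.dat

## Copy the results back to permanent storage before /tmp is deleted at the end of the job
cp /tmp/output.dat /LARGE0/gr19999/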

The available /tmp capacity in the cloud system is calculated as: number of processes x number of cores per process x 94 GB.

For example, if you submit a job with 4 processes and 8 cores per process, you will be allocated 3,008 GB (4 x 8 x 94).
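As a rough check inside a running job, the space actually available under /tmp can be inspected with the standard df command; the reported figure may differ somewhat from this estimate.

## Show the size and free space of the /tmp area allocated to the job
$ df -h /tmp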

In the cloud system, files can be accessed with the same paths as on the on-premise systems such as system B, since the home directory ($HOME) and the large volume storage (/LARGE) are mounted in the same way.

However, network distance and bandwidth constraints between the cloud system and the storage at our university prevent full storage performance.

For programs that perform little I/O, there is little difference in usability from system B. For programs that perform a lot of I/O, please take advantage of /tmp as described in the previous section.

Writing huge files to /home or /LARGE0/1 tends to cause poor response, so please transfer files efficiently. Below are example commands: rsync, which transfers a directory with compression, and tar, which combines a specific directory into a single archive file before copying it.

## Example commands of rsync (with compression (-z))
$ rsync -za /tmp/target-dir/ /LARGE0/gr19999/remote-dir/

## Example commands of tar+gzip
$ cd /tmp
$ tar -zcvf archive-name.tar.gz target-dir; cp archive-name.tar.gz /LARGE0/gr19999/archive-name.tar.gz

## Example commands of tar+zstd (faster and higher compression than gzip)
$ cd /tmp
$ tar -Izstd -cvf archive-name.tar.zst target-dir; cp archive-name.tar.zst /LARGE0/gr19999/archive-name.tar.zst
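To restore such an archive later (for example on a login node), the corresponding extraction commands are as follows; the archive names are the placeholders used above.

## Extract a gzip archive
$ tar -zxvf archive-name.tar.gz

## Extract a zstd archive
$ tar -Izstd -xvf archive-name.tar.zst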

The Ethernet between nodes (25 or 50 Gbps depending on the instance type) is low performance by supercomputer standards. In addition, since only limited computing resources are available, the number of nodes per job is limited to one.