---
title: 'Use of Cloud System'
published: true
taxonomy:
    category:
        - docs
external_links:
    process: true
    title: false
    no_follow: true
    target: _blank
    mode: active
---

This page explains how to use the Cloud System.

[toc]

## Overview{#overview}

The cloud system is a small computing cluster built on a commercial cloud service. Because it is connected to the storage and the job scheduler of the supercomputer system installed at our Center, it can execute jobs in the same way as the on-premise supercomputer system. Anyone who has a user ID of the supercomputer system can use the cloud system.

## System Configurations{#configration}

The specifications of the computers used in the cloud system are as follows. The number of nodes in the cloud system varies depending on demand.

### Specifications of the Computation Node{#specification}

#### Currently available nodes

The following bare metal instances will be used for the time being from April 1, 2023. Up to 30 nodes are expected to be available until around October 2023, when system A is scheduled to start operation.

|Item | Content |
|-------------- |------------------------------------|
|Number of nodes | Variable |
|Processor Name | Intel Xeon Gold 6154 3.0GHz 18 cores |
|Number of Processors (cores) | 2 (36 cores/node) |
|Architecture | x86-64 |
|Performance | 3.45TFlops/node |
|Memory Capacity | 384GByte |
|Network | 25Gbps Ethernet |

#### Nodes used in the past

The following bare metal instances were used from November 2022 to the end of March 2023. Their use has been temporarily suspended because this type of cloud-side resource tends to be in short supply. We will return to these instances once we have confirmed that the supply has improved.

|Item | Content |
|-------------- |------------------------------------|
|Number of nodes | Variable |
|Processor Name | Intel Xeon Gold 6354 3.0GHz 18 cores |
|Number of Processors (cores) | 2 (36 cores/node) |
|Architecture | x86-64 |
|Performance | 3.45TFlops/node |
|Memory Capacity | 512GByte |
|Network | 50Gbps Ethernet |

## How to Use{#login}

After logging in to the supercomputer login node, jobs can be submitted to the cloud system by switching to the cloud-system environment with the following module command.

```nohighlight
$ module switch SysCL
```

## Job Execution{#sbatch}

Since jobs are executed on nodes with the same Xeon processors as systems A/B/C, you can use the same programs without recompilation. Usage is also the same, except for differences in the job queue (partition) configuration, the number of cores per node, and the memory capacity. For details, please refer to [Program Execution](/run).

### Queue configuration for the cloud system{#queue}

The cloud system has the following queue configuration. The eo queue is limited to short jobs for debugging. The so queue can run small jobs. Larger queues may be offered depending on the situation.

#### Example of job script of non-parallelized program

```bash
#!/bin/bash
#============ Slurm Options ===========
#SBATCH -p eo              # Specify the job queue (partition). Change this to the name of the queue you want to submit to.
#SBATCH -t 1:00:00         # Specify the elapsed time limit (e.g. one hour).
#SBATCH --rsc p=1:t=1:c=1  # Specify the requested resources.
#SBATCH -o %x.%j.out       # Specify the standard output file for the job.
#============ Shell Script ============
set -x

srun ./a.out
```
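Once the SysCL environment has been loaded, a script like the one above can be submitted and monitored with the standard Slurm commands. A minimal sketch, assuming the script is saved as `jobscript.sh` (a placeholder name); see [Program Execution](/run) for details:

```nohighlight
$ sbatch jobscript.sh
$ squeue
```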
#### Example of job script of MPI program (8 processes)

```bash
#!/bin/bash
#============ Slurm Options ===========
#SBATCH -p eo              # Specify the job queue (partition). Change this to the name of the queue you want to submit to.
#SBATCH -t 1:00:00         # Specify the elapsed time limit (e.g. one hour).
#SBATCH --rsc p=8:t=1:c=1  # Specify the requested resources.
#SBATCH -o %x.%j.out       # Specify the standard output file for the job.
#============ Shell Script ============
set -x

srun ./a.out
```

#### Available queues{#available}

|Queue name | Available users|
|----- | ------------- |
| eo | All users|
| so | Users belonging to personal courses, group courses, and private cluster courses|

#### Available computing resources per queue{#resource}

| Description | eo queue<br>Initial value | eo queue<br>Maximum value | so queue<br>Initial value | so queue<br>Maximum value |
| --- | --- | --- | --- | --- |
| Number of processes ( --rsc p=_X_ ) | 1 | 36 | 1 | 36 |
| Number of threads per process ( --rsc t=_X_ ) | 1 | 36 | 1 | 36 |
| Number of cores per process ( --rsc c=_X_ ) | 1 | 36 | 1 | 36 |
| Amount of memory per process ( --rsc m=_X_ )<br>(Unit: M, G) | 13G | 500G | 13G | 500G |
| Elapsed time ( -t ) | 1 hour | 1 hour | 1 hour | 7 days |
| Number of concurrent executions per user | 1 | 1 | 1 | 1 |

### /tmp area{#tmp}

On our supercomputer system, the /tmp area can be used as a temporary destination for writing data. Programs that perform a lot of file I/O can in some cases run faster by using /tmp. /tmp is a private area for each job, so its files are never mixed with those of other jobs. Please take advantage of this feature.

Note that the /tmp area is automatically deleted at the end of the job. To keep files written to /tmp, the job script must copy them to /home or /LARGE0, /LARGE1 before it finishes (see the sketch at the end of this section). Deleted files cannot be recovered later.

#### Specific examples of using /tmp {#tmp_example}

* If the file path can be specified in the program, specify /tmp as the write destination.
* Place files that are read repeatedly in /tmp before the program starts.
* Place programs and input files that access files via relative paths in /tmp and run them there.

#### Available /tmp capacity{#tmp_available}

The available /tmp capacity in the cloud system is `Number of processes x Number of cores per process x 94GB`. For example, if you submit a job with 4 processes and 8 cores per process, you will be allocated **3,008GB** (`4 x 8 x 94`).
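The following is a minimal job script sketch of this workflow: it writes temporary output to the job-private /tmp area and copies the result to the large volume storage before the job ends. The program `./a.out`, its hypothetical `--output` option, the file name `result.dat`, and the group directory `/LARGE0/gr19999` are placeholders; adjust them to your own program and environment.

```bash
#!/bin/bash
#============ Slurm Options ===========
#SBATCH -p eo              # Job queue (partition); change to the queue you want to use.
#SBATCH -t 1:00:00         # Elapsed time limit.
#SBATCH --rsc p=1:t=1:c=1  # Requested resources.
#SBATCH -o %x.%j.out       # Standard output file for the job.
#============ Shell Script ============
set -x

# Write temporary output to the job-private /tmp area
# (how the output path is passed depends on your program;
# a hypothetical --output option is used here).
srun ./a.out --output /tmp/result.dat

# /tmp is deleted when the job ends, so copy any files you want
# to keep to /home or /LARGE0, /LARGE1 before the script finishes.
cp /tmp/result.dat /LARGE0/gr19999/result.dat
```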
## Supplemental information on storage access for the cloud system{#supplemental}

### Access to home directories and large volume storage{#storage}

In the cloud system, the home directory ($HOME) and the large volume storage (/LARGE) are mounted in the same way as on on-premise systems such as system B, so files can be accessed with the same paths. However, the network distance and bandwidth constraints between the cloud system and the storage at our university prevent the full storage performance from being obtained. For programs with little I/O, there is little difference in usability from system B; for programs with a lot of I/O, please take advantage of /tmp as described in the previous section.

### Handling of Large Files{#large}

Writing huge files directly to /home or /LARGE0, /LARGE1 tends to cause poor response, so please transfer files efficiently. Below are example commands that transfer a specific directory with rsync, or combine it into a single archive file with tar before copying it to the large volume storage.

```bash
## Example rsync command (with compression (-z))
$ rsync -za /tmp/target-dir/ /LARGE0/gr19999/remote-dir/

## Example tar + gzip commands
$ cd /tmp
$ tar -zcvf archive-name.tar.gz target-dir
$ cp archive-name.tar.gz /LARGE0/gr19999/archive-name.tar.gz

## Example tar + zstd commands (faster and higher compression than gzip)
$ cd /tmp
$ tar -I zstd -cvf archive-name.tar.zst target-dir
$ cp archive-name.tar.zst /LARGE0/gr19999/archive-name.tar.zst
```

### MPI performance between nodes{#mpi}

The 25-50Gbps Ethernet between nodes (depending on the instance type) does not provide supercomputer-class interconnect performance. In addition, since the available computing resources are limited, the number of nodes per job is limited to one.
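Because jobs cannot span nodes, the largest MPI job that can be requested is one that fills a single node. A sketch based on the resource limits listed above (the program `./a.out` is a placeholder, and the so queue requires the corresponding course membership):

```bash
#!/bin/bash
#============ Slurm Options ===========
#SBATCH -p so               # The so queue allows elapsed times beyond 1 hour (up to 7 days).
#SBATCH -t 24:00:00         # Elapsed time limit (e.g. 24 hours).
#SBATCH --rsc p=36:t=1:c=1  # 36 processes x 1 core each = one full 36-core node.
#SBATCH -o %x.%j.out        # Standard output file for the job.
#============ Shell Script ============
set -x

srun ./a.out
```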