---
title: 'Systems available in FY2022'
published: false
taxonomy:
    category:
        - docs
external_links:
    process: true
    title: false
    no_follow: true
    target: _blank
    mode: active
---

[toc]

## Overview {#overview}

We will start the service of system D, system E, and system G at 10:00 a.m. on January 13, 2023. System D and system E are equipped with Intel Xeon processors, and system G is equipped with AMD EPYC processors and GPUs.

Systems D and E use the same Xeon processors as the previous systems B and C, and the Intel compilers are available on them. You can use system D in place of system B, and system E in place of system C, in the same way as before. System G is equipped with four GPUs (NVIDIA A100 80GB SXM) per node as accelerators, and it is suitable for running applications that utilize GPGPU, such as machine learning.

<!-- Although the hardware of systems D and E is from a slightly older generation, the Intel Xeon processors they carry are largely compatible with the instruction sets of the latest CPUs, so they can be used with the same feel as before.
System G is equipped with AMD EPYC processors and allows computation using accelerators (NVIDIA A100 80GB SXM). -->

A user ID for the large-scale computer systems is required to use these systems. During the FY2022 trial operation period, the systems can be used without usage fees.

## System Configurations {#system}

The specifications of systems D/E/G are as follows. The HOME and LARGE storage areas can be accessed from any system with the same path. Please refer to [Using File System](/filesystem) for details on storage.

<!-- The only difference between systems D and E is the memory size; their computing performance is identical.
The software environment is shared between systems D/E and the cloud system.
System G, on the other hand, differs from the other systems in its CPU, compilers, and software environment. -->

![system_deg](system_deg.png?width=900)

## How to access the system {#login}

Login to the system is limited to SSH (Secure Shell) public key authentication. [Generate a key pair](/login/pubkey#keygen) and [register your public key](/login/pubkey#regist) from the [User Portal](https://web.kudpc.kyoto-u.ac.jp/portal), and then you can log in. Please refer to [Access](/login) for details.

1. To log in to the system, access the following hosts via SSH public key authentication.

```nohighlight
## When logging in to system D/E
$ ssh laurel.kudpc.kyoto-u.ac.jp
--
## When logging in to system G
$ ssh gardenia.kudpc.kyoto-u.ac.jp
```

2. Switch to the module file for each system.

```nohighlight
## When switching to the system D environment
$ module switch SysD
--
## When switching to the system E environment
$ module switch SysE
--
## When switching to the system G environment
## (When you log in to gardenia, the system G environment is set automatically by default, so normally there is no need to switch.)
$ module switch SysG
```

3. Check your current environment if necessary. If the SysD and slurm modules are loaded, you can submit jobs to the system D environment.

```nohighlight
## For system D
$ module list
Currently Loaded Modulefiles:
 1) slurm/2022   2) SysD/2022   3) intel/2022.3   4) intelmpi/2022.3   5) PrgEnvIntel/2022
```

## Login Environment {#env}

Environment settings such as the programming environment and libraries are configured with the module command, in the same way as on the previous system. However, the module names have changed, so if you load modules automatically at login via .bashrc, etc., you will need to review those settings. Please refer to the following pages for the login environment and the changes from the previous system.

* [For Users of the Previous System](/migration)
* [Setting up your Environment](/config)
* [Modules - a tool for setting up the environment](/config/modules)

## How to Compile {#compile}

### System D/E

Intel compilers are available. The Intel compiler settings are loaded automatically immediately after login. Please refer to [Compilers and Libraries](/compilers) for details.

### System G

NVIDIA HPC SDK compilers are available. The NVIDIA HPC SDK compiler settings are loaded automatically immediately after login. Please refer to [Compilers and Libraries](/compilers) for details.
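As a minimal illustration, typical compile commands on each system might look like the following. The source file names are placeholders, and the compiler driver names (ifort/icc for the Intel compilers, nvfortran/nvc/nvcc for the NVIDIA HPC SDK) are the standard ones shipped with those toolchains; check [Compilers and Libraries](/compilers) for the commands and options actually recommended on this system.

```nohighlight
## System D/E: Intel compilers (source file names are placeholders)
$ ifort -o sample sample.f90    # Fortran
$ icc -o sample sample.c        # C

## System G: NVIDIA HPC SDK compilers (source file names are placeholders)
$ nvfortran -o sample sample.f90    # Fortran
$ nvc -o sample sample.c            # C
$ nvcc -o sample sample.cu          # CUDA C
```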
## Job Execution {#job}

Slurm is used as the job scheduler. Use the `spartition` command to check queues, the `squeue` command to check jobs, and the `sbatch` command to submit jobs.

In addition to the standard Slurm options, the `--rsc` option has been added as a customized option specific to our center. It lets you specify the required computing resources with the same syntax as the `-A` option of the previous system's job scheduler (PBS).

In principle, please add the `srun` command before the program to be executed when running a job under Slurm. It is used in place of the `mpiexec` command; parameters such as the degree of parallelism are obtained automatically from the `--rsc` option and can be omitted. Although `srun` can be omitted for sequential programs, we recommend adding it. Please refer to [Program Execution](/run) for details.

#### Example of a job script for a sequential program

```nohighlight
#!/bin/bash
#============ Slurm Options ===========
#SBATCH -p gr19999d        # Specify the job queue name (partition name). Check the available queue names with the spartition command.
#SBATCH -t 1:00:00         # Specify a maximum elapsed time limit of 1 hour. Jobs with a lean limit are easier to schedule.
#SBATCH --rsc p=1:t=1:c=1  # Specify the requested resources.
#SBATCH -o %x.%j.out       # Specify the standard output/error file for the job. %x is replaced by the job name (job script name) and %j by the job ID.
#============ Shell Script ============
set -x
srun ./a.out
```

#### Example of a job script for an MPI program (8 processes)

```nohighlight
#!/bin/bash
#============ Slurm Options ===========
#SBATCH -p gr19999d        # Specify the job queue name (partition name).
#SBATCH -t 1:00:00         # Specify an elapsed time limit.
#SBATCH --rsc p=8:t=1:c=1  # Specify the requested resources.
#SBATCH -o %x.%j.out       # Specify the standard output/error file for the job.
#============ Shell Script ============
set -x
srun ./a.out
```

#### Example of a job script for a sequential program using a GPU

```nohighlight
#!/bin/bash
#============ Slurm Options ===========
#SBATCH -p gr19999g   # Specify the job queue (partition).
#SBATCH -t 1:00:00    # Specify an elapsed time limit.
#SBATCH --rsc g=1     # Specify the requested resources.
#SBATCH -o %x.%j.out  # Specify the standard output/error file for the job.
#============ Shell Script ============
set -x
srun ./a.out
```
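As a further sketch, not part of the examples above, the p, t, and c parameters of `--rsc` might be combined for a hybrid MPI + OpenMP program as shown below. The queue name `gr19999d` is the same placeholder used above, and the values should match the resources actually granted to your queue (see the table of available computing resources per queue later on this page).

```nohighlight
#!/bin/bash
#============ Slurm Options ===========
#SBATCH -p gr19999d          # Placeholder queue name; check your queues with the spartition command.
#SBATCH -t 1:00:00           # Elapsed time limit.
#SBATCH --rsc p=4:t=10:c=10  # 4 MPI processes, each with 10 threads on 10 cores.
#SBATCH -o %x.%j.out         # Standard output/error file for the job.
#============ Shell Script ============
set -x
srun ./a.out
```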
### Queue Configuration {#queue}

The following tables show the queue configuration during the FY2022 trial operation period. The available job queues are allocated according to the status of applications in the first half of FY2022. Since the amount of computing resources is not yet sufficient compared to the previous system, the number of nodes available per job is approximately 25% to 40% of that of the previous system. We apologize for any inconvenience until the system reaches its final configuration. Although many job queues are listed in the tables, you can check the queues you are authorized to submit to with the `spartition` command.

```bash
## Example of command display
$ spartition
Partition   State   Rmin   Rstd   Rmax    DefTime     MaxTime
ed          UP         0     40   5000   01:00:00    01:00:00
pd          UP         0     40   5000   01:00:00  7-00:00:00
s2d         UP         0     80   5000   01:00:00  7-00:00:00
```

#### Available queues in system D {#queue-sysd}

Authorization is granted primarily to those who used system A/B from April to July 2022. Depending on system congestion, elapsed time limits, etc. may be adjusted.

| Queue name | Maximum number of nodes per job | Elapsed time limit | Available users |
|----- | --- | -------------- | ------------- |
| ed | 1 node | 1:00:00 (1 hour) | All users (entry course) |
| pd | 1 node | 7-00:00:00 (7 days) | Personal course of previous system B |
| s2d | 2 nodes | 7-00:00:00 (7 days) | Group course of previous system B / groups that applied for 1 to 4 nodes of the private cluster |
| s4d | 4 nodes | 7-00:00:00 (7 days) | Group course of previous system B / groups that applied for 5 to 8 nodes of the private cluster |
| s8d | 8 nodes | 7-00:00:00 (7 days) | Group course of previous system B / groups that applied for 9 to 16 nodes of the private cluster |
| s16d | 16 nodes | 7-00:00:00 (7 days) | Group course of previous system B / groups that applied for 17 to 32 nodes of the private cluster |
| s32d | 32 nodes | 7-00:00:00 (7 days) | Group course of previous system B / groups that applied for 32 or more nodes of the private cluster |
| Individual queue | Individual | 7-00:00:00 (7 days) | Large-scale groups, users such as Organization Fixed Rate, etc. (usage authorization is set according to usage in the first half of FY2022) |
| l64d | 64 nodes | 2-00:00:00 (2 days) | Large-scale groups that applied for 32 or more nodes of the previous system B |

#### Available queues in system E {#queue-syse}

Authorization is granted primarily to those who used system C from April to July 2022. Depending on system congestion, elapsed time limits, etc. may be adjusted.

| Queue name | Maximum number of nodes per job | Elapsed time limit | Available users |
|----- | --- | -------------- | ------------- |
| pe | 1 node | 7-00:00:00 (7 days) | Personal course of previous system C |
| s1e | 1 node | 7-00:00:00 (7 days) | Group course of previous system C, Private Cluster course |
| Individual queue | Individual | 7-00:00:00 (7 days) | Large-scale group course, users such as Organization Fixed Rate, etc. (usage authorization is set according to usage in the first half of FY2022) |

#### Available queues in system G {#queue-sysg}

Authorization is granted primarily to those who used system C from April to July 2022. Depending on system congestion, elapsed time limits, etc. may be adjusted.

| Queue name | Maximum number of nodes per job | Elapsed time limit | Available users |
|----- | --- | -------------- | ------------- |
| eg | 1 node (4 GPUs) | 1:00:00 (1 hour) | All users (entry course) |
| Individual queue | Individual | 7-00:00:00 (7 days) | Group course, some users of the Private Cluster course |

#### Supplementary notes on available computing resources per queue {#rscnote}

Due to the hardware configuration of the physical nodes, the `--rsc` option of the job script has the following restrictions. The maximum number of nodes per job available in each queue cannot be exceeded.
| Description | System D<br>initial value | System D<br>maximum value | System E<br>initial value | System E<br>maximum value | System G<br>initial value | System G<br>maximum value |
| --- | --- | --- | --- | --- | --- | --- |
| Number of processes ( --rsc p=_X_ ) | 1 | 40 x number of nodes | 1 | 40 x number of nodes | 1 | 64 x number of nodes |
| Number of threads per process ( --rsc t=_X_ ) | 1 | 40, or 80 with logical cores | 1 | 40, or 80 with logical cores | 1 | 64, or 128 with logical cores |
| Number of cores per process ( --rsc c=_X_ ) | 1 | 40 cores | 1 | 40 cores | 1 | 64 cores |
| Amount of memory per process ( --rsc m=_X_ )<br>(unit: M or G) | 4750M | 185G | 19250M | 751G | 8000M | 500G |
| Number of GPUs ( --rsc g=_X_ ) | - | - | - | - | 1 | 4 x number of nodes |

### Job Submission {#submit}

You can submit a job by specifying a job script as the argument to the `sbatch` command. Please refer to [Batch Processing](/run/batch) for details.
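As a simple illustration, submitting a job and then checking its status might look like the following; the job script name and the job ID in the output are placeholders.

```nohighlight
## Submit a job script (the file name is a placeholder)
$ sbatch jobscript.sh
Submitted batch job 1234

## Check the status of your jobs
$ squeue
```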