Intel Compiler Classic

| Version | Module File Name | System A | System B/C | System G | Cloud System | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| 2024.0 | intel/2024.0 | + | + | - | + | Introduced in April 2024. Intel compiler + MKL + TBB |
| 2024.0-gpu | intel/2024.0-gpu | - | - | + | - | Introduced in September 2024. Intel oneAPI for NVIDIA GPU |
| 2023.2 (default) | intel/2023.2 | + | + | + | + | Introduced in April 2024. Intel compiler + MKL + TBB |
| 2023.2-gpu | intel/2023.2-gpu | - | - | + | - | Introduced in September 2024. Intel oneAPI for NVIDIA GPU |
| 2023.2-rt | intel/2023.2-rt | + | + | + | + | Introduced in April 2024. Runtime libraries |
| 2023.2-rt-gpu | intel/2023.2-rt-gpu | - | - | + | - | Introduced in September 2024. Intel oneAPI for NVIDIA GPU, runtime libraries |
| 2023.1 | intel/2023.1 | + | + | - | + | Introduced in August 2023. Intel compiler + MKL + TBB |
| 2023.1-rt | intel/2023.1-rt | + | + | - | + | Introduced in August 2023. Runtime libraries |
| 2022.3 (default) | intel/2022.3 | + | + | - | + | Introduced in November 2022. Intel compiler + MKL + TBB |
| 2022.3-rt | intel/2022.3-rt | + | + | - | + | Introduced in November 2022. Runtime libraries |

Note: The versions listed are those of the comprehensive Intel oneAPI toolkit, not the versions of the individual Intel compilers.

+ : Available for all users
- : Not available

On Systems A, B, C, and the Cloud System, the Intel compiler is set by default when you log in.

$ module list
Currently Loaded Modulefiles:
1) slurm   2) SysB/2022   3) intel/2022.3   4) PrgEnvIntel/2022
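
If you need one of the other versions listed in the table above, you can switch the intel module; the version numbers below are only an example, so substitute the ones you actually need.

$ module switch intel/2022.3 intel/2023.2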

On System G, the NVIDIA HPC SDK compiler is set as the default when you log in. Please execute the module command shown below to switch to the Intel compiler.

$ module switch PrgEnvNvidia PrgEnvIntel

| Language | Command | Executable Form |
| --- | --- | --- |
| C | icc | icc [option] file name |
| C++ | icpc | icpc [option] file name |
| Fortran | ifort | ifort [option] file name |

| Option | Purpose |
| --- | --- |
| -o FILENAME | Specifies the name of the output file. |
| -mcmodel=medium | Enables the use of more than 2 GB of memory. |
| -shared-intel | Dynamically links all Intel-provided libraries. |
| -fpic | Generates position-independent code. |
| -qopenmp | Enables OpenMP directives during compilation. |
| -qmkl | Links the MKL library. |
| -parallel | Performs automatic parallelization. |
| -O0/-O1/-O2/-O3 | Specifies the optimization level (the default is -O2). |
| -fast | Optimizes the program for maximum execution speed. The -fast option implies the following options: -ipo, -O3, -no-prec-div, -static, -fp-model fast=2, -xHost. |
| -ip | Enables interprocedural optimization within a single file. |
| -ipo | Enables interprocedural optimization across multiple files. |
| -qopt-report | Outputs a report on the optimizations performed. |
| -xHost | Generates code for the highest instruction set available on the processor used for compilation. |
| -xCORE-AVX512 / -xCORE-AVX2 / -xSSE4.2 / -xSSE3 | Generates code optimized for the specified instruction set on Intel processors. |
| -static-intel | Statically links the libraries provided by Intel. Static linking reduces the overhead of searching for dynamic libraries at program startup. |
| -Bstatic | Statically links all libraries that follow the -Bstatic option on the command line. If a -Bdynamic option appears later, subsequent libraries are linked dynamically. |
| -Bdynamic | Dynamically links all libraries that follow the -Bdynamic option on the command line. If a -Bstatic option appears later, subsequent libraries are linked statically. |
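
For illustration only (the source file name and the option combination are an example, not a recommendation for any particular program), a typical optimized build that enables OpenMP and links MKL might look like this:

$ ifort -O3 -xHost -qopenmp -qmkl sample.f90 -o sample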

To compile a program that uses more than 2 GB of memory, specify the following options.
-mcmodel=medium -shared-intel
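
For example (the source file name here is hypothetical):

$ icc -mcmodel=medium -shared-intel large_array.c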

$ icc test.c      # For C
$ icpc test.cpp   # For C++
$ ifort test.f90      # For Fortran
$ tssrun ./a.out  # Execution

To use automatic parallelization, compile with the -parallel option.

$ ifort -parallel test.f90
$ tssrun --rsc p=1:t=4:c=4 ./a.out # Run with a parallelism of 4.

OpenMP is an open standard for parallelizing programs. You can have the compiler parallelize your code simply by writing directives that begin with #pragma omp in the source code and compiling with the appropriate option.

To compile source code that contains OpenMP directives, use the -qopenmp option.

$ icc -qopenmp test.c

When executing the compiled program, specify the degree of parallelism in t and c of the --rsc option; the program then runs with that degree of parallelism.

$ tssrun --rsc p=1:t=8:c=8 ./a.out # Run with a parallelism of 8.

Coarrays are a feature added in Fortran 2008 for parallelizing programs. You can enable coarrays by compiling as follows. If you use srun to execute the coarray program, you must specify -coarray=single when compiling. If you run the program in a batch job without the srun command, specify shared or distributed instead.

## When srun is used
$ ifort -coarray=single coarray.f90
 
## When srun is not used (except for interactive processing)
$ ifort -coarray=distributed coarray.f90  

When executing the compiled program, specify the number of images in p of the --rsc option; the program runs with that degree of parallelism. If the -coarray option is set to shared or distributed, you must also set the number of images in the environment variable FOR_COARRAY_NUM_IMAGES. Please note that, due to a bug affecting Intel MPI over InfiniBand, you must set the environment variable MPIR_CVAR_CH4_OFI_ENABLE_RMA to "0" for the job to run. (Information as of June 2023)

#!/bin/bash
#============ Slurm Options ===========
#SBATCH -p gr19999b             # Specify the job queue (partition). Change this to the name of the queue to which you want to submit.
#SBATCH -t 1:00:00              # Specify the elapsed time limit (example: 1 hour).
#SBATCH --rsc p=2               # Specify the requested resources (example: running the coarray program with two images).
#SBATCH -o %x.%j.out            # Specify the standard output file for the job. '%x' is replaced by the job name and '%j' by the job ID.
#============ Shell Script ============
# (optional) If you specify 'set -x', you can easily keep track of the execution progress of the job scripts.
set -x

# Workaround for the bug when using Intel MPI over InfiniBand
export MPIR_CVAR_CH4_OFI_ENABLE_RMA=0

# Set the number of images in the environment variable
export FOR_COARRAY_NUM_IMAGES=${SLURM_DPC_NPROCS}

# Job execution (when -coarray=single)
srun ./a.out

# Note: When compiled with '-coarray=distributed' or '-coarray=shared', run the program without srun, because the compiled executable itself calls mpiexec internally.
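
A minimal sketch of the shared/distributed case (an assumption based on the note above, not part of the original script): the job script stays the same except that the executable is started directly instead of via srun, and the number of images is taken from FOR_COARRAY_NUM_IMAGES.

# Job execution (when -coarray=distributed or -coarray=shared)
./a.out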

The Intel compiler outputs messages in the following format when there is an error in the program or other information to report.

File name (line number): XXX #YYY: Message text
Contents of the corresponding line of source code
^
  • XXX : Message type (error/warning)
  • YYY : Serial number of the message
  • Pointer (^) : The exact location where the error was found in the corresponding line of source code

Output Example

sample.c(27): warning #175: subscript out of range
    printf(" %d , %d\n",c[1][0],c[1][10]);
                                    ^                                    

For Fortran programs, the messages take the same form, with a row of dashes leading up to the pointer:

File name (line number): XXX #YYY: Message text
Contents of the corresponding line of source code
--------------^
  • XXX : Message type (error/warning)
  • YYY : Serial number of the message
  • Pointer (^) : The exact location where the error was found in the corresponding line of source code

Output Example

sample.f90(26): error #5560: Subscript #2 of the array C has value 20 which is greater than the upper bound of 2
print *, c(1,1),",", c(1,20)
-----------------------^
compilation aborted for sample.f90 (code 1)

The Intel compiler is installed in /opt/system/app/intel, which is shared storage. Because the same files can be referenced from all nodes, storage capacity is saved and manageability is improved.

On the other hand, accesses from all nodes concentrate on the shared storage, which can become a bottleneck.

To address this, the runtime libraries of the Intel compiler (a set of libraries that does not include the compiler itself) are installed on the local storage of the compute nodes of Systems A/B/C. Using the runtime libraries when you execute a large MPI program, launch a large number of programs via an array job, or repeatedly execute a program over a short period reduces the load on /opt/system and speeds up program startup.

You can try this simple approach by adding the following module switch commands to your job script. Static linking at compile time has the same effect, but there are cases where only dynamic linking is possible.

Switch to the runtime library of the same version that was used at compile time. Runtime libraries are the module files whose version number ends in "-rt".

It is necessary to force module switch with the -f option.

module switch -f intel/2022.3 intel/2022.3-rt
or
module switch -f intel/2022.3-rt

As with the Intel compiler module, it is necessary to force the switch of the Intel MPI module with the -f option.

module switch -f intelmpi/2022.3 intelmpi/2022.3-rt
or
module switch -f intelmpi/2022.3-rt
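
As an illustration (assuming the program was built with the 2022.3 modules; substitute the versions you actually compiled with), the switches can be placed in the job script just before the program is launched:

# In the job script, before starting the program
module switch -f intel/2022.3 intel/2022.3-rt
module switch -f intelmpi/2022.3 intelmpi/2022.3-rt
srun ./a.out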

The following example shows the elapsed time of a job running an MPI program with 4096 parallel processes. The computation itself completes within 8 seconds in both cases, but in the elapsed time of the whole job, as reported by sacct, using the runtime libraries shortens the job by more than a minute.

## Job without runtime library (1 min 26 sec)
$ sacct -X -j 234069 -o jobid,elapsed
JobID           Elapsed
------------ ----------
234069         00:01:26

## Job using runtime library (17 sec)
$ sacct -X -j 234060 -o jobid,elapsed
JobID           Elapsed
------------ ----------
234060         00:00:17

The Intel MPI library is available. Please refer to Intel MPI Library on how to compile, link and execute MPI programs.

When using the Intel compiler, the following numerical calculation libraries are available. Please refer to the individual pages for details on how to use each library.

| Library | System A | System B | System C | System G | Cloud |
| --- | --- | --- | --- | --- | --- |
| MKL | + | + | + | - | + |

+ : Available for all users
- : Not available
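
As a minimal illustration of using MKL from the Intel compiler (reusing the -qmkl option from the option table above; the source file name is only a placeholder):

$ icc -qmkl test.c
$ tssrun ./a.out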