 ---
 title: 'Intel VTune Profiler'
 taxonomy:
     category:
         - docs
 external_links:
     process: true
     no_follow: true
     target: _blank
     mode: active
 ---
 
 [toc]
 
 ## Environment Settings{#enviroment}
 
 
 
 ### Software Version and System Requirements{#version}
 
 Version          | Module File Name   | System A | System B/C | System G | Cloud System | Note |
 ---------------- | ------------------ | -------- | ---------- | -------- | ------------ | ---- |
 2024.0 |  intel-vtune/2024.0 |  +        |  +           |  -        |  -               | |
 2023.2 (default) |  intel-vtune/2023.2 |  +        |  +           |  -        |  -               | |
 2023.1  |  intel-vtune/2023.1 |  +        |  +           |  -        |  -               | |
 2022.3           |  intel-vtune/2022.3 |  +        |  +           |  -        |  -               |  |
 
 +: Available for all users <br>
 \- : Not available
 
 In an environment where the [Intel compiler](/compilers/intel) is available, load the module as follows.
 
 
 ```nohighlight
 $ module load intel-vtune
 ```
 
 
 For details on the module command, please refer to [Modules](/config/modules).
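
To use a version other than the default, the module file name from the table above can be given in full. The following is a minimal sketch, not an official procedure: `VTUNE_VERSION` and `MODULE_NAME` are hypothetical variable names introduced here, and the script only prints the command when the `module` command is not available.

```shell
#!/bin/sh
# Sketch: load a specific VTune module version instead of the default.
# VTUNE_VERSION and MODULE_NAME are hypothetical names used for illustration.
VTUNE_VERSION="${VTUNE_VERSION:-2024.0}"
MODULE_NAME="intel-vtune/${VTUNE_VERSION}"

if command -v module >/dev/null 2>&1; then
    # On the system: load the chosen version and show what is loaded.
    module load "$MODULE_NAME"
    module list
else
    # Elsewhere (no module command): only show what would be run.
    echo "would run: module load $MODULE_NAME"
fi
```

Setting `VTUNE_VERSION=2022.3` before running the script would select the oldest version listed in the table.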
 
 
 
 ## How to Use Intel VTune Profiler{#usage}
 
 
 
 ### Intel VTune Profiler Commands{#command}
 
 Commands      |  Purpose                                 
 ------------ | ---------------------------------
  vtune-gui   |   Launches the GUI version of VTune Profiler.
  vtune       |   Launches the command-line version of VTune Profiler.
 
 
 ### Options{#option}
 
 
 #### Options of the vtune Command
 
 Options                  |  Purpose                      
 -------------------------- | ---------------------
  -collect=_string_         |  Specifies the analysis type.        
  -app-working-dir=_string_ |  Specifies the working directory.  
  -r, -result-dir=_string_  |  Specifies the directory in which to save the result.
 
 
 #### Main Analysis Types for the -collect Option of vtune
 
 Types                     |  Purpose                         
 ---------------------- | ----------------------------
  threading             |  Analyzes multi-threaded parallelism.
  hotspots              |  Analyzes hotspots.
  memory-access         |  Analyzes memory access.
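
The options and analysis types above combine into a single command line. The sketch below assumes the option spellings shown in the tables on this page; `build_vtune_cmd` is a hypothetical helper that only prints the command rather than running it, and it accepts only the three analysis types listed here, although VTune itself provides more.

```shell
#!/bin/sh
# Sketch: compose a vtune command line from the options listed on this page.
# build_vtune_cmd is a hypothetical helper; it prints the command only.
build_vtune_cmd() {
    analysis="$1"      # analysis type passed to -collect
    result_dir="$2"    # directory passed to -result-dir
    app="$3"           # program to profile
    workdir="${4:-}"   # optional directory passed to -app-working-dir

    # Accept only the analysis types described in the table above.
    case "$analysis" in
        threading|hotspots|memory-access) ;;
        *)
            echo "analysis type not listed on this page: $analysis" >&2
            return 1
            ;;
    esac

    cmd="vtune -collect=$analysis -result-dir=$result_dir"
    if [ -n "$workdir" ]; then
        cmd="$cmd -app-working-dir=$workdir"
    fi
    # Options come first; the target program is the last argument.
    echo "$cmd $app"
}

build_vtune_cmd hotspots ./result ./a.out
```

Printing the command first makes it easy to check the option order before passing it to **tssrun**.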
 
 
 ## Examples{#sample}
 
 
 
 ### How to Use in GUI{#sample_gui}
 
 
 1. Compiling<br>
 When using Intel VTune Profiler, compile with the debugging option **-g** and the same optimization options that you use for the actual run.
 
     ```nohighlight
     $ icc -g -O2 test.c
     ```
 
 2. Activating VTune Profiler<br>
 In [MobaXterm](/login/mobaxterm) or [FastX](/login/fastx), run the **vtune-gui** command to launch VTune Profiler. Please refer to [Interactive Processing](/run/interactive) for details on the **tssrun** command.
 
     ```nohighlight
     $ tssrun --x11 vtune-gui 
     ```
 
     ![](vtune_1.png?width=600)
 3. Creating projects<br>
     Select **New Project** in the center of the screen and the following screen will appear.
     Then enter the appropriate **Project name** and click **Create Project**.
 
     ![](vtune_2.png?width=400)
 
 4. Setting projects<br>
 Next, the Configure Analysis screen will appear. Specify the target program in **Application** and click **OK**.
 If the program requires arguments, specify them here.
 
     ![](vtune_3.png?width=600)
 
 5. Start Analysis <br>
 Click **Performance Snapshot**, select the analysis to run, and then press **Start** (the triangle button).
 
     ![](vtune_4.png?width=600)
 
 6. Confirmation of analysis results<br>
 After a short wait, the analysis results are displayed in VTune Profiler. In this example, we can see that the multiply1 function uses a large share of the CPU time.
 
     ![](vtune_5.png?width=600)
 
 
 #### Analysis of OpenMP programs
 
 The GUI version supports analysis of OpenMP programs.
 
 When starting with the tssrun command, the --rsc option specifies the number of threads.
 
 Example: execution with 8 threads

 ```nohighlight
 $ tssrun --x11 --rsc t=8:c=8 vtune-gui
 ```
 
 <!--
 After launching VTune, select "Advanced Hotspots", then select "Analyze OpenMP Regions" and press "Start" to run the analysis with 8 threads in parallel.
 
 ![](amplxe07.png)
 -->
 
 
 #### MPI Program Analysis
 Please use the CUI version to analyze MPI programs.
 
 ### How to Use in CUI{#sample_cui}
 
 1. Compiling<br>
  When using Intel VTune Profiler, compile with the debugging option **-g** and the same optimization options that you use for the actual run.
 
     ```nohighlight
     $ icc -g -O2 test.c
     ```
 
 2. Executing Check<br>
 The CUI version of VTune Profiler is available through the **vtune** command. This example runs a hotspots analysis and specifies the directory in which to save the results. Please refer to [Interactive Processing](/run/interactive) for details on the **tssrun** command.
 
 ```nohighlight
 $ tssrun vtune -collect hotspots  -r=./result.vtune ./a.out
 salloc: Pending job allocation 723318
 salloc: job 723318 queued and waiting for resources
 salloc: job 723318 has been allocated resources
 salloc: Granted job allocation 723318
 salloc: Waiting for resource configuration
 salloc: Nodes xb0013 are ready for job
 vtune: Analyzing data in the node-wide mode. The hostname (xb0013) will be added to the result path/name.
 vtune: Collection started.
 
 (Omitted)
 
 vtune: Executing actions 42 % Saving the result
 Elapsed Time: 0.075s
  | Application execution time is too short. Metrics data may be unreliable.
  | Consider reducing the sampling interval or increasing your application
  | execution time.
  |
     CPU Time: 0.050s
         Effective Time: 0s
         Spin Time: 0s
             Imbalance or Serial Spinning: 0s
             Lock Contention: 0s
             Other: 0s
         Overhead Time: 0.050s
          | A significant portion of CPU time is spent in synchronization or
          | threading overhead. Consider increasing task granularity or the scope
          | of data synchronization.
          |
             Creation: 0s
             Scheduling: 0s
             Reduction: 0s
             Atomics: 0s
             Other: 0.050s
     Total Thread Count: 1
     Paused Time: 0s
 
 Top Hotspots
 Function                       Module       CPU Time  % of CPU Time(%)
 -----------------------------  -----------  --------  ----------------
 __kmp_api_omp_get_max_threads  libiomp5.so    0.050s            100.0%
 Effective Physical Core Utilization: 100.0% (112.000 out of 112)
     Effective Logical Core Utilization: 148.8% (333.230 out of 224)
 Collection and Platform Info
     Application Command Line: ../oss/openmp/omp-sample/01_hello/src/c/run.x
     Operating System: 4.18.0-477.15.1.el8_8.x86_64 Red Hat Enterprise Linux release 8.8 (Ootpa)
     Computer Name: xb0013
     Result Size: 5.5 MB
     Collection start time: 09:39:56 21/09/2023 UTC
     Collection stop time: 09:39:57 21/09/2023 UTC
     Collector Type: Event-based counting driver,User-mode sampling and tracing
     CPU
         Name: Intel(R) Xeon(R) Processor code named Sapphirerapids
         Frequency: 2.000 GHz
         Logical CPU Count: 224
         LLC size: 110.1 MB
         Cache Allocation Technology
             Level 2 capability: available
             Level 3 capability: available
 
 If you want to skip descriptions of detected performance issues in the report,
 enter: vtune -report summary -report-knob show-issues=false -r <my_result_dir>.
 Alternatively, you may view the report in the csv format: vtune -report
 <report_name> -format=csv.
 vtune: Executing actions 100 % done
 salloc: Relinquishing job allocation 723318
 
 exit code: 0
 ```
 
 3. Checking the results<br>
 In an environment where X (GUI) is available, such as [MobaXterm](/login/mobaxterm) or [FastX](/login/fastx), run the **vtune-gui** command with the result directory created in step 2 as its argument, and check the results in the GUI version of VTune Profiler. Please refer to [Interactive Processing](/run/interactive) for details on the **tssrun** command.
 
     ```nohighlight
     $ tssrun --x11 vtune-gui ./result/result.vtune
     ```
 
     ![](vtune_6.png?width=600)
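
The log in step 2 ends with hints for producing text reports from a saved result without opening the GUI. The sketch below only prints commands built from those hints: `report_cmds` is a hypothetical helper, the result directory is a placeholder, and combining `-format=csv` with the `summary` report and `-r` is an assumption based on the hint text.

```shell
#!/bin/sh
# Sketch: print the report commands hinted at by the vtune log for a saved result.
# report_cmds is a hypothetical helper; the result directory is a placeholder.
report_cmds() {
    result_dir="$1"
    # Summary report without the descriptions of detected performance issues.
    echo "vtune -report summary -report-knob show-issues=false -r $result_dir"
    # The summary report in CSV format (assumed combination of the log hints).
    echo "vtune -report summary -format=csv -r $result_dir"
}

report_cmds ./my_result_dir
```

Replace `./my_result_dir` with the actual result directory; as the log notes, the hostname is added to the result path/name.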
 
 #### Analysis of Parallel Programs{#parallel}
 
 The CUI version supports analysis of MPI and OpenMP programs.
 When running VTune with tssrun, the --rsc option specifies the degree of parallelism (the number of processes or threads).
 
 <!--
 It also requires the addition of mpiexec.hydra.
 -->
 Example: analysis of an MPI program with 4 processes
 
 ```nohighlight
 $ tssrun --rsc p=4 vtune -collect hotspots  -r=./result ./a.out
 salloc: Pending job allocation 723080
 salloc: job 723080 queued and waiting for resources
 salloc: job 723080 has been allocated resources
 salloc: Granted job allocation 723080
 vtune: Analyzing data in the node-wide mode. The hostname (xb0127) will be added to the result path/name.
 vtune: Collection started.
 
 (Omitted)                             
                                                          
 Elapsed Time: 1.015s
     CPU Time: 1.710s
         Effective Time: 1.710s
         Spin Time: 0s
             MPI Busy Wait Time: 0s
             Other: 0s
         Overhead Time: 0s
             Other: 0s
     Total Thread Count: 8
     Paused Time: 0s
 
 Top Hotspots
 Function                Module                CPU Time  % of CPU Time(%)
 ----------------------  --------------------  --------  ----------------
 read                    libc.so.6               0.670s             39.2%
 PMPI_Init               libmpi.so.12            0.112s              6.6%
 main                    allrank                 0.100s              5.8%
 dlopen                  libdl.so.2              0.089s              5.2%
 [ld-linux-x86-64.so.2]  ld-linux-x86-64.so.2    0.080s              4.7%
 [Others]                N/A                     0.658s             38.5%
 Effective Physical Core Utilization: 89.5% (100.261 out of 112)
     Effective Logical Core Utilization: 45.0% (100.843 out of 224)
      | The metric value is low, which may signal a poor utilization of logical
      | CPU cores while the utilization of physical cores is acceptable. Consider
      | using logical cores, which in some cases can improve processor throughput
      | and overall performance of multi-threaded applications.
      |
 Collection and Platform Info
     Application Command Line: ../lecture/20230906/mpi/allrank
     Operating System: 4.18.0-477.15.1.el8_8.x86_64 Red Hat Enterprise Linux release 8.8 (Ootpa)
     Computer Name: xb0127
     Result Size: 7.2 MB
     Collection start time: 09:24:37 21/09/2023 UTC
     Collection stop time: 09:24:38 21/09/2023 UTC
     Collector Type: Event-based counting driver,User-mode sampling and tracing
     CPU
         Name: Intel(R) Xeon(R) Processor code named Sapphirerapids
         Frequency: 2.000 GHz
         Logical CPU Count: 224
         LLC size: 110.1 MB
         Cache Allocation Technology
             Level 2 capability: available
             Level 3 capability: available
 
 If you want to skip descriptions of detected performance issues in the report,
 enter: vtune -report summary -report-knob show-issues=false -r <my_result_dir>.
 Alternatively, you may view the report in the csv format: vtune -report
 <report_name> -format=csv.
 vtune: Executing actions 100 % done
 salloc: Relinquishing job allocation 723080
 
 exit code: 0
 ```
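
When only the hotspot ranking is of interest, the "Top Hotspots" table can be pulled out of the summary text. The following is a rough sketch that assumes the layout shown in the logs on this page and function names without spaces; `top_hotspots` is a hypothetical helper, and for anything more robust the CSV report format mentioned at the end of the log is likely the better input.

```shell
#!/bin/sh
# Sketch: extract the "Top Hotspots" rows from a vtune summary read on stdin.
# top_hotspots is a hypothetical helper. It assumes function names contain
# no spaces and that the table layout matches the logs on this page.
top_hotspots() {
    awk '
        /^[ \t]*Top Hotspots/ { in_table = 1; next }   # table title
        in_table && $1 == "Function" { next }          # header row
        in_table && /^[ \t]*-[- \t]*$/ { next }        # separator row
        in_table && NF >= 4 && $NF ~ /%$/ {            # data row
            print $1, $(NF - 1), $NF                   # name, CPU time, share
            next
        }
        in_table { in_table = 0 }                      # first non-table line
    '
}

# Usage with a few rows copied from the MPI log above.
top_hotspots <<'EOF'
Top Hotspots
Function                Module                CPU Time  % of CPU Time(%)
----------------------  --------------------  --------  ----------------
read                    libc.so.6               0.670s             39.2%
PMPI_Init               libmpi.so.12            0.112s              6.6%
Effective Physical Core Utilization: 89.5% (100.261 out of 112)
EOF
```

Each output line holds the function name, its CPU time, and its percentage of the total CPU time. In practice the summary would be piped in from a saved log file instead of a here-document.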
 
 
 ## Manuals{#manual}
 
 * [Intel VTune Profiler Documentation](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler-documentation.html)
 * [Intel VTune Profiler User Guide](https://www.intel.com/content/www/us/en/docs/vtune-profiler/user-guide/2023-2.html)