
20240321: intel vtune update

root authored on 2024-03-21 19:01:44
Showing 8 changed files
1 1
new file mode 100644
2 2
Binary files /dev/null and b/user/pages/08.compilers/15.intel_vtune/amplxe01.png differ
3 3
new file mode 100644
4 4
Binary files /dev/null and b/user/pages/08.compilers/15.intel_vtune/amplxe02.png differ
5 5
new file mode 100644
6 6
Binary files /dev/null and b/user/pages/08.compilers/15.intel_vtune/amplxe03.png differ
7 7
new file mode 100644
8 8
Binary files /dev/null and b/user/pages/08.compilers/15.intel_vtune/amplxe04.png differ
9 9
new file mode 100644
10 10
Binary files /dev/null and b/user/pages/08.compilers/15.intel_vtune/amplxe05.png differ
11 11
new file mode 100644
12 12
Binary files /dev/null and b/user/pages/08.compilers/15.intel_vtune/amplxe06.png differ
13 13
new file mode 100644
14 14
Binary files /dev/null and b/user/pages/08.compilers/15.intel_vtune/amplxe07.png differ
15 15
new file mode 100644
... ...
@@ -0,0 +1,310 @@
1
+---
2
+title: 'Intel VTune Profiler'
3
+taxonomy:
4
+    category:
5
+        - docs
6
+external_links:
7
+    process: true
8
+    no_follow: true
9
+    target: _blank
10
+    mode: active
11
+---
12
+
13
+[toc]
14
+
15
+## Environment Settings{#enviroment}
16
+
17
+
18
+
19
+### Software Version and System Requirements{#version}
20
+
21
+Version      | Module File Name| System A |  System B/C | System G | Cloud System | Note | 
22
+---------------| ------------------  | --------- |  ----------- | --------- | ---------------- | ---- |
23
+2023 (default) |  intel-vtune/2023.1 |  +        |  +           |  -        |  -               | Recommended for most users. |
24
+2022           |  intel-vtune/2022.3 |  +        |  +           |  -        |  -               |  |
25
+
26
++: Available for all users <br>
27
+\- : Not available
28
+
29
+In an environment where the [Intel compiler](/compilers/intel) is available, run the module command as follows.
30
+
31
+
32
+```nohighlight
33
+$ module load intel-vtune
34
+```
35
+
36
+
37
+For details on the module command, please refer to [Modules](/config/modules).
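To load a specific version rather than the default, the module file names from the table above can be passed to `module load`. The following is a minimal sketch; the helper name `vtune_module_for` is hypothetical, and the script only prints what it would do when the `module` command is not present.

```shell
#!/usr/bin/env bash
# Map a version from the table above to its module file name.
# vtune_module_for is an illustrative helper, not part of the system.
vtune_module_for() {
    case "$1" in
        2023) echo "intel-vtune/2023.1" ;;
        2022) echo "intel-vtune/2022.3" ;;
        *) echo "no intel-vtune module listed for version $1" >&2; return 1 ;;
    esac
}

mod="$(vtune_module_for 2022)"
if command -v module >/dev/null 2>&1; then
    # Load the module, then confirm that vtune is now on PATH.
    module load "$mod" && command -v vtune
else
    echo "module command not found; would run: module load $mod"
fi
```

If `command -v vtune` prints a path after loading, the profiler commands are ready to use.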
38
+
39
+
40
+
41
+## How to Use Intel VTune Profiler{#usage}
42
+
43
+
44
+
45
+### Intel VTune Profiler Commands{#command}
46
+
47
+Commands      |  Purpose                                 
48
+------------ | ---------------------------------
49
+ vtune-gui   |   Launches the GUI version of VTune Profiler.
50
+ vtune       |   Launches the command-line version of VTune Profiler.
51
+
52
+
53
+### Options{#option}
54
+
55
+
56
+#### Options of the vtune command
57
+
58
+Options                  |  Purpose                      
59
+-------------------------- | ---------------------
60
+ -collect=_string_         |  Specifies the analysis type.        
61
+ -app-working-dir=_string_ |  Specifies the working directory.  
62
+ -r, -result-dir=_string_  |  Specifies the directory in which to save the result.
63
+
64
+
65
+#### Main analysis types for the -collect option of vtune
66
+
67
+Types                     |  Purpose                         
68
+---------------------- | ----------------------------
69
+ threading             |  Analyzes multi-threaded parallelism.
70
+ hotspots              |  Analyzes hotspots.
71
+ memory-access         |  Analyzes memory access.
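The options and analysis types above combine into a single command line. The sketch below prints such a command line without running it; the function name `vtune_cmd` and its argument order are assumptions made for illustration, and it accepts only the three analysis types listed on this page, although vtune itself may offer others.

```shell
#!/usr/bin/env bash
# Print (do not run) a vtune command line.
# Usage: vtune_cmd <analysis-type> <result-dir> <program> [args...]
vtune_cmd() {
    local analysis="$1" result_dir="$2"
    shift 2
    case "$analysis" in
        threading|hotspots|memory-access) ;;
        *) echo "analysis type not listed on this page: $analysis" >&2; return 1 ;;
    esac
    # "--" separates vtune options from the target program and its arguments.
    printf 'vtune -collect %s -result-dir %s --' "$analysis" "$result_dir"
    printf ' %s' "$@"
    printf '\n'
}

vtune_cmd hotspots ./result.vtune ./a.out
```

Printing the command first makes it easy to check the options before prefixing the line with **tssrun**.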
72
+
73
+
74
+## Examples{#sample}
75
+
76
+
77
+
78
+### How to Use in GUI{#sample_gui}
79
+
80
+
81
+1. Compiling<br>
82
+When using Intel VTune Profiler, compile with the debugging option **-g** and the same optimization options that you use for production runs.
83
+
84
+    ```nohighlight
85
+    $ icc -g -O2 test.c
86
+    ```
87
+
88
+2. Activating VTune Profiler<br>
89
+In [MobaXterm](/login/mobaxterm) or [FastX](/login/fastx), run the **vtune-gui** command to launch VTune Profiler. Please refer to [Interactive Processing](/run/interactive) for details on the **tssrun** command.
90
+
91
+    ```nohighlight
92
+    $ tssrun --x11 vtune-gui 
93
+    ```
94
+
95
+    ![](vtune_1.png?width=600)
96
+3. Creating projects<br>
97
+    Select **New Project** in the center of the screen to open the dialog shown below.
98
+    Enter a **Project name** and click **Create Project**.
99
+
100
+    ![](vtune_2.png?width=400)
101
+
102
+4. Setting projects<br>
103
+Next, the Configure Analysis screen appears. Specify the target program in **Application** and click **OK**.
104
+If the program requires arguments, specify them here.
105
+
106
+    ![](vtune_3.png?width=600)
107
+
108
+5. Start Analysis <br>
109
+Click **Performance Snapshot**, select the analysis to run, and then press the **Start** button (the triangle icon).
110
+
111
+    ![](vtune_4.png?width=600)
112
+
113
+6. Confirmation of analysis results<br>
114
+After a short wait, the results of the analysis in the VTune Profiler are displayed. In this example, we can see that the multiply1 function is using a lot of CPU time.
115
+
116
+    ![](vtune_5.png?width=600)
117
+
118
+
119
+#### Analysis of OpenMP programs
120
+
121
+The GUI version supports analysis of OpenMP programs.
122
+
123
+When launching the GUI with the tssrun command, specify the number of threads with the --rsc option.
124
+
125
+Example: Execution with 8 threads
126
+```nohighlight
127
+tssrun --x11 --rsc t=8:c=8 vtune-gui
128
+```
129
+
130
+<!--
131
+After VTune starts, select "Advanced Hotspots", choose "Analyze OpenMP Regions", and press "Start" to run the analysis with 8 parallel threads.
132
+
133
+![](amplxe07.png)
134
+-->
135
+
136
+
137
+#### MPI Program Analysis
138
+Use the CUI version to analyze MPI programs.
139
+
140
+### How to Use in CUI{#sample_cui}
141
+
142
+1. Compiling<br>
143
+When using Intel VTune Profiler, compile with the debugging option **-g** and the same optimization options that you use for production runs.
144
+
145
+    ```nohighlight
146
+    $ icc -g -O2 test.c
147
+    ```
148
+
149
+2. Executing Check<br>
150
+The CUI version of VTune Profiler is run with the **vtune** command. This example runs a hotspots analysis and specifies the directory in which to save the results. Please refer to [Interactive Processing](/run/interactive) for details on the **tssrun** command.
151
+
152
+```nohighlight
153
+$ tssrun vtune -collect hotspots  -r=./result.vtune ./a.out
154
+salloc: Pending job allocation 723318
155
+salloc: job 723318 queued and waiting for resources
156
+salloc: job 723318 has been allocated resources
157
+salloc: Granted job allocation 723318
158
+salloc: Waiting for resource configuration
159
+salloc: Nodes xb0013 are ready for job
160
+vtune: Analyzing data in the node-wide mode. The hostname (xb0013) will be added to the result path/name.
161
+vtune: Collection started.
162
+
163
+(Omitted)
164
+
165
+Elapsed Time: 0.075s
166
+ | Application execution time is too short. Metrics data may be unreliable.
167
+ | Consider reducing the sampling interval or increasing your application
168
+ | execution time.
169
+ |
170
+    CPU Time: 0.050s
171
+        Effective Time: 0s
172
+        Spin Time: 0s
173
+            Imbalance or Serial Spinning: 0s
174
+            Lock Contention: 0s
175
+            Other: 0s
176
+        Overhead Time: 0.050s
177
+         | A significant portion of CPU time is spent in synchronization or
178
+         | threading overhead. Consider increasing task granularity or the scope
179
+         | of data synchronization.
180
+         |
181
+            Creation: 0s
182
+            Scheduling: 0s
183
+            Reduction: 0s
184
+            Atomics: 0s
185
+            Other: 0.050s
186
+    Total Thread Count: 1
187
+    Paused Time: 0s
188
+
189
+Top Hotspots
190
+Function                       Module       CPU Time  % of CPU Time(%)
191
+-----------------------------  -----------  --------  ----------------
192
+__kmp_api_omp_get_max_threads  libiomp5.so    0.050s            100.0%
193
+Effective Physical Core Utilization: 100.0% (112.000 out of 112)
194
+    Effective Logical Core Utilization: 148.8% (333.230 out of 224)
195
+Collection and Platform Info
196
+    Application Command Line: ../oss/openmp/omp-sample/01_hello/src/c/run.x
197
+    Operating System: 4.18.0-477.15.1.el8_8.x86_64 Red Hat Enterprise Linux release 8.8 (Ootpa)
198
+    Computer Name: xb0013
199
+    Result Size: 5.5 MB
200
+    Collection start time: 09:39:56 21/09/2023 UTC
201
+    Collection stop time: 09:39:57 21/09/2023 UTC
202
+    Collector Type: Event-based counting driver,User-mode sampling and tracing
203
+    CPU
204
+        Name: Intel(R) Xeon(R) Processor code named Sapphirerapids
205
+        Frequency: 2.000 GHz
206
+        Logical CPU Count: 224
207
+        LLC size: 110.1 MB
208
+        Cache Allocation Technology
209
+            Level 2 capability: available
210
+            Level 3 capability: available
211
+
212
+If you want to skip descriptions of detected performance issues in the report,
213
+enter: vtune -report summary -report-knob show-issues=false -r <my_result_dir>.
214
+Alternatively, you may view the report in the csv format: vtune -report
215
+<report_name> -format=csv.
216
+vtune: Executing actions 100 % done
217
+salloc: Relinquishing job allocation 723318
218
+
219
+exit code: 0
220
+```
221
+
222
+3. Checking the results<br>
223
+In an environment where X (GUI) is available, such as [MobaXterm](/login/mobaxterm) or [FastX](/login/fastx), run the **vtune-gui** command to view the results in the GUI version of VTune Profiler. Please refer to [Interactive Processing](/run/interactive) for details on the **tssrun** command.
224
+
225
+    ```nohighlight
226
+    $ tssrun --x11 vtune-gui ./result/result.vtune
227
+    ```
228
+
229
+    ![](vtune_6.png?width=600)
230
+
231
+#### Analysis of Parallel Programs{#parallel}
232
+
233
+The CUI version supports analysis of MPI and OpenMP programs.
234
+When running VTune with tssrun, specify the degree of parallelism with the --rsc option.
235
+
236
+<!--
237
+It also requires the addition of mpiexec.hydra.
238
+-->
239
+Example: Analysis with 4 MPI processes
240
+
241
+```nohighlight
242
+$ tssrun --rsc p=4 vtune -collect hotspots  -r=./result ./a.out
243
+salloc: Pending job allocation 723080
244
+salloc: job 723080 queued and waiting for resources
245
+salloc: job 723080 has been allocated resources
246
+salloc: Granted job allocation 723080
247
+vtune: Analyzing data in the node-wide mode. The hostname (xb0127) will be added to the result path/name.
248
+vtune: Collection started.
249
+
250
+(Omitted)                             
251
+                                                         
252
+Elapsed Time: 1.015s
253
+    CPU Time: 1.710s
254
+        Effective Time: 1.710s
255
+        Spin Time: 0s
256
+            MPI Busy Wait Time: 0s
257
+            Other: 0s
258
+        Overhead Time: 0s
259
+            Other: 0s
260
+    Total Thread Count: 8
261
+    Paused Time: 0s
262
+
263
+Top Hotspots
264
+Function                Module                CPU Time  % of CPU Time(%)
265
+----------------------  --------------------  --------  ----------------
266
+read                    libc.so.6               0.670s             39.2%
267
+PMPI_Init               libmpi.so.12            0.112s              6.6%
268
+main                    allrank                 0.100s              5.8%
269
+dlopen                  libdl.so.2              0.089s              5.2%
270
+[ld-linux-x86-64.so.2]  ld-linux-x86-64.so.2    0.080s              4.7%
271
+[Others]                N/A                     0.658s             38.5%
272
+Effective Physical Core Utilization: 89.5% (100.261 out of 112)
273
+    Effective Logical Core Utilization: 45.0% (100.843 out of 224)
274
+     | The metric value is low, which may signal a poor utilization of logical
275
+     | CPU cores while the utilization of physical cores is acceptable. Consider
276
+     | using logical cores, which in some cases can improve processor throughput
277
+     | and overall performance of multi-threaded applications.
278
+     |
279
+Collection and Platform Info
280
+    Application Command Line: ../lecture/20230906/mpi/allrank
281
+    Operating System: 4.18.0-477.15.1.el8_8.x86_64 Red Hat Enterprise Linux release 8.8 (Ootpa)
282
+    Computer Name: xb0127
283
+    Result Size: 7.2 MB
284
+    Collection start time: 09:24:37 21/09/2023 UTC
285
+    Collection stop time: 09:24:38 21/09/2023 UTC
286
+    Collector Type: Event-based counting driver,User-mode sampling and tracing
287
+    CPU
288
+        Name: Intel(R) Xeon(R) Processor code named Sapphirerapids
289
+        Frequency: 2.000 GHz
290
+        Logical CPU Count: 224
291
+        LLC size: 110.1 MB
292
+        Cache Allocation Technology
293
+            Level 2 capability: available
294
+            Level 3 capability: available
295
+
296
+If you want to skip descriptions of detected performance issues in the report,
297
+enter: vtune -report summary -report-knob show-issues=false -r <my_result_dir>.
298
+Alternatively, you may view the report in the csv format: vtune -report
299
+<report_name> -format=csv.
300
+vtune: Executing actions 100 % done
301
+salloc: Relinquishing job allocation 723080
302
+
303
+exit code: 0
304
+```
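When only the hotspot ranking is of interest, the Top Hotspots table in a summary like the one above can be pulled out with a small text filter. This is a sketch that assumes the table layout shown in the output above (a `Top Hotspots` line, a header, a rule, then four-column rows); the function name `top_hotspots` is hypothetical.

```shell
#!/usr/bin/env bash
# Print "function cpu_time" for each row of the Top Hotspots table
# found in a vtune summary read from standard input.
top_hotspots() {
    awk '
        /^Top Hotspots/                  { in_table = 1; next }  # table starts here
        in_table && /^(Function|-)/      { next }                # skip header and rule
        in_table && NF == 4 && $3 ~ /s$/ { print $1, $3; next }  # data row
        in_table                         { in_table = 0 }        # first non-row line ends it
    '
}

# Sample rows copied from the MPI run above.
top_hotspots <<'EOF'
Top Hotspots
Function                Module                CPU Time  % of CPU Time(%)
----------------------  --------------------  --------  ----------------
read                    libc.so.6               0.670s             39.2%
PMPI_Init               libmpi.so.12            0.112s              6.6%
Effective Physical Core Utilization: 89.5% (100.261 out of 112)
EOF
```

The same filter could be fed from a saved result, for example by piping the `vtune -report summary -r <my_result_dir>` command mentioned in the output above into `top_hotspots`.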
305
+
306
+
307
+## Manuals{#manual}
308
+
309
+* [Intel VTune Profiler Documentation](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler-documentation.html)
310
+* [Intel VTune Profiler User Guide](https://www.intel.com/content/www/us/en/docs/vtune-profiler/user-guide/2023-2.html)
0 311
\ No newline at end of file