Measuring a Program's FLOPs with Perf

FLOPs

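FLOPs is a key metric for the computational cost of scientific computing programs: it is the number of floating-point operations a program performs over a complete run. In this post I use Perf, the Linux performance profiling tool, to measure a program's FLOPs.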
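As a rough illustration of what is being counted, the sketch below tallies the floating-point operations of a naive dot product by hand. The function name `dot_with_flop_count` is made up for this example. It assumes one multiply and one add per element, so two vectors of length n cost 2n operations. A hardware counter may report a different number, for instance when operations are fused or vectorized.

```python
def dot_with_flop_count(a, b):
    """Return (dot product, number of floating-point operations performed).

    Hypothetical helper for illustration: counts one multiply and one add
    per element pair.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    total = 0.0
    flops = 0
    for x, y in zip(a, b):
        total += x * y  # one multiply and one add
        flops += 2
    return total, flops


if __name__ == "__main__":
    result, flops = dot_with_flop_count([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
    print(result, flops)
```

Measuring with Perf replaces this kind of hand count with a count taken from the CPU's performance counters.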

Installing Perf

Ubuntu/Debian

apt-get install linux-tools-common linux-tools-generic linux-tools-$(uname -r)

CentOS

yum install perf

Listing Supported Events and Their Codes

Installing libpfm4

git clone git://perfmon2.git.sourceforge.net/gitroot/perfmon2/libpfm4
cd libpfm4
make

Viewing Events

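Go into the examples directory and run the showevtinfo program to see which events are related to FLOPs. On my machine (Intel Skylake), I found the following events: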

IDX	 : 419430470
PMU name : skl (Intel Skylake)
Name : FP_ARITH_INST_RETIRED
Equiv : None
Flags : None
Desc : Floating-point instructions retired
Code : 0xc7
Umask-00 : 0x01 : PMU : [SCALAR_DOUBLE] : None : Number of scalar double precision floating-point arithmetic instructions (multiply by 1 to get flops)
Umask-01 : 0x02 : PMU : [SCALAR_SINGLE] : None : Number of scalar single precision floating-point arithmetic instructions (multiply by 1 to get flops)
Umask-02 : 0x04 : PMU : [128B_PACKED_DOUBLE] : None : Number of scalar 128-bit packed double precision floating-point arithmetic instructions (multiply by 2 to get flops)
Umask-03 : 0x08 : PMU : [128B_PACKED_SINGLE] : None : Number of scalar 128-bit packed single precision floating-point arithmetic instructions (multiply by 4 to get flops)
Umask-04 : 0x10 : PMU : [256B_PACKED_DOUBLE] : None : Number of scalar 256-bit packed double precision floating-point arithmetic instructions (multiply by 4 to get flops)
Umask-05 : 0x20 : PMU : [256B_PACKED_SINGLE] : None : Number of scalar 256-bit packed single precision floating-point arithmetic instructions (multiply by 8 to get flops)
Umask-06 : 0x40 : PMU : [512B_PACKED_DOUBLE] : None : Number of scalar 512-bit packed double precision floating-point arithmetic instructions (multiply by 8 to get flops)
Umask-07 : 0x80 : PMU : [512B_PACKED_SINGLE] : None : Number of scalar 512-bit packed single precision floating-point arithmetic instructions (multiply by 16 to get flops)
Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean)
Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean)
Modif-02 : 0x02 : PMU : [e] : edge level (may require counter-mask >= 1) (boolean)
Modif-03 : 0x03 : PMU : [i] : invert (boolean)
Modif-04 : 0x04 : PMU : [c] : counter-mask in range [0-255] (integer)
Modif-05 : 0x05 : PMU : [t] : measure any thread (boolean)
Modif-06 : 0x07 : PMU : [intx] : monitor only inside transactional memory region (boolean)
Modif-07 : 0x08 : PMU : [intxcp] : do not count occurrences inside aborted transactional memory region (boolean)
#-----------------------------
IDX : 419430469
PMU name : skl (Intel Skylake)
Name : FP_ARITH
Equiv : FP_ARITH_INST_RETIRED
Flags : None
Desc : Floating-point instructions retired
Code : 0xc7
Umask-00 : 0x01 : PMU : [SCALAR_DOUBLE] : None : Number of scalar double precision floating-point arithmetic instructions (multiply by 1 to get flops)
Umask-01 : 0x02 : PMU : [SCALAR_SINGLE] : None : Number of scalar single precision floating-point arithmetic instructions (multiply by 1 to get flops)
Umask-02 : 0x04 : PMU : [128B_PACKED_DOUBLE] : None : Number of scalar 128-bit packed double precision floating-point arithmetic instructions (multiply by 2 to get flops)
Umask-03 : 0x08 : PMU : [128B_PACKED_SINGLE] : None : Number of scalar 128-bit packed single precision floating-point arithmetic instructions (multiply by 4 to get flops)
Umask-04 : 0x10 : PMU : [256B_PACKED_DOUBLE] : None : Number of scalar 256-bit packed double precision floating-point arithmetic instructions (multiply by 4 to get flops)
Umask-05 : 0x20 : PMU : [256B_PACKED_SINGLE] : None : Number of scalar 256-bit packed single precision floating-point arithmetic instructions (multiply by 8 to get flops)
Umask-06 : 0x40 : PMU : [512B_PACKED_DOUBLE] : None : Number of scalar 512-bit packed double precision floating-point arithmetic instructions (multiply by 8 to get flops)
Umask-07 : 0x80 : PMU : [512B_PACKED_SINGLE] : None : Number of scalar 512-bit packed single precision floating-point arithmetic instructions (multiply by 16 to get flops)
Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean)
Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean)
Modif-02 : 0x02 : PMU : [e] : edge level (may require counter-mask >= 1) (boolean)
Modif-03 : 0x03 : PMU : [i] : invert (boolean)
Modif-04 : 0x04 : PMU : [c] : counter-mask in range [0-255] (integer)
Modif-05 : 0x05 : PMU : [t] : measure any thread (boolean)
Modif-06 : 0x07 : PMU : [intx] : monitor only inside transactional memory region (boolean)
Modif-07 : 0x08 : PMU : [intxcp] : do not count occurrences inside aborted transactional memory region (boolean)
#-----------------------------
IDX : 419430414
PMU name : skl (Intel Skylake)
Name : FP_ASSIST
Equiv : None
Flags : None
Desc : X87 floating-point assists
Code : 0xca
Umask-00 : 0x1001e : PMU : [ANY] : [default] : Cycles with any input/output SEE or FP assists
Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean)
Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean)
Modif-02 : 0x02 : PMU : [e] : edge level (may require counter-mask >= 1) (boolean)
Modif-03 : 0x03 : PMU : [i] : invert (boolean)
Modif-04 : 0x04 : PMU : [c] : counter-mask in range [0-255] (integer)
Modif-05 : 0x05 : PMU : [t] : measure any thread (boolean)
Modif-06 : 0x07 : PMU : [intx] : monitor only inside transactional memory region (boolean)
Modif-07 : 0x08 : PMU : [intxcp] : do not count occurrences inside aborted transactional memory region (boolean)

Getting the Event Codes

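In the same directory, run the check_events program to obtain the event codes. Its arguments are the Name and Umask values found in the previous step, written as Name:Umask. My command was: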

./check_events FP_ARITH_INST_RETIRED:SCALAR_SINGLE FP_ARITH:SCALAR_SINGLE FP_ASSIST

This produces the following output:

Requested Event: FP_ARITH_INST_RETIRED:SCALAR_SINGLE
Actual Event: skl::FP_ARITH_INST_RETIRED:SCALAR_SINGLE:k=1:u=1:e=0:i=0:c=0:t=0:intx=0:intxcp=0
PMU : Intel Skylake
IDX : 419430470
Codes : 0x5302c7
Requested Event: FP_ARITH:SCALAR_SINGLE
Actual Event: skl::FP_ARITH_INST_RETIRED:SCALAR_SINGLE:k=1:u=1:e=0:i=0:c=0:t=0:intx=0:intxcp=0
PMU : Intel Skylake
IDX : 419430470
Codes : 0x5302c7
Requested Event: FP_ASSIST
Actual Event: skl::FP_ASSIST:ANY:k=1:u=1:e=0:i=0:c=1:t=0:intx=0:intxcp=0
PMU : Intel Skylake
IDX : 419430414
Codes : 0x1531eca

The Codes value in the output is the event code we need.
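The codes above appear to follow a simple layout: the event Code sits in the lowest byte, the Umask in the next byte, and flag bits above that. The sketch below rebuilds the code for every FP_ARITH_INST_RETIRED umask from the showevtinfo listing. It assumes the flag byte 0x53 printed for SCALAR_SINGLE is the same for the other umasks, which I have not verified on other machines. The names `raw_code` and `perf_event` are hypothetical helpers, not part of libpfm4 or Perf. Treat check_events as the authoritative source and use this only as a cross-check.

```python
EVENT_CODE = 0xC7  # "Code" field of FP_ARITH_INST_RETIRED in the listing above
FLAGS = 0x53       # upper byte of 0x5302c7; assumed identical for every umask

# Umask values copied from the showevtinfo listing above.
UMASKS = {
    "SCALAR_DOUBLE": 0x01,
    "SCALAR_SINGLE": 0x02,
    "128B_PACKED_DOUBLE": 0x04,
    "128B_PACKED_SINGLE": 0x08,
    "256B_PACKED_DOUBLE": 0x10,
    "256B_PACKED_SINGLE": 0x20,
    "512B_PACKED_DOUBLE": 0x40,
    "512B_PACKED_SINGLE": 0x80,
}


def raw_code(umask, event_code=EVENT_CODE, flags=FLAGS):
    """Combine flags, umask and event code into one integer."""
    return (flags << 16) | (umask << 8) | event_code


def perf_event(umask):
    """Format the code as a perf raw event: 'r' followed by the hex digits."""
    return "r{:x}".format(raw_code(umask))


if __name__ == "__main__":
    for name, umask in UMASKS.items():
        print(f"{name:<20} {perf_event(umask)}")
```

For SCALAR_SINGLE this reproduces the 0x5302c7 reported by check_events, which is the only value the output above confirms.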

Measuring the Program's FLOPs

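Pick the program you want to measure, then run it under perf stat, passing each event code as a raw event (the hexadecimal code prefixed with r). For example: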

sudo perf stat -e r5302c7 -e r1531eca  ./example.py

The output is as follows:

Performance counter stats for './example.py':

13,061,638 r5302c7
1 r1531eca

1.834101748 seconds time elapsed

1.888016000 seconds user
0.231023000 seconds sys

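Here, the value reported for r5302c7 is the number of retired scalar single-precision floating-point instructions. It equals the program's total FLOPs only if the program performs no double-precision or packed (SIMD) floating-point arithmetic. Otherwise, count the other umasks as well, multiply each count by the factor given in the showevtinfo output, and sum the results.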
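The weighting step can be sketched in a few lines of Python. The code below parses the human-readable perf stat output shown above and applies the "multiply by N" factors from the showevtinfo listing. The function names and the `event_to_umask` mapping are assumptions made for this example. Only the r5302c7 to SCALAR_SINGLE pairing comes from the check_events output, so codes for any other umask need to be looked up on your own machine.

```python
import re

# FLOPs per retired instruction, from the "multiply by N" notes in showevtinfo.
FLOPS_PER_INSTRUCTION = {
    "SCALAR_DOUBLE": 1,
    "SCALAR_SINGLE": 1,
    "128B_PACKED_DOUBLE": 2,
    "128B_PACKED_SINGLE": 4,
    "256B_PACKED_DOUBLE": 4,
    "256B_PACKED_SINGLE": 8,
    "512B_PACKED_DOUBLE": 8,
    "512B_PACKED_SINGLE": 16,
}

# A counter line looks like "13,061,638 r5302c7": an integer, then the event.
COUNTER_LINE = re.compile(r"^\s*([\d,]+)\s+(\S+)")

# Output copied from the perf stat run above.
SAMPLE = """\
Performance counter stats for './example.py':

13,061,638 r5302c7
1 r1531eca

1.834101748 seconds time elapsed
"""


def parse_perf_stat(text):
    """Return {event name: count} for every counter line in the text."""
    counts = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1).replace(",", ""))
    return counts


def total_flops(counts, event_to_umask):
    """Sum count * weight over the events that map to a known umask."""
    return sum(
        counts[event] * FLOPS_PER_INSTRUCTION[umask]
        for event, umask in event_to_umask.items()
        if event in counts
    )


if __name__ == "__main__":
    counts = parse_perf_stat(SAMPLE)
    print(total_flops(counts, {"r5302c7": "SCALAR_SINGLE"}))
```

Passing only the SCALAR_SINGLE mapping reproduces the single-counter reading above. Adding further entries to `event_to_umask` extends the sum to double-precision and packed instructions.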

Any errors you run into can usually be resolved with a quick search. If you have questions, feel free to leave a comment.
