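Performance comparison of SuperPI builds compiled with gcc, icc, and icx
SuperPI is a well-known program for computing pi, commonly used to compare single-threaded CPU performance. Its source code is now open and can be downloaded from GitHub.
Building SuperPI is simple: with make and a compiler installed, running make all builds the source. If needed, the compiler can be changed by editing the makefile, for example to better match the hardware platform.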
Test environment:
CPU: Intel Core i9-12900KF
Memory: 2×32GB DDR4-3000
Operating system: Ubuntu 20.04.4 LTS
Compiler versions:
$ gcc -v
.....
gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
$ icc -v
icc version 2021.5.0 (gcc version 9.4.0 compatibility)
$ icx -v
Intel(R) oneAPI DPC++/C++ Compiler 2022.0.0 (2022.0.0.20211123)
......
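The SuperPI source was extracted three times into directories renamed SuperPI-gcc, SuperPI-icc, and SuperPI-icx. The compiler in each makefile was set to gcc, icc, and icx respectively, and each copy was built with make all. The resulting main executables compare as follows: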
$ ll ./SuperPI-*/pi_css5
-rwxrwxr-x 1 uxr uxr 1093360 May 11 11:57 ./SuperPI-gcc/pi_css5*
-rwxrwxr-x 1 uxr uxr 1301584 May 11 12:06 ./SuperPI-icc/pi_css5*
-rwxrwxr-x 1 uxr uxr 1099448 May 11 12:04 ./SuperPI-icx/pi_css5*
The gcc build produces the smallest executable; the icx build is slightly larger (about 0.6%); the icc build is the largest, about 19% larger than the gcc build.
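The size comparison can be checked with a few lines of Python. This is only a sketch of the arithmetic: the byte counts are copied from the listing above, and the helper name percent_larger is made up for illustration.

```python
# File sizes in bytes, copied from the `ll` listing above.
sizes = {"gcc": 1093360, "icc": 1301584, "icx": 1099448}


def percent_larger(size: int, baseline: int) -> float:
    """Return how much larger `size` is than `baseline`, in percent."""
    return (size - baseline) / baseline * 100.0


# Compare the two Intel builds against the gcc build.
for name in ("icc", "icx"):
    delta = percent_larger(sizes[name], sizes["gcc"])
    print(f"{name}: {delta:.2f}% larger than gcc")
```

Executable size alone says nothing about speed; it is shown here only to quantify the difference in the listing.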
Test command:
$ ./pi_css5 $((1<<26))
This command computes 2^26 digits of pi after the decimal point, that is 64M (67,108,864, about 67 million) digits.
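As a quick check of the argument, the shell expression $((1<<26)) is a left shift, which Python evaluates the same way. The variable names below are illustrative only.

```python
exponent = 26

# Same value the shell computes for $((1<<26)).
digits = 1 << exponent
print(digits)

# Expressed in units of 2^20, which is where the "64M" shorthand comes from.
print(digits // (1 << 20))
```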
Test results:
gcc:
Calculation of PI using FFT and AGM, ver. LG1.1.2-MP1.5.2a.memsave
initializing...
nfft= 16777216
radix= 10000
error_margin= 0.365078
calculating 67108864 digits of PI...
AGM iteration
precision= 48: 3.80 sec
precision= 80: 3.77 sec
precision= 176: 3.77 sec
precision= 352: 3.78 sec
precision= 688: 3.78 sec
precision= 1392: 3.78 sec
precision= 2784: 3.77 sec
precision= 5584: 3.78 sec
precision= 11168: 3.78 sec
precision= 22336: 3.78 sec
precision= 44688: 3.77 sec
precision= 89408: 3.80 sec
precision= 178816: 3.77 sec
precision= 357648: 3.77 sec
precision= 715312: 3.78 sec
precision= 1430640: 3.78 sec
precision= 2861280: 3.77 sec
precision= 5722592: 3.77 sec
precision= 11445200: 3.77 sec
precision= 22890416: 3.77 sec
precision= 45780848: 3.77 sec
precision= 91561728: 3.78 sec
writing pi67108864.txt...
93.72 sec. (real time)
icc:
Calculation of PI using FFT and AGM, ver. LG1.1.2-MP1.5.2a.memsave
initializing...
nfft= 16777216
radix= 10000
error_margin= 0.365078
calculating 67108864 digits of PI...
AGM iteration
precision= 48: 3.84 sec
precision= 80: 3.81 sec
precision= 176: 3.82 sec
precision= 352: 3.86 sec
precision= 688: 3.86 sec
precision= 1392: 3.86 sec
precision= 2784: 3.86 sec
precision= 5584: 3.82 sec
precision= 11168: 3.82 sec
precision= 22336: 3.82 sec
precision= 44688: 3.82 sec
precision= 89408: 3.82 sec
precision= 178816: 3.81 sec
precision= 357648: 3.81 sec
precision= 715312: 3.82 sec
precision= 1430640: 3.82 sec
precision= 2861280: 3.81 sec
precision= 5722592: 3.82 sec
precision= 11445200: 3.82 sec
precision= 22890416: 3.82 sec
precision= 45780848: 3.81 sec
precision= 91561728: 3.82 sec
writing pi67108864.txt...
94.92 sec. (real time)
icx:
Calculation of PI using FFT and AGM, ver. LG1.1.2-MP1.5.2a.memsave
initializing...
nfft= 16777216
radix= 10000
error_margin= 0.365078
calculating 67108864 digits of PI...
AGM iteration
precision= 48: 3.78 sec
precision= 80: 3.75 sec
precision= 176: 3.75 sec
precision= 352: 3.75 sec
precision= 688: 3.78 sec
precision= 1392: 3.75 sec
precision= 2784: 3.76 sec
precision= 5584: 3.75 sec
precision= 11168: 3.75 sec
precision= 22336: 3.75 sec
precision= 44688: 3.76 sec
precision= 89408: 3.76 sec
precision= 178816: 3.75 sec
precision= 357648: 3.75 sec
precision= 715312: 3.76 sec
precision= 1430640: 3.76 sec
precision= 2861280: 3.75 sec
precision= 5722592: 3.75 sec
precision= 11445200: 3.75 sec
precision= 22890416: 3.75 sec
precision= 45780848: 3.75 sec
precision= 91561728: 3.76 sec
writing pi67108864.txt...
93.29 sec. (real time)
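Logs like the three above can be summarized with a small script instead of being compared by eye. The following is a minimal sketch, assuming the log lines look exactly like the ones shown here; the function name parse_log and the regular expressions are my own and are not part of SuperPI. Only a short excerpt of the gcc log is embedded; a full log can be pasted in its place.

```python
import re

# Short excerpt of the gcc log above (three iterations plus the total).
SAMPLE_LOG = """\
AGM iteration
precision= 48: 3.80 sec
precision= 80: 3.77 sec
precision= 176: 3.77 sec
writing pi67108864.txt...
93.72 sec. (real time)
"""

# Patterns are assumptions based on the log format shown in this post.
ITERATION_RE = re.compile(r"^precision=\s*(\d+):\s*([\d.]+) sec$")
REAL_TIME_RE = re.compile(r"^([\d.]+) sec\. \(real time\)$")


def parse_log(text):
    """Return (list of per-iteration seconds, total real time or None)."""
    iterations = []
    real_time = None
    for line in text.splitlines():
        line = line.strip()
        match = ITERATION_RE.match(line)
        if match:
            iterations.append(float(match.group(2)))
            continue
        match = REAL_TIME_RE.match(line)
        if match:
            real_time = float(match.group(1))
    return iterations, real_time


iterations, real_time = parse_log(SAMPLE_LOG)
print(len(iterations), "iterations,", round(sum(iterations), 2), "s in AGM iterations")
print("real time:", real_time, "s")
```

Applied to the full gcc log, the 22 AGM iterations add up to about 83 s of the 93.72 s total; the remainder is spent in initialization and in the steps after the last iteration, including writing the output file.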
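Conclusion
Of the three compilers tested, gcc 9.4.0, used as the baseline, produced a binary faster than icc's and slower than icx's. The icx build was the fastest, but by less than 1% relative to gcc (93.29 s vs. 93.72 s); the icc build was the slowest, but only about 1.3% behind gcc (94.92 s vs. 93.72 s). Given possible measurement error, the conclusion is that for this single-threaded program, the binaries produced by the three compilers perform essentially the same.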
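The percentages can be reproduced from the total real times in the logs. This is a sketch of the arithmetic only; the helper name percent_vs_baseline is hypothetical, and each figure comes from a single run, so run-to-run variation is not captured.

```python
# Total real time in seconds for the 2^26-digit run, from the logs above.
real_time = {"gcc": 93.72, "icc": 94.92, "icx": 93.29}


def percent_vs_baseline(seconds: float, baseline: float) -> float:
    """Positive means slower than the baseline, negative means faster."""
    return (seconds - baseline) / baseline * 100.0


for name in ("icc", "icx"):
    delta = percent_vs_baseline(real_time[name], real_time["gcc"])
    print(f"{name}: {delta:+.2f}% relative to gcc")
```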
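Postscript:
After the 2^26-digit test, runs with 2^27 and 2^28 were also performed. Due to a limitation of the program, the number of digits actually produced was 0.75×2^27 and 0.75×2^28, roughly 100 million and 200 million. The results were: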
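The exact digit counts behind "roughly 100 million and 200 million" follow directly from the 0.75 factor. The sketch below takes that factor from the text as given; it is not derived from the program's source, and the function name produced_digits is made up for illustration.

```python
def produced_digits(exponent: int) -> int:
    """Digits produced for a 2^exponent request, assuming the 0.75 factor
    reported in the text (computed in integers as 3/4)."""
    return (1 << exponent) * 3 // 4


for exponent in (27, 28):
    print(f"2^{exponent} requested -> {produced_digits(exponent):,} digits produced")
```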
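gcc:
100 million digits: 203.22 s
200 million digits: 467.73 s
icc:
100 million digits: 200.50 s
200 million digits: 465.03 s
icx:
100 million digits: 201.77 s
200 million digits: 460.28 s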
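The conclusion therefore still holds. The spread between the fastest and slowest build stays within about 1.6%, and the ranking is not even stable across sizes (icc is fastest at 100 million digits, icx at 200 million). For this single-threaded program, the binaries produced by the three compilers perform essentially the same.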
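To see how close the larger runs are, the spread between the fastest and slowest build can be computed per problem size. This is a sketch using the timings listed above; the labels "100M" and "200M" and the helper name spread_percent are my own.

```python
# Total run times in seconds from the follow-up runs listed above.
results = {
    "100M": {"gcc": 203.22, "icc": 200.50, "icx": 201.77},
    "200M": {"gcc": 467.73, "icc": 465.03, "icx": 460.28},
}


def spread_percent(times: dict) -> float:
    """Gap between slowest and fastest build, as a percentage of the fastest."""
    fastest = min(times.values())
    slowest = max(times.values())
    return (slowest - fastest) / fastest * 100.0


for label, times in results.items():
    fastest_name = min(times, key=times.get)
    print(f"{label}: fastest is {fastest_name}, spread {spread_percent(times):.2f}%")
```

With single runs per configuration, differences of this size are hard to separate from measurement noise; repeating each run several times would give a firmer basis for ranking the builds.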