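The Beijing session of the Nanosecond Optimization Bootcamp is under way. Today is day one, and the weather in Beijing is lovely: blue sky, white clouds, clear as far as the eye can see. It is a bit hot, but the central air conditioning indoors is more than up to the job, and everyone has one of the new "格物致知" (investigate things to attain knowledge) fans (^_^).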
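The morning started with Intel CPUs: a review of the history of IA (Intel Architecture), followed by an introduction to the Core and the Uncore. The lecture ran long, so there was time for only one small experiment (getting a feel for x86 variable-length instructions).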
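After lunch we began with experiments, running large matrix multiplication in all sorts of environments, and the room got very lively. The AMD CPU stunned everyone: on algorithm #1 it far outran every Intel CPU, Xeons included. The task was to multiply two 2048 × 2048 double-precision floating-point matrices. Because the naive algorithm #1 has an obvious strided (row-crossing) access pattern, every Intel CPU in the test took more than 1 second, whereas the AMD Ryzen CPU needed only about 0.5 seconds.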
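The article does not show the code of algorithm #1, so what follows is only a sketch under the assumption that it is the textbook i-j-k triple loop. In that loop order the innermost index walks down a column of B, so in a row-major 2048 × 2048 array of doubles consecutive accesses are 2048 × 8 bytes = 16 KiB apart. Swapping the two inner loops makes every inner access sequential. Python is used here purely to show the index order, and the function names are made up for illustration; the timings quoted above come from compiled code.

```python
def matmul_ijk(a, b, n):
    """Naive order: the inner loop reads b[k][j] with k varying (column walk)."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]  # steps through b one whole row at a time
            c[i][j] = s
    return c


def matmul_ikj(a, b, n):
    """Interchanged order: the inner loop reads b[k][j] with j varying (row walk)."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            for j in range(n):
                c[i][j] += aik * b[k][j]  # b and c are both walked along a row
    return c


def stride_bytes(n, elem_size=8):
    """Bytes between b[k][j] and b[k+1][j] in a row-major n x n array."""
    return n * elem_size
```

Both functions compute the same product; only the order in which memory is touched differs, which is the property the class experiment was probing.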
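With everyone now intensely curious about AMD CPUs, the newly added lecture 《霄龙传奇》 (The Legend of EPYC) began. AMD Ryzen for the desktop market and AMD EPYC for the server market are both based on the Zen microarchitecture; EPYC is usually rendered in Chinese as 霄龙. Any account of its success has to include Jim Keller, hailed as the father of these chips and nicknamed the "Silicon Immortal" (硅仙人)...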
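The students with AMD laptops enjoyed their moment of glory for quite a while, then ran into trouble in the VTune experiments: VTune installs on AMD systems, but many of the experiments cannot be run there, and it reports an unsupported processor model (^-^).
What to do? The fix is to connect to a remote Intel laptop and run the experiments there.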
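The basic steps are as follows. In the Where pane of the Configure Analysis window, choose Remote Linux (SSH) and enter the username and the remote IP address. Then click Deploy, and VTune automatically pushes its Linux collection tools to the Linux machine.
Once the push succeeds, select the program to be optimized on the Linux machine in the What pane, and choose the analysis method in the How pane.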
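Put simply, VTune supports two kinds of experiments: those that need no kernel driver, called driverless (driver-less), and those that do.
The Lunar Lake laptops used in class have a hybrid design with big and little cores, and with the current kernel VTune does not support driverless collection on them, so the driver has to be built and loaded. VTune reports: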
Current OS kernel does not support driverless collection on hybrid CPUs. Consider loading the VTune Profiler sampling driver using root credentials or updating the OS kernel.
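To see in advance whether a Linux target is likely to hit this message, one can check whether the kernel exposes separate PMUs for the two core types. On hybrid Intel systems, recent kernels register them as cpu_core and cpu_atom under /sys/bus/event_source/devices. The sketch below relies on that convention only. It is not how VTune itself decides, and the function name is hypothetical.

```python
import os


def looks_hybrid(pmu_dir="/sys/bus/event_source/devices"):
    """Return True if both the P-core and the E-core PMU are registered."""
    return all(
        os.path.isdir(os.path.join(pmu_dir, name))
        for name in ("cpu_core", "cpu_atom")
    )
```

Calling looks_hybrid() on the target machine returns True when both PMUs are present; on a non-hybrid machine the kernel normally registers a single cpu PMU instead, so the function returns False.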
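The basic procedure for building and loading the driver is as follows:
Change to the directory that VTune pushed its files to, usually under /tmp, and copy the sepdk folder inside it to your home directory: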
cp -R sepdk ~
Then change into sepdk/src and list the files with ls.
geduer@lunabook:~$ cd sepdk
geduer@lunabook:~/sepdk$ ls
include  src  vtune-layer
geduer@lunabook:~/sepdk$ cd src
geduer@lunabook:~/sepdk/src$ ls
apic.c        linuxos.c       pmi.c              sepdk.spec                           unc_mmio.c
apic.o        linuxos.o       pmi.o              sepdrv_p_state.c                     unc_mmio.o
boot-script   lwpmudrv.c      pmu_list.c         sepdrv_p_state.o                     unc_msr.c
build-driver  lwpmudrv.o      pmu_list.o         silvermont.c                         unc_msr.o
control.c     Makefile        read_dmisysfs.py   silvermont.o                         unc_pci.c
control.o     modules.order   README.txt         socperf                              unc_pci.o
core2.c       Module.symvers  read_slitsysfs.py  socwatch                             unc_pmt.c
core2.o       output.c        read_smbios.py     sys32.S                              unc_power.c
cpumon.c      output.o        read_sratsysfs.py  sys64.o                              unc_power.o
cpumon.o      pax             rmmod-sep          sys64.S                              unc_sa.c
eventmux.c    pci.c           sep5.ko            sys_info.c                           unc_sa.o
eventmux.o    pci.o           sep5.mod           sys_info.o                           utility.c
inc           pebs.c          sep5.mod.c         unc_common.c                         utility.o
insmod-sep    pebs.o          sep5.mod.o         unc_common.o                         valleyview_sochap.c
ipt.c         perfver4.c      sep5.o             unc_gt.c                             valleyview_sochap.o
ipt.o         perfver4.o      sep5-x32_64-6.11.0-17-genericsmp.ko  unc_gt.o           vtsspp
Next, run the build-driver script in that directory to build the drivers.
geduer@lunabook:~/sepdk/src$ ./build-driver
C compiler to use: [ /bin/gcc ]
C compiler version: 13.3.0
Make command to use: [ /bin/make ]
Make version: 4.3
Kernel source directory: [ /lib/modules/6.11.0-17-generic/build ]
Kernel version: 6.11.0-17-generic
Building socperf driver ...
The build had its tense moments but completed without incident.
make[2]: Leaving directory '/usr/src/linux-headers-6.11.0-17-generic'
make[1]: Leaving directory '/home/geduer/sepdk/src/socwatch/socwatch_driver'
************ Built drivers are copied to /home/geduer/sepdk/src/socwatch/drivers directory ************
Done
Done building the drivers
Next, run insmod-sep to load the drivers.
geduer@lunabook:~/sepdk/src$ sudo ./insmod-sep
PYTHON is set to python3
Detecting Secure Boot status...
Secure Boot is disabled
--------------------- Loading PAX driver ---------------------
Detecting Secure Boot status...
Secure Boot is disabled
Checking for PMU arbitration service (PAX) ... not detected.
Attempting to start PAX service ...
Executing: insmod ./pax/pax-x32_64-6.11.0-17-genericsmp.ko
Setting group ownership of devices to group "vtune" ... done.
Setting file permissions on devices to "660" ... done.
The pax driver has been successfully loaded.
--------------------- Loading SOCPERF driver ---------------------
Detecting Secure Boot status...
Secure Boot is disabled
Checking for socperf driver ... not detected.
Attempting to start socperf service ...
Executing: insmod ./socperf/src/socperf3-x32_64-6.11.0-17-genericsmp.ko
Setting group ownership of devices to group "vtune" ... done.
Setting file permissions on devices to "660" ... done.
The socperf3 driver has been successfully loaded.
--------------------- Loading SEP driver ---------------------
Executing: insmod ./sep5-x32_64-6.11.0-17-genericsmp.ko sym_lookup_func_addr="ffffffffa243db10"
Setting group ownership of devices to group "vtune" ... done.
Setting file permissions on devices to "660" ... done.
The sep5 driver has been successfully loaded.
--------------------- Loading VTSSPP driver ---------------------
Checking for vtsspp driver ... not detected.
Executing: insmod ./vtsspp/vtsspp-x32_64-6.11.0-17-genericsmp.ko gid=1001 mode=0660 ksyms="ffffffffa243db10"
The vtsspp driver has been successfully loaded.
--------------------- Loading SOCWATCH driver ---------------------
Checking for socwatch driver ... not detected.
Executing: insmod ./socwatch/drivers/socwatch2_16-x32_64-6.11.0-17-genericsmp.ko
Setting group ownership of device file to group "vtune" ... done.
Setting file permissions of device file to "660" ... done.
The socwatch2_16-x32_64-6.11.0-17-genericsmp driver has been successfully loaded.
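A quick way to confirm the result independently of the script's own messages is to look at /proc/modules on the target. The sketch below assumes the loaded modules appear there under names beginning with pax, socperf, sep, vtsspp, and socwatch, as the file names in the log suggest; adjust the prefixes if lsmod shows something different on your system. The helper names are hypothetical.

```python
# Assumed module-name prefixes, taken from the .ko file names in the log above.
EXPECTED_PREFIXES = ("pax", "socperf", "sep", "vtsspp", "socwatch")


def missing_drivers(proc_modules_text, prefixes=EXPECTED_PREFIXES):
    """Return the prefixes that match no module name in /proc/modules text."""
    # The first whitespace-separated field of each line is the module name.
    loaded = [
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    ]
    return [p for p in prefixes if not any(m.startswith(p) for m in loaded)]


def check_local(path="/proc/modules"):
    """Read the module list of the machine this runs on and report gaps."""
    with open(path) as f:
        return missing_drivers(f.read())
```

Running check_local() on the Linux target should return an empty list if all five drivers are loaded under the assumed names.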
After loading the drivers, the script prints a helpful note:
NOTE:
The driver is accessible only to users under the group 'vtune'.
Please add the users to the group 'vtune' to use the tool.
To change driver access group, reinstall the driver using -g <desired_group> option.
NOTE:
The driver is accessible only to users under the group vtune.
Please add the users to the group vtune to use the tool.
To change driver access group, reload the driver using -g <desired_group> option.
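In other words, only users in the vtune group can use the VTune drivers, mainly for security reasons. Adding the current user to the vtune group with the following command takes care of it.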
sudo adduser geduer vtune
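adduser only edits the group database; a session that is already logged in keeps its old group list until the user logs in again, so it is worth verifying the change from a fresh session. Here is a minimal check using Python's standard grp and pwd modules (the function name is hypothetical):

```python
import grp
import pwd


def user_in_group(user, group):
    """True if `group` is the user's primary group or lists the user as a member."""
    try:
        g = grp.getgrnam(group)
        u = pwd.getpwnam(user)
    except KeyError:
        return False  # unknown user or unknown group
    return u.pw_gid == g.gr_gid or user in g.gr_mem
```

With the names used in this article, user_in_group("geduer", "vtune") should return True on the target once the command above has been run.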
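Now refresh the VTune window and the advanced analyses become selectable, for example hardware event-based sampling. Click the Start button at the bottom and VTune gets to work.
VTune launches the application being optimized on the remote Linux machine and displays its output locally.
When the application exits, VTune automatically stops collecting events, transfers the data back to the host, and then analyzes and displays it.
VTune can also download symbol files and source code from the remote machine and show a very handsome combined source and assembly view.
All in all, for an optimization tool to reach this level of polish, VTune is truly top of the class.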