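Running a Docker image built with PaddlePaddle produced the error below. The image ran fine on the original machine but failed on a new one. Troubleshooting showed that PaddlePaddle was not at fault: the imported dlib library crashed because the new machine's CPU lacks some instruction sets.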
--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0 paddle::framework::SignalHandle(char const*, int)
1 paddle::platform::GetCurrentTraceBackString[abi:cxx11]()
----------------------
Error Message Summary:
----------------------
FatalError: `Illegal instruction` is detected by the operating system.
[TimeInfo: *** Aborted at 1621819522 (unix time) try "date -d @1621819522" if you are using GNU date ***]
[SignalInfo: *** SIGILL (@0x7f46f81a26ea) received by PID 7 (TID 0x7f478c3b9740) from PID 18446744073577047786 ***]
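One way to narrow a crash like this down to a single library is to import each candidate in a child process and look at how the child exits. The sketch below is not necessarily how the problem was tracked down here. It assumes a Unix-like system, where Python reports a child killed by a signal as a negative return code, and the helper names classify_exit and probe_import are made up for illustration. A library that installs its own signal handler, as PaddlePaddle does in the trace above, may change the exit code, so treat the result as a hint only.

```python
import signal
import subprocess
import sys


def classify_exit(returncode):
    """Turn a child's return code into a short human-readable verdict."""
    if returncode == 0:
        return "ok"
    # On Unix, subprocess reports "killed by signal N" as -N.
    if returncode == -signal.SIGILL:
        return "illegal instruction"
    return f"failed (exit code {returncode})"


def probe_import(module_name):
    """Import one module in a fresh interpreter so a crash cannot kill this process."""
    proc = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return classify_exit(proc.returncode)


# Example: for name in ("paddle", "dlib"): print(name, probe_import(name))
```

A module that is simply not installed shows up as "failed (exit code 1)", which keeps it distinct from a real SIGILL.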
The following command shows which instruction sets the CPU supports (see the flags line):
cat /proc/cpuinfo
Original machine:
processor : 31
vendor_id : GenuineIntel
cpu family : 6
model : 61
model name : Intel Core Processor (Broadwell)
stepping : 2
microcode : 0x1
cpu MHz : 2593.906
cache size : 16384 KB
physical id : 31
siblings : 1
core id : 0
cpu cores : 1
apicid : 31
initial apicid : 31
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat
bogomips : 5187.81
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
New machine:
processor : 31
vendor_id : GenuineIntel
cpu family : 6
model : 6
model name : QEMU Virtual CPU
stepping : 3
microcode : 0x1
cpu MHz : 2593.906
cache size : 16384 KB
physical id : 31
siblings : 1
core id : 0
cpu cores : 1
apicid : 31
initial apicid : 31
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl xtopology eagerfpu pni cx16 x2apic hypervisor lahf_lm
bogomips : 5187.81
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
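Comparing the two machines shows that the new one lacks quite a few instruction sets. Uninstalling dlib and reinstalling it without the pip cache produces the error below, which shows that the missing instruction set is AVX.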
pip --no-cache-dir install dlib
Dlib was compiled to use AVX instructions, but these aren't available on your machine.
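The flag comparison between the two machines can be done mechanically instead of by eye. The sketch below parses the two flags lines copied from the /proc/cpuinfo listings above and prints what the new machine lacks. The helper name parse_flags and the SIMD shortlist are choices made for illustration, not part of any tool.

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Flags lines copied from the two machines shown above.
OLD_MACHINE = (
    "flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
    "pse36 clflush mmx fxsr sse sse2 syscall nx rdtscp lm constant_tsc rep_good "
    "nopl xtopology eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 "
    "x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor "
    "lahf_lm abm 3dnowprefetch fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm "
    "rdseed adx smap xsaveopt arat"
)
NEW_MACHINE = (
    "flags : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 "
    "clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl xtopology eagerfpu "
    "pni cx16 x2apic hypervisor lahf_lm"
)

# Illustrative shortlist of SIMD-related flags that numeric libraries often rely on.
SIMD_FLAGS = ["ssse3", "sse4_1", "sse4_2", "avx", "avx2", "fma", "f16c"]

missing = parse_flags(OLD_MACHINE) - parse_flags(NEW_MACHINE)
missing_simd = [flag for flag in SIMD_FLAGS if flag in missing]

print("missing on new machine:", " ".join(sorted(missing)))
print("missing SIMD flags:", " ".join(missing_simd))
```

On a live Linux machine the same function can be fed the real file, for example parse_flags(open("/proc/cpuinfo").read()), to check for "avx" before installing a prebuilt wheel.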
Solution:
Rebuild dlib from source
git clone https://github.com/davisking/dlib.git
Change into the repository root (the dlib directory created by the clone)
mkdir build
cd build
cmake ..
cmake --build .
Install the Python package (run this from the repository root, where setup.py lives, not from build)
python3 setup.py install --no USE_AVX_INSTRUCTIONS --no DLIB_USE_CUDA
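(For Python 2, run it with python instead of python3.)
After this build, a message about missing SSE instruction sets still appears, but in practice it did not affect normal use.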
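A quick sanity check after the rebuild is to import dlib and look at the build switches it reports. The sketch below assumes the module exposes USE_AVX_INSTRUCTIONS and DLIB_USE_CUDA as attributes. That may not hold for every dlib version, so each lookup falls back to "unknown" rather than failing. The helper names describe_build and check_dlib are made up for illustration.

```python
import importlib


def describe_build(module):
    """Collect build information from a dlib-like module, tolerating missing attributes."""
    return {
        "version": getattr(module, "__version__", "unknown"),
        # Assumed attribute names; a missing one is reported as "unknown".
        "avx": getattr(module, "USE_AVX_INSTRUCTIONS", "unknown"),
        "cuda": getattr(module, "DLIB_USE_CUDA", "unknown"),
    }


def check_dlib():
    """Return build info for the installed dlib, or None if it cannot be imported."""
    try:
        dlib = importlib.import_module("dlib")
    except ImportError:
        return None
    return describe_build(dlib)


# Example: print(check_dlib())
```

If the rebuild worked as intended, the "avx" entry should be False on the new machine. If the import itself still dies with an illegal instruction, the old AVX build is probably still the one being loaded.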