I'm running a Keras model, and the problem is that it runs faster without the CPU extensions than with them (it should be the other way around).
Look at the output below.
Is there a config file where I can set the inter_op_parallelism_threads option?
Using TensorFlow backend.
2018-10-18 17:21:32.620461: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-10-18 17:21:32.621535: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
Results: -33.20 (23.69) MSE
real 2m55.990s
user 4m8.784s
sys 3m50.192s
Using TensorFlow backend.
2018-10-18 17:25:04.773578: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Results: -32.57 (23.16) MSE
real 1m48.847s
user 2m15.792s
sys 0m13.440s
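To put a number on the gap between the two runs above, the `time` figures can be converted to seconds and compared. This is only a small sketch: the helper name `to_seconds` is made up for illustration, and the values are copied from the output as posted.

```python
import re

def to_seconds(stamp):
    """Convert a `time`-style stamp such as '2m55.990s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError("unexpected time format: " + stamp)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# Values copied from the two runs above.
first_run = {"real": "2m55.990s", "user": "4m8.784s", "sys": "3m50.192s"}
second_run = {"real": "1m48.847s", "user": "2m15.792s", "sys": "0m13.440s"}

real_ratio = to_seconds(first_run["real"]) / to_seconds(second_run["real"])
sys_ratio = to_seconds(first_run["sys"]) / to_seconds(second_run["sys"])

print("wall-clock ratio (first / second): %.2f" % real_ratio)  # 1.62
print("sys-time ratio (first / second): %.1f" % sys_ratio)     # 17.1
```

So the first run takes about 1.6 times as long in wall-clock terms and spends roughly 17 times as long in `sys`. That may point to thread contention rather than slow arithmetic, but the timings alone don't prove it.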
Solution
Here is the code I use with Keras; just put it at the top of your script.
import os

# Number of parallel execution units; adjust to the core count of your machine.
NUM_PARALLEL_EXEC_UNITS = 6

# Set the OpenMP/MKL variables first so they are in place before TensorFlow starts any threads.
os.environ["OMP_NUM_THREADS"] = str(NUM_PARALLEL_EXEC_UNITS)
os.environ["KMP_BLOCKTIME"] = "30"
os.environ["KMP_SETTINGS"] = "1"
os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"

import tensorflow as tf
from keras import backend as K

# Configure the thread pools and make Keras use this session.
config = tf.ConfigProto(intra_op_parallelism_threads=NUM_PARALLEL_EXEC_UNITS,
                        inter_op_parallelism_threads=1,
                        allow_soft_placement=True,
                        device_count={'CPU': NUM_PARALLEL_EXEC_UNITS})
session = tf.Session(config=config)
K.set_session(session)
Note: I'm a little disappointed with the results. Tuning only these parameters, the best I could reach was about a 150% speedup.
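The value 6 in the code above is specific to one machine. As a rough sketch, the same settings can be collected by a small helper so that different thread counts are easy to try. The name `build_thread_settings` is hypothetical and not part of Keras or TensorFlow. The fallback to `os.cpu_count()` is my assumption: it reports logical CPUs, which may be more than the number of physical cores.

```python
import os

def build_thread_settings(num_units=None):
    """Return (ConfigProto keyword arguments, environment variables).

    Hypothetical helper: it only gathers the values used in the answer above.
    """
    if num_units is None:
        # os.cpu_count() counts logical CPUs and may return None.
        num_units = os.cpu_count() or 1
    if num_units < 1:
        raise ValueError("num_units must be at least 1")
    config_kwargs = {
        "intra_op_parallelism_threads": num_units,
        "inter_op_parallelism_threads": 1,
        "allow_soft_placement": True,
        "device_count": {"CPU": num_units},
    }
    env = {
        "OMP_NUM_THREADS": str(num_units),
        "KMP_BLOCKTIME": "30",
        "KMP_SETTINGS": "1",
        "KMP_AFFINITY": "granularity=fine,verbose,compact,1,0",
    }
    return config_kwargs, env

config_kwargs, env = build_thread_settings(6)
os.environ.update(env)

# With TensorFlow 1.x and Keras installed, the keyword arguments plug in
# the same way as in the answer:
#   config = tf.ConfigProto(**config_kwargs)
#   K.set_session(tf.Session(config=config))
```

Timing the model with a few different values of `num_units` (for example the physical core count and half of it) is probably the simplest way to see which setting works on a given machine.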