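I. References
TensorFlow 2.x getting-started tutorial
A Concise Handbook of TensorFlow 2 (简单粗暴 TensorFlow 2)
tensorflow python deploy
TensorFlow Basics: Tensor, Shape, Type, Sessions & Operators
TensorFlow official documentation
TensorFlow documentation (Chinese)
TensorFlow official source code
TensorFlow official documentation, Chinese edition
Hands-On Machine Learning with Scikit-Learn and TensorFlow (Chinese edition)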
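II. Background
1. The compat module
The compat module exists in TF 2.x specifically for compatibility with TF 1.x. TF 2.x uses dynamic computation graphs (eager execution) by default and recommends the high-level tf.keras API; a model written in Keras can be even more concise than its PyTorch equivalent. The lower-level TensorFlow API, however, exposes more configuration than tf.keras. You can build a model with lower-level APIs such as tf.function, which allows customization at every level, or you can use tf.keras to assemble a model like building blocks, without needing to know how the underlying machinery is built, and focus only on the overall design.
The official TF 2.x tutorials are based mainly on the Keras API. The same functionality can often be implemented through different APIs, but combining different APIs tends to cause problems, typically when tf.keras is mixed with the low-level APIs.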
2. TensorFlow low-level API
TensorFlow 2.0 tutorial: training with the low-level API (without tf.keras)
3. TensorFlow examples
4. TensorFlow model optimization
Model optimization
TensorFlow model optimization (Chinese)
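III. Practical Experience
1. Installing TensorFlow
1.1 Matching TensorFlow versions
Version matching for tensorflow-gpu, Python, CUDA, and cuDNN
Before installing, always look up the version compatibility among tensorflow-gpu, Python, CUDA, and cuDNN. They must match one another exactly!
1.2 Installing TensorFlow from source
1.3 Installing TensorFlow on ARM
tensorflow-on-arm
Prebuilt TensorFlow 1.15 whl packages for ARM
Building TensorFlow on Ubuntu 18.04 + ARM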
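2. Building a simple training model
import numpy as np
# compat.v1 keeps the TF 1.x API used below available on both TF 1.14+ and TF 2.x
import tensorflow.compat.v1 as tf

# Enable eager execution (required on TF 1.x; already the default on TF 2.x)
tf.enable_eager_execution()

# Learning rate
learning_rate = 0.01
# Number of training iterations
train_steps = 1000

# Training data
train_X = np.array([[3.3],[4.4],[5.5],[6.71],[6.93],[4.168],[9.799],[6.182],[7.59],[2.167],[7.042],[10.791],[5.313],[7.997],[5.654],[9.27],[3.1]], dtype=np.float32)
train_Y = np.array([[1.7],[2.76],[2.09],[3.19],[1.694],[1.573],[3.366],[2.596],[2.53],[1.221],[2.827],[3.465],[1.65],[2.904],[2.42],[2.94],[1.3]], dtype=np.float32)

# Model parameters. Create them once, outside the loss function; otherwise every
# call to minimize() would train a fresh, randomly initialized pair of variables.
w = tf.Variable(tf.random_normal([1, 1]), name="weight")
b = tf.Variable(tf.zeros([1]), name="bias")

def network(data_x, data_y):
    # Model: Y = X * weight + bias
    y_pred = tf.add(tf.matmul(data_x, w), b)
    # Loss: mean squared error over the samples
    loss = tf.reduce_mean(tf.pow(y_pred - data_y, 2))
    return loss

optimizer = tf.train.AdadeltaOptimizer(learning_rate=learning_rate)

for step in range(train_steps):
    # In eager mode, minimize() expects a callable that returns the loss
    optimizer.minimize(lambda: network(train_X, train_Y))
    if step % 100 == 0:
        print(step, network(train_X, train_Y).numpy())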
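For comparison, the same kind of linear fit can be written with native TF 2.x APIs. This is only a sketch, assuming TF 2.x with eager execution: it swaps in tf.GradientTape and plain gradient descent (not the Adadelta optimizer used above), uses just the first five training points, and the names fit_line, lr, and steps are illustrative.

```python
import numpy as np
import tensorflow as tf


def fit_line(x, y, lr=0.01, steps=200):
    """Fit y ~ x @ w + b with manual gradient descent under tf.GradientTape."""
    w = tf.Variable(tf.zeros([1, 1]), name="weight")
    b = tf.Variable(tf.zeros([1]), name="bias")
    losses = []
    for _ in range(steps):
        with tf.GradientTape() as tape:
            y_pred = tf.matmul(x, w) + b
            loss = tf.reduce_mean(tf.square(y_pred - y))
        # Gradients of the loss with respect to both parameters
        grad_w, grad_b = tape.gradient(loss, [w, b])
        w.assign_sub(lr * grad_w)
        b.assign_sub(lr * grad_b)
        losses.append(float(loss))
    return w.numpy(), b.numpy(), losses


x = np.array([[3.3], [4.4], [5.5], [6.71], [6.93]], dtype=np.float32)
y = np.array([[1.7], [2.76], [2.09], [3.19], [1.694]], dtype=np.float32)
w, b, losses = fit_line(x, y)
print("first loss:", losses[0], "last loss:", losses[-1])
```

Because the variables live outside the loop body, each step updates the same w and b, which is the property the TF 1.x version above also depends on.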
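3. GPU operations
3.1 Allocating GPU memory on demand
# Import the required library first
import tensorflow as tf
# The part to add is below
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
config = ConfigProto()
config.gpu_options.allow_growth = True
# Use the imported InteractiveSession; tf.Session does not exist under TF 2.x
session = InteractiveSession(config=config)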
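If no session is needed at all, TF 2.x also offers a session-free way to request on-demand memory. The sketch below assumes TF 2.1 or later; enable_memory_growth is an illustrative helper name, not part of TensorFlow.

```python
import tensorflow as tf


def enable_memory_growth():
    """Ask TF 2.x to grow GPU memory on demand instead of reserving it all."""
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must run before any GPU has been initialized, otherwise TF raises RuntimeError
        tf.config.experimental.set_memory_growth(gpu, True)
    return gpus


gpus = enable_memory_growth()
print("GPUs configured:", len(gpus))
```

On a machine without a GPU the list is simply empty and the loop does nothing.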
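3.2 GPU/CPU compatibility
allow_soft_placement lets TensorFlow fall back to the CPU automatically when an operation cannot run on the GPU.
log_device_placement logs the device each operation is placed on.
with tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(allow_soft_placement=True, log_device_placement=True)) as sess:
    pass  # build and run the graph here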
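3.3 Disabling the GPU
import os
# Set this before TensorFlow initializes its devices (safest: before importing tensorflow)
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'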
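The ordering matters here, so a small guard can help. The following is a sketch using only the standard library; hide_gpus is a hypothetical helper name, and the assumption is that setting the variable before TensorFlow is imported is the safest moment.

```python
import os
import sys


def hide_gpus():
    """Hide all CUDA devices from this process.

    Returns True if TensorFlow had not been imported yet, which is the
    safest moment to set the variable.
    """
    set_in_time = "tensorflow" not in sys.modules
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    return set_in_time


if not hide_gpus():
    print("warning: tensorflow was already imported; the setting may be ignored")
# import tensorflow as tf  # import TensorFlow only after this point
```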
4. Model deployment
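IV. Tips
1. Matching TensorFlow and CUDA versions
TensorFlow version compatibility table
RTX 3060 cannot run TensorFlow 1.x
Pitfalls when setting up a TensorFlow deep learning environment on the RTX 3060
TensorFlow must be installed against a matching CUDA version; see the version compatibility table on the official site.
- The official TensorFlow 1.x builds run only on CUDA 10.0 or earlier;
- GeForce RTX 30 series GPUs currently support CUDA 11.1 and later, and only TensorFlow 2.4 and later support CUDA 11;
- The RTX 3060 can only use CUDA 11 or later, which in turn requires TensorFlow 2.4 or later, so the RTX 3060 cannot run TensorFlow 1.x.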
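The rules above boil down to a table lookup. Here is a minimal sketch in plain Python; the rows are a small excerpt of the official "tested build configurations" table for Linux GPU builds as I understand it, so treat them as illustrative and check the official table before relying on them. TESTED_BUILDS, required_cuda, and supports_cuda11 are illustrative names.

```python
# Excerpt of tested TensorFlow GPU build configurations (verify against the official table)
TESTED_BUILDS = {
    "1.15": {"cuda": "10.0", "cudnn": "7.4"},
    "2.0": {"cuda": "10.0", "cudnn": "7.4"},
    "2.1": {"cuda": "10.1", "cudnn": "7.6"},
    "2.4": {"cuda": "11.0", "cudnn": "8.0"},
    "2.5": {"cuda": "11.2", "cudnn": "8.1"},
}


def required_cuda(tf_version):
    """Return the CUDA version a TF release was built against, or None if not listed."""
    key = ".".join(tf_version.split(".")[:2])
    build = TESTED_BUILDS.get(key)
    return None if build is None else build["cuda"]


def supports_cuda11(tf_version):
    """True/False if the release is listed, None if it is not in the excerpt."""
    cuda = required_cuda(tf_version)
    if cuda is None:
        return None
    return int(cuda.split(".")[0]) >= 11


for version in ("1.15.0", "2.4.1"):
    print(version, "-> CUDA", required_cuda(version), "| CUDA 11:", supports_cuda11(version))
```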
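2. Resolving TF1/TF2 compatibility issues
I expected TF2 to differ substantially from TF1, but in practice it is not that complicated. In most cases, changing how tensorflow is imported is enough to resolve compatibility issues between TF2 and TF1.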
2.1 Method 1
# Change the import
import tensorflow as tf
Replace with:
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
2.2 Method 2
In TensorFlow 1.X, eager execution is turned on manually with tf.enable_eager_execution(). In TensorFlow 2.X it is on by default and is turned off manually with tf.compat.v1.disable_eager_execution().
# Change the import
import tensorflow as tf
Replace with:
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
# Replace other APIs
x = tf.placeholder(tf.float32, shape=(1024, 1024))
Replace with:
x = tf.compat.v1.placeholder(tf.float32, shape=(1024, 1024))
with tf.Session() as sess:
Replace with:
with tf.compat.v1.Session() as sess:
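Put together, Method 2 looks roughly like the following when run under TF 2.x. This is a minimal sketch; the 2x2 shape and the matmul are only there to have something to run and are not from the original code.

```python
import numpy as np
import tensorflow as tf

# Switch TF 2.x back to graph mode; call this once, at program start
tf.compat.v1.disable_eager_execution()

# TF 1.x-style graph construction through the compat.v1 namespace
x = tf.compat.v1.placeholder(tf.float32, shape=(2, 2), name="x")
y = tf.matmul(x, x)

with tf.compat.v1.Session() as sess:
    result = sess.run(y, feed_dict={x: np.eye(2, dtype=np.float32) * 2.0})

print(result)
```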
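3. Image preprocessing
3.1 Combining PIL and OpenCV
import numpy as np
import tensorflow as tf
from PIL import Image
# Read the image (item_imgpath is the path to the image file)
pil_image = Image.open(item_imgpath)
# Basic image processing
image_np = np.array(pil_image)  # PIL -> NumPy
input_tensor = tf.convert_to_tensor(image_np)  # NumPy -> tensor
input_tensor = input_tensor[tf.newaxis, ...]  # add a batch dimension: 3-D -> 4-D
# tensor -> NumPy: .numpy() under eager execution (the TF 2.x default);
# in TF 1.x graph mode, call input_tensor.eval() inside a session instead
image_np = input_tensor.numpy()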
3.2 Using TensorFlow only
# Image path
image_path = './1.jpg'
# Read the raw file contents
img = tf.io.read_file(image_path)
# Decode the JPEG; the original image shape here is (450, 600, 3)
img = tf.image.decode_jpeg(img, channels=3)
# Resize; the result is float32 with shape (299, 299, 3)
img = tf.image.resize(img, (299, 299))
# Add a batch dimension, 3-D -> 4-D: (1, 299, 299, 3)
img = tf.expand_dims(img, axis=0)
# Cast back to uint8
img = tf.cast(img, dtype=tf.uint8)
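To check the shapes without a real photo at hand, the same steps can be run on a synthetic image. This is a sketch assuming TF 2.x; load_image is an illustrative wrapper, and the random 450x600 image merely stands in for ./1.jpg.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def load_image(image_path, size=(299, 299)):
    """Read a JPEG, resize it, add a batch dimension, and return uint8."""
    raw = tf.io.read_file(image_path)
    img = tf.image.decode_jpeg(raw, channels=3)
    img = tf.image.resize(img, size)   # float32 after resizing
    img = tf.expand_dims(img, axis=0)  # (H, W, 3) -> (1, H, W, 3)
    return tf.cast(img, tf.uint8)


# Write a synthetic 450x600 RGB image to a temporary JPEG file
pixels = np.random.randint(0, 256, size=(450, 600, 3), dtype=np.uint8)
path = os.path.join(tempfile.mkdtemp(), "sample.jpg")
tf.io.write_file(path, tf.io.encode_jpeg(pixels))

batch = load_image(path)
print(batch.shape, batch.dtype)
```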
V. FAQ
Q: TensorFlow version too old, reading the pb file fails
Causes of and fixes for the TensorFlow error "NodeDef mentions attr 'xxx' not in Op"
Traceback (most recent call last):
File "F:\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\framework\importer.py", line 501, in _import_graph_def_internal
graph._c_graph, serialized, options) # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: NodeDef mentions attr 'exponential_avg_factor' not in Op<name=FusedBatchNormV3; signature=x:T, scale:U, offset:U, mean:U, variance:U -> y:T, batch_mean:U, batch_variance:U, reserve_space_1:U, reserve_space_2:U, reserve_space_3:U; attr=T:type,allowed=[DT_HALF, DT_BFLOAT16, DT_FLOAT]; attr=U:type,allowed=[DT_FLOAT]; attr=epsilon:float,default=0.0001; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]; attr=is_training:bool,default=true>; NodeDef: {{node sequential/custom_layer/batch_normalization_56/FusedBatchNormV3}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "F:/MyDocumentes/PythonProjects/202202/20220228/tensorflow_demo.py", line 132, in <module>
get_ops_name(pb_path=r'C:\Users\Seeking\Desktop\model.pb')
File "F:/MyDocumentes/PythonProjects/202202/20220228/tensorflow_demo.py", line 117, in get_ops_name
tf.import_graph_def(graph_def, name='')
File "F:\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "F:\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\framework\importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "F:\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\framework\importer.py", line 505, in _import_graph_def_internal
raise ValueError(str(e))
ValueError: NodeDef mentions attr 'exponential_avg_factor' not in Op<name=FusedBatchNormV3; signature=x:T, scale:U, offset:U, mean:U, variance:U -> y:T, batch_mean:U, batch_variance:U, reserve_space_1:U, reserve_space_2:U, reserve_space_3:U; attr=T:type,allowed=[DT_HALF, DT_BFLOAT16, DT_FLOAT]; attr=U:type,allowed=[DT_FLOAT]; attr=epsilon:float,default=0.0001; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]; attr=is_training:bool,default=true>; NodeDef: {{node sequential/custom_layer/batch_normalization_56/FusedBatchNormV3}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
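Cause:
The installed TensorFlow is 1.15, but the pb file was saved by a 2.x version; an older TensorFlow cannot read a pb file produced by a newer version.
Method 1:
Upgrade TensorFlow.
Method 2 (recommended):
See the section on TensorFlow versions above.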
Q: Input and output nodes do not match
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'import/istraining' with dtype bool
[[node import/istraining (defined at \360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
Original stack trace for 'import/istraining':
File "/MyDocumentes/PythonProjects/202202/20220228/site_infer.py", line 84, in <module>
return_tensors = read_pb_return_tensors(graph, pb_file, return_elements)
File "/MyDocumentes/PythonProjects/202202/20220228/site_infer.py", line 70, in read_pb_return_tensors
return_elements = tf.import_graph_def(frozen_graph_def, return_elements=return_elements) # 获取输入和输出的节点
File "\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\framework\importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\framework\importer.py", line 517, in _import_graph_def_internal
_ProcessNewOps(graph)
File "\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\framework\importer.py", line 243, in _ProcessNewOps
for new_op in graph._add_new_tf_operations(compute_devices=False): # pylint: disable=protected-access
File "\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3561, in _add_new_tf_operations
for c_op in c_api_util.new_tf_operations(self)
File "\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3561, in <listcomp>
for c_op in c_api_util.new_tf_operations(self)
File "\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3451, in _create_op_from_tf_operation
ret = Operation(c_op, self)
File "\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
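Cause:
The tensors fed and fetched do not match the model's input and output nodes.
For example, if the input and output nodes are ["inputs:0", "istraining:0", "outputs:0"]:
[Wrong] inference: output = sess.run([return_tensors[1]], feed_dict={return_tensors[0]: image_data})
[Correct] inference: output = sess.run([return_tensors[2]], feed_dict={return_tensors[0]: image_data, return_tensors[1]: False})
Tip:
Print the operations returned by graph.get_operations() to check the number and names of the input and output nodes.
# Print every operation in the graph the pb file was imported into.
# Note: tf.Graph() would create a new, empty graph and print nothing.
for op in graph.get_operations():
    print(op.name)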
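To make the tip concrete, the sketch below lists only the Placeholder operations, since those are the ones that must appear in feed_dict. It assumes TF 2.x (or 1.15) and builds a tiny stand-in graph in memory instead of reading a real pb file; build_demo_graph_def and list_placeholders are illustrative names, and the node names simply mirror the example above.

```python
import tensorflow as tf


def build_demo_graph_def():
    """Build a tiny stand-in for a frozen model with two placeholders."""
    graph = tf.Graph()
    with graph.as_default():
        inputs = tf.compat.v1.placeholder(tf.float32, shape=(None, 4), name="inputs")
        istraining = tf.compat.v1.placeholder(tf.bool, shape=(), name="istraining")
        # Scale by 0.5 in training mode and by 1.0 otherwise
        scale = 1.0 - 0.5 * tf.cast(istraining, tf.float32)
        tf.identity(inputs * scale, name="outputs")
    return graph.as_graph_def()


def list_placeholders(graph):
    """Return the names of all Placeholder ops, i.e. everything that must be fed."""
    return [op.name for op in graph.get_operations() if op.type == "Placeholder"]


graph = tf.Graph()
with graph.as_default():
    # import_graph_def prefixes node names with "import" by default
    tf.import_graph_def(build_demo_graph_def())

print(list_placeholders(graph))

with tf.compat.v1.Session(graph=graph) as sess:
    out = sess.run(
        "import/outputs:0",
        feed_dict={"import/inputs:0": [[1.0, 2.0, 3.0, 4.0]], "import/istraining:0": False},
    )
print(out)
```

Every name the helper returns needs an entry in feed_dict; leaving out import/istraining reproduces the error shown above.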
Q: TensorFlow version problem
File "F:/MyDocumentes/PythonProjects/202202/20220228/tensorflow_demo.py", line 42, in soft_demo
ndarray_data = sess.run(sm)
File "F:\360Downloads\Anaconda3\envs\tf22py37\lib\site-packages\tensorflow\python\client\session.py", line 971, in run
run_metadata_ptr)
File "F:\360Downloads\Anaconda3\envs\tf22py37\lib\site-packages\tensorflow\python\client\session.py", line 1120, in _run
raise RuntimeError('The Session graph is empty. Add operations to the '
RuntimeError: The Session graph is empty. Add operations to the graph before calling run().
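Cause:
A TensorFlow version mismatch: the code was written for TF1 but runs under TF2. Eager execution is on by default in TF2, so the operations are never added to the Session's graph.
Fix:
import tensorflow as tf
Replace with:
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
or
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()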
Q: Failed to read the pb computation graph
Use tf.gfile.GFile.
Traceback (most recent call last):
File "F:/MyDocumentes/PythonProjects/utils_tools/tensorflow_utils.py", line 139, in <module>
get_ops_name(pb_path=r'G:\ModelZoo\yolov3模型\yolov3\yolov3.pb')
File "F:/MyDocumentes/PythonProjects/utils_tools/tensorflow_utils.py", line 38, in get_ops_name
graph_def.ParseFromString(f.read())
google.protobuf.message.DecodeError: Error parsing message with type 'tensorflow.GraphDef'
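Cause:
SavedModel format:
/home/jitao/raccoon_dataset-master/train/saved_model/saved_model.pb
TensorFlow Graph (frozen graph) format:
/home/jitao/raccoon_dataset-master/train/frozen_inference_graph.pb
Parsing a file with graph_def.ParseFromString requires the TensorFlow Graph format. I mistakenly chose the SavedModel file, which caused the error.
Fix:
After switching to the second path, the problem was resolved.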
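Since both files end in .pb, a quick check before parsing can save some confusion. The following is only a heuristic sketch in plain Python, based on the usual SavedModel directory layout (a saved_model.pb file next to a variables/ directory); looks_like_saved_model is a hypothetical helper, not a TensorFlow API.

```python
import os
import tempfile


def looks_like_saved_model(pb_path):
    """Heuristic: SavedModel exports name the file saved_model.pb and
    usually keep a variables/ directory next to it."""
    directory = os.path.dirname(os.path.abspath(pb_path))
    named_saved_model = os.path.basename(pb_path) == "saved_model.pb"
    has_variables_dir = os.path.isdir(os.path.join(directory, "variables"))
    return named_saved_model or has_variables_dir


# Demo on empty placeholder files that only mimic the two layouts
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "saved_model", "variables"))
saved_model_pb = os.path.join(root, "saved_model", "saved_model.pb")
frozen_pb = os.path.join(root, "frozen_inference_graph.pb")
for path in (saved_model_pb, frozen_pb):
    open(path, "wb").close()

print(looks_like_saved_model(saved_model_pb))
print(looks_like_saved_model(frozen_pb))
```

If the check fires, the directory is meant to be loaded as a SavedModel (for example with tf.saved_model.load on the directory in TF 2.x) rather than parsed as a GraphDef.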
Q: Incompatible TensorFlow installation package
[ma-user software]$pip install tensorflow-1.15.0-cp37-cp37m-manylinux2014_aarch64.whl
Looking in indexes: http://192.168.0.122:8888/repository/pypi/simple
Processing ./tensorflow-1.15.0-cp37-cp37m-manylinux2014_aarch64.whl
ERROR: tensorflow has an invalid wheel, could not read 'tensorflow-1.15.0.dist-info/WHEEL' file: BadZipFile('Bad magic number for file header')
Fix:
Install a different package:
[tensorflow-on-arm](https://github.com/lhelontra/tensorflow-on-arm/releases)
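A wheel is a zip archive, so a BadZipFile error suggests the downloaded file itself is damaged or is not a wheel at all. Before reinstalling, the file can be checked with the standard library. This is a sketch; check_wheel is an illustrative helper, and the demo files are created on the fly rather than downloaded.

```python
import os
import tempfile
import zipfile


def check_wheel(path):
    """Return "ok" or a short description of what is wrong with the wheel file."""
    if not zipfile.is_zipfile(path):
        return "not a zip archive (truncated download or an HTML error page?)"
    try:
        with zipfile.ZipFile(path) as archive:
            bad_member = archive.testzip()  # reads every member and checks its CRC
            names = archive.namelist()
    except zipfile.BadZipFile as exc:
        return f"corrupt archive: {exc}"
    if bad_member is not None:
        return f"corrupt member: {bad_member}"
    if not any(name.endswith(".dist-info/WHEEL") for name in names):
        return "zip archive without a .dist-info/WHEEL file"
    return "ok"


# Demo: one broken file and one minimal, valid wheel-like archive
tmp = tempfile.mkdtemp()
broken = os.path.join(tmp, "broken.whl")
with open(broken, "wb") as f:
    f.write(b"<html>not a wheel</html>")
good = os.path.join(tmp, "demo-0.1-py3-none-any.whl")
with zipfile.ZipFile(good, "w") as archive:
    archive.writestr("demo-0.1.dist-info/WHEEL", "Wheel-Version: 1.0\n")

print(check_wheel(broken))
print(check_wheel(good))
```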
Q: ValueError: The name "input_5" is used 2 times in the model…
ValueError: The name "input_5" is used 2 times in the model. All layer names should be unique
Cause:
Fix:
Call clear_session() before load_model():
tf.keras.backend.clear_session()
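The cause is not stated above, so the following only illustrates one relevant mechanism: Keras generates default layer names from global counters, and clear_session() resets them. A minimal sketch, assuming TF 2.x; auto_layer_name is an illustrative helper.

```python
import tensorflow as tf


def auto_layer_name():
    """Create a throwaway layer and return the name Keras generated for it."""
    return tf.keras.layers.Dense(1).name


first = auto_layer_name()
second = auto_layer_name()        # the global counter has advanced
tf.keras.backend.clear_session()  # reset Keras global state, including name counters
after_reset = auto_layer_name()

print(first, second, after_reset)
```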