TensorFlow Tutorial Notes (Lessons Learned)

花花少年

Last modified 2024-06-09 10:31:59


First published 2022-04-15 23:37:04

Article link: https://blog.csdn.net/m0_37605642/article/details/124206293


I. References

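TensorFlow 2.x beginner tutorial
A Concise Handbook of TensorFlow 2 (简单粗暴 TensorFlow 2)
tensorflow python deploy
TensorFlow Basics: Tensor, Shape, Type, Sessions & Operators
TensorFlow official documentation
TensorFlow documentation (Chinese)
TensorFlow official source code
TensorFlow official documentation, Chinese edition
Hands-On Machine Learning with Scikit-Learn and TensorFlow (Chinese translation)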

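II. Background

1. The `compat` module

`compat` is the module that TF 2.x provides specifically for compatibility with TF 1.x. TF 2.x uses dynamic computation graphs (eager execution) by default and recommends the high-level tf.keras API; a model written with Keras can be even more compact than the PyTorch equivalent. Core TensorFlow, however, exposes more low-level configuration than tf.keras. You can build a model with lower-level APIs such as tf.function and customize every aspect of it, or you can assemble a model from tf.keras building blocks, in which case you do not need to know how the underlying architecture is built and only have to focus on the overall design.

The official TF 2.x tutorials are mostly based on the Keras API. The same functionality can often be implemented with different APIs, but combining different APIs tends to cause problems, typically when tf.keras is mixed with the low-level API.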
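The two styles described above can be put side by side. The following is a minimal sketch, assuming TF 2.x; the toy data, the learning rate and the names `keras_model` and `train_step` are made up for this illustration and do not come from the original article. It fits the same linear model once with tf.keras and once with explicit variables and a tf.function training step, keeping the two styles separate rather than mixing them.

```python
import numpy as np
import tensorflow as tf

# Toy data for y = 2x + 1 (made up for this illustration)
x = np.linspace(-1.0, 1.0, 64, dtype=np.float32).reshape(-1, 1)
y = 2.0 * x + 1.0

LEARNING_RATE = 0.1
STEPS = 200

# Style 1: high-level tf.keras, assembled from building blocks
keras_model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
keras_model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=LEARNING_RATE), loss="mse"
)
# One full-batch gradient step per epoch
keras_model.fit(x, y, epochs=STEPS, batch_size=len(x), verbose=0)
keras_loss = float(keras_model.evaluate(x, y, verbose=0))

# Style 2: low-level API, explicit variables and a tf.function training step
w = tf.Variable(tf.zeros([1, 1]))
b = tf.Variable(tf.zeros([1]))


@tf.function
def train_step(inputs, targets):
    # Record the forward pass so gradients can be computed
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(tf.matmul(inputs, w) + b - targets))
    grad_w, grad_b = tape.gradient(loss, [w, b])
    # Plain gradient descent update
    w.assign_sub(LEARNING_RATE * grad_w)
    b.assign_sub(LEARNING_RATE * grad_b)
    return loss


for _ in range(STEPS):
    low_level_loss = float(train_step(x, y))

print("keras loss:", keras_loss)
print("low-level loss:", low_level_loss)
```

The Keras version hides the training loop entirely, while the low-level version makes every step visible and replaceable, which is the trade-off the paragraph above describes.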

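III. Practical experience

1. Installing TensorFlow

1.1 Aligning TensorFlow versions

Reference: Aligning tensorflow-gpu, Python, CUDA and cuDNN versions

Before installing, always look up the compatibility between the tensorflow-gpu, Python, CUDA and cuDNN versions; they must match one another exactly.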
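Before consulting the compatibility table it helps to know what is already installed. The sketch below is one possible way to collect that information using only the Python standard library (Python 3.8 or later); the helper names `installed_version`, `nvcc_version` and `collect_versions` are hypothetical. It reports the CUDA toolkit only if `nvcc` is on the PATH and does not detect cuDNN.

```python
import importlib.metadata
import platform
import shutil
import subprocess


def installed_version(package):
    """Return the installed version of a pip package, or None if it is absent."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


def nvcc_version():
    """Return the output of `nvcc --version`, or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return result.stdout.strip() or None


def collect_versions():
    """Gather the versions that have to be checked against the compatibility table."""
    return {
        "python": platform.python_version(),
        "tensorflow": installed_version("tensorflow"),
        "tensorflow-gpu": installed_version("tensorflow-gpu"),
        "nvcc": nvcc_version(),
    }


for name, value in collect_versions().items():
    print(f"{name}: {value}")
```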


1.2 Installing TensorFlow from source

Reference: Installing TensorFlow from source on Ubuntu 16.04

1.3 Installing TensorFlow on ARM

tensorflow-on-arm
Prebuilt TensorFlow 1.15 whl package for ARM
Building TensorFlow on Ubuntu 18.04 + ARM

2. Building a simple training model

Reference: [Tools] TensorFlow 2.x (1): Three ways to build a model

import numpy as np
# TF 1.x-style API; tensorflow.compat.v1 can also be imported under TF 2.x
import tensorflow.compat.v1 as tf

# Enable eager execution (dynamic graph); already the default under TF 2.x
tf.enable_eager_execution()

# Learning rate
learning_rate = 0.01
# Number of training iterations
train_steps = 1000
# Training data
train_X = np.array([[3.3],[4.4],[5.5],[6.71],[6.93],[4.168],[9.799],[6.182],[7.59],[2.167],[7.042],[10.791],[5.313],[7.997],[5.654],[9.27],[3.1]],dtype = np.float32)
train_Y = np.array([[1.7],[2.76],[2.09],[3.19],[1.694],[1.573],[3.366],[2.596],[2.53],[1.221],[2.827],[3.465],[1.65],[2.904],[2.42],[2.94],[1.3]],dtype = np.float32)

# Model parameters are created once, outside the loss function,
# so that every training step updates the same variables
w = tf.Variable(tf.random_normal([1, 1]), name="weight")
b = tf.Variable(tf.zeros([1]), name="bias")

def network(data_x, data_y):
    # Linear model: Y = X * weight + bias
    y = tf.add(tf.matmul(data_x, w), b)
    # Loss: mean squared error over the training samples
    return tf.reduce_mean(tf.pow(y - data_y, 2))

optimizer = tf.train.AdadeltaOptimizer(learning_rate=learning_rate)

for step in range(train_steps):
    # In eager mode, minimize() expects a callable that returns the loss
    optimizer.minimize(lambda: network(train_X, train_Y))
    if step % 100 == 0:
        print(step, network(train_X, train_Y).numpy())

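3. GPU operations

3.1 Allocating GPU memory on demand

# Import the required library first
import tensorflow as tf
# The lines below are the addition
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
# Grow GPU memory usage as needed instead of reserving it all up front
config.gpu_options.allow_growth = True
# tf.Session does not exist under TF 2.x, so use the compat.v1 session imported above
session = InteractiveSession(config=config)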
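Under TF 2.x the same behaviour can also be requested without a session, through the `tf.config` API. A minimal sketch, assuming TF 2.1 or later; it is an alternative to the `ConfigProto` approach above, not the original author's code. On a machine without a GPU the device list is simply empty and the loop does nothing.

```python
import tensorflow as tf

# Must run before any GPU has been initialized, i.e. before the first op executes
gpus = tf.config.list_physical_devices("GPU")
for gpu in gpus:
    # Allocate GPU memory incrementally instead of reserving all of it at startup
    tf.config.experimental.set_memory_growth(gpu, True)

print(f"memory growth enabled on {len(gpus)} GPU(s)")
```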

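3.2 Running on either GPU or CPU

allow_soft_placement lets TensorFlow fall back to the CPU automatically when an operation cannot run on the GPU.

log_device_placement logs the device on which each operation is placed.

# TF 1.x API; under TF 2.x use tf.compat.v1.Session and tf.compat.v1.ConfigProto
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)) as sess:
    pass  # build and run the graph here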
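For eager-mode TF 2.x code, the two options have counterparts in `tf.config` and `tf.debugging`. The following is a small sketch under that assumption; the device string `/GPU:0` and the tensor values are chosen only for illustration.

```python
import tensorflow as tf

# Fall back to an available device (typically the CPU) when the requested one cannot be used
tf.config.set_soft_device_placement(True)
# Log the device on which each operation runs
tf.debugging.set_log_device_placement(True)

# Request the first GPU; with soft placement the op still runs if no GPU is present
with tf.device("/GPU:0"):
    total = tf.reduce_sum(tf.constant([1.0, 2.0, 3.0]))

print(float(total))
```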

3.3 Disabling the GPU

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
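The order of statements matters here: the variable has to be set before TensorFlow initializes CUDA, so the safest place is before the `import tensorflow` line. A minimal sketch of that ordering; the helper name `hide_gpus` is hypothetical.

```python
import os
import sys


def hide_gpus():
    """Hide all CUDA devices from libraries imported afterwards.

    Returns True if TensorFlow had not been imported yet, False otherwise.
    """
    already_imported = "tensorflow" in sys.modules
    if already_imported:
        # If TensorFlow has already initialized the GPU, changing the variable may have no effect
        print("warning: tensorflow is already imported; set CUDA_VISIBLE_DEVICES earlier")
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    return not already_imported


hide_gpus()
# import tensorflow as tf  # import TensorFlow only after the variable is set
```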

4. Deploying models

Reference: Recommended signatures for TensorFlow models and fast deployment

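IV. Common tips

1. Aligning TensorFlow and CUDA versions

TensorFlow version compatibility table
RTX 3060 cannot run TensorFlow 1.x
Pitfalls when setting up a TensorFlow deep learning environment on an RTX 3060

When installing TensorFlow, its version has to match the CUDA version; see the compatibility table on the official website.

TensorFlow 1.x runs only on CUDA 10.0 and earlier;
GeForce RTX 30 series cards currently support CUDA 11.1 and later, and only TensorFlow 2.4 and later support CUDA 11;
An RTX 3060 can only be used with CUDA 11 or later, which in turn requires TensorFlow 2.4 or later, so the official TensorFlow 1.x releases cannot run on an RTX 3060.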
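The rules above can be expressed as a small lookup. The sketch below uses only the standard library; the table is a short excerpt that, to the best of my knowledge, matches the "tested build configurations" for Linux GPU builds on the official website, and it should be checked against that table before being relied on. The names `TESTED_CONFIGS`, `cuda_major` and `targets_cuda11` are hypothetical.

```python
# Excerpt of the official tested build configurations (Linux, GPU);
# verify against the official table before relying on these values
TESTED_CONFIGS = {
    "1.15.0": {"python": "2.7, 3.3-3.7", "cudnn": "7.4", "cuda": "10.0"},
    "2.3.0": {"python": "3.5-3.8", "cudnn": "7.6", "cuda": "10.1"},
    "2.4.0": {"python": "3.6-3.8", "cudnn": "8.0", "cuda": "11.0"},
    "2.5.0": {"python": "3.6-3.9", "cudnn": "8.1", "cuda": "11.2"},
}


def cuda_major(tf_version):
    """Return the major CUDA version a TensorFlow release was built against."""
    return int(TESTED_CONFIGS[tf_version]["cuda"].split(".")[0])


def targets_cuda11(tf_version):
    """True if the release targets CUDA 11 or later, as RTX 30 series cards require."""
    return cuda_major(tf_version) >= 11


for version, config in TESTED_CONFIGS.items():
    status = "usable" if targets_cuda11(version) else "not usable"
    print(f"TensorFlow {version} (CUDA {config['cuda']}): {status} on RTX 30 series")
```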

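2. Resolving TF 1 / TF 2 compatibility problems

I had assumed that TF 2 changed so much relative to TF 1 that migration would be difficult, but it turned out to be less complicated than expected. In most cases, changing how tensorflow is imported is enough to resolve compatibility problems between TF 2 and TF 1.

2.1 Method 1

# Before
import tensorflow as tf
# After
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

2.2 Method 2

In TensorFlow 1.x, eager execution has to be enabled manually with tf.enable_eager_execution(). In TensorFlow 2.x it is enabled by default and can be turned off with tf.compat.v1.disable_eager_execution().

# Before
import tensorflow as tf
# After
import tensorflow as tf
tf.compat.v1.disable_eager_execution()

# Replace other TF 1.x APIs with their tf.compat.v1 equivalents
# Before
x = tf.placeholder(tf.float32, shape=(1024, 1024))
# After
x = tf.compat.v1.placeholder(tf.float32, shape=(1024, 1024))

# Before
with tf.Session() as sess:
# After
with tf.compat.v1.Session() as sess: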
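Put together, Method 2 looks roughly like the following end-to-end sketch, assuming TF 2.x. The placeholder shape is reduced to (2, 2) and the matrix product is added only so that there is something to run; neither comes from the original snippet.

```python
import numpy as np
import tensorflow as tf

# Switch TF 2.x back to graph mode, as TF 1.x code expects
tf.compat.v1.disable_eager_execution()

graph = tf.Graph()
with graph.as_default():
    # TF 1.x-style placeholder, reached through tf.compat.v1
    x = tf.compat.v1.placeholder(tf.float32, shape=(2, 2), name="x")
    y = tf.matmul(x, x, name="y")

# TF 1.x-style session, also reached through tf.compat.v1
with tf.compat.v1.Session(graph=graph) as sess:
    result = sess.run(y, feed_dict={x: np.eye(2, dtype=np.float32)})

print(result)
```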

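3. Image preprocessing

3.1 Using PIL and OpenCV

import numpy as np
import tensorflow as tf
from PIL import Image

# Read the image (item_imgpath is the path to the image file)
pil_image = Image.open(item_imgpath)
# Simple image processing
image_np = np.array(pil_image)  # PIL image to NumPy array
input_tensor = tf.convert_to_tensor(image_np)  # NumPy array to tensor
input_tensor = input_tensor[tf.newaxis, ...]  # add a batch dimension, 3-D to 4-D
# Tensor back to NumPy array (TF 2.x eager mode;
# in TF 1.x graph mode use input_tensor.eval(session=sess) instead)
image_np = input_tensor.numpy()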

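3.2 Using TensorFlow

# Image path
image_path = './1.jpg'

# Read the raw file contents
img = tf.io.read_file(image_path)
# Decode the JPEG; the original shape in this example is (450, 600, 3)
img = tf.image.decode_jpeg(img, channels=3)
# Resize the image; the result is float32 with shape (299, 299, 3)
img = tf.image.resize(img, (299, 299))
# Add a batch dimension, 3-D to 4-D: (1, 299, 299, 3)
img = tf.expand_dims(img, axis=0)
# Convert the type back to uint8
img = tf.cast(img, dtype=tf.uint8)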

V. FAQ

Q: TensorFlow version too old, reading the pb file fails

Reference: Causes of and fixes for the TensorFlow error "NodeDef mentions attr 'xxx' not in Op"

Traceback (most recent call last):
  File "F:\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\framework\importer.py", line 501, in _import_graph_def_internal
    graph._c_graph, serialized, options)  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: NodeDef mentions attr 'exponential_avg_factor' not in Op<name=FusedBatchNormV3; signature=x:T, scale:U, offset:U, mean:U, variance:U -> y:T, batch_mean:U, batch_variance:U, reserve_space_1:U, reserve_space_2:U, reserve_space_3:U; attr=T:type,allowed=[DT_HALF, DT_BFLOAT16, DT_FLOAT]; attr=U:type,allowed=[DT_FLOAT]; attr=epsilon:float,default=0.0001; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]; attr=is_training:bool,default=true>; NodeDef: {{node sequential/custom_layer/batch_normalization_56/FusedBatchNormV3}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "F:/MyDocumentes/PythonProjects/202202/20220228/tensorflow_demo.py", line 132, in <module>
    get_ops_name(pb_path=r'C:\Users\Seeking\Desktop\model.pb')
  File "F:/MyDocumentes/PythonProjects/202202/20220228/tensorflow_demo.py", line 117, in get_ops_name
    tf.import_graph_def(graph_def, name='')
  File "F:\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "F:\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\framework\importer.py", line 405, in import_graph_def
    producer_op_list=producer_op_list)
  File "F:\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\framework\importer.py", line 505, in _import_graph_def_internal
    raise ValueError(str(e))
ValueError: NodeDef mentions attr 'exponential_avg_factor' not in Op<name=FusedBatchNormV3; signature=x:T, scale:U, offset:U, mean:U, variance:U -> y:T, batch_mean:U, batch_variance:U, reserve_space_1:U, reserve_space_2:U, reserve_space_3:U; attr=T:type,allowed=[DT_HALF, DT_BFLOAT16, DT_FLOAT]; attr=U:type,allowed=[DT_FLOAT]; attr=epsilon:float,default=0.0001; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]; attr=is_training:bool,default=true>; NodeDef: {{node sequential/custom_layer/batch_normalization_56/FusedBatchNormV3}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).

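Cause:
The installed TensorFlow is 1.15, but the pb file was saved by training under TensorFlow 2.x; an older TensorFlow cannot read a pb file generated by a newer one.

Method 1:
Upgrade TensorFlow.

Method 2 (recommended):
See the section on TensorFlow versions above.

Q: Input and output nodes do not match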

tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'import/istraining' with dtype bool
	 [[node import/istraining (defined at \360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]

Original stack trace for 'import/istraining':
  File "/MyDocumentes/PythonProjects/202202/20220228/site_infer.py", line 84, in <module>
    return_tensors = read_pb_return_tensors(graph, pb_file, return_elements)
  File "/MyDocumentes/PythonProjects/202202/20220228/site_infer.py", line 70, in read_pb_return_tensors
    return_elements = tf.import_graph_def(frozen_graph_def, return_elements=return_elements)  # 获取输入和输出的节点
  File "\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\framework\importer.py", line 405, in import_graph_def
    producer_op_list=producer_op_list)
  File "\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\framework\importer.py", line 517, in _import_graph_def_internal
    _ProcessNewOps(graph)
  File "\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\framework\importer.py", line 243, in _ProcessNewOps
    for new_op in graph._add_new_tf_operations(compute_devices=False):  # pylint: disable=protected-access
  File "\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3561, in _add_new_tf_operations
    for c_op in c_api_util.new_tf_operations(self)
  File "\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3561, in <listcomp>
    for c_op in c_api_util.new_tf_operations(self)
  File "\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3451, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "\360Downloads\Anaconda3\envs\tf15\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

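Cause:
The tensors fed and fetched at inference time do not match the model's input and output nodes.
For example, if the input and output nodes are ["inputs:0", "istraining:0", "outputs:0"]:
[Wrong] inference: output = sess.run([return_tensors[1]], feed_dict={return_tensors[0]: image_data})
[Correct] inference: output = sess.run([return_tensors[2]], feed_dict={return_tensors[0]: image_data, return_tensors[1]: False})

Tip:
Print the operations returned by graph.get_operations() to check the number and names of the input and output nodes.

# Print the operations of the graph the model was imported into.
# Calling tf.Graph() here would create a new, empty graph and print nothing.
for op in graph.get_operations():
    print(op.name)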
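To see which tensors have to be fed, it is often enough to list only the Placeholder operations. The sketch below assumes TF 2.x and builds a tiny stand-in graph in memory instead of reading a real pb file; the node names `inputs`, `istraining` and `outputs` mirror the example above, and everything else (shapes, the scaling by 0.5) is made up for illustration.

```python
import tensorflow as tf

tf1 = tf.compat.v1

# Build a tiny stand-in for a frozen model with two placeholders
source = tf.Graph()
with source.as_default():
    inputs = tf1.placeholder(tf.float32, shape=(None, 2), name="inputs")
    istraining = tf1.placeholder(tf.bool, shape=(), name="istraining")
    scaled = tf.where(istraining, inputs * 0.5, inputs)
    tf.identity(scaled, name="outputs")
graph_def = source.as_graph_def()

# Import the GraphDef the same way a frozen pb would be imported
graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name="import")

# Every Placeholder the fetched output depends on must appear in feed_dict
placeholders = [op.name for op in graph.get_operations() if op.type == "Placeholder"]
print(placeholders)

with tf1.Session(graph=graph) as sess:
    output = sess.run(
        "import/outputs:0",
        feed_dict={"import/inputs:0": [[1.0, 2.0]], "import/istraining:0": False},
    )
print(output)
```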

Q: TensorFlow version problem

File "F:/MyDocumentes/PythonProjects/202202/20220228/tensorflow_demo.py", line 42, in soft_demo
    ndarray_data = sess.run(sm)
  File "F:\360Downloads\Anaconda3\envs\tf22py37\lib\site-packages\tensorflow\python\client\session.py", line 971, in run
    run_metadata_ptr)
  File "F:\360Downloads\Anaconda3\envs\tf22py37\lib\site-packages\tensorflow\python\client\session.py", line 1120, in _run
    raise RuntimeError('The Session graph is empty. Add operations to the '
RuntimeError: The Session graph is empty. Add operations to the graph before calling run().

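Cause:
A version mismatch: the code was written for TF 1.x but runs under TF 2.x, which is not compatible with it. Eager execution is on by default in TF 2.x, so the operations are not added to the graph the Session uses and that graph stays empty.

Solution:
# Before
import tensorflow as tf
# After
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

Or:
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()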

Q: Failed to read the pb computation graph

Use tf.gfile.GFile.
Traceback (most recent call last):
  File "F:/MyDocumentes/PythonProjects/utils_tools/tensorflow_utils.py", line 139, in <module>
    get_ops_name(pb_path=r'G:\ModelZoo\yolov3模型\yolov3\yolov3.pb')
  File "F:/MyDocumentes/PythonProjects/utils_tools/tensorflow_utils.py", line 38, in get_ops_name
    graph_def.ParseFromString(f.read())
google.protobuf.message.DecodeError: Error parsing message with type 'tensorflow.GraphDef'

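Cause:
SavedModel format
/home/jitao/raccoon_dataset-master/train/saved_model/saved_model.pb

TensorFlow Graph format
/home/jitao/raccoon_dataset-master/train/frozen_inference_graph.pb

Reading a computation graph this way requires a file in the TensorFlow Graph format; I mistakenly pointed it at the SavedModel file, which caused the error.

Solution:
Switching to the second path solved the problem.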
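A quick guard against this mix-up is to check the file name before parsing, since a SavedModel directory stores its graph in a file named saved_model.pb. The sketch below is only a naming heuristic using the standard library; the helper name `guess_pb_format` is hypothetical, and a frozen graph that someone renamed to saved_model.pb would be misclassified.

```python
import os


def guess_pb_format(pb_path):
    """Guess from the file name whether a .pb file is a SavedModel or a GraphDef."""
    if os.path.basename(pb_path) in ("saved_model.pb", "saved_model.pbtxt"):
        # Load the containing directory as a SavedModel instead of parsing the file
        return "saved_model"
    # Otherwise assume a serialized GraphDef that ParseFromString() can read
    return "graph_def"


for path in (
    "/home/jitao/raccoon_dataset-master/train/saved_model/saved_model.pb",
    "/home/jitao/raccoon_dataset-master/train/frozen_inference_graph.pb",
):
    print(path, "->", guess_pb_format(path))
```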

Q: Incompatible TensorFlow installation package

[ma-user software]$pip install tensorflow-1.15.0-cp37-cp37m-manylinux2014_aarch64.whl
Looking in indexes: http://192.168.0.122:8888/repository/pypi/simple
Processing ./tensorflow-1.15.0-cp37-cp37m-manylinux2014_aarch64.whl
ERROR: tensorflow has an invalid wheel, could not read 'tensorflow-1.15.0.dist-info/WHEEL' file: BadZipFile('Bad magic number for file header')

Solution:
Install a different package:
[tensorflow-on-arm](https://github.com/lhelontra/tensorflow-on-arm/releases)
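The BadZipFile in the log is raised by Python's zipfile module, which suggests that pip could not read the archive itself; a wheel is an ordinary zip file. Before or after switching packages, the downloaded file can be checked with a sketch like the following, which uses only the standard library. The helper name `check_wheel` is hypothetical, and the path in the usage line is the file name from the log above.

```python
import zipfile


def check_wheel(path):
    """Return None if the wheel is a readable zip archive, else a short reason."""
    if not zipfile.is_zipfile(path):
        return "not a zip archive (the download may be truncated or an error page)"
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the first member that fails its CRC check, or None
            bad_member = archive.testzip()
    except zipfile.BadZipFile as exc:
        return f"corrupt archive: {exc}"
    if bad_member is not None:
        return f"corrupt member: {bad_member}"
    return None


# Usage, with the file name taken from the log above:
# print(check_wheel("tensorflow-1.15.0-cp37-cp37m-manylinux2014_aarch64.whl"))
```

If the check reports a problem, downloading the file again or using a different build, as suggested above, is the likely fix.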

Q: ValueError: The name "input_5" is used 2 times in the model…

ValueError: The name "input_5" is used 2 times in the model. All layer names should be unique

Cause:

Solution:
Call clear_session() before load_model().

tf.keras.backend.clear_session()
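The effect of clear_session() on automatically generated layer names can be observed directly. A minimal sketch, assuming TF 2.x; the freshly created Dense layers are a hypothetical stand-in for layers that would normally come from load_model().

```python
import tensorflow as tf

# Keras numbers automatically named layers with a global counter
names_before = [tf.keras.layers.Dense(1).name for _ in range(2)]

# Reset Keras's global state, including the layer name counters
tf.keras.backend.clear_session()

# After the reset, naming starts from the beginning again
names_after = [tf.keras.layers.Dense(1).name for _ in range(2)]

print(names_before)
print(names_after)
```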