Getting Started with TensorFlow

echo__Moon

Categories: python, deep learning, study notes

Article link: https://blog.csdn.net/qy724728631/article/details/88869214

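TensorFlow is conventionally imported with: import tensorflow as tf. The snippets below use the TensorFlow 1.x graph API (tf.placeholder, tf.Session); in TensorFlow 2.x these names are available under tf.compat.v1.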

1. Constants:

a=tf.constant(10)

2. Variables:

x=tf.Variable(tf.ones([3,3])) 
y=tf.Variable(tf.zeros([3,3]))

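After variables are defined, an initialization op must also be run explicitly, so the following is added afterwards (tf.initialize_all_variables() is the older, deprecated name for the same op):

init=tf.global_variables_initializer()
sess.run(init)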

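3. Placeholders

A variable must be given an initial value when it is defined. When a value is not known up front and therefore cannot be initialized, a placeholder reserves its place.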

x = tf.placeholder(tf.float32, [None, 784])

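This specifies the dtype and shape of the input; the actual values are supplied later through feed_dict. A dimension whose size is not fixed is written as None.

4. Graph

TensorFlow calls a computation on tensor objects an operation (op) and places all ops in a graph; each node of the graph is one op. The whole graph is then handed to a TensorFlow Session, which runs the computation as a unit. This is far more efficient than executing operations one at a time. sess.run() performs the execution. Note that the variable-initialization op must be run before any op that uses the variables.

A Session has to be created before use and released afterwards, so it is usually wrapped in a with ... as ... statement, which releases it automatically.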
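
The build-then-run idea can be illustrated without TensorFlow at all. The following is a toy sketch in plain Python, not TensorFlow's implementation: Node, placeholder, constant, add, mul and run are hypothetical names invented for this illustration. It shows that building the graph computes nothing, and that values for placeholders arrive only at run time through a feed dictionary.

```python
# Toy illustration only: this is NOT TensorFlow, just the build-then-run idea.
class Node:
    """A graph node: an operation plus the nodes it takes as inputs."""
    def __init__(self, op, inputs=(), name=None):
        self.op = op
        self.inputs = inputs
        self.name = name

def placeholder(name):
    # A placeholder has no value until one is fed at run time.
    return Node(op=None, name=name)

def constant(value):
    return Node(op=lambda: value)

def add(a, b):
    return Node(op=lambda x, y: x + y, inputs=(a, b))

def mul(a, b):
    return Node(op=lambda x, y: x * y, inputs=(a, b))

def run(node, feed_dict=None):
    """Evaluate `node` by recursively evaluating its inputs first."""
    feed_dict = feed_dict or {}
    if node in feed_dict:
        return feed_dict[node]
    if node.op is None:
        raise ValueError("placeholder %r was not fed" % node.name)
    args = [run(inp, feed_dict) for inp in node.inputs]
    return node.op(*args)

# Building the graph computes nothing yet.
x = placeholder("x")
y = add(mul(x, constant(3)), constant(1))   # y = 3 * x + 1

# Only run() triggers the computation, with x supplied via feed_dict.
result = run(y, feed_dict={x: 2})
print(result)
```

A real Session additionally plans the whole graph before executing it, which is where the efficiency gain described above comes from; this toy evaluates nodes recursively and makes no such optimization.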

Example: Titanic

import tensorflow as tf
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the Titanic training set and encode Sex as 1 (male) / 0 (female)
data = pd.read_csv('train.csv')
data.info()
data['Sex'] = data['Sex'].apply(lambda s: 1 if s == 'male' else 0)
data = data.fillna(0)  # replace missing values (e.g. Age) with 0
dataset_x = data[['Sex', 'Age', 'Pclass', 'SibSp', 'Parch', 'Fare']]
dataset_x = dataset_x.values
# Two-column one-hot label: [Survived, Deceased]
data['Deceased'] = data['Survived'].apply(lambda s: int(not s))
dataset_y = data[['Survived', 'Deceased']]
dataset_y = dataset_y.values
X_train, X_val, y_train, y_val = train_test_split(dataset_x, dataset_y, test_size=0.2, random_state=42)
print('train samples: ', X_train.shape, y_train.shape)
print('val samples: ', X_val.shape, y_val.shape)

# Build the graph: a single softmax layer (logistic regression)
X = tf.placeholder(tf.float32, shape=[None, 6])
y = tf.placeholder(tf.float32, shape=[None, 2])
b = tf.Variable(tf.zeros([2]), name='bias')
w = tf.Variable(tf.random_normal([6, 2]), name='weight')
y_pred = tf.nn.softmax(tf.matmul(X, w) + b)
# Cross-entropy per sample; 1e-10 guards against log(0)
cross_entropy = -tf.reduce_sum(y * tf.log(y_pred + 1e-10), axis=1)
cost = tf.reduce_mean(cross_entropy)
train_op = tf.train.GradientDescentOptimizer(0.001).minimize(cost)
correct = tf.equal(tf.argmax(y_pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)  # variables must be initialized before any training step
    with open('./acc.txt', 'w') as f:
        for epoch in range(10):
            total_loss = 0
            # Stochastic gradient descent: one sample per update
            for i in range(len(X_train)):
                _, loss = sess.run([train_op, cost], feed_dict={X: X_train[i].reshape(1, 6), y: y_train[i].reshape(1, 2)})
                total_loss += loss
            print('Epoch: %02d, total loss = %.6f' % (epoch + 1, total_loss))
            # Evaluate on the whole validation set after each epoch
            acc, loss = sess.run([accuracy, cost], feed_dict={X: X_val, y: y_val})
            f.write('Epoch: %02d, Accuracy = %.6f, Loss = %.6f \n' % (epoch + 1, acc, loss))
            print('Epoch: %02d, Accuracy = %.6f, Loss = %.6f' % (epoch + 1, acc, loss))
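
To make the loss in the example above concrete, here is a small worked sketch of softmax and cross-entropy in plain Python. It mirrors the formula used in the graph, -sum(y * log(y_pred + 1e-10)), for a single sample; the function names and the sample logits are chosen for illustration and are not part of the original code.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y_true, y_pred, eps=1e-10):
    # Same form as in the graph: -sum(y * log(y_pred + eps)).
    return -sum(t * math.log(p + eps) for t, p in zip(y_true, y_pred))

# Equal logits give a 50/50 prediction, so the loss is close to ln(2).
probs = softmax([0.0, 0.0])
loss_uniform = cross_entropy([1, 0], probs)

# A confident, correct prediction gives a much smaller loss.
loss_confident = cross_entropy([1, 0], softmax([5.0, -5.0]))

print(probs, loss_uniform, loss_confident)
```

The one-hot label [1, 0] corresponds to the [Survived, Deceased] columns, so only the log-probability of the true class contributes to the sum.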

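5. Saving and loading model parameters

Variables are saved and restored through the tf.train.Saver class. When a Saver object is constructed it adds save and restore ops to the graph, and a constructor argument selects which variables to store. The save() and restore() methods are the entry points that trigger those ops. Note that if no variable list is given, the Saver only collects the variables that were declared before it was constructed. Checkpoints are binary files that hold the variables, stored internally as a dictionary-like mapping from variable name to value.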

saver=tf.train.Saver(max_to_keep=1)

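A frequently used constructor argument is max_to_keep, which sets how many checkpoints are retained. The default is 5 (max_to_keep=5), which keeps the 5 most recent checkpoints. To keep only the latest one, set max_to_keep to 1. To save at every epoch and keep all of the checkpoints, set max_to_keep to None or 0. Once the limit is reached, each new save removes the oldest retained checkpoint.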
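
The retention rule can be sketched in plain Python. ToySaver below is a hypothetical stand-in that only tracks checkpoint names; it is not tf.train.Saver and writes no files. It assumes the behaviour described above: the newest max_to_keep checkpoints survive, None or 0 keeps everything, and global_step is appended to the path as a "-step" suffix.

```python
from collections import deque

class ToySaver:
    """Hypothetical stand-in that tracks checkpoint names only."""

    def __init__(self, max_to_keep=5):
        # None or 0 means "keep every checkpoint".
        self.max_to_keep = max_to_keep
        self.kept = deque()

    def save(self, path, global_step):
        # The step number is appended to the path as a suffix.
        name = "%s-%d" % (path, global_step)
        self.kept.append(name)
        deleted = []
        if self.max_to_keep:
            # Drop the oldest names until the limit is respected.
            while len(self.kept) > self.max_to_keep:
                deleted.append(self.kept.popleft())
        return name, deleted

saver_two = ToySaver(max_to_keep=2)
for step in range(4):
    saver_two.save("ckpt/mnist.ckpt", global_step=step)
print(list(saver_two.kept))   # the two newest checkpoints

saver_all = ToySaver(max_to_keep=None)
for step in range(4):
    saver_all.save("ckpt/mnist.ckpt", global_step=step)
print(len(saver_all.kept))    # nothing is dropped
```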

Once the saver has been created, the trained model can be saved:

saver.save(sess,'ckpt/mnist.ckpt',global_step=step)

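The first argument is the session. The second sets the path and file name prefix of the checkpoint. The third, global_step, appends the training step count to the checkpoint name as a suffix.

A model is restored with restore(), which takes two arguments, restore(sess, save_path), where save_path is the path of the saved checkpoint. tf.train.latest_checkpoint() returns the most recently saved checkpoint in a directory: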

model_file=tf.train.latest_checkpoint('./ckpt')
saver.restore(sess,model_file)

Example:

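import os
import numpy as np

# Assumption: test.csv is the Titanic test set, preprocessed the same way as train.csv
testdata = pd.read_csv('test.csv')
testdata['Sex'] = testdata['Sex'].apply(lambda s: 1 if s == 'male' else 0)
testdata = testdata.fillna(0)
X_test = testdata[['Sex', 'Age', 'Pclass', 'SibSp', 'Parch', 'Fare']].values
os.makedirs('./ckpt', exist_ok=True)  # make sure the output directories exist
os.makedirs('./data', exist_ok=True)

with tf.Session() as sess:
    sess.run(init)
    saver = tf.train.Saver(max_to_keep=1)
    max_acc = 0
    with open('./acc.txt', 'w') as f:
        for epoch in range(10):
            total_loss = 0
            for i in range(len(X_train)):
                _, loss = sess.run([train_op, cost], feed_dict={X: X_train[i].reshape(1, 6), y: y_train[i].reshape(1, 2)})
                total_loss += loss
            print('Epoch: %02d, total loss = %.6f' % (epoch + 1, total_loss))
            acc, loss = sess.run([accuracy, cost], feed_dict={X: X_val, y: y_val})
            f.write('Epoch: %02d, Accuracy = %.6f, Loss = %.6f \n' % (epoch + 1, acc, loss))
            print('Epoch: %02d, Accuracy = %.6f, Loss = %.6f' % (epoch + 1, acc, loss))
            # Save only when validation accuracy improves, so the kept checkpoint is the best one
            if acc > max_acc:
                max_acc = acc
                saver.save(sess, "./ckpt/model.ckpt", global_step=epoch)
    # Reload the best checkpoint and predict on the test set
    model_file = tf.train.latest_checkpoint('./ckpt')
    saver.restore(sess, model_file)
    # Column 0 of y_pred is Survived, so map argmax index 0 to Survived = 1
    prediction = 1 - np.argmax(sess.run(y_pred, feed_dict={X: X_test}), 1)
    submission = pd.DataFrame({"PassengerId": testdata["PassengerId"], "Survived": prediction})
    submission.to_csv("./data/submission.csv", index=False)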

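6. TensorBoard visualization

TensorBoard can currently display several kinds of data: scalar metrics, images, audio, the computation graph, and the distributions and histograms of parameter variables. Scalars such as the loss are drawn as curves over the course of training, and image and audio samples can be shown where needed.

TensorBoard works by starting a web server that reads summary data from the event files written by a TensorFlow program and renders it as charts in the browser. The summary data and the ops that record them are:

    Scalars: tf.summary.scalar
    Parameters: tf.summary.histogram
    Images: tf.summary.image
    Audio: tf.summary.audio
    Graph structure: tf.summary.FileWriter

Summary ops have to be triggered explicitly through Session.run().

tf.summary.merge_all() merges all summary ops into one; running it yields a tf.Summary object serialized as a protocol buffer.

TensorBoard is a standalone Python application that starts its web server from the command line: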

tensorboard --logdir XXX

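## --logdir  directory the event files are written to
## --port  server port, 6006 by default
## --event_file  a specific event file to read
## --reload_interval  how often the backend reloads data, in seconds (the default differs between TensorBoard versions)

Once the local server is running, open http://localhost:6006/ in a browser.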

Example:

# Name scopes group related ops into collapsible boxes in the TensorBoard graph view
with tf.name_scope("inputs"):
    X = tf.placeholder(tf.float32, shape=[None, 6])
    y = tf.placeholder(tf.float32, shape=[None, 2])
with tf.name_scope("classifier"):
    b = tf.Variable(tf.zeros([2]), name='bias')
    w = tf.Variable(tf.random_normal([6, 2]), name='weight')
    y_pred = tf.nn.softmax(tf.matmul(X, w) + b)
    # Histograms record how the parameter distributions change over time
    tf.summary.histogram('weights', w)
    tf.summary.histogram('bias', b)
with tf.name_scope("cost"):
    cross_entropy = -tf.reduce_sum(y * tf.log(y_pred + 1e-10), axis=1)
    cost = tf.reduce_mean(cross_entropy)
    tf.summary.scalar('loss', cost)
train_op = tf.train.GradientDescentOptimizer(0.001).minimize(cost)
with tf.name_scope("accuracy"):
    correct = tf.equal(tf.argmax(y_pred, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    tf.summary.scalar('accuracy', accuracy)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    # The FileWriter stores the graph structure and receives the summaries
    writer = tf.summary.FileWriter('./logs', sess.graph)
    merged = tf.summary.merge_all()
    sess.run(init)
    saver = tf.train.Saver(max_to_keep=1)
    max_acc = 0
    with open('./acc.txt', 'w') as f:
        for epoch in range(10):
            total_loss = 0
            for i in range(len(X_train)):
                _, loss = sess.run([train_op, cost], feed_dict={X: X_train[i].reshape(1, 6), y: y_train[i].reshape(1, 2)})
                total_loss += loss
            print('Epoch: %02d, total loss = %.6f' % (epoch + 1, total_loss))
            # Running `merged` evaluates every summary op; the result is written with the epoch as its step
            summary, acc, loss = sess.run([merged, accuracy, cost], feed_dict={X: X_val, y: y_val})
            writer.add_summary(summary, epoch)
            f.write('Epoch: %02d, Accuracy = %.6f, Loss = %.6f \n' % (epoch + 1, acc, loss))
            print('Epoch: %02d, Accuracy = %.6f, Loss = %.6f' % (epoch + 1, acc, loss))
            if acc > max_acc:
                max_acc = acc
                saver.save(sess, "./ckpt/model.ckpt", global_step=epoch)
    writer.close()  # flush pending events to disk

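7. Reading data in TensorFlow

There are three ways to load data: a. feed it from Python code; b. read it from files through an input pipeline placed at the start of the computation graph; c. preload it into constants or variables held in memory.

Small data sets can be loaded directly into main memory or GPU memory:

    csv: pandas.read_csv() and XX.to_csv()
    npy or npz: numpy.save() and numpy.load()
    pickle: pickle.dump(XX, open('XX.pkl', 'wb')) and pickle.load(open('XX.pkl', 'rb'))
    hdf: h5py.File('XX.h5','w') and h5py.File('XX.h5','r')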
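
As a minimal sketch of the small-data route, the pickle round trip from the list above looks like this with only the standard library. The sample dictionary and the file name samples.pkl are made up for illustration; the with blocks close the files, which the one-line form in the list leaves to the garbage collector.

```python
import os
import pickle
import tempfile

# Hypothetical in-memory data set: a few feature rows and their labels.
samples = {"features": [[1, 22.0, 3], [0, 38.0, 1]], "labels": [0, 1]}

# Write to a temporary directory so the sketch leaves the working directory untouched.
path = os.path.join(tempfile.mkdtemp(), "samples.pkl")
with open(path, "wb") as f:
    pickle.dump(samples, f)

# Load the object back; it compares equal to what was saved.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == samples)
```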

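For large data sets, TensorFlow recommends its own TFRecord file format, and ordinary data is easy to convert to it. Each sample is assembled into an Example object as defined by protocol buffers, serialized to a string, and written to a file with tf.python_io.TFRecordWriter.

Example: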

import os

def transform_to_tfrecord(filename):
    data = pd.read_csv(filename)
    # splitext keeps leading path components such as './' intact
    tfrecord_file = os.path.splitext(filename)[0] + '.tfrecords'
    # Cast to built-in int/float: protobuf may reject NumPy scalar types
    def int_feature(value):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[int(value)]))
    def float_feature(value):
        return tf.train.Feature(float_list=tf.train.FloatList(value=[float(value)]))
    writer = tf.python_io.TFRecordWriter(tfrecord_file)
    for i in range(len(data)):
        # One Example per CSV row, keyed by column name
        features = tf.train.Features(feature={
            'Sex': int_feature(1 if data['Sex'][i] == 'male' else 0),
            'Age': float_feature(data['Age'][i]),
            'Pclass': int_feature(data['Pclass'][i]),
            'SibSp': int_feature(data['SibSp'][i]),
            'Parch': int_feature(data['Parch'][i]),
            'Fare': float_feature(data['Fare'][i]),
            'Survived': int_feature(data['Survived'][i]),
            'Deceased': int_feature(int(not data['Survived'][i]))})
        example = tf.train.Example(features=features)
        writer.write(example.SerializeToString())
    writer.close()

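Data is read back from TFRecord files with TFRecordReader:

tf.train.string_input_producer: declares the TFRecord files as the input of the model and builds a file name queue.
TFRecordReader.read: reads one serialized record from the file queue.
tf.parse_single_example: parses a record into one usable sample.
tf.train.shuffle_batch: sets the upper and lower bounds on the number of samples buffered in memory (capacity and min_after_dequeue), the training batch size, and related parameters.
tf.train.start_queue_runners: starts all queue-runner threads for the given Session and returns the thread handles.
Coordinator class: synchronizes the data-input threads with the training program.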
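
The role of min_after_dequeue and batch_size in tf.train.shuffle_batch can be sketched with a plain-Python shuffle buffer. This is a single-threaded illustration under stated assumptions, not TensorFlow's queue implementation: shuffle_batches is a hypothetical helper, there is no capacity limit or background thread, and the final partial batch is emitted rather than dropped.

```python
import random

def shuffle_batches(samples, batch_size, min_after_dequeue, seed=None):
    """Yield shuffled batches while keeping a minimum number of samples buffered."""
    rng = random.Random(seed)
    buffer, batch = [], []
    for sample in samples:
        buffer.append(sample)
        # Dequeue only while at least min_after_dequeue samples would remain,
        # so every pick is drawn from a reasonably large pool.
        while len(buffer) > min_after_dequeue:
            batch.append(buffer.pop(rng.randrange(len(buffer))))
            if len(batch) == batch_size:
                yield batch
                batch = []
    # Input exhausted: drain whatever is left in the buffer.
    while buffer:
        batch.append(buffer.pop(rng.randrange(len(buffer))))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # simplification: keep the final, smaller batch

batches = list(shuffle_batches(range(10), batch_size=4, min_after_dequeue=3, seed=0))
print([len(b) for b in batches])
```

A larger min_after_dequeue mixes samples more thoroughly at the cost of memory and start-up time, which is the trade-off the capacity and min_after_dequeue arguments control in the real pipeline.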

Example:

def decode_from_tfrecord(tfrecord_file, num_threads=2, num_epochs=100, batch_size=10, min_after_dequeue=10):
    reader = tf.TFRecordReader()
    # string_input_producer expects a list of file names, so wrap a single path
    filenames = [tfrecord_file] if isinstance(tfrecord_file, str) else list(tfrecord_file)
    filename_queue = tf.train.string_input_producer(filenames, num_epochs=num_epochs)
    _, serialized_example = reader.read(filename_queue)
    feature_spec = {
                'Sex': tf.FixedLenFeature([], tf.int64),
                'Age': tf.FixedLenFeature([], tf.float32),
                'Pclass': tf.FixedLenFeature([], tf.int64),
                'SibSp': tf.FixedLenFeature([], tf.int64),
                'Parch': tf.FixedLenFeature([], tf.int64),
                'Fare': tf.FixedLenFeature([], tf.float32),
                'Survived': tf.FixedLenFeature([], tf.int64),
                'Deceased': tf.FixedLenFeature([], tf.int64)}
    featuresdict = tf.parse_single_example(serialized_example, features=feature_spec)
    # dict.pop takes one key at a time, so pop each label column and stack them
    labels = tf.stack([tf.cast(featuresdict.pop(k), tf.float32) for k in ('Survived', 'Deceased')])
    # Stack the input columns in a fixed order into one feature vector
    columns = ['Sex', 'Age', 'Pclass', 'SibSp', 'Parch', 'Fare']
    features = tf.stack([tf.cast(featuresdict[k], tf.float32) for k in columns])
    features, labels = tf.train.shuffle_batch(
        [features, labels],
        batch_size=batch_size,
        num_threads=num_threads,
        capacity=min_after_dequeue + 3 * batch_size,
        min_after_dequeue=min_after_dequeue)
    return features, labels

def train_with_dequeuerunner(tfrecord_file):
    x, y = decode_from_tfrecord(tfrecord_file)
    with tf.Session() as sess:
        # num_epochs is tracked in a local variable, so local variables need initializing too
        sess.run(tf.group(tf.global_variables_initializer(), tf.local_variables_initializer()))
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        try:
            i = 0
            while not coord.should_stop():
                features, labels = sess.run([x, y])
                i = i + 1
                if i % 100 == 0:
                    print('i %d:' % i, labels)
        except tf.errors.OutOfRangeError:
            # Raised once the file name queue has served num_epochs epochs
            print('Done training -- epoch limit reached')
        finally:
            coord.request_stop()
        coord.join(threads)  # wait for the queue-runner threads to finish

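8. SkFlow, TFLearn and TF-Slim

SkFlow is a high-level API released by TensorFlow and modelled on Scikit-Learn; it wraps many commonly used classification and regression models.

TFLearn is a high-level API contributed by the open-source community, also modelled on Scikit-Learn, and likewise wraps many commonly used classification and regression models.

TF-Slim is a high-level interface library released by TensorFlow. It is particularly strong for image tasks: it includes many additional layers and evaluation metrics and ships with a range of built-in image recognition models.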
