About tf.feature_column

Latest recommended article published on 2023-03-21 11:41:07

虚拟搬运工

Tags: python tensorflow keras

Article link: https://blog.csdn.net/u010048197/article/details/124076380

Conclusions first

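tf.feature_column is deprecated. In TensorFlow 2.x, use the Keras preprocessing layers directly instead; see the official documentation.
You can also write your own code to feed data into a model. Both tf.feature_column and the Keras preprocessing layers are just convenience tools that simplify the work.
To learn about tf.feature_column, read the official article Introducing TensorFlow Feature Columns, or search for a Chinese copy of it, for example this one.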
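As a rough illustration of the first point, here is a minimal sketch of a Keras preprocessing layer doing the kind of job a vocabulary-based feature column used to do. The vocabulary and the input strings are made up for this example, and it assumes a TensorFlow 2.x release that ships tf.keras.layers.StringLookup.

```python
import tensorflow as tf

# Hypothetical vocabulary, for illustration only.
lookup = tf.keras.layers.StringLookup(vocabulary=["cat", "dog", "bird"])

# By default index 0 is reserved for out-of-vocabulary strings,
# so the known tokens are numbered starting from 1.
ids = lookup(tf.constant(["dog", "bird", "fish"]))
print(ids.numpy())
```

The layer can be placed inside the model or applied in the input pipeline; which one fits better depends on the project.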

Next, sparse and dense matrices

Look at the code:

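import tensorflow as tf

# 8x8 identity matrix as an ordinary (dense) tensor
dense_tensor = tf.eye(8, 8)
# The same data as a SparseTensor, stored in a feature dict under the key 'v'
sparse_tensor = {'v': tf.sparse.from_dense(dense_tensor)}
print(dense_tensor)
print(sparse_tensor["v"].values)   # the nonzero values
print(sparse_tensor["v"].indices)  # the [row, col] position of each value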

# Dense matrix
tf.Tensor(
[[1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1.]], shape=(8, 8), dtype=float32)
 
# Sparse matrix: the nonzero values, then their indices
tf.Tensor([1. 1. 1. 1. 1. 1. 1. 1.], shape=(8,), dtype=float32)
tf.Tensor(
[[0 0]
 [1 1]
 [2 2]
 [3 3]
 [4 4]
 [5 5]
 [6 6]
 [7 7]], shape=(8, 2), dtype=int64)

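A sparse matrix records only the nonzero values and their positions, which can greatly reduce the amount of data when most entries are zero.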
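To make the saving concrete, the sketch below builds the same 8x8 identity matrix directly from its coordinates, converts it back to dense form, and counts how many numbers each representation holds. The variable names are only for illustration.

```python
import tensorflow as tf

n = 8
# Coordinate layout: one [row, col] pair and one value per nonzero entry.
indices = [[i, i] for i in range(n)]
values = [1.0] * n
sparse = tf.sparse.SparseTensor(indices=indices, values=values,
                                dense_shape=[n, n])

# Converting back to dense should reproduce the identity matrix.
dense = tf.sparse.to_dense(sparse)
same = bool(tf.reduce_all(tf.equal(dense, tf.eye(n))))

# Numbers held by each representation (values plus index entries vs. all cells).
stored_numbers = int(tf.size(sparse.values)) + int(tf.size(sparse.indices))
dense_numbers = int(tf.size(dense))
print(same, stored_numbers, dense_numbers)
```

For a matrix this small the gain is modest, since every nonzero value also carries two index entries. The benefit grows as the matrix gets larger and sparser.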

Now for tf.feature_column

Reading the articles mentioned above is really enough. A few extra points:

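1) Only a few kinds of data are fed into a model: one_hot, multi_hot, plain tensors, and embedding tensors.
2) Raw data is mapped, as needed, to one of the four types in 1).
3) In code, first write a processing "pipeline" that turns the raw data into one of the types in 1). The pipeline calls the various tf.feature_column methods; the raw data is fed through the pipeline and then enters the model's input layer. For example: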

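import tensorflow as tf

# Categorical column whose integer ids are used directly as bucket indices
video_id = tf.feature_column.categorical_column_with_identity(
    key='video_id', num_buckets=1000000, default_value=0)
columns = [video_id]  # columns is the data-processing "pipeline"

# tf.sparse.from_dense keeps only the nonzero entries, so the zeros count as missing
features = {'video_id': tf.sparse.from_dense([[2, 85, 0, 0, 0],
                                              [33, 78, 2, 73, 1]])}
# Feed features through the columns pipeline and into the linear_model layer
linear_prediction = tf.compat.v1.feature_column.linear_model(features, columns)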
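For comparison, the input types listed above (one_hot, multi_hot, embedding) can also be produced with Keras layers instead of tf.feature_column. This is only a sketch under assumptions: the vocabulary size of 5, the embedding width of 3, and the sample ids are invented for the example, and it is not a drop-in replacement for the linear_model call.

```python
import tensorflow as tf

# Hypothetical toy vocabulary of 5 category ids (0..4).
num_ids = 5

one_hot = tf.keras.layers.CategoryEncoding(
    num_tokens=num_ids, output_mode="one_hot")
multi_hot = tf.keras.layers.CategoryEncoding(
    num_tokens=num_ids, output_mode="multi_hot")
embedding = tf.keras.layers.Embedding(input_dim=num_ids, output_dim=3)

single_ids = tf.constant([3, 0])           # one id per example
multi_ids = tf.constant([[0, 1], [2, 4]])  # several ids per example

one_hot_out = one_hot(single_ids)      # one row per example, a single 1 in each
multi_hot_out = multi_hot(multi_ids)   # one row per example, a 1 for every id present
embedded_out = embedding(multi_ids)    # one learned 3-dim vector per id

print(one_hot_out.shape, multi_hot_out.shape, embedded_out.shape)
```

The embedding weights are randomly initialized, so only the shape of embedded_out is predictable before training.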