tf.lookup: mapping strings to ids inside TensorFlow

我是女孩

Category: tensorflow. Tags: tensorflow, artificial intelligence, python

Article link: https://blog.csdn.net/u013385018/article/details/121184873

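When I wrote TF models in the past, category-type features were usually preprocessed into ids before being fed to the model, and the category -> id mapping usually lived outside the TF code. The tf.lookup module provides ways to map category features to ids using native TensorFlow APIs. This article introduces them.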

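The tf.lookup module contains two groups of classes:

Initializer: builds the category -> id mapping table
- tf.lookup.KeyValueTensorInitializer: builds the table from explicitly specified category -> id pairs
- tf.lookup.TextFileInitializer: builds the table from data read from a file
Table: performs the category -> id lookup
- tf.lookup.StaticHashTable: looks keys up in the given table and returns a default value when a key is not found
- tf.lookup.StaticVocabularyTable: looks keys up in the given table; a key that is not found is mapped to hash(<term>) % num_oov_buckets + vocab_size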
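To make the difference between the two table types concrete, here is a minimal plain-Python sketch of their lookup semantics. It does not call TensorFlow: the helper names `static_hash_lookup` and `static_vocab_lookup` are hypothetical, and MD5 is only a stand-in for the hash that TensorFlow applies internally, so the out-of-vocabulary ids it produces will generally differ from the ones TensorFlow returns.

```python
import hashlib


def static_hash_lookup(mapping, term, default_value=-1):
    """Mimics StaticHashTable: unknown terms get the default value."""
    return mapping.get(term, default_value)


def static_vocab_lookup(mapping, term, num_oov_buckets):
    """Mimics StaticVocabularyTable: unknown terms land in an OOV bucket."""
    if term in mapping:
        return mapping[term]
    vocab_size = len(mapping)
    # Stand-in hash (MD5), NOT the hash TensorFlow uses internally.
    digest = hashlib.md5(term.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_oov_buckets + vocab_size


vocab = {"牛奶": 0, "鸡蛋": 1}
known_id = static_vocab_lookup(vocab, "鸡蛋", num_oov_buckets=3)
oov_id = static_vocab_lookup(vocab, "白菜", num_oov_buckets=3)
missing_id = static_hash_lookup(vocab, "白菜")
print(known_id, oov_id, missing_id)
```

With a vocabulary of size 2 and 3 OOV buckets, every unknown term receives an id in the range 2 to 4, so unknown terms never collide with known ones, whereas the hash-table variant collapses all unknown terms onto the single default value.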

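Input

import tensorflow as tf

# TF 1.x graph-mode example (the article targets tensorflow 1.15).
# Explicit category -> id mapping.
keys_tensor = tf.constant(['牛奶', '鸡蛋'])
vals_tensor = tf.constant([3, 4])
input_tensor = tf.constant(['鸡蛋', '白菜'])  # '白菜' is not in the table
# Keys that are not found map to the default value -1.
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(keys_tensor, vals_tensor), -1)
out = table.lookup(input_tensor)
with tf.Session() as sess:
    sess.run(tf.tables_initializer())  # initialize the table before looking anything up
    print(sess.run(out))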

Output

[ 4 -1]

Input

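import tensorflow as tf

# profile_feats.txt contains one key per line:
#   hello
#   world
# Each whole line is used as the key and its zero-based line number as the value.
init = tf.lookup.TextFileInitializer(
    filename='profile_feats.txt',
    key_dtype=tf.string, key_index=tf.lookup.TextFileIndex.WHOLE_LINE,
    value_dtype=tf.int64, value_index=tf.lookup.TextFileIndex.LINE_NUMBER)
# Keys that are not found map to the default value -1.
table = tf.lookup.StaticHashTable(init, -1)
out = table.lookup(tf.constant('world'))
with tf.Session() as sess:
    sess.run(tf.tables_initializer())  # initialize the table before looking anything up
    print(sess.run(out))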

Output

1
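The mapping that WHOLE_LINE and LINE_NUMBER produce can be mirrored in plain Python, which is a convenient way to check a vocabulary file before handing it to TensorFlow. This is only a sketch: `build_vocab_from_file` is a hypothetical helper, not part of tf.lookup, and it assumes the file holds one key per line with no leading or trailing whitespace.

```python
import os
import tempfile


def build_vocab_from_file(path):
    """Map each whole line (key) to its zero-based line number (value)."""
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n"): idx for idx, line in enumerate(f)}


with tempfile.TemporaryDirectory() as tmp_dir:
    # Recreate the two-line vocabulary file used in the example.
    vocab_path = os.path.join(tmp_dir, "profile_feats.txt")
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("hello\nworld\n")
    vocab = build_vocab_from_file(vocab_path)

world_id = vocab.get("world", -1)         # key present in the file
unknown_id = vocab.get("tensorflow", -1)  # key absent, falls back to -1
print(world_id, unknown_id)
```

Line numbers start at zero, so "hello" maps to 0 and "world" maps to 1, and the -1 fallback plays the same role as the default value passed to StaticHashTable.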

Related reading

- tf.lookup: 在tf内部做string2id的映射 (Zhihu, written for tensorflow 1.15): https://zhuanlan.zhihu.com/p/346341718
- TF中的哈希处理列&哈希冲突处理 (Zhihu), on hashed feature columns and hash collision handling in TF. It describes one variant with a high collision probability that calls string_to_hash_bucket_fast, one with a low collision probability that calls string_to_hash_bucket_strong, and sparse_column_with_hash_…: https://zhuanlan.zhihu.com/p/38044142
