[TensorFlow] Strings in TF: tf.string

Latest recommended article published on 2025-04-02 15:15:37

不用先生


Category: TensorFlow. Tags: Tensorflow, tf.string, deep learning

Article link: https://blog.csdn.net/u013921430/article/details/101221896



TensorFlow versions: 1.7.0, 1.14.0

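Defining a string

In TensorFlow, a string Tensor is defined in the same way as a Tensor of any numeric type, and several functions can be used for it. The main options are the functions below, plus placeholders: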

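import tensorflow as tf

tf_str=tf.convert_to_tensor("Hello Tensorflow")
tf_str=tf.constant('Hello Tensorflow',dtype=tf.string)
tf_str=tf.Variable(tf.constant("Hello Tensorflow",shape=[2,2]),dtype=tf.string)   #2x2 Variable; the constant fills every element with the same string
tf_str=tf.placeholder(tf.string)                                                   #placeholder, fed with string data at run time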

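Common functions for the string type

To understand a type, you need to know its common operations. Several frequently used functions are introduced below.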

tf.as_string()

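As the name suggests, this function converts a Tensor of another type into a string Tensor. It has many parameters, but most of them have default values. It is used as follows: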

tf.as_string(input)

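Here input must be a Tensor of a numeric type such as int32, or of bool type. As with many other functions, an ordinary non-Tensor value is automatically converted to a Tensor before the type conversion.

tf_num=tf.constant(123,dtype=tf.int32)
as_str=tf.as_string(tf_num)            #convert a numeric Tensor to a string Tensor

as_str=tf.as_string(123)               #convert a plain number to a string Tensor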

tf.substr()

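First of all, unlike Python and C++, TensorFlow does not support subscript indexing on an individual string.

sub_str=_str[0:5]   #supported in Python; not supported in C++ or TF
sub_str=_str[5]     #supported in C++ and Python; not supported in TF

The only way to get a substring is the function tf.substr(). Its name and parameter list are as follows: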

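def substr(input, pos, len, name=None)   #function name and parameters

sub_str=tf.substr(tf_str,3,2)            #usage

The main parameters are the input string, the start position pos of the substring, and its length len. In early versions such as 1.7.0, pos must be a non-negative integer smaller than the length of input, which matches the substr() member function of the C++ string type; after all, most of TensorFlow's backend is implemented in C++. In later versions such as 1.14.0 this parameter was changed: pos may be negative, in which case it counts from the end of the string, as in Python.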
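To make the pos/len rules above concrete, here is a minimal plain-Python model of them. It is only a sketch and not TensorFlow code: substr_model is a hypothetical helper, it works on bytes because TensorFlow strings are byte strings, and the exact out-of-range behaviour is an assumption rather than something taken from the TensorFlow source.

```python
def substr_model(s: bytes, pos: int, length: int) -> bytes:
    """Plain-Python model of tf.substr on one byte string (hypothetical helper)."""
    n = len(s)
    # Assumption: a negative pos counts from the end, as described for 1.14.0.
    if pos < 0:
        pos += n
    # Assumption: a start position outside the string is treated as an error.
    if pos < 0 or pos > n:
        raise ValueError("pos out of range")
    # A length that runs past the end of the string is simply truncated.
    return s[pos:pos + length]


text = b"Hello Tensorflow"
print(substr_model(text, 3, 2))    # b'lo'   (bytes 3 and 4)
print(substr_model(text, -4, 4))   # b'flow' (the last four bytes)
```

The first call mirrors tf.substr(tf_str,3,2) from the text; the second shows the negative-pos form that the early versions reject.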

tf.string_to_number()

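In TensorFlow, tf.cast() converts between Tensors of numeric and bool types, but it does not support the string type. Converting a string to a number must go through tf.string_to_number(). This function is the inverse of tf.as_string(), except that it does not support bool. The default output type is tf.float32.

def string_to_number(string_tensor, out_type=_dtypes.float32, name=None)    #function name and parameters

tf_num=tf.string_to_number(tf_str,out_type=tf.int32)                        #usage; tf_str must hold numeric text such as "123"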

tf.string_split()

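tf.string_split() splits strings. It requires a 1-D string Tensor as input and splits each element on delimiter; when no delimiter is given, it splits on spaces. The function returns a sparse Tensor with two attributes, indices and values: values holds the substrings produced by the split, and each entry of indices gives the matching position as [index of the source string in the input, index of the substring within that string].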

def string_split(source, delimiter=" ", skip_empty=True)    #function name and parameters (as of 1.7.0)

tf_str=tf.constant(['abcdabcda bc', "abda c"],dtype=tf.string)
split_d=tf.string_split(tf_str,'d')    #split each string at every 'd'
split_d_values=split_d.values          #['abc' 'abc' 'a bc' 'ab' 'a c']

split_=tf.string_split(tf_str)         #split each string at spaces
split_values=split_.values             #['abcdabcda' 'bc' 'abda' 'c']
split_indices=split_.indices           #[[0 0][0 1][1 0][1 1]]
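The relationship between indices and values can be easier to see in a plain-Python model of the sparse result. This is only a sketch: string_split_model is a hypothetical helper, it assumes a single-character delimiter, and the dense_shape it returns (number of input strings by the largest number of pieces) is an assumption about how the sparse Tensor is laid out.

```python
def string_split_model(strings, delimiter=" ", skip_empty=True):
    """Plain-Python model of the sparse result of tf.string_split (hypothetical helper)."""
    indices, values = [], []
    max_cols = 0
    for row, s in enumerate(strings):
        # Assumption: delimiter is a single character.
        pieces = s.split(delimiter)
        if skip_empty:
            pieces = [p for p in pieces if p]
        for col, piece in enumerate(pieces):
            indices.append([row, col])   # [source string, position within it]
            values.append(piece)
        max_cols = max(max_cols, len(pieces))
    dense_shape = [len(strings), max_cols]
    return indices, values, dense_shape


indices, values, dense_shape = string_split_model(["abcdabcda bc", "abda c"], "d")
print(values)       # ['abc', 'abc', 'a bc', 'ab', 'a c']
print(indices)      # [[0, 0], [0, 1], [0, 2], [1, 0], [1, 1]]
print(dense_shape)  # [2, 3]
```

Each row of indices says which input string a value came from and where it sits within that string's pieces, which is how a ragged result fits into one sparse Tensor.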

tf.string_join()

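Having covered splitting strings, the counterpart is joining them. tf.string_join() joins multiple string Tensors element-wise; the separator parameter specifies the string placed between the joined pieces. TensorFlow also supports adding string Tensors directly with +.

def string_join(inputs, separator="", name=None):           #function name and parameters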

tf_str5=tf.constant('Hello Tensorflow:',shape=[2,2],dtype=tf.string)
tf_str6=tf.constant('Hello world',shape=[2,2],dtype=tf.string)
join=tf.string_join([tf_str5,tf_str6],separator=" ")   

#join=[['Hello Tensorflow: Hello world' 'Hello Tensorflow: Hello world']
#      ['Hello Tensorflow: Hello world' 'Hello Tensorflow: Hello world']]

add_=tf_str5+tf_str6

#add_=[['Hello Tensorflow:Hello world' 'Hello Tensorflow:Hello world']
#      ['Hello Tensorflow:Hello world' 'Hello Tensorflow:Hello world']]
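The element-wise nature of the join above can be sketched in plain Python. This is only a model under simplifying assumptions: string_join_model is a hypothetical helper, and it uses flat lists of equal length instead of 2x2 Tensors.

```python
def string_join_model(inputs, separator=""):
    """Element-wise join of equal-length lists of str (hypothetical helper)."""
    # zip(*inputs) groups the i-th element of every input list together.
    return [separator.join(group) for group in zip(*inputs)]


a = ["Hello Tensorflow:", "Hello Tensorflow:"]
b = ["Hello world", "Hello world"]
print(string_join_model([a, b], separator=" "))
# With an empty separator the result matches element-wise "+".
print(string_join_model([a, b]) == [x + y for x, y in zip(a, b)])  # True
```

The output keeps the shape of the inputs: one joined string per position, never one string for the whole Tensor.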

tf.reduce_join()

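As the name suggests, this function works like reduce_mean(), reduce_max() and similar functions: it joins the strings of a Tensor along the given dimension. Its usage is also similar to those functions.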

def reduce_join(inputs, axis=None, keep_dims=False, separator="", name=None, reduction_indices=None)  #function name and parameters; reduction_indices is the older name for axis

tf_str=tf.constant([['a' ,'b'],['c' ,'d']],dtype=tf.string)
reduce_join_0=tf.reduce_join(tf_str,axis=0)      #reduce_join_0=['ac' 'bd']
reduce_join_1=tf.reduce_join(tf_str,axis=1)      #reduce_join_1=['ab' 'cd']
reduce_join_=tf.reduce_join(tf_str)              #reduce_join_='abcd'
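The three results above can be reproduced with a plain-Python model for the 2-D case. This is only a sketch: reduce_join_model is a hypothetical helper, it handles only a list of rows, and the axis=None branch assumes that all dimensions are reduced starting from the last one.

```python
def reduce_join_model(rows, axis=None, separator=""):
    """Plain-Python model of tf.reduce_join for a 2-D list of str (hypothetical helper)."""
    if axis == 0:
        # Join down each column: dimension 0 disappears.
        return [separator.join(col) for col in zip(*rows)]
    if axis == 1:
        # Join across each row: dimension 1 disappears.
        return [separator.join(row) for row in rows]
    if axis is None:
        # Assumption: reduce the last dimension first, then the first one.
        return separator.join(separator.join(row) for row in rows)
    raise ValueError("this model only supports axis 0, 1 or None")


data = [["a", "b"], ["c", "d"]]
print(reduce_join_model(data, axis=0))  # ['ac', 'bd']
print(reduce_join_model(data, axis=1))  # ['ab', 'cd']
print(reduce_join_model(data))          # abcd
```

Unlike the element-wise string_join, the reduction removes a dimension, so a 2x2 input becomes a length-2 result or a single string.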

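Summary

Judging from these common tf.string functions, the TensorFlow team seems torn: TF wants the convenience of Python while staying close to C++ for efficiency, and ends up doing neither very well. At present, TF is used noticeably less than PyTorch among academic researchers. Perhaps the release of TensorFlow 2.0 will win TensorFlow some new fans. As someone already deep in the TF pit, I hope it keeps getting better!

The end.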