Function prototype:
tf.string_split(source, delimiter=' ', skip_empty=True)
Parameters:
- source: a 1-D tensor of dtype tf.string holding the strings to be split. Note: the input must be passed in as a list, e.g. ['I am Chinese']; without the square brackets [] an error is raised.
- delimiter=' ': the delimiter, which defaults to a single space.
- skip_empty=True: a bool indicating whether empty strings are skipped in the output.
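The semantics of these parameters can be illustrated with a minimal pure-Python sketch (the helper name `string_split` is hypothetical and this is not TensorFlow's actual implementation, just an approximation of its observable behavior):

```python
# Hypothetical pure-Python approximation of tf.string_split's behavior.
def string_split(sources, delimiter=' ', skip_empty=True):
    indices, values = [], []   # sparse [row, col] positions and tokens
    max_len = 0                # length of the longest sentence, for dense_shape
    for row, s in enumerate(sources):
        tokens = s.split(delimiter)
        if skip_empty:
            # skip_empty drops the empty strings produced by repeated delimiters
            tokens = [t for t in tokens if t]
        max_len = max(max_len, len(tokens))
        for col, tok in enumerate(tokens):
            indices.append([row, col])
            values.append(tok)
    return indices, values, [len(sources), max_len]

idx, vals, shape = string_split(['I am Chinese'])
print(idx)    # [[0, 0], [0, 1], [0, 2]]
print(vals)   # ['I', 'am', 'Chinese']
print(shape)  # [1, 3]
```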
The output is a SparseTensor with three fields: indices, values, dense_shape.
Example 1: splitting a single string
import tensorflow as tf
A_str_tf = tf.constant('I am Chinese', dtype=tf.string)
A_arr_tf = tf.string_split([A_str_tf], delimiter=' ')
with tf.Session() as sess:
    A_arr = sess.run(A_arr_tf)
    print('A_arr:\n', A_arr)
    print('\nindices:\n', A_arr.indices)
    print('\nvalues:\n', [v.decode() for v in A_arr.values])
    print('\ndense_shape:\n', A_arr.dense_shape)
# Output:
# A_arr:
# SparseTensorValue(indices=array([[0, 0],
# [0, 1],
# [0, 2]]), values=array([b'I', b'am', b'Chinese'], dtype=object), dense_shape=array([1, 3]))
#
# indices:
# [[0 0]
# [0 1]
# [0 2]]
#
# values:
# ['I', 'am', 'Chinese']
#
# dense_shape:
# [1 3]
Example 2: splitting two strings
import tensorflow as tf
A_str_tf = tf.constant('I am Chinese', dtype=tf.string)
B_str_tf = tf.constant('I love China so much', dtype=tf.string)
C_arr_tf = tf.string_split([A_str_tf, B_str_tf], delimiter=' ')
with tf.Session() as sess:
    C_arr = sess.run(C_arr_tf)
    print('C_arr:\n', C_arr)
    print('\nindices:\n', C_arr.indices)
    print('\nvalues:\n', [v.decode() for v in C_arr.values])
    print('\ndense_shape:\n', C_arr.dense_shape)
# Output:
# C_arr:
# SparseTensorValue(indices=array([[0, 0],
# [0, 1],
# [0, 2],
# [1, 0],
# [1, 1],
# [1, 2],
# [1, 3],
# [1, 4]]), values=array([b'I', b'am', b'Chinese', b'I', b'love', b'China', b'so', b'much'],
# dtype=object), dense_shape=array([2, 5]))
#
# indices:
# [[0 0]
# [0 1]
# [0 2]
# [1 0]
# [1 1]
# [1 2]
# [1 3]
# [1 4]]
#
# values:
# ['I', 'am', 'Chinese', 'I', 'love', 'China', 'so', 'much']
#
# dense_shape:
# [2 5]
- values: the split tokens, returned as a 1-D vector.
- dense_shape: the dense shape; the second dimension is padded to the length of the longest sentence!
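This padding behavior can be seen by densifying Example 2's sparse output with NumPy. The snippet below is only an illustration of what the SparseTensor represents; in TensorFlow 1.x itself you would call tf.sparse_tensor_to_dense with default_value='' inside a session.

```python
import numpy as np

# Sparse output from Example 2: sentence 0 has 3 tokens, sentence 1 has 5,
# so dense_shape is [2, 5] and the shorter row is padded with empty strings.
indices = [[0, 0], [0, 1], [0, 2],
           [1, 0], [1, 1], [1, 2], [1, 3], [1, 4]]
values = ['I', 'am', 'Chinese', 'I', 'love', 'China', 'so', 'much']
dense_shape = (2, 5)

dense = np.full(dense_shape, '', dtype=object)  # empty string as padding
for (row, col), v in zip(indices, values):
    dense[row, col] = v

print(dense[0].tolist())  # ['I', 'am', 'Chinese', '', '']
print(dense[1].tolist())  # ['I', 'love', 'China', 'so', 'much']
```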