TensorFlow batch reading with next_batch() / tensorflow.python.framework.errors_impl.InvalidArgumentError: Expect 3 fields but have 0 in record 0

Reading data in batches

Data template:
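Judging from the batch printed in the error log below, Test.csv holds one record per line: three comma-separated floats, where the first two columns are the features and the third is the label. For example:

7,8,11
2,4,42
1,2,10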



源代码

-------------------------------------------------

import tensorflow as tf

def readMyFileFormat(fileNameQueue):
    # Read the files in the queue one line (one record) at a time.
    reader = tf.TextLineReader()
    key, value = reader.read(fileNameQueue)

    # Three float columns per record; the defaults also fix the output dtypes.
    record_defaults = [[1.0], [1.0], [1.0]]
    x1, x2, x3 = tf.decode_csv(value, record_defaults=record_defaults)
    features = tf.stack([x1, x2])  # first two columns are the features
    label = x3                     # third column is the label
    return features, label

def inputPipeLine(fileNames=["Test.csv"], batchSize=4, numEpochs=None):
    fileNameQueue = tf.train.string_input_producer(fileNames, num_epochs=numEpochs)
    example, label = readMyFileFormat(fileNameQueue)
    # Sizing follows the pattern recommended in the TensorFlow docs:
    # capacity = min_after_dequeue + (num_threads + safety margin) * batch_size.
    min_after_dequeue = 8
    capacity = min_after_dequeue + 3 * batchSize
    exampleBatch, labelBatch = tf.train.shuffle_batch(
        [example, label], batch_size=batchSize, num_threads=3,
        capacity=capacity, min_after_dequeue=min_after_dequeue)
    return exampleBatch, labelBatch

featureBatch, labelBatch = inputPipeLine(["Test.csv"], batchSize=4)
with tf.Session() as sess:
    # Start populating the filename queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    example, label = sess.run([featureBatch, labelBatch])
    print(example, label)
    coord.request_stop()
    coord.join(threads)
    # No explicit sess.close() needed: the with-block closes the session.

----------------------------------------------------------------------------------------------------------------------------------------------------

Running it produces the following error:

C:/Users/User/PycharmProjects/Test/test.py
2017-09-25 15:04:46.598434: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-09-25 15:04:46.598711: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-25 15:04:46.598974: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-25 15:04:46.599224: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-25 15:04:46.599460: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-25 15:04:46.599708: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-09-25 15:04:46.599948: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-25 15:04:46.600168: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
[[ 7.  8.]
 [ 2.  4.]
 [ 1.  2.]
 [ 1.  2.]] [ 11.  42.  10.  10.]
Traceback (most recent call last):
  File "C:/Users/JZ/PycharmProjects/Test/test.py", line 30, in <module>
    coord.join(threads)
  File "C:\Users\JZ\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\training\coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "C:\Users\JZ\AppData\Local\Programs\Python\Python35\lib\site-packages\six.py", line 686, in reraise
    raise value
  File "C:\Users\JZ\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\training\queue_runner_impl.py", line 238, in _run
    enqueue_callable()
  File "C:\Users\JZ\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1063, in _single_operation_run
    target_list_as_strings, status, None)
  File "C:\Users\JZ\AppData\Local\Programs\Python\Python35\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\JZ\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))

tensorflow.python.framework.errors_impl.InvalidArgumentError: Expect 3 fields but have 0 in record 0
[[Node: DecodeCSV = DecodeCSV[OUT_TYPE=[DT_FLOAT, DT_FLOAT, DT_FLOAT], field_delim=",", _device="/job:localhost/replica:0/task:0/cpu:0"](ReaderReadV2:1, DecodeCSV/record_defaults_0, DecodeCSV/record_defaults_1, DecodeCSV/record_defaults_2)]]

Cause of the error: most likely the CSV file has an extra (blank) line at the end, as it did in my case. decode_csv reads that empty record, finds 0 fields where record_defaults declares 3, and raises the InvalidArgumentError above.




Similarly, you might want to check your data file to see if anything is wrong with it, even something as trivial as a blank line.
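The simplest fix is to clean the file before handing it to the pipeline. A minimal sketch in plain Python, assuming the file name Test.csv from the example above (the cleaned file's name Test_clean.csv is arbitrary):

# Drop blank lines and flag malformed records before training.
with open("Test.csv") as f:
    rows = [line for line in f if line.strip()]

for i, line in enumerate(rows):
    # Every record should have exactly the 3 fields decode_csv expects.
    if len(line.strip().split(",")) != 3:
        print("suspicious record %d: %r" % (i, line))

with open("Test_clean.csv", "w") as f:  # hypothetical output file
    f.writelines(rows)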

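As an aside: the pipeline above leaves numEpochs at None, so the reader cycles through the file forever. If you pass a finite numEpochs, two details of the queue-runner API matter: tf.train.string_input_producer keeps its epoch counter in a local variable, which tf.local_variables_initializer() must initialize, and the queue eventually raises tf.errors.OutOfRangeError to signal the end of input. A minimal sketch under those assumptions:

featureBatch, labelBatch = inputPipeLine(["Test.csv"], batchSize=4, numEpochs=2)
with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())  # required when num_epochs is set
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    try:
        while not coord.should_stop():
            example, label = sess.run([featureBatch, labelBatch])
            print(example, label)
    except tf.errors.OutOfRangeError:
        print("Done: all epochs consumed")
    finally:
        coord.request_stop()
        coord.join(threads)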

References:

https://stackoverflow.com/questions/43781143/tensorflow-decode-csv-expect-3-fields-but-have-5-in-record-0-when-given-5-de

http://blog.csdn.net/yiqingyang2012/article/details/68485382

http://wiki.jikexueyuan.com/project/tensorflow-zh/how_tos/reading_data.html

TensorFlow official documentation


