Python Error Roundup [continuously updated]

Latest recommended article published 2024-05-24 19:19:30

wamg潇潇

Article link: https://blog.csdn.net/qq_29831163/article/details/109222762

1. Variable encoder/embedding_encoder already exists, disallowed.

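Summary: the translation model I was running needs to build two embeddings, and it kept failing with this error (the assertion quoted here is the memory_sequence_length check covered in item 2):

InvalidArgumentError (see above for traceback): assertion failed: [All values in memory_sequence_length must greater than zero.] [Condition x > 0 did not hold element-wise:] [x (source_seq_lengths:0) = ] [8 13 16...]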

Fix:

1. Add tf.get_variable_scope().reuse_variables()  # sets reuse=True for variables in the current variable_scope

2. In main, call tf.reset_default_graph() before constructing the model:

tf.reset_default_graph()  # avoids "Variable ./encoder/kernel already exists, disallowed."
model = Seq2Seq(args,logger=logger, batch_size=args.batch_size, model_path=args.ckpt_path)

Then, in the embedding function of the Seq2Seq model that raised the error, add an assert that checks the variable's full name:

"""Build the word embeddings."""
with tf.variable_scope("encoder"):
    embedding_encoder = tf.get_variable("embedding_encoder", [src_vocab_size, src_embed_size], dtype=dtype)

assert embedding_encoder.name == "encoder/embedding_encoder:0"  # full name including the variable scope, which is unique

with tf.variable_scope("decoder"):
    embedding_decoder = tf.get_variable("embedding_decoder", [tgt_vocab_size, tgt_embed_size], dtype=dtype)

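2. All values in memory_sequence_length must be greater than 0

Cause: the raw text data contains blank-line samples. Such a line is filled entirely with PAD, so the sample's actual length is 0 and the model raises the error.

Fix: clean the data again and delete the blank lines. In Notepad++, open the Find dialog (Ctrl+F), switch to the Replace tab, select Extended search mode, search for \r\n\r\n and replace with \r\n. Repeat until no matches remain, since several consecutive blank lines need more than one pass.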
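The same cleanup can be scripted. The sketch below assumes a parallel corpus where line i of the source file pairs with line i of the target file, so a blank line on either side drops the whole pair and keeps the two files aligned. The function name `drop_empty_pairs` and the sample sentences are made up for illustration.

```python
def drop_empty_pairs(src_lines, tgt_lines):
    """Keep only sentence pairs where both sides contain non-whitespace text."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    kept = [
        (s.strip(), t.strip())
        for s, t in zip(src_lines, tgt_lines)
        if s.strip() and t.strip()
    ]
    src_clean = [s for s, _ in kept]
    tgt_clean = [t for _, t in kept]
    return src_clean, tgt_clean


# Hypothetical sample data: the second pair has a blank source line.
src = ["hello world\n", "\n", "good morning\n"]
tgt = ["ni hao shi jie\n", "kong\n", "zao shang hao\n"]
print(drop_empty_pairs(src, tgt))
```

Filtering both sides together matters: deleting blank lines from one file only would shift every later sentence against its translation.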

3. ValueError: Dimensions must be equal,

ValueError: Dimensions must be equal, but are 128 and 192 for 'decoder_1/while/BasicDecoderStep/decoder_1/attention_wrapper/attention_wrapper/multi_rnn_cell/cell_0/lstm_cell/MatMul_1' (op: 'MatMul') with input shapes: [64,128], [192,256].

# where the error is raised:
outputs, _, _ = contrib.seq2seq.dynamic_decode(decoder=decoder, output_time_major=True, maximum_iterations=maximum_iterations)

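Cause: the same cell object was used for both the first and second decoder layers, so both layers share one set of weights even though their input widths differ.

Before:

decoder_lstmcell = tf.nn.rnn_cell.LSTMCell(self.rnn_size)  # used to build the decoder_cell
decoder_cell = tf.nn.rnn_cell.MultiRNNCell([decoder_lstmcell for _ in range(self.rnn_layer)])
decoder_init_state = self.encoder_final_state      # thought vector: a sequence of numbers representing the sentence's meaning

After the fix, a function creates a new cell for each layer: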

def get_decoder_cell(rnn_size):
    decoder_lstmcell = tf.nn.rnn_cell.LSTMCell(rnn_size)  # a fresh cell object on every call
    return decoder_lstmcell

decoder_cell = tf.nn.rnn_cell.MultiRNNCell([get_decoder_cell(self.rnn_size) for _ in range(self.rnn_layer)])
decoder_init_state = self.encoder_final_state      # thought vector: a sequence of numbers representing the sentence's meaning
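Why sharing one cell breaks can be shown without TensorFlow. The sketch below uses a hypothetical `FakeCell` class (not a TensorFlow API) that fixes its kernel shape on first use, the way an LSTM cell creates its weights when first called. The sizes are one reading of the shapes in the error message, taken here as an assumption: 64 units, a 128-wide input to layer 0, and a 64-wide input to layer 1.

```python
class FakeCell:
    """Hypothetical stand-in for an RNN cell; weights are created on first use."""

    def __init__(self, units):
        self.units = units
        self.kernel_shape = None

    def build(self, input_dim):
        # An LSTM kernel multiplies [input, hidden] and yields 4 gates.
        needed = (input_dim + self.units, 4 * self.units)
        if self.kernel_shape is None:
            self.kernel_shape = needed
        elif self.kernel_shape != needed:
            raise ValueError(
                f"Dimensions must be equal: kernel {self.kernel_shape}, needed {needed}"
            )


def run_stack(cells, input_dim):
    """Build each layer in order; a layer's output feeds the next layer."""
    for cell in cells:
        cell.build(input_dim)
        input_dim = cell.units
    return [cell.kernel_shape for cell in cells]


shared = FakeCell(64)
try:
    run_stack([shared for _ in range(2)], input_dim=128)  # same object twice
except ValueError as err:
    print("shared cell:", err)

# One new object per layer, as in get_decoder_cell above.
print("separate cells:", run_stack([FakeCell(64) for _ in range(2)], input_dim=128))
```

In the list comprehension `[decoder_lstmcell for _ in range(n)]` every element is the same object, while `[get_decoder_cell(size) for _ in range(n)]` calls the constructor once per layer.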

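4. pip install fails with Fatal error in launcher: Unable to create process using '".... python.exe"

Fix: force-reinstall pip. Running pip as a module through the interpreter bypasses the broken launcher executable:
python3 -m pip install --upgrade --force-reinstall pip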

5. (During model training) OOM when allocating tensor with shape[177,128,27926] ...

OOM when allocating tensor with shape[177,128,27926] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc 
The GPU does not have enough memory, which causes the Out Of Memory error.

Fix: reduce the batch size from 128 to 32 and the vocab size from 27926 to 10000.
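A quick calculation shows how much the change helps. The sketch below assumes the failing tensor holds 4-byte floats and that its three dimensions are sequence length, batch size and vocab size, which matches the numbers quoted above. The helper name `tensor_bytes` is made up for illustration.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Memory needed for one dense tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


before = tensor_bytes([177, 128, 27926])  # original batch size and vocab
after = tensor_bytes([177, 32, 10000])    # reduced batch size and vocab

print(f"before: {before / 2**30:.2f} GiB")
print(f"after:  {after / 2**30:.2f} GiB")
```

This single tensor shrinks from roughly 2.4 GiB to roughly 0.2 GiB, and training usually keeps several tensors of this size alive at once (logits, softmax output, gradients).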

6. InvalidArgumentError: Found Inf or NaN global norm.

Code that raised the error: clipped_gradients, _ = tf.clip_by_global_norm(gradients, 2)  # gradient clipping
Error message:
InvalidArgumentError (see above for traceback): Found Inf or NaN global norm. : Tensor had Inf values
	 [[node VerifyFinite/CheckNumerics (defined at D:\projects\seq2seq\bin\model.py:215)  = CheckNumerics[T=DT_FLOAT, message="Found Inf or NaN global norm.", 
_device="/job:localhost/replica:0/task:0/device:CPU:0"](global_norm/global_norm)]]
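Possible causes: NaN or Inf values generally come from one of three sources:

Bad input data. Sometimes the error comes from an inconsistency between the record file and the pbtxt file.
An arithmetic error, such as division by zero or log(0).
Exploding gradients.

Fixes:

1. First lower the learning rate to see whether a high learning rate is the cause. Try halving it, or reducing it by an order of magnitude.

2. Check for arithmetic errors, mainly wherever the code divides or takes a log. Check whether a 0 can appear and whether it causes the problem, and try limiting the values with clip_by_value.

3. Check that the data has been cleaned properly.

4. Check that the number of classes specified in label_map.pbtxt equals the "num_classes" field in the TFRecord config file.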
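Fix 2 and the failing check can both be sketched in plain Python. The code below is an illustration and not TensorFlow code: `safe_log` clamps its argument the way clip_by_value would before a log, and `clip_by_global_norm` here is a simplified scalar version that scales by clip_norm / max(global_norm, clip_norm) and refuses a non-finite norm. The epsilon value 1e-8 is an assumption.

```python
import math


def safe_log(p, eps=1e-8):
    """log(p) with p clamped to [eps, 1.0], so log(0) cannot occur."""
    return math.log(min(max(p, eps), 1.0))


def clip_by_global_norm(grads, clip_norm):
    """Scale a flat list of gradients so their joint L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if not math.isfinite(norm):
        # One Inf or NaN gradient makes the whole norm non-finite.
        raise ValueError("Found Inf or NaN global norm.")
    scale = clip_norm / max(norm, clip_norm)
    return [g * scale for g in grads], norm


print(safe_log(0.0))                          # finite, instead of a math domain error
print(clip_by_global_norm([3.0, 4.0], 2.0))   # norm 5.0, gradients scaled by 0.4

try:
    clip_by_global_norm([1.0, float("inf")], 2.0)
except ValueError as err:
    print(err)
```

Clipping cannot repair a gradient that is already Inf or NaN, which is why the fixes above target the source of the bad value instead of the clipping step.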

7. FailedPreconditionError: HashTable has different value for same key

# error details
OP_REQUIRES failed at lookup_table_op.cc:784 : Failed precondition: HashTable

FailedPreconditionError: HashTable has different value for same key. Key  has 19991 and trying to add value 20330
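The post records no fix for the HashTable error. One plausible reading, offered here only as an assumption, is that the vocabulary file used to initialize the lookup table contains the same key on two lines (the message suggests indices 19991 and 20330, and the key itself looks empty or whitespace-only). A small sketch to locate such duplicates, with the hypothetical helper `find_duplicate_keys`:

```python
from collections import defaultdict


def find_duplicate_keys(vocab_lines):
    """Map each repeated vocab entry to the 0-based line indices where it occurs."""
    positions = defaultdict(list)
    for index, line in enumerate(vocab_lines):
        positions[line.rstrip("\r\n")].append(index)
    return {key: idxs for key, idxs in positions.items() if len(idxs) > 1}


# Hypothetical vocab content: "the" appears twice, and so does an empty line.
vocab = ["<unk>\n", "the\n", "\n", "cat\n", "the\n", "\n"]
print(find_duplicate_keys(vocab))
```

If duplicates turn up, deduplicating the vocabulary file before building the table would be the thing to try.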

8. error: unrecognized arguments:

# command entered
>>python ../THUMT/bin/trainer.py  --input corpus.en-zh.32k.en  corpus.en-zh.32k.zh  --model transformer
usage: trainer.py [<args>] [-h | --help]
trainer.py: error: unrecognized arguments: corpus.en-zh.32k.zh

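Cause: the --input parameter is a list holding two paths. The terminal separates arguments with spaces, so without nargs the second path is treated as an unknown extra argument. Add nargs=2 when defining the argument: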

import argparse
import os

parser = argparse.ArgumentParser(description="Training neural machine translation models",
                                 usage="trainer.py [<args>] [-h | --help]")
# input files; data_path must be defined elsewhere
parser.add_argument("--input", type=str, nargs=2,  # nargs=2: exactly two paths
                    default=[os.path.join(data_path, "corpus.en_zh.32k.en"),
                             os.path.join(data_path, "corpus.en_zh.32k.zh")],
                    help="Path of source and target corpus")
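The effect of nargs=2 can be checked in isolation. This is a minimal standalone sketch, not the THUMT trainer itself: it passes the argument list directly to parse_args instead of reading the real command line, and the parser description is made up.

```python
import argparse

parser = argparse.ArgumentParser(description="nargs demo")
# nargs=2 makes argparse consume exactly two values after --input
parser.add_argument("--input", type=str, nargs=2,
                    help="Path of source and target corpus")
parser.add_argument("--model", type=str)

args = parser.parse_args([
    "--input", "corpus.en-zh.32k.en", "corpus.en-zh.32k.zh",
    "--model", "transformer",
])
print(args.input)  # a list holding both paths
print(args.model)
```

Without nargs, --input takes a single value, so the second path is left over and argparse reports it as an unrecognized argument.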
