A TFX Pipeline for Machine Translation (with a Simple Transformer)

Preface

  • Because this is only a test, the model configuration is scaled down and trained for just a few steps (adjust the settings in constants.py and configs.py), so the model's accuracy is low and the translations are poor.
  • GitHub project: https://github.com/032004129xuzhiyong/NMT_tfx_pipeline ; this article is just the test notebook (moduletest.ipynb) from that repository.
  • NMT tutorial: https://tensorflow.google.cn/text/tutorials/transformer
  • Dataset: the archive http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip linked from the tutorial https://tensorflow.google.cn/text/tutorials/nmt_with_attention, already downloaded and extracted into the project's spa-eng directory.
  • Task: translate from Spanish to English.
  • Run: enter the project directory and run python local_runner.py.
  • Environment: see requirement.yaml; tfx 1.13.0 is required. Older versions may be incompatible, as the API is not yet stable.
  • Project layout:
    • The custom directory contains the preprocessing code (based on https://tensorflow.google.cn/text/guide/subwords_tokenizer among other tutorials) and the Transformer model. Modify the model architecture here; model.py below only defines how the model is trained and exported.
    • The data directory contains the input data the pipeline needs; the "Creating the pipeline dataset" section below generates it.
    • The models directory contains the preprocessing and model-training code used by the pipeline.
      • constants.py defines the preprocessing and training parameters
      • model.py defines the training steps the pipeline performs
      • preprocessing.py defines the preprocessing steps
    • The pipeline directory contains the pipeline's configuration parameters and the full pipeline definition.
      • configs.py defines the pipeline parameters
      • pipeline.py defines the pipeline components and the model-validation configuration
    • The spa-eng directory contains the raw dataset; the pipeline does not need it.
    • The tfx_metadata directory holds the metadata generated automatically when the pipeline runs.
    • The tfx_pipeline_output directory holds the component outputs generated automatically when the pipeline runs.
    • The vocab directory contains the generated vocabularies.
    • local_runner.py is the Python file that runs the pipeline; run it directly with python local_runner.py.
    • moduletest.ipynb is this file.
    • requirement.yaml is the runtime environment (tfx 1.13.0), exported with conda; the model-card-toolkit package in it has a conflict and can be dropped.
  • Notes:
    • The vocabulary size differs every time the vocab is generated, so update the vocab sizes in models/constants.py.
    • This pipeline runs locally.
    • After the pipeline has run several times, tfx_pipeline_output can grow large, since it keeps every component's output from every run. Delete the whole directory if you no longer need it.
    • The results in tfx_pipeline_output can be loaded with the library behind each component (tfdv, tft, tfma, TF Serving, ...) to visualize results, deploy the model, and so on.
    • Before running, consider adjusting the model parameters: the model was shrunk because of the author's personal machine. With enough resources you can double d_model and dff, set num_layers to 8, and increase the batch size.
    • Exported model signatures (examples of each are shown below):
      • serving_default takes serialized examples of the raw inputs; it exists so the Evaluator component can evaluate the model (usually the model's own output is exactly what we need, but not here).
      • transform_features is the preprocessing signature; it takes serialized examples of the raw inputs.
      • translator is the translation signature; it takes serialized examples of the raw inputs (it could of course be changed to take Tensor inputs instead).
      • train_step is the signature for continuing training; it takes the raw inputs as Tensors (which is convenient) and consumes one batch per call. Alternatively, you can load the raw data yourself, preprocess it with the transform_features signature (batch first, then preprocess), and then train the model without going through any signature, i.e. the freshly loaded model itself, whose input is the preprocessed data, with the fit method (to have a fit method you need the second loading method); see the sketch after this list.
    • The pipeline can still be extended or improved, for example:
      • add a Tuner component for hyperparameter tuning
      • add checkpoints so interrupted training can resume
      • change the translator input signature to raw-input Tensors so no serialization is needed
      • before running the pipeline, define a proper Schema and pass its path as the schema_path argument of create_pipeline; this lets you add an ExampleValidator data-validation component and spot data drift, training-serving skew, and other anomalies early
      • add a Kubeflow config (the whole project must be accessible, e.g. by placing the files in a cloud storage bucket or mounting a PersistentVolumeClaim; see the TFX template examples for reference). Running it produces a compressed pipeline package that can be uploaded to and run on Kubeflow Pipelines. (External network access is required, because the workflow pulls images from gcr, and they are large.)
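
As an illustration of the fit-based alternative mentioned in the train_step note above, here is a minimal sketch: batch and serialize raw examples, preprocess them with the transform_features signature, and train the loaded Keras model directly. The ((context_in, target_in), target_out) input/label structure is an assumption borrowed from the Transformer tutorial; check custom/TransformerModel.py and models/model.py for the structure the exported model actually expects.

import tensorflow as tf

def to_serialized_example(spa_bytes, en_bytes):
    # Serialize one raw (context, target) pair into a tf.train.Example.
    example = tf.train.Example(features=tf.train.Features(feature={
        'context': tf.train.Feature(bytes_list=tf.train.BytesList(value=[spa_bytes])),
        'target': tf.train.Feature(bytes_list=tf.train.BytesList(value=[en_bytes])),
    }))
    return example.SerializeToString()

def make_fit_dataset(model, spa_lines, en_lines, batch_size=64):
    serialized = tf.constant([to_serialized_example(s, e)
                              for s, e in zip(spa_lines, en_lines)])
    ds = tf.data.Dataset.from_tensor_slices(serialized).batch(batch_size)

    def preprocess(batch):
        # transform_features expects a batch of serialized examples.
        feats = model.signatures['transform_features'](batch)
        # Assumed input/label structure; see models/model.py for the real one.
        return (feats['context_in'], feats['target_in']), feats['target_out']

    return ds.map(preprocess)

# Usage, after loading with the second method shown below:
# model_load.fit(make_fit_dataset(model_load, spa_raw[:1024], en_raw[:1024]), epochs=1)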

Preprocessing test

import tensorflow as tf
import tensorflow_transform as tft
import tensorflow_transform.beam as tft_beam
import tensorflow_text as tf_text
from tfx import v1 as tfx
from tensorflow_transform.tf_metadata import schema_utils

import tempfile

from custom.bertpreprocess import BertTokenizerModule
2023-07-13 20:01:41.287848: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-07-13 20:01:44.243019: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
WARNING:absl:Failed to import tensorflow serving protos. It can fail if the TF version doesn't match with the TF Serving version. We will try importing again with a workaround:module 'tensorflow.core.protobuf.error_codes_pb2' has no attribute '_CODE'
raw_data = [
    {'context':b'Una vez hubo aqu\xc3\xad una iglesia.','target':b'There was a church here once.'},
    {'context':b'\xc2\xbfCu\xc3\xa1l es tu nombre completo?','target':b"What's your full name?"},
    {'context':b'No tendr\xc3\xa1s ning\xc3\xban problema m\xc3\xa1s.','target':b"You'll have no more problems."},
    {'context':b'Tom le mostr\xc3\xb3 a Mary la foto de John.','target':b"Tom showed Mary John's picture."},
    {'context':b'Pareces un polic\xc3\xada.','target':b'You look like a policeman.'},
]

raw_data_metadata = tft.DatasetMetadata(
    schema_utils.schema_from_feature_spec({
        'context': tf.io.FixedLenFeature(shape=[],dtype=tf.string),
        'target': tf.io.FixedLenFeature(shape=[],dtype=tf.string),
    })
)
import pandas as pd
df = pd.DataFrame(raw_data)
df
   context                                              target
0  b'Una vez hubo aqu\xc3\xad una iglesia.'             b'There was a church here once.'
1  b'\xc2\xbfCu\xc3\xa1l es tu nombre completo?'        b"What's your full name?"
2  b'No tendr\xc3\xa1s ning\xc3\xban problema m\x...    b"You'll have no more problems."
3  b'Tom le mostr\xc3\xb3 a Mary la foto de John.'      b"Tom showed Mary John's picture."
4  b'Pareces un polic\xc3\xada.'                        b'You look like a policeman.'
dict(df)
{'context': 0             b'Una vez hubo aqu\xc3\xad una iglesia.'
 1        b'\xc2\xbfCu\xc3\xa1l es tu nombre completo?'
 2    b'No tendr\xc3\xa1s ning\xc3\xban problema m\x...
 3      b'Tom le mostr\xc3\xb3 a Mary la foto de John.'
 4                        b'Pareces un polic\xc3\xada.'
 Name: context, dtype: object,
 'target': 0      b'There was a church here once.'
 1             b"What's your full name?"
 2      b"You'll have no more problems."
 3    b"Tom showed Mary John's picture."
 4         b'You look like a policeman.'
 Name: target, dtype: object}
MAX_TOKENS = 30
def preprocessing_fn(inputs):
    # Build the tokenizers eagerly (init_scope lifts them out of the traced
    # preprocessing graph) from the vocabulary files generated below.
    with tf.init_scope():
        en_tokenizer = BertTokenizerModule('./vocab/en_vocab.txt')
        spa_tokenizer = BertTokenizerModule('./vocab/spa_vocab.txt')
    # Tokenize the Spanish context and pad/truncate it to MAX_TOKENS.
    spa = spa_tokenizer.tokenize(tf_text.normalize_utf8(inputs['context'], 'NFKD'))
    spa, _ = tf_text.pad_model_inputs(spa, max_seq_length=MAX_TOKENS)

    # Tokenize the English target to MAX_TOKENS+1 tokens, then shift it into
    # decoder inputs (drop the last token) and labels (drop the first token).
    en = en_tokenizer.tokenize(tf_text.normalize_utf8(inputs['target'], 'NFKD'))
    en, _ = tf_text.pad_model_inputs(en, max_seq_length=MAX_TOKENS + 1)
    en_inputs = en[:, :-1]
    en_labels = en[:, 1:]

    return {
        'context_in': spa,
        'target_in': en_inputs,
        'target_out': en_labels
    }
def main(output_dir):
    # Analyze and transform the in-memory dataset with a local Beam run.
    with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
        transformed_dataset, transform_fn = (
            (raw_data, raw_data_metadata) | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn)
        )
    transformed_data, transformed_metadata = transformed_dataset

    # Write the transform graph (SavedModel plus metadata) to output_dir.
    _ = (
        transform_fn
        | 'WriteTransformFn' >> tft_beam.WriteTransformFn(output_dir))
    return transformed_data, transformed_metadata
output_dir = tempfile.mkdtemp()
transformed_data, transformed_metadata=main(output_dir)
WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.




WARNING:absl:You are passing instance dicts and DatasetMetadata to TFT which will not provide optimal performance. Consider following the TFT guide to upgrade to the TFXIO format (Apache Arrow RecordBatch).
2023-07-12 16:05:27.948661: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-07-12 16:05:28.017529: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
WARNING:absl:You are passing instance dicts and DatasetMetadata to TFT which will not provide optimal performance. Consider following the TFT guide to upgrade to the TFXIO format (Apache Arrow RecordBatch).
WARNING:absl:You are outputting instance dicts from `TransformDataset` which will not provide optimal performance. Consider setting  `output_record_batches=True` to upgrade to the TFXIO format (Apache Arrow RecordBatch). Encoding functionality in this module works with both formats.
WARNING:apache_beam.options.pipeline_options:Discarding unparseable args: ['/home/xzy/anaconda3/envs/tfx/lib/python3.9/site-packages/ipykernel_launcher.py', '-f', '/home/xzy/.local/share/jupyter/runtime/kernel-556c4c67-5714-4385-b1d9-794589411043.json']


INFO:tensorflow:Assets written to: /tmp/tmpd1xlh5dp/tftransform_tmp/81b2cef6d77041b5a3d7d63b0399f8aa/assets
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
WARNING:apache_beam.options.pipeline_options:Discarding unparseable args: ['/home/xzy/anaconda3/envs/tfx/lib/python3.9/site-packages/ipykernel_launcher.py', '-f', '/home/xzy/.local/share/jupyter/runtime/kernel-556c4c67-5714-4385-b1d9-794589411043.json']
tf_output = tft.TFTransformOutput(output_dir)
infer=tf_output.transform_features_layer()
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
infer({'context':tf.constant(['',b'Una vez hubo aqu\xc3\xad una iglesia.']),
      'target':tf.constant(['',b'There was a church here once.'])})
{'context_in': <tf.Tensor: shape=(2, 30), dtype=int64, numpy=
 array([[   2,    3,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0],
        [   2,    1,  128,  821,    1,   78, 1060,   15,    3,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0]])>,
 'target_in': <tf.Tensor: shape=(2, 30), dtype=int64, numpy=
 array([[  2,   3,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0],
        [  2,   1,  66,  26, 970, 108, 391,  11,   3,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0]])>,
 'target_out': <tf.Tensor: shape=(2, 30), dtype=int64, numpy=
 array([[  3,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0],
        [  1,  66,  26, 970, 108, 391,  11,   3,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
           0,   0,   0,   0]])>}

Creating the pipeline dataset

import logging
import time

import numpy as np
import matplotlib.pyplot as plt

import tensorflow_datasets as tfds
import tensorflow as tf

import tensorflow_text
from custom.bertpreprocess import BertPreprocess
2023-07-14 21:08:26.017291: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-07-14 21:08:26.679469: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
# # Download the file
import pathlib

# path_to_zip = tf.keras.utils.get_file(
#     'spa-eng.zip', origin='http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip',
#     extract=True,cache_dir='.',cache_subdir='.')

# path_to_file = pathlib.Path(path_to_zip).parent/'spa-eng/spa.txt'
path_to_file = pathlib.Path('./spa-eng/spa.txt')
def load_data(path: pathlib.Path):
    text = path.read_text(encoding='utf-8')
    lines = text.splitlines()
    pairs = [line.split('\t')  for line in lines]
    
    en = np.array([en for en, spa in pairs])
    spa = np.array([spa for en, spa in pairs])
    return en, spa
en_raw, spa_raw = load_data(path_to_file)
BUFFER_SIZE = len(spa_raw)
BATCH_SIZE = 64

is_train = np.random.uniform(size=(len(en_raw),)) < 0.8

train_raw = (
    tf.data.Dataset
    .from_tensor_slices((spa_raw[is_train], en_raw[is_train]))
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE))
val_raw = (
    tf.data.Dataset
    .from_tensor_slices((spa_raw[~is_train], en_raw[~is_train]))
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE))
2023-07-14 21:08:32.775343: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-07-14 21:08:32.799577: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

Generating the vocabulary (vocab)

# Different runs may generate vocabularies of different sizes,
# so the parameters in models/constants.py must be updated accordingly.
# (The sizes below come from the vocabularies generated in the next cell.)
len(spa_vocab),len(en_vocab)
(4801, 3888)
%%time
max_vocab_size = 5000
spa_vocab = BertPreprocess.generate_vocab(train_raw.map(lambda spa,en:spa),max_vocab_size)
en_vocab = BertPreprocess.generate_vocab(train_raw.map(lambda spa,en:en),max_vocab_size)
WARNING:tensorflow:AutoGraph could not transform <function <lambda> at 0x7ff4a9802ee0> and will run it as-is.
Cause: could not parse the source code of <function <lambda> at 0x7ff4a9802ee0>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <function <lambda> at 0x7ff4a9802ee0> and will run it as-is.
Cause: could not parse the source code of <function <lambda> at 0x7ff4a9802ee0>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert


2023-07-14 21:08:35.375294: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype string and shape [95271]
	 [[{{node Placeholder/_1}}]]
2023-07-14 21:08:35.375514: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype string and shape [95271]
	 [[{{node Placeholder/_1}}]]


WARNING:tensorflow:AutoGraph could not transform <function <lambda> at 0x7ff576e47310> and will run it as-is.
Cause: could not parse the source code of <function <lambda> at 0x7ff576e47310>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <function <lambda> at 0x7ff576e47310> and will run it as-is.
Cause: could not parse the source code of <function <lambda> at 0x7ff576e47310>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert


2023-07-14 21:09:10.383556: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype string and shape [95271]
	 [[{{node Placeholder/_1}}]]
2023-07-14 21:09:10.383806: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype string and shape [95271]
	 [[{{node Placeholder/_1}}]]


CPU times: user 54.6 s, sys: 684 ms, total: 55.3 s
Wall time: 53.1 s
# Write the vocabularies to files
BertPreprocess.write_vocab_file('./vocab/en_vocab.txt',en_vocab)
BertPreprocess.write_vocab_file('./vocab/spa_vocab.txt',spa_vocab)
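Since the vocab sizes change on every regeneration, it can help to read them back from the written files before editing models/constants.py. A minimal sketch, assuming write_vocab_file emits one token per line as in the subwords_tokenizer guide:

# Count the lines of the written vocab files; copy the results into
# models/constants.py after each regeneration.
def vocab_size(path):
    with open(path, encoding='utf-8') as f:
        return sum(1 for _ in f)

print('spa vocab size:', vocab_size('./vocab/spa_vocab.txt'))
print('en vocab size:', vocab_size('./vocab/en_vocab.txt'))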

Generating the dataset for the pipeline

len(en_raw)
118964
raw_dataset = tf.data.Dataset.from_tensor_slices((spa_raw,en_raw))
2023-07-13 20:02:00.316739: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-07-13 20:02:00.506817: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
def write_to_tfrecord(spa_line, en_line, f):
    # Serialize one (context, target) string pair as a tf.train.Example
    # and append it to the open TFRecordWriter f.
    spa = spa_line.numpy()
    en = en_line.numpy()
    example = tf.train.Example(features=tf.train.Features(feature={
        'context': tf.train.Feature(bytes_list=tf.train.BytesList(value=[spa])),
        'target': tf.train.Feature(bytes_list=tf.train.BytesList(value=[en])),
    }))
    f.write(example.SerializeToString())
### Writes the dataset to ./data/data.tfrecord; keep this commented out unless it needs to be regenerated
# with tf.io.TFRecordWriter('./data/data.tfrecord') as f:
#     for spa_line,en_line in raw_dataset:
#         spa= spa_line.numpy()
#         en = en_line.numpy()
#         example = tf.train.Example(features=tf.train.Features(feature={
#             'context':tf.train.Feature(bytes_list=tf.train.BytesList(value=[spa])),
#             'target':tf.train.Feature(bytes_list=tf.train.BytesList(value=[en])),
#         }))
#         f.write(example.SerializeToString())
2023-07-13 20:02:00.714819: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype string and shape [118964]
	 [[{{node Placeholder/_1}}]]
rel = tf.data.TFRecordDataset('./data/data.tfrecord')
for ex in rel.take(5):
    a=tf.io.parse_single_example(ex,{
            'context':tf.io.FixedLenFeature(shape=[],dtype=tf.string),
            'target':tf.io.FixedLenFeature(shape=[],dtype=tf.string),
        })
    print(a)
{'context': <tf.Tensor: shape=(), dtype=string, numpy=b'Ve.'>, 'target': <tf.Tensor: shape=(), dtype=string, numpy=b'Go.'>}
{'context': <tf.Tensor: shape=(), dtype=string, numpy=b'Vete.'>, 'target': <tf.Tensor: shape=(), dtype=string, numpy=b'Go.'>}
{'context': <tf.Tensor: shape=(), dtype=string, numpy=b'Vaya.'>, 'target': <tf.Tensor: shape=(), dtype=string, numpy=b'Go.'>}
{'context': <tf.Tensor: shape=(), dtype=string, numpy=b'V\xc3\xa1yase.'>, 'target': <tf.Tensor: shape=(), dtype=string, numpy=b'Go.'>}
{'context': <tf.Tensor: shape=(), dtype=string, numpy=b'Hola.'>, 'target': <tf.Tensor: shape=(), dtype=string, numpy=b'Hi.'>}


2023-07-13 20:02:34.959953: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype string and shape [1]
	 [[{{node Placeholder/_0}}]]
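
For orientation, here is a hypothetical, minimal sketch of how the TFRecord file written into ./data is consumed at the start of the pipeline; the real component wiring (including Trainer, Evaluator, and the rest) and its arguments live in pipeline/pipeline.py.

from tfx import v1 as tfx

# Hypothetical minimal wiring; see pipeline/pipeline.py for the actual
# components and arguments used by this project.
example_gen = tfx.components.ImportExampleGen(input_base='./data')
statistics_gen = tfx.components.StatisticsGen(
    examples=example_gen.outputs['examples'])
schema_gen = tfx.components.SchemaGen(
    statistics=statistics_gen.outputs['statistics'])
transform = tfx.components.Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file='models/preprocessing.py')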

Model test (two ways to load the model)

import numpy as np
import tensorflow as tf
import tensorflow_transform as tft
from typing import List
from tensorflow import keras
from tfx import v1 as tfx
from tfx_bsl.public import tfxio
from tensorflow_metadata.proto.v0 import schema_pb2

from models import constants
from custom.TransformerModel import Transformer,TranslatorForTFX,CustomSchedule,masked_accuracy,masked_loss
WARNING:absl:Failed to import tensorflow serving protos. It can fail if the TF version doesn't match with the TF Serving version. We will try importing again with a workaround:module 'tensorflow.core.protobuf.error_codes_pb2' has no attribute '_CODE'

First loading method

model = tf.saved_model.load('./tfx_pipeline_output/nmt3/Trainer/model/6/Format-Serving/')
2023-07-14 20:42:25.503655: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-07-14 20:42:25.596098: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
# All of the model's signatures
model.signatures
_SignatureMap({'serving_default': <ConcreteFunction signature_wrapper(*, examples) at 0x7EFD522E07F0>, 'transform_features': <ConcreteFunction signature_wrapper(*, examples) at 0x7EFD522C7D90>, 'translator': <ConcreteFunction signature_wrapper(*, examples) at 0x7EFD5236F6A0>, 'train_step': <ConcreteFunction signature_wrapper(*, context_tensor, target_tensor) at 0x7EFD521E2670>})
# Test data
spa = b'Una vez hubo aqu\xc3\xad una iglesia.'
en = b'There was a church here once.'
example = tf.train.Example(features=tf.train.Features(feature={
        'context':tf.train.Feature(bytes_list=tf.train.BytesList(value=[spa])),
        'target':tf.train.Feature(bytes_list=tf.train.BytesList(value=[en])),
    })).SerializeToString()
inputs = tf.constant([example])
inputs
<tf.Tensor: shape=(1,), dtype=string, numpy=
array([b'\n]\n.\n\x07context\x12#\n!\n\x1fUna vez hubo aqu\xc3\xad una iglesia.\n+\n\x06target\x12!\n\x1f\n\x1dThere was a church here once.'],
      dtype=object)>

Testing the preprocessing signature

model.signatures['transform_features'](inputs)
{'target_out': <tf.Tensor: shape=(1, 20), dtype=int64, numpy=
 array([[  1,  66,  26, 970, 108, 391,  11,   3,   0,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0]])>,
 'context_in': <tf.Tensor: shape=(1, 20), dtype=int64, numpy=
 array([[   2,    1,  128,  821,    1,   78, 1060,   15,    3,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0]])>,
 'target_in': <tf.Tensor: shape=(1, 20), dtype=int64, numpy=
 array([[  2,   1,  66,  26, 970, 108, 391,  11,   3,   0,   0,   0,   0,
           0,   0,   0,   0,   0,   0,   0]])>}

Testing serving_default

model.signatures['serving_default'](inputs)
{'outputs': <tf.Tensor: shape=(1, 20, 3868), dtype=float32, numpy=
 array([[[-13.698115 ,  15.020721 , -13.686973 , ..., -13.70266  ,
          -13.703123 , -13.693153 ],
         [-13.772773 ,   3.9995522, -13.760695 , ..., -13.778036 ,
          -13.775291 , -13.768736 ],
         [-11.426856 ,   1.8011227, -11.415457 , ..., -11.432271 ,
          -11.428927 , -11.423204 ],
         ...,
         [-14.367573 ,  -1.4908874, -14.365833 , ..., -14.368966 ,
          -14.367823 , -14.365198 ],
         [-14.408494 ,  -1.4793062, -14.406815 , ..., -14.409847 ,
          -14.408725 , -14.406109 ],
         [-14.4906435,  -1.4972291, -14.488987 , ..., -14.492    ,
          -14.490903 , -14.488233 ]]], dtype=float32)>}

Translation test

spa = b'Si quieres sonar como un hablante nativo, debes estar dispuesto a practicar diciendo la misma frase una y otra vez de la misma manera en que un m\xc3\xbasico de banjo practica el mismo fraseo una y otra vez hasta que lo puedan tocar correctamente y en el tiempo esperado.'
example = tf.train.Example(features=tf.train.Features(feature={
        'context':tf.train.Feature(bytes_list=tf.train.BytesList(value=[spa])),
    })).SerializeToString()
inputs = tf.constant([example])
model.signatures['translator'](inputs)
{'outputs': <tf.Tensor: shape=(1,), dtype=string, numpy=
 array([b'[UNK] you you may to you you you you may follow walked ?ter a itate you touch signrop be start meeting you you fool you even shouldes you you you you you mustap its to you you you you other ,ur the want to you you you you you you you you you you you you still you plenty private you you you make . you let willing pay tight you you not you you you you is re . tend successful you you you just you you you just you you you you you you you belt would way wfe youent want button isn once you you call be . youop you don you don he rich answer'],
       dtype=object)>}

Testing training with raw-data Tensors

# Convert the 1-D raw data to Tensors, since the saved model's signatures only accept TensorFlow-native input types
spa_tensor = tf.convert_to_tensor(spa_raw)
spa_tensor
<tf.Tensor: shape=(118964,), dtype=string, numpy=
array([b'Ve.', b'Vete.', b'Vaya.', ...,
       b'Una huella de carbono es la cantidad de contaminaci\xc3\xb3n de di\xc3\xb3xido de carbono que producimos como producto de nuestras actividades. Algunas personas intentan reducir su huella de carbono porque est\xc3\xa1n preocupados acerca del cambio clim\xc3\xa1tico.',
       b'Como suele haber varias p\xc3\xa1ginas web sobre cualquier tema, normalmente s\xc3\xb3lo le doy al bot\xc3\xb3n de retroceso cuando entro en una p\xc3\xa1gina web que tiene anuncios en ventanas emergentes. Simplemente voy a la siguiente p\xc3\xa1gina encontrada por Google y espero encontrar algo menos irritante.',
       b'Si quieres sonar como un hablante nativo, debes estar dispuesto a practicar diciendo la misma frase una y otra vez de la misma manera en que un m\xc3\xbasico de banjo practica el mismo fraseo una y otra vez hasta que lo puedan tocar correctamente y en el tiempo esperado.'],
      dtype=object)>
en_tensor = tf.convert_to_tensor(en_raw)
en_tensor
<tf.Tensor: shape=(118964,), dtype=string, numpy=
array([b'Go.', b'Go.', b'Go.', ...,
       b'A carbon footprint is the amount of carbon dioxide pollution that we produce as a result of our activities. Some people try to reduce their carbon footprint because they are concerned about climate change.',
       b'Since there are usually multiple websites on any given topic, I usually just click the back button when I arrive on any webpage that has pop-up advertising. I just go to the next page found by Google and hope for something less irritating.',
       b'If you want to sound like a native speaker, you must be willing to practice saying the same sentence over and over in the same way that banjo players practice the same phrase over and over until they can play it correctly and at the desired tempo.'],
      dtype=object)>
# Keyword arguments are required here (otherwise an error is raised); the keywords are the names set in the input signature
model.signatures['train_step'](context_tensor=spa_tensor[:64],target_tensor=en_tensor[:64])
{'masked_accuracy': <tf.Tensor: shape=(), dtype=float32, numpy=0.67725986>,
 'loss': <tf.Tensor: shape=(), dtype=float32, numpy=1.6610923>}

Second loading method

# The custom objects must be supplied: the learning-rate schedule class and
# the metric/loss functions.
model_load = tf.keras.models.load_model(
    './tfx_pipeline_output/nmt3/Trainer/model/6/Format-Serving/',
    custom_objects={'masked_accuracy': masked_accuracy,
                    'masked_loss': masked_loss,
                    'CustomSchedule': CustomSchedule})
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. For example, in the saved checkpoint object, `model.layer.weight` and `model.layer_copy.weight` reference the same variable, while in the current object these are two different variables. The referenced variables are:(<keras.saving.legacy.saved_model.load.TensorFlowTransform>TransformFeaturesLayer object at 0x7fd3bfe86a30> and <keras.engine.input_layer.InputLayer object at 0x7fd53d278280>).
2023-07-14 20:44:10.156792: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs_1' with dtype float and shape [?,?]
	 [[{{node inputs_1}}]]
2023-07-14 20:44:10.156895: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor '115617' with dtype float and shape [2048,128]
	 [[{{node 115617}}]]
2023-07-14 20:44:10.237320: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder_1' with dtype float and shape [?,?]
	 [[{{node Placeholder_1}}]]
2023-07-14 20:44:10.237433: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'transformer/115968' with dtype float and shape [2048,128]
	 [[{{node transformer/115968}}]]

Testing the train_step signature

weights = model_load.get_weights()
model_load.signatures['train_step'](context_tensor=spa_tensor[:64],target_tensor=en_tensor[:64])
{'loss': <tf.Tensor: shape=(), dtype=float32, numpy=1.6610311>,
 'masked_accuracy': <tf.Tensor: shape=(), dtype=float32, numpy=0.6772633>}
weights_train = model_load.get_weights()
np.array_equal(weights_train[0],model_load.get_weights()[0])
True
# The model parameters change after train_step
for i in range(len(weights)):
    print(np.array_equal(weights[i],weights_train[i]))
False ... (all False)
model_load.signatures['train_step'](context_tensor=spa_tensor[:64],target_tensor=en_tensor[:64])
{'loss': <tf.Tensor: shape=(), dtype=float32, numpy=1.6604986>,
 'masked_accuracy': <tf.Tensor: shape=(), dtype=float32, numpy=0.6773393>}
weights_train2 = model_load.get_weights()
for i in range(len(weights)):
    print(np.array_equal(weights_train[i],weights_train2[i]))
False ... (all False)
for i in range(len(weights)):
    print(np.array_equal(model_load.get_weights()[i],weights_train2[i]))
True ... (all True)

Continuing training (with raw text data)

raw_tensor_dataset = tf.data.Dataset.from_tensor_slices((spa_tensor,en_tensor)).batch(32)
for index, (context_b, target_b) in raw_tensor_dataset.take(10).enumerate():
    metrics_dict=model_load.signatures['train_step'](context_tensor=context_b,target_tensor=target_b)
    print('\rStep: ',index,',  metrics: ',metrics_dict,end='',flush=True)
2023-07-14 20:45:27.488129: I tensorflow/core/grappler/optimizers/data/replicate_on_split.cc:32] Running replicate on split optimization
2023-07-14 20:45:27.502371: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_5' with dtype string and shape [118964]
	 [[{{node Placeholder/_5}}]]


Step:  tf.Tensor(9, shape=(), dtype=int64) ,  metrics:  {'loss': <tf.Tensor: shape=(), dtype=float32, numpy=1.6589187>, 'masked_accuracy': <tf.Tensor: shape=(), dtype=float32, numpy=0.6777657>}
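
If the weights updated through train_step should be kept, the model can be re-exported. This is only a sketch under the assumption that re-attaching the loaded signatures is sufficient for serving; the project's actual export logic lives in models/model.py.

# Sketch (assumption, untested here): re-export the continued-training model
# together with the signatures that were loaded with it. The output path is
# hypothetical.
tf.saved_model.save(
    model_load,
    './tfx_pipeline_output/continued_model',
    signatures=dict(model_load.signatures))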