Common TFT (tensorflow_transform) Operations


Author: 起名大废废

Category: tfx. Tags: tensorflow, python, artificial intelligence

Article link: https://blog.csdn.net/T_eddy/article/details/131411359

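TFT (tensorflow_transform) is a component of TFX. This article walks through quick, experimental preprocessing with the TensorFlow Transform library before it is integrated into a pipeline component: creating the metadata, defining a preprocessing function, running the transform, and saving the preprocessing operations. It also shows how to reuse the saved preprocessing steps later and how to combine the preprocessed data with a Keras model.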


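# Imports used by the snippets below
import pathlib
import tempfile

import tensorflow as tf
import tensorflow_transform as tft
import tensorflow_transform.beam as tft_beam
from tensorflow_transform.tf_metadata import dataset_metadata
from tensorflow_transform.tf_metadata import schema_utils

2023-06-26 23:15:53.877588: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA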
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-26 23:15:55.005099: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/TensorRT/lib:/usr/local/cuda-11.7/lib64
2023-06-26 23:15:55.005222: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/TensorRT/lib:/usr/local/cuda-11.7/lib64
2023-06-26 23:15:55.005232: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

Preparing the data and metadata

raw_data = [
      {'x': 1.0, 'y': 1.0, 's': 'hello'},
      {'x': 2.0, 'y': 2.0, 's': 'world'},
      {'x': 3.0, 'y': 3.0, 's': 'hello'}
  ]

# Method 1: infer the schema with TFDV
import tensorflow_data_validation as tfdv
import pandas as pd

data = pd.DataFrame(raw_data)
stat = tfdv.generate_statistics_from_dataframe(data)

def clearDim(schema, stat):
    # Clear the `dim` entries of every feature's inferred shape.
    for feature in schema.feature:
        feature.shape.ClearField('dim')
    return schema

# The inferred schema can only be used with
# tft_beam.AnalyzeAndTransformDataset(preprocessing_fn) after the dim field
# of each shape has been cleared; otherwise an error is raised.
schema = tfdv.infer_schema(
    stat, max_string_domain_size=0, schema_transformations=[clearDim])
raw_data_metadata = dataset_metadata.DatasetMetadata(schema)

# Method 2: create the metadata manually
# raw_data_metadata = dataset_metadata.DatasetMetadata(
#     schema_utils.schema_from_feature_spec({
#         'y': tf.io.FixedLenFeature([], tf.float32),
#         'x': tf.io.FixedLenFeature([], tf.float32),
#         's': tf.io.FixedLenFeature([], tf.string),
#     }))

Preparing the preprocessing function

def preprocessing_fn(inputs):
    x = inputs['x']
    y = inputs['y']
    s = inputs['s']
    x_centered = x - tft.mean(x)
    y_normalized = tft.scale_to_0_1(y)
    s_integerized = tft.compute_and_apply_vocabulary(s)
    x_centered_times_y_normalized = (x_centered * y_normalized)
    return {
        'x_centered':x_centered,
        'y_normalized':y_normalized,
        's_integerized':s_integerized,
        'x_centered_times_y_normalized':x_centered_times_y_normalized
    }
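To make the analyzers above concrete, here is a minimal pure-Python sketch of the arithmetic that preprocessing_fn describes, applied to the three example rows. It does not call TFT. The helper names (mean_center, scale_to_0_1, integerize) are hypothetical, and the vocabulary helper assumes ids are assigned by descending frequency; it makes no attempt to reproduce TFT's tie-breaking rules or its handling of constant columns.

```python
from collections import Counter

raw_data = [
    {'x': 1.0, 'y': 1.0, 's': 'hello'},
    {'x': 2.0, 'y': 2.0, 's': 'world'},
    {'x': 3.0, 'y': 3.0, 's': 'hello'},
]

def mean_center(values):
    # Full-pass statistic: the mean over the whole dataset.
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def scale_to_0_1(values):
    # Full-pass statistics: the dataset-wide min and max.
    # Assumes max > min; a constant column would divide by zero here.
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def integerize(values):
    # Assumption: the most frequent term gets id 0, the next one id 1, etc.
    ranked = Counter(values).most_common()
    vocab = {term: i for i, (term, _) in enumerate(ranked)}
    return [vocab[v] for v in values]

x_centered = mean_center([row['x'] for row in raw_data])
y_normalized = scale_to_0_1([row['y'] for row in raw_data])
s_integerized = integerize([row['s'] for row in raw_data])
product = [a * b for a, b in zip(x_centered, y_normalized)]

print(x_centered, y_normalized, s_integerized, product)
```

The point of the sketch is that each helper needs a pass over the full dataset before any single row can be transformed, which is why TFT separates the analyze phase from the transform phase.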

Running the transform and writing the preprocessing operations to disk

def main(output_dir):
  # Ignore the warnings
  with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
    transformed_dataset, transform_fn = (  # pylint: disable=unused-variable
        (raw_data, raw_data_metadata) | tft_beam.AnalyzeAndTransformDataset(
            preprocessing_fn))

  transformed_data, transformed_metadata = transformed_dataset  # pylint: disable=unused-variable

  # Save the transform_fn to the output_dir
  _ = (
      transform_fn
      | 'WriteTransformFn' >> tft_beam.WriteTransformFn(output_dir))

  return transformed_data, transformed_metadata

# The output directory must not already contain files with the same names; use an empty directory.
output_dir = pathlib.Path('./transform_output')
transformed_data, transformed_metadata = main(str(output_dir))

WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.
WARNING:tensorflow:You are passing instance dicts and DatasetMetadata to TFT which will not provide optimal performance. Consider following the TFT guide to upgrade to the TFXIO format (Apache Arrow RecordBatch).
WARNING:tensorflow:From /home/xzy/anaconda3/envs/tf/lib/python3.8/site-packages/tensorflow_transform/tf_utils.py:324: Tensor.experimental_ref (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use ref() instead.
2023-06-26 23:15:57.191927: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-06-26 23:15:57.288381: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/TensorRT/lib:/usr/local/cuda-11.7/lib64
2023-06-26 23:15:57.288444: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2023-06-26 23:15:57.289481: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:From /home/xzy/anaconda3/envs/tf/lib/python3.8/site-packages/tensorflow_transform/tf_utils.py:324: Tensor.experimental_ref (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use ref() instead.
WARNING:tensorflow:You are passing instance dicts and DatasetMetadata to TFT which will not provide optimal performance. Consider following the TFT guide to upgrade to the TFXIO format (Apache Arrow RecordBatch).
WARNING:apache_beam.options.pipeline_options:Discarding unparseable args: ['/home/xzy/anaconda3/envs/tf/lib/python3.8/site-packages/ipykernel_launcher.py', '-f', '/home/xzy/.local/share/jupyter/runtime/kernel-f0c3eac2-332f-48d3-be51-f348ad18cc09.json']
INFO:tensorflow:Assets written to: /tmp/tmpt45kl1nk/tftransform_tmp/2480ce1b114f48579e5a9a3f65cc7f25/assets
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:tensorflow_text is not available.
INFO:tensorflow:Assets written to: /tmp/tmpt45kl1nk/tftransform_tmp/ec61c498ec57446d94add565b8b2c0ce/assets
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:tensorflow_text is not available.
WARNING:apache_beam.options.pipeline_options:Discarding unparseable args: ['/home/xzy/anaconda3/envs/tf/lib/python3.8/site-packages/ipykernel_launcher.py', '-f', '/home/xzy/.local/share/jupyter/runtime/kernel-f0c3eac2-332f-48d3-be51-f348ad18cc09.json']
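The comment before output_dir above says the output directory must not already hold the same files, which makes re-running the cell against a populated directory a likely stumbling block. One possible guard, sketched here with only the standard library, is to delete the directory before calling main. The helper name reset_output_dir is hypothetical, and deleting the directory is an assumption about what you want; skip it if the previous output has to be kept.

```python
import pathlib
import shutil
import tempfile

def reset_output_dir(path):
    """Remove `path` if it exists so a later write starts from scratch."""
    path = pathlib.Path(path)
    if path.exists():
        shutil.rmtree(path)
    return path

# Demonstration in a throwaway temp directory (not the real output_dir).
demo_root = pathlib.Path(tempfile.mkdtemp())
demo_dir = demo_root / 'transform_output'
(demo_dir / 'transform_fn').mkdir(parents=True)  # simulate a previous run
reset_output_dir(demo_dir)
print(demo_dir.exists())
```

In the notebook this could be used as output_dir = reset_output_dir('./transform_output') right before main(str(output_dir)).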

Note

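# During (manual) experimentation, both the input and the output data are List[Dict].
# After the saved transform is reloaded, however, the input and output are Dict[List].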
raw_data

[{'x': 1.0, 'y': 1.0, 's': 'hello'},
 {'x': 2.0, 'y': 2.0, 's': 'world'},
 {'x': 3.0, 'y': 3.0, 's': 'hello'}]

transformed_data

[{'s_integerized': 0,
  'x_centered': -1.0,
  'x_centered_times_y_normalized': -0.0,
  'y_normalized': 0.0},
 {'s_integerized': 1,
  'x_centered': 0.0,
  'x_centered_times_y_normalized': 0.0,
  'y_normalized': 0.5},
 {'s_integerized': 0,
  'x_centered': 1.0,
  'x_centered_times_y_normalized': 1.0,
  'y_normalized': 1.0}]
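The two layouts mentioned in the note hold the same values, only grouped differently: List[Dict] is one dict per example, while Dict[List] is one list per feature. A minimal pure-Python sketch of converting between them follows. The function names rows_to_columns and columns_to_rows are hypothetical, and the sketch assumes every row has the same keys.

```python
def rows_to_columns(rows):
    """List[Dict] -> Dict[List]; assumes every row has the same keys."""
    return {key: [row[key] for row in rows] for key in rows[0]}

def columns_to_rows(columns):
    """Dict[List] -> List[Dict]; assumes all columns have equal length."""
    keys = list(columns)
    value_tuples = zip(*(columns[key] for key in keys))
    return [dict(zip(keys, values)) for values in value_tuples]

raw_data = [
    {'x': 1.0, 'y': 1.0, 's': 'hello'},
    {'x': 2.0, 'y': 2.0, 's': 'world'},
    {'x': 3.0, 'y': 3.0, 's': 'hello'},
]

columns = rows_to_columns(raw_data)
print(columns)
print(columns_to_rows(columns) == raw_data)
```

The column-oriented form is the one that can be wrapped feature by feature in tensors when feeding a reloaded transform.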

!ls {output_dir}

transform_fn  transformed_metadata

!ls {output_dir}/transformed_metadata

asset_map  schema.pbtxt

Reusing the preprocessing operations

# Loading method 1: load the SavedModel directly (how to call it was not worked out here)
loaded = tf.saved_model.load(str(output_dir/'transform_fn'))
loaded.signatures['serving_default']

<ConcreteFunction signature_wrapper(*, inputs_1, inputs, inputs_2) at 0x7F0525A17BB0>

# Loading method 2: used as shown below
tf_transform_output = tft.TFTransformOutput(output_dir)
tft_layer = tf_transform_output.transform_features_layer()
tft_layer

INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:tensorflow_text is not available.

<tensorflow_transform.output_wrapper.TransformFeaturesLayer at 0x7f0525a251c0>

raw_data_batch = {
    's': tf.constant([ex['s'] for ex in raw_data]),
    'x': tf.constant([ex['x'] for ex in raw_data], dtype=tf.float32),
    'y': tf.constant([ex['y'] for ex in raw_data], dtype=tf.float32),
}

transformed_batch = tft_layer(raw_data_batch)
transformed_batch

{'x_centered': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([-1.,  0.,  1.], dtype=float32)>,
 's_integerized': <tf.Tensor: shape=(3,), dtype=int64, numpy=array([0, 1, 0])>,
 'y_normalized': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([0. , 0.5, 1. ], dtype=float32)>,
 'x_centered_times_y_normalized': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([-0.,  0.,  1.], dtype=float32)>}

Connecting to a model

class StackDict(tf.keras.layers.Layer):
    def call(self, inputs):
        # Cast every feature to float32, visiting features in sorted key
        # order so that the column order is deterministic.
        values = [
            tf.cast(v, tf.float32)
            for k, v in sorted(inputs.items(), key=lambda kv: kv[0])
        ]
        # Stack the 1-D feature tensors side by side, so each feature becomes one column.
        return tf.stack(values, axis=1)

class TrainedModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.pre = tft_layer
        self.concat = StackDict()
        self.body = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation='relu'),
            tf.keras.layers.Dense(64, activation='relu'),
            tf.keras.layers.Dense(10)
        ])

    def call(self, inputs, training=None):
        x = self.pre(inputs)
        x = self.concat(x)
        return self.body(x, training=training)

trained_model = TrainedModel()
trained_model_output = trained_model(raw_data_batch)
trained_model_output.shape

TensorShape([3, 10])
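The first Dense layer in the model above receives a [3, 4] tensor: three examples, four transformed features. As a rough illustration of what StackDict produces, the following pure-Python sketch applies the same two rules (sort the features by name, then place each feature in its own column) to the transformed_batch values shown earlier. It uses plain lists instead of tensors, so it illustrates the layout only and is not the layer itself.

```python
# Values copied from the transformed_batch output shown earlier.
transformed_batch = {
    'x_centered': [-1.0, 0.0, 1.0],
    's_integerized': [0, 1, 0],
    'y_normalized': [0.0, 0.5, 1.0],
    'x_centered_times_y_normalized': [-0.0, 0.0, 1.0],
}

# Same ordering rule as StackDict: sort features by name.
column_order = sorted(transformed_batch)

# Same casting rule as StackDict: everything becomes a float.
columns = [[float(v) for v in transformed_batch[name]] for name in column_order]

# Equivalent of stacking 1-D columns along axis=1: one row per example.
rows = [list(row) for row in zip(*columns)]

print(column_order)
print(rows)
```

Sorting by key matters because the dense layers only see positions, not names; a stable column order keeps the learned weights aligned with the same features between training and serving.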

#trained_model.compile(...)
#trained_model.fit(...)
trained_model.save(tempfile.mkdtemp(),save_format='tf')

INFO:tensorflow:Assets written to: /tmp/tmp5919sllk/assets
