cp14_2_Layers_config_numeric_continuou_Feature Column_boosted tree_n_batches_per_layer_repeat_estima

Latest recommended article published on 2024-04-29 10:58:08

LIQING LIN

Category: pythonMachineLearningInAction

Article link: https://blog.csdn.net/Linli522362242/article/details/113783554

cp14_TF_v1_v2_TensorSpec_rank_trainable_variables_autodiff_keras_Model_API_set_label_coords_custom

https://blog.csdn.net/Linli522362242/article/details/113710166

Writing custom Keras layers

In cases where we want to define a new layer that is not already supported by Keras, we can define a new class derived from the tf.keras.layers.Layer class. This is especially useful when designing a new layer or customizing an existing layer.

To illustrate the concept of implementing custom layers, let's consider a simple example. Imagine we want to define a new linear layer that computes w(x + 𝜖) + b, where 𝜖 refers to a random variable as a noise variable. To implement this computation, we define a new class as a subclass of tf.keras.layers.Layer. For this new class, we have to define both the constructor __init__() method and the call() method. In the constructor, we define the variables and other required tensors for our customized layer. We have the option to create variables and initialize them in the constructor if the input_shape is given to the constructor. Alternatively, we can delay the variable initialization (for instance, if we do not know the exact input shape upfront) and delegate it to the build() method for late variable creation. In addition, we can define get_config() for serialization, which means that a model using our custom layer can be saved and reloaded using TensorFlow's model saving and loading capabilities.

To look at a concrete example, we are going to define a new layer called NoisyLinear, which implements the computation w(x + 𝜖) + b mentioned in the preceding paragraph (followed, in this implementation, by a ReLU activation):

Defining a custom layer:

Define __init__()
Define build() for late-variable creation
Define call()
Define get_config() for serialization

class NoisyLinear( tf.keras.layers.Layer ):
    def __init__( self, output_dim, noise_stddev=0.1, **kwargs ):
        self.output_dim = output_dim
        self.noise_stddev = noise_stddev
        super( NoisyLinear, self ).__init__(**kwargs)
    # if we do not know the exact input shape upfront,
    # delegate it to the build() method for late variable creation.    
    def build( self, input_shape ): # input_shape: (none, features)
        # https://www.tensorflow.org/api_docs/python/tf/keras/initializers
        # class random_normal: Initializer that generates tensors with a normal distribution.
        self.w = self.add_weight( name='weights',
                                  shape=( input_shape[1], self.output_dim ),
                                  initializer='random_normal',
                                  trainable=True 
                                )
        # class zeros: Initializer that generates tensors initialized to 0.
        self.b = self.add_weight( shape=( self.output_dim, ),
                                  initializer='zeros',
                                  trainable=True 
                                )
    def call( self, inputs, training=False):
        # call() performs the logic of applying the layer to the input tensors 
        # (which should be passed in as argument). 
        if training:
            batch = tf.shape( inputs )[0]
            dim = tf.shape( inputs )[1]
            noise = tf.random.normal( shape=(batch, dim),
                                      mean = 0.0,
                                      stddev = self.noise_stddev
                                      ) 
            noisy_inputs = tf.add(inputs, noise)
        else:
            noisy_inputs = inputs
        z = tf.matmul( noisy_inputs, self.w ) + self.b
        return tf.keras.activations.relu(z)
    
    def get_config(self):
        # Returns the config of the layer.
        # A layer config is a Python dictionary (serializable) containing the configuration 
        # of a layer. The same layer can be reinstantiated later (without its trained weights) 
        # from this configuration.
        # If the keys differ from the arguments in __init__, 
        # then override from_config(self) as well. This method is used when saving the layer 
        # or a model that contains this layer.
        config = super( NoisyLinear, self ).get_config()
        config.update({'output_dim': self.output_dim,
                       'noise_stddev': self.noise_stddev
                      })
        return config

In the following code, we will define a new instance of this layer, initialize it by calling .build(), and execute it on an input tensor. Then, we will serialize it via .get_config() and restore the serialized object via .from_config():

# testing
tf.random.set_seed(1)

noisy_layer = NoisyLinear(4) # output_dim : 4
noisy_layer.build( input_shape=(None,4) )

x = tf.zeros( shape=(1,4) )
tf.print( noisy_layer(x, training=True) ) # <==after call() return tf.keras.activations.relu(z)

# re-building from config:
config = noisy_layer.get_config()
# config:
# {'name': 'noisy_linear', # default
#  'trainable': True,      # default
#  'dtype': 'float32',     # default
#  'output_dim': 4,
#  'noise_stddev': 0.1}
new_layer = NoisyLinear.from_config( config ) 
tf.print( new_layer(x, training=True) )  # <==after call() return tf.keras.activations.relu(z)

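In the previous code snippet, we called two instances of the layer (the original and the one restored from its config) on the same input tensor. Note that the outputs differ: the NoisyLinear layer adds random noise to the input tensor, and from_config() rebuilds the layer from its configuration only, so the restored layer starts with freshly initialized weights rather than a copy of the original ones.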
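
To make the role of the training flag concrete, here is a minimal plain-Python sketch of the same computation, relu(w(x + 𝜖) + b), without TensorFlow. The function name noisy_linear, the weights, and the input values are made up for illustration; they are not part of the NoisyLinear class above.

```python
import random

def noisy_linear(x, w, b, noise_stddev=0.1, training=False, rng=None):
    """Compute relu((x + eps) @ w + b) for a single input vector x."""
    rng = rng or random.Random()
    if training:
        # Gaussian noise is added to the inputs only in training mode
        x = [xi + rng.gauss(0.0, noise_stddev) for xi in x]
    out = []
    for j in range(len(b)):
        z = sum(x[i] * w[i][j] for i in range(len(x))) + b[j]
        out.append(max(0.0, z))  # ReLU
    return out

w = [[0.5, -0.25], [1.0, 0.75]]  # hypothetical weights, shape (input_dim=2, output_dim=2)
b = [0.1, 0.2]                   # hypothetical biases
x = [1.0, 2.0]

clean = noisy_linear(x, w, b, training=False)
noisy = noisy_linear(x, w, b, training=True, rng=random.Random(1))
print(clean)
print(noisy)
```

With training=False the result is deterministic; with training=True every call perturbs the input, which is why repeated calls on the same tensor give different outputs.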

Now, let's create a new model similar to the previous one for solving the XOR classification task. As before, we will use Keras' Sequential class, but this time, we will use our NoisyLinear layer as the first hidden layer of the multilayer perceptron. The code is as follows:

tf.random.set_seed(1)

model = tf.keras.Sequential([
    NoisyLinear(4, noise_stddev=0.1),
    tf.keras.layers.Dense(units=4, activation='relu'),
    tf.keras.layers.Dense(units=4, activation='relu'),
    tf.keras.layers.Dense(units=1, activation='sigmoid')
])
model.build( input_shape=(None,2) )
model.summary()

## compile
model.compile(optimizer=tf.keras.optimizers.SGD(),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=[tf.keras.metrics.BinaryAccuracy()]
             )
## train:
hist = model.fit(x_train, y_train,
                 validation_data=(x_valid, y_valid),
                 epochs=200, batch_size=2,
                 verbose=0)
## Plotting
history = hist.history
# history.keys() ==> dict_keys(['loss', 'binary_accuracy', 'val_loss', 'val_binary_accuracy'])

fig = plt.figure( figsize=(16,4) )

ax = fig.add_subplot(1,3,1)
plt.plot(history['loss'], lw=4)
plt.plot(history['val_loss'], lw=4)
plt.legend(['Train loss', 'Validation loss'], fontsize=15)
ax.set_xlabel('Epochs', size=15)

ax = fig.add_subplot(1,3,2)
plt.plot(history['binary_accuracy'], lw=4)
plt.plot(history['val_binary_accuracy'], lw=4)
plt.legend(['Train Acc.', 'Validation Acc.'], fontsize=15)
ax.set_xlabel('Epochs', size=15)

ax = fig.add_subplot(1,3,3)
plot_decision_regions( X=x_valid, y=y_valid.astype(np.int32),
                       clf=model )
ax.set_xlabel(r'$x_1$', size=15)
ax.xaxis.set_label_coords(1,-0.025)
ax.set_ylabel(r'$x_2$', size=15)
ax.yaxis.set_label_coords(-0.025,1)

plt.show()

The resulting figure will be as follows:

Here, our goal was to learn how to define a new custom layer subclassed from tf.keras.layers.Layer and to use it as we would use any other standard Keras layer. Although NoisyLinear did not help to improve the performance in this particular example, please keep in mind that our main objective was to learn how to write a customized layer from scratch. In general, writing a new customized layer can be useful in other applications, for example, if you develop a new algorithm that depends on a new layer beyond the existing ones.

TensorFlow Estimators

So far, in this chapter, we have mostly focused on the low-level TensorFlow API. We used decorators to modify functions to compile the computational graphs explicitly for computational efficiency. Then, we worked with the Keras API and implemented feedforward NNs, to which we added customized layers. In this section, we will switch gears and work with TensorFlow Estimators. The tf.estimator API encapsulates the underlying steps in machine learning tasks, such as training, prediction (inference), and evaluation. Estimators are more encapsulated but also more scalable when compared to the previous approaches that we have covered in this chapter. Also, the tf.estimator API adds support for running models on multiple platforms without requiring major code changes, which makes them more suitable for the so-called "production phase" in industry applications. In addition, TensorFlow comes with a selection of off-the-shelf estimators for common machine learning and deep learning architectures that are useful for comparison studies, for example, to quickly assess whether a certain approach is applicable to a particular dataset or problem.

In the remaining sections of this chapter, you will learn how to use such pre-made Estimators and how to create an Estimator from an existing Keras model. One of the essential elements of Estimators is defining the feature columns as a mechanism for importing data into an Estimator-based model, which we will cover in the next section.

Working with feature columns

In machine learning and deep learning applications, we can encounter various different types of features: continuous (e.g., house price), unordered categorical (nominal, e.g., t-shirt color), and ordered categorical (ordinal, e.g., t-shirt size, because we can define an order XL > L > M). You will recall that in cp4 Training Sets Preprocessing_StringIO_dropna_categorical_feature_Encode_Scale_L1_L2_bbox_to_anchor https://blog.csdn.net/Linli522362242/article/details/108230328, we covered different types of features and learned how to handle each type. Note that while numeric data can be either continuous or discrete, in the context of the TensorFlow API, "numeric" data specifically refers to continuous data of the floating point type.

Sometimes, feature sets are comprised of a mixture of different feature types. While TensorFlow Estimators were designed to handle all these different types of features, we must specify how each feature should be interpreted by the Estimator. For example, consider a scenario with a set of seven different features, as shown in the following figure:

The features shown in the figure (model year, cylinders, displacement, horsepower, weight, acceleration, and origin) were obtained from the Auto MPG dataset, which is a common machine learning benchmark dataset for predicting the fuel efficiency of a car in miles per gallon (MPG). The full dataset and its description are available from UCI's machine learning repository at https://archive.ics.uci.edu/ml/datasets/auto+mpg.

Attribute Information:

1. mpg (miles traveled per gallon of fuel): continuous
2. cylinders (number of cylinders): multi-valued discrete
3. displacement (engine displacement): continuous
4. horsepower: continuous
5. weight: continuous
6. acceleration: continuous
7. model year: multi-valued discrete
8. origin (place of manufacture): multi-valued discrete
9. car name: string (unique for each instance)

We are going to treat five features from the Auto MPG dataset (number of cylinders, displacement, horsepower, weight, and acceleration) as "numeric" (here, continuous) features. The model year can be regarded as an ordered categorical (ordinal) feature. Lastly, the manufacturing origin can be regarded as an unordered categorical (nominal) feature with three possible discrete values, 1, 2, and 3, which correspond to the US, Europe, and Japan, respectively.

Let's first load the data and apply the necessary preprocessing steps, such as partitioning the dataset into training and test datasets, as well as standardizing the continuous features:
#######################################################
https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/

tf.keras.utils.get_file(
    fname, origin, untar=False, md5_hash=None, file_hash=None,
    cache_subdir='datasets', hash_algorithm='auto',
    extract=False, archive_format='auto', cache_dir=None
)

By default the file at the url origin is downloaded to the cache_dir ~/.keras, placed in the cache_subdir datasets, and given the filename fname. The final location of a file example.txt would therefore be ~/.keras/datasets/example.txt.

Files in tar, tar.gz, tar.bz, and zip formats can also be extracted. Passing a hash will verify the file after download. The command line programs shasum and sha256sum can compute the hash.
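
As a small illustration of the default location described above, the following sketch builds the expected path by hand. The helper name default_keras_cache_path is hypothetical and is not part of Keras; it only mirrors the ~/.keras/datasets/fname rule stated in the text and ignores any custom cache_dir.

```python
import os

def default_keras_cache_path(fname, cache_subdir="datasets"):
    """Path where get_file would place fname under the default ~/.keras cache (sketch)."""
    return os.path.join(os.path.expanduser("~"), ".keras", cache_subdir, fname)

path = default_keras_cache_path("auto-mpg.data")
print(path)
```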

#######################################################

import numpy as np
import tensorflow as tf
import pandas as pd

from IPython.display import Image

dataset_path = tf.keras.utils.get_file("auto-mpg.data",
                                        ('https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/'
                                        'auto-mpg.data'
                                        ) )

#######################################################
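### Opening the file in a text editor such as Notepad++

In the raw file, the numeric fields are separated by runs of spaces, while the car name is separated from them by a single tab character ('\t'); in Notepad++, pressing the right-arrow key once moves the cursor across the whole tab.

comment : str, default None (here it is '\t')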

Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the parameter header but not by skiprows. For example, if comment='#', parsing #empty\na,b,c\n1,2,3 with header=0 will result in ‘a,b,c’ being treated as the header.

On the author's Windows machine, the downloaded file is cached in C:\Users\LlQ\.keras\datasets

#######################################################

column_names = ['MPG', 'Cylinders', 'Displacement', 'Housepower',
                'Weight', 'Acceleration', 'ModelYear', 'Origin']
                # 'Housepower' (sic) holds the horsepower values; the same key is reused below
                # 9. car name: string (unique for each instance) is ignored on purpose

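df = pd.read_csv(dataset_path, names=column_names,
                 na_values='?',         # missing values are marked with '?'
                 comment='\t',          # ignore everything after the tab, i.e. the car name, e.g. "chevrolet vega 2300"
                 sep=" ",               # fields are separated by one or more spaces
                 skipinitialspace=True, # skip the extra spaces that follow a delimiter
                )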
df.tail()
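
To see what the read_csv options above do to a single raw line, here is a rough plain-Python equivalent. The function parse_auto_mpg_line is a hypothetical helper, and the sample line is only written in the style of the data file; this is a sketch of the parsing rules, not how pandas is implemented.

```python
def parse_auto_mpg_line(line):
    """Parse one raw line roughly the way the read_csv call above is configured."""
    data = line.split("\t", 1)[0]   # comment='\t': drop the tab and the car name after it
    tokens = data.split()           # sep=' ' with skipinitialspace=True: runs of spaces
    return [None if t == "?" else float(t) for t in tokens]  # na_values='?'

# Illustrative line: eight numeric fields, a tab, then the quoted car name
sample = '25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"'
row = parse_auto_mpg_line(sample)
print(row)
```

The missing horsepower value becomes None here, which plays the role of NaN in the DataFrame and is what dropna() removes in the next step.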

print( df.isna().sum() )

df = df.dropna()                 # Remove missing values. e.g. row_index = 32
df[30:35]

###################################################

df.reset_index().tail() # demonstration only, not assigned back; default drop=False

With drop=False, the old index is inserted into the DataFrame as a new column named 'index'.

###################################################

df = df.reset_index(drop=True)
df.tail()

With drop=True, the old index is discarded and no new column is inserted.

## train/test splits：
import sklearn
import sklearn.model_selection

df_train, df_test = sklearn.model_selection.train_test_split( df, train_size=0.8 )
train_stats = df_train.describe().transpose()
train_stats

==> we need to normalize our data

Numeric Column(TensorFlow Estimators can work with)

# 2. cylinders: multi-valued discrete
# 3. displacement: continuous
# 4. horsepower: continuous
# 5. weight: continuous
# 6. acceleration: continuous

numeric_column_names = ['Cylinders', 'Displacement', 'Housepower', 'Weight', 'Acceleration']

df_train_norm, df_test_norm = df_train.copy(), df_test.copy()
for col_name in numeric_column_names:
    mean = train_stats.loc[col_name, 'mean']
    std = train_stats.loc[col_name, 'std']
    df_train_norm.loc[:, col_name] = (df_train_norm.loc[:, col_name]-mean)/std
    df_test_norm.loc[:, col_name] = (df_test_norm.loc[:, col_name]-mean)/std

df_train_norm.tail()
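
The key point of the loop above is that the mean and standard deviation come from the training split only and are then reused for the test split. A minimal sketch with the standard library (the numbers are made up, and fit_standardizer and standardize are hypothetical helper names):

```python
import statistics

def fit_standardizer(train_values):
    """Return (mean, std) computed from the training values only."""
    mean = statistics.mean(train_values)
    std = statistics.stdev(train_values)  # sample std (n - 1), as reported by pandas describe()
    return mean, std

def standardize(values, mean, std):
    return [(v - mean) / std for v in values]

train_weight = [3504.0, 3693.0, 3436.0, 3433.0, 3449.0]  # made-up training values
test_weight = [4341.0, 2130.0]                            # made-up test values

mean, std = fit_standardizer(train_weight)
train_norm = standardize(train_weight, mean, std)
test_norm = standardize(test_weight, mean, std)  # reuses the training statistics
print(train_norm)
print(test_norm)
```

After this step the training column has zero mean and unit standard deviation, while the test column generally does not, because it was scaled with statistics it did not contribute to.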

The pandas DataFrame that we created via the previous code snippet contains five columns with values of the type float. These columns will constitute the continuous features. In the following code, we will use TensorFlow's feature_column function to transform these continuous features into the feature column data structure that TensorFlow Estimators can work with:

numeric_features = []

for col_name in numeric_column_names:
    numeric_features.append(
        tf.feature_column.numeric_column(key=col_name) # normalizer_fn is not needed: the data was already standardized in the previous for loop
    )

numeric_features

####################################################### {'key':[...value...], 'key':[...value...],..., 'key':[...value...]}
13_Loading and Preprocessing Data from multiple CSV with TensorFlow 2_Feature Columns_TF eXtended
https://blog.csdn.net/Linli522362242/article/details/107933572
Feature columns bridge raw data with the data your model needs.

#########################################

dict_train_normal=df_train_norm.to_dict( orient='list' )#{'key':[...value...], 'key':[...],..., 'key':[...]}
# dict_train_normal.keys() ==> dict_keys(['MPG', 'Cylinders', 'Displacement', 'Housepower', 'Weight', 'Acceleration', 'ModelYear', 'Origin'])

from tensorflow.keras import layers

def demo_feature(feature_column):
    # hint: selection based on feature_column(s) or NumericColumn(s)
    feature_layer = layers.DenseFeatures( feature_column ) #A layer that produces a dense Tensor based on given feature_columns.
    # feature_layer(dict_train_normal) produces a tensor
    print( feature_layer(dict_train_normal).numpy()[:5])
    
# numeric_features[0]
# NumericColumn(key='Cylinders', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)    
demo_feature(numeric_features[0]) # 'Cylinders'

demo_feature(numeric_features)

The columns of the printed tensor follow the alphabetical order of the feature keys: Acceleration, Cylinders, Displacement, Housepower, Weight

df_train_norm.head()

Bucketized column

Often, you don’t want to feed a number directly into the model, but instead split its value into different categories based on numerical ranges. Consider raw data that represents a person’s age. Instead of representing age as a numeric column, we could split the age into several buckets using a bucketized column. Notice the one-hot values below describe which age range each row matches. Buckets include the left boundary, and exclude the right boundary. For example, consider raw data that represents the year a house was built. Instead of representing that year as a scalar numeric column, we could split the year into the following four buckets:

The model will represent the buckets as follows:

Why would you want to split a number (a perfectly valid input to your model) into a categorical value? Well, notice that the categorization splits a single input number into a four-element vector. Therefore, the model now can learn four individual weights rather than just one; four weights create a richer model than one weight. More importantly, bucketizing enables the model to clearly distinguish between different year categories since only one of the elements is set (1) and the other three elements are cleared (0). For example, when we just use a single number (a year) as input, a linear model can only learn a linear relationship. So, bucketing provides the model with additional flexibility that the model can use to learn more complex relationships.

######################################################
Next, let's group the rather fine-grained model year information into buckets to simplify the learning task for the model that we are going to train later. Concretely, we are going to assign each car into one of four "year" buckets, as follows: bucket 0 for year < 73, bucket 1 for 73 ≤ year < 76, bucket 2 for 76 ≤ year < 79, and bucket 3 for year ≥ 79.

Note that the chosen intervals were selected arbitrarily to illustrate the concept of "bucketing." In order to group the cars into these buckets, we will first define a numeric feature based on each original model year. Then, this numeric feature will be passed to the bucketized_column function, for which we will specify three interval cut-off values: [73, 76, 79]. Each cut-off value belongs to the bucket on its right, that is, it is the inclusive lower bound of the next bucket. The cut-off values therefore specify the half-closed intervals (−∞, 73), [73, 76), [76, 79), and [79, ∞). The code is as follows:

# df.columns
# Index(['MPG', 'Cylinders', 'Displacement', 'Housepower', 'Weight',
#        'Acceleration', 'ModelYear', 'Origin'],
#        dtype='object')
feature_year = tf.feature_column.numeric_column( key='ModelYear' )

bucketized_features = []
bucketized_features.append( 
    tf.feature_column.bucketized_column(
                                        source_column=feature_year,
                                        boundaries=[73, 76, 79]
    ) )

print(bucketized_features)

dict_train_normal=df_train_norm.to_dict( orient='list' )#{'key':[...value...], 'key':[...],..., 'key':[...]}
# dict_train_normal.keys() ==> dict_keys(['MPG', 'Cylinders', 'Displacement', 'Housepower', 'Weight', 'Acceleration', 'ModelYear', 'Origin'])

def demo_feature(feature_column):
    # A layer that produces a dense Tensor based on given feature_columns.
    feature_layer = layers.DenseFeatures( feature_column )
    print( feature_layer(dict_train_normal).numpy()[:5])

demo_feature(bucketized_features)

[[0. 0. 0. 1.]    <== year >= 79        (ModelYear 79)
 [0. 0. 0. 1.]    <== year >= 79        (ModelYear 81)
 [0. 0. 0. 1.]    <== year >= 79        (ModelYear 81)
 [0. 0. 1. 0.]    <== 76 <= year < 79   (ModelYear 76)
 [0. 1. 0. 0.]]   <== 73 <= year < 76   (ModelYear 73)

df_train_norm['ModelYear'][:5]

For consistency, we added this bucketized feature column to a Python list, even though the list consists of only one entry. In the following steps, we will merge this list with the lists made from other features, which will then be provided as input to the TensorFlow Estimator-based model.
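
The half-closed bucket rule for the boundaries [73, 76, 79] can be reproduced in a few lines of plain Python. This is only a sketch of the semantics, not TensorFlow's implementation; bucketize and one_hot are hypothetical helper names, and the extra year 72 is added to show the first bucket.

```python
from bisect import bisect_right

def bucketize(value, boundaries):
    """Index of the half-closed bucket [lower, upper) that contains value."""
    return bisect_right(boundaries, value)

def one_hot(index, depth):
    return [1.0 if i == index else 0.0 for i in range(depth)]

boundaries = [73, 76, 79]
years = [79, 81, 81, 76, 73, 72]
encoded = [one_hot(bucketize(y, boundaries), len(boundaries) + 1) for y in years]
for year, vec in zip(years, encoded):
    print(year, vec)
```

Because bisect_right places a value equal to a boundary to the right of that boundary, a model year of exactly 73 falls into [73, 76) and not into (−∞, 73).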

Categorical column

Next, we will proceed with defining a list for the unordered categorical feature, Origin. In TensorFlow, there are different ways of creating a categorical feature column.

Categorical vocabulary column
If the data contains the category names (for example, in string format like "US," "Europe," and "Japan"), then we can use tf.feature_column.categorical_column_with_vocabulary_list and provide a list of unique, possible category names as input. Categorical vocabulary columns provide a good way to represent strings as a one-hot vector. For example:

Categorical vocabulary column
If the list of possible categories is too large, for example, in a typical text analysis context, then we can use tf.feature_column.categorical_column_with_vocabulary_file instead. When using this function, we simply provide a file that contains all the categories/words so that we do not have to store a list of all possible words in memory.

# Given input "feature_name_from_input_fn" which is a string,
# create a categorical feature for our model by mapping the input to one of
# the elements in the vocabulary file
vocabulary_feature_column = (
    tf.feature_column.categorical_column_with_vocabulary_file(
        key="feature_name_from_input_fn",
        vocabulary_file="product_class.txt",
        vocabulary_size=3))

# product_class.txt should have one line per vocabulary element, in our case:
# kitchenware
# electronics
# sports

Categorical identity column
Moreover, if the features are already associated with an index of categories in the range [0, num_categories), then we can use the tf.feature_column.categorical_column_with_identity function. For example, let's say you want to represent the integer range [0, 4). (That is, you want to represent the integers 0, 1, 2, or 3.) In this case, the categorical identity mapping looks like this: https://developers.googleblog.com/2017/11/introducing-tensorflow-feature-columns.html

However, in this case, the feature Origin is given as integer values 1, 2, 3 (as opposed to 0, 1, 2), which does not match the requirement for categorical indexing, as it expects the indices to start from 0.

df_train_norm['Origin'][:5]

In the following code example, we will proceed with the vocabulary list:

feature_origin = tf.feature_column.categorical_column_with_vocabulary_list(
    key = 'Origin',
    vocabulary_list = [1,2,3]
)
# feature_origin
# VocabularyListCategoricalColumn(key='Origin', vocabulary_list=(1, 2, 3), 
#                                 dtype=tf.int32, default_value=-1, num_oov_buckets=0)

Certain Estimators, such as DNNClassifier and DNNRegressor, only accept so-called "dense columns." Therefore, the next step is to convert the existing categorical feature column to such a dense column. There are two ways to do this: using an embedding column via embedding_column or an indicator column via indicator_column. An indicator column converts the categorical indices to one-hot encoded vectors, for example, index 0 will be encoded as [1, 0, 0], index 1 will be encoded as [0, 1, 0], and so on. On the other hand, the embedding column maps each index to a vector of random numbers of type float, which can be trained.
############################################
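Point column as indicator_column (this example and the next one come from the earlier article at https://blog.csdn.net/Linli522362242/article/details/107933572, so feature_column, df['point'], and demo() are defined there, not here)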

point = feature_column.categorical_column_with_vocabulary_list(
             ###################
    'point', df['point'].unique() # array(['c', 'f', 'c+', 'b+', 'b', 'a', 'd+'], dtype=object)
)#called credit
point_one_hot = feature_column.indicator_column(point)
demo(point_one_hot)

Point column as embedding_column

# point = feature_column.categorical_column_with_vocabulary_list(
#              ###################
#     'point', df['point'].unique() # array(['c', 'f', 'c+', 'b+', 'b', 'a', 'd+'], dtype=object)
# )#called credit
# Notice the input to the embedding column is the categorical column
# we previously created
point_embedding = feature_column.embedding_column(point, dimension=4)
demo(point_embedding)

Key point: using an embedding column is best when a categorical column has many possible values. We are using one here for demonstration purposes, so you have a complete example you can modify for a different dataset in the future.

Let’s look at an example comparing indicator and embedding columns. Suppose our input examples consist of different words from a limited palette of only 81 words. Further suppose that the data set provides the following input words in 4 separate examples:

“dog”
“spoon”
“scissors”
“guitar”

In that case, the following figure illustrates the processing path for embedding columns or indicator columns.

An embedding column stores categorical data in a lower-dimensional vector than an indicator column. (We just placed random numbers into the embedding vectors; training determines the actual numbers.)

When an example is processed, one of the categorical_column_with… functions maps the example string to a numerical categorical value. For example, a function maps “spoon” to [32]. (The 32 comes from our imagination — the actual values depend on the mapping function.) You may then represent these numerical categorical values in either of the following two ways:

As an indicator column. A function converts each numeric categorical value into an 81-element vector (because our palette consists of 81 words), placing a 1 in the index of the categorical value (0, 32, 79, 80) and a 0 in all the other positions.
As an embedding column. A function uses the numerical categorical values (0, 32, 79, 80) as indices to a lookup table. Each slot in that lookup table contains a 3-element vector.
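
The two representations can be compared side by side in plain Python. The sketch below assumes the imaginary indices from the text (0, 32, 79, 80), pads the rest of the 81-word vocabulary with placeholder words, and fills the lookup table with random numbers that training would later adjust; none of this is TensorFlow code.

```python
import random

# Hypothetical 81-word vocabulary; only four entries matter for the example
vocabulary = ["word%d" % i for i in range(81)]
vocabulary[0], vocabulary[32] = "dog", "spoon"
vocabulary[79], vocabulary[80] = "scissors", "guitar"
word_to_index = {word: i for i, word in enumerate(vocabulary)}

def indicator(word):
    """81-element one-hot vector with a 1 at the word's categorical index."""
    vec = [0.0] * len(vocabulary)
    vec[word_to_index[word]] = 1.0
    return vec

rng = random.Random(0)
embedding_dim = 3
# One trainable 3-element vector per vocabulary entry (random initial values)
embedding_table = [[rng.uniform(-1.0, 1.0) for _ in range(embedding_dim)]
                   for _ in vocabulary]

def embedding(word):
    """Look up the word's row in the embedding table."""
    return embedding_table[word_to_index[word]]

print(len(indicator("spoon")), len(embedding("spoon")))
```

The indicator vector grows with the vocabulary size, while the embedding vector keeps a fixed, much smaller dimension.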

############################################

When the number of categories is large, using the embedding column with fewer dimensions than the number of categories can improve the performance. In the following code snippet, we will use the indicator column approach on the categorical feature in order to convert it into the dense format:

categorical_indicator_features = []
# Notice the input to the indicator column is the categorical column
# we previously created
# feature_origin = tf.feature_column.categorical_column_with_vocabulary_list(
#     key = 'Origin',
#    vocabulary_list = [1,2,3]
# )
categorical_indicator_features.append(
    # indicator_column: Represents multi-hot representation of given categorical column.
    tf.feature_column.indicator_column(feature_origin)
)

print(categorical_indicator_features)

demo_feature( categorical_indicator_features )

The three columns of the one-hot output correspond to the Origin values 1, 2, and 3, that is, "US," "Europe," and "Japan".
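
The vocabulary list works for Origin because it first maps the raw values 1, 2, 3 to the zero-based indices 0, 1, 2, and the indicator column then one-hot encodes those indices. A plain-Python sketch of that two-step mapping follows; origin_to_one_hot is a hypothetical helper, and encoding an unknown value as all zeros is an assumption of this sketch.

```python
vocabulary_list = [1, 2, 3]  # Origin codes: 1 = US, 2 = Europe, 3 = Japan
lookup = {value: idx for idx, value in enumerate(vocabulary_list)}

def origin_to_one_hot(origin, default_index=-1):
    """Map a raw Origin value to its vocabulary index, then one-hot encode it."""
    idx = lookup.get(origin, default_index)
    vec = [0.0] * len(vocabulary_list)
    if idx >= 0:  # assumption: out-of-vocabulary values stay all zeros
        vec[idx] = 1.0
    return vec

for origin in [1, 2, 3]:
    print(origin, origin_to_one_hot(origin))
```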

In this section, we have covered the most common approaches for creating feature columns that can be used with TensorFlow Estimators. However, there are several additional feature columns that we haven't discussed, including hashed columns and crossed columns. More information about these other feature columns can be found in the official TensorFlow documentation at https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/feature_column or at https://blog.csdn.net/Linli522362242/article/details/107933572.

Machine learning with pre-made Estimators

Now, after constructing the mandatory feature columns, we can finally utilize TensorFlow's Estimators. Using pre-made Estimators can be summarized in four steps:
1. Define an input function for data loading
2. Convert the dataset into feature columns
3. Instantiate an Estimator (use a pre-made Estimator or create a new one, for example, by converting a Keras model into an Estimator)
4. Use the Estimator methods train(), evaluate(), and predict()

1. Define an input function for data loading

Continuing with the Auto MPG example from the previous section, we will apply these four steps to illustrate how we can use Estimators in practice. For the first step, we need to define a function that processes the data and returns a TensorFlow dataset consisting of a tuple that contains the input features and the labels (ground truth MPG values). Note that the features must be in a dictionary format, and the keys of the dictionary must match the feature columns' names.

Starting with the first step, we will define the input function for the training data as follows:
################################################

tf.data.Dataset.from_tensor_slices(dict_train_normal)

################################################

df = df_train.copy()
train_x, train_y = df, df.pop('MPG')#Return item(pandas.core.series.Series) and drop from frame
dataset = tf.data.Dataset.from_tensor_slices( (dict(train_x), train_y) )
dataset
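
Slicing a (features dictionary, labels) tuple yields one (feature dict, label) pair per row, with the dictionary keys preserved. The following plain-Python sketch mimics that behavior on made-up values; slice_examples is a hypothetical stand-in for illustration, not the tf.data API.

```python
def slice_examples(features, labels):
    """Yield (feature_dict, label) pairs, one per row, from column-oriented data."""
    keys = list(features)
    for i, label in enumerate(labels):
        yield {k: features[k][i] for k in keys}, label

columns = {"ModelYear": [70, 76, 82], "Origin": [1, 3, 2]}  # made-up rows
mpg = [18.0, 29.0, 31.0]                                     # made-up labels

examples = list(slice_examples(columns, mpg))
for feats, label in examples:
    print(feats, label)
```

This is why the dictionary keys must match the feature column keys: each feature column looks up its values in the per-example dictionary by name.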

def train_input_fn( df_train, batch_size=8 ):
    df = df_train.copy()
    train_x, train_y = df, df.pop('MPG')# df.pop('MPG'): Return item(pandas.core.series.Series) and drop from frame
    dataset = tf.data.Dataset.from_tensor_slices( (dict(train_x), train_y) )
    
    # shuffle(buffer_size), repeat, and batch the examples
    # buffer_size : determines how many elements in the dataset are grouped together before shuffling
    return dataset.shuffle(1000).repeat().batch(batch_size)

Revised settings compared with the toy example: a shuffle buffer of 1000 elements instead of 10 (0~9); repeat() instead of repeat(3), so the dataset repeats indefinitely, like countless copies connected in series; and batch(batch_size) instead of batch(7).

Notice that we used dict(train_x) in this function to convert the pandas DataFrame object into a Python dictionary. Let's load a batch from this dataset to see how it looks:

# inspection
ds = train_input_fn( df_train_norm )
batch = next( iter(ds) )
print('Keys:', batch[0].keys() )
print('Batch Model Years:', batch[0]['ModelYear'])
print('MPG: ', batch[1])

We also need to define an input function for the test dataset that will be used for evaluation after model training:

def eval_input_fn( df_test, batch_size=8 ):
    df = df_test.copy()
    test_x, test_y = df, df.pop('MPG')
    dataset = tf.data.Dataset.from_tensor_slices( (dict(test_x), test_y) )
    return dataset.batch( batch_size ) # without shuffle(1000).repeat()

# inspection
test_ds = eval_input_fn( df_test_norm )
batch = next( iter(test_ds) )
print('Keys:', batch[0].keys() )
print('Batch Model Years:', batch[0]['ModelYear'])
print('MPG: ', batch[1])

2. Convert the dataset into feature columns

Now, moving on to step 2, we need to define the feature columns. We have already defined a list containing the continuous features, a list for the bucketized feature column, and a list for the categorical feature column. We can now concatenate these individual lists to a single list containing all feature columns:

all_feature_columns = ( numeric_features # 'Cylinders','Displacement','Horsepower','Weight','Acceleration'
                       + bucketized_features # 'ModelYear'
                       + categorical_indicator_features # 'Origin'
                      )
print( all_feature_columns )
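As a rough illustration of what these three kinds of feature columns do to a single example, the following pure-Python sketch encodes one row into a dense vector. The bucket boundaries [73, 76, 79], the Origin vocabulary [1, 2, 3], the example values, and all helper names are assumptions made for this sketch; the ordering of the columns inside a real Estimator may also differ.

```python
from bisect import bisect_right

def bucketize(value, boundaries):
    """One-hot encode value by the interval it falls into.
    Buckets are (-inf, b0), [b0, b1), ..., [b_last, +inf)."""
    one_hot = [0] * (len(boundaries) + 1)
    one_hot[bisect_right(boundaries, value)] = 1
    return one_hot

def indicator(value, vocabulary):
    """One-hot encode a categorical value against a fixed vocabulary."""
    return [1 if value == item else 0 for item in vocabulary]

def to_model_input(example, numeric_keys, year_boundaries, origin_vocab):
    """Concatenate numeric, bucketized, and indicator encodings into
    one dense vector, as the first hidden layer would receive it."""
    dense = [float(example[key]) for key in numeric_keys]
    dense += bucketize(example['ModelYear'], year_boundaries)
    dense += indicator(example['Origin'], origin_vocab)
    return dense

numeric_keys = ['Cylinders', 'Displacement', 'Horsepower', 'Weight', 'Acceleration']
# Hypothetical, already normalized example row.
example = {'Cylinders': 0.3, 'Displacement': 0.1, 'Horsepower': -0.2,
           'Weight': 0.5, 'Acceleration': -1.0, 'ModelYear': 76, 'Origin': 3}
vector = to_model_input(example, numeric_keys, [73, 76, 79], [1, 2, 3])
```

Under these assumptions the model input has 5 + 4 + 3 = 12 values per example.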

3. Instantiate an Estimator (use a pre-made Estimator or create a new one, for example, by converting a Keras model into an Estimator)

For step 3, we need to instantiate a new Estimator. Since predicting MPG values is a typical regression problem, we will use tf.estimator.DNNRegressor. When instantiating the regression Estimator, we will provide the list of feature columns and specify the number of hidden units that we want to have in each hidden layer using the argument hidden_units. Here, we will use two hidden layers, where the first hidden layer has 32 units and the second hidden layer has 10 units:

################################################

tf.estimator.DNNRegressor(
    hidden_units, feature_columns, model_dir=None, label_dimension=1,
    weight_column=None, optimizer='Adagrad', activation_fn=tf.nn.relu,
    dropout=None, config=None, warm_start_from=None,
    loss_reduction=losses_utils.ReductionV2.SUM_OVER_BATCH_SIZE, batch_norm=False
)

################################################

regressor = tf.estimator.DNNRegressor(
    feature_columns = all_feature_columns,
    hidden_units=[32,10],
    model_dir = 'model/autompg-dnnregressor/'
)

The other argument, model_dir, that we have provided specifies the directory for saving model parameters. One of the advantages of Estimators is that they automatically checkpoint the model during training, so that in case the training of the model crashes for an unexpected reason (like power failure), we can easily load the last saved checkpoint and continue training from there. The checkpoints will also be saved in the directory specified by model_dir. If we do not specify the model_dir argument, the Estimator will create a random temporary folder (for example, in the Linux operating system, a random folder in the /tmp/ directory will be created), which will be used for this purpose.

After these three basic setup steps, we can finally use the Estimator for training, evaluation, and, eventually, prediction. The regressor can be trained by calling the train() method, for which we require the previously defined input function:

EPOCHS = 1000
BATCH_SIZE = 8         # number of examples in each mini-batch
total_steps = EPOCHS * int( np.ceil( len(df_train)/BATCH_SIZE) )
print('Training Steps:', total_steps)
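The step count above can be checked by hand. The sketch below redoes the arithmetic with the standard library only, assuming the 313 training examples that this post quotes for df_train_norm; the function name is hypothetical.

```python
import math

def total_training_steps(num_examples, batch_size, epochs):
    """Number of mini-batch updates needed to pass over the data `epochs` times."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return epochs * steps_per_epoch

steps = total_training_steps(num_examples=313, batch_size=8, epochs=1000)
print(steps)
```

Since 313/8 = 39.125 rounds up to 40 steps per epoch, 1000 epochs correspond to 40000 steps.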

Feature columns bridge raw data with the data your model needs.

regressor.train(
    input_fn = lambda:train_input_fn( df_train_norm, batch_size=BATCH_SIZE),
    steps = total_steps
)

... ...

... ...

Calling .train() will automatically save the checkpoints during the training of the model.

###################################

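1) Meta graph:

It stores the complete TensorFlow graph structure. This file has the extension *.meta.

2) Checkpoint file:

This is a binary file that contains the weights, biases, and other variables. This file has the extension *.ckpt. PS: since version 0.11 it is no longer a single .ckpt file; there is also an .index file, as in the following example:

1.mymodel.data-00000-of-00001

2.mymodel.index

3.checkpoint

The .data file contains the training variables (it stores the value of every variable in the TensorFlow program).
The .index file describes the mapping between keys and values of the variables (it stores the name of every variable, e.g. learning rate, bias and so on). It is a string-string table in which the key is the tensor name and the value is a BundleEntryProto; each BundleEntryProto describes the metadata of a tensor (data that helps to interpret or understand the information).
The checkpoint file is a text file (it stores the list of all model files in a directory); it records the most recently saved checkpoint as well as the list of the other checkpoint files.

When loading a model, two things need to be loaded: the graph structure and the variable values. The graph structure can be loaded by rebuilding the network manually or by loading the .meta file directly.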
###################################
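Because the checkpoint file is plain text, it can be inspected without TensorFlow. The following sketch parses it with the standard library to find the most recent checkpoint prefix; in practice tf.train.latest_checkpoint(model_dir) does this for you. The function name and the file names such as model.ckpt-40000 are made up for the demonstration.

```python
import os
import re
import tempfile

def latest_checkpoint_prefix(model_dir):
    """Return the checkpoint prefix recorded as most recent in the
    plain-text 'checkpoint' file, or None if the file is missing."""
    state_file = os.path.join(model_dir, 'checkpoint')
    if not os.path.exists(state_file):
        return None
    with open(state_file) as f:
        for line in f:
            match = re.match(r'\s*model_checkpoint_path:\s*"(.*)"', line)
            if match:
                return os.path.join(model_dir, match.group(1))
    return None

# Demonstration on a throwaway directory with a hand-written state file.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, 'checkpoint'), 'w') as f:
    f.write('model_checkpoint_path: "model.ckpt-40000"\n')
    f.write('all_model_checkpoint_paths: "model.ckpt-0"\n')
    f.write('all_model_checkpoint_paths: "model.ckpt-40000"\n')

latest = latest_checkpoint_prefix(demo_dir)
```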
We can then reload the last checkpoint:

model_dir : Directory to save model parameters, the graph, etc. This can also be used to load checkpoints from the directory into an estimator to continue training a previously saved model.

warm_start_from : A string filepath to a checkpoint to warm-start from, or a WarmStartSettings object to fully configure warm-starting. If the string filepath is provided instead of a WarmStartSettings, then all weights are warm-started, and it is assumed that vocabularies and Tensor names are unchanged.

reloaded_regressor = tf.estimator.DNNRegressor(
    feature_columns=all_feature_columns,
    hidden_units = [32,10],
    warm_start_from='model/autompg-dnnregressor/',
    model_dir = 'model/autompg-dnnregressor/'
)

Then, in order to evaluate the predictive performance of the trained model, we can use the evaluate() method, as follows:

# def eval_input_fn( df_test, batch_size=8 ):
#     df = df_test.copy()
#     test_x, test_y = df, df.pop('MPG')
#     dataset = tf.data.Dataset.from_tensor_slices( (dict(test_x), test_y) )
#
#     return dataset.batch( batch_size ) # without shuffle(1000).repeat()

eval_results = reloaded_regressor.evaluate(
    input_fn = lambda: eval_input_fn( df_test_norm, batch_size=8)
)

for key in eval_results:
    print( '{:15s} {}'.format(key, eval_results[key]) )

print('Average-Loss {:.4f}'.format(eval_results['average_loss']))

Finally, to predict the target values on new data points, we can use the predict() method. For the purposes of this example, suppose that the test dataset represents a dataset of new, unlabeled data points in a real-world application.

Note that in a real-world prediction task, the input function will only need to return a dataset consisting of features, assuming that the labels are not available. Here, we will simply use the same input function that we used for evaluation to get the predictions for each example:

def eval_input_fn( df_test, batch_size=8 ):
    df = df_test.copy()
    test_x, test_y = df, df.pop('MPG')
    print('Label:', test_y)
    dataset = tf.data.Dataset.from_tensor_slices( (dict(test_x), test_y) )

    return dataset.batch( batch_size ) # without shuffle(1000).repeat()

pred_res = regressor.predict( 
                input_fn=lambda: eval_input_fn( df_test_norm, batch_size=8)
           )
print(next(iter(pred_res)))

~~ 34.0

print(next(iter(pred_res)))

~~ 30.0

While the preceding code snippets conclude the illustration of the four steps that are required for using pre-made Estimators, for practice, let's take a look at another pre-made Estimator: the boosted tree regressor, tf.estimator.BoostedTreesRegressor. Since the input functions and the feature columns are already built, we just need to repeat steps 3 and 4. For step 3, we will create an instance of BoostedTreesRegressor and configure it to have 200 trees.
#############################################################################################

Decision tree boosting

We already covered the ensemble algorithms, including boosting, in 07_Ensemble Learning and Random Forests_Bagging_Out-of-Bag_Random Forests_Extra-Trees极端随机树_Boosting https://blog.csdn.net/Linli522362242/article/details/104771157, 07_Ensemble Learning and Random Forests_02_AdaBoost_Gradient Boosting_XGBoost https://blog.csdn.net/Linli522362242/article/details/105046444, cp7_SelectModel_Ensemble Learning_MajorityVoteClassifier_weight_logistic_get_params_bagging_transAxe https://blog.csdn.net/Linli522362242/article/details/109725186. The boosted tree algorithm is a special family of boosting algorithms that is based on the optimization of an arbitrary loss function. Feel free to visit https://medium.com/mlreview/gradientboosting-from-scratch-1e317ae4587d to learn more.
###################################07_Ensemble Learning and Random Forests_02_AdaBoost_Gradient Boosting_XGBoost https://blog.csdn.net/Linli522362242/article/details/105046444
The objective of any supervised learning algorithm is to define a loss function and minimize it. Let's see how the math works out for the Gradient Boosting algorithm. Say we have mean squared error (MSE) as the loss, defined as:

$\text{Loss} = \text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

where $y_i$ is the i-th actual value and $\hat{y}_i$ is the i-th predicted value.

We want predictions such that our loss function (MSE) is minimal. By using gradient descent and updating our predictions based on a learning rate $\alpha$ (also called the step size), we can find the values where the MSE is minimal:

$\hat{y}_i \leftarrow \hat{y}_i - \alpha \frac{\partial\,\text{Loss}}{\partial \hat{y}_i} = \hat{y}_i + \frac{2\alpha}{n}\,(y_i - \hat{y}_i)$

(Note: we usually use gradient descent to update the feature coefficients or weights of a model. Here it is different: after we obtain the predicted values, gradient descent is applied to the predictions themselves, with the learning rate as the step size, so that all predicted values end up sufficiently close to the actual values; see https://www.cnblogs.com/massquantity/p/9174746.html.)
Note: $y_i - \hat{y}_i$ is the residual error made by the previous predictor (e.g. a classifier).
So, we are basically updating the predictions such that the sum of our residuals is close to 0 (or minimal) and the predicted values are sufficiently close to the actual values.
###################################

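The goal of a learning algorithm is to minimize the difference between the predicted values and the true values of the samples. Learning is the process of repeatedly adjusting the model parameters to shrink this difference until it is small enough.

The difference between a sample's predicted value and its true value has to be expressed by some mathematical formula. This formula is called the loss function, and the value of the loss function is the difference between the predicted values and the true values.

In summary, the learning process of an algorithm is the process of shrinking the difference between predicted and true values, that is, the process of minimizing the loss function.

Since boosting is an additive model, its optimization method is the forward stagewise algorithm.

boosted tree model

boosted tree algorithm

If the base function (the current decision tree, the one that minimizes the loss function) is additionally multiplied by a learning rate (also called the coefficient of the base function), the boosted tree update can be written as $f_m(x) = f_{m-1}(x) + \nu\,T(x;\Theta_m)$, where $T(x;\Theta_m)$ is the current decision tree and $\nu$ is the learning rate.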

#############################################################################################
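The idea of fitting each new tree to the residuals of the current model and adding it with a learning rate can be sketched in a few lines of plain Python. This is a toy version with depth-1 trees (stumps) on a single feature, not the algorithm implemented by TensorFlow's boosted trees Estimator; the data values and function names are made up.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on xs that minimizes the squared error
    of the residuals; return (threshold, left_value, right_value)."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    return best[1:]

def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

def boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Additive model: start from the mean and repeatedly add a stump
    fitted to the current residuals, scaled by the learning rate."""
    preds = [sum(ys) / len(ys)] * len(ys)
    history = [mse(ys, preds)]
    for _ in range(n_trees):
        # Residuals equal the negative MSE gradient up to a constant factor.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, lv, rv = fit_stump(xs, residuals)
        preds = [p + learning_rate * (lv if x < threshold else rv)
                 for x, p in zip(xs, preds)]
        history.append(mse(ys, preds))
    return preds, history

# Made-up 1D regression data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [30.0, 28.0, 27.0, 22.0, 20.0, 16.0, 15.0, 13.0]
preds, history = boost(xs, ys)
```

Here history records the training MSE after each added stump, so it shows directly how the loss shrinks as the ensemble grows.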

Boosted Tree Regressor

tf.estimator.BoostedTreesRegressor(
    feature_columns, n_batches_per_layer, model_dir=None, label_dimension=1,
    weight_column=None, n_trees=100, max_depth=6, learning_rate=0.1,
    l1_regularization=0.0, l2_regularization=0.0, tree_complexity=0.0,
    min_node_weight=0.0, config=None, center_bias=False,
    pruning_mode='none', quantile_sketch_epsilon=0.01,
    train_in_memory=False
)

n_batches_per_layer : the number of batches to collect statistics per layer. Each layer is built after at least n_batches_per_layer accumulations. The total number of batches is the total number of examples divided by the batch size. https://github.com/tensorflow/estimator/blob/master/tensorflow_estimator/python/estimator/canned/boosted_trees.py#L2106-L2253

The collected statistics are consumed by def grow_tree(self, stats_summaries_list, last_layer_nodes_range, split_types_list), where:

stats_summaries_list: List of stats summary tensors, representing sums of gradients and hessians for each feature bucket.
last_layer_nodes_range: A tensor representing ids of the nodes in the current layer, to be split.
split_types_list: a list of lists indicating feature split type

################################################

_model_fn ==> _bt_model_fn ==>class _AccumulatorEnsembleGrower(..., self._n_batches_per_layer, ...) following codes ==>_EnsembleGrower grower.grow_tree(stats_summaries_list, last_layer_nodes_range, split_types_list)
boosted_trees_ops:boosted_trees_ops.calculate_best_feature_split_v2 ==>Calculates gains for each feature and returns the best possible split information for each node. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/boosted_trees_ops.py

def accumulate_quantiles(self, float_features, weights, are_boundaries_ready):
    summary_op = self._quantile_accumulator.add_summaries(
        float_features, weights)
    cond_accum = _accumulator(
        dtype=tf.dtypes.float32, shape={}, shared_name='quantile_summary_accum')########################
    cond_accum_step = cond_accum.set_global_step(self._stamp_token)
    apply_grad = cond_accum.apply_grad(tf.constant(0.), self._stamp_token)
    update_quantile_op = tf.group(summary_op, cond_accum_step, apply_grad)
    if not self._is_chief:
      return update_quantile_op

    with tf.control_dependencies([update_quantile_op]):

      def flush_fn():
        grad = cond_accum.take_grad(1)########################
        flush_op = self._quantile_accumulator.flush()
        boundaries_ready_op = are_boundaries_ready.assign(True).op
        return tf.group(flush_op, grad, boundaries_ready_op)

      finalize_quantile_op = _cond(
          tf.math.greater_equal(cond_accum.num_accumulated(),
                                self._n_batches_per_layer),########################
          flush_fn,########################
          tf.no_op,
          name='wait_until_quaniles_accumulated')
    return finalize_quantile_op

################################################

# Need to see a large portion of the data before we can build a layer, for example half of data n_batches_per_layer = 0.5 * NUM_EXAMPLES / BATCH_SIZE
# Also note that it is usually beneficial to set some regularization, for example, l2. A good default value is 1./number of examples per layer OR 1./(n_batches_per_layer*batch_size).

Why do I need to set n_batches_per_layer?
Because the input pipeline is dataset.shuffle(1000).repeat().batch(batch_size), the data repeats indefinitely and NUM_EXAMPLES is not known from the dataset itself.

With n_batches_per_layer=20 and BATCH_SIZE=8, each layer sees 20*8=160 examples > 0.5*df_train_norm.shape[0]=0.5*313=156.5

In other words, the number of training samples used per layer is 160 (repetitions can occur).
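The rule of thumb from the comments above (let each layer see about half of the data, and set the L2 regularization to roughly one over the number of examples per layer) can be worked out explicitly. A small sketch with hypothetical helper names, using the 313 training examples and batch size 8 from this example:

```python
import math

def batches_per_layer(num_examples, batch_size, fraction=0.5):
    """Batches needed so that one tree layer sees `fraction` of the data."""
    return math.ceil(fraction * num_examples / batch_size)

def default_l2(n_batches_per_layer, batch_size):
    """Rule-of-thumb L2 strength: 1 / (examples seen per layer)."""
    return 1.0 / (n_batches_per_layer * batch_size)

n_batches = batches_per_layer(num_examples=313, batch_size=8)
examples_per_layer = n_batches * 8
l2 = default_l2(n_batches, batch_size=8)
```

This gives 20 batches per layer, 160 examples per layer, and a suggested l2_regularization of 1/160 = 0.00625.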

boosted_tree = tf.estimator.BoostedTreesRegressor(
    feature_columns=all_feature_columns,
    n_batches_per_layer=20,
    n_trees=200
)

boosted_tree.train(
    input_fn = lambda:train_input_fn(df_train_norm, batch_size=BATCH_SIZE) # BATCH_SIZE = 8
)
eval_results = boosted_tree.evaluate(
    input_fn = lambda:eval_input_fn(df_test_norm, batch_size=8)
)

print(eval_results)

print('Average-Loss {:.4f}'.format(eval_results['average_loss']))

... ... Note : '_log_step_count_steps': 100
... ...

eval_results start from Label: 376 34.0 ... ...

As you can see, the boosted tree regressor achieves lower average loss than the DNNRegressor. For a small dataset (160) like this, this is expected.

In this section, we covered the essential steps for using TensorFlow's Estimators for regression. In the next subsection, we will take a look at a typical classification example using Estimators.

Using Estimators for MNIST handwritten digit classification

For this classification problem, we are going to use the DNNClassifier Estimator provided by TensorFlow, which lets us implement a multilayer perceptron very conveniently. In the previous section, we covered the four essential steps for using the pre-made Estimators in detail, which we will need to repeat in this section. First, we are going to import the tensorflow_datasets (tfds) submodule, which we can use to load the MNIST dataset and specify the hyperparameters of the model.

#######################################################
Estimator API and graph issues

Since parts of TensorFlow 2.0 are still a bit rough around the edges, you may encounter the following issue when executing the next code block: RuntimeError: Graph is finalized and cannot be modified. Currently, there is no good solution for this issue, and a suggested workaround is to restart your Python, IPython, or Jupyter Notebook session before executing the next code block.
#######################################################

The setup step includes loading the dataset and specifying hyperparameters (BUFFER_SIZE for shuffling the dataset, BATCH_SIZE for the size of mini-batches, and the number of training epochs):

BUFFER_SIZE = 10000
BATCH_SIZE = 64
NUM_EPOCHS = 20
steps_per_epoch = np.ceil( 60000/BATCH_SIZE)
steps_per_epoch

Note that steps_per_epoch determines the number of iterations in each epoch, which is needed for infinitely repeated datasets (as discussed in cp13_Parallelizing NN Training w TF_printoptions(precision)_squeeze_shuffle_batch_repeat_image process_map https://blog.csdn.net/Linli522362242/article/details/112386820: with .repeat(count=None), that is, if count is None or -1, the dataset is repeated indefinitely). Next, we will define a helper function that will preprocess the input image and its label.

mnist, mnist_info = tfds.load('mnist', with_info=True,
                              shuffle_files=False)
print(mnist_info)

TypeError: Could not build a TypeSpec for ['C:\\Users\\LlQ\\tensorflow_datasets\\mnist\\3.0.1\\mnist-test.tfrecord-00000-of-00001'] with type list
... ...

RuntimeError: Graph is finalized and cannot be modified

Solution:

mnist['train']

Since the input image is originally of the type 'uint8' (in the range [0, 255], since 2^8=256), we will use tf.image.convert_image_dtype() to convert its type to tf.float32 (and thereby rescale it to the range [0, 1]). Images that are represented using floating point values are expected to have values in the range [0,1), while image data stored in integer data types are expected to have values in the range [0,MAX], where MAX is the largest positive representable number for the data type. (Note that tf.keras.activations expect input values of type tf.float16, tf.float32, or tf.float64.) For uint8 input, the conversion is equivalent to casting and dividing by 255:
# tf.image.convert_image_dtype(img_data, tf.float32)
# equal to image_float=tf.cast(img_data, tf.float32)/255

# labels from 1D[...] to 2D [[...]]

def preprocess(item): # item is a dict with the keys 'image' and 'label'
    image = item['image'] # the image tensor stored under the key 'image'
    label = item['label'] # the label value stored under the key 'label'
    # for uint8 input, tf.image.convert_image_dtype is equivalent to
    # image_float=tf.cast(img_data, tf.float32)/255
    image = tf.image.convert_image_dtype(image, tf.float32) #  'uint8' ==> tf.float32
    image = tf.reshape(image, (-1,)) # flatten to a 1D vector of 784 values
    return {'image-pixels':image}, label[..., tf.newaxis] # add a trailing axis so that batched labels are 2D: (batch_size, 1)

The key 'image-pixels' is needed because tf.feature_column.numeric_column(key=...) requires a key that identifies this feature.
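To see what the preprocessing does to the numbers without loading MNIST, here is a pure-Python analogue operating on nested lists. It is a sketch only: the real images are tensors of shape (28, 28, 1), and the function name and the toy image are hypothetical.

```python
def preprocess_sketch(item):
    """Pure-Python analogue of preprocess(): scale uint8 pixels to [0, 1],
    flatten the 28x28 grid to 784 values, and wrap the label in a list."""
    image, label = item['image'], item['label']
    flat = [pixel / 255.0 for row in image for pixel in row]
    return {'image-pixels': flat}, [label]

# A made-up "image": all black except one white pixel.
image = [[0] * 28 for _ in range(28)]
image[3][5] = 255
features, label = preprocess_sketch({'image': image, 'label': 7})
```

The white pixel at row 3, column 5 ends up at flat index 3*28+5 with the value 1.0, and the label becomes a one-element list, mirroring label[..., tf.newaxis].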

Step 1: Define two input functions (one for training and one for evaluation):

Suggested order: (1) shuffle(10000), (2) batch, and (3) repeat. If one pass over the data is not enough, the sequence (1) shuffle, (2) batch, and (3) repeat is applied again for each further epoch.

# Step 1: Define the input function for training
def train_input_fn():
    datasets = tfds.load( name='mnist' )
    mnist_train = datasets['train']
    
    dataset = mnist_train.map( preprocess )
    # The .shuffle() method requires an argument called buffer_size, 
    # which determines how many elements in the dataset are grouped together before shuffling.
    # The elements in the buffer are randomly retrieved and their place 
    # in the buffer is given to the next elements in the original (unshuffled) dataset.
    dataset = dataset.shuffle(BUFFER_SIZE)#small buffer_size,may not shuffle the dataset perfectly
    dataset = dataset.batch(BATCH_SIZE)
    return dataset.repeat()#https://blog.csdn.net/Linli522362242/article/details/112386820

#         Define the input function for evaluation
def eval_input_fn():
    datasets = tfds.load(name='mnist')
    mnist_test = datasets['test']
    dataset = mnist_test.map(preprocess).batch(BATCH_SIZE) # without shuffle for evaluation
    return dataset

Notice that the dictionary of features has only one key, 'image-pixels'. We will use this key in the next step.
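The order of batch and repeat in train_input_fn matters. The following pure-Python sketch, with hypothetical helper functions standing in for the tf.data methods, contrasts batching before repeating with repeating before batching on a toy dataset of five elements:

```python
from itertools import islice

def batch(elements, batch_size):
    """Group consecutive elements; the last group may be smaller."""
    group = []
    for element in elements:
        group.append(element)
        if len(group) == batch_size:
            yield group
            group = []
    if group:
        yield group

def repeat(make_iterator):
    """Restart the iterator returned by make_iterator() forever."""
    while True:
        yield from make_iterator()

data = list(range(5))  # toy "dataset" with 5 examples

# batch -> repeat: epoch boundaries are preserved, so a short batch recurs.
batch_then_repeat = list(islice(repeat(lambda: batch(data, 2)), 6))

# repeat -> batch: batches are always full but can straddle two epochs.
repeat_then_batch = list(islice(batch(repeat(lambda: iter(data)), 2), 6))
```

With batch before repeat, every epoch consists of exactly ceil(N/batch_size) batches, which matches how steps_per_epoch was computed for MNIST (60000/64 rounds up to 938, and the last batch of each epoch holds the remaining 32 images).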

Step 2: Define the feature columns:

# Step 2: feature column
image_feature_column = tf.feature_column.numeric_column(
    key = 'image-pixels', shape=(28*28) # each image_data is 28x28, after reshape(-1,) ==>784
)

Note that here, we defined the feature columns of size 784 (that is, 28 × 28 ), which is the size of the input MNIST images after they are flattened( tf.reshape(image, (-1,)) ).

Step 3: Create a new Estimator. Here, we specify two hidden layers: 32 units in the first hidden layer and 16 units in the second.

We also specify the number of classes (remember that MNIST consists of 10 different digits, 0-9) using the argument n_classes:
# default activation_fn=tf.nn.relu # for using tf.keras.activations, input value should be tf.float16, tf.float32, tf.float64):

# Step 3: instantiate the pre-made Estimator
dnn_classifier = tf.estimator.DNNClassifier(
    feature_columns=[image_feature_column],
    hidden_units=[32,16],
    n_classes=10, # set(labels) ==10
    model_dir = 'model/mnist-dnn/'
    # default activation_fn=tf.nn.relu
)
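As a quick sanity check on the size of this network, the number of trainable parameters can be counted by hand. The sketch assumes plain fully connected layers with biases and a final layer of 10 logits after the two hidden layers; the helper name is hypothetical.

```python
def dense_parameter_count(input_size, layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    total = 0
    for units in layer_sizes:
        total += input_size * units + units
        input_size = units
    return total

# 784 inputs -> 32 -> 16 -> 10 output logits
n_params = dense_parameter_count(784, [32, 16, 10])
```

Under these assumptions the layers contribute 25120, 528, and 170 parameters, 25818 in total.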

Step 4: Use the Estimator for training, evaluation, and prediction:

# Step 4: train
dnn_classifier.train(
    input_fn = train_input_fn,# tfds dataset
    steps = NUM_EPOCHS * steps_per_epoch #NUM_EPOCHS=20 #steps_per_epoch=np.ceil( 60000/BATCH_SIZE)
)

...

18760 = 20*938 = NUM_EPOCHS * steps_per_epoch

eval_result = dnn_classifier.evaluate(
    input_fn = eval_input_fn
)
print( eval_result )

So far, you have learned how to use pre-made Estimators and apply them for preliminary assessment to see, for example, whether an existing model is suitable for a particular problem. Besides using pre-made Estimators, we can also create an Estimator by converting a Keras model to an Estimator, which we will do in the next subsection.

Input of train and evaluate should have following features, otherwise there will be a KeyError:

if weight_column is not None, a feature with key=weight_column whose value is a Tensor.
for each column in feature_columns:
- if column is a CategoricalColumn, a feature with key=column.name whose value is a SparseTensor.
- if column is a WeightedCategoricalColumn, two features: the first with key the id column name, the second with key the weight column name. Both features' value must be a SparseTensor.
- if column is a DenseColumn, a feature with key=column.name whose value is a Tensor.

https://developers.googleblog.com/2017/11/introducing-tensorflow-feature-columns.html
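The requirement that every feature column finds its key in the features dictionary can be illustrated with a small check. This is only a sketch of the idea, not the validation code that Estimators actually run; the function name is hypothetical.

```python
def check_feature_keys(column_keys, features):
    """Raise KeyError if a feature column has no matching entry in the
    features dictionary returned by the input function."""
    missing = [key for key in column_keys if key not in features]
    if missing:
        raise KeyError('features missing for columns: {}'.format(missing))

# Matching key: passes silently.
check_feature_keys(['image-pixels'], {'image-pixels': [0.0] * 784})

# Mismatching key (underscore instead of hyphen): detected.
try:
    check_feature_keys(['image-pixels'], {'image_pixels': [0.0] * 784})
    mismatch_detected = False
except KeyError:
    mismatch_detected = True
```

A one-character difference between the column key and the dictionary key is enough to trigger the error, so it is worth defining the key once and reusing it in both places.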

Creating a custom Estimator from an existing Keras model

Converting a Keras model to an Estimator is useful in academia as well as industry for cases where you have developed a model and want to publish it or share the model with other members in your organization. Such a conversion allows us to access the strengths of Estimators, such as distributed training and automatic checkpointing. In addition, it will make it easy for others to use this model, and particularly to avoid confusions in interpreting the input features by specifying the feature columns and the input function.

To learn how we can create our own Estimator from a Keras model, we will work with the previous XOR problem. First, we will regenerate the data and split it into training and validation datasets:

# Set random seeds for reproducibility
tf.random.set_seed(1)
np.random.seed(1)

# Create the data
x = np.random.uniform(low=-1, high=1, size=(200,2))
y = np.ones(len(x))
y[ x[:,0]*x[:,1]<0 ]=0

x_train = x[:100, :]
y_train = y[:100]

x_valid = x[100:, :]
y_valid = y[100:]

Next, we will go through the four steps that we described in the previous subsection. Steps 1, 2, and 4 will be the same as the ones we used with the pre-made estimators.
Note that the key name for the input features that we use in steps 1 and 2 must match with what we defined in the input layer of our model. The code is as follows:

# Step 1: Define the input functions
def train_input_fn(x_train, y_train, batch_size=8):
    # from_tensor_slices expects a single (features, labels) tuple
    dataset = tf.data.Dataset.from_tensor_slices(
        ( {'input-features':x_train}, # 'input-features' for feature_column
          y_train.reshape(-1,1) )     # one column
    )
    # Shuffle, repeat, and batch the examples
    return dataset.shuffle(100).repeat().batch(batch_size)

def eval_input_fn(x_test, y_test=None, batch_size=8):
    if y_test is None: # prediction
        dataset = tf.data.Dataset.from_tensor_slices({'input-features':x_test}) # converted to tensor
    else:              # evaluation
        dataset = tf.data.Dataset.from_tensor_slices( ( {'input-features':x_test},
                                                        y_test.reshape(-1,1) )
                                                    )
    return dataset.batch(batch_size)

# Step 2: Define the feature columns
features = [
    tf.feature_column.numeric_column( # tensor to feature_column
        key = 'input-features', shape=(2,) # must match the name of the model's input layer
    )
]
features

Let's also build a Keras model that we want to convert to an Estimator later. We will define the model using the Sequential class as before. This time, we will also add an input layer defined as tf.keras.layers.Input to give a name to the input to this model:

# Step 3: Create the estimator: convert from a Keras model
model = tf.keras.Sequential([
    tf.keras.layers.Input( shape=(2,), name='input-features'),
    tf.keras.layers.Dense( units=4, activation='relu'),
    tf.keras.layers.Dense( units=4, activation='relu'),
    tf.keras.layers.Dense( units=4, activation='relu'),
    tf.keras.layers.Dense( 1, activation='sigmoid')
])

model.summary()

For step 3, we will convert the model to an Estimator using tf.keras.estimator.model_to_estimator instead of instantiating one of the pre-made Estimators(e.g. model = MyModel() OR dnn_classifier = tf.estimator.DNNClassifier(...) ). Before converting the model, we first need to compile it:

model.compile( optimizer=tf.keras.optimizers.SGD(),
               loss = tf.keras.losses.BinaryCrossentropy(),
               metrics=[tf.keras.metrics.BinaryAccuracy()]
             )
my_estimator = tf.keras.estimator.model_to_estimator(
    keras_model=model,
    model_dir='model/estimator-for-XOR/'
)

C:\Users\LlQ\0Python Machine Learning\model\estimator-for-XOR\keras

Finally, in step 4, we can train our model using the Estimator and evaluate it on the validation dataset:

# Step 4: use the estimator: train/evaluate/predict

num_epochs = 200
batch_size = 2
steps_per_epoch = np.ceil( len(x_train)/batch_size )

my_estimator.train(
    input_fn = lambda: train_input_fn(x_train, y_train, batch_size),
    steps=num_epochs*steps_per_epoch
)

...

my_estimator.evaluate(
    input_fn = lambda: eval_input_fn(x_valid, y_valid, batch_size)
)

...

As you can see, converting a Keras model to an Estimator is very straightforward. Doing this allows us to easily benefit from the various Estimator strengths, such as distributed training and automatically saving the checkpoints during training.

Summary

In this chapter, we covered TensorFlow's most essential and useful features. We started by discussing the migration from TensorFlow v1.x to v2 (https://blog.csdn.net/Linli522362242/article/details/113710166). In particular, we used TensorFlow's dynamic computation graph approach, the so-called eager execution mode, which makes implementing computations more convenient compared to using static graphs. We also covered the semantics of defining TensorFlow Variable objects as model parameters and annotating Python functions using the tf.function decorator to improve computational efficiency via graph compilation.

After we considered the concept of computing partial derivatives and gradients of arbitrary functions, we covered the Keras API in more detail. It provides us with a user-friendly interface for building more complex deep NN models. Finally, we utilized TensorFlow's tf.estimator API to provide a consistent interface that is typically preferred in production environments. We concluded this chapter by converting a Keras model into a custom Estimator.

Now that we have covered the core mechanics of TensorFlow, the next chapter will introduce the concept behind convolutional neural network (CNN) architectures for deep learning. CNNs are powerful models and have shown great performance in the field of computer vision.