Embedding

1. Why Embedding?

        One-hot encoding: the data volume is large and overly sparse, and the mappings are independent of one another, expressing no correlation between words.

        Embedding: a method of encoding words with low-dimensional vectors. The encoding is optimized through neural-network training and can express correlations between words.
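
As a quick contrast, here is a minimal sketch of the two encodings (assuming a toy 100-word vocabulary; the indices and sizes are illustrative only):

import tensorflow as tf

# one-hot: index 4 in a 100-word vocabulary becomes a 100-dim vector that is
# all zeros except a single 1 -- sparse, and every pair of words is equally
# distant, so no similarity can be expressed
one_hot = tf.one_hot(4, depth=100)
print(one_hot.shape)  # (100,)

# embedding: the same index becomes a dense, trainable 3-dim vector
emb = tf.keras.layers.Embedding(100, 3)
print(emb(tf.constant([4])).shape)  # (1, 3)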

        tf.keras.layers.Embedding(vocab_size, embedding_dim)

        The embedding dimension is the number of values used to represent one word.

        For example, when encoding the words 1–100, [4] might be encoded as [0.25, 0.1, 0.11].

        Example: tf.keras.layers.Embedding(100, 3)

import numpy as np
import tensorflow as tf

arr = np.array([[1, 4], [2, 5], [3, 10]])  # note: contains the index 10
model_test = tf.keras.models.Sequential()
model_test.add(tf.keras.layers.Embedding(10, 2, input_length=2))
pre = model_test.predict(arr)
print("pre", pre)
print(pre.shape)

Running this raises an error: the input contains the index 10, but input_dim=10 only covers indices 0 through 9 (input_dim must be at least the largest index plus one). PS: unlike input_dim and output_dim, which can be passed positionally, input_length has to be spelled out as a keyword argument.

Changing 10 to 11 fixes it.

import numpy as np
import tensorflow as tf

arr = np.array([[1, 4], [2, 5], [3, 10]])
model_test = tf.keras.models.Sequential()
model_test.add(tf.keras.layers.Embedding(11, 2, input_length=2))  # 11 >= max index + 1
pre = model_test.predict(arr)
print("pre", pre)
print(pre.shape)

This maps the input of shape (3, 2) to an output of shape (3, 2, 2): each index is replaced by its 2-dimensional vector.
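
Under the hood the layer is just a trainable lookup table; a minimal sketch, continuing from the model_test above, to confirm this:

# the Embedding layer holds one weight matrix of shape (11, 2) =
# (input_dim, output_dim); each input index simply selects its row
table = model_test.layers[0].get_weights()[0]
print(table.shape)  # (11, 2)
print(table[1])     # equals pre[0][0], the vector looked up for index 1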

 2. Shape requirements

        When fed into Embedding, x_train must have the shape:

        [number of samples, number of unrolled time steps of the recurrent cell]
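
For example, five single-letter samples with one time step each would be reshaped like this (a minimal sketch of the shape rule, with made-up indices):

import numpy as np

x_train = [0, 1, 2, 3, 4]              # five samples, one index each
x_train = np.reshape(x_train, (5, 1))  # [samples, time steps] = (5, 1)
print(x_train.shape)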

Example: predicting the next letter from an input letter

import os

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Embedding, SimpleRNN

input_word = "abcde"
w_to_id = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}

x_train = [w_to_id["a"], w_to_id["b"], w_to_id["c"], w_to_id["d"], w_to_id["e"]]
y_train = [w_to_id["b"], w_to_id["c"], w_to_id["d"], w_to_id["e"], w_to_id["a"]]

# shuffle features and labels with the same seed so pairs stay aligned
np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)

# reshape to [number of samples, time steps], here one step per sample
x_train = np.reshape(x_train, (len(x_train), 1))
y_train = np.array(y_train)

model = tf.keras.models.Sequential([
    Embedding(5, 2),                    # 5-letter vocabulary, 2-dim vectors
    SimpleRNN(3),                       # 3 recurrent units
    Dense(5, activation=tf.nn.softmax)  # probability over the 5 letters
])

model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=["sparse_categorical_accuracy"]
              )
checkpoint_save_path = "./checkpoint/embedding/embedding.ckpt"

if os.path.exists(checkpoint_save_path + ".index"):
    model.load_weights(checkpoint_save_path)
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_best_only=True,
                                                 save_weights_only=True,
                                                 monitor="loss")

history = model.fit(x_train, y_train, batch_size=128, epochs=5, callbacks=[cp_callback])
model.summary()
file = open("./weigths.txt","w")
for v in model.trainable_variables:
    file.write(str(v.name) + '\n')
    file.write(str(v.shape) + '\n')
    file.write(str(v.numpy()) + '\n')
file.close()

acc = history.history["sparse_categorical_accuracy"]
loss = history.history["loss"]
plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Accuracy')
plt.title('Training Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.title('Training Loss')
plt.legend()
plt.show()

preNum = int(input("input the number of test alphabet:"))
for i in range(preNum):
    alphabet1 = input("input test alphabet:")
    alphabet = [w_to_id[alphabet1]]
    alphabet = np.reshape(alphabet, (1, 1))  # [1 sample, 1 time step]
    result = model.predict(alphabet)
    pred = tf.argmax(result, axis=1)  # index of the most probable letter
    pred = int(pred)
    tf.print(alphabet1 + '->' + input_word[pred])
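
Because ModelCheckpoint was configured with save_weights_only=True, the checkpoint stores only weights, so restoring later means rebuilding the same architecture first. A minimal inference-only sketch, continuing from the script above and assuming the checkpoint file exists:

# rebuild the identical architecture, then load the saved weights
restored = tf.keras.models.Sequential([
    Embedding(5, 2),
    SimpleRNN(3),
    Dense(5, activation=tf.nn.softmax)
])
restored.load_weights(checkpoint_save_path)

result = restored.predict(np.reshape([w_to_id["a"]], (1, 1)))
print(input_word[int(tf.argmax(result, axis=1))])  # the predicted next letter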
