Artificial Intelligence - Deep Learning: Recurrent Neural Networks - CSDN Blog

Article link: https://blog.csdn.net/m0_51678367/article/details/147646087

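Recurrent Neural Networks

Tasks
Sequence models
Recurrent neural networks
Common RNN structures
Limitations of the plain RNN structure
Long short-term memory (LSTM)
Bidirectional recurrent neural networks (BRNN)
Deep recurrent neural networks (DRNN)
Hands-on practice
Summary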

Tasks

Task: automatically write an article
Given title: what is flare
AI-generated article: “A flare, also sometimes called a fusee, is a type of pyrotechnic that produces a bright light or intense heat without an explosion. Flares are used for distress signaling, illumination, or defensive countermeasures in civilian and military applications … (data source: http://ai-writer.com/)”
Task: automatically find person names in sentences
1. The courses are taught by Flare Zhao and David Chen.
2. How long does a flare last in the air?
Prediction based on the text content and the context before and after it
Task: object position prediction
Where is the ball at T = 10 s?
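Prediction based on the object's states at different moments
Task: stock price prediction
Predict the next day's stock price from historical prices

Prediction based on the historical information in the data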

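Sequence models

Models whose input or output contains sequence data
They emphasize the order relationships within the data
Two key characteristics:
1. The elements of the input (output) are ordered, and a different order should give a different result. For example, the Chinese phrases “不吃饭” (“not eating”) and “吃饭不” (roughly “eating or not?”) mean different things.
2. The input and output lengths are variable, as in article generation or chatbots.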
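A small sketch of why order matters: a recurrence that carries a state from step to step gives different results for a sequence and for its reverse, while an order-blind summary such as a plain sum does not. The scalar weights w_a and w_x are arbitrary values assumed for illustration.

```python
import numpy as np

def recurrent_state(seq, w_a=0.5, w_x=1.0):
    """Scalar RNN-style recurrence a_i = tanh(w_a * a_(i-1) + w_x * x_i); returns the last state."""
    a = 0.0
    for x in seq:
        a = np.tanh(w_a * a + w_x * x)
    return a

def bag_sum(seq):
    """Order-blind summary: the plain sum of the elements."""
    return float(np.sum(seq))

seq = [1.0, 0.0, -1.0]
rev = seq[::-1]
print(bag_sum(seq), bag_sum(rev))                  # same value
print(recurrent_state(seq), recurrent_state(rev))  # different values
```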

Recurrent neural networks

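RNN: information from the earlier part of the sequence is processed and then passed, as input, to the later part of the sequence.
Mathematical formulation: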
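A standard way to write this recurrence, using this post's notation a_i (hidden state passed forward) and x_i (input at step i), is given below. The weight names W_aa, W_ax, W_ya and the biases b_a, b_y are assumed notation, not the post's own symbols.

```latex
\begin{aligned}
a_0 &= \mathbf{0} \\
a_i &= g_1\!\left(W_{aa}\, a_{i-1} + W_{ax}\, x_i + b_a\right) \\
\hat{y}_i &= g_2\!\left(W_{ya}\, a_i + b_y\right) \\
\mathcal{L} &= \sum_{i} \mathcal{L}_i\!\left(\hat{y}_i,\, y_i\right)
\end{aligned}
```

Here g_1 is usually tanh or ReLU (the stock-price model later in the post uses ReLU), g_2 depends on the task (linear for a price, softmax for choosing the next character), and the same W and b are shared by every time step.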

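Converting words to numbers:
Build a dictionary that maps each word to a unique number, then convert the input words into a numeric matrix.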
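A minimal sketch of the word-to-number dictionary and the resulting matrix, assuming one-hot encoding as the numeric form (the character-level example later in the post also uses one-hot vectors). The helper names build_vocab and to_one_hot are hypothetical, and sorting the vocabulary is an added choice so that the mapping is the same on every run.

```python
import numpy as np

def build_vocab(words):
    """Map each distinct word to a unique integer (sorted for a stable order)."""
    return {w: i for i, w in enumerate(sorted(set(words)))}

def to_one_hot(words, vocab):
    """Turn a word sequence into a (sequence_length, vocab_size) one-hot matrix."""
    ids = [vocab[w] for w in words]
    return np.eye(len(vocab))[ids]

sentence = "the courses are taught by flare zhao".split()
vocab = build_vocab(sentence)
matrix = to_one_hot(sentence, vocab)
print(vocab)
print(matrix.shape)  # (7, 7): 7 words, 7 distinct words
```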

The model parameters are updated according to the loss value.
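As a sketch of the forward pass and the loss, assume a hidden state a_i = tanh(W_aa a_(i-1) + W_ax x_i + b_a), a linear output y_i = W_ya a_i + b_y, a mean-squared-error loss, and small random weights. The function names rnn_forward and mse_loss are hypothetical. Training then computes the gradient of this loss with respect to the shared weights (backpropagation through time) and updates them; frameworks such as Keras perform that step, so it is not shown here.

```python
import numpy as np

def rnn_forward(xs, W_aa, W_ax, W_ya, b_a, b_y):
    """Run a_i = tanh(W_aa a_(i-1) + W_ax x_i + b_a) and y_i = W_ya a_i + b_y over a sequence."""
    a = np.zeros(W_aa.shape[0])            # a_0 = 0
    ys = []
    for x in xs:                           # the same weights are reused at every step
        a = np.tanh(W_aa @ a + W_ax @ x + b_a)
        ys.append(W_ya @ a + b_y)
    return np.array(ys)

def mse_loss(y_pred, y_true):
    """Mean squared error: the value that training tries to reduce."""
    return float(np.mean((y_pred - y_true) ** 2))

rng = np.random.default_rng(0)
n_a, n_x, n_y, T = 5, 1, 1, 8              # 5 hidden units, 1 feature, 8 time steps
params = dict(
    W_aa=rng.normal(scale=0.1, size=(n_a, n_a)),
    W_ax=rng.normal(scale=0.1, size=(n_a, n_x)),
    W_ya=rng.normal(scale=0.1, size=(n_y, n_a)),
    b_a=np.zeros(n_a),
    b_y=np.zeros(n_y),
)
xs = rng.normal(size=(T, n_x))
ys_true = rng.normal(size=(T, n_y))
ys_pred = rnn_forward(xs, **params)
print(ys_pred.shape, mse_loss(ys_pred, ys_true))
```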

Common RNN structures

Structure 1: many-to-many RNN, with input and output of the same length

Input: x1, x2, …, xi
Output: y1, y2, …, yi
Application: recognizing specific information (such as person names)

Structure 2: many-to-one RNN

Input: x1, x2, …, xi
Output: y
Application: sentiment recognition
Example: input “I feel happy watching the movie”, output “happy”

Structure 3: one-to-many RNN

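Input: x
Output: y1, y2, …, yi
Application: sequence data generation
Examples: article generation, music generation

Structure 4: many-to-many RNN, with input and output of different lengths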

Input: x1, x2, …, xi
Output: y1, y2, …, yj
Application: machine translation
what is artificial intelligence? --> 什么是人工智能？
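The four structures differ mainly in how many inputs are fed in and which hidden states are read out. The NumPy sketch below runs one untrained tanh RNN and shows the shape each structure produces. The weights are random, and W_back (which feeds a state back in as the next input) and W_dec (a separate decoder) are hypothetical additions for structures 3 and 4.

```python
import numpy as np

def rnn_states(xs, W_aa, W_ax, b_a, a0=None):
    """Hidden state after every step of a tanh RNN, shape (T, n_a)."""
    a = np.zeros(W_aa.shape[0]) if a0 is None else a0
    states = []
    for x in xs:
        a = np.tanh(W_aa @ a + W_ax @ x + b_a)
        states.append(a)
    return np.array(states)

rng = np.random.default_rng(1)
n_a, n_x, T = 4, 3, 6
W_aa = rng.normal(scale=0.3, size=(n_a, n_a))
W_ax = rng.normal(scale=0.3, size=(n_a, n_x))
b_a = np.zeros(n_a)
xs = rng.normal(size=(T, n_x))
states = rnn_states(xs, W_aa, W_ax, b_a)

# Structure 1 (many-to-many, same length): one output per input step
many_to_many = states                        # shape (6, 4)
# Structure 2 (many-to-one): keep only the final state
many_to_one = states[-1]                     # shape (4,)

# Structure 3 (one-to-many): one input, then feed each state back in as the next input;
# W_back is a hypothetical projection from state size to input size
W_back = rng.normal(scale=0.3, size=(n_x, n_a))
x, a, generated = xs[0], None, []
for _ in range(5):
    a = rnn_states([x], W_aa, W_ax, b_a, a0=a)[-1]
    generated.append(a)
    x = W_back @ a
one_to_many = np.array(generated)            # shape (5, 4)

# Structure 4 (many-to-many, different lengths): an encoder reads all 6 inputs, then a
# decoder with its own hypothetical weights W_dec emits 3 steps starting from the final state
W_dec = rng.normal(scale=0.3, size=(n_a, n_a))
decoded = rnn_states(np.zeros((3, n_a)), W_dec, np.eye(n_a), b_a, a0=many_to_one)  # shape (3, 4)
```

In Keras, SimpleRNN(..., return_sequences=True) returns the output at every step, while the default return_sequences=False returns only the last one; the stock-price model later in this post uses the default, which makes it a many-to-one structure.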

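Limitations of the plain RNN structure

1. As information from the early part of the sequence is passed to later steps, its weight keeps decreasing, so important information can be lost.
2. Gradients vanish during training.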
We therefore need to increase the weight that specific earlier information carries in later decisions.
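To see the vanishing gradient concretely, take the scalar recurrence a_i = tanh(w * a_(i-1) + x_i). By the chain rule, the gradient of the last state with respect to the first is a product of one factor w * (1 - a_i^2) per step. With the assumed value w = 0.5, every factor has magnitude at most 0.5, so the product shrinks at least geometrically with sequence length. The function name grad_through_time is hypothetical.

```python
import numpy as np

def grad_through_time(xs, w=0.5):
    """d a_T / d a_0 for the scalar recurrence a_i = tanh(w * a_(i-1) + x_i), with a_0 = 0."""
    a, grad = 0.0, 1.0
    for x in xs:
        a = np.tanh(w * a + x)
        grad *= w * (1.0 - a ** 2)   # chain rule: d a_i / d a_(i-1); |factor| <= |w| since 0 < 1 - a**2 <= 1
    return grad

rng = np.random.default_rng(0)
xs = rng.normal(size=50)
for T in (1, 5, 20, 50):
    print(T, grad_through_time(xs[:T]))
```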

Long short-term memory (LSTM)

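In a plain RNN, information from earlier in the sequence is passed along only through a_i, and the farther back it is, the more of it is lost.

LSTM adds a memory cell c_i that can carry information from far back in the sequence.

Summary (simplified):
Compared with a_i, the memory cell c_i focuses on recording the important information from earlier in the sequence, and little of it is lost as it is passed along.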
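Forget gate: based on a_(i-1) and x_i, selectively discards unimportant information from the memory cell
Update gate: determines which new information is added to the memory cell
Output gate: selects the information to output as a_i

1. Important information is retained even over long sequences, where the unrolled network is very deep (many time steps)
2. It greatly alleviates the vanishing-gradient problem that a plain RNN suffers from during training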
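The gates described above can be made concrete with one LSTM time step in NumPy. This is a sketch of a common formulation, not the post's own code: the gates read the concatenation [a_(i-1), x_i], and the weight and bias names (W_f, W_u, W_c, W_o, b_f, ...) are assumed. Keras's LSTM layer follows the same idea but calls the update gate the input gate. The last part shows why the cell can keep information: with the forget gate near 1 and the update gate near 0, c_i stays almost equal to c_(i-1).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, a_prev, c_prev, p):
    """One LSTM step; p holds W_f, W_u, W_c, W_o (each n_a x (n_a + n_x)) and b_f, b_u, b_c, b_o."""
    z = np.concatenate([a_prev, x])              # [a_(i-1), x_i]
    forget = sigmoid(p["W_f"] @ z + p["b_f"])    # forget gate: how much of c_(i-1) to keep
    update = sigmoid(p["W_u"] @ z + p["b_u"])    # update gate: how much new content to write
    c_tilde = np.tanh(p["W_c"] @ z + p["b_c"])   # candidate content for the memory cell
    output = sigmoid(p["W_o"] @ z + p["b_o"])    # output gate: how much of the cell to expose
    c = forget * c_prev + update * c_tilde       # new memory cell c_i
    a = output * np.tanh(c)                      # new hidden state a_i
    return a, c

rng = np.random.default_rng(0)
n_a, n_x = 3, 2
p = {}
for g in ("f", "u", "c", "o"):
    p["W_" + g] = rng.normal(scale=0.5, size=(n_a, n_a + n_x))
    p["b_" + g] = np.zeros(n_a)

a, c = np.zeros(n_a), np.zeros(n_a)
for x in rng.normal(size=(4, n_x)):              # run 4 time steps
    a, c = lstm_step(x, a, c, p)
print(a, c)

# With the forget gate pushed to ~1 and the update gate to ~0,
# the memory cell is carried to the next step almost unchanged.
keep = dict(p, b_f=np.full(n_a, 20.0), b_u=np.full(n_a, -20.0))
c_prev = np.array([0.9, -0.4, 0.2])
_, c_next = lstm_step(np.ones(n_x), np.zeros(n_a), c_prev, keep)
print(c_next)   # approximately equal to c_prev
```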

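Bidirectional recurrent neural networks (BRNN)

Deciding whether “flare” is a person's name:
1. The courses are taught by Flare Zhao and David Chen.
2. How long does a flare last in the air?
When making the decision, information from later in the sequence is taken into account as well (for example, the “Zhao” that follows “Flare”).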
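A minimal NumPy sketch of the bidirectional idea: one RNN reads the sequence left to right, a second RNN with its own weights reads it right to left, and the two hidden states are concatenated at each step, so the representation of an early word such as “Flare” also reflects later words such as “Zhao”. The weights are random and the helper names (rnn_states, brnn_states, init_params) are hypothetical.

```python
import numpy as np

def rnn_states(xs, W_aa, W_ax, b_a):
    """Hidden state after each step of a simple tanh RNN, shape (T, n_a)."""
    a = np.zeros(W_aa.shape[0])
    out = []
    for x in xs:
        a = np.tanh(W_aa @ a + W_ax @ x + b_a)
        out.append(a)
    return np.array(out)

def brnn_states(xs, fwd, bwd):
    """Concatenate a left-to-right pass with a right-to-left pass at every step."""
    forward = rnn_states(xs, *fwd)
    backward = rnn_states(xs[::-1], *bwd)[::-1]   # re-align so that row i refers to step i
    return np.concatenate([forward, backward], axis=1)

def init_params(rng, n_x, n_a):
    """Random weights for one direction (hypothetical initialization)."""
    return (rng.normal(scale=0.5, size=(n_a, n_a)),
            rng.normal(scale=0.5, size=(n_a, n_x)),
            np.zeros(n_a))

rng = np.random.default_rng(0)
n_a, n_x, T = 3, 2, 5
fwd, bwd = init_params(rng, n_x, n_a), init_params(rng, n_x, n_a)
xs = rng.normal(size=(T, n_x))
h = brnn_states(xs, fwd, bwd)
print(h.shape)  # (5, 6)
```

In Keras, wrapping a recurrent layer in keras.layers.Bidirectional builds the same forward/backward pair.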

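Deep recurrent neural networks (DRNN)

To handle more complex sequence tasks, single-layer RNNs can be stacked on top of each other, or combined with an ordinary MLP (fully connected layers) before the output.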
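A sketch of both ideas in NumPy, under assumed sizes (two RNN layers with 8 and 4 units, then a 16-unit ReLU layer and one output): the full output sequence of the first RNN layer becomes the input sequence of the second, and an MLP head reads the top layer's last state. The weights are random and the helper names are hypothetical.

```python
import numpy as np

def rnn_layer(xs, W_aa, W_ax, b_a):
    """One tanh RNN layer that returns its hidden state at every step, shape (T, n_a)."""
    a = np.zeros(W_aa.shape[0])
    out = []
    for x in xs:
        a = np.tanh(W_aa @ a + W_ax @ x + b_a)
        out.append(a)
    return np.array(out)

def init_layer(rng, n_in, n_a):
    """Hypothetical random initialization for one layer."""
    return (rng.normal(scale=0.3, size=(n_a, n_a)),
            rng.normal(scale=0.3, size=(n_a, n_in)),
            np.zeros(n_a))

rng = np.random.default_rng(0)
T, n_x = 7, 1
layers = [init_layer(rng, n_x, 8), init_layer(rng, 8, 4)]   # two stacked RNN layers
h = rng.normal(size=(T, n_x))
for params in layers:
    h = rnn_layer(h, *params)       # one layer's whole output sequence is the next layer's input

# Small MLP head on the last state of the top layer
W1, b1 = rng.normal(scale=0.3, size=(16, 4)), np.zeros(16)
W2, b2 = rng.normal(scale=0.3, size=(1, 16)), np.zeros(1)
hidden = np.maximum(0.0, W1 @ h[-1] + b1)   # ReLU layer
y = W2 @ hidden + b2
print(h.shape, y.shape)   # (7, 4) (1,)
```

The corresponding Keras stack would be SimpleRNN(8, return_sequences=True), SimpleRNN(4), Dense(16, activation='relu'), Dense(1); every RNN layer except the last needs return_sequences=True so that it passes on a full sequence.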

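Hands-on practice

Practice (1)

Task: based on the zgpa_train.csv data, build an RNN model to predict the stock price
1. Preprocess the data, converting the sequence data into a form that can be fed to an RNN
2. Make predictions on the new data zgpa_test.csv and visualize the results
3. Save the prediction results and inspect the predictions over a local range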
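Model structure: a single RNN layer with 5 neurons (units), followed by a one-neuron output layer
Each sample uses the previous 8 values to predict the 9th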

import numpy as np

# Extract sequence data with a sliding window:
def extract_data(data,slide):
	x=[]
	y=[]
	for i in range(len(data)-slide):
		x.append([a for a in data[i:i+slide]])
		# or: x.append(data[i:i+slide])
		y.append(data[i+slide])
	x = np.array(x)
	# (samples, slide) -> (samples, slide, 1)
	x = x.reshape(x.shape[0],x.shape[1],1)
	return x, np.array(y)

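In this code:

	x = np.array(x)
	x = x.reshape(x.shape[0],x.shape[1],1)

np.array(x) turns the list of windows into a 2-D array of shape (samples, slide), and reshape then adds a trailing feature axis of size 1, giving the 3-D shape (samples, slide, 1) that Keras RNN layers expect as input.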

# Build a plain RNN model:
from keras.models import Sequential
from keras.layers import Dense,SimpleRNN
model = Sequential()
# Add an RNN layer (X is the 3-D array returned by extract_data):
model.add(SimpleRNN(units=5,input_shape=(X.shape[1],X.shape[2]),activation='relu')) # the samples dimension is inferred automatically
# Add the output layer
model.add(Dense(units=1,activation='linear'))
model.compile(optimizer='adam',loss='mean_squared_error')

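Input data shape: (samples, time_steps, features); the input_shape argument itself is only (time_steps, features)
samples: the number of samples (inferred by the model from the input data, so it is not part of input_shape)
time_steps: the sequence length, i.e., how many consecutive values are used to predict one output
features: the feature dimension of each time step ([0,0,1] has 3)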

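Suppose there are 100 samples, each using 8 consecutive prices to predict the 9th, and each price is a single number; the input data must then have shape (100, 8, 1).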
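A vectorized sketch of the same sliding-window extraction confirms the shape arithmetic: a series of N values yields N - time_step samples, so 108 prices are needed to get exactly the (100, 8, 1) input described above. The name extract_windows is hypothetical and the prices are made up.

```python
import numpy as np

def extract_windows(series, time_step):
    """Sliding windows of length time_step; the value right after each window is its target."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - time_step
    X = np.array([series[i:i + time_step] for i in range(n_samples)])
    y = series[time_step:]                        # y[i] == series[i + time_step]
    return X.reshape(n_samples, time_step, 1), y  # (samples, time_steps, features)

prices = np.arange(108, dtype=float)   # 108 made-up prices
X, y = extract_windows(prices, time_step=8)
print(X.shape, y.shape)   # (100, 8, 1) (100,)
print(X[0, :, 0], y[0])   # the first window holds 0..7, its target is 8
```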

Complete code:

import pandas as pd
import numpy as np
data = pd.read_csv('zgpa_train.csv')
data.head()
price = data.loc[:,'close'] # read the closing prices
price.head()
# Normalize by dividing by the maximum closing price
price_norm = price/max(price)
print(price_norm)

%matplotlib inline
from matplotlib import pyplot as plt
fig1 = plt.figure(figsize=(8,5))
plt.plot(price)
plt.title('close price')
plt.xlabel('time')
plt.ylabel('price')
plt.show()

#define the X and y
#define method to extract X and y
def extract_data(data,time_step):
	data = np.asarray(data)  # positional indexing works the same for a pandas Series or a list
	X = []
	y = []
	# e.g. 10 points 0,1,...,9 with time_step=8 give len(data)-time_step = 2 samples: inputs 0..7 (target 8) and 1..8 (target 9)
	for i in range(len(data)-time_step):
		X.append([a for a in data[i:i+time_step]])
		y.append(data[i+time_step])
	X = np.array(X)
	X = X.reshape(X.shape[0],X.shape[1],1)
	return X, np.array(y)
time_step = 8
#define X and y 
X, y = extract_data(price_norm,time_step)
##print(X)
##print(X.shape)
##print(X[0,:,:])
##print(y)

#set up the model
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN
model = Sequential()
#add RNN layer
model.add(SimpleRNN(units=5,input_shape=(time_step,1),activation='relu'))
#add output layer
model.add(Dense(units=1,activation='linear'))
#configure the model
model.compile(optimizer='adam',loss='mean_squared_error')
model.summary()

#train the model
model.fit(X,y,batch_size=30,epochs=200)
#make prediction based on training data
y_train_predict = model.predict(X)*max(price)
y_train = [i*max(price) for i in y]
##print(y_train_predict, y_train)
fig2 = plt.figure(figsize=(8,5))
plt.plot(y_train,label='real price')
plt.plot(y_train_predict,label='predict price')
plt.legend()
plt.title('close price')
plt.xlabel('time')
plt.ylabel('price')
plt.show()
#load test data
data_test = pd.read_csv('zgpa_test.csv')
data_test.head()
price_test = data_test.loc[:,'close']
price_test.head()
price_test_norm = price_test/max(price)  # normalize with the training-set maximum, as for the training data
#extract X_test and y_test
X_test_norm, y_test_norm = extract_data(price_test_norm,time_step)
print(X_test_norm.shape,len(y_test_norm))

#make prediction based on the test data
y_test_predict = model.predict(X_test_norm)*max(price)
y_test = [i*max(price) for i in y_test_norm]
fig3 = plt.figure(figsize=(8,5))
plt.plot(y_test,label='real price_test')
plt.plot(y_test_predict,label='predict price_test')
plt.legend()
plt.title('close price')
plt.xlabel('time')
plt.ylabel('price')
plt.show()

# Save the predictions
result_y_test = np.array(y_test).reshape(-1,1)
result_y_test_predict = y_test_predict
##print(result_y_test.shape,result_y_test_predict.shape)
result = np.concatenate((result_y_test,result_y_test_predict),axis=1)
##print(result.shape)
result = pd.DataFrame(result,columns=['real_price_test','predict_price_test'])
result.to_csv('zgpa_predict_test.csv')

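Practice (2): automatic text generation with LSTM

Task: based on the flare text data, build an LSTM model that predicts the next characters of a sequence
1. Preprocess the data, converting the character sequence into data that can be fed to an LSTM
2. Inspect the structure of the preprocessed text data and split it into training and test sets
3. For the input string (“flare is a teacher in ai industry.He obtained his phd in Australia.”), predict the characters that follow
Model structure:
A single LSTM layer with 20 neurons (units)
Each sample uses the previous 20 characters to predict the 21st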

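# Load the text
data = open('flare').read()
# Build the character dictionaries
# Remove duplicate characters
letters = list(set(data))
# Index dictionary from integer to character
int_to_char = {a:b for a,b in enumerate(letters)}
# Index dictionary from character to integer
char_to_int = {b:a for a,b in enumerate(letters)}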
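One caveat with list(set(data)): Python randomizes string hashing per process by default, so the iteration order of a set of characters, and therefore the integer assigned to each character, can change between runs. A trained model is then only usable with the dictionaries from the same session. The sketch below swaps in sorted(set(...)) for a stable order (a substitution, not the post's code) and shows the 20-character windows and their one-hot encoding on a short made-up text.

```python
import numpy as np

text = "flare is a teacher in ai industry."
letters = sorted(set(text))                      # sorted: the same order in every run
int_to_char = {i: ch for i, ch in enumerate(letters)}
char_to_int = {ch: i for i, ch in enumerate(letters)}

time_step = 20
ids = [char_to_int[ch] for ch in text]
X_ids = np.array([ids[i:i + time_step] for i in range(len(ids) - time_step)])
y_ids = np.array(ids[time_step:])                # the character right after each window
X = np.eye(len(letters))[X_ids]                  # one-hot: (samples, time_step, num_letters)
print(len(letters), X.shape, y_ids.shape)
print("".join(int_to_char[i] for i in X_ids[0]), "->", int_to_char[y_ids[0]])
```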


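Complete code

Source code V1 (raises errors). The main problems, which V2 below corrects: extract_data fills a list named x but then converts an undefined X; there are syntax errors (int enumerate, a full-width comma in train_test_split, a stray ] in a list comprehension); model.predict_classes is no longer available in recent Keras/TensorFlow versions; the new sentence is never preprocessed, because data is passed to data_preprocessing instead of new_letters; and the probabilities returned by model.predict are used as dictionary keys without argmax.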

#load the data
data = open('flare').read()
# Remove newline characters
data = data.replace('\n','').replace('\r','')
print(data)
# Remove duplicate characters
letters = list(set(data))
print(letters)
num_letters = len(letters)
print(num_letters)
# Build the dictionaries
#int to char
int_to_char = {a:b for a,b in enumerate(letters)}
##print(int_to_char)
#char to int
char_to_int = {b:a for a,b int enumerate(letters)}
##print(char_to_int)
#time_step
time_step = 20

# Batch processing
import numpy as np
from keras.utils import to_categorical
# Extract data with a sliding window
def extract_data(data, slide):
    x = []
    y = []
    for i in range(len(data) - slide):
        x.append([a for a in data[i:i+slide]])
        y.append(data[i+slide])
    X = np.array(X)
    X = X.reshape(X.shape[0],X.shape[1],1)
    return X, y

# Batch conversion from characters to integers
def char_to_int_Data(x,y, char_to_int):
    x_to_int = []
    y_to_int = []
    for i in range(len(x)):
        x_to_int.append([char_to_int[char] for char in x[i]])
        y_to_int.append([char_to_int[char] for char in y[i]])
    return x_to_int, y_to_int

# Batch-process an input text: takes the whole text, the sliding-window size and the conversion dictionary
def data_preprocessing(data, slide, num_letters, char_to_int):
    char_Data = extract_data(data, slide)
    int_Data = char_to_int_Data(char_Data[0], char_Data[1], char_to_int)
    Input = int_Data[0]
    Output = list(np.array(int_Data[1]).flatten())
    Input_RESHAPED = np.array(Input).reshape(len(Input), slide)
    new = np.random.randint(0,10,size=[Input_RESHAPED.shape[0],Input_RESHAPED.shape[1],num_letters])
    for i in range(Input_RESHAPED.shape[0]):
        for j in range(Input_RESHAPED.shape[1]):
            new[i,j,:] = to_categorical(Input_RESHAPED[i,j],num_classes=num_letters)
    return new, Output

#extract X and y from text data
X, y = data_preprocessing(data,time_step,num_letters,char_to_int)
##print(X.shape)
##print(len(y))

#split the data
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.1，random_state=10)
##print(X.shape,len(y))

y_train_category = to_categorical(y_train,num_letters)
print(y_train_category)

#set up the model
from keras.models import Sequential
from keras.layers import Dense,LSTM

model = Sequential()
model.add(LSTM(units=20,input_shape=(X_train.shape[1],X_train.shape[2]),activation='relu'))
model.add(Dense(units=num_letters,activation='softmax'))
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])
model.summary()
#train the model
model.fit(X_train,y_train_category,batch_size=1000,epochs=5)

#make prediction based on the training data
y_train_predict = model.predict_classes(X_train)
print(y_train_predict)

#transform the int to letters
y_train_predict_char = [int_to_char[i] for i in y_train_predict]
print(y_train_predict_char)

from sklearn.metrics import accuracy_score
accuracy_train = accuracy_score(y_train,y_train_predict)
print(accuracy_train)

y_test_predict = model.predict_classes(X_test)
y_test_predict_char = [int_to_char[i]] for i in y_test_predict]
accuracy_test = accuracy_score(y_test,y_test_predict)
print(accuracy_test)
print(y_test_predict)
print(y_test)

new_letters = 'flare is a teacher in ai industry.He obtained his phd in Australia'
X_new, y_new = data_preprocessing(data,time_step,num_letters,char_to_int)
y_new_predict = model.predict(X_new)
print(y_new_predict)

#transform the int to letters
y_new_predict_char = [int_to_char[i] for i in y_new_predict]
print(y_new_predict_char)

for i in range(0,X_new.shape[0]-20):
	print(new_letters[i:i+20],'--predict next letters is--',y_new_predict_char[i])

Revised version V2:

import numpy as np
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense, LSTM
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the data
data = open('flare.txt').read()
# Remove newlines
data = data.replace('\n', '').replace('\r', '')
print(data)

# Character deduplication
letters = list(set(data))
print(letters)
num_letters = len(letters)
print(num_letters)

# Create dictionaries
int_to_char = {a: b for a, b in enumerate(letters)}
char_to_int = {b: a for a, b in enumerate(letters)}

# Time step
time_step = 20

# Sliding window data extraction
def extract_data(data, slide):
    x = []
    y = []
    for i in range(len(data) - slide):
        x.append([a for a in data[i:i + slide]])
        y.append(data[i + slide])  # y should be a single character
    return x, y

# Character to integer batch conversion
def char_to_int_Data(x, y, char_to_int):
    x_to_int = []
    y_to_int = []
    for i in range(len(x)):
        x_to_int.append([char_to_int[char] for char in x[i]])
        y_to_int.append(char_to_int[y[i]])  # y[i] is a single character
    return x_to_int, y_to_int

# Implement batch processing for input text
def data_preprocessing(data, slide, num_letters, char_to_int):
    char_Data = extract_data(data, slide)
    int_Data = char_to_int_Data(char_Data[0], char_Data[1], char_to_int)
    Input = int_Data[0]
    Output = int_Data[1]  # Output is already a list of integers
    Input_RESHAPED = np.array(Input).reshape(len(Input), slide, 1)
    new = np.zeros((Input_RESHAPED.shape[0], Input_RESHAPED.shape[1], num_letters))
    for i in range(Input_RESHAPED.shape[0]):
        for j in range(Input_RESHAPED.shape[1]):
            new[i, j, :] = to_categorical(Input_RESHAPED[i, j], num_classes=num_letters)
    return new, Output

# Extract X and y from text data
X, y = data_preprocessing(data, time_step, num_letters, char_to_int)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=10)

# Convert y_train to categorical
y_train_category = to_categorical(y_train, num_letters)
print(y_train_category)

# Set up the model
model = Sequential()
model.add(LSTM(units=20, input_shape=(X_train.shape[1], X_train.shape[2]), activation='relu'))
model.add(Dense(units=num_letters, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

# Train the model
model.fit(X_train, y_train_category, batch_size=1000, epochs=3)

# Make predictions based on the training data
y_train_predict = np.argmax(model.predict(X_train), axis=1)
print(y_train_predict)

# Transform the int to letters
y_train_predict_char = [int_to_char[i] for i in y_train_predict]
print(y_train_predict_char)

# Calculate accuracy
accuracy_train = accuracy_score(y_train, y_train_predict)
print(accuracy_train)

# Predict on test data
y_test_predict = np.argmax(model.predict(X_test), axis=1)
y_test_predict_char = [int_to_char[i] for i in y_test_predict]
accuracy_test = accuracy_score(y_test, y_test_predict)
print(accuracy_test)
print(y_test_predict)
print(y_test)

# New data for prediction
new_letters = 'flare is a teacher in ai industry. He obtained his phd in Australia'
X_new, y_new = data_preprocessing(new_letters, time_step, num_letters, char_to_int)
y_new_predict = np.argmax(model.predict(X_new), axis=1)
y_new_predict_char = [int_to_char[i] for i in y_new_predict]
print(y_new_predict_char)

# Print predictions
for i in range(0, X_new.shape[0]):
    print(new_letters[i:i + time_step], '--predict next letter is--', y_new_predict_char[i])
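Task item 3 asks for the characters that follow the input string. The V2 loop predicts one next character for each 20-character window of that string; to continue the text beyond its end, each prediction can be appended and fed back in. The function below is a hypothetical sketch of greedy generation; predict_fn stands for any function that maps a one-hot batch of shape (1, time_step, num_letters) to next-character probabilities of shape (1, num_letters).

```python
import numpy as np

def generate_text(seed, n_chars, predict_fn, char_to_int, int_to_char, time_step=20):
    """Greedily extend `seed` by n_chars characters, feeding each prediction back in."""
    if len(seed) < time_step:
        raise ValueError("seed must contain at least time_step characters")
    num_letters = len(char_to_int)
    text = seed
    for _ in range(n_chars):
        ids = [char_to_int[ch] for ch in text[-time_step:]]   # last time_step characters
        batch = np.eye(num_letters)[ids][np.newaxis, ...]      # (1, time_step, num_letters)
        probs = predict_fn(batch)[0]                           # next-character probabilities
        text += int_to_char[int(np.argmax(probs))]             # greedy: most likely character
    return text
```

With the trained V2 model this could be called as generate_text(new_letters[:20], 50, lambda b: model.predict(b, verbose=0), char_to_int, int_to_char). Every character of the seed must occur in the training text, otherwise char_to_int raises a KeyError. Greedy decoding tends to repeat itself; sampling from probs instead of taking the argmax is a common alternative.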