Preface
When studying machine learning and artificial intelligence, many algorithms are used simply by calling wrapped functions from libraries such as PyTorch and consuming their return values. Calling and memorizing APIs blindly leads to knowing *that* without knowing *why*: you know some method should be called at a certain point, but not what it actually does. Here we hand-build, from scratch, a piece of code that expresses the core ideas of deep learning, together with the reasoning behind its construction, for reference.
Note: this model has not been rigorously tuned. With the parameters as given it will not train any particular model precisely; further analysis and tuning would be required, so the model has little practical value.
It is meant only to help beginners get started quickly. Algorithms such as matrix multiplication and backpropagation are simplified, which hurts model performance considerably, but makes the overall idea easier to follow.
A Rough Overview of How Machine Learning Works
Note: this is only a rough sketch.
Machine learning essentially tries to make a machine represent a transformation from one object to another. Matrices are a mathematical object particularly well suited to describing transformations: they can express multi-dimensional mappings, store a lot of information, and are easy to manipulate. Hence much of machine learning describes a complex transformation as a sequence of matrix transformations. How do we make the transformation more accurate? By adjusting the values inside each matrix. We call these matrices the parameter matrices.
So how is the transformation actually carried out?
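To make "a matrix is a transformation" concrete, here is a minimal standalone sketch in plain Python (no framework; the names `apply` and `scale_x` are purely illustrative and not part of this project):

```python
# A 2x2 matrix acting on a 2-D row vector: the matrix *is* the transformation.
def apply(vec, mat):
    # standard row-vector-times-matrix product
    return [sum(vec[k] * mat[k][j] for k in range(len(vec)))
            for j in range(len(mat[0]))]

scale_x = [[2.0, 0.0],
           [0.0, 1.0]]   # stretches the x coordinate by 2, leaves y alone

print(apply([3.0, 5.0], scale_x))  # -> [6.0, 5.0]
```

Changing the entries of `scale_x` changes what the transformation does; that is exactly the knob machine learning turns.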
Forward Propagation
Let the number of input features be m, so the input is a 1×m matrix. Let the number of hidden-layer features be n, so each hidden layer is a 1×n matrix, i.e. a vector. Let the number of output features be k.
What we want can be described as follows: multiply the initial m-dimensional vector by the first weight matrix, then by the second, the third, and so on, until we obtain the output matrix. That is the intuitive picture of forward propagation.
Building on forward propagation, our job is to adjust the values of the parameter matrices so that, for any input matrix, we obtain the desired target output matrix.
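The chain of multiplications described above can be sketched as follows (a toy, self-contained example with made-up sizes m=3, n=4, k=2, not the project's actual code):

```python
import random

# Forward pass as repeated matrix multiplication: a 1xm input, two nxn-sized
# hidden transformations, and a 1xk output.
m, n, k = 3, 4, 2

def matmul(a, b):
    # plain triple-loop matrix product
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

x = [[random.uniform(-1, 1) for _ in range(m)]]        # 1xm input vector
w1, w2, w3 = rand_matrix(m, n), rand_matrix(n, n), rand_matrix(n, k)

h = matmul(matmul(matmul(x, w1), w2), w3)              # chain of products
print(len(h), len(h[0]))  # -> 1 2  (a 1xk output)
```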
Gradient Descent
Here we borrow the partial derivative from calculus: take the partial derivative of the loss function with respect to the output matrix, i.e. compute the gradient at the output. Gradient descent uses the gradient of the objective with respect to the parameters to guide parameter updates, in the hope that iterating gradually minimizes the objective. So our task is to compute the gradient for each layer's matrix and let it guide that layer's update during training.
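As a one-dimensional illustration of the idea (a hypothetical toy function, not part of the project): minimize f(w) = (w - 3)² by repeatedly stepping against the gradient f'(w) = 2(w - 3):

```python
# Gradient descent on f(w) = (w - 3)^2: follow the negative gradient until
# w approaches the minimizer 3.
w, lr = 0.0, 0.1
for _ in range(100):
    grad = 2 * (w - 3)   # f'(w)
    w -= lr * grad       # step downhill
print(round(w, 4))  # -> 3.0
```

The multi-layer case works the same way, except that w is a collection of matrices and the gradient must be computed per layer.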
Backpropagation
Forward propagation lets us compute the hidden layers one by one, and finally the output matrix. Gradient descent most directly gives us the gradient of the last weight matrix; backpropagation then yields the gradient of every weight matrix, so that each hidden layer's weights can be updated.
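A minimal sketch of the chain rule at work, assuming plain linear layers and a squared-error loss (scalars stand in for the weight matrices here; this is not the simplified update used later in the project):

```python
# Backpropagation through two linear layers for a single scalar chain:
# y = w2 * (w1 * x), loss = (y - t)^2.  The chain rule hands the gradient
# backwards, layer by layer, from the output toward the input.
x, t = 0.5, 1.0          # input and target
w1, w2 = 0.3, 0.8        # the two "weight matrices", here scalars
lr = 0.1

for _ in range(200):
    h = w1 * x           # forward through layer 1
    y = w2 * h           # forward through layer 2
    dy = 2 * (y - t)     # dL/dy
    dw2 = dy * h         # dL/dw2
    dh = dy * w2         # error propagated back through layer 2
    dw1 = dh * x         # dL/dw1
    w1 -= lr * dw1
    w2 -= lr * dw2

print(round(w1 * x * w2, 3))  # close to the target 1.0
```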
Note: none of the above involves formulas or formal definitions; understanding the purpose of each step is enough.
In a machine-learning model we do not know what role each individual layer plays, and some of the matrix transposes that appear during backpropagation cannot be derived directly from first principles. Beginners are advised not to start by grinding through the mathematics.
What are we going to build?
We will build a small deep-learning project consisting of a matrix class, sample generation, the training code, and code that checks the trained model against fresh samples.
Project file layout
matrix_class.py  the matrix class
symbol_generator.py  sample-generation code
test_vector.py  test code
train_vector.py  training code
data/  data directory
data/goal_vector.txt  training data
data/record_vector.txt  training log
data/test_vector.txt  test data and results
data/Weight_vector.txt  weight matrices
The Code
The Matrix class: matrix_class.py
Repeated matrix multiplications can produce values too large for a float to handle gracefully, so the matrix class keeps the absolute value of every stored number below 1, rounded to 10 decimal places.
The imports: random for random-number generation and math for square roots.
import random
import math
Initializing the matrix class: the element type, the row and column counts, and the concrete element values, which are then randomly initialized.
class Matrix:
def __init__(self, line, column, type_of_member=float):
self.type_of_member = type_of_member
self.line = line
self.column = column
self.matrix = [[0.0 for _ in range(column)] for _ in range(line)]
for j in range(line):
for i in range(column):
self.matrix[j][i] = round(random.uniform(-1, 1), 10)
The set method changes a value in the matrix, clamping anything above 1 or below -1.
def set(self, line, column, value):
value_ = float(value)
assert (isinstance(value_, self.type_of_member)), f"Type error! with {value} is not {self.type_of_member}"
if value_ > 1:
value_ = 1
if value_ < -1:
value_ = -1
assert (-1 <= value_ <= 1), f"Value error! value={value_}>1 or <-1"
self.matrix[line][column] = value_
The mul method multiplies two matrices. Real large-model training uses far more sophisticated and effective numerical control; here we simplify by averaging the sum of products so it cannot exceed 1, then take a signed square root to keep the magnitudes roughly comparable.
mul must check that the first matrix's column count matches the second matrix's row count, so that dimension mistakes in the training code fail fast.
def mul(self, matrix2):
assert (self.column == matrix2.line), "Value error:self.column == matrix2.line"
x = Matrix(self.line, matrix2.column)  # the result matrix
for i in range(self.line):
for j in range(matrix2.column):
summ = 0.0
for k in range(self.column):
assert (i < self.line and k < self.column and j < matrix2.column)
summ += (self.matrix[i][k])*(matrix2.matrix[k][j])
summ /= self.column
if summ >= 0:
summ = math.sqrt(summ)
else:
summ = -1*math.sqrt(-summ)
x.set(i, j, round(summ, 10))
return x
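A small standalone check of the trick (re-implementing a single entry of mul outside the class, with a hypothetical `entry` helper) shows that averaging followed by a signed square root keeps the result inside [-1, 1] whenever the operands are:

```python
import random
import math

# One entry of the simplified product: average the products, then take a
# signed square root.  If every operand lies in [-1, 1], the result does too.
def entry(row, col):
    s = sum(a * b for a, b in zip(row, col)) / len(row)
    return math.sqrt(s) if s >= 0 else -math.sqrt(-s)

random.seed(0)
row = [random.uniform(-1, 1) for _ in range(50)]
col = [random.uniform(-1, 1) for _ in range(50)]
v = entry(row, col)
print(-1 <= v <= 1)  # -> True
```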
The variance method computes the element-wise mean squared difference between two matrices (used here as the loss).
def variance(self, matrix2):
assert (self.column == matrix2.column), "Value error:self.column == matrix2.column"
assert (self.line == matrix2.line), "Value error:self.line == matrix2.line"
summ = 0
x = self.sub(matrix2)
for i in range(self.column):
for j in range(self.line):
summ += (x.matrix[j][i])*(x.matrix[j][i])
summ /= (self.column * self.line)
return summ
The sub (subtract), plus (add), mul_num (scalar multiply), and transpose methods:
def sub(self, matrix2):
assert (self.column == matrix2.column), "Value error:self.column == matrix2.column"
assert (self.line == matrix2.line), "Value error:self.line == matrix2.line"
x = Matrix(self.line, matrix2.column)
for i in range(matrix2.line):
for j in range(matrix2.column):
x.set(i, j, round((self.matrix[i][j]-matrix2.matrix[i][j]), 10))
return x
def plus(self, matrix2):
assert (self.column == matrix2.column), "Value error:self.column == matrix2.column"
assert (self.line == matrix2.line), "Value error:self.line == matrix2.line"
x = Matrix(self.line, matrix2.column)
for i in range(matrix2.line):
for j in range(matrix2.column):
x.set(i, j, round((self.matrix[i][j]+matrix2.matrix[i][j]), 10))
return x
def mul_num(self, num):
for i in range(self.line):
for j in range(self.column):
self.set(i, j, round(self.matrix[i][j]*num, 10))
return self
def transpose(self):
x = Matrix(self.column, self.line)
for i in range(self.line):
for j in range(self.column):
x.matrix[j][i] = round(self.matrix[i][j], 10)
return x
Sample generation: symbol_generator.py
Code that generates random samples. The first three lines of the target file hold the input feature count, the output feature count, and the sample count, so the trainer can verify the file matches its own settings.
Each entry of the second matrix, i.e. the target matrix, gets +random.uniform(-0.0000001, 0.0000001) added, so the dataset has a little noise.
import random

def generate_vector_goal(file_path, input_layer, output_layer, num_of_sample):
with open(file_path, 'w') as f:
f.writelines(str(input_layer)+"\n")
f.writelines(str(output_layer)+"\n")
f.writelines(str(num_of_sample)+"\n")
for i in range(num_of_sample):
x = [round(random.uniform(-1, 1), 10) for _ in range(input_layer)]
y = [0.0 for _ in range(output_layer)]
for j in range(input_layer):
y[j] = round(x[j]/2+random.uniform(-0.0000001, 0.0000001), 10)
for j in range(input_layer):
f.write(str(x[j])+"\n")
for j in range(output_layer):
f.write(str(y[j])+"\n")
Code that generates a random test sample. The first line of the target file holds the input feature count, for the same reason as above.
def generate_vector_test(file_path, input_layer):
with open(file_path, 'w') as f:
f.writelines(str(input_layer)+"\n")
for i in range(input_layer):
f.write(str(round(random.uniform(-1, 1), 10))+"\t\t\t")
f.write("\n")
Generate random weight matrices. The first four lines of the target file hold the input feature count, the hidden feature count, the output feature count, and the number of hidden layers.
def generate_vector_weight(file_path, input_layer, hidden_layer, output_layer, num_of_plies):
with open(file_path, 'w') as f:
f.writelines(str(input_layer)+"\n")
f.writelines(str(hidden_layer)+"\n")
f.writelines(str(output_layer)+"\n")
f.writelines(str(num_of_plies)+"\n")
for j in range(input_layer):
for k in range(hidden_layer):
f.write(str(round(random.uniform(-1, 1), 10))+"\n")
for i in range(num_of_plies-1):
for j in range(hidden_layer):
for k in range(hidden_layer):
f.write(str(round(random.uniform(-1, 1), 10))+"\n")
for j in range(hidden_layer):
for k in range(output_layer):
f.write(str(round(random.uniform(-1, 1), 10))+"\n")
Finally, the entry point that runs the generators:
if __name__ == '__main__':
generate_vector_goal("data/goal_vector.txt", 20, 30, 100)
generate_vector_weight("data/Weight_vector.txt", 20, 20, 30, 500)
generate_vector_test("data/test_vector.txt", 20)
Training code: train_vector.py
Preparation: import the Matrix class written above and datetime, the latter used to timestamp the training log.
Initialize a few constants: the feature counts of each layer, the number of hidden layers, the number of training epochs, the number of samples, and the learning rate.
from matrix_class import Matrix
from datetime import datetime
input_layer = 20
output_layer = 30  # input / output feature counts
hidden_layer = 20  # hidden-layer feature count
num_of_plies = 500  # number of hidden layers
num_of_term = 200  # number of training epochs
num_of_sample = 100  # number of samples
lr = 1e-7  # learning rate
file_path = "data/goal_vector.txt"
weight_matrix_file_path = "data/Weight_vector.txt"
record_file_path = "data/record_vector.txt"
Read the weight matrices; the four header lines must first be checked against the constants above.
if __name__ == '__main__':
with open(weight_matrix_file_path, 'r+') as f:
hidden_weight = [Matrix(hidden_layer, hidden_layer) for i in range(num_of_plies-1)]  # declare the hidden-layer weight matrices
hidden_first_weight = Matrix(input_layer, hidden_layer)
hidden_last_weight = Matrix(hidden_layer, output_layer)
# read the parameter matrices
weight_lines = f.readlines()
assert (input_layer == int(weight_lines[0])), "Value error! with input_layer == int(weight_lines[0])"
assert (hidden_layer == int(weight_lines[1])), "Value error! with hidden_layer == int(weight_lines[1])"
assert (output_layer == int(weight_lines[2])), "Value error! with output_layer == int(weight_lines[2])"
assert (num_of_plies == int(weight_lines[3])), "Value error! with num_of_plies == int(weight_lines[3])"
for j in range(input_layer):
for k in range(hidden_layer):
hidden_first_weight.set(j, k, float(weight_lines[j*hidden_layer+k+4].strip()))
for i in range(num_of_plies-1):
for j in range(hidden_layer):
for k in range(hidden_layer):
hidden_weight[i].set(j, k, float(weight_lines[input_layer*hidden_layer +
k + j*hidden_layer +
i*hidden_layer*hidden_layer+4].strip()))
for j in range(hidden_layer):
for k in range(output_layer):
hidden_last_weight.set(j, k, float(weight_lines[input_layer*hidden_layer +
(num_of_plies-1)*hidden_layer*hidden_layer +
k + j*output_layer+4].strip()))
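The offset arithmetic above is easy to get wrong, so here is a self-contained sanity check (with small made-up sizes) that the three index formulas enumerate every data line of the weight file exactly once, in order:

```python
# Sanity check of the flat weight-file layout: 4 header lines, then
# input*hidden values, then (plies-1) blocks of hidden*hidden values, then
# hidden*output values.  The formulas mirror the ones used when reading.
input_layer, hidden_layer, output_layer, num_of_plies = 3, 4, 2, 5

indices = []
for j in range(input_layer):
    for k in range(hidden_layer):
        indices.append(j * hidden_layer + k + 4)
for i in range(num_of_plies - 1):
    for j in range(hidden_layer):
        for k in range(hidden_layer):
            indices.append(input_layer * hidden_layer + k + j * hidden_layer
                           + i * hidden_layer * hidden_layer + 4)
for j in range(hidden_layer):
    for k in range(output_layer):
        indices.append(input_layer * hidden_layer
                       + (num_of_plies - 1) * hidden_layer * hidden_layer
                       + k + j * output_layer + 4)

total = (input_layer * hidden_layer
         + (num_of_plies - 1) * hidden_layer * hidden_layer
         + hidden_layer * output_layer)
print(indices == list(range(4, total + 4)))  # -> True
```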
Read the input matrices and their corresponding target matrices; again, check the header first.
# open the file holding the inputs and targets
with open(file_path, 'r+') as f:
matrix_input = Matrix(1, input_layer)
matrix_output = Matrix(1, output_layer)
matrix_goal = Matrix(1, output_layer)
h = [Matrix(1, hidden_layer) for i in range(num_of_plies)]  # hidden-layer intermediate values
lines = f.readlines()
assert (input_layer == int(lines[0])), "Value error! with input_layer == int(lines[0])"
assert (output_layer == int(lines[1])), "Value error! with output_layer == int(lines[1])"
assert (num_of_sample == int(lines[2])), "Value error! with num_of_sample == int(lines[2])"
for i in range(num_of_term):  # training epochs
for j in range(num_of_sample):  # for each sample
# read the sample
for num in range(input_layer):
matrix_input.set(0, num, float(lines[j*input_layer + j*output_layer + num+3].strip()))
for num in range(output_layer):
matrix_goal.set(0, num, float(lines[j*output_layer + (j+1)*input_layer + num+3].strip()))  # fill the goal matrix, not the output
Run the forward pass: compute the resulting output matrix from the current weight matrices and the input matrix.
# forward pass
h[0] = matrix_input.mul(hidden_first_weight)
for k in range(num_of_plies-1):
h[k+1] = h[k].mul(hidden_weight[k])
matrix_output = h[num_of_plies-1].mul(hidden_last_weight)
Compute the variance and write the log.
Compare the computed output matrix with the expected matrix via the variance, work out the current training progress, and write everything to the log.
# compute the variance
variance = matrix_output.variance(matrix_goal)
schedule = round((i*num_of_sample+j)*100/(num_of_term * num_of_sample), 2)
# log the progress (append mode, so earlier records are kept)
print(f"Epoch {i}, sample {j}: variance {variance}, progress {schedule}%")
with open(record_file_path, 'a') as f1:
f1.write(str(datetime.utcnow()))
f1.write(f"Epoch {i}, sample {j}: variance {variance}, progress {schedule}%\n")
f1.write("Input vector:")
for num in range(input_layer):
f1.write(str(matrix_input.matrix[0][num])+"\t\t\t")
f1.write("\n")
f1.write("Output vector:")
for num in range(output_layer):
f1.write(str(matrix_output.matrix[0][num])+"\t\t\t")
f1.write("\n")
f1.write("Goal vector:")
for num in range(output_layer):
f1.write(str(matrix_goal.matrix[0][num])+"\t\t\t")
f1.write("\n")
f1.write("\n\n")
Implement backpropagation to compute the gradient matrices and update the weight matrices.
Every operation here must keep its dimensions aligned.
# gradient at the output: output minus goal, so subtracting lr*grad descends
grad_output = matrix_output.sub(matrix_goal)
grad_hidden_last = h[num_of_plies-1].transpose().mul(grad_output)
grad_hidden = [Matrix(hidden_layer, hidden_layer)
for _ in range(num_of_plies-1)]
# propagate the error backwards through the last weight matrix
grad = grad_output.mul(hidden_last_weight.transpose())
for k in reversed(range(num_of_plies-1)):  # cover every hidden weight matrix
grad_hidden[k] = h[k].transpose().mul(grad)
grad = grad.mul(hidden_weight[k].transpose())
grad_hidden_first = matrix_input.transpose().mul(grad)
# update the weight matrices
hidden_last_weight = hidden_last_weight.sub(grad_hidden_last.mul_num(lr))
for k in range(num_of_plies - 1):
hidden_weight[k] = hidden_weight[k].sub(grad_hidden[k].mul_num(lr))
hidden_first_weight = hidden_first_weight.sub(grad_hidden_first.mul_num(lr))
Finally, write the updated weights back to the file and report completion.
# write the updated weights back to the file
with open(weight_matrix_file_path, 'r+') as f1:
f1.truncate(0)
# re-write the four header lines first, so the file can be read back on the next run
f1.write(f"{input_layer}\n{hidden_layer}\n{output_layer}\n{num_of_plies}\n")
for m in range(input_layer):
for n in range(hidden_layer):
f1.write(f"{hidden_first_weight.matrix[m][n]}\n")
for k in range(num_of_plies-1):
for m in range(hidden_layer):
for n in range(hidden_layer):
f1.write(f"{hidden_weight[k].matrix[m][n]}\n")
for m in range(hidden_layer):
for n in range(output_layer):
f1.write(f"{hidden_last_weight.matrix[m][n]}\n")
print("Training finished")
Test code: test_vector.py
This overlaps heavily with the training code, so only the differences deserve comment.
from matrix_class import Matrix
input_layer = 20
output_layer = 30  # input / output feature counts
hidden_layer = 20  # hidden-layer feature count
num_of_plies = 500  # number of hidden layers
num_of_term = 200  # number of training epochs (unused here)
num_of_sample = 100  # number of samples (unused here)
lr = 1e-7  # learning rate (unused here)
file_path = "data/test_vector.txt"
weight_matrix_file_path = "data/Weight_vector.txt"
record_file_path = "data/record_vector.txt"
if __name__ == '__main__':
with open(weight_matrix_file_path, 'r+') as f:
hidden_first_weight = Matrix(input_layer, hidden_layer)
hidden_last_weight = Matrix(hidden_layer, output_layer)
hidden_weight = [Matrix(hidden_layer, hidden_layer) for i in range(num_of_plies-1)]  # declare the hidden-layer weight matrices
# read the parameter matrices
weight_lines = f.readlines()
assert (input_layer == int(weight_lines[0])), "Value error! with input_layer == int(weight_lines[0])"
assert (hidden_layer == int(weight_lines[1])), "Value error! with hidden_layer == int(weight_lines[1])"
assert (output_layer == int(weight_lines[2])), "Value error! with output_layer == int(weight_lines[2])"
assert (num_of_plies == int(weight_lines[3])), "Value error! with num_of_plies == int(weight_lines[3])"
for j in range(input_layer):
for k in range(hidden_layer):
hidden_first_weight.set(j, k, float(weight_lines[j*hidden_layer+k+4].strip()))
for i in range(num_of_plies-1):
for j in range(hidden_layer):
for k in range(hidden_layer):
hidden_weight[i].set(j, k, float(weight_lines[input_layer*hidden_layer +
k + j*hidden_layer +
i*hidden_layer*hidden_layer+4].strip()))
for j in range(hidden_layer):
for k in range(output_layer):
hidden_last_weight.set(j, k, float(weight_lines[input_layer*hidden_layer +
(num_of_plies-1)*hidden_layer*hidden_layer +
k + j*output_layer+4].strip()))
with open(file_path, 'r+') as f:
matrix_input = Matrix(1, input_layer)
lines = f.readlines()
assert (input_layer == int(lines[0])), "Value error! with input_layer == int(lines[0])"
line = lines[1]
numbers = line.strip().split()
numbers = [float(num) for num in numbers]
for num in range(input_layer):
matrix_input.set(0, num, numbers[num])
# forward pass
h = matrix_input.mul(hidden_first_weight)
for k in range(num_of_plies-1):
h = h.mul(hidden_weight[k])
matrix_output = h.mul(hidden_last_weight)
# write the result
f.write("\n\n\n"+"Output vector:\n\n")
for i in range(output_layer):
f.write(str(matrix_output.matrix[0][i])+"\t\t\t")
print("Done writing the output vector")
Running the project
First run the functions in symbol_generator to initialize the numbers in every data file.
Then run train_vector and wait for training to finish.
Finally run test_vector and check in data/test_vector.txt whether the result matches expectations.
The code also includes some _matrix variants, in which both the input and the output are matrices. Training those is considerably more time-consuming; they are included in the resources only as extension material.