Preface
- Keywords: data processing; algorithm reproduction; PCA-KNN; CNN; deep learning
- Benefits: some basic optics knowledge, the basic workflow of reproducing an algorithm, the pain points of data handling in deep learning, and improved coding skills.
- Paper Sources: https://gitee.com/librr/PCA-KNN # Robust Classification of Tea Based on Multi-Channel LED-Induced Fluorescence and a Convolutional Neural Network.
1. Data Preprocessing
Preprocessing of the LED-induced fluorescence spectra includes the following steps (a minimal code sketch follows the list):
- Subtracting the background spectrum
- Smoothing with a second-order Savitzky-Golay filter
- Normalizing to the maximum amplitude in the 650~700 nm region
- Keeping only the spectral data between 500 and 900 nm (data below 500 nm is dominated by the LED light itself, and data above 900 nm is mostly noise)
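A minimal sketch of these steps, assuming the raw spectrum and the background are NumPy arrays sampled on a common wavelength axis; `savgol_filter` from SciPy provides the Savitzky-Golay smoothing, and the window length here is an illustrative choice, not a value from the paper:

```python
import numpy as np
from scipy.signal import savgol_filter

def preprocess(wavelengths, spectrum, background):
    """Preprocess one LED-induced fluorescence spectrum (illustrative parameters)."""
    # 1. Subtract the background spectrum
    corrected = spectrum - background
    # 2. Smooth with a second-order Savitzky-Golay filter (window length is assumed)
    smoothed = savgol_filter(corrected, window_length=15, polyorder=2)
    # 3. Normalize to the maximum amplitude in the 650-700 nm region
    band = (wavelengths >= 650) & (wavelengths <= 700)
    normalized = smoothed / smoothed[band].max()
    # 4. Keep only 500-900 nm (below 500 nm: mostly LED light; above 900 nm: mostly noise)
    keep = (wavelengths >= 500) & (wavelengths <= 900)
    return wavelengths[keep], normalized[keep]
```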
Fig.1[1] The LED spectrum generator
Fig.2[1] Comparison of the wavelengths of light at different amplitudes
2. Algorithm Approach
- The CNN architecture is described below[1].
- The input layer is a 7-row by 837-column image, built from the fluorescence spectra of the 7 LEDs combined together.
- The hidden layers contain one convolutional layer, one flatten layer, and two fully connected layers. The convolutional layer is the core building block of the CNN: it applies a convolution operation to the input, mimicking the response of a single neuron to a visual stimulus. Its parameters consist of a set of learnable filters (kernels), each of size 7×7.
During the forward pass, each filter is convolved across the width and height of the input, computing the dot product between the entries of the filter and the input at each position. This produces a two-dimensional feature map for that filter, so the network learns filters that activate when they detect a specific pattern at some spatial position of the input.
- In this example, the input is 7×837, so each filter slides only in the horizontal direction, and one convolution produces a vector of size 1×831. Applying 32 kernels finally yields 32 feature maps.
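A quick shape check of this claim (a sketch, not the authors' code), treating the 7×837 spectrum matrix as a one-channel image:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=(7, 7))  # 32 learnable 7x7 filters
x = torch.randn(1, 1, 7, 837)   # (batch, channel, height, width): one 7 x 837 spectrum image
y = conv(x)
print(y.shape)                  # torch.Size([1, 32, 1, 831]) -> 32 feature maps of size 1 x 831
```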
Fig.3 Description of the CNN architecture
Name | Shape |
---|---|
Input | 7 × 837 |
Convolutions | 32 × (1 × 831) |
Flatten | 1 × 26592 |
Full Connection1 | |
Full Connection2 | |
Output | |
Filters | 7 × 7 |
- In this experimental setting, a feature is the amplitude relationship between a series of precise wavelengths, and the positional information is guaranteed by the spectrometer. A pooling layer is therefore unsuitable for this CNN; instead, a flatten layer is used to reorder the feature maps, followed by two fully connected layers that perform the matrix multiplications.
- In addition, a rectified linear unit (ReLU) is applied after each convolutional and fully connected layer as the nonlinear activation. In the output layer, a Softmax operation gives the predicted probabilities. Cross-entropy loss measures the discrepancy between the ground-truth class and the prediction, and stochastic gradient descent (SGD) is used to learn the network parameters from that loss.
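A minimal PyTorch sketch of the network described above. The widths of the two fully connected layers and the number of classes are illustrative assumptions; only the input size, the 32 filters of 7×7, the flatten layer, ReLU, softmax/cross-entropy, and SGD come from the description:

```python
import torch
import torch.nn as nn

class TeaCNN(nn.Module):
    """Conv -> Flatten -> FC1 -> FC2 -> output, following the description above."""
    def __init__(self, num_classes, hidden1=128, hidden2=64):   # hidden sizes are assumptions
        super().__init__()
        self.conv = nn.Conv2d(1, 32, kernel_size=(7, 7))  # 32 filters of size 7x7
        self.flatten = nn.Flatten()                       # reorder feature maps instead of pooling
        self.fc1 = nn.Linear(32 * 1 * 831, hidden1)
        self.fc2 = nn.Linear(hidden1, hidden2)
        self.out = nn.Linear(hidden2, num_classes)
        self.relu = nn.ReLU()

    def forward(self, x):                                 # x: (batch, 1, 7, 837)
        x = self.relu(self.conv(x))                       # ReLU after the convolutional layer
        x = self.flatten(x)                               # -> (batch, 32 * 831)
        x = self.relu(self.fc1(x))                        # ReLU after each fully connected layer
        x = self.relu(self.fc2(x))
        return self.out(x)                                # logits; softmax is applied by the loss

model = TeaCNN(num_classes=5)                             # the number of tea classes is assumed
criterion = nn.CrossEntropyLoss()                         # cross-entropy (log-softmax + NLL) on logits
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # SGD as described; the learning rate is assumed
```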
3. Code Reference
Project 1
import torch
import torch.nn as nn
import torch.optim as optim
import pandas as pd
from torch.utils.data import Dataset, DataLoader
# Define a custom dataset class
class CustomDataset(Dataset):
    def __init__(self, csv_file):
        self.data = pd.read_csv(csv_file)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample = self.data.iloc[idx]
        # NOTE: the model below expects a float input and an integer class label as the
        # target of CrossEntropyLoss; adapt the column names to your CSV layout
        wavelength = torch.tensor(sample["Wavelength[nm]"], dtype=torch.float32)
        average = torch.tensor(sample["Average"], dtype=torch.long)
        return wavelength, average
# Define the neural network model
class Net(nn.Module):
    def __init__(self, num_classes):
        super(Net, self).__init__()
        # Expects input of shape (batch, 2, 288); Conv1d(2, 16, 7) then yields (batch, 16, 282)
        self.conv = nn.Conv1d(2, 16, kernel_size=7)
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(16 * 282, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.conv(x)
        x = self.relu(x)
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        # Return raw logits: nn.CrossEntropyLoss applies log-softmax internally,
        # so no explicit Softmax is needed before the loss
        return x
# Set the hyperparameters
num_epochs = 10
batch_size = 32
learning_rate = 0.001
num_classes = 10

# Create the data loader
train_dataset = CustomDataset("train.csv")
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
# Create the model instance, the loss, and the optimizer
model = Net(num_classes)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
# Train the model
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (wavelength, average) in enumerate(train_loader):
        # Forward pass
        outputs = model(wavelength)
        loss = criterion(outputs, average)
        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if (i + 1) % 100 == 0:
            print("Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}".format(
                epoch + 1, num_epochs, i + 1, total_step, loss.item()))
# Test the model
test_dataset = CustomDataset("test.csv")
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for wavelength, average in test_loader:
        outputs = model(wavelength)
        _, predicted = torch.max(outputs.data, 1)
        total += average.size(0)
        correct += (predicted == average).sum().item()
print("Accuracy of the model on the test dataset: {} %".format(100 * correct / total))
Project 2
model.py
import torch
import torch.nn as nn
class SPCNN(nn.Module):
    def __init__(self, num_classes):
        super(SPCNN, self).__init__()
        self.CNN = nn.Conv2d(1, 32, 2, 1)   # in_channels 1, out_channels 32, kernel 2x2, stride 1
        self.BN1 = nn.BatchNorm2d(32)
        self.flatten = nn.Flatten()
        # 32 * 6 * 836 assumes a 1 x 7 x 837 input (the spectrum image from the paper)
        self.FN1 = nn.Linear(32 * 6 * 836, 128)
        self.BN2 = nn.BatchNorm1d(128)      # 1-D batch norm after a fully connected layer
        self.FN2 = nn.Linear(128, num_classes)  # num_classes = number of tea classes

    def forward(self, inputx):
        x = self.CNN(inputx)
        x = nn.ReLU()(x)
        x = self.BN1(x)
        x = self.flatten(x)                 # flatten the feature maps before the linear layers
        x = self.FN1(x)
        x = nn.ReLU()(x)
        x = self.BN2(x)
        x = self.FN2(x)
        return x
dataset.py
import torch
import numpy as np
from torch.utils.data import Dataset
class SPDATA(Dataset):
def __init__(self,splitpath,datapath):
self.splitpath = splitpath
self.datapath = datapath
with open(splitpath,"r") as fr:
line = fr.readline()
self.namelist = line.split(",")
    def __getitem__(self, index):
        name = self.namelist[index]
        # Next, load the data stored at the path `name` into a numpy array (data)
        # and read the class encoded in `name` as an int (label); see the sketch below
        return data, label
def __len__(self):
return len(self.namelist)
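One way the two placeholder comments in `__getitem__` could be filled in, assuming, purely for illustration, that each entry in the split file is a relative path to a `.npy` spectrum whose parent folder name is the integer class label (this convention is hypothetical, not taken from the repository):

```python
import os
import numpy as np
import torch

def load_sample(name, datapath):
    """Hypothetical loader: <datapath>/<label>/<file>.npy -> (tensor, int label)."""
    name = name.strip()                                   # entries from the split file may carry whitespace
    arr = np.load(os.path.join(datapath, name))           # spectrum as a numpy array
    data = torch.from_numpy(arr).float().unsqueeze(0)     # add a channel dimension: (1, H, W)
    label = int(os.path.basename(os.path.dirname(name)))  # parent folder name encodes the class index
    return data, label
```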
main.py
import torch
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import torch.nn as nn
from model import SPCNN
from dataset import SPDATA
# Create the model (num_classes = number of tea classes, defined by the user)
model = SPCNN(num_classes)
# Load the data (trainsplitpath, testsplitpath and datapath are user-defined paths)
traindata = SPDATA(trainsplitpath, datapath)
testdata = SPDATA(testsplitpath, datapath)
trainloader = DataLoader(traindata, batch_size=16, shuffle=True)
testloader = DataLoader(testdata, batch_size=1, shuffle=False)
# Define the optimizer and the loss function
params = model.parameters()
optim = torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)
Loss = nn.CrossEntropyLoss()
# Train
model.train()
for data, label in trainloader:
    predicted = model(data)
    optim.zero_grad()
    floss = Loss(predicted, label)   # compare the predictions against the class labels
    floss.backward()
    optim.step()
# Save the model here, e.g. torch.save(model.state_dict(), ...)
# Test
model.eval()
with torch.no_grad():
    for data, label in testloader:
        predicted = nn.Softmax(dim=1)(model(data))
        # Output the results (e.g. take the argmax of `predicted` as the predicted class)