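My earlier experiments all ran predictions on historical data and checked them against known results. This post does a step-ahead forecast of a future value instead; the procedure is not very different from before.
1. Data preprocessing
I have box-score statistics for the NBA's Nets, shown below:
Points | Off. Reb. | Def. Reb. | Total Reb. | Assists | Turnovers | Steals | Blocks | Fouls |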
125 | 9 | 33 | 42 | 27 | 8 | 8 | 3 | 17 |
115 | 6 | 41 | 47 | 25 | 8 | 7 | 6 | 21 |
123 | 7 | 32 | 39 | 26 | 8 | 6 | 7 | 21 |
141 | 9 | 27 | 36 | 29 | 9 | 8 | 11 | 28 |
119 | 10 | 27 | 37 | 16 | 9 | 8 | 3 | 23 |
130 | 12 | 34 | 46 | 31 | 14 | 7 | 7 | 23 |
104 | 14 | 36 | 50 | 18 | 11 | 7 | 7 | 22 |
123 | 4 | 33 | 37 | 35 | 12 | 9 | 9 | 20 |
105 | 8 | 44 | 52 | 26 | 9 | 6 | 8 | 16 |
128 | 3 | 40 | 43 | 33 | 11 | 3 | 9 | 19 |
115 | 13 | 38 | 51 | 32 | 7 | 4 | 8 | 19 |
125 | 2 | 33 | 35 | 27 | 17 | 10 | 5 | 24 |
109 | 10 | 41 | 51 | 18 | 13 | 8 | 4 | 18 |
118 | 6 | 33 | 39 | 21 | 13 | 6 | 6 | 25 |
114 | 14 | 39 | 53 | 18 | 16 | 5 | 1 | 11 |
109 | 11 | 31 | 42 | 20 | 8 | 6 | 3 | 18 |
130 | 8 | 42 | 50 | 35 | 13 | 3 | 5 | 19 |
116 | 7 | 34 | 41 | 30 | 10 | 9 | 6 | 20 |
128 | 11 | 33 | 44 | 30 | 10 | 6 | 4 | 17 |
109 | 6 | 41 | 47 | 26 | 11 | 12 | 7 | 19 |
103 | 12 | 38 | 50 | 19 | 13 | 3 | 6 | 17 |
134 | 10 | 34 | 44 | 30 | 13 | 7 | 2 | 21 |
107 | 8 | 30 | 38 | 26 | 11 | 10 | 6 | 17 |
130 | 7 | 30 | 37 | 35 | 18 | 12 | 5 | 17 |
117 | 11 | 32 | 43 | 23 | 15 | 6 | 3 | 21 |
127 | 11 | 46 | 57 | 31 | 18 | 8 | 4 | 22 |
101 | 9 | 33 | 42 | 19 | 19 | 6 | 10 | 20 |
139 | 10 | 35 | 45 | 37 | 17 | 10 | 8 | 31 |
114 | 11 | 34 | 45 | 27 | 10 | 6 | 6 | 14 |
107 | 4 | 31 | 35 | 28 | 5 | 3 | 3 | 17 |
111 | 13 | 43 | 56 | 34 | 15 | 8 | 5 | 18 |
120 | 12 | 33 | 45 | 25 | 11 | 6 | 3 | 21 |
112 | 7 | 43 | 50 | 31 | 14 | 10 | 7 | 18 |
113 | 16 | 33 | 49 | 19 | 10 | 3 | 4 | 27 |
88 | 13 | 35 | 48 | 19 | 9 | 9 | 1 | 10 |
116 | 10 | 37 | 47 | 25 | 8 | 5 | 2 | 17 |
113 | 10 | 25 | 35 | 24 | 7 | 11 | 8 | 18 |
113 | 9 | 30 | 39 | 24 | 8 | 9 | 3 | 16 |
124 | 15 | 35 | 50 | 22 | 16 | 10 | 2 | 20 |
117 | 3 | 38 | 41 | 29 | 15 | 3 | 6 | 19 |
100 | 5 | 30 | 35 | 22 | 11 | 8 | 7 | 16 |
121 | 11 | 34 | 45 | 21 | 14 | 3 | 5 | 22 |
132 | 10 | 43 | 53 | 33 | 14 | 5 | 1 | 24 |
124 | 13 | 38 | 51 | 35 | 5 | 6 | 8 | 13 |
98 | 14 | 33 | 47 | 21 | 19 | 5 | 4 | 16 |
129 | 7 | 44 | 51 | 30 | 13 | 11 | 2 | 18 |
127 | 10 | 31 | 41 | 30 | 8 | 8 | 5 | 16 |
112 | 9 | 31 | 40 | 21 | 6 | 10 | 4 | 19 |
109 | 8 | 31 | 39 | 23 | 13 | 9 | 3 | 18 |
128 | 4 | 33 | 37 | 28 | 10 | 4 | 4 | 11 |
136 | 9 | 26 | 35 | 31 | 15 | 7 | 3 | 16 |
134 | 8 | 37 | 45 | 35 | 13 | 9 | 5 | 15 |
104 | 9 | 37 | 46 | 20 | 12 | 2 | 5 | 20 |
111 | 8 | 26 | 34 | 30 | 15 | 8 | 6 | 22 |
108 | 8 | 26 | 34 | 21 | 17 | 6 | 3 | 18 |
117 | 8 | 32 | 40 | 29 | 18 | 5 | 6 | 19 |
124 | 6 | 38 | 44 | 25 | 16 | 6 | 4 | 17 |
146 | 7 | 38 | 45 | 33 | 17 | 6 | 5 | 24 |
147 | 10 | 40 | 50 | 33 | 11 | 7 | 4 | 25 |
132 | 11 | 25 | 36 | 29 | 9 | 7 | 10 | 25 |
98 | 6 | 43 | 49 | 24 | 7 | 6 | 5 | 18 |
128 | 9 | 38 | 47 | 34 | 17 | 0 | 6 | 26 |
113 | 6 | 23 | 29 | 23 | 11 | 5 | 8 | 25 |
135 | 13 | 39 | 52 | 30 | 16 | 8 | 11 | 24 |
125 | 7 | 42 | 49 | 26 | 17 | 2 | 5 | 19 |
122 | 4 | 37 | 41 | 24 | 19 | 9 | 5 | 14 |
116 | 12 | 34 | 46 | 29 | 6 | 3 | 5 | 12 |
122 | 8 | 33 | 41 | 31 | 18 | 6 | 6 | 18 |
116 | 6 | 37 | 43 | 23 | 17 | 3 | 2 | 18 |
110 | 4 | 38 | 42 | 26 | 20 | 6 | 4 | 13 |
122 | 11 | 37 | 48 | 30 | 12 | 11 | 7 | 17 |
130 | 8 | 43 | 51 | 31 | 12 | 11 | 4 | 22 |
122 | 7 | 44 | 51 | 25 | 20 | 5 | 9 | 24 |
96 | 10 | 35 | 45 | 19 | 15 | 6 | 6 | 16 |
145 | 9 | 31 | 40 | 30 | 8 | 6 | 3 | 24 |
111 | 13 | 37 | 50 | 28 | 17 | 13 | 6 | 20 |
104 | 6 | 40 | 46 | 23 | 19 | 4 | 5 | 17 |
123 | 9 | 34 | 43 | 21 | 15 | 5 | 10 | 22 |
125 | 13 | 44 | 57 | 24 | 20 | 11 | 7 | 22 |
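We need to shift the data by one step, so that the stats from one game are used to predict the score of the next game. The features are:
Off. Reb. | Def. Reb. | Total Reb. | Assists | Turnovers | Steals | Blocks | Fouls |
Shift the score column down by one row (the last score drops off the bottom) and set the score in the first row to 0 as a placeholder. Each row then pairs the previous game's stats, offensive rebounds through fouls, with the next game's score.
As follows:
Points | Off. Reb. | Def. Reb. | Total Reb. | Assists | Turnovers | Steals | Blocks | Fouls |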
0 | 9 | 33 | 42 | 27 | 8 | 8 | 3 | 17 |
125 | 6 | 41 | 47 | 25 | 8 | 7 | 6 | 21 |
115 | 7 | 32 | 39 | 26 | 8 | 6 | 7 | 21 |
123 | 9 | 27 | 36 | 29 | 9 | 8 | 11 | 28 |
141 | 10 | 27 | 37 | 16 | 9 | 8 | 3 | 23 |
119 | 12 | 34 | 46 | 31 | 14 | 7 | 7 | 23 |
130 | 14 | 36 | 50 | 18 | 11 | 7 | 7 | 22 |
104 | 4 | 33 | 37 | 35 | 12 | 9 | 9 | 20 |
123 | 8 | 44 | 52 | 26 | 9 | 6 | 8 | 16 |
105 | 3 | 40 | 43 | 33 | 11 | 3 | 9 | 19 |
128 | 13 | 38 | 51 | 32 | 7 | 4 | 8 | 19 |
115 | 2 | 33 | 35 | 27 | 17 | 10 | 5 | 24 |
125 | 10 | 41 | 51 | 18 | 13 | 8 | 4 | 18 |
109 | 6 | 33 | 39 | 21 | 13 | 6 | 6 | 25 |
118 | 14 | 39 | 53 | 18 | 16 | 5 | 1 | 11 |
114 | 11 | 31 | 42 | 20 | 8 | 6 | 3 | 18 |
109 | 8 | 42 | 50 | 35 | 13 | 3 | 5 | 19 |
130 | 7 | 34 | 41 | 30 | 10 | 9 | 6 | 20 |
116 | 11 | 33 | 44 | 30 | 10 | 6 | 4 | 17 |
128 | 6 | 41 | 47 | 26 | 11 | 12 | 7 | 19 |
109 | 12 | 38 | 50 | 19 | 13 | 3 | 6 | 17 |
103 | 10 | 34 | 44 | 30 | 13 | 7 | 2 | 21 |
134 | 8 | 30 | 38 | 26 | 11 | 10 | 6 | 17 |
107 | 7 | 30 | 37 | 35 | 18 | 12 | 5 | 17 |
130 | 11 | 32 | 43 | 23 | 15 | 6 | 3 | 21 |
117 | 11 | 46 | 57 | 31 | 18 | 8 | 4 | 22 |
127 | 9 | 33 | 42 | 19 | 19 | 6 | 10 | 20 |
101 | 10 | 35 | 45 | 37 | 17 | 10 | 8 | 31 |
139 | 11 | 34 | 45 | 27 | 10 | 6 | 6 | 14 |
114 | 4 | 31 | 35 | 28 | 5 | 3 | 3 | 17 |
107 | 13 | 43 | 56 | 34 | 15 | 8 | 5 | 18 |
111 | 12 | 33 | 45 | 25 | 11 | 6 | 3 | 21 |
120 | 7 | 43 | 50 | 31 | 14 | 10 | 7 | 18 |
112 | 16 | 33 | 49 | 19 | 10 | 3 | 4 | 27 |
113 | 13 | 35 | 48 | 19 | 9 | 9 | 1 | 10 |
88 | 10 | 37 | 47 | 25 | 8 | 5 | 2 | 17 |
116 | 10 | 25 | 35 | 24 | 7 | 11 | 8 | 18 |
113 | 9 | 30 | 39 | 24 | 8 | 9 | 3 | 16 |
113 | 15 | 35 | 50 | 22 | 16 | 10 | 2 | 20 |
124 | 3 | 38 | 41 | 29 | 15 | 3 | 6 | 19 |
117 | 5 | 30 | 35 | 22 | 11 | 8 | 7 | 16 |
100 | 11 | 34 | 45 | 21 | 14 | 3 | 5 | 22 |
121 | 10 | 43 | 53 | 33 | 14 | 5 | 1 | 24 |
132 | 13 | 38 | 51 | 35 | 5 | 6 | 8 | 13 |
124 | 14 | 33 | 47 | 21 | 19 | 5 | 4 | 16 |
98 | 7 | 44 | 51 | 30 | 13 | 11 | 2 | 18 |
129 | 10 | 31 | 41 | 30 | 8 | 8 | 5 | 16 |
127 | 9 | 31 | 40 | 21 | 6 | 10 | 4 | 19 |
112 | 8 | 31 | 39 | 23 | 13 | 9 | 3 | 18 |
109 | 4 | 33 | 37 | 28 | 10 | 4 | 4 | 11 |
128 | 9 | 26 | 35 | 31 | 15 | 7 | 3 | 16 |
136 | 8 | 37 | 45 | 35 | 13 | 9 | 5 | 15 |
134 | 9 | 37 | 46 | 20 | 12 | 2 | 5 | 20 |
104 | 8 | 26 | 34 | 30 | 15 | 8 | 6 | 22 |
111 | 8 | 26 | 34 | 21 | 17 | 6 | 3 | 18 |
108 | 8 | 32 | 40 | 29 | 18 | 5 | 6 | 19 |
117 | 6 | 38 | 44 | 25 | 16 | 6 | 4 | 17 |
124 | 7 | 38 | 45 | 33 | 17 | 6 | 5 | 24 |
146 | 10 | 40 | 50 | 33 | 11 | 7 | 4 | 25 |
147 | 11 | 25 | 36 | 29 | 9 | 7 | 10 | 25 |
132 | 6 | 43 | 49 | 24 | 7 | 6 | 5 | 18 |
98 | 9 | 38 | 47 | 34 | 17 | 0 | 6 | 26 |
128 | 6 | 23 | 29 | 23 | 11 | 5 | 8 | 25 |
113 | 13 | 39 | 52 | 30 | 16 | 8 | 11 | 24 |
135 | 7 | 42 | 49 | 26 | 17 | 2 | 5 | 19 |
125 | 4 | 37 | 41 | 24 | 19 | 9 | 5 | 14 |
122 | 12 | 34 | 46 | 29 | 6 | 3 | 5 | 12 |
116 | 8 | 33 | 41 | 31 | 18 | 6 | 6 | 18 |
122 | 6 | 37 | 43 | 23 | 17 | 3 | 2 | 18 |
116 | 4 | 38 | 42 | 26 | 20 | 6 | 4 | 13 |
110 | 11 | 37 | 48 | 30 | 12 | 11 | 7 | 17 |
122 | 8 | 43 | 51 | 31 | 12 | 11 | 4 | 22 |
130 | 7 | 44 | 51 | 25 | 20 | 5 | 9 | 24 |
122 | 10 | 35 | 45 | 19 | 15 | 6 | 6 | 16 |
96 | 9 | 31 | 40 | 30 | 8 | 6 | 3 | 24 |
145 | 13 | 37 | 50 | 28 | 17 | 13 | 6 | 20 |
111 | 6 | 40 | 46 | 23 | 19 | 4 | 5 | 17 |
104 | 9 | 34 | 43 | 21 | 15 | 5 | 10 | 22 |
123 | 13 | 44 | 57 | 24 | 20 | 11 | 7 | 22 |
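That's it. Save the table as a CSV file (lanwang.csv in the code below), without the header row, because the loader converts every field to float.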
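The shift does not have to be done by hand in a spreadsheet. Here is a minimal sketch of the same operation in NumPy, using only the first three rows of the table and two of the stat columns; the helper name shift_scores_down and its placeholder parameter are my own assumptions, not something from the original workflow.

```python
import numpy as np

# First three rows of the table above: score, offensive rebounds, defensive rebounds.
rows = np.array([
    [125, 9, 33],
    [115, 6, 41],
    [123, 7, 32],
], dtype=float)

def shift_scores_down(data, placeholder=0.0):
    """Move column 0 (the score) down one row; the top row gets a placeholder."""
    shifted = data.copy()
    shifted[1:, 0] = data[:-1, 0]   # each row takes the score of the row above it
    shifted[0, 0] = placeholder     # the top row has no known score yet
    return shifted

shifted = shift_scores_down(rows)
print(shifted)
```

The stat columns stay where they are; only the score column moves, and the bottom score is dropped.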
2. Import the common packages
import csv
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import explained_variance_score
from sklearn import metrics
from sklearn.metrics import mean_absolute_error  # mean absolute error
import random
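3. Load the data
The rows in the file are in the opposite order from what we need (most recent game first), so the lists are reversed after loading.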
feature = []  # feature rows (the eight stat columns)
targets = []  # target values (the shifted score column)
with open('lanwang.csv', newline='') as f:
    for content in csv.reader(f):
        if not content:
            continue  # skip blank lines
        content = list(map(float, content))
        feature.append(content[1:9])
        targets.append(content[0])
# put the rows into chronological order; the placeholder row ends up last
feature.reverse()
targets.reverse()
# standardize the features
scaler = StandardScaler()
# fit the scaler
scaler.fit(feature)
# transform the data set
feature = scaler.transform(feature)
fig = plt.gcf()
fig.set_size_inches(18.5, 10.5)
plt.title('TRUE')
plt.plot(targets)
plt.xlabel('Time')
plt.ylabel('Value')
Plot of the score over time:
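For readers wondering what the standardization step does to the stats, here is a rough NumPy sketch of the arithmetic: each column is shifted to zero mean and divided by its population standard deviation, which is what StandardScaler computes with its default settings. The function name standardize and the four sample rows are made up for illustration.

```python
import numpy as np

# Made-up stats for four games: two columns on different scales.
X = np.array([
    [9.0, 33.0],
    [6.0, 41.0],
    [7.0, 32.0],
    [9.0, 27.0],
])

def standardize(data):
    """Scale each column to zero mean and unit variance (population std, ddof=0)."""
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (data - mean) / std

X_scaled = standardize(X)
print(X_scaled.mean(axis=0))  # close to 0 for every column
print(X_scaled.std(axis=0))   # close to 1 for every column
```

Scaling matters for SVR with an RBF kernel, because the kernel works on distances and a column with large raw values would otherwise dominate.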
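4. SVR validation
The model is validated on the last 10% of the rows. Note that the code below takes its training set from a random 90% split (train_test_split), so some training rows also fall in that last 10%; a strictly chronological setup would train only on the first 90%.
Notes:
1. Because the training set is split at random, the last row (the one whose score is the 0 placeholder) must be removed first. That row has no real target value, so its features are useless for training.
2. Drop the last row when calling the evaluation functions as well: its true value is unknown, so scoring it is meaningless.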
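A strictly chronological version of the 90/10 split, with the placeholder row kept out of both training and scoring, could look roughly like this. It is only a sketch on made-up arrays; the helper name chronological_split and its test_fraction parameter are assumptions and do not appear in this post's own code.

```python
import numpy as np

def chronological_split(features, targets, test_fraction=0.1):
    """Split time-ordered rows; the final row is the unlabeled one to forecast."""
    labeled_X, labeled_y = features[:-1], targets[:-1]    # drop the placeholder row
    n_test = max(1, round(len(labeled_X) * test_fraction))
    cut = len(labeled_X) - n_test
    return (labeled_X[:cut], labeled_y[:cut],             # train: earliest rows
            labeled_X[cut:], labeled_y[cut:],             # test: latest labeled rows
            features[-1:])                                # row whose score we forecast

# Made-up data: 11 rows in time order; the last target is the 0 placeholder.
X = np.arange(22, dtype=float).reshape(11, 2)
y = np.array([101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 0], dtype=float)
X_train, y_train, X_test, y_test, X_future = chronological_split(X, y)
print(len(X_train), len(X_test), X_future)
```

With this layout no test row can leak into training, and the placeholder never reaches the model or the metrics.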
Code:
# drop the last row (the placeholder) before the random split
feature1 = feature[:-1]
targets1 = targets[:-1]
feature_train, feature_test, target_train, target_test = train_test_split(
    feature1, targets1, test_size=0.1, random_state=8)
# the test set is the last 10% of all rows, placeholder row included
feature_test = feature[int(len(feature) * 0.9):]
target_test = targets[int(len(targets) * 0.9):]
model_svr = SVR()
model_svr.fit(feature_train, target_train)
predict_results = model_svr.predict(feature_test)
# leave the placeholder row out of the evaluation
target1 = target_test[:-1]
results = predict_results[:-1]
plt.plot(target1)  # true values
plt.plot(predict_results)  # predictions, including the one for the unknown next game
plt.legend(['True', 'SVR'])
fig = plt.gcf()
fig.set_size_inches(18.5, 10.5)
plt.title("SVR")  # title
plt.show()
print("MSE:", mean_squared_error(target1, results))
print("R2 = ", metrics.r2_score(target1, results))  # R2
print("MAE = ", mean_absolute_error(target1, results))  # MAE
The results are painfully honest..
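For reference, the three numbers printed by the validation code (MSE, R2, MAE) can be computed by hand as in the sketch below. The function name regression_metrics and the sample scores are made up; the formulas are the standard definitions, and the sketch assumes the true values are not all identical (otherwise R2 divides by zero).

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, R2) for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mse = np.mean(errors ** 2)                       # mean squared error
    mae = np.mean(np.abs(errors))                    # mean absolute error
    ss_res = np.sum(errors ** 2)                     # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # spread around the mean
    r2 = 1.0 - ss_res / ss_tot
    return mse, mae, r2

# Made-up scores, not taken from the data set above.
mse, mae, r2 = regression_metrics([100, 110, 120], [102, 108, 123])
print(mse, mae, r2)
```

R2 compares the model with always predicting the mean of the true values, so it goes negative when the model does worse than that baseline.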
5. Optimize with PSO
Note: the fitness function must also drop the last row before computing the value it returns.
Code:
class PSO:
    def __init__(self, parameters):
        """
        particle swarm optimization
        parameter: a list type, like [NGEN, pop_size, var_num_min, var_num_max]
        """
        # initialization
        self.NGEN = parameters[0]  # number of generations
        self.pop_size = parameters[1]  # swarm size
        self.var_num = len(parameters[2])  # number of variables
        self.bound = []  # lower and upper bounds of the variables
        self.bound.append(parameters[2])
        self.bound.append(parameters[3])
        self.pop_x = np.zeros((self.pop_size, self.var_num))  # positions of all particles
        self.pop_v = np.zeros((self.pop_size, self.var_num))  # velocities of all particles
        self.p_best = np.zeros((self.pop_size, self.var_num))  # best position of each particle
        self.g_best = np.zeros((1, self.var_num))  # global best position
        # initialize generation 0 and its global best
        temp = -np.inf  # R2 can fall below -1, so start from -inf
        for i in range(self.pop_size):
            for j in range(self.var_num):
                self.pop_x[i][j] = random.uniform(self.bound[0][j], self.bound[1][j])
                self.pop_v[i][j] = random.uniform(0, 1)
            self.p_best[i] = self.pop_x[i]  # store the particle's best position
            fit = self.fitness(self.p_best[i])
            if fit > temp:
                self.g_best = self.p_best[i].copy()  # copy: indexing a row returns a view
                temp = fit

    def fitness(self, ind_var):
        """
        Fitness of one particle: R2 of an SVR with these hyperparameters
        """
        X = feature_train
        y = target_train
        x1 = ind_var[0]
        x2 = ind_var[1]
        x3 = ind_var[2]
        if x1 == 0: x1 = 0.001
        if x2 == 0: x2 = 0.001
        if x3 == 0: x3 = 0.001
        clf = SVR(C=x1, epsilon=x2, gamma=x3)
        clf.fit(X, y)
        predictval = clf.predict(feature_test)
        predictval1 = predictval[:-1]  # drop the placeholder row
        r2 = metrics.r2_score(target1, predictval1)
        print("R2 = ", r2)  # R2
        return r2

    def update_operator(self, pop_size):
        """
        Update operator: compute the next position and velocity
        """
        c1 = 2  # learning factors, usually 2
        c2 = 2
        w = 0.4  # inertia weight
        for i in range(pop_size):
            # update the velocity
            self.pop_v[i] = w * self.pop_v[i] + c1 * random.uniform(0, 1) * (
                self.p_best[i] - self.pop_x[i]) + c2 * random.uniform(0, 1) * (self.g_best - self.pop_x[i])
            # update the position
            self.pop_x[i] = self.pop_x[i] + self.pop_v[i]
            # keep the position inside the bounds
            for j in range(self.var_num):
                if self.pop_x[i][j] < self.bound[0][j]:
                    self.pop_x[i][j] = self.bound[0][j]
                if self.pop_x[i][j] > self.bound[1][j]:
                    self.pop_x[i][j] = self.bound[1][j]
            # update p_best and g_best
            if self.fitness(self.pop_x[i]) > self.fitness(self.p_best[i]):
                self.p_best[i] = self.pop_x[i]
            if self.fitness(self.pop_x[i]) > self.fitness(self.g_best):
                self.g_best = self.pop_x[i].copy()  # copy, so later moves do not change it

    def main(self):
        popobj = []
        self.ng_best = np.zeros((1, self.var_num))[0]
        for gen in range(self.NGEN):
            self.update_operator(self.pop_size)
            popobj.append(self.fitness(self.g_best))
            print('############ Generation {} ############'.format(str(gen + 1)))
            if self.fitness(self.g_best) > self.fitness(self.ng_best):
                self.ng_best = self.g_best.copy()
            print('Best position: {}'.format(self.ng_best))
            print('Best fitness: {}'.format(self.fitness(self.ng_best)))
        print("---- End of (successful) Searching ----")
        plt.figure()
        fig = plt.gcf()
        fig.set_size_inches(18.5, 10.5)
        plt.title("Figure1")
        plt.xlabel("iterations", size=14)
        plt.ylabel("fitness", size=14)
        t = [t for t in range(self.NGEN)]
        plt.plot(t, popobj, color='b', linewidth=2)
        plt.show()

if __name__ == '__main__':
    NGEN = 20
    popsize = 20
    low = [0, 0, 0]
    up = [100, 100, 100]
    parameters = [NGEN, popsize, low, up]
    pso = PSO(parameters)
    pso.main()
Perhaps because the data set is so small, R² only reaches about 0.70 even after optimization.
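To see the PSO update rule on its own, without an SVR fit inside every fitness call, here is a compact sketch that maximizes a toy function. It uses the same constants as above (w = 0.4, c1 = c2 = 2) but updates the whole swarm at once with NumPy instead of looping particle by particle, so it is a simplified variant, not a copy of the class above. The function name pso_maximize, the toy fitness and the seed are all assumptions made for illustration.

```python
import numpy as np

def pso_maximize(fitness, low, high, n_particles=30, n_iters=100,
                 w=0.4, c1=2.0, c2=2.0, seed=0):
    """Maximize `fitness` inside the box [low, high] with a basic PSO."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(low, dtype=float), np.asarray(high, dtype=float)
    x = rng.uniform(low, high, size=(n_particles, low.size))   # positions
    v = rng.uniform(0.0, 1.0, size=x.shape)                    # velocities
    p_best = x.copy()                                          # per-particle best
    p_fit = np.array([fitness(p) for p in x])
    g_idx = int(np.argmax(p_fit))
    g_best, g_fit = p_best[g_idx].copy(), p_fit[g_idx]         # global best
    history = []
    for _ in range(n_iters):
        r1 = rng.uniform(size=(n_particles, 1))  # one random factor per particle
        r2 = rng.uniform(size=(n_particles, 1))
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
        x = np.clip(x + v, low, high)            # keep particles inside the bounds
        fit = np.array([fitness(p) for p in x])
        improved = fit > p_fit
        p_best[improved] = x[improved]
        p_fit[improved] = fit[improved]
        g_idx = int(np.argmax(p_fit))
        if p_fit[g_idx] > g_fit:
            g_best, g_fit = p_best[g_idx].copy(), p_fit[g_idx]
        history.append(g_fit)
    return g_best, g_fit, history

# Toy fitness with its maximum (value 0) at the point (3, 3).
best, best_fit, history = pso_maximize(lambda p: -np.sum((p - 3.0) ** 2),
                                       low=[0.0, 0.0], high=[10.0, 10.0])
print(best, best_fit)
```

Swapping the toy function for one that fits an SVR and returns R2 gives the same structure as the class above, only much slower per evaluation.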
6. Get the PSO-SVR prediction for the next game's score
model_svr = SVR(C=100, epsilon=0, gamma=100)
model_svr.fit(feature_train, target_train)
predict_results = model_svr.predict(feature_test)
# leave the placeholder row out of the evaluation
target1 = target_test[:-1]
results = predict_results[:-1]
plt.plot(target1, marker='s')  # true values
plt.plot(predict_results, marker='o')  # predictions; the last point is the next game
plt.legend(['True', 'SVR'])
fig = plt.gcf()
fig.set_size_inches(18.5, 10.5)
plt.title("SVR")  # title
plt.show()
print("MSE:", mean_squared_error(target1, results))
print("R2 = ", metrics.r2_score(target1, results))  # R2
print("MAE = ", mean_absolute_error(target1, results))  # MAE
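The results are mediocre:
True values:
Predicted values:
Do you see it?
I went back and changed the 0 in the data set to 1000000.
Re-running the optimization still gives about 0.72, much the same as before, so I left the parameters unchanged. The placeholder never enters training or scoring, so its value does not matter.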
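Why the placeholder value makes no difference can be checked in a few lines: once the final row is dropped, the arrays used for training are identical whether the placeholder is 0 or 1000000. This is a sketch on made-up numbers, and the helper name training_arrays is hypothetical.

```python
import numpy as np

def training_arrays(scores, stats):
    """Drop the final row, whose score is only a placeholder."""
    return stats[:-1], scores[:-1]

stats = np.array([[9.0, 33.0], [6.0, 41.0], [7.0, 32.0]])   # made-up, time-ordered
scores_zero = np.array([115.0, 125.0, 0.0])                  # placeholder 0
scores_huge = np.array([115.0, 125.0, 1000000.0])            # placeholder 1000000

X_a, y_a = training_arrays(scores_zero, stats)
X_b, y_b = training_arrays(scores_huge, stats)
print(np.array_equal(X_a, X_b), np.array_equal(y_a, y_b))    # prints: True True
```

Any remaining difference between two optimization runs comes from the random numbers inside PSO, not from the placeholder.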
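Using the stats from the most recent game, the predicted score for the next game is about 119.
LSTM autoregressive forecasting works the same way, except that the data takes this form (target on the left, the previous value on the right, with 0 as the placeholder for the unknown next value):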
2 | 1 |
3 | 2 |
4 | 3 |
5 | 4 |
0 | 5 |
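The small table above can be generated from a plain series with a helper like the one below. It is a minimal sketch in plain Python; the name make_lagged_rows and the placeholder argument are my own choices for illustration.

```python
def make_lagged_rows(series, placeholder=0):
    """Pair each value with the one before it, as (target, previous value) rows."""
    rows = [(series[i + 1], series[i]) for i in range(len(series) - 1)]
    rows.append((placeholder, series[-1]))  # the next value is not known yet
    return rows

rows = make_lagged_rows([1, 2, 3, 4, 5])
for target, previous in rows:
    print(target, "|", previous, "|")
```

The last row is the one whose target the model is asked to forecast, exactly like the 0 row in the score table.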
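*This prediction is for entertainment only; do not use it for betting on games or anything of the kind.
*As I always say, whenever people are involved a prediction can never be fully accurate, because you cannot capture every influencing factor.
Predicted score: 119 (test-set R² of about 0.7); actual score...
Got it, folks?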