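Using machine learning to predict whether a stock will rise or fall on the next trading day.
Prediction result: a thousand stocks hit limit-down!!!!
While training the neural network, this warning appeared: Maximum iterations (200) reached and the optimization hasn't converged yet. It means the optimizer hit the iteration limit (max_iter, 200 by default) before it converged. The remedy is to increase the maximum number of iterations or to scale the data. I changed the code to increase the number of iterations.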
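The sketch below shows what the warning means in practice. It uses synthetic data from make_classification, which is an assumption: the article's CSV is not used. It fits an MLPClassifier once with a deliberately tiny max_iter and once with a large one. It records warnings with warnings.catch_warnings and reports whether a ConvergenceWarning was raised. fit_and_check is a hypothetical helper name.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the stock features (assumption: not the article's data).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)


def fit_and_check(max_iter):
    """Hypothetical helper: fit an MLP and report n_iter_ and whether it converged."""
    clf = MLPClassifier(max_iter=max_iter, random_state=1)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning instead of printing it
        clf.fit(X, y)
    converged = not any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return clf.n_iter_, converged


# 5 iterations is far below what adam needs, so the cap is always hit.
n_iter_small, converged_small = fit_and_check(max_iter=5)
# A much larger cap gives the optimizer room to stop on its own tolerance.
n_iter_large, converged_large = fit_and_check(max_iter=2000)
print(n_iter_small, converged_small)
print(n_iter_large, converged_large)
```

When the run stops early, n_iter_ is smaller than max_iter. A warning together with n_iter_ == max_iter is exactly the situation reported in the log.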
1. Warning messages
C:\Users\liyongfei\AppData\Local\anaconda3\lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:684: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
warnings.warn(
C:\Users\liyongfei\AppData\Local\anaconda3\lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:541: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
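The warning text also suggests scaling the data. The sketch below assumes synthetic features with very different magnitudes, standing in for raw price and volume columns. It puts StandardScaler and MLPClassifier into one Pipeline, so the scaler is refitted inside every cross-validation fold. Once the classifier sits inside a Pipeline, the grid keys need the mlp__ prefix. The small grid here is illustrative and is not the article's grid.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with deliberately uneven feature scales (assumption).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
scales = np.array([1, 10, 100, 1000, 1, 10, 100, 1000])
X = X * scales

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# The scaler is fitted on each training fold only, so test folds do not leak into it.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("mlp", MLPClassifier(max_iter=2000, random_state=1)),
])

# Parameter names are prefixed with the pipeline step name "mlp__".
param_grid = {
    "mlp__hidden_layer_sizes": [(50,), (100,)],
    "mlp__alpha": [0.0001, 0.01],
}
grid_search = GridSearchCV(pipe, param_grid=param_grid, scoring="accuracy", cv=5)
grid_search.fit(x_train, y_train)
print("Best parameters:", grid_search.best_params_)
print("Test accuracy:", grid_search.score(x_test, y_test))
```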
2. Original code
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
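# import warnings
# warnings.filterwarnings("ignore")  # suppress all warnings (this only hides ConvergenceWarning, it does not fix it)
# Use GridSearchCV for hyperparameter tuning to find the best parameters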
dataSource = pd.read_csv('data/紫光股份神经网络数据源.csv', encoding='gbk', header=0)
labels = np.array(dataSource['nextLabel'])
dataSource = dataSource.drop(columns =['symbol','name','label','nextLabel'])
print(dataSource)
print(labels)
x_train, x_test, y_train, y_test = train_test_split(dataSource, labels, test_size=0.30,
                                                    random_state=0)
print('-------test_features----------')
print(x_test)
print('-------test_labels----------')
print(y_test)
# Parameter ranges for the grid search
parameters = {
'hidden_layer_sizes': [(100,), (50, 100), (100, 100), (50, 50), (50, 100, 50)],
'activation': ['identity', 'logistic', 'tanh', 'relu'],
'solver': ['lbfgs', 'sgd', 'adam'],
'alpha': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1],
}
# Define an MLP classifier
estimator = MLPClassifier(random_state=1)
# Create the GridSearchCV object
grid_search = GridSearchCV(estimator=estimator, param_grid=parameters, scoring='accuracy', cv=10, n_jobs=-1)
# Train the model
grid_search.fit(x_train, y_train)
# Print the best parameters and score
print("Best parameters:", grid_search.best_params_)
print("Best score:", grid_search.best_score_)
# Use the classifier refitted with the best parameters to predict and evaluate on the test set
best_classifier = grid_search.best_estimator_
# x_test and y_test come from the train_test_split call above
predictions = best_classifier.predict(x_test)
print("Predictions:", predictions)
# Evaluate model performance,
# e.g. with functions from the sklearn.metrics module
# Best parameters: {'activation': 'logistic', 'alpha': 0.001, 'hidden_layer_sizes': (50, 100, 50), 'solver': 'lbfgs'}
# Best score (mean 10-fold CV accuracy on the training set): 0.6406593406593407
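The log contains so many warnings because GridSearchCV fits every candidate in the grid once per fold, and each of those fits can warn on its own. The lbfgs solver emits the "lbfgs failed to converge" message, while sgd and adam emit the "Stochastic Optimizer" one. ParameterGrid can count the candidates in the grid above. The dictionary below is copied from the code.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the code above.
parameters = {
    'hidden_layer_sizes': [(100,), (50, 100), (100, 100), (50, 50), (50, 100, 50)],
    'activation': ['identity', 'logistic', 'tanh', 'relu'],
    'solver': ['lbfgs', 'sgd', 'adam'],
    'alpha': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1],
}
n_candidates = len(ParameterGrid(parameters))  # 5 * 4 * 3 * 6
cv = 10
n_fits = n_candidates * cv
print(n_candidates, n_fits)  # 360 3600
```

GridSearchCV also refits the best candidate once on the whole training set, which makes 3,601 fits in total. After raising max_iter to 10000, any fit that still does not converge runs up to 10000 iterations instead of 200.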
3. Modified code
Add the parameter max_iter=10000 to MLPClassifier:
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
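# import warnings
# warnings.filterwarnings("ignore")  # suppress all warnings (this only hides ConvergenceWarning, it does not fix it)
# Use GridSearchCV for hyperparameter tuning to find the best parameters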
dataSource = pd.read_csv('data/紫光股份神经网络数据源.csv', encoding='gbk', header=0)
labels = np.array(dataSource['nextLabel'])
dataSource = dataSource.drop(columns =['symbol','name','label','nextLabel'])
print(dataSource)
print(labels)
x_train, x_test, y_train, y_test = train_test_split(dataSource, labels, test_size=0.30,
                                                    random_state=0)
print('-------test_features----------')
print(x_test)
print('-------test_labels----------')
print(y_test)
# Parameter ranges for the grid search
parameters = {
'hidden_layer_sizes': [(100,), (50, 100), (100, 100), (50, 50), (50, 100, 50)],
'activation': ['identity', 'logistic', 'tanh', 'relu'],
'solver': ['lbfgs', 'sgd', 'adam'],
'alpha': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1],
}
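# Define an MLP classifier (max_iter raised from the default 200 to 10000)
estimator = MLPClassifier(random_state=1, max_iter=10000)
# Create the GridSearchCV object
grid_search = GridSearchCV(estimator=estimator, param_grid=parameters, scoring='accuracy', cv=10, n_jobs=-1)
# Train the model
grid_search.fit(x_train, y_train)
# Print the best parameters and score
print("Best parameters:", grid_search.best_params_)
print("Best score:", grid_search.best_score_)
# Use the classifier refitted with the best parameters to predict and evaluate on the test set
best_classifier = grid_search.best_estimator_
# x_test and y_test come from the train_test_split call above
predictions = best_classifier.predict(x_test)
print("Predictions:", predictions)
# Evaluate model performance,
# e.g. with functions from the sklearn.metrics module
# Best parameters: {'activation': 'logistic', 'alpha': 0.001, 'hidden_layer_sizes': (50, 100, 50), 'solver': 'lbfgs'}
# Best score (mean 10-fold CV accuracy on the training set): 0.6406593406593407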
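The code ends with a comment that suggests sklearn.metrics for evaluation. The sketch below fills in that step. It runs on synthetic data (an assumption) whose 0/1 labels play the role of nextLabel. It uses the best_params_ reported above together with max_iter=10000.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the stock data (assumption).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Hyperparameters copied from the best_params_ reported above.
best_classifier = MLPClassifier(
    activation="logistic",
    alpha=0.001,
    hidden_layer_sizes=(50, 100, 50),
    solver="lbfgs",
    max_iter=10000,
    random_state=1,
)
best_classifier.fit(x_train, y_train)
predictions = best_classifier.predict(x_test)

print("Accuracy:", accuracy_score(y_test, predictions))
print(confusion_matrix(y_test, predictions))  # rows: true class, columns: predicted class
print(classification_report(y_test, predictions))  # precision/recall/F1 per class
```

For an up/down signal, the confusion matrix and the per-class recall show whether the model favors one direction. A single accuracy number hides that.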
4. Dataset used by the code above
data/紫光股份神经网络数据源.csv
https://download.csdn.net/download/qq_14945847/88831511
5. Final thoughts
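Best score on the 紫光股份 (Unisplendour) dataset: 0.6406593406593407 (mean cross-validated accuracy). That is only mediocre, so this dataset is not enough to predict 紫光股份's stock. You could collect data for other stocks to train the model. If the model scores above 0.7, you might try buying that stock.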
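Whether 0.64 is good depends on how often the stock simply went up. A model that always predicts the majority class already scores that class's share. The sketch below builds such a baseline with DummyClassifier. It uses hypothetical 60/40 labels, not the 紫光股份 data. The features are random because the dummy model ignores them.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical labels: 60 "up" days (1) and 40 "down" days (0).
y = np.array([1] * 60 + [0] * 40)
rng = np.random.default_rng(0)
X = rng.normal(size=(len(y), 5))  # ignored by the most_frequent strategy

# Always predict the most frequent class seen in each training fold.
baseline = DummyClassifier(strategy="most_frequent")
scores = cross_val_score(baseline, X, y, scoring="accuracy", cv=10)
print("Majority-class CV accuracy:", scores.mean())
```

Here the baseline equals the 60% share of class 1. On the article's data, the same cross_val_score call on x_train and y_train with cv=10 gives the number the 0.64 best score should be compared against.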
Next I will test the MLPRegressor regression algorithm to predict the likely closing price on the next trading day.
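The sketch below shows one possible setup for the planned MLPRegressor experiment. It rests on several assumptions: a synthetic random-walk closing price, the previous five closes as features (a hypothetical choice), and the next day's close as the target. The split uses shuffle=False so that the test period comes strictly after the training period. TransformedTargetRegressor standardizes the target price during training.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical closing-price series (random walk); replace with real daily closes.
rng = np.random.default_rng(0)
close = 50 + np.cumsum(rng.normal(0, 0.5, size=500))

# Hypothetical features: the previous `window` closes; target: the next close.
window = 5
X = np.array([close[i - window:i] for i in range(window, len(close))])
y = close[window:]

# shuffle=False keeps chronological order: train on the past, test on the future.
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.30, shuffle=False)

model = TransformedTargetRegressor(
    regressor=make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(50,), max_iter=2000, random_state=1),
    ),
    transformer=StandardScaler(),  # scales y for training, inverts it for predict()
)
model.fit(x_train, y_train)
pred = model.predict(x_test)
print("MAE:", mean_absolute_error(y_test, pred))
```

Whether lagged closes are useful features for real stock data is untested here. The sketch only shows how the pieces fit together.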
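The model is currently unstable. Its output is for reference only and has no investment value.
PS: Looks like tomorrow will be another day of a thousand stocks hitting limit-down!!!!!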