Machine Learning Algorithm Models: Neural Networks

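1. A neural network is a nested process: it composes functions layer by layer, and each layer's output becomes the next layer's input. Note that it is not simply a single regression. Understanding a neural network mainly means understanding how it works, which comes down to three parts:
    1. Solving for the weights w
    2. Choosing the activation function
    3. Choosing the number of hidden layers and the number of neurons in each layer
    Once these three parts are settled, the framework of our neural network is complete.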
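The nesting described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not sklearn's implementation. The layer sizes [4, 9, 9, 1], the random weights and the helper names `relu` and `forward` are hypothetical. Each hidden layer applies an affine map and then an activation to the previous layer's output, so the whole network is one nested function.

```python
import numpy as np

def relu(z):
    # ReLU activation: max(0, z), applied element-wise
    return np.maximum(0.0, z)

def forward(x, weights, biases):
    """Nested forward pass (hypothetical helper).

    weights[i] has shape (n_in, n_out) and biases[i] has shape (n_out,),
    the same layout sklearn uses for coefs_ and intercepts_.
    """
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)              # hidden layer: affine map, then activation
    return a @ weights[-1] + biases[-1]  # output layer: identity (regression-style)

rng = np.random.default_rng(0)
sizes = [4, 9, 9, 1]  # hypothetical: 4 inputs, two hidden layers of 9, 1 output
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=n) for n in sizes[1:]]

x = rng.normal(size=(3, 4))  # 3 samples with 4 features each
out = forward(x, weights, biases)
print(out.shape)

# The same computation written out explicitly as nested function calls
W1, W2, W3 = weights
b1, b2, b3 = biases
nested = relu(relu(x @ W1 + b1) @ W2 + b2) @ W3 + b3
print(np.allclose(out, nested))
```

The loop and the explicit nested expression compute the same thing. The loop form is what lets the number of hidden layers (item 3 above) vary freely.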
2. Neural network APIs in sklearn
    1. Classifier
        sklearn.neural_network.MLPClassifier(hidden_layer_sizes=(100,), activation='relu', *, solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10, max_fun=15000)
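            1. hidden_layer_sizes: a tuple, e.g. (50, 50), meaning the first and the second hidden layer each have 50 neurons. The i-th element is the number of neurons in the i-th hidden layer.
            2. activation: activation function for the hidden layers, one of {'identity', 'logistic', 'tanh', 'relu'}; default 'relu'.
            3. solver: weight optimizer, one of {'lbfgs', 'sgd', 'adam'}; default 'adam'.
                lbfgs: an optimizer from the family of quasi-Newton methods
                sgd: stochastic gradient descent
                adam: a stochastic-gradient-based optimizer proposed by Diederik Kingma and Jimmy Ba
                Note: the default solver 'adam' works well on relatively large datasets (thousands of training samples or more). On small datasets, 'lbfgs' can converge faster and perform better.
            4. alpha: float, strength of the L2 regularization (penalty) term; default 0.0001.
            5. batch_size: int, default 'auto', size of the minibatches for the stochastic optimizers; 'auto' means min(200, n_samples). It is not used when solver='lbfgs'.
            6. learning_rate: learning-rate schedule for the weight updates, one of {'constant', 'invscaling', 'adaptive'}; default 'constant'. Only used when solver='sgd'.
            7. shuffle: bool, whether to shuffle the samples in each iteration; only used when solver is 'sgd' or 'adam'.
            8. learning_rate_init: default 0.001, the initial learning rate; only used when solver is 'sgd' or 'adam'.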
            
    2. Regressor
        sklearn.neural_network.MLPRegressor()
        The regressor takes the same constructor parameters as the classifier.
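    3. Attributes of a fitted model:
        coefs_: list of weight matrices. The i-th element is the weight matrix for layer i, mapping layer i to layer i+1. Iterate over the list to get each layer's weights.
        classes_: class labels for each output (MLPClassifier only)
        loss_: the current loss computed with the loss function
        intercepts_: list of bias vectors. The i-th element is the bias vector for layer i+1.
        n_iter_: number of iterations the solver actually ran, i.e. the number of weight-update iterations (for 'sgd' and 'adam' this counts epochs)
        n_layers_: number of layers, including the input and output layers
        n_outputs_: number of outputs
        out_activation_: name of the output activation function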
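To see the parameters and attributes above on a fitted model, here is a small sketch using MLPClassifier on sklearn's bundled iris dataset. The dataset, the layer sizes (10, 10), max_iter=2000 and random_state=0 are illustrative choices and do not come from the original article. The accuracy you see may vary with the sklearn version.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Iris: 150 samples, 4 features, 3 classes (a small dataset, so 'lbfgs' is a reasonable solver)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = MLPClassifier(
    hidden_layer_sizes=(10, 10),  # two hidden layers with 10 neurons each
    activation='relu',
    solver='lbfgs',
    alpha=1e-4,                   # L2 penalty strength
    max_iter=2000,
    random_state=0,               # fixed seed so the initial weights are reproducible
)
clf.fit(X_train, y_train)

print('test accuracy:', clf.score(X_test, y_test))
print('classes_:', clf.classes_)
print('coefs_ shapes:', [W.shape for W in clf.coefs_])
print('intercepts_ shapes:', [b.shape for b in clf.intercepts_])
print('n_iter_:', clf.n_iter_, '(max_iter =', clf.max_iter, ')')
print('n_layers_:', clf.n_layers_)
print('n_outputs_:', clf.n_outputs_)
print('out_activation_:', clf.out_activation_)
```

This network has two hidden layers, so n_layers_ is 4 (input + 2 hidden + output). Iris has three classes, so the output layer uses softmax. n_iter_ shows how many iterations were actually needed, which is at most max_iter.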
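3. When to use neural networks:
    Scenarios with very large amounts of data.
4. Pros and cons of neural networks:
    Pros: predictions can be very accurate.
    Cons: the process is very complex, so the model is hard to explain or interpret.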

 

Since the regressor and the classifier take the same parameters, only the regressor is demonstrated here.

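The data are shown below. The features are the traffic delay index, the high-delay running-time share, the congested road mileage ratio and the average speed. The target is the traffic health index (交通健康指数).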

City (城市名称) | City code (城市代码) | Traffic health index (交通健康指数) | Traffic delay index (交通延时指数) | High-delay running-time share (高延时运行时间占比) | Congested road mileage ratio (拥堵路段里程比) | Average speed (平均车速)
Zhongshan | 442000 | 0.743635484 | 1.595913978 | 53.13612903 | 1.679193548 | 28.26258065
Linyi | 371300 | 0.759006452 | 1.568817204 | 48.29747312 | 1.571397849 | 29.85091398
Lanzhou | 620100 | 0.740470968 | 1.508978495 | 37.99290323 | 1.838548387 | 27.91311828
Nanning | 450100 | 0.757883871 | 1.478924731 | 30.01784946 | 1.26811828 | 28.51892473
Nanchang | 360100 | 0.760412903 | 1.499193548 | 36.55903226 | 1.618172043 | 30.64521505
Nantong | 320600 | 0.78006129 | 1.373225806 | 17.29397849 | 0.641290323 | 36.31129032
Xiamen | 350200 | 0.765535484 | 1.486290323 | 35.03580645 | 1.602096774 | 33.4133871
Taizhou | 331000 | 0.7707 | 1.489193548 | 38.17193548 | 1.016774194 | 30.18037634
Hefei | 340100 | 0.757322581 | 1.496451613 | 39.33693548 | 1.406505376 | 29.20225806
Harbin | 230100 | 0.753622581 | 1.565913978 | 48.02849462 | 1.943172043 | 28.65483871
Jiaxing | 330400 | 0.778045161 | 1.384086022 | 21.68435484 | 0.524193548 | 30.47215054
Dalian | 210200 | 0.740941935 | 1.621451613 | 59.94623656 | 1.829623656 | 29.05290323
Taiyuan | 140100 | 0.751303226 | 1.55155914 | 39.24698925 | 2.113602151 | 31.8844086
Changzhou | 320400 | 0.765151613 | 1.435107527 | 30.19688172 | 0.69311828 | 31.96709677
Xuzhou | 320300 | 0.755609677 | 1.477150538 | 38.26166667 | 1.288817204 | 29.62478495
Huizhou | 441300 | 0.744277419 | 1.552365591 | 47.93887097 | 1.660053763 | 28.59102151
Wuxi | 320200 | 0.753158065 | 1.419032258 | 30.645 | 1.255806452 | 33.03451613
Kunming | 530100 | 0.745764516 | 1.540537634 | 43.90688172 | 1.972311828 | 28.21462366
Quanzhou | 350500 | 0.785393548 | 1.388602151 | 24.01424731 | 0.840483871 | 33.10064516
Jinan | 370100 | 0.744958065 | 1.68483871 | 60.75241935 | 2.133709677 | 28.18612903
Wenzhou | 330300 | 0.750845161 | 1.514569892 | 46.59478495 | 1.290322581 | 25.28204301
Weifang | 370700 | 0.778345161 | 1.538602151 | 49.19354839 | 0.705322581 | 28.54231183
Yantai | 370600 | 0.767974194 | 1.528817204 | 47.67021505 | 0.73672043 | 30.69768817
Zhuhai | 440400 | 0.753825806 | 1.534086022 | 42.47311828 | 1.471774194 | 34.25344086
Shijiazhuang | 130100 | 0.757196774 | 1.512365591 | 38.79924731 | 1.51311828 | 31.24752688
Fuzhou | 350100 | 0.750977419 | 1.59655914 | 50.62741935 | 1.679892473 | 28.88032258
Shaoxing | 330600 | 0.760645161 | 1.492311828 | 40.2327957 | 1.067311828 | 27.47043011
Guiyang | 520100 | 0.736306452 | 1.561021505 | 43.6377957 | 3.067634409 | 31.57268817
Jinhua | 330700 | 0.769812903 | 1.368602151 | 16.93553763 | 0.632473118 | 29.11575269
Changchun | 220100 | 0.737064516 | 1.667473118 | 65.68096774 | 2.363655914 | 27.95241935

The structure of the neural network created is shown in the figure.

The code is as follows:

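from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
import pandas as pd


# Load the dataset (raw string so the backslashes in the Windows path are not read as escapes)
path = r"E:\Desktop\二线城市交通大数据(整理版本).xlsx"
data = pd.read_excel(path)
x = data.iloc[:, 3:]  # features: every column from the 4th one onward
y = data['交通健康指数']  # target: traffic health index

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=22, test_size=0.2)

# Neural network estimator workflow
# Instantiate the estimator: two hidden layers with 9 neurons each
# (also pass random_state=... if you want reproducible weights)
estimator = MLPRegressor(hidden_layer_sizes=(9, 9), activation='relu', solver='lbfgs', max_iter=10000)

# Train the model
estimator.fit(x_train, y_train)

# Model results
print('Weights:\n', estimator.coefs_)
print('Biases:\n', estimator.intercepts_)
# max_iter is only the upper limit; estimator.n_iter_ gives the iterations actually run
print('Maximum iterations (max_iter):\n', estimator.max_iter)

# Model evaluation
y_pred = estimator.predict(x_test)
print('MSE:\n', mean_squared_error(y_true=y_test, y_pred=y_pred))

# Plot true vs. predicted values on the test set
plt.plot(range(len(y_test)), y_test, label='true')
plt.plot(range(len(y_test)), y_pred, label='predict')
plt.legend()
plt.show()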

Output:

Weights:
 [array([[ 0.42577168,  0.17666234,  0.19003966,  0.26250721,  0.3317495 ,
         0.08508971, -0.22039682, -0.35895583, -0.13872385],
       [-0.37564312, -0.59776873,  0.46398324, -0.20914132, -0.48539331,
         0.33594829,  0.22823493,  0.0282387 , -0.49696628],
       [-0.29733362,  0.1314713 ,  0.04251641, -0.53688462, -0.33659041,
         0.49136256,  0.32953032,  0.39113453, -0.26674058],
       [ 0.12170368, -0.3033268 , -0.46451011, -0.09668492,  0.23758192,
        -0.31235258, -0.58519066, -0.01642481,  0.07302027]]), array([[-3.87153101e-01, -4.41334063e-01,  2.05860764e-01,
        -5.71204328e-01,  2.86162288e-01, -3.47884587e-01,
         2.58576352e-01, -8.90988416e-02,  4.77097423e-01],
       [-8.41104197e-02,  4.91815922e-01, -4.45543029e-01,
         1.76201627e-01,  2.47454291e-01, -1.42124659e-01,
         3.05051279e-02, -2.04327463e-01, -1.22892624e-02],
       [ 8.52969740e-02,  1.81415220e-01,  4.49311909e-01,
        -3.33872078e-01,  1.58314641e-01, -3.56945623e-01,
         2.72664354e-01, -3.20190743e-02,  2.95226558e-01],
       [ 5.36893944e-01, -5.14863968e-01, -1.69998148e-01,
         5.73832139e-01,  3.27003319e-01,  1.12153740e-01,
         3.53775468e-01,  4.96910910e-02, -2.06986749e-01],
       [-2.51883778e-01,  1.64562422e-01,  6.20866569e-02,
         5.62702793e-01, -3.10472029e-01, -1.55396850e-01,
        -1.18747530e-01,  5.58967054e-01,  3.12433716e-01],
       [ 2.10134725e-01,  8.63839437e-02, -2.26089459e-01,
        -4.34510907e-01,  2.88601412e-01, -3.81954622e-01,
         3.49619056e-01,  4.09807358e-01,  4.69517473e-01],
       [-1.54442824e-04,  3.05257781e-01,  5.63687971e-01,
        -5.52803397e-01, -3.22323538e-01,  2.10270416e-01,
        -2.06400717e-01,  4.73932003e-01, -4.94360991e-01],
       [ 3.72320528e-01,  1.05820561e-01, -5.67804032e-01,
        -4.01005888e-01, -3.53290114e-02,  2.46396546e-01,
        -7.04286272e-02,  4.24332860e-01,  1.40415718e-01],
       [-2.42658334e-02, -4.06160053e-01, -4.42680874e-02,
        -3.98431087e-02,  2.70100216e-01, -3.00897123e-01,
        -2.83378521e-01, -1.03931077e-01,  2.49463069e-01]]), array([[-0.29808742],
       [-0.08643562],
       [-0.35921175],
       [-0.75997322],
       [ 0.32831232],
       [ 0.30329984],
       [ 0.50290044],
       [-0.35763353],
       [-0.14015589]])]
Biases:
 [array([ 0.23508308, -0.30508132, -0.42374356,  0.08324226, -0.46454786,
       -0.31669358,  0.20434223, -0.43277333,  0.27982283]), array([ 0.03710788, -0.17710693,  0.19630572, -0.32625699, -0.45168697,
        0.08850363,  0.44465056,  0.0520729 , -0.3625317 ]), array([0.62700466])]
Maximum iterations (max_iter):
 10000
MSE:
 3.710511288209076e-05
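The printed arrays follow the attribute layout described earlier. coefs_ holds one (n_in, n_out) matrix per layer transition, here (4, 9), (9, 9) and (9, 1), and intercepts_ holds the matching bias vectors. As a sketch of how these combine into a prediction, the code below fits an MLPRegressor with the same architecture on synthetic data and recomputes its predictions by hand. The random data and random_state values are purely illustrative and are not the traffic dataset. The hidden layers use ReLU, and the output layer uses the identity, which is what MLPRegressor uses for regression.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data: 30 samples, 4 features (illustrative only)
rng = np.random.default_rng(22)
X = rng.normal(size=(30, 4))
y = X @ np.array([0.5, -0.2, 0.1, 0.3]) + 0.05 * rng.normal(size=30)

est = MLPRegressor(hidden_layer_sizes=(9, 9), activation='relu', solver='lbfgs',
                   max_iter=10000, random_state=0)
est.fit(X, y)

print([W.shape for W in est.coefs_])       # one weight matrix per layer transition
print([b.shape for b in est.intercepts_])  # one bias vector per non-input layer
print(est.out_activation_)                 # output activation used by the regressor

# Manual forward pass with the fitted parameters
a = X
for W, b in zip(est.coefs_[:-1], est.intercepts_[:-1]):
    a = np.maximum(0.0, a @ W + b)         # hidden layers: ReLU
manual = (a @ est.coefs_[-1] + est.intercepts_[-1]).ravel()  # output layer: identity

print(np.allclose(manual, est.predict(X)))
```

If the manual result matches predict, that confirms what the weight and bias dumps above mean: each prediction is just these matrix products and ReLUs applied in sequence.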

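MLPs are sensitive to feature scale. sklearn's user guide recommends standardizing the inputs, and in this table the average speed is around 30 while the congested road mileage ratio is around 1. The target also varies only between about 0.736 and 0.785, so a small MSE says little on its own. The sketch below addresses both points:

- It rebuilds the table in memory, using the original Chinese column names so that `data.iloc[:, 3:]` and `data['交通健康指数']` work unchanged.
- It puts StandardScaler in front of the same MLPRegressor in a Pipeline.
- It compares the test MSE with a baseline that always predicts the training mean.

The in-memory table, the Pipeline, random_state=0 and the mean baseline are additions for illustration and are not part of the original article. The numbers may differ from the output above, because the row order of the Excel file and the sklearn version affect both the split and the fit.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Column names assumed to match the headers of the original Excel file
columns = ['城市名称', '城市代码', '交通健康指数', '交通延时指数',
           '高延时运行时间占比', '拥堵路段里程比', '平均车速']
rows = [
    ('中山市', 442000, 0.743635484, 1.595913978, 53.13612903, 1.679193548, 28.26258065),
    ('临沂市', 371300, 0.759006452, 1.568817204, 48.29747312, 1.571397849, 29.85091398),
    ('兰州市', 620100, 0.740470968, 1.508978495, 37.99290323, 1.838548387, 27.91311828),
    ('南宁市', 450100, 0.757883871, 1.478924731, 30.01784946, 1.26811828, 28.51892473),
    ('南昌市', 360100, 0.760412903, 1.499193548, 36.55903226, 1.618172043, 30.64521505),
    ('南通市', 320600, 0.78006129, 1.373225806, 17.29397849, 0.641290323, 36.31129032),
    ('厦门市', 350200, 0.765535484, 1.486290323, 35.03580645, 1.602096774, 33.4133871),
    ('台州市', 331000, 0.7707, 1.489193548, 38.17193548, 1.016774194, 30.18037634),
    ('合肥市', 340100, 0.757322581, 1.496451613, 39.33693548, 1.406505376, 29.20225806),
    ('哈尔滨市', 230100, 0.753622581, 1.565913978, 48.02849462, 1.943172043, 28.65483871),
    ('嘉兴市', 330400, 0.778045161, 1.384086022, 21.68435484, 0.524193548, 30.47215054),
    ('大连市', 210200, 0.740941935, 1.621451613, 59.94623656, 1.829623656, 29.05290323),
    ('太原市', 140100, 0.751303226, 1.55155914, 39.24698925, 2.113602151, 31.8844086),
    ('常州市', 320400, 0.765151613, 1.435107527, 30.19688172, 0.69311828, 31.96709677),
    ('徐州市', 320300, 0.755609677, 1.477150538, 38.26166667, 1.288817204, 29.62478495),
    ('惠州市', 441300, 0.744277419, 1.552365591, 47.93887097, 1.660053763, 28.59102151),
    ('无锡市', 320200, 0.753158065, 1.419032258, 30.645, 1.255806452, 33.03451613),
    ('昆明市', 530100, 0.745764516, 1.540537634, 43.90688172, 1.972311828, 28.21462366),
    ('泉州市', 350500, 0.785393548, 1.388602151, 24.01424731, 0.840483871, 33.10064516),
    ('济南市', 370100, 0.744958065, 1.68483871, 60.75241935, 2.133709677, 28.18612903),
    ('温州市', 330300, 0.750845161, 1.514569892, 46.59478495, 1.290322581, 25.28204301),
    ('潍坊市', 370700, 0.778345161, 1.538602151, 49.19354839, 0.705322581, 28.54231183),
    ('烟台市', 370600, 0.767974194, 1.528817204, 47.67021505, 0.73672043, 30.69768817),
    ('珠海市', 440400, 0.753825806, 1.534086022, 42.47311828, 1.471774194, 34.25344086),
    ('石家庄市', 130100, 0.757196774, 1.512365591, 38.79924731, 1.51311828, 31.24752688),
    ('福州市', 350100, 0.750977419, 1.59655914, 50.62741935, 1.679892473, 28.88032258),
    ('绍兴市', 330600, 0.760645161, 1.492311828, 40.2327957, 1.067311828, 27.47043011),
    ('贵阳市', 520100, 0.736306452, 1.561021505, 43.6377957, 3.067634409, 31.57268817),
    ('金华市', 330700, 0.769812903, 1.368602151, 16.93553763, 0.632473118, 29.11575269),
    ('长春市', 220100, 0.737064516, 1.667473118, 65.68096774, 2.363655914, 27.95241935),
]
data = pd.DataFrame(rows, columns=columns)

x = data.iloc[:, 3:]      # the four feature columns
y = data['交通健康指数']   # target: traffic health index
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=22, test_size=0.2)

# Standardize the features, then fit the same architecture as in the article
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(9, 9), activation='relu', solver='lbfgs',
                 max_iter=10000, random_state=0),
)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

# Baseline: always predict the mean of the training targets
baseline = np.full(len(y_test), y_train.mean())

print('MLP MSE:     ', mean_squared_error(y_test, y_pred))
print('Baseline MSE:', mean_squared_error(y_test, baseline))
print('MLP R^2:     ', r2_score(y_test, y_pred))
```

The network only adds value if its MSE is clearly below the baseline MSE (equivalently, if R^2 is well above 0). With just 24 training rows, this check is worth doing before trusting the model.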