Predicting the Double Color Ball (双色球) blue ball number with machine learning (it does not really work)

Import the required libraries

import requests
import numpy as np
import pandas as pd
The web page that holds the historical Double Color Ball results
url='https://datachart.500.com/ssq/history/newinc/history.php?start=03001' 

The code for fetching the historical data follows https://www.jianshu.com/p/79979e7982fa?utm_campaign=hugo

# Fetch all historical Double Color Ball data
response = requests.get(url)
response.encoding = 'utf-8'  
re_text = response.text

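# Parse the page data (the variable is named tbody_html rather than re so it does not shadow the standard re module)
tbody_html=re_text.split('<tbody id="tdata">')[1].split('</tbody>')[0]
result=tbody_html.split('<tr class="t_tr1">')[1:]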

all_numbers=[]
for i in result:
    each_numbers=[]
    i=i.replace('<!--<td>2</td>-->','')
    each=i.split('</td>')[:-1]
    for j in each:
        each_numbers.append(j.split('>')[1].replace('&nbsp;',''))
    
    all_numbers.append(each_numbers)
  
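# Define the column names: issue number, red balls 1-6, blue ball, Happy Sunday, prize pool (yuan),
# first-prize winners and amount (yuan), second-prize winners and amount (yuan), total sales (yuan), draw date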
col=['期号','红球1','红球2','红球3','红球4','红球5','红球6','蓝球','快乐星期天','奖池奖金(元)',
     '一等奖注数','一等奖奖金(元)','二等奖注数','二等奖奖金(元)','总投注额(元)','开奖日期']

# After parsing the page data, build the Double Color Ball DataFrame
df_all=pd.DataFrame(all_numbers,columns=col)
df_all.head()
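|  | 期号 | 红球1 | 红球2 | 红球3 | 红球4 | 红球5 | 红球6 | 蓝球 | 快乐星期天 | 奖池奖金(元) | 一等奖注数 | 一等奖奖金(元) | 二等奖注数 | 二等奖奖金(元) | 总投注额(元) | 开奖日期 |
| 0 | 20034 | 02 | 08 | 15 | 16 | 26 | 32 | 03 |  | 878,740,204 | 11 | 6,780,706 | 182 | 134,531 | 340,793,680 | 2020-05-07 |
| 1 | 20033 | 07 | 10 | 12 | 21 | 31 | 32 | 01 |  | 879,873,846 | 4 | 10,000,000 | 97 | 263,480 | 320,555,362 | 2020-05-05 |
| 2 | 20032 | 03 | 11 | 13 | 14 | 15 | 26 | 13 |  | 843,200,936 | 11 | 6,588,951 | 135 | 161,837 | 338,904,002 | 2020-05-03 |
| 3 | 20031 | 02 | 05 | 09 | 15 | 16 | 27 | 09 |  | 850,135,149 | 14 | 5,603,722 | 310 | 34,081 | 323,903,392 | 2020-04-30 |
| 4 | 20030 | 17 | 18 | 21 | 29 | 30 | 32 | 03 |  | 896,891,829 | 1 | 10,000,000 | 74 | 322,591 | 327,786,632 | 2020-04-28 |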
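
The split-based parsing above can be tried offline on a small hand-written fragment. This is only a sketch: the sample HTML, the CSS class names inside it, and the helper name `parse_rows` are made up for illustration, and the real page may be structured differently.

```python
# Hypothetical fragment that mimics the structure the scraper relies on:
# a <tbody id="tdata"> whose rows are <tr class="t_tr1">.
SAMPLE_HTML = (
    '<table><tbody id="tdata">'
    '<tr class="t_tr1"><!--<td>2</td>--><td>20034</td>'
    '<td class="t_cfont2">02</td><td class="t_cfont4">03</td><td>&nbsp;</td></tr>'
    '<tr class="t_tr1"><!--<td>2</td>--><td>20033</td>'
    '<td class="t_cfont2">07</td><td class="t_cfont4">01</td><td>&nbsp;</td></tr>'
    '</tbody></table>'
)


def parse_rows(html):
    """Return one list of cell strings per <tr class="t_tr1"> row."""
    body = html.split('<tbody id="tdata">')[1].split('</tbody>')[0]
    rows = []
    for raw_row in body.split('<tr class="t_tr1">')[1:]:
        # Drop the commented-out cell, then cut the row at each closing </td>.
        raw_row = raw_row.replace('<!--<td>2</td>-->', '')
        cells = raw_row.split('</td>')[:-1]
        # The cell text is whatever follows the first '>' of the opening tag.
        rows.append([c.split('>')[1].replace('&nbsp;', '') for c in cells])
    return rows


rows = parse_rows(SAMPLE_HTML)
print(rows)
```

String splitting like this breaks as soon as the markup changes, so it is worth checking the number of cells per row against the number of column names before building the DataFrame.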
df_all
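|  | 期号 | 红球1 | 红球2 | 红球3 | 红球4 | 红球5 | 红球6 | 蓝球 | 快乐星期天 | 奖池奖金(元) | 一等奖注数 | 一等奖奖金(元) | 二等奖注数 | 二等奖奖金(元) | 总投注额(元) | 开奖日期 |
| 0 | 20034 | 02 | 08 | 15 | 16 | 26 | 32 | 03 |  | 878,740,204 | 11 | 6,780,706 | 182 | 134,531 | 340,793,680 | 2020-05-07 |
| 1 | 20033 | 07 | 10 | 12 | 21 | 31 | 32 | 01 |  | 879,873,846 | 4 | 10,000,000 | 97 | 263,480 | 320,555,362 | 2020-05-05 |
| 2 | 20032 | 03 | 11 | 13 | 14 | 15 | 26 | 13 |  | 843,200,936 | 11 | 6,588,951 | 135 | 161,837 | 338,904,002 | 2020-05-03 |
| 3 | 20031 | 02 | 05 | 09 | 15 | 16 | 27 | 09 |  | 850,135,149 | 14 | 5,603,722 | 310 | 34,081 | 323,903,392 | 2020-04-30 |
| 4 | 20030 | 17 | 18 | 21 | 29 | 30 | 32 | 03 |  | 896,891,829 | 1 | 10,000,000 | 74 | 322,591 | 327,786,632 | 2020-04-28 |
| ... |
| 2539 | 03005 | 04 | 06 | 15 | 17 | 30 | 31 | 16 | 13 | 14,280,829 | 0 | 0 | 4 | 260,153 | 10,661,438 | 2003-03-09 |
| 2540 | 03004 | 04 | 06 | 07 | 10 | 13 | 25 | 03 | 0 | 11,852,729 | 0 | 0 | 3 | 281,578 | 9,517,794 | 2003-03-06 |
| 2541 | 03003 | 01 | 07 | 10 | 23 | 28 | 32 | 16 | 03 | 9,881,677 | 0 | 0 | 2 | 332,369 | 8,917,960 | 2003-03-02 |
| 2542 | 03002 | 04 | 09 | 19 | 20 | 21 | 26 | 12 | 0 | 8,330,621 | 0 | 0 | 2 | 264,332 | 7,398,870 | 2003-02-27 |
| 2543 | 03001 | 10 | 11 | 12 | 13 | 26 | 28 | 11 | 10 | 2,097,070 | 0 | 0 | 1 | 898,744 | 10,307,806 | 2003-02-23 |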

2544 rows × 16 columns

Extract the blue ball data

df_all['蓝球']
0       03
1       01
2       13
3       09
4       03
        ..
2539    16
2540    03
2541    16
2542    12
2543    11
Name: 蓝球, Length: 2544, dtype: object
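
The `dtype: object` in the output above means the blue ball numbers are stored as strings such as `'03'`, not as numbers. A minimal sketch of converting them, using a few made-up values in place of `df_all['蓝球']`:

```python
import pandas as pd

# Stand-in for df_all['蓝球']: zero-padded strings held in an object column.
blue = pd.Series(['03', '01', '13', '09', '03'], dtype=object, name='蓝球')

# astype(int) parses each string, so '03' becomes 3.
blue_int = blue.astype(int)
print(blue_int.tolist())
```

Doing this conversion once, right after scraping, avoids string-versus-number comparisons further down the pipeline.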

Import the sklearn library

from sklearn.svm import SVR

Reverse the rows into chronological order and use the draw date as the row index

df_blue = df_all.iloc[::-1]
df_blue = df_blue.set_index('开奖日期')
df_blue
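| 开奖日期 | 期号 | 红球1 | 红球2 | 红球3 | 红球4 | 红球5 | 红球6 | 蓝球 | 快乐星期天 | 奖池奖金(元) | 一等奖注数 | 一等奖奖金(元) | 二等奖注数 | 二等奖奖金(元) | 总投注额(元) |
| 2003-02-23 | 03001 | 10 | 11 | 12 | 13 | 26 | 28 | 11 | 10 | 2,097,070 | 0 | 0 | 1 | 898,744 | 10,307,806 |
| 2003-02-27 | 03002 | 04 | 09 | 19 | 20 | 21 | 26 | 12 | 0 | 8,330,621 | 0 | 0 | 2 | 264,332 | 7,398,870 |
| 2003-03-02 | 03003 | 01 | 07 | 10 | 23 | 28 | 32 | 16 | 03 | 9,881,677 | 0 | 0 | 2 | 332,369 | 8,917,960 |
| 2003-03-06 | 03004 | 04 | 06 | 07 | 10 | 13 | 25 | 03 | 0 | 11,852,729 | 0 | 0 | 3 | 281,578 | 9,517,794 |
| 2003-03-09 | 03005 | 04 | 06 | 15 | 17 | 30 | 31 | 16 | 13 | 14,280,829 | 0 | 0 | 4 | 260,153 | 10,661,438 |
| ... |
| 2020-04-28 | 20030 | 17 | 18 | 21 | 29 | 30 | 32 | 03 |  | 896,891,829 | 1 | 10,000,000 | 74 | 322,591 | 327,786,632 |
| 2020-04-30 | 20031 | 02 | 05 | 09 | 15 | 16 | 27 | 09 |  | 850,135,149 | 14 | 5,603,722 | 310 | 34,081 | 323,903,392 |
| 2020-05-03 | 20032 | 03 | 11 | 13 | 14 | 15 | 26 | 13 |  | 843,200,936 | 11 | 6,588,951 | 135 | 161,837 | 338,904,002 |
| 2020-05-05 | 20033 | 07 | 10 | 12 | 21 | 31 | 32 | 01 |  | 879,873,846 | 4 | 10,000,000 | 97 | 263,480 | 320,555,362 |
| 2020-05-07 | 20034 | 02 | 08 | 15 | 16 | 26 | 32 | 03 |  | 878,740,204 | 11 | 6,780,706 | 182 | 134,531 | 340,793,680 |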

2544 rows × 15 columns

Add the blue ball numbers of the previous 20 draws to the DataFrame as lag columns

for i in range(1, 21, 1):
    df_blue.loc[:,'Close Minus ' + str(i)] = df_blue['蓝球'].shift(i)
df_blue
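| 开奖日期 | 期号 | 红球1 | 红球2 | 红球3 | 红球4 | 红球5 | 红球6 | 蓝球 | 快乐星期天 | 奖池奖金(元) | ... | Close Minus 11 | Close Minus 12 | Close Minus 13 | Close Minus 14 | Close Minus 15 | Close Minus 16 | Close Minus 17 | Close Minus 18 | Close Minus 19 | Close Minus 20 |
| 2003-02-23 | 03001 | 10 | 11 | 12 | 13 | 26 | 28 | 11 | 10 | 2,097,070 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2003-02-27 | 03002 | 04 | 09 | 19 | 20 | 21 | 26 | 12 | 0 | 8,330,621 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2003-03-02 | 03003 | 01 | 07 | 10 | 23 | 28 | 32 | 16 | 03 | 9,881,677 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2003-03-06 | 03004 | 04 | 06 | 07 | 10 | 13 | 25 | 03 | 0 | 11,852,729 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2003-03-09 | 03005 | 04 | 06 | 15 | 17 | 30 | 31 | 16 | 13 | 14,280,829 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ... |
| 2020-04-28 | 20030 | 17 | 18 | 21 | 29 | 30 | 32 | 03 |  | 896,891,829 | ... | 03 | 13 | 04 | 07 | 01 | 07 | 14 | 09 | 01 | 09 |
| 2020-04-30 | 20031 | 02 | 05 | 09 | 15 | 16 | 27 | 09 |  | 850,135,149 | ... | 14 | 03 | 13 | 04 | 07 | 01 | 07 | 14 | 09 | 01 |
| 2020-05-03 | 20032 | 03 | 11 | 13 | 14 | 15 | 26 | 13 |  | 843,200,936 | ... | 07 | 14 | 03 | 13 | 04 | 07 | 01 | 07 | 14 | 09 |
| 2020-05-05 | 20033 | 07 | 10 | 12 | 21 | 31 | 32 | 01 |  | 879,873,846 | ... | 02 | 07 | 14 | 03 | 13 | 04 | 07 | 01 | 07 | 14 |
| 2020-05-07 | 20034 | 02 | 08 | 15 | 16 | 26 | 32 | 03 |  | 878,740,204 | ... | 08 | 02 | 07 | 14 | 03 | 13 | 04 | 07 | 01 | 07 |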

2544 rows × 35 columns
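
To see what the `shift(i)` loop does, here is the same idea on a tiny made-up frame with 2 lags instead of 20. The column names `blue` and `lag_i` are illustrative only.

```python
import pandas as pd

# Six made-up draws in chronological order.
draws = pd.DataFrame({'blue': [11, 12, 16, 3, 16, 9]})

n_lags = 2  # the post uses 20
for i in range(1, n_lags + 1):
    # lag_i at row t holds the blue ball drawn i draws before row t.
    draws[f'lag_{i}'] = draws['blue'].shift(i)

print(draws)

# The first n_lags rows have no complete history, so they contain NaN.
features = draws.iloc[n_lags:]
print(features)
```

With 20 lags the first 20 rows are incomplete in the same way, which is why the NaN values appear at the top of the table above.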

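Keep only the usable data: the blue ball and lag columns, skipping the first 20 rows whose lag values are incomplete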

sp20 = df_blue[[x for x in df_blue.columns if 'Close Minus' in x or x == '蓝球']].iloc[20:,]
sp20
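| 开奖日期 | 蓝球 | Close Minus 1 | Close Minus 2 | Close Minus 3 | Close Minus 4 | Close Minus 5 | Close Minus 6 | Close Minus 7 | Close Minus 8 | Close Minus 9 | ... | Close Minus 11 | Close Minus 12 | Close Minus 13 | Close Minus 14 | Close Minus 15 | Close Minus 16 | Close Minus 17 | Close Minus 18 | Close Minus 19 | Close Minus 20 |
| 2003-05-04 | 01 | 04 | 09 | 13 | 06 | 06 | 13 | 02 | 12 | 12 | ... | 13 | 09 | 08 | 07 | 06 | 16 | 03 | 16 | 12 | 11 |
| 2003-05-08 | 08 | 01 | 04 | 09 | 13 | 06 | 06 | 13 | 02 | 12 | ... | 15 | 13 | 09 | 08 | 07 | 06 | 16 | 03 | 16 | 12 |
| 2003-05-11 | 02 | 08 | 01 | 04 | 09 | 13 | 06 | 06 | 13 | 02 | ... | 12 | 15 | 13 | 09 | 08 | 07 | 06 | 16 | 03 | 16 |
| 2003-05-15 | 14 | 02 | 08 | 01 | 04 | 09 | 13 | 06 | 06 | 13 | ... | 12 | 12 | 15 | 13 | 09 | 08 | 07 | 06 | 16 | 03 |
| 2003-05-18 | 12 | 14 | 02 | 08 | 01 | 04 | 09 | 13 | 06 | 06 | ... | 02 | 12 | 12 | 15 | 13 | 09 | 08 | 07 | 06 | 16 |
| ... |
| 2020-04-28 | 03 | 05 | 08 | 06 | 06 | 02 | 08 | 08 | 02 | 07 | ... | 03 | 13 | 04 | 07 | 01 | 07 | 14 | 09 | 01 | 09 |
| 2020-04-30 | 09 | 03 | 05 | 08 | 06 | 06 | 02 | 08 | 08 | 02 | ... | 14 | 03 | 13 | 04 | 07 | 01 | 07 | 14 | 09 | 01 |
| 2020-05-03 | 13 | 09 | 03 | 05 | 08 | 06 | 06 | 02 | 08 | 08 | ... | 07 | 14 | 03 | 13 | 04 | 07 | 01 | 07 | 14 | 09 |
| 2020-05-05 | 01 | 13 | 09 | 03 | 05 | 08 | 06 | 06 | 02 | 08 | ... | 02 | 07 | 14 | 03 | 13 | 04 | 07 | 01 | 07 | 14 |
| 2020-05-07 | 03 | 01 | 13 | 09 | 03 | 05 | 08 | 06 | 06 | 02 | ... | 08 | 02 | 07 | 14 | 03 | 13 | 04 | 07 | 01 | 07 |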

2524 rows × 21 columns

Build the training set, holding out the last 500 rows as the test set

x_train = sp20[:-500]
x_train
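| 开奖日期 | 蓝球 | Close Minus 1 | Close Minus 2 | Close Minus 3 | Close Minus 4 | Close Minus 5 | Close Minus 6 | Close Minus 7 | Close Minus 8 | Close Minus 9 | ... | Close Minus 11 | Close Minus 12 | Close Minus 13 | Close Minus 14 | Close Minus 15 | Close Minus 16 | Close Minus 17 | Close Minus 18 | Close Minus 19 | Close Minus 20 |
| 2003-05-04 | 01 | 04 | 09 | 13 | 06 | 06 | 13 | 02 | 12 | 12 | ... | 13 | 09 | 08 | 07 | 06 | 16 | 03 | 16 | 12 | 11 |
| 2003-05-08 | 08 | 01 | 04 | 09 | 13 | 06 | 06 | 13 | 02 | 12 | ... | 15 | 13 | 09 | 08 | 07 | 06 | 16 | 03 | 16 | 12 |
| 2003-05-11 | 02 | 08 | 01 | 04 | 09 | 13 | 06 | 06 | 13 | 02 | ... | 12 | 15 | 13 | 09 | 08 | 07 | 06 | 16 | 03 | 16 |
| 2003-05-15 | 14 | 02 | 08 | 01 | 04 | 09 | 13 | 06 | 06 | 13 | ... | 12 | 12 | 15 | 13 | 09 | 08 | 07 | 06 | 16 | 03 |
| 2003-05-18 | 12 | 14 | 02 | 08 | 01 | 04 | 09 | 13 | 06 | 06 | ... | 02 | 12 | 12 | 15 | 13 | 09 | 08 | 07 | 06 | 16 |
| ... |
| 2016-12-01 | 15 | 10 | 03 | 07 | 14 | 03 | 12 | 13 | 15 | 07 | ... | 01 | 14 | 13 | 16 | 10 | 03 | 10 | 16 | 08 | 09 |
| 2016-12-04 | 12 | 15 | 10 | 03 | 07 | 14 | 03 | 12 | 13 | 15 | ... | 02 | 01 | 14 | 13 | 16 | 10 | 03 | 10 | 16 | 08 |
| 2016-12-06 | 13 | 12 | 15 | 10 | 03 | 07 | 14 | 03 | 12 | 13 | ... | 07 | 02 | 01 | 14 | 13 | 16 | 10 | 03 | 10 | 16 |
| 2016-12-08 | 05 | 13 | 12 | 15 | 10 | 03 | 07 | 14 | 03 | 12 | ... | 15 | 07 | 02 | 01 | 14 | 13 | 16 | 10 | 03 | 10 |
| 2016-12-11 | 06 | 05 | 13 | 12 | 15 | 10 | 03 | 07 | 14 | 03 | ... | 13 | 15 | 07 | 02 | 01 | 14 | 13 | 16 | 10 | 03 |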

2024 rows × 21 columns

Target values (the blue ball of the next draw, obtained by shifting the column up by one row)

y_train = sp20['蓝球'].shift(-1)[:-500]
y_train
开奖日期
2003-05-04    08
2003-05-08    02
2003-05-11    14
2003-05-15    12
2003-05-18    16
              ..
2016-12-01    12
2016-12-04    13
2016-12-06    05
2016-12-08    06
2016-12-11    10
Name: 蓝球, Length: 2024, dtype: object
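
The effect of `shift(-1)` is easiest to see on a handful of rows. The sketch below reuses the first five blue ball values of `sp20` shown earlier; because the sample is cut off after five rows, its last row has no following draw and the target there is missing.

```python
import pandas as pd

# First five blue balls of sp20, indexed by draw date (values are strings).
blue = pd.Series(
    ['01', '08', '02', '14', '12'],
    index=['2003-05-04', '2003-05-08', '2003-05-11', '2003-05-15', '2003-05-18'],
    dtype=object,
)

# Row t of the target holds the blue ball of draw t+1.
target = blue.shift(-1)
print(target)
```

The same thing happens on the full data: the very last row of a column shifted this way has no target.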

Test set and the ground-truth values to compare against

x_test = sp20[-500:]
y_test = sp20['蓝球'].shift(-1)[-500:]
x_test
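| 开奖日期 | 蓝球 | Close Minus 1 | Close Minus 2 | Close Minus 3 | Close Minus 4 | Close Minus 5 | Close Minus 6 | Close Minus 7 | Close Minus 8 | Close Minus 9 | ... | Close Minus 11 | Close Minus 12 | Close Minus 13 | Close Minus 14 | Close Minus 15 | Close Minus 16 | Close Minus 17 | Close Minus 18 | Close Minus 19 | Close Minus 20 |
| 2016-12-13 | 10 | 06 | 05 | 13 | 12 | 15 | 10 | 03 | 07 | 14 | ... | 12 | 13 | 15 | 07 | 02 | 01 | 14 | 13 | 16 | 10 |
| 2016-12-15 | 12 | 10 | 06 | 05 | 13 | 12 | 15 | 10 | 03 | 07 | ... | 03 | 12 | 13 | 15 | 07 | 02 | 01 | 14 | 13 | 16 |
| 2016-12-18 | 14 | 12 | 10 | 06 | 05 | 13 | 12 | 15 | 10 | 03 | ... | 14 | 03 | 12 | 13 | 15 | 07 | 02 | 01 | 14 | 13 |
| 2016-12-20 | 07 | 14 | 12 | 10 | 06 | 05 | 13 | 12 | 15 | 10 | ... | 07 | 14 | 03 | 12 | 13 | 15 | 07 | 02 | 01 | 14 |
| 2016-12-22 | 05 | 07 | 14 | 12 | 10 | 06 | 05 | 13 | 12 | 15 | ... | 03 | 07 | 14 | 03 | 12 | 13 | 15 | 07 | 02 | 01 |
| ... |
| 2020-04-28 | 03 | 05 | 08 | 06 | 06 | 02 | 08 | 08 | 02 | 07 | ... | 03 | 13 | 04 | 07 | 01 | 07 | 14 | 09 | 01 | 09 |
| 2020-04-30 | 09 | 03 | 05 | 08 | 06 | 06 | 02 | 08 | 08 | 02 | ... | 14 | 03 | 13 | 04 | 07 | 01 | 07 | 14 | 09 | 01 |
| 2020-05-03 | 13 | 09 | 03 | 05 | 08 | 06 | 06 | 02 | 08 | 08 | ... | 07 | 14 | 03 | 13 | 04 | 07 | 01 | 07 | 14 | 09 |
| 2020-05-05 | 01 | 13 | 09 | 03 | 05 | 08 | 06 | 06 | 02 | 08 | ... | 02 | 07 | 14 | 03 | 13 | 04 | 07 | 01 | 07 | 14 |
| 2020-05-07 | 03 | 01 | 13 | 09 | 03 | 05 | 08 | 06 | 06 | 02 | ... | 08 | 02 | 07 | 14 | 03 | 13 | 04 | 07 | 01 | 07 |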

500 rows × 21 columns

y_test
开奖日期
2016-12-13     12
2016-12-15     14
2016-12-18     07
2016-12-20     05
2016-12-22     07
             ... 
2020-04-28     09
2020-04-30     13
2020-05-03     01
2020-05-05     03
2020-05-07    NaN
Name: 蓝球, Length: 500, dtype: object
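
The last entry of `y_test` above is NaN because the most recent draw has no following draw yet. One way to avoid that, sketched here on made-up numbers, is to drop the final row before splitting so that every feature row has a real next-draw target. The helper name `make_split` and the column names are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd


def make_split(frame, target_col, n_test):
    """Chronological split where the target is the next row's target_col."""
    y = frame[target_col].shift(-1)
    # The final row has no next draw, so drop it from both X and y.
    X, y = frame.iloc[:-1], y.iloc[:-1]
    return X.iloc[:-n_test], y.iloc[:-n_test], X.iloc[-n_test:], y.iloc[-n_test:]


# Ten made-up rows; 'blue' simply counts 0..9 so the alignment is easy to check.
frame = pd.DataFrame({'blue': np.arange(10), 'lag_1': np.arange(10) - 1})
X_train, y_train, X_test, y_test = make_split(frame, 'blue', n_test=3)

print(len(X_train), len(X_test))
print(y_test.tolist())
```

Because the rows stay in time order, the test set always lies strictly after the training set.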

Create the model, fit it, and predict on the test set

clf = SVR(kernel='linear')
model = clf.fit(x_train, y_train)
preds = model.predict(x_test)
preds
array([ 8.58676633, 10.04212035,  8.99286766,  7.81664951,  8.85687919,
        8.17807331,  7.88675707,  9.12631827,  9.86707628,  9.50928296,
        9.02836386,  7.01681388,  8.8549026 ,  8.36320076,  9.514238  ,
        9.89583753,  8.12181409,  8.10611289,  8.69316687,  6.95661343,
        9.28758735,  8.97889805,  8.57062279,  9.47964576,  8.28717634,
        8.7176217 ,  8.57425781,  8.4514194 , 10.19625642,  7.92558213,
        7.50841961,  9.56776243,  8.13161002,  8.37967603,  9.07735228,
        8.67835037,  8.66892358,  9.28955127,  9.73093117,  8.63210465,
        8.18542601,  8.47619177,  8.71433588,  8.94164347,  7.74923166,
        8.0290494 ,  8.73319813,  8.89862196,  9.7517356 ,  9.25784447,
        9.34139293,  7.76386757,  8.71621187, 10.63192191,  8.63566704,
        8.85784687,  9.20373353,  8.81642518,  8.94184744,  9.32935945,
       10.1380704 ,  8.3947628 ,  8.64123079, 10.32142012,  9.08417377,
        8.86278631,  9.20851648,  7.72399751,  8.94238674,  8.25015006,
        8.92932569,  9.84590386,  8.26172306,  8.78810202,  8.19540522,
        9.00628716,  8.76530205,  9.29303384,  9.46007778,  9.17797816,
        7.39612198,  8.96372278, 10.54762032,  8.46384568,  9.77787681,
        8.87558466,  7.043298  ,  8.99920831,  8.44925127,  9.07705154,
        9.41660219,  8.38541241,  9.21890421,  8.54078717,  8.62960435,
        8.41015668,  8.94580875,  9.88939022,  8.63976203,  8.62031344,
        8.75268473,  8.11734574,  9.61889516,  9.29407703,  9.21098897,
        8.04105697,  9.59783473,  9.71685777,  9.18833669,  9.82468936,
        9.59897906,  7.68013441,  8.14455261,  9.3760988 ,  8.95203119,
       10.37716365, 10.44114214,  7.45247107,  9.38542959,  8.35913058,
        7.81164495,  9.73742441,  8.17743095,  9.31985947,  9.3599625 ,
        9.11066203,  8.85810417,  7.41233031,  8.07433288,  8.82599934,
        7.32655627,  9.38577877,  8.50466695,  8.78407966,  9.11041412,
        8.06996677,  8.03142014,  9.07850694,  8.85422537,  9.16743652,
        9.5473424 ,  8.52024396,  8.85845297,  8.01269069,  7.92689016,
        8.73021864, 10.20886234,  9.79064358,  8.87858553,  8.31592939,
        8.95579691,  8.00250802,  9.22678545,  9.31060474,  8.17547431,
        7.88314157,  9.03546515,  7.83289199,  7.82789981,  8.46082521,
        9.7996846 ,  7.10351517,  8.6908756 ,  8.70377681,  8.28338995,
        9.13512688,  7.59446757,  9.46477374,  8.37623523,  7.75977077,
        9.32594739,  8.33945126,  8.77559253,  9.22925321,  8.45836056,
        9.1597941 ,  9.3142494 ,  8.49366985,  7.98255797,  7.98760949,
        9.04398843,  8.14048901,  8.99392753,  8.64388447,  8.95796887,
        7.98834268,  9.76729164,  7.56718879,  8.57143079,  9.19044218,
        9.05176795,  9.52764316,  7.92642305,  8.376546  ,  9.18667362,
        8.28305921,  9.87468356,  8.47861223,  8.24944823,  9.30001674,
        8.06741557,  8.30831829,  9.63323753,  9.08122353,  7.4141506 ,
        8.55788839,  8.71653865,  7.83489853,  9.98595278, 10.57751475,
        9.02088601,  9.01491694,  9.1139619 ,  9.123289  ,  8.86248036,
       10.75222492,  8.56172642,  8.7192524 ,  9.16409649,  8.44308954,
        9.98260715,  9.23480853,  9.48459594,  8.70702024,  7.23703221,
        9.27731217,  8.79025182,  8.71373586,  9.52335647,  8.54168366,
        8.73097331,  9.25416056,  9.03534981,  9.05520839,  9.11663778,
        9.04749133,  8.47944654,  9.81438931,  9.01441964,  7.47684487,
        8.85582699,  8.92552577,  8.35441748,  9.1275847 ,  9.64776527,
        8.86203573,  8.54198461,  9.80352083,  8.16063045,  8.88554805,
        8.73174713,  8.64411905,  8.31263542,  9.2815712 ,  8.87880302,
        8.22193847,  9.10355758,  9.97753625,  7.62457492,  8.32021672,
        9.57287923,  7.93929366,  7.90336997,  9.13307444,  9.214177  ,
        8.00670382, 10.14327517,  8.89086259,  7.87531793,  9.65669472,
        9.76025634, 10.24672254,  9.81705915,  7.65693601, 10.02286585,
        8.32478448,  9.1857039 , 11.06159148,  7.64888748,  8.39394691,
        8.64290656,  8.36138954,  9.82690075,  8.6416944 ,  9.73837843,
        8.20274274,  7.95392611, 10.20091045,  8.80705934,  9.53212577,
       10.63799762,  6.93632862,  9.02839699,  9.05237286,  8.81093175,
       10.11080219,  9.06786406,  9.11300461,  8.4457192 ,  6.75585618,
        8.76649629,  8.8132105 ,  7.9615836 ,  9.27740485,  8.4162546 ,
        7.21073231,  6.94116465,  8.95241416,  8.69584013,  7.86647028,
        8.59555889,  8.97609799,  8.56667039,  8.72653531, 10.59874911,
        8.26162894,  7.63103954,  8.67408311,  9.39023857,  9.26900907,
        9.3962732 ,  9.71496951,  9.25516328,  7.62508116,  7.60458444,
        9.59621813,  8.25591984,  8.8008075 , 10.23247376,  9.18791843,
        8.90798302,  7.64298032,  8.94382033, 10.86344729,  8.89845921,
        8.92924935,  9.73165245,  7.90033258,  9.23674649,  9.53695982,
        9.29054449,  8.91345427,  8.27243875,  8.75229269,  9.45129901,
        7.76087736,  7.52857761,  8.99142609,  7.69416645,  8.48868024,
        7.73687423,  9.57259261,  7.67568725,  7.89184722,  9.20872855,
        8.29425278,  7.83079287,  9.73131746,  7.96336425,  8.69383879,
        7.58093308,  9.38637752,  9.08161383,  8.60423298,  8.15600622,
        8.85871779,  7.75418152,  9.11382437,  9.06194801,  8.35132712,
        8.28528026,  8.39206   ,  7.98935452,  9.67397555,  8.03195846,
        9.5347649 ,  8.8821871 ,  8.22867235,  8.48330769, 10.44842799,
        7.86631999,  8.77922836,  8.41417191,  8.17405712,  9.82266405,
       10.16471342,  9.19388109,  7.39121275,  9.32162756,  9.08471652,
        8.78041329,  9.48463257,  8.87974027,  8.17671834,  8.32294427,
        8.46619554,  7.46938187,  9.37276824,  9.34608469, 10.10050843,
        9.8652496 ,  8.41954059,  8.54300332,  8.71257141,  9.34343166,
        9.22338927,  8.64240256,  9.21684098,  9.08615084,  9.3776942 ,
       10.32916159,  8.569659  , 10.15227339,  9.19148815,  8.12867607,
        9.37237906,  8.90739615,  8.5545379 ,  9.23484912,  8.51969533,
        9.3051972 ,  7.71829896,  9.26173291,  8.93609247,  9.83937335,
        8.48773416,  8.35996003,  9.23122071,  8.19319728,  9.89078363,
       10.0732114 ,  9.48186304,  8.59211119,  9.26390005, 10.59957576,
        9.81902182,  9.8013536 , 10.38716622,  8.48734418,  9.84819186,
       11.18960239,  9.52739786,  9.38003868,  8.4054212 ,  9.33333853,
        8.33461283,  9.37063959,  9.87256988,  8.86097385,  8.54162744,
        9.41233592,  8.9490739 ,  8.62019312,  8.72840484,  9.59419651,
        7.82868303,  7.53581748, 10.06779794,  8.3374766 ,  9.03798276,
        8.84183194,  8.23103837,  8.43066046,  8.80670356,  8.39586374,
        8.68708077,  8.51252948,  9.83569563,  8.01033725,  9.60695578,
        8.47610467,  8.24988492, 10.02792516,  9.81598521,  8.79749459,
        8.7436275 ,  7.29313677,  9.18171846,  9.15880308,  9.57345822,
        9.84018262,  9.12219861,  8.27665531,  8.52655564,  8.50303702,
       10.82241005,  8.24057384,  9.06728118,  8.57496793,  7.9067271 ,
        9.57301183, 10.28193341,  9.83487469,  9.35375067,  7.85224255,
       10.05941534,  9.87626911,  8.59939251,  9.54206214,  9.61572063])
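
The predictions above all sit roughly between 7 and 11, near the middle of the 1 to 16 range. A small experiment on purely random numbers suggests why. The sketch assumes draws are independent and uniform over 1 to 16, which is an idealization, and uses 5 lags and 400 simulated draws to keep it quick.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Simulated blue balls: independent, uniform over 1..16 (an assumption).
draws = rng.integers(1, 17, size=400).astype(float)

n_lags = 5
# Column i-1 of X holds the draw i steps before the target draw.
X = np.column_stack(
    [draws[n_lags - i: len(draws) - i] for i in range(1, n_lags + 1)]
)
y = draws[n_lags:]

model = SVR(kernel='linear').fit(X[:300], y[:300])
preds = model.predict(X[300:])

print('prediction range:', preds.min(), preds.max())
print('spread of predictions:', preds.std())
print('spread of actual draws:', y[300:].std())
```

When the inputs carry no information about the target, a regressor minimizes its error by predicting close to the centre of the target distribution, so the spread of its predictions is much narrower than the spread of the real draws.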

Round the predictions to the nearest whole number

preds = np.round(preds)
preds
array([ 9., 10.,  9.,  8.,  9.,  8.,  8.,  9., 10., 10.,  9.,  7.,  9.,
        8., 10., 10.,  8.,  8.,  9.,  7.,  9.,  9.,  9.,  9.,  8.,  9.,
        9.,  8., 10.,  8.,  8., 10.,  8.,  8.,  9.,  9.,  9.,  9., 10.,
        9.,  8.,  8.,  9.,  9.,  8.,  8.,  9.,  9., 10.,  9.,  9.,  8.,
        9., 11.,  9.,  9.,  9.,  9.,  9.,  9., 10.,  8.,  9., 10.,  9.,
        9.,  9.,  8.,  9.,  8.,  9., 10.,  8.,  9.,  8.,  9.,  9.,  9.,
        9.,  9.,  7.,  9., 11.,  8., 10.,  9.,  7.,  9.,  8.,  9.,  9.,
        8.,  9.,  9.,  9.,  8.,  9., 10.,  9.,  9.,  9.,  8., 10.,  9.,
        9.,  8., 10., 10.,  9., 10., 10.,  8.,  8.,  9.,  9., 10., 10.,
        7.,  9.,  8.,  8., 10.,  8.,  9.,  9.,  9.,  9.,  7.,  8.,  9.,
        7.,  9.,  9.,  9.,  9.,  8.,  8.,  9.,  9.,  9., 10.,  9.,  9.,
        8.,  8.,  9., 10., 10.,  9.,  8.,  9.,  8.,  9.,  9.,  8.,  8.,
        9.,  8.,  8.,  8., 10.,  7.,  9.,  9.,  8.,  9.,  8.,  9.,  8.,
        8.,  9.,  8.,  9.,  9.,  8.,  9.,  9.,  8.,  8.,  8.,  9.,  8.,
        9.,  9.,  9.,  8., 10.,  8.,  9.,  9.,  9., 10.,  8.,  8.,  9.,
        8., 10.,  8.,  8.,  9.,  8.,  8., 10.,  9.,  7.,  9.,  9.,  8.,
       10., 11.,  9.,  9.,  9.,  9.,  9., 11.,  9.,  9.,  9.,  8., 10.,
        9.,  9.,  9.,  7.,  9.,  9.,  9., 10.,  9.,  9.,  9.,  9.,  9.,
        9.,  9.,  8., 10.,  9.,  7.,  9.,  9.,  8.,  9., 10.,  9.,  9.,
       10.,  8.,  9.,  9.,  9.,  8.,  9.,  9.,  8.,  9., 10.,  8.,  8.,
       10.,  8.,  8.,  9.,  9.,  8., 10.,  9.,  8., 10., 10., 10., 10.,
        8., 10.,  8.,  9., 11.,  8.,  8.,  9.,  8., 10.,  9., 10.,  8.,
        8., 10.,  9., 10., 11.,  7.,  9.,  9.,  9., 10.,  9.,  9.,  8.,
        7.,  9.,  9.,  8.,  9.,  8.,  7.,  7.,  9.,  9.,  8.,  9.,  9.,
        9.,  9., 11.,  8.,  8.,  9.,  9.,  9.,  9., 10.,  9.,  8.,  8.,
       10.,  8.,  9., 10.,  9.,  9.,  8.,  9., 11.,  9.,  9., 10.,  8.,
        9., 10.,  9.,  9.,  8.,  9.,  9.,  8.,  8.,  9.,  8.,  8.,  8.,
       10.,  8.,  8.,  9.,  8.,  8., 10.,  8.,  9.,  8.,  9.,  9.,  9.,
        8.,  9.,  8.,  9.,  9.,  8.,  8.,  8.,  8., 10.,  8., 10.,  9.,
        8.,  8., 10.,  8.,  9.,  8.,  8., 10., 10.,  9.,  7.,  9.,  9.,
        9.,  9.,  9.,  8.,  8.,  8.,  7.,  9.,  9., 10., 10.,  8.,  9.,
        9.,  9.,  9.,  9.,  9.,  9.,  9., 10.,  9., 10.,  9.,  8.,  9.,
        9.,  9.,  9.,  9.,  9.,  8.,  9.,  9., 10.,  8.,  8.,  9.,  8.,
       10., 10.,  9.,  9.,  9., 11., 10., 10., 10.,  8., 10., 11., 10.,
        9.,  8.,  9.,  8.,  9., 10.,  9.,  9.,  9.,  9.,  9.,  9., 10.,
        8.,  8., 10.,  8.,  9.,  9.,  8.,  8.,  9.,  8.,  9.,  9., 10.,
        8., 10.,  8.,  8., 10., 10.,  9.,  9.,  7.,  9.,  9., 10., 10.,
        9.,  8.,  9.,  9., 11.,  8.,  9.,  9.,  8., 10., 10., 10.,  9.,
        8., 10., 10.,  9., 10., 10.])

Compare with the actual values to get the success rate

correct = np.mean(preds == y_test.values)
correct
0.0
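
A success rate of exactly 0.0 is suspicious. The following sketch, with three made-up values, shows how comparing floats against strings produces it regardless of how good the predictions are.

```python
import numpy as np

preds = np.array([9.0, 10.0, 3.0])                    # rounded model output (floats)
y_str = np.array(['09', '10', '05'], dtype=object)    # actual values as strings

# A float never equals a string, so every element compares False.
acc_mismatched = np.mean(preds == y_str)
print(acc_mismatched)

# After converting both sides to integers the comparison is meaningful.
acc_fixed = np.mean(preds.astype(int) == y_str.astype(int))
print(acc_fixed)
```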

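The success rate appears to be 0, but that turns out to be a type problem: preds holds floats while y_test holds strings (plus a trailing NaN), so no element ever compares equal.
First re-take the target set without the shift, then convert the string data to integers. Note that without the shift, each prediction is compared with the blue ball of its own row, which is also one of the input columns, rather than with the following draw.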

y_test = sp20['蓝球'][-500:]
arr = y_test.values
arr = arr.astype(int)

Convert the predictions to integers as well, then compare again

preds = preds.astype(int)
correct = np.mean(preds == arr)
correct
0.056

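The success rate is actually 0.056, slightly below the 1/16 = 0.0625 expected from guessing uniformly at random among the 16 blue balls.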
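
To put a hit rate like this in context, it helps to simulate what blind guessing achieves. The sketch assumes draws are independent and uniform over 1 to 16, which is an idealization; under that assumption both random guessing and always guessing the same number land near 1/16.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

actual = rng.integers(1, 17, size=n)          # simulated blue balls, 1..16
random_guess = rng.integers(1, 17, size=n)    # guess uniformly at random
constant_guess = np.full(n, 9)                # always guess 9

acc_random = np.mean(random_guess == actual)
acc_const = np.mean(constant_guess == actual)

print('random guess  :', acc_random)
print('constant guess:', acc_const)
print('1/16          :', 1 / 16)
```

A model has to beat this baseline consistently on held-out draws before its predictions can be said to carry any information.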
