Public bicycle usage forecast

1. Introduction

Public bicycles are low-carbon, environmentally friendly, and healthy, and they solve the “last mile” problem in transportation, so they are becoming more and more popular in cities across the country. The data for this project were taken from several public bicycle parking lots on a street in two cities. The goal is to predict the number of public bicycles borrowed in the neighborhood within one hour, based on time, weather, and other information.

Data source:
http://sofasofa.io/competition.php?id=1#c1

# Import packages
import numpy as np
import pandas as pd
import math
from scipy.stats import norm, skew
from scipy import stats

# Modelling Algorithms :

# Classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC, LinearSVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier

# Regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso, RidgeCV, ElasticNet
from sklearn.ensemble import RandomForestRegressor, BaggingRegressor, GradientBoostingRegressor, AdaBoostRegressor
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor

# Modelling Helpers :
from sklearn.base import BaseEstimator, TransformerMixin, RegressorMixin, clone
from sklearn.preprocessing import Normalizer, scale
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Preprocessing :
# Note: sklearn.preprocessing.Imputer was removed in scikit-learn 0.22;
# SimpleImputer from sklearn.impute is its replacement.
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler, StandardScaler, LabelEncoder



#evaluation metrics :

# Regression
from sklearn.metrics import mean_squared_log_error,mean_squared_error, r2_score,mean_absolute_error 

# Classification
from sklearn.metrics import accuracy_score,precision_score,recall_score,f1_score  
from sklearn.metrics import confusion_matrix, classification_report


# Visualisation
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.pylab as pylab
import seaborn as sns

# Configure visualisations
%matplotlib inline
mpl.style.use( 'ggplot' )
plt.style.use('fivethirtyeight')
sns.set(context="notebook", palette="dark", style = 'whitegrid' , color_codes=True)
params = { 
    'axes.labelsize': "large",
    'xtick.labelsize': 'x-large',
    'legend.fontsize': 20,
    'figure.dpi': 150,
    'figure.figsize': [25, 7]
}
plt.rcParams.update(params)

2. Explore the dataset

# Import data
df = pd.read_csv("train.csv")
# How the data looks
df.head()
   id  city  hour  is_workday  weather  temp_1  temp_2  wind   y
0   1     0    22           1        2     3.0     0.7     0  15
1   2     0    10           1        1    21.0    24.9     3  48
2   3     0     0           1        1    25.3    27.4     0  21
3   4     0     7           0        1    15.7    16.2     0  11
4   5     1    10           1        1    21.1    25.0     2  39

The dataset has 9 variables:
- id
- y: the number of public bicycles borrowed in the neighborhood within one hour
- city: there are two cities in the dataset
- hour: local time of day in hours, 24-hour clock
- is_workday: 1 indicates a workday, 0 indicates a holiday or weekend
- temp_1: local temperature in degrees Celsius
- temp_2: perceived (“feels-like”) temperature in degrees Celsius
- weather: 1 indicates sunny, 2 partly cloudy, 3 light precipitation, 4 heavy precipitation
- wind: wind speed; the larger the value, the stronger the wind
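
Before plotting, a quick sanity check of dtypes and missing values is cheap insurance; a minimal sketch:

# Check column types, missing values, and overall size of the training data
print(df.dtypes)
print(df.isnull().sum())
print(df.shape)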

3. Visualization of the data

# Correlation of entire dataset
corr = df.corr()
sns.heatmap(data=corr, square=True , annot=True, cbar=True)
[Figure: correlation heatmap of all variables]

# Histogram of variable y
plt.hist('y' , data=df , bins=25)
[Figure: histogram of y, 25 bins]

We can see from the graph that y is strongly right-skewed. We need to log-transform it before using it in the model.
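
As a quick numeric check, the skew function imported from scipy.stats above backs up the visual impression; a small sketch:

# Skewness of y before and after the log transform planned below
print('skewness of y        : %.2f' % skew(df['y']))
print('skewness of log1p(y) : %.2f' % skew(np.log1p(df['y'])))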

# Histogram of variable temp_1
plt.hist('temp_1' , data=df , bins=25)
[Figure: histogram of temp_1, 25 bins]

temp_1 is approximately normally distributed, and it is a numeric variable, so I decided to leave it in the dataset as-is.

# Boxplot between y and city
sns.factorplot(x="city",y="y",data=df,kind="box",aspect=1)
<seaborn.axisgrid.FacetGrid at 0x214632625f8>

png

We can see that y is slightly higher in city 0.

# Boxplot between y and hours
sns.factorplot(x="hour",y="y",data=df,kind="box",aspect=2.5)
<seaborn.axisgrid.FacetGrid at 0x2146b42f9e8>

png

We can see that borrowing is concentrated at 7, 8, 9, 17, 18, and 19 o’clock, which is exactly the commuting time during a day.

Borrowing is very low from 0 to 5 o’clock.

Borrowing decreases steadily from 17 to 23 o’clock.

Since hour is a categorical variable, we need to convert it into dummy variables before using it in the model.

# Boxplot between y and is_workday
sns.factorplot(x="is_workday",y="y",data=df,kind="box",aspect=1)
<seaborn.axisgrid.FacetGrid at 0x2146b3f5400>

png

There is no big difference in borrowing between workdays and non-workdays.

# Boxplot between y and weather
sns.catplot(x="weather", y="y", data=df, kind="box", aspect=1.5)

[Figure: boxplots of y by weather category]

Borrowing is higher when the weather is sunny or partly cloudy, and very low when it rains.

Since weather is also a categorical variable, we need to convert it into dummy variables.

sns.factorplot(x="wind",y="y",data=df,kind="box",aspect=2.5)
<seaborn.axisgrid.FacetGrid at 0x21477f30fd0>

png

4. Data Pre-processing

a. Drop the “id” column, as we already have an index
df.drop(["id"],axis=1,inplace=True)
df.head()
   city  hour  is_workday  weather  temp_1  temp_2  wind   y
0     0    22           1        2     3.0     0.7     0  15
1     0    10           1        1    21.0    24.9     3  48
2     0     0           1        1    25.3    27.4     0  21
3     0     7           0        1    15.7    16.2     0  11
4     1    10           1        1    21.1    25.0     2  39
b. Log-transform y
df["y"] = np.log(df["y"]+1)
c. Separate y out before creating the dummy variables
X = df.drop("y", axis=1)
y = df.y
X.head()
   city  hour  is_workday  weather  temp_1  temp_2  wind
0     0    22           1        2     3.0     0.7     0
1     0    10           1        1    21.0    24.9     3
2     0     0           1        1    25.3    27.4     0
3     0     7           0        1    15.7    16.2     0
4     1    10           1        1    21.1    25.0     2
d. Convert the categorical columns into dummy variables
todummy_list = ["hour","weather"]
def dummy_df(df, todummy_list):
    """One-hot encode each column in todummy_list and drop the original column."""
    for x in todummy_list:
        dummies = pd.get_dummies(df[x], prefix=x, dummy_na=False)
        df = df.drop(x, axis=1)
        df = pd.concat([df, dummies], axis=1)
    return df
X = dummy_df(X, todummy_list)
print(X.head(5))
   city  is_workday  temp_1  temp_2  wind  hour_0  hour_1  hour_2  hour_3  \
0     0           1     3.0     0.7     0       0       0       0       0   
1     0           1    21.0    24.9     3       0       0       0       0   
2     0           1    25.3    27.4     0       1       0       0       0   
3     0           0    15.7    16.2     0       0       0       0       0   
4     1           1    21.1    25.0     2       0       0       0       0   

   hour_4    ...      hour_18  hour_19  hour_20  hour_21  hour_22  hour_23  \
0       0    ...            0        0        0        0        1        0   
1       0    ...            0        0        0        0        0        0   
2       0    ...            0        0        0        0        0        0   
3       0    ...            0        0        0        0        0        0   
4       0    ...            0        0        0        0        0        0   

   weather_1  weather_2  weather_3  weather_4  
0          0          1          0          0  
1          1          0          0          0  
2          1          0          0          0  
3          1          0          0          0  
4          1          0          0          0  

[5 rows x 33 columns]

5. Model Construction

I run eight models in this part and collect their R2 scores for comparison at the bottom.

a. Split the dataset into two parts: training gets 80% of the data and testing gets the remaining 20%.

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=66)
b. Scale the data using StandardScaler(); fit it on the training set only and apply the same transformation to the test set, to avoid data leakage
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
c. Model construction
# Collect all R2 Scores.
R2_Scores = []
models = ['Linear Regression' , 'knn' , 'AdaBoost Regression' ,'GradientBoosting Regression',
          'RandomForest Regression' ,"SVM", "Neural Network", "Stacked model"]
# Linear Regression
clf_lr = LinearRegression()
clf_lr.fit(X_train , y_train)
# For regressors, cross_val_score uses the estimator's default scorer, i.e. R2
accuracies = cross_val_score(estimator = clf_lr, X = X_train, y = y_train, cv = 10,verbose = 1)
y_pred = clf_lr.predict(X_test)
print('')
print('####### Linear Regression #######')
print('Score : %.4f' % clf_lr.score(X_test, y_test))
print(accuracies)

mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = mean_squared_error(y_test, y_pred)**0.5
r2 = r2_score(y_test, y_pred)

print('')
print('MSE    : %0.2f ' % mse)
print('MAE    : %0.2f ' % mae)
print('RMSE   : %0.2f ' % rmse)
print('R2     : %0.2f ' % r2)

R2_Scores.append(r2)
####### Linear Regression #######
Score : 0.8011
[ 7.78652865e-01 -2.88601860e+25  7.96945499e-01  8.06271181e-01
  8.09304528e-01  7.78931963e-01  7.92729190e-01  8.13661958e-01
  8.10548852e-01  8.13604200e-01]

MSE    : 0.39 
MAE    : 0.47 
RMSE   : 0.62 
R2     : 0.80 
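
The wildly anomalous score in the second CV fold (about -2.9e+25) is a classic symptom of the dummy-variable trap: keeping every level of hour and weather plus an intercept makes the design matrix perfectly collinear, so the OLS coefficients blow up on some folds. Passing drop_first=True to pd.get_dummies, or switching to Ridge, would likely stabilize it.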


# k nearest neighbor
clf_knn = KNeighborsRegressor()
clf_knn.fit(X_train , y_train)
accuracies = cross_val_score(estimator = clf_knn, X = X_train, y = y_train, cv = 10,verbose = 1)
y_pred = clf_knn.predict(X_test)
print('')
print('###### KNeighbours Regression ######')
print('Score : %.4f' % clf_knn.score(X_test, y_test))
print(accuracies)

mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = mean_squared_error(y_test, y_pred)**0.5
r2 = r2_score(y_test, y_pred)

print('')
print('MSE    : %0.2f ' % mse)
print('MAE    : %0.2f ' % mae)
print('RMSE   : %0.2f ' % rmse)
print('R2     : %0.2f ' % r2)

R2_Scores.append(r2)

###### KNeighbours Regression ######
Score : 0.9050
[0.8890756  0.89151628 0.90407412 0.8992113  0.91491348 0.90014117
 0.90287772 0.91229941 0.91322939 0.9018737 ]

MSE    : 0.18 
MAE    : 0.30 
RMSE   : 0.43 
R2     : 0.91 
# AdaBoost
clf_ada = AdaBoostRegressor(n_estimators=1000)
clf_ada.fit(X_train , y_train)
accuracies = cross_val_score(estimator = clf_ada, X = X_train, y = y_train, cv = 5,verbose = 1)
y_pred = clf_ada.predict(X_test)
print('')
print('###### AdaBoost Regression ######')
print('Score : %.4f' % clf_ada.score(X_test, y_test))
print(accuracies)

mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = mean_squared_error(y_test, y_pred)**0.5
r2 = r2_score(y_test, y_pred)

print('')
print('MSE    : %0.2f ' % mse)
print('MAE    : %0.2f ' % mae)
print('RMSE   : %0.2f ' % rmse)
print('R2     : %0.2f ' % r2)

R2_Scores.append(r2)
###### AdaBoost Regression ######
Score : 0.4474
[0.40772985 0.45865976 0.46406953 0.42710509 0.46842508]

MSE    : 1.07 
MAE    : 0.88 
RMSE   : 1.04 
R2     : 0.45 
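
AdaBoost is the clear laggard here. Its default base estimator is a shallow (depth-3) decision tree, and the boosting scheme keeps reweighting toward the largest residuals, which tends to underfit a smooth log-count target like this one; tuning the base estimator's depth and the learning rate would be the natural next step.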


# GradientBoosting
# verbose output disabled; with verbose=1 this prints a per-iteration training-loss table
clf_gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=1, random_state=0, loss='ls')
clf_gbr.fit(X_train , y_train)
accuracies = cross_val_score(estimator = clf_gbr, X = X_train, y = y_train, cv = 10,verbose = 1)
y_pred = clf_gbr.predict(X_test)
print('')
print('###### Gradient Boosting Regression #######')
print('Score : %.4f' % clf_gbr.score(X_test, y_test))
print(accuracies)

mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = mean_squared_error(y_test, y_pred)**0.5
r2 = r2_score(y_test, y_pred)

print('')
print('MSE    : %0.2f ' % mse)
print('MAE    : %0.2f ' % mae)
print('RMSE   : %0.2f ' % rmse)
print('R2     : %0.2f ' % r2)

R2_Scores.append(r2)

###### Gradient Boosting Regression #######
Score : 0.6816
[0.65609682 0.66126518 0.67487789 0.67301113 0.66640737 0.65136067
 0.65592692 0.69187255 0.67727584 0.68215423]

MSE    : 0.62 
MAE    : 0.63 
RMSE   : 0.79 
R2     : 0.68 
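
The mediocre score is unsurprising: with max_depth=1 the booster grows decision stumps, which cannot capture interactions such as hour × is_workday that clearly matter here. Deeper trees would likely close much of the gap; a sketch with illustrative, untuned hyperparameters:

# Deeper trees and more, slower boosting rounds (illustrative values, not tuned)
clf_gbr2 = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05,
                                     max_depth=4, random_state=0)
clf_gbr2.fit(X_train, y_train)
print('Score : %.4f' % clf_gbr2.score(X_test, y_test))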


# Random Forest
clf_rf = RandomForestRegressor()
clf_rf.fit(X_train , y_train)
accuracies = cross_val_score(estimator = clf_rf, X = X_train, y = y_train, cv = 10,verbose = 1)
y_pred = clf_rf.predict(X_test)
print('')
print('###### Random Forest ######')
print('Score : %.4f' % clf_rf.score(X_test, y_test))
print(accuracies)

mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = mean_squared_error(y_test, y_pred)**0.5
r2 = r2_score(y_test, y_pred)

print('')
print('MSE    : %0.2f ' % mse)
print('MAE    : %0.2f ' % mae)
print('RMSE   : %0.2f ' % rmse)
print('R2     : %0.2f ' % r2)

R2_Scores.append(r2)
###### Random Forest ######
Score : 0.9017
[0.89311041 0.88891544 0.90256345 0.89246356 0.90829249 0.89037486
 0.89787445 0.91075113 0.90245791 0.91223694]

MSE    : 0.19 
MAE    : 0.31 
RMSE   : 0.44 
R2     : 0.90 
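
Note that RandomForestRegressor() was left at its library defaults, and older scikit-learn versions default to only 10 trees; simply raising n_estimators to a few hundred would likely buy a small, cheap improvement here.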


# SVM
clf_svr = SVR()
clf_svr.fit(X_train , y_train)
accuracies = cross_val_score(estimator = clf_svr, X = X_train, y = y_train, cv = 10,verbose = 1)
y_pred = clf_svr.predict(X_test)
print('')
print('###### SVM ######')
print('Score : %.4f' % clf_svr.score(X_test, y_test))
print(accuracies)

mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = mean_squared_error(y_test, y_pred)**0.5
r2 = r2_score(y_test, y_pred)

print('')
print('MSE    : %0.2f ' % mse)
print('MAE    : %0.2f ' % mae)
print('RMSE   : %0.2f ' % rmse)
print('R2     : %0.2f ' % r2)

R2_Scores.append(r2)

###### SVM ######
Score : 0.9209
[0.91375922 0.91446739 0.91658973 0.9213643  0.93426432 0.92555275
 0.91839594 0.93011461 0.92602424 0.92820973]

MSE    : 0.15 
MAE    : 0.27 
RMSE   : 0.39 
R2     : 0.92 
# Neural Network
clf_nn = MLPRegressor()
clf_nn.fit(X_train , y_train)
accuracies = cross_val_score(estimator = clf_nn, X = X_train, y = y_train, cv = 10,verbose = 1)
y_pred = clf_nn.predict(X_test)
print('')
print('###### Neural Network ######')
print('Score : %.4f' % clf_nn.score(X_test, y_test))
print(accuracies)

mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = mean_squared_error(y_test, y_pred)**0.5
r2 = r2_score(y_test, y_pred)

print('')
print('MSE    : %0.2f ' % mse)
print('MAE    : %0.2f ' % mae)
print('RMSE   : %0.2f ' % rmse)
print('R2     : %0.2f ' % r2)

R2_Scores.append(r2)
###### Neural Network ######
Score : 0.9219
[0.91211438 0.89218197 0.91710665 0.92285755 0.93468093 0.92464651
 0.91431442 0.92979532 0.92579826 0.93059589]

MSE    : 0.15 
MAE    : 0.27 
RMSE   : 0.39 
R2     : 0.92 


# Stacked model
from mlxtend.regressor import StackingRegressor
stregr = StackingRegressor(regressors=[clf_nn], 
                           meta_regressor=clf_svr)
stregr.fit(X_train , y_train)
accuracies = cross_val_score(estimator = stregr, X = X_train, y = y_train, cv = 10,verbose = 1)
y_pred = stregr.predict(X_test)
print('')
print('###### Stacked Model ######')
print('Score : %.4f' % stregr.score(X_test, y_test))
print(accuracies)

mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = mean_squared_error(y_test, y_pred)**0.5
r2 = r2_score(y_test, y_pred)

print('')
print('MSE    : %0.2f ' % mse)
print('MAE    : %0.2f ' % mae)
print('RMSE   : %0.2f ' % rmse)
print('R2     : %0.2f ' % r2)

R2_Scores.append(r2)

###### Stacked Model ######
Score : 0.9168
[0.91291617 0.91417198 0.91740103 0.9216346  0.9348423  0.92472979
 0.91767926 0.93272922 0.92369016 0.92903444]

MSE    : 0.16 
MAE    : 0.27 
RMSE   : 0.40 
R2     : 0.92 
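
With only one base regressor, the stack essentially feeds the MLP's predictions into the SVR, so it is not surprising that its score lands between the two underlying models rather than above them; stacking usually needs several diverse base learners before it pays off.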
compare = pd.DataFrame({'Algorithms' : models , 'R2-Scores' : R2_Scores})
compare.sort_values(by='R2-Scores' ,ascending=False)
   Algorithms                   R2-Scores
6  Neural Network                0.921914
5  SVM                           0.920919
7  Stacked model                 0.916799
1  knn                           0.905014
4  RandomForest Regression       0.901718
0  Linear Regression             0.801112
3  GradientBoosting Regression   0.681629
2  AdaBoost Regression           0.447422
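
GridSearchCV was imported at the top but never used; since SVM and the neural network lead the table, tuning them is the natural next step. A minimal sketch for the SVR (the grid values below are illustrative assumptions, not tuned results):

# Small grid around SVR's defaults; widen it for a real search
param_grid = {'C': [1, 10, 100], 'gamma': [0.01, 0.03, 0.1]}
grid = GridSearchCV(SVR(), param_grid, cv=5, scoring='r2', n_jobs=-1)
grid.fit(X_train, y_train)
print(grid.best_params_)
print('Tuned SVR test score : %.4f' % grid.best_estimator_.score(X_test, y_test))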

6. Feature importance

# Feature importance based on the fitted random forest
# (the labels below must match the column order of X; equivalently X.columns.values)
feature_labels = np.array(['city','is_workday','temp_1','temp_2','wind','hour_0','hour_1',"hour_2","hour_3",
                           "hour_4","hour_5","hour_6","hour_7","hour_8","hour_9","hour_10","hour_11","hour_12",
                           "hour_13","hour_14","hour_15","hour_16","hour_17","hour_18","hour_19","hour_20",
                           "hour_21","hour_22","hour_23","weather_1","weather_2","weather_3","weather_4"])
importance = clf_rf.feature_importances_
feature_indexes_by_importance = importance.argsort()
for index in feature_indexes_by_importance:
    print('{}  -  {:.2f}%'.format(feature_labels[index], (importance[index] *100.0)))
weather_4  -  0.00%
hour_12  -  0.07%
hour_15  -  0.07%
hour_13  -  0.07%
hour_14  -  0.09%
hour_11  -  0.13%
hour_16  -  0.19%
weather_2  -  0.24%
hour_20  -  0.25%
hour_10  -  0.26%
weather_1  -  0.31%
hour_21  -  0.32%
hour_9  -  0.32%
hour_19  -  0.33%
hour_22  -  0.68%
hour_18  -  0.84%
hour_17  -  0.86%
wind  -  1.01%
hour_8  -  1.15%
city  -  1.19%
hour_7  -  1.33%
hour_23  -  1.65%
weather_3  -  1.76%
hour_6  -  3.28%
hour_0  -  4.16%
temp_2  -  4.77%
is_workday  -  7.65%
hour_1  -  8.02%
temp_1  -  9.47%
hour_5  -  10.00%
hour_2  -  10.80%
hour_3  -  14.25%
hour_4  -  14.50%
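
The same ranking is easier to read as a chart; a small sketch reusing the fitted forest:

# Horizontal bar chart of the importances, most important feature at the top
sorted_idx = importance.argsort()
plt.figure(figsize=(8, 10))
plt.barh(feature_labels[sorted_idx], importance[sorted_idx])
plt.xlabel('Importance')
plt.title('Random forest feature importance')
plt.show()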