Public bicycle usage forecast
1. Introduction
Public bicycles are low-carbon, environmentally friendly, and healthy, and they solve the “last mile” problem in urban transportation, so they are becoming more and more popular in cities across the country. The data for this project were collected from several public bicycle parking lots on one street in each of two cities. The goal is to predict the number of public bicycles borrowed in the neighborhood within one hour, based on time, weather, and other information.
Data source:
http://sofasofa.io/competition.php?id=1#c1
# Import packages
import numpy as np
import pandas as pd
import math
from scipy.stats import norm, skew
from scipy import stats
# Modelling Algorithms :
# Classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC, LinearSVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier
# Regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso, RidgeCV, ElasticNet
from sklearn.ensemble import RandomForestRegressor, BaggingRegressor, GradientBoostingRegressor, AdaBoostRegressor
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
# Modelling Helpers :
from sklearn.base import BaseEstimator, TransformerMixin, RegressorMixin, clone
from sklearn.preprocessing import Normalizer, scale
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
# preprocessing :
# (note: Imputer is from older scikit-learn; it was replaced by SimpleImputer in 0.22+)
from sklearn.preprocessing import MinMaxScaler, StandardScaler, Imputer, LabelEncoder
#evaluation metrics :
# Regression
from sklearn.metrics import mean_squared_log_error, mean_squared_error, r2_score, mean_absolute_error
# Classification
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import confusion_matrix, classification_report
# Visualisation
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.pylab as pylab
import seaborn as sns
# Configure visualisations
%matplotlib inline
mpl.style.use('ggplot')
plt.style.use('fivethirtyeight')
sns.set(context="notebook", palette="dark", style='whitegrid', color_codes=True)
params = {
    'axes.labelsize': "large",
    'xtick.labelsize': 'x-large',
    'legend.fontsize': 20,
    'figure.dpi': 150,
    'figure.figsize': [25, 7]
}
plt.rcParams.update(params)
2. Explore the dataset
# Import data
df = pd.read_csv("train.csv")
# How the data looks
df.head()
| | id | city | hour | is_workday | weather | temp_1 | temp_2 | wind | y |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 22 | 1 | 2 | 3.0 | 0.7 | 0 | 15 |
| 1 | 2 | 0 | 10 | 1 | 1 | 21.0 | 24.9 | 3 | 48 |
| 2 | 3 | 0 | 0 | 1 | 1 | 25.3 | 27.4 | 0 | 21 |
| 3 | 4 | 0 | 7 | 0 | 1 | 15.7 | 16.2 | 0 | 11 |
| 4 | 5 | 1 | 10 | 1 | 1 | 21.1 | 25.0 | 2 | 39 |
The dataset has 9 variables:
- id
- y: the number of public bicycles borrowed in the neighborhood within one hour
- city: which of the two cities the record comes from (0 or 1)
- hour: local time of day, 24-hour clock
- is_workday: 1 indicates a workday, 0 indicates a holiday or weekend
- temp_1: local air temperature, in degrees Celsius
- temp_2: “feels-like” (apparent) temperature, in degrees Celsius
- weather: 1 = sunny, 2 = partly cloudy, 3 = light precipitation, 4 = heavy precipitation
- wind: wind speed; the larger the value, the stronger the wind
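Before plotting, it is worth a quick sanity check of the shape, column types, and missing values (a minimal sketch, assuming `df` was loaded as above):
# Quick integrity check: dimensions, column dtypes, and missing values per column
print(df.shape)
print(df.dtypes)
print(df.isnull().sum())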
3. Visualization of the data
# Correlation of entire dataset
corr = df.corr()
sns.heatmap(data=corr, square=True , annot=True, cbar=True)
# Histogram of variable y
plt.hist('y' , data=df , bins=25)
(histogram output: 25 bins spanning y = 0 to 249; the first bin alone contains 2372 observations, with counts tapering to single digits in the top bins)
We can see from the graph that y is extremely right-skewed. We need to apply a log transform before using it in the models.
# Histogram of variable temp_1
plt.hist('temp_1' , data=df , bins=25)
(histogram output: 25 bins spanning temp_1 = −7.6 to 38.6 °C, with counts rising and falling in a roughly bell-shaped pattern)
temp_1 is approximately normally distributed, and it is a numeric variable, so I leave it in the dataset as-is.
# Boxplot between y and city
sns.factorplot(x="city",y="y",data=df,kind="box",aspect=1)
We can see that y is slightly higher in city 0.
# Boxplot between y and hours
sns.factorplot(x="hour",y="y",data=df,kind="box",aspect=2.5)
We can see that borrowing is concentrated around 7, 8, 9 and 17, 18, 19 o'clock, which is exactly the commuting time of day.
Borrowing is very low from 0 to 5 o'clock.
Borrowing decreases steadily from 17 to 23 o'clock.
Since hour is a categorical variable, we need to change it into dummy variables before using it in the model.
# Boxplot between y and is_workday
sns.factorplot(x="is_workday",y="y",data=df,kind="box",aspect=1)
There is no big difference in borrowing volume between workdays and non-workdays.
# Boxplot between y and weather
sns.factorplot(x="weather",y="y",data=df,kind="box",aspect=1.5)
Borrowing volume is higher when the weather is sunny or partly cloudy.
Borrowing is very low when it rains.
Since weather is also a categorical variable, we need to convert it into dummy variables.
# Boxplot between y and wind
sns.factorplot(x="wind",y="y",data=df,kind="box",aspect=2.5)
4. Data Pre-processing
a. Drop the “id” column, since the DataFrame already has an index
df.drop(["id"],axis=1,inplace=True)
df.head()
| | city | hour | is_workday | weather | temp_1 | temp_2 | wind | y |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 22 | 1 | 2 | 3.0 | 0.7 | 0 | 15 |
| 1 | 0 | 10 | 1 | 1 | 21.0 | 24.9 | 3 | 48 |
| 2 | 0 | 0 | 1 | 1 | 25.3 | 27.4 | 0 | 21 |
| 3 | 0 | 7 | 0 | 1 | 15.7 | 16.2 | 0 | 11 |
| 4 | 1 | 10 | 1 | 1 | 21.1 | 25.0 | 2 | 39 |
b. Log-transform y
df["y"] = np.log(df["y"]+1)
c. Separate y from the features before the dummy encoding
X = df.drop("y", axis=1)
y = df["y"]
X.head()
| | city | hour | is_workday | weather | temp_1 | temp_2 | wind |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 22 | 1 | 2 | 3.0 | 0.7 | 0 |
| 1 | 0 | 10 | 1 | 1 | 21.0 | 24.9 | 3 |
| 2 | 0 | 0 | 1 | 1 | 25.3 | 27.4 | 0 |
| 3 | 0 | 7 | 0 | 1 | 15.7 | 16.2 | 0 |
| 4 | 1 | 10 | 1 | 1 | 21.1 | 25.0 | 2 |
d. Convert the categorical variables into dummy variables
todummy_list = ["hour", "weather"]
def dummy_df(df, todummy_list):
    # One-hot encode each listed column, then drop the original column
    for x in todummy_list:
        dummies = pd.get_dummies(df[x], prefix=x, dummy_na=False)
        df = df.drop(x, axis=1)
        df = pd.concat([df, dummies], axis=1)
    return df
X = dummy_df(X, todummy_list)
print(X.head(5))
city is_workday temp_1 temp_2 wind hour_0 hour_1 hour_2 hour_3 \
0 0 1 3.0 0.7 0 0 0 0 0
1 0 1 21.0 24.9 3 0 0 0 0
2 0 1 25.3 27.4 0 1 0 0 0
3 0 0 15.7 16.2 0 0 0 0 0
4 1 1 21.1 25.0 2 0 0 0 0
hour_4 ... hour_18 hour_19 hour_20 hour_21 hour_22 hour_23 \
0 0 ... 0 0 0 0 1 0
1 0 ... 0 0 0 0 0 0
2 0 ... 0 0 0 0 0 0
3 0 ... 0 0 0 0 0 0
4 0 ... 0 0 0 0 0 0
weather_1 weather_2 weather_3 weather_4
0 0 1 0 0
1 1 0 0 0
2 1 0 0 0
3 1 0 0 0
4 1 0 0 0
[5 rows x 33 columns]
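For reference, pandas can produce the same encoding in one call; this equivalent one-liner (my addition, not the route taken above) yields identical columns:
# Equivalent one-liner to dummy_df: encode both columns at once
X_alt = pd.get_dummies(df.drop("y", axis=1), columns=["hour", "weather"])
assert list(X_alt.columns) == list(X.columns)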
5. Model Construction
I run eight models in this part and collect their R2 scores in a comparison table at the end.
a. Split the dataset into two parts: the training set gets 80% of the data and the test set gets the remaining 20%.
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=66)
b. Scale the data using the StandardScaler() function, fitting on the training set only and then applying the same transform to the test set
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
c. Model construction
# Collect all R2 Scores.
R2_Scores = []
models = ['Linear Regression' , 'knn' , 'AdaBoost Regression' ,'GradientBoosting Regression',
'RandomForest Regression' ,"SVM", "Neural Network", "Stacked model"]
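Each model below repeats the same evaluation block; a small helper (a sketch of my own, not used in the original cells) would condense it:
def report_metrics(name, model, X_test, y_test):
    # Print the standard regression metrics for a fitted model and return its R2
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    print('####### %s #######' % name)
    print('MSE  : %0.2f' % mse)
    print('MAE  : %0.2f' % mean_absolute_error(y_test, y_pred))
    print('RMSE : %0.2f' % (mse ** 0.5))
    print('R2   : %0.2f' % r2)
    return r2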
# Linear Regression
clf_lr = LinearRegression()
clf_lr.fit(X_train , y_train)
accuracies = cross_val_score(estimator = clf_lr, X = X_train, y = y_train, cv = 10,verbose = 1)
y_pred = clf_lr.predict(X_test)
print('')
print('####### Linear Regression #######')
print('Score : %.4f' % clf_lr.score(X_test, y_test))
print(accuracies)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = mean_squared_error(y_test, y_pred)**0.5
r2 = r2_score(y_test, y_pred)
print('')
print('MSE : %0.2f ' % mse)
print('MAE : %0.2f ' % mae)
print('RMSE : %0.2f ' % rmse)
print('R2 : %0.2f ' % r2)
R2_Scores.append(r2)
####### Linear Regression #######
Score : 0.8011
[ 7.78652865e-01 -2.88601860e+25 7.96945499e-01 8.06271181e-01
8.09304528e-01 7.78931963e-01 7.92729190e-01 8.13661958e-01
8.10548852e-01 8.13604200e-01]
MSE : 0.39
MAE : 0.47
RMSE : 0.62
R2 : 0.80
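The enormous negative score in the second fold is a symptom of the dummy-variable trap: the hour dummies (and likewise the weather dummies) sum to a constant column, so with an intercept the design matrix is exactly collinear and plain least squares can become numerically unstable on some folds. One common remedy (a hypothetical alternative, not what was run above) is to drop one level per category:
# Hypothetical re-encoding that removes the exact collinearity
X_nc = pd.get_dummies(df.drop("y", axis=1), columns=["hour", "weather"], drop_first=True)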
# k nearest neighbor
clf_knn = KNeighborsRegressor()
clf_knn.fit(X_train , y_train)
accuracies = cross_val_score(estimator = clf_knn, X = X_train, y = y_train, cv = 10,verbose = 1)
y_pred = clf_knn.predict(X_test)
print('')
print('###### KNeighbours Regression ######')
print('Score : %.4f' % clf_knn.score(X_test, y_test))
print(accuracies)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = mean_squared_error(y_test, y_pred)**0.5
r2 = r2_score(y_test, y_pred)
print('')
print('MSE : %0.2f ' % mse)
print('MAE : %0.2f ' % mae)
print('RMSE : %0.2f ' % rmse)
print('R2 : %0.2f ' % r2)
R2_Scores.append(r2)
###### KNeighbours Regression ######
Score : 0.9050
[0.8890756 0.89151628 0.90407412 0.8992113 0.91491348 0.90014117
0.90287772 0.91229941 0.91322939 0.9018737 ]
MSE : 0.18
MAE : 0.30
RMSE : 0.43
R2 : 0.91
# AdaBoost
clf_ada = AdaBoostRegressor(n_estimators=1000)
clf_ada.fit(X_train , y_train)
accuracies = cross_val_score(estimator = clf_ada, X = X_train, y = y_train, cv = 5,verbose = 1)
y_pred = clf_ada.predict(X_test)
print('')
print('###### AdaBoost Regression ######')
print('Score : %.4f' % clf_ada.score(X_test, y_test))
print(accuracies)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = mean_squared_error(y_test, y_pred)**0.5
r2 = r2_score(y_test, y_pred)
print('')
print('MSE : %0.2f ' % mse)
print('MAE : %0.2f ' % mae)
print('RMSE : %0.2f ' % rmse)
print('R2 : %0.2f ' % r2)
R2_Scores.append(r2)
###### AdaBoost Regression ######
Score : 0.4474
[0.40772985 0.45865976 0.46406953 0.42710509 0.46842508]
MSE : 1.07
MAE : 0.88
RMSE : 1.04
R2 : 0.45
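AdaBoost scores far below the other models here. Its default base learner is a shallow tree (DecisionTreeRegressor with max_depth=3), which cannot capture the hour-by-hour structure well; a deeper base learner is a natural thing to try (an untested sketch, not part of the original run):
from sklearn.tree import DecisionTreeRegressor
# Sketch: boost deeper trees instead of the default depth-3 ones
clf_ada_deep = AdaBoostRegressor(DecisionTreeRegressor(max_depth=8), n_estimators=200)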
# GradientBoosting
clf_gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1,max_depth=1, random_state=0, loss='ls',verbose = 1)
clf_gbr.fit(X_train , y_train)
accuracies = cross_val_score(estimator = clf_gbr, X = X_train, y = y_train, cv = 10,verbose = 1)
y_pred = clf_gbr.predict(X_test)
print('')
print('###### Gradient Boosting Regression #######')
print('Score : %.4f' % clf_gbr.score(X_test, y_test))
print(accuracies)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = mean_squared_error(y_test, y_pred)**0.5
r2 = r2_score(y_test, y_pred)
print('')
print('MSE : %0.2f ' % mse)
print('MAE : %0.2f ' % mae)
print('RMSE : %0.2f ' % rmse)
print('R2 : %0.2f ' % r2)
R2_Scores.append(r2)
(verbose training output from the initial fit and the ten cross-validation fits trimmed: in every fit the train loss falls from about 1.9 at iteration 1 to about 0.64 at iteration 100)
###### Gradient Boosting Regression #######
Score : 0.6816
[0.65609682 0.66126518 0.67487789 0.67301113 0.66640737 0.65136067
0.65592692 0.69187255 0.67727584 0.68215423]
MSE : 0.62
MAE : 0.63
RMSE : 0.79
R2 : 0.68
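The gradient boosting model was run with max_depth=1, i.e. decision stumps, and only 100 trees, which explains why it trails the other ensembles: stumps cannot model interactions such as hour with is_workday. A stronger configuration to try (an assumption, not re-run here):
# Sketch: deeper trees and more boosting rounds
clf_gbr2 = GradientBoostingRegressor(n_estimators=500, learning_rate=0.1, max_depth=3, random_state=0)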
# Random Forest
clf_rf = RandomForestRegressor()
clf_rf.fit(X_train , y_train)
accuracies = cross_val_score(estimator = clf_rf, X = X_train, y = y_train, cv = 10,verbose = 1)
y_pred = clf_rf.predict(X_test)
print('')
print('###### Random Forest ######')
print('Score : %.4f' % clf_rf.score(X_test, y_test))
print(accuracies)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = mean_squared_error(y_test, y_pred)**0.5
r2 = r2_score(y_test, y_pred)
print('')
print('MSE : %0.2f ' % mse)
print('MAE : %0.2f ' % mae)
print('RMSE : %0.2f ' % rmse)
print('R2 : %0.2f ' % r2)
R2_Scores.append(r2)
###### Random Forest ######
Score : 0.9017
[0.89311041 0.88891544 0.90256345 0.89246356 0.90829249 0.89037486
0.89787445 0.91075113 0.90245791 0.91223694]
MSE : 0.19
MAE : 0.31
RMSE : 0.44
R2 : 0.90
# SVM
clf_svr = SVR()
clf_svr.fit(X_train , y_train)
accuracies = cross_val_score(estimator = clf_svr, X = X_train, y = y_train, cv = 10,verbose = 1)
y_pred = clf_svr.predict(X_test)
print('')
print('###### SVM ######')
print('Score : %.4f' % clf_svr.score(X_test, y_test))
print(accuracies)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = mean_squared_error(y_test, y_pred)**0.5
r2 = r2_score(y_test, y_pred)
print('')
print('MSE : %0.2f ' % mse)
print('MAE : %0.2f ' % mae)
print('RMSE : %0.2f ' % rmse)
print('R2 : %0.2f ' % r2)
R2_Scores.append(r2)
###### SVM ######
Score : 0.9209
[0.91375922 0.91446739 0.91658973 0.9213643 0.93426432 0.92555275
0.91839594 0.93011461 0.92602424 0.92820973]
MSE : 0.15
MAE : 0.27
RMSE : 0.39
R2 : 0.92
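SVR with default hyperparameters is already the strongest single model. Since GridSearchCV is imported above, tuning C and gamma is the natural next step (a sketch; the grid values are my own guesses):
# Sketch: small grid search over the RBF-kernel hyperparameters
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 'auto']}
grid = GridSearchCV(SVR(), param_grid, cv=5, scoring='r2')
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)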
# Neural Network
clf_nn = MLPRegressor()
clf_nn.fit(X_train , y_train)
accuracies = cross_val_score(estimator = clf_nn, X = X_train, y = y_train, cv = 10,verbose = 1)
y_pred = clf_nn.predict(X_test)
print('')
print('###### Neural Network ######')
print('Score : %.4f' % clf_nn.score(X_test, y_test))
print(accuracies)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = mean_squared_error(y_test, y_pred)**0.5
r2 = r2_score(y_test, y_pred)
print('')
print('MSE : %0.2f ' % mse)
print('MAE : %0.2f ' % mae)
print('RMSE : %0.2f ' % rmse)
print('R2 : %0.2f ' % r2)
R2_Scores.append(r2)
###### Neural Network ######
Score : 0.9219
[0.91211438 0.89218197 0.91710665 0.92285755 0.93468093 0.92464651
0.91431442 0.92979532 0.92579826 0.93059589]
MSE : 0.15
MAE : 0.27
RMSE : 0.39
R2 : 0.92
# Stacked model
from mlxtend.regressor import StackingRegressor
stregr = StackingRegressor(regressors=[clf_nn],
                           meta_regressor=clf_svr)
stregr.fit(X_train , y_train)
accuracies = cross_val_score(estimator = stregr, X = X_train, y = y_train, cv = 10,verbose = 1)
y_pred = stregr.predict(X_test)
print('')
print('###### Stacked Model ######')
print('Score : %.4f' % stregr.score(X_test, y_test))
print(accuracies)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = mean_squared_error(y_test, y_pred)**0.5
r2 = r2_score(y_test, y_pred)
print('')
print('MSE : %0.2f ' % mse)
print('MAE : %0.2f ' % mae)
print('RMSE : %0.2f ' % rmse)
print('R2 : %0.2f ' % r2)
R2_Scores.append(r2)
###### Stacked Model ######
Score : 0.9168
[0.91291617 0.91417198 0.91740103 0.9216346 0.9348423 0.92472979
0.91767926 0.93272922 0.92369016 0.92903444]
MSE : 0.16
MAE : 0.27
RMSE : 0.40
R2 : 0.92
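Note that this stack has only one base regressor (the MLP) feeding the SVR meta-model, so it behaves much like an SVR applied to the MLP's predictions, and it is unsurprising that it does not beat the plain SVR. Stacking usually pays off with several diverse base learners (a sketch under that assumption, reusing the models fitted above):
# Sketch: stack several diverse base models instead of one
stregr_multi = StackingRegressor(regressors=[clf_knn, clf_rf, clf_nn],
                                 meta_regressor=clf_svr)
stregr_multi.fit(X_train, y_train)
print('Score : %.4f' % stregr_multi.score(X_test, y_test))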
compare = pd.DataFrame({'Algorithms' : models , 'R2-Scores' : R2_Scores})
compare.sort_values(by='R2-Scores' ,ascending=False)
| | Algorithms | R2-Scores |
|---|---|---|
| 6 | Neural Network | 0.921914 |
| 5 | SVM | 0.920919 |
| 7 | Stacked model | 0.916799 |
| 1 | knn | 0.905014 |
| 4 | RandomForest Regression | 0.901718 |
| 0 | Linear Regression | 0.801112 |
| 3 | GradientBoosting Regression | 0.681629 |
| 2 | AdaBoost Regression | 0.447422 |
6. Feature importance
# Feature importance based on the random forest
feature_labels = np.array(X.columns)  # same order as the columns fed to the model (city ... weather_4)
importance = clf_rf.feature_importances_
feature_indexes_by_importance = importance.argsort()
for index in feature_indexes_by_importance:
    print('{} - {:.2f}%'.format(feature_labels[index], importance[index] * 100.0))
weather_4 - 0.00%
hour_12 - 0.07%
hour_15 - 0.07%
hour_13 - 0.07%
hour_14 - 0.09%
hour_11 - 0.13%
hour_16 - 0.19%
weather_2 - 0.24%
hour_20 - 0.25%
hour_10 - 0.26%
weather_1 - 0.31%
hour_21 - 0.32%
hour_9 - 0.32%
hour_19 - 0.33%
hour_22 - 0.68%
hour_18 - 0.84%
hour_17 - 0.86%
wind - 1.01%
hour_8 - 1.15%
city - 1.19%
hour_7 - 1.33%
hour_23 - 1.65%
weather_3 - 1.76%
hour_6 - 3.28%
hour_0 - 4.16%
temp_2 - 4.77%
is_workday - 7.65%
hour_1 - 8.02%
temp_1 - 9.47%
hour_5 - 10.00%
hour_2 - 10.80%
hour_3 - 14.25%
hour_4 - 14.50%
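The same ranking is easier to read as a bar chart (a sketch of my own, reusing the arrays computed above):
# Bar chart of feature importances, largest first
order = importance.argsort()[::-1]
plt.figure(figsize=(12, 6))
plt.bar(range(len(order)), importance[order])
plt.xticks(range(len(order)), feature_labels[order], rotation=90)
plt.ylabel('importance')
plt.show()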