import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df_train = pd.read_csv('kaggle_bike_competition_train.csv', header=0)
df_train.head(10)
   datetime             season  holiday  workingday  weather   temp   atemp  humidity  windspeed  casual  registered  count
0  2011-01-01 00:00:00       1        0           0        1   9.84  14.395        81     0.0000       3          13     16
1  2011-01-01 01:00:00       1        0           0        1   9.02  13.635        80     0.0000       8          32     40
2  2011-01-01 02:00:00       1        0           0        1   9.02  13.635        80     0.0000       5          27     32
3  2011-01-01 03:00:00       1        0           0        1   9.84  14.395        75     0.0000       3          10     13
4  2011-01-01 04:00:00       1        0           0        1   9.84  14.395        75     0.0000       0           1      1
5  2011-01-01 05:00:00       1        0           0        2   9.84  12.880        75     6.0032       0           1      1
6  2011-01-01 06:00:00       1        0           0        1   9.02  13.635        80     0.0000       2           0      2
7  2011-01-01 07:00:00       1        0           0        1   8.20  12.880        86     0.0000       1           2      3
8  2011-01-01 08:00:00       1        0           0        1   9.84  14.395        75     0.0000       1           7      8
9  2011-01-01 09:00:00       1        0           0        1  13.12  17.425        76     0.0000       8           6     14
Field names and types
df_train.dtypes
datetime       object
season          int64
holiday         int64
workingday      int64
weather         int64
temp          float64
atemp         float64
humidity        int64
windspeed     float64
casual          int64
registered      int64
count           int64
dtype: object
df_train.shape
(10886, 12)
Check whether any field has missing values. As it turns out, none of the fields is missing any data!
df_train.count()
datetime      10886
season        10886
holiday       10886
workingday    10886
weather       10886
temp          10886
atemp         10886
humidity      10886
windspeed     10886
casual        10886
registered    10886
count         10886
dtype: int64
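Comparing each column's non-null count with the row count works, but counting the missing cells directly can be easier to read. The sketch below is a minimal illustration on a small made-up frame (`toy` and its values are hypothetical, not taken from the competition data); the same call should apply to `df_train`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for df_train; the values are made up
# and one temperature reading is deliberately left missing.
toy = pd.DataFrame({
    "temp": [9.84, np.nan, 9.02],
    "humidity": [81, 80, 80],
})

# isnull() flags missing cells; sum() then counts them per column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# True only if no column contains a missing value.
has_no_missing = not toy.isnull().values.any()
print(has_no_missing)
```

On the training set, `df_train.isnull().sum()` would be expected to show zero for every column, consistent with the counts above.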
type(df_train.datetime)
pandas.core.series.Series
# Derive calendar features (month, day of week, hour) from the timestamp strings.
df_train['month'] = pd.DatetimeIndex(df_train.datetime).month
df_train['dayofweek'] = pd.DatetimeIndex(df_train.datetime).dayofweek
df_train['hour'] = pd.DatetimeIndex(df_train.datetime).hour
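An alternative to building a `DatetimeIndex` three times is to parse the column once with `pd.to_datetime` and read the parts through the `.dt` accessor. This is only a sketch of that variant on a two-row made-up frame (`toy` is a hypothetical name); it is not the approach used in this notebook, though it should yield the same columns.

```python
import pandas as pd

# Hypothetical two-row frame using the same timestamp format as df_train.
toy = pd.DataFrame({
    "datetime": ["2011-01-01 00:00:00", "2011-01-01 09:00:00"],
})

# Parse the strings once, then reuse the parsed Series for every feature.
parsed = pd.to_datetime(toy["datetime"])
toy["month"] = parsed.dt.month
toy["dayofweek"] = parsed.dt.dayofweek  # Monday=0 ... Sunday=6
toy["hour"] = parsed.dt.hour

print(toy)
```

Parsing once avoids repeating the string-to-timestamp conversion, which can matter on larger files.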
df_train.head(20)