Bike Sharing: Data Analysis

Up_梅子酒

Category: Data Analysis. Tags: python

Article link: https://blog.csdn.net/eerywh/article/details/114274001

Bike Sharing EDA and Model Selection

import os
import warnings
from datetime import datetime

import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

# Silence library warnings so the notebook output stays readable
warnings.filterwarnings(action='ignore')

Kaggle-competition-bike-sharing-demand

EDA

data items

Numerical features (used directly):
- temp: actual temperature
- atemp: "feels-like" (apparent) temperature
- humidity: relative humidity
- windspeed: wind speed
- casual: number of bikes rented by unregistered users
- registered: number of bikes rented by registered users
- count: total number of bikes rented
Time series:
- datetime: split into separate year, month, day, hour, and weekday features
Categorical features (create dummy variables):
- season: 1: spring; 2: summer; 3: autumn; 4: winter
- holiday: whether the day is a holiday. 0: no; 1: yes
- workingday: whether the day is a working day. 0: no; 1: yes
- weather: 1: sunny; 2: cloudy; 3: light rain or snow; 4: severe weather
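The two preprocessing steps named in the list above (splitting datetime and creating dummies) could be sketched as follows. This is a minimal illustration, not necessarily the author's implementation: the helper names `add_time_features` and `add_dummies` are hypothetical, the two sample rows are made up for the example, and only `season` and `weather` are one-hot encoded on the assumption that `holiday` and `workingday` are already 0/1 flags.

```python
import pandas as pd

# Two illustrative rows in the same layout as train.csv (values are made up).
sample = pd.DataFrame({
    "datetime": ["2011-01-01 00:00:00", "2011-07-04 17:00:00"],
    "season": [1, 3],
    "holiday": [0, 1],
    "workingday": [0, 0],
    "weather": [1, 2],
})


def add_time_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: split datetime into year, month, day, hour, weekday."""
    out = frame.copy()
    ts = pd.to_datetime(out["datetime"])
    out["year"] = ts.dt.year
    out["month"] = ts.dt.month
    out["day"] = ts.dt.day
    out["hour"] = ts.dt.hour
    out["weekday"] = ts.dt.dayofweek  # Monday = 0, Sunday = 6
    return out


def add_dummies(frame: pd.DataFrame, columns=("season", "weather")) -> pd.DataFrame:
    """Hypothetical helper: one-hot encode the multi-level categorical columns."""
    return pd.get_dummies(frame, columns=list(columns))


features = add_dummies(add_time_features(sample))
print(features.columns.tolist())
```

`pd.get_dummies` only creates columns for the levels it actually sees, so on the full data set each of `season` and `weather` would expand into up to four indicator columns.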

Goal: predict bike rental demand in Washington, D.C. by combining historical usage patterns with weather data.

Import the data

# import data
 
df = pd.read_csv('train.csv')
print(df.head(),'\n','df.shape: {}'.format(df.shape))

              datetime  season  holiday  workingday  weather  temp   atemp  \
0  2011-01-01 00:00:00       1        0           0        1  9.84  14.395   
1  2011-01-01 01:00:00       1        0           0        1  9.02  13.635   
2  2011-01-01 02:00:00       1        0           0        1  9.02  13.635   
3  2011-01-01 03:00:00       1        0           0        1  9.84  14.395   
4  2011-01-01 04:00:00       1        0           0        1  9.84  14.395   

   humidity  windspeed  casual  registered  count  
0        81        0.0       3          13     16  
1        80        0.0       8          32     40  
2        80        0.0       5          27     32  
3        75        0.0       3          10     13  
4        75        0.0       0           1      1   
 df.shape: (10886, 12)
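Since count is described as the total number of rentals, a quick sanity check is whether it equals casual + registered. The sketch below re-types the five rows printed above so that it runs without train.csv. On the real data the same comparison would be applied to `df`, and whether it holds for every row is something to verify rather than assume.

```python
import pandas as pd

# The five rows shown in the df.head() output, re-typed so the snippet is self-contained.
head = pd.DataFrame({
    "casual": [3, 8, 5, 3, 0],
    "registered": [13, 32, 27, 10, 1],
    "count": [16, 40, 32, 13, 1],
})

# Use head["count"] rather than head.count: attribute access would return
# the DataFrame.count method instead of the column.
mismatch = head[head["casual"] + head["registered"] != head["count"]]
print(len(mismatch))  # prints 0 for these five rows
```

If the check passes on the full data, casual and registered are components of the target and should not be fed to a model as input features.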

df.describe()

	season	holiday	workingday	weather	temp	atemp	humidity	windspeed	casual	registered	count
count	10886.000000	10886.000000	10886.000000	10886.000000	10886.00000	10886.000000	10886.000000	10886.000000	10886.000000	10886.000000	10886.000000
mean	2.506614	0.028569	0.680875	1.418427	20.23086	23.655084	61.886460	12.799395	36.021955	155.552177	191.574132
std	1.116174	0.166599	0.466159	0.633839	7.79159	8.474601	19.245033	8.164537	49.960477	151.039033	181.144454
min	1.000000	0.000000	0.000000	1.000000	0.82000	0.760000	0.000000	0.000000	0.000000	0.000000	1.000000
25%	2.000000	0.000000	0.000000	1.000000	13.94000	16.665000	47.000000	7.001500	4.000000	36.000000	42.000000
50%	3.000000	0.000000	1.000000	1.000000	20.50000	24.240000	62.000000	12.998000	17.000000	118.000000	145.000000
75%	4.000000	0.000000	1.000000	2.000000	26.24000	31.060000	77.000000	16.997900	49.000000	222.000000	284.000000
max	4.000000	1.000000	1.000000	4.000000	41.00000	45.455000	100.000000	56.996900	367.000000	886.000000	977.000000

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10886 entries, 0 to 10885
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   datetime    10886 non-null  object 
 1   season      10886 non-null  int64  
 2   holiday     10886 non-null  int64  
 3   workingday  10886 non-null  int64  
 4   weather     10886 non-null  int64  
 5   temp        10886 non-null  float64
 6   atemp       10886 non-null  float64
 7   humidity    10886 non-null  int64  
 8   windspeed   10886 non-null  float64
 9   casual      10886 non-null  int64  
 10  registered  10886 non-null  int64  
 11  count       10886 non-null  int64  
dtypes: float64(3), int64(8), object(1)
memory usage: 1020.7+ KB

Missing value analysis

df.isnull().sum()

datetime      0
season        0
holiday       0
workingday    0
weather       0
temp          0
atemp         0
humidity      0
windspeed     0
casual        0
registered    0
count         0
dtype: int64
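`isnull()` only detects NaN, so a sensor gap recorded as 0 would not show up in the counts above. The `describe()` output reports a minimum of 0 for both humidity and windspeed, which makes counting exact zeros a cheap follow-up. The sketch below uses made-up readings rather than rows from train.csv. Whether a zero is a genuine measurement or a placeholder cannot be decided from the count alone.

```python
import pandas as pd

# Illustrative readings (not taken from train.csv).
sample = pd.DataFrame({
    "humidity": [81, 0, 75, 62],
    "windspeed": [0.0, 0.0, 12.998, 7.0015],
})

null_counts = sample.isnull().sum()   # NaN values per column
zero_counts = (sample == 0).sum()     # exact zeros per column

summary = pd.DataFrame({"nulls": null_counts, "zeros": zero_counts})
print(summary)
```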

labels = df[:100]['datetime'].astype('str')
labels
