import pandas as pd
import numpy as np
Counting with cate_list
1. Load the data
df = pd.read_csv(r"911.csv")
df.head(3)
|   | lat | lng | desc | zip | title | timeStamp | twp | addr | e |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 40.297876 | -75.581294 | REINDEER CT & DEAD END; NEW HANOVER; Station ... | 19525.0 | EMS: BACK PAINS/INJURY | 2015-12-10 17:10:52 | NEW HANOVER | REINDEER CT & DEAD END | 1 |
| 1 | 40.258061 | -75.264680 | BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP... | 19446.0 | EMS: DIABETIC EMERGENCY | 2015-12-10 17:29:21 | HATFIELD TOWNSHIP | BRIAR PATH & WHITEMARSH LN | 1 |
| 2 | 40.121182 | -75.351975 | HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St... | 19401.0 | Fire: GAS-ODOR/LEAK | 2015-12-10 14:39:21 | NORRISTOWN | HAWS AVE | 1 |
2. Get the category names
- First, split each title on ": "
- Then deduplicate the prefixes with set([list])
tmp_list = df["title"].str.split(": ").tolist()
tmp_list[0]
['EMS', 'BACK PAINS/INJURY']
cate_list = list(set([i[0] for i in tmp_list]))
cate_list
['Fire', 'EMS', 'Traffic']
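The split-then-set step can also be written with the vectorized `.str` accessor. The following is a minimal sketch on a toy DataFrame, not the notebook's own code; `toy`, `prefixes`, and `toy_cate_list` are illustrative names, and the Traffic title is made up. Because a `set` of strings has no guaranteed order, the order shown in the output above can change between runs; wrapping the set in `sorted()` makes it reproducible.

```python
import pandas as pd

# Toy titles in the same "Category: detail" shape as the 911 data
# (the Traffic title is invented for illustration)
toy = pd.DataFrame({"title": [
    "EMS: BACK PAINS/INJURY",
    "EMS: DIABETIC EMERGENCY",
    "Fire: GAS-ODOR/LEAK",
    "Traffic: VEHICLE ACCIDENT -",
]})

# .str.split returns a list per row; .str[0] picks the first piece of each list
prefixes = toy["title"].str.split(": ").str[0]

# sorted(set(...)) yields the unique category names in a reproducible order
toy_cate_list = sorted(set(prefixes))
print(toy_cate_list)  # ['EMS', 'Fire', 'Traffic']
```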
3. Build the category matrix
zeros_df = pd.DataFrame(
    np.zeros((df.shape[0], len(cate_list))),
    columns=cate_list
)
zeros_df.head(3)
|   | Fire | EMS | Traffic |
|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 |
4. Count with the category matrix
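for cate in cate_list:
    # Set the category column to 1 where the title contains the category name.
    # A single .loc call avoids chained assignment, which may write to a copy.
    zeros_df.loc[df["title"].str.contains(cate), cate] = 1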
zeros_df.head(3)
|   | Fire | EMS | Traffic |
|---|---|---|---|
| 0 | 0.0 | 1.0 | 0.0 |
| 1 | 0.0 | 1.0 | 0.0 |
| 2 | 1.0 | 0.0 | 0.0 |
sum_ret = zeros_df.sum(axis=0)
print(type(sum_ret))
print(sum_ret)
<class 'pandas.core.series.Series'>
Fire 37432.0
EMS 124844.0
Traffic 87465.0
dtype: float64
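As an alternative to filling a zero matrix in a loop, the indicator matrix can be built in one call with `pd.get_dummies`. This is a hedged sketch on toy data rather than the notebook's method; `toy`, `toy_cate`, `onehot`, and `counts` are illustrative names, and the Traffic title is made up. Note that it assigns each row to exactly one category (the text before ": "), whereas the loop above marks every category whose name appears anywhere in the title, so the two can give different totals on real data.

```python
import pandas as pd

# Toy titles in the same "Category: detail" shape as the 911 data
# (the Traffic title is invented for illustration)
toy = pd.DataFrame({"title": [
    "EMS: BACK PAINS/INJURY",
    "EMS: DIABETIC EMERGENCY",
    "Fire: GAS-ODOR/LEAK",
    "Traffic: VEHICLE ACCIDENT -",
]})

# Category = text before ": "
toy_cate = toy["title"].str.split(": ").str[0]

# One indicator column per category, one row per record
onehot = pd.get_dummies(toy_cate)

# Column sums are the per-category counts
counts = onehot.sum(axis=0)
print(counts)
```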
Counting with df.groupby()
1. Load the data
df = pd.read_csv("./911.csv")
df["timeStamp"] = pd.to_datetime(df["timeStamp"])
2. Add a category column to the DataFrame
temp_list = df["title"].str.split(": ").tolist()
cate_list = [i[0] for i in temp_list]
# A list with one entry per row can be assigned directly as a new column
df["cate"] = cate_list
df.head(3)
|   | lat | lng | desc | zip | title | timeStamp | twp | addr | e | cate |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 40.297876 | -75.581294 | REINDEER CT & DEAD END; NEW HANOVER; Station ... | 19525.0 | EMS: BACK PAINS/INJURY | 2015-12-10 17:10:52 | NEW HANOVER | REINDEER CT & DEAD END | 1 | EMS |
| 1 | 40.258061 | -75.264680 | BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP... | 19446.0 | EMS: DIABETIC EMERGENCY | 2015-12-10 17:29:21 | HATFIELD TOWNSHIP | BRIAR PATH & WHITEMARSH LN | 1 | EMS |
| 2 | 40.121182 | -75.351975 | HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St... | 19401.0 | Fire: GAS-ODOR/LEAK | 2015-12-10 14:39:21 | NORRISTOWN | HAWS AVE | 1 | Fire |
3. Count with df.groupby()
print(df.groupby(by="cate").count()['title'])
cate
EMS 124840
Fire 37432
Traffic 87465
Name: title, dtype: int64
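The EMS total here (124840) is 4 lower than the 124844 obtained with the cate_list approach, while Fire and Traffic agree. One plausible explanation, not verified against 911.csv, is that `str.contains` matches the category name anywhere in the title, while the groupby uses only the text before ": ". The sketch below reproduces that effect on made-up titles; `toy`, `contains_count`, `prefix_count`, and `by_prefix` are illustrative names, and the second title is hypothetical.

```python
import pandas as pd

# Made-up titles; the second one mentions "EMS" outside the category prefix
toy = pd.DataFrame({"title": [
    "EMS: BACK PAINS/INJURY",
    "Fire: EMS SPECIAL SERVICE",
    "Traffic: VEHICLE ACCIDENT -",
    "EMS: DIABETIC EMERGENCY",
]})

# Substring match: counts every title containing "EMS" anywhere
contains_count = int(toy["title"].str.contains("EMS").sum())

# Prefix match: counts only titles whose category is EMS
prefix_count = int(toy["title"].str.startswith("EMS: ").sum())

# value_counts on the extracted prefix gives the same per-category totals
# as grouping by the prefix and counting titles
by_prefix = toy["title"].str.split(": ").str[0].value_counts()

print(contains_count, prefix_count)  # 3 2
print(by_prefix)
```

If the prefix is what defines a category, matching with `str.startswith(cate + ": ")` in the loop version would be one way to make the two approaches agree.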