2-3_Data_Analysis_part2

# The usual preamble
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import warnings
warnings.filterwarnings('ignore')
# Make the graphs a bit prettier, with a larger font
plt.style.use("bmh")
plt.rc('font', family='SimHei', size=13)  # SimHei so that Chinese text renders in plots

# Show lots of columns when displaying a DataFrame.
# This was necessary in pandas 0.12, not in pandas 0.13.
pd.set_option('display.max_columns', 1000)
pd.set_option('display.width', 1000)
pd.set_option('display.max_colwidth', 1000)

As in part 1, we start by reading the data in.

complaints = pd.read_csv('311-service-requests.csv')

3.1 Selecting data by condition

First, a look at the full dataset.

complaints.head()
(output abridged: 8 of the 52 columns shown)
     Unique Key  Created Date            Closed Date             Agency  Complaint Type           Descriptor                    Status    Borough
0    26589651    10/31/2013 02:08:41 AM  NaN                     NYPD    Noise - Street/Sidewalk  Loud Talking                  Assigned  QUEENS
1    26593698    10/31/2013 02:01:04 AM  NaN                     NYPD    Illegal Parking          Commercial Overnight Parking  Open      QUEENS
2    26594139    10/31/2013 02:00:24 AM  10/31/2013 02:40:32 AM  NYPD    Noise - Commercial       Loud Music/Party              Closed    MANHATTAN
3    26595721    10/31/2013 01:56:23 AM  10/31/2013 02:21:48 AM  NYPD    Noise - Vehicle          Car/Truck Horn                Closed    MANHATTAN
4    26590930    10/31/2013 01:53:44 AM  NaN                     DOHMH   Rodent                   Condition Attracting Rodents  Pending   MANHATTAN

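Suppose we want to pick out the rows whose "Complaint Type" column has a particular value (say "Noise - Street/Sidewalk"). How do we select them?

Let's see how it is done first, and explain afterwards.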

noise_complaints = complaints[complaints['Complaint Type'] == "Noise - Street/Sidewalk"]
noise_complaints[:3]
(output abridged: 8 of the 52 columns shown)
     Unique Key  Created Date            Closed Date             Agency  Complaint Type           Descriptor        Status    Borough
0    26589651    10/31/2013 02:08:41 AM  NaN                     NYPD    Noise - Street/Sidewalk  Loud Talking      Assigned  QUEENS
16   26594086    10/31/2013 12:54:03 AM  10/31/2013 02:16:39 AM  NYPD    Noise - Street/Sidewalk  Loud Music/Party  Closed    STATEN ISLAND
25   26591573    10/31/2013 12:35:18 AM  10/31/2013 02:41:35 AM  NYPD    Noise - Street/Sidewalk  Loud Talking      Closed    STATEN ISLAND

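Look at the Complaint Type column of the rows in noise_complaints: the filter really did pick out the data we wanted. What is the mechanism behind it?

Start with the part inside the square brackets. It produces a boolean Series of True and False values, one per row.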

complaints['Complaint Type'] == "Noise - Street/Sidewalk"
0          True
1         False
2         False
3         False
4         False
5         False
6         False
7         False
8         False
9         False
10        False
11        False
12        False
13        False
14        False
15        False
16         True
17        False
18        False
19        False
20        False
21        False
22        False
23        False
24        False
25         True
26        False
27        False
28         True
29        False
          ...  
111039    False
111040    False
111041    False
111042     True
111043    False
111044     True
111045    False
111046    False
111047    False
111048     True
111049    False
111050    False
111051    False
111052    False
111053    False
111054     True
111055    False
111056    False
111057    False
111058    False
111059     True
111060    False
111061    False
111062    False
111063    False
111064    False
111065    False
111066     True
111067    False
111068    False
Name: Complaint Type, Length: 111069, dtype: bool
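
To see the mechanism in isolation, here is a minimal sketch on a tiny made-up DataFrame (the rows are invented for illustration and are not taken from the 311 file). The comparison yields a boolean Series aligned on the index, and indexing with that Series keeps only the rows where it is True.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    'Complaint Type': ['Noise - Street/Sidewalk', 'Illegal Parking',
                       'Rodent', 'Noise - Street/Sidewalk'],
    'Borough': ['QUEENS', 'QUEENS', 'MANHATTAN', 'BROOKLYN'],
})

# Comparing a column with a scalar gives a boolean Series, one value per row
mask = toy['Complaint Type'] == 'Noise - Street/Sidewalk'
print(type(mask).__name__, mask.dtype)   # Series bool
print(mask.tolist())                     # [True, False, False, True]

# Indexing the DataFrame with the mask keeps the rows where it is True
selected = toy[mask]
print(selected.index.tolist())           # [0, 3]
```

Note that the selected rows keep their original index labels (0 and 3), which is also why the real output above shows rows 0, 16 and 25.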

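Now that boolean values are in play, logical operations (and, or, not) come to mind. Indeed, what if you need to test several conditions at once?

For example, suppose the condition is that the 'Complaint Type' column equals "Noise - Street/Sidewalk" and the 'Borough' column equals "BROOKLYN":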

is_noise = complaints['Complaint Type'] == "Noise - Street/Sidewalk"
in_brooklyn = complaints['Borough'] == "BROOKLYN"
complaints[is_noise & in_brooklyn][:5]
(output abridged: 8 of the 52 columns shown)
     Unique Key  Created Date            Closed Date             Agency  Complaint Type           Descriptor        Status  Borough
31   26595564    10/31/2013 12:30:36 AM  NaN                     NYPD    Noise - Street/Sidewalk  Loud Music/Party  Open    BROOKLYN
49   26595553    10/31/2013 12:05:10 AM  10/31/2013 02:43:43 AM  NYPD    Noise - Street/Sidewalk  Loud Talking      Closed  BROOKLYN
109  26594653    10/30/2013 11:26:32 PM  10/31/2013 12:18:54 AM  NYPD    Noise - Street/Sidewalk  Loud Music/Party  Closed  BROOKLYN
236  26591992    10/30/2013 10:02:58 PM  10/30/2013 10:23:20 PM  NYPD    Noise - Street/Sidewalk  Loud Talking      Closed  BROOKLYN
370  26594167    10/30/2013 08:38:25 PM  10/30/2013 10:26:28 PM  NYPD    Noise - Street/Sidewalk  Loud Music/Party  Closed  BROOKLYN
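
The other logical operators follow the same pattern. The sketch below uses invented rows (not the 311 data): & is "and", | is "or", and ~ is "not", all applied element by element. One thing to watch: these operators bind more tightly than ==, so comparisons written inline need their own parentheses.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    'Complaint Type': ['Noise - Street/Sidewalk', 'Illegal Parking',
                       'Noise - Street/Sidewalk', 'Rodent'],
    'Borough': ['BROOKLYN', 'BROOKLYN', 'QUEENS', 'MANHATTAN'],
})

is_noise = toy['Complaint Type'] == 'Noise - Street/Sidewalk'
in_brooklyn = toy['Borough'] == 'BROOKLYN'

both = toy[is_noise & in_brooklyn]     # noise AND Brooklyn
either = toy[is_noise | in_brooklyn]   # noise OR Brooklyn
not_noise = toy[~is_noise]             # NOT noise

# Inline form: each comparison is wrapped in parentheses
inline = toy[(toy['Complaint Type'] == 'Rodent') | (toy['Borough'] == 'QUEENS')]

print(both.index.tolist())       # [0]
print(either.index.tolist())     # [0, 1, 2]
print(not_noise.index.tolist())  # [1, 3]
print(inline.index.tolist())     # [2, 3]
```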

We can also look at just a few columns of the selected rows:

complaints[is_noise & in_brooklyn][['Complaint Type', 'Borough', 'Created Date', 'Descriptor']][:10]
      Complaint Type           Borough   Created Date            Descriptor
31    Noise - Street/Sidewalk  BROOKLYN  10/31/2013 12:30:36 AM  Loud Music/Party
49    Noise - Street/Sidewalk  BROOKLYN  10/31/2013 12:05:10 AM  Loud Talking
109   Noise - Street/Sidewalk  BROOKLYN  10/30/2013 11:26:32 PM  Loud Music/Party
236   Noise - Street/Sidewalk  BROOKLYN  10/30/2013 10:02:58 PM  Loud Talking
370   Noise - Street/Sidewalk  BROOKLYN  10/30/2013 08:38:25 PM  Loud Music/Party
378   Noise - Street/Sidewalk  BROOKLYN  10/30/2013 08:32:13 PM  Loud Talking
656   Noise - Street/Sidewalk  BROOKLYN  10/30/2013 06:07:39 PM  Loud Music/Party
1251  Noise - Street/Sidewalk  BROOKLYN  10/30/2013 03:04:51 PM  Loud Talking
5416  Noise - Street/Sidewalk  BROOKLYN  10/29/2013 10:07:02 PM  Loud Talking
5584  Noise - Street/Sidewalk  BROOKLYN  10/29/2013 08:15:59 PM  Loud Music/Party
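
Chaining two sets of brackets like this is fine for reading data. An equivalent spelling, sketched here on invented rows, is .loc, which takes the row mask and the column list in a single step. This is offered as a stylistic alternative; it is not what the walkthrough itself uses.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    'Complaint Type': ['Noise - Street/Sidewalk', 'Rodent',
                       'Noise - Street/Sidewalk'],
    'Borough': ['BROOKLYN', 'BROOKLYN', 'QUEENS'],
    'Descriptor': ['Loud Talking', 'Rat Sighting', 'Loud Music/Party'],
    'Status': ['Closed', 'Open', 'Closed'],
})

mask = (toy['Complaint Type'] == 'Noise - Street/Sidewalk') & (toy['Borough'] == 'BROOKLYN')
cols = ['Complaint Type', 'Borough', 'Descriptor']

chained = toy[mask][cols]       # rows first, then columns
with_loc = toy.loc[mask, cols]  # rows and columns in one step

print(chained.equals(with_loc))  # True
```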

3.2 From numpy to pandas

Each column of a pandas DataFrame is in fact a pd.Series.

pd.Series([1,2,3])
0    1
1    2
2    3
dtype: int64

Under the hood, a pandas Series is backed by a numpy array. Append .values to a Series and you get an actual numpy array back.

np.array([1,2,3])
array([1, 2, 3])
pd.Series([1,2,3]).values
array([1, 2, 3], dtype=int64)
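
A short sketch of the round trip between the two types. The text above uses .values; the pandas documentation also offers .to_numpy() for the same purpose, and for a plain integer Series like this one both return the same array. The exact integer dtype printed (int64 here) can vary by platform, so the sketch does not rely on it.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

a = s.values      # the accessor used in the text
b = s.to_numpy()  # equivalent accessor for this simple case

print(type(a).__name__)       # ndarray
print(np.array_equal(a, b))   # True

# A numpy array can be wrapped back into a Series
s2 = pd.Series(a)
print(s2.equals(s))           # True
```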

So the boolean selection we just did is closely tied to numpy (though when you use pandas yourself, you need not care how it is implemented underneath).

arr = np.array([1,2,3])
arr != 2
array([ True, False,  True])
arr[arr != 2]
array([1, 3])
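
The same mask idea carries over unchanged from an array to a Series. The only addition is that a Series keeps its index labels through the selection. A minimal sketch (the labels 'a', 'b', 'c' are arbitrary):

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])
s = pd.Series(arr, index=['a', 'b', 'c'])

# numpy: the boolean mask selects by position
print(arr[arr != 2].tolist())   # [1, 3]

# pandas: the same expression, and the surviving labels come along
kept = s[s != 2]
print(kept.tolist())            # [1, 3]
print(kept.index.tolist())      # ['a', 'c']
```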

3.3 So, let's put it together and get some analysis out of the data

is_noise = complaints['Complaint Type'] == "Noise - Street/Sidewalk"
noise_complaints = complaints[is_noise]
noise_complaints['Borough'].value_counts()
MANHATTAN        917
BROOKLYN         456
BRONX            292
QUEENS           226
STATEN ISLAND     36
Unspecified        1
Name: Borough, dtype: int64

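Manhattan seems to be the noisiest. Absolute counts are persuasive in their own right, but the boroughs differ in their total number of complaints, so it is more usual to compare proportions, like this: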

noise_complaint_counts = noise_complaints['Borough'].value_counts()
complaint_counts = complaints['Borough'].value_counts()
noise_complaint_counts / complaint_counts
BRONX            0.014833
BROOKLYN         0.013864
MANHATTAN        0.037755
QUEENS           0.010143
STATEN ISLAND    0.007474
Unspecified      0.000141
Name: Borough, dtype: float64

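Under Python 2, the division above would have come out as all zeros, because dividing one integer by another there yields an integer. Under Python 3, / is true division, which is why the output above already shows fractions. Converting complaint_counts to float first, as below, gives the correct result in either version.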

noise_complaint_counts / complaint_counts.astype(float)
BRONX            0.014833
BROOKLYN         0.013864
MANHATTAN        0.037755
QUEENS           0.010143
STATEN ISLAND    0.007474
Unspecified      0.000141
Name: Borough, dtype: float64
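
Two things happen in that division that are easy to miss: the two Series are matched up by index label rather than by position, and a label present on only one side comes out as NaN. A sketch with invented counts (not the real 311 numbers):

```python
import pandas as pd

# Invented counts, deliberately in a different order and with an extra label
noise = pd.Series({'MANHATTAN': 9, 'BROOKLYN': 4})
total = pd.Series({'BROOKLYN': 40, 'MANHATTAN': 30, 'QUEENS': 20})

ratio = noise / total.astype(float)

print(ratio['MANHATTAN'])        # 0.3  (9 / 30, matched by label)
print(ratio['BROOKLYN'])         # 0.1  (4 / 40)
print(pd.isna(ratio['QUEENS']))  # True (no noise count for QUEENS)
```

This alignment is what lets the two value_counts() results be divided directly, even though they are sorted by count and therefore list the boroughs in different orders.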

Now let's plot it.

(noise_complaint_counts / complaint_counts.astype(float)).plot(kind='bar')
<matplotlib.axes._subplots.AxesSubplot at 0x40c1f60>

[Figure: bar chart of noise complaints as a fraction of all complaints, by borough]

So Manhattan really does complain more about noise than the other boroughs! Neat.
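
As a cross-check on the ratio, here is a sketch of a second route to the same numbers, again on invented rows. The mean of a boolean mask is the fraction of True values, so grouping the mask by borough gives the per-borough share directly. This is an alternative formulation, not the method used above.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    'Complaint Type': ['Noise - Street/Sidewalk', 'Rodent', 'Illegal Parking',
                       'Noise - Street/Sidewalk', 'Rodent', 'Rodent'],
    'Borough': ['MANHATTAN', 'MANHATTAN', 'BROOKLYN',
                'BROOKLYN', 'BROOKLYN', 'BROOKLYN'],
})
is_noise = toy['Complaint Type'] == 'Noise - Street/Sidewalk'

# Route 1: divide the two value_counts, as in the text
by_counts = toy[is_noise]['Borough'].value_counts() / toy['Borough'].value_counts()

# Route 2: mean of the boolean mask within each borough
by_mean = is_noise.groupby(toy['Borough']).mean()

print(by_mean['MANHATTAN'])  # 0.5
print(by_mean['BROOKLYN'])   # 0.25
```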
