2.3数据分析Data_Analysis_part2

# * coding:utf-8_*_
# 作者     :XiangLin
# 创建时间 :10/02/2020 11:15
# 文件     :2-3_Data_Analysis_part2.py
# IDE      :PyCharm
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import warnings
warnings.filterwarnings('ignore')
# Make the graphs a bit prettier, and bigger


# This is necessary to show lots of columns in pandas 0.12.
# Not necessary in pandas 0.13.
plt.style.use("bmh")
plt.rc('font', family='SimHei', size=13) #显示中文
pd.set_option('display.max_columns',1000)
pd.set_option('display.width', 1000)
pd.set_option('display.max_colwidth',1000)

继续2.2的形式,把数据读进来。

3.1 按条件来筛选数据

额,先看看全部的数据。

complaints = pd.read_csv('311-service-requests.csv')
print(complaints.head())
输出
   Unique Key            Created Date             Closed Date Agency                              Agency Name           Complaint Type                    Descriptor        Location Type Incident Zip  Incident Address      Street Name   Cross Street 1                    Cross Street 2 Intersection Street 1 Intersection Street 2 Address Type      City Landmark Facility Type    Status                Due Date Resolution Action Updated Date Community Board    Borough  X Coordinate (State Plane)  Y Coordinate (State Plane) Park Facility Name Park Borough  School Name School Number School Region  School Code School Phone Number School Address  School City School State   School Zip School Not Found  School or Citywide Complaint Vehicle Type Taxi Company Borough Taxi Pick Up Location Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Garage Lot Name Ferry Direction Ferry Terminal Name   Latitude  Longitude                                  Location
0    26589651  10/31/2013 02:08:41 AM                     NaN   NYPD          New York City Police Department  Noise - Street/Sidewalk                  Loud Talking      Street/Sidewalk        11432  90-03 169 STREET       169 STREET        90 AVENUE                         91 AVENUE                   NaN                   NaN      ADDRESS   JAMAICA      NaN      Precinct  Assigned  10/31/2013 10:08:41 AM         10/31/2013 02:35:17 AM       12 QUEENS     QUEENS                   1042027.0                    197389.0        Unspecified       QUEENS  Unspecified   Unspecified   Unspecified  Unspecified         Unspecified    Unspecified  Unspecified  Unspecified  Unspecified                N                           NaN          NaN                  NaN                   NaN                 NaN                      NaN       NaN                    NaN             NaN             NaN                 NaN  40.708275 -73.791604   (40.70827532593202, -73.79160395779721)
1    26593698  10/31/2013 02:01:04 AM                     NaN   NYPD          New York City Police Department          Illegal Parking  Commercial Overnight Parking      Street/Sidewalk        11378         58 AVENUE        58 AVENUE         58 PLACE                         59 STREET                   NaN                   NaN    BLOCKFACE   MASPETH      NaN      Precinct      Open  10/31/2013 10:01:04 AM                            NaN       05 QUEENS     QUEENS                   1009349.0                    201984.0        Unspecified       QUEENS  Unspecified   Unspecified   Unspecified  Unspecified         Unspecified    Unspecified  Unspecified  Unspecified  Unspecified                N                           NaN          NaN                  NaN                   NaN                 NaN                      NaN       NaN                    NaN             NaN             NaN                 NaN  40.721041 -73.909453  (40.721040535628305, -73.90945306791765)
2    26594139  10/31/2013 02:00:24 AM  10/31/2013 02:40:32 AM   NYPD          New York City Police Department       Noise - Commercial              Loud Music/Party  Club/Bar/Restaurant        10032     4060 BROADWAY         BROADWAY  WEST 171 STREET                   WEST 172 STREET                   NaN                   NaN      ADDRESS  NEW YORK      NaN      Precinct    Closed  10/31/2013 10:00:24 AM         10/31/2013 02:39:42 AM    12 MANHATTAN  MANHATTAN                   1001088.0                    246531.0        Unspecified    MANHATTAN  Unspecified   Unspecified   Unspecified  Unspecified         Unspecified    Unspecified  Unspecified  Unspecified  Unspecified                N                           NaN          NaN                  NaN                   NaN                 NaN                      NaN       NaN                    NaN             NaN             NaN                 NaN  40.843330 -73.939144   (40.84332975466513, -73.93914371913482)
3    26595721  10/31/2013 01:56:23 AM  10/31/2013 02:21:48 AM   NYPD          New York City Police Department          Noise - Vehicle                Car/Truck Horn      Street/Sidewalk        10023    WEST 72 STREET   WEST 72 STREET  COLUMBUS AVENUE                  AMSTERDAM AVENUE                   NaN                   NaN    BLOCKFACE  NEW YORK      NaN      Precinct    Closed  10/31/2013 09:56:23 AM         10/31/2013 02:21:10 AM    07 MANHATTAN  MANHATTAN                    989730.0                    222727.0        Unspecified    MANHATTAN  Unspecified   Unspecified   Unspecified  Unspecified         Unspecified    Unspecified  Unspecified  Unspecified  Unspecified                N                           NaN          NaN                  NaN                   NaN                 NaN                      NaN       NaN                    NaN             NaN             NaN                 NaN  40.778009 -73.980213    (40.7780087446372, -73.98021349023975)
4    26590930  10/31/2013 01:53:44 AM                     NaN  DOHMH  Department of Health and Mental Hygiene                   Rodent  Condition Attracting Rodents           Vacant Lot        10027   WEST 124 STREET  WEST 124 STREET     LENOX AVENUE  ADAM CLAYTON POWELL JR BOULEVARD                   NaN                   NaN    BLOCKFACE  NEW YORK      NaN           NaN   Pending  11/30/2013 01:53:44 AM         10/31/2013 01:59:54 AM    10 MANHATTAN  MANHATTAN                    998815.0                    233545.0        Unspecified    MANHATTAN  Unspecified   Unspecified   Unspecified  Unspecified         Unspecified    Unspecified  Unspecified  Unspecified  Unspecified                N                           NaN          NaN                  NaN                   NaN                 NaN                      NaN       NaN                    NaN             NaN             NaN                 NaN  40.807691 -73.947387   (40.80769092704951, -73.94738703491433)

如果我们想挑选出来"Complaint Type"字段为某一类(比如"Noise - Street/Sidewalk")的数据,怎么选呢?
我们先看怎么选,一会儿解释下。

noise_complaints = complaints[complaints['Complaint Type'] == "Noise - Street/Sidewalk"]
print(noise_complaints[:3])
输出:
    Unique Key            Created Date             Closed Date Agency                      Agency Name           Complaint Type        Descriptor    Location Type Incident Zip     Incident Address      Street Name    Cross Street 1 Cross Street 2 Intersection Street 1 Intersection Street 2 Address Type           City Landmark Facility Type    Status                Due Date Resolution Action Updated Date   Community Board        Borough  X Coordinate (State Plane)  Y Coordinate (State Plane) Park Facility Name   Park Borough  School Name School Number School Region  School Code School Phone Number School Address  School City School State   School Zip School Not Found  School or Citywide Complaint Vehicle Type Taxi Company Borough Taxi Pick Up Location Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Garage Lot Name Ferry Direction Ferry Terminal Name   Latitude  Longitude                                 Location
0     26589651  10/31/2013 02:08:41 AM                     NaN   NYPD  New York City Police Department  Noise - Street/Sidewalk      Loud Talking  Street/Sidewalk        11432     90-03 169 STREET       169 STREET         90 AVENUE      91 AVENUE                   NaN                   NaN      ADDRESS        JAMAICA      NaN      Precinct  Assigned  10/31/2013 10:08:41 AM         10/31/2013 02:35:17 AM         12 QUEENS         QUEENS                   1042027.0                    197389.0        Unspecified         QUEENS  Unspecified   Unspecified   Unspecified  Unspecified         Unspecified    Unspecified  Unspecified  Unspecified  Unspecified                N                           NaN          NaN                  NaN                   NaN                 NaN                      NaN       NaN                    NaN             NaN             NaN                 NaN  40.708275 -73.791604  (40.70827532593202, -73.79160395779721)
16    26594086  10/31/2013 12:54:03 AM  10/31/2013 02:16:39 AM   NYPD  New York City Police Department  Noise - Street/Sidewalk  Loud Music/Party  Street/Sidewalk        10310  173 CAMPBELL AVENUE  CAMPBELL AVENUE  HENDERSON AVENUE   WINEGAR LANE                   NaN                   NaN      ADDRESS  STATEN ISLAND      NaN      Precinct    Closed  10/31/2013 08:54:03 AM         10/31/2013 02:07:14 AM  01 STATEN ISLAND  STATEN ISLAND                    952013.0                    171076.0        Unspecified  STATEN ISLAND  Unspecified   Unspecified   Unspecified  Unspecified         Unspecified    Unspecified  Unspecified  Unspecified  Unspecified                N                           NaN          NaN                  NaN                   NaN                 NaN                      NaN       NaN                    NaN             NaN             NaN                 NaN  40.636182 -74.116150   (40.63618202176914, -74.1161500428337)
25    26591573  10/31/2013 12:35:18 AM  10/31/2013 02:41:35 AM   NYPD  New York City Police Department  Noise - Street/Sidewalk      Loud Talking  Street/Sidewalk        10312    24 PRINCETON LANE   PRINCETON LANE     HAMPTON GREEN       DEAD END                   NaN                   NaN      ADDRESS  STATEN ISLAND      NaN      Precinct    Closed  10/31/2013 08:35:18 AM         10/31/2013 01:45:17 AM  03 STATEN ISLAND  STATEN ISLAND                    929577.0                    140964.0        Unspecified  STATEN ISLAND  Unspecified   Unspecified   Unspecified  Unspecified         Unspecified    Unspecified  Unspecified  Unspecified  Unspecified                N                           NaN          NaN                  NaN                   NaN                 NaN                      NaN       NaN                    NaN             NaN             NaN                 NaN  40.553421 -74.196743  (40.55342078716953, -74.19674315017886)

你看看上面我们选出来的数据的 noise_complaints字段,发现我们确实按条件筛出来了想要的数据,背后的原理是什么呢?
我们先看看,中括号内的部分,会生成一个True或者False的布尔型dataframe

print(complaints['Complaint Type'] == "Noise - Street/Sidewalk")
输出:
0          True
1         False
2         False
3         False
4         False
5         False
6         False
7         False
8         False
9         False
10        False
11        False
12        False
13        False
14        False
15        False
16         True
17        False
18        False
19        False
20        False
21        False
22        False
23        False
24        False
25         True
26        False
27        False
28         True
29        False
          ...  
111039    False
111040    False
111041    False
111042     True
111043    False
111044     True
111045    False
111046    False
111047    False
111048     True
111049    False
111050    False
111051    False
111052    False
111053    False
111054     True
111055    False
111056    False
111057    False
111058    False
111059     True
111060    False
111061    False
111062    False
111063    False
111064    False
111065    False
111066     True
111067    False
111068    False

既然看到布尔型的数值出来了,大家一定会想到逻辑运算(与或非…),是的,如果你现在需要多个条件判定呢?
恩,比如你的条件是’Complaint Type’字段为"Noise - Street/Sidewalk",且’Borough’字段为"BROOKLYN":

is_noise = complaints['Complaint Type'] == "Noise - Street/Sidewalk"
in_brooklyn = complaints['Borough'] == "BROOKLYN"
print(complaints[is_noise & in_brooklyn][:5])
输出:
    Unique Key            Created Date             Closed Date Agency                      Agency Name           Complaint Type        Descriptor    Location Type Incident Zip    Incident Address      Street Name     Cross Street 1  Cross Street 2 Intersection Street 1 Intersection Street 2  Address Type      City Landmark Facility Type  Status                Due Date Resolution Action Updated Date Community Board   Borough  X Coordinate (State Plane)  Y Coordinate (State Plane) Park Facility Name Park Borough  School Name School Number School Region  School Code School Phone Number School Address  School City School State   School Zip School Not Found  School or Citywide Complaint Vehicle Type Taxi Company Borough Taxi Pick Up Location Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Garage Lot Name Ferry Direction Ferry Terminal Name   Latitude  Longitude                                  Location
31     26595564  10/31/2013 12:30:36 AM                     NaN   NYPD  New York City Police Department  Noise - Street/Sidewalk  Loud Music/Party  Street/Sidewalk        11236            AVENUE J         AVENUE J     EAST 80 STREET  EAST 81 STREET                   NaN                   NaN     BLOCKFACE  BROOKLYN      NaN      Precinct    Open  10/31/2013 08:30:36 AM                            NaN     18 BROOKLYN  BROOKLYN                   1008937.0                    170310.0        Unspecified     BROOKLYN  Unspecified   Unspecified   Unspecified  Unspecified         Unspecified    Unspecified  Unspecified  Unspecified  Unspecified                N                           NaN          NaN                  NaN                   NaN                 NaN                      NaN       NaN                    NaN             NaN             NaN                 NaN  40.634104 -73.911055  (40.634103775951736, -73.91105541883589)
49     26595553  10/31/2013 12:05:10 AM  10/31/2013 02:43:43 AM   NYPD  New York City Police Department  Noise - Street/Sidewalk      Loud Talking  Street/Sidewalk        11225  25 LEFFERTS AVENUE  LEFFERTS AVENUE  WASHINGTON AVENUE  BEDFORD AVENUE                   NaN                   NaN       ADDRESS  BROOKLYN      NaN      Precinct  Closed  10/31/2013 08:05:10 AM         10/31/2013 01:29:29 AM     09 BROOKLYN  BROOKLYN                    995366.0                    180388.0        Unspecified     BROOKLYN  Unspecified   Unspecified   Unspecified  Unspecified         Unspecified    Unspecified  Unspecified  Unspecified  Unspecified                N                           NaN          NaN                  NaN                   NaN                 NaN                      NaN       NaN                    NaN             NaN             NaN                 NaN  40.661793 -73.959934    (40.6617931276793, -73.95993363978067)
109    26594653  10/30/2013 11:26:32 PM  10/31/2013 12:18:54 AM   NYPD  New York City Police Department  Noise - Street/Sidewalk  Loud Music/Party  Street/Sidewalk        11222                 NaN              NaN                NaN             NaN         DOBBIN STREET         NORMAN STREET  INTERSECTION  BROOKLYN      NaN      Precinct  Closed  10/31/2013 07:26:32 AM         10/31/2013 12:18:54 AM     01 BROOKLYN  BROOKLYN                    996925.0                    203271.0        Unspecified     BROOKLYN  Unspecified   Unspecified   Unspecified  Unspecified         Unspecified    Unspecified  Unspecified  Unspecified  Unspecified                N                           NaN          NaN                  NaN                   NaN                 NaN                      NaN       NaN                    NaN             NaN             NaN                 NaN  40.724600 -73.954271  (40.724599563793525, -73.95427134534344)
236    26591992  10/30/2013 10:02:58 PM  10/30/2013 10:23:20 PM   NYPD  New York City Police Department  Noise - Street/Sidewalk      Loud Talking  Street/Sidewalk        11218       DITMAS AVENUE    DITMAS AVENUE                NaN             NaN                   NaN                   NaN       LATLONG  BROOKLYN      NaN      Precinct  Closed  10/31/2013 06:02:58 AM         10/30/2013 10:23:20 PM     01 BROOKLYN  BROOKLYN                    991895.0                    171051.0        Unspecified     BROOKLYN  Unspecified   Unspecified   Unspecified  Unspecified         Unspecified    Unspecified  Unspecified  Unspecified  Unspecified                N                           NaN          NaN                  NaN                   NaN                 NaN                      NaN       NaN                    NaN             NaN             NaN                 NaN  40.636169 -73.972455   (40.63616876563881, -73.97245504682485)
370    26594167  10/30/2013 08:38:25 PM  10/30/2013 10:26:28 PM   NYPD  New York City Police Department  Noise - Street/Sidewalk  Loud Music/Party  Street/Sidewalk        11218

我们也可以只看选出来的数据里面的一些列:

print(complaints[is_noise & in_brooklyn][['Complaint Type','Borough','Created Date','Descriptor']][:10])
输出:
               Complaint Type   Borough            Created Date        Descriptor
31    Noise - Street/Sidewalk  BROOKLYN  10/31/2013 12:30:36 AM  Loud Music/Party
49    Noise - Street/Sidewalk  BROOKLYN  10/31/2013 12:05:10 AM      Loud Talking
109   Noise - Street/Sidewalk  BROOKLYN  10/30/2013 11:26:32 PM  Loud Music/Party
236   Noise - Street/Sidewalk  BROOKLYN  10/30/2013 10:02:58 PM      Loud Talking
370   Noise - Street/Sidewalk  BROOKLYN  10/30/2013 08:38:25 PM  Loud Music/Party
378   Noise - Street/Sidewalk  BROOKLYN  10/30/2013 08:32:13 PM      Loud Talking
656   Noise - Street/Sidewalk  BROOKLYN  10/30/2013 06:07:39 PM  Loud Music/Party
1251  Noise - Street/Sidewalk  BROOKLYN  10/30/2013 03:04:51 PM      Loud Talking
5416  Noise - Street/Sidewalk  BROOKLYN  10/29/2013 10:07:02 PM      Loud Talking
5584  Noise - Street/Sidewalk  BROOKLYN  10/29/2013 08:15:59 PM  Loud Music/Party

3.2 从numpy到pandas

说起来,pandas数据的每一列其实是pd.Series类型的

n = pd.Series([1,2,3])
print(n)
# pandas Series底层是numpy数组,如果你在 Series后面加上 .values ,你得到的就是一个实实在在的numpy数组
n1 = np.array([1,2,3])
print(n1)
print(pd.Series([1,2,3]).values)
# 所以呢,其实刚才的布尔判定,和numpy有密不可分的关系(当然,你自己用pandas的时候,可以不管底层是什么实现的)
arr = np.array([1,2,3])
print(arr !=2)
print(arr[arr != 2])


输出:
0    1
1    2
2    3
dtype: int64
[1 2 3]
[1 2 3]
[ True False  True]
[1 3]

3.3 所以,咱们汇总一下,分析点数据出来?

is_noise = complaints['Complaint Type'] == 'Noise - Street/Sidewalk'
noise_complaints = complaints[is_noise]
counts = noise_complaints['Borough'].value_counts()
print(counts)
# 发现曼哈顿好像最吵。绝对大小的数值当然也是有说服力的,咱们还是习惯转换成比例的形式,比如下面这样:
noise_complaints_counts = noise_complaints['Borough'].value_counts()
complaints_counts = complaints['Borough'].value_counts()
rate = noise_complaints_counts / complaints_counts
print(rate)
# 上一步操作你得到的结果其实都会是0,为啥?python默认的整型和整型除法得到的结果还是整型,所以我们最好把 complaint_counts 字段类型转换一下,转成float型的。
rate1 = noise_complaints_counts / complaints_counts.astype(float)
print(rate1)
输出:
MANHATTAN        917
BROOKLYN         456
BRONX            292
QUEENS           226
STATEN ISLAND     36
Unspecified        1
Name: Borough, dtype: int64
BRONX            0.014833
BROOKLYN         0.013864
MANHATTAN        0.037755
QUEENS           0.010143
STATEN ISLAND    0.007474
Unspecified      0.000141
Name: Borough, dtype: float64
BRONX            0.014833
BROOKLYN         0.013864
MANHATTAN        0.037755
QUEENS           0.010143
STATEN ISLAND    0.007474
Unspecified      0.000141
Name: Borough, dtype: float64

再画个图

(noise_complaints_counts / complaints_counts.astype(float)).plot(kind = 'bar')
plt.show()

在这里插入图片描述
数据链接:链接:https://pan.baidu.com/s/1BX9z2WeAHrA0hduuf1OD0A
提取码:o2o8
来自七月在线数据挖掘算法
向林
2020年2月10日于重庆城口
好好学习,天天向上,终有所获

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值