数据分析必备:一步步教你如何用Pandas做数据分析(5)

1、Pandas CSV 文件

CSV(Comma-Separated Values,逗号分隔值,有时也称为字符分隔值,因为分隔字符也可以不是逗号),其文件以纯文本形式存储表格数据(数字和文本)。
CSV 是一种通用的、相对简单的文件格式,被用户、商业和科学广泛应用。
Pandas 可以很方便的处理 CSV 文件,本文以 nba.csv 为例,你可以下载 nba.csv 或打开 nba.csv 查看。

import pandas as pd
df = pd.read_csv('E:\\Edge浏览器文件\\nba.csv')
print(df.to_string())

to_string() 用于返回 DataFrame 类型的数据,如果不使用该函数,则输出结果为数据的前面 5 行和末尾 5 行,中间部分以 … 代替。

                         Name                    Team  Number Position   Age Height  Weight                College      Salary
0               Avery Bradley          Boston Celtics     0.0       PG  25.0    6-2   180.0                  Texas   7730337.0
1                 Jae Crowder          Boston Celtics    99.0       SF  25.0    6-6   235.0              Marquette   6796117.0
2                John Holland          Boston Celtics    30.0       SG  27.0    6-5   205.0      Boston University         NaN
3                 R.J. Hunter          Boston Celtics    28.0       SG  22.0    6-5   185.0          Georgia State   1148640.0
4               Jonas Jerebko          Boston Celtics     8.0       PF  29.0   6-10   231.0                    NaN   5000000.0
5                Amir Johnson          Boston Celtics    90.0       PF  29.0    6-9   240.0                    NaN  12000000.0
6               Jordan Mickey          Boston Celtics    55.0       PF  21.0    6-8   235.0                    LSU   1170960.0
7                Kelly Olynyk          Boston Celtics    41.0        C  25.0    7-0   238.0                Gonzaga   2165160.0
8                Terry Rozier          Boston Celtics    12.0       PG  22.0    6-2   190.0             Louisville   1824360.0
9                Marcus Smart          Boston Celtics    36.0       PG  22.0    6-4   220.0         Oklahoma State   3431040.0
10            Jared Sullinger          Boston Celtics     7.0        C  24.0    6-9   260.0             Ohio State   2569260.0
11              Isaiah Thomas          Boston Celtics     4.0       PG  27.0    5-9   185.0             Washington   6912869.0
12                Evan Turner          Boston Celtics    11.0       SG  27.0    6-7   220.0             Ohio State   3425510.0
13                James Young          Boston Celtics    13.0       SG  20.0    6-6   215.0               Kentucky   1749840.0
14               Tyler Zeller          Boston Celtics    44.0        C  26.0    7-0   253.0         North Carolina   2616975.0
15           Bojan Bogdanovic           Brooklyn Nets    44.0       SG  27.0    6-8   216.0                    NaN   3425510.0
16               Markel Brown           Brooklyn Nets    22.0       SG  24.0    6-3   190.0         Oklahoma State    845059.0
17            Wayne Ellington           Brooklyn Nets    21.0       SG  28.0    6-4   200.0         North Carolina   1500000.0
18    Rondae Hollis-Jefferson           Brooklyn Nets    24.0       SG  21.0    6-7   220.0                Arizona   1335480.0
19               Jarrett Jack           Brooklyn Nets     2.0       PG  32.0    6-3   200.0           Georgia Tech   6300000.0
20             Sergey Karasev           Brooklyn Nets    10.0       SG  22.0    6-7   208.0                    NaN   1599840.0
21            Sean Kilpatrick           Brooklyn Nets     6.0       SG  26.0    6-4   219.0             Cincinnati    134215.0
22               Shane Larkin           Brooklyn Nets     0.0       PG  23.0   5-11   175.0             Miami (FL)   1500000.0
23                Brook Lopez           Brooklyn Nets    11.0        C  28.0    7-0   275.0               Stanford  19689000.0
24           Chris McCullough           Brooklyn Nets     1.0       PF  21.0   6-11   200.0               Syracuse   1140240.0
25                Willie Reed           Brooklyn Nets    33.0       PF  26.0   6-10   220.0            Saint Louis    947276.0
26            Thomas Robinson           Brooklyn Nets    41.0       PF  25.0   6-10   237.0                 Kansas    981348.0
27                 Henry Sims           Brooklyn Nets    14.0        C  26.0   6-10   248.0             Georgetown    947276.0
28               Donald Sloan           Brooklyn Nets    15.0       PG  28.0    6-3   205.0              Texas A&M    947276.0
29             Thaddeus Young           Brooklyn Nets    30.0       PF  27.0    6-8   221.0           Georgia Tech  11235955.0
30              Arron Afflalo         New York Knicks     4.0       SG  30.0    6-5   210.0                   UCLA   8000000.0
31               Lou Amundson         New York Knicks    17.0       PF  33.0    6-9   220.0                   UNLV   1635476.0
32     Thanasis Antetokounmpo         New York Knicks    43.0       SF  23.0    6-7   205.0                    NaN     30888.0
33            Carmelo Anthony         New York Knicks     7.0       SF  32.0    6-8   240.0               Syracuse  22875000.0
34              Jose Calderon         New York Knicks     3.0       PG  34.0    6-3   200.0                    NaN   7402812.0
35           Cleanthony Early         New York Knicks    11.0       SF  25.0    6-8   210.0          Wichita State    845059.0
36          Langston Galloway         New York Knicks     2.0       SG  24.0    6-2   200.0         Saint Joseph's    845059.0
37               Jerian Grant         New York Knicks    13.0       PG  23.0    6-4   195.0             Notre Dame   1572360.0
38                Robin Lopez         New York Knicks     8.0        C  28.0    7-0   255.0               Stanford  12650000.0
39               Kyle O'Quinn         New York Knicks     9.0       PF  26.0   6-10   250.0          Norfolk State   3750000.0
40         Kristaps Porzingis         New York Knicks     6.0       PF  20.0    7-3   240.0                    NaN   4131720.0
41             Kevin Seraphin         New York Knicks     1.0        C  26.0   6-10   278.0                    NaN   2814000.0
42               Lance Thomas         New York Knicks    42.0       SF  28.0    6-8   235.0                   Duke   1636842.0
43              Sasha Vujacic         New York Knicks    18.0       SG  32.0    6-7   195.0                    NaN    947276.0
44           Derrick Williams         New York Knicks    23.0       PF  25.0    6-8   240.0                Arizona   4000000.0
45                Tony Wroten         New York Knicks     5.0       SG  23.0    6-6   205.0             Washington    167406.0
46                Elton Brand      Philadelphia 76ers    42.0       PF  37.0    6-9   254.0                   Duke         NaN
47              Isaiah Canaan      Philadelphia 76ers     0.0       PG  25.0    6-0   201.0           Murray State    947276.0
48           Robert Covington      Philadelphia 76ers    33.0       SF  25.0    6-9   215.0        Tennessee State   1000000.0
49                Joel Embiid      Philadelphia 76ers    21.0        C  22.0    7-0   250.0                 Kansas   4626960.0
50               Jerami Grant      Philadelphia 76ers    39.0       SF  22.0    6-8   210.0               Syracuse    845059.0
51             Richaun Holmes      Philadelphia 76ers    22.0       PF  22.0   6-10   245.0          Bowling Green   1074169.0
52                Carl Landry      Philadelphia 76ers     7.0       PF  32.0    6-9   248.0                 Purdue   6500000.0
53           Kendall Marshall      Philadelphia 76ers     5.0       PG  24.0    6-4   200.0         North Carolina   2144772.0
54             T.J. McConnell      Philadelphia 76ers    12.0       PG  24.0    6-2   200.0                Arizona    525093.0
55               Nerlens Noel      Philadelphia 76ers     4.0       PF  22.0   6-11   228.0               Kentucky   3457800.0
56              Jahlil Okafor      Philadelphia 76ers     8.0        C  20.0   6-11   275.0                   Duke   4582680.0
57                  Ish Smith      Philadelphia 76ers     1.0       PG  27.0    6-0   175.0            Wake Forest    947276.0
58               Nik Stauskas      Philadelphia 76ers    11.0       SG  22.0    6-6   205.0               Michigan   2869440.0
59            Hollis Thompson      Philadelphia 76ers    31.0       SG  25.0    6-8   206.0             Georgetown    947276.0
60             Christian Wood      Philadelphia 76ers    35.0       PF  20.0   6-11   220.0                   UNLV    525093.0
61            Bismack Biyombo         Toronto Raptors     8.0        C  23.0    6-9   245.0                    NaN   2814000.0
62              Bruno Caboclo         Toronto Raptors    20.0       SF  20.0    6-9   205.0                    NaN   1524000.0
63            DeMarre Carroll         Toronto Raptors     5.0       SF  29.0    6-8   212.0               Missouri  13600000.0
64              DeMar DeRozan         Toronto Raptors    10.0       SG  26.0    6-7   220.0                    USC  10050000.0
65              James Johnson         Toronto Raptors     3.0       PF  29.0    6-9   250.0            Wake Forest   2500000.0
66                Cory Joseph         Toronto Raptors     6.0       PG  24.0    6-3   190.0                  Texas   7000000.0
67                 Kyle Lowry         Toronto Raptors     7.0       PG  30.0    6-0   205.0              Villanova  12000000.0
68             Lucas Nogueira         Toronto Raptors    92.0        C  23.0    7-0   220.0                    NaN   1842000.0
69          Patrick Patterson         Toronto Raptors    54.0       PF  27.0    6-9   235.0               Kentucky   6268675.0
70              Norman Powell         Toronto Raptors    24.0       SG  23.0    6-4   215.0                   UCLA    650000.0
71              Terrence Ross         Toronto Raptors    31.0       SF  25.0    6-7   195.0             Washington   3553917.0
72                 Luis Scola         Toronto Raptors     4.0       PF  36.0    6-9   240.0                    NaN   2900000.0
73             Jason Thompson         Toronto Raptors     1.0       PF  29.0   6-11   250.0                  Rider    245177.0
74          Jonas Valanciunas         Toronto Raptors    17.0        C  24.0    7-0   255.0                    NaN   4660482.0
75               Delon Wright         Toronto Raptors    55.0       PG  24.0    6-5   190.0                   Utah   1509360.0
76            Leandro Barbosa   Golden State Warriors    19.0       SG  33.0    6-3   194.0                    NaN   2500000.0
77            Harrison Barnes   Golden State Warriors    40.0       SF  24.0    6-8   225.0         North Carolina   3873398.0
78               Andrew Bogut   Golden State Warriors    12.0        C  31.0    7-0   260.0                   Utah  13800000.0
79                  Ian Clark   Golden State Warriors    21.0       SG  25.0    6-3   175.0                Belmont    947276.0
80              Stephen Curry   Golden State Warriors    30.0       PG  28.0    6-3   190.0               Davidson  11370786.0
81               Festus Ezeli   Golden State Warriors    31.0        C  26.0   6-11   265.0             Vanderbilt   2008748.0
82             Draymond Green   Golden State Warriors    23.0       PF  26.0    6-7   230.0         Michigan State  14260870.0
83             Andre Iguodala   Golden State Warriors     9.0       SF  32.0    6-6   215.0                Arizona  11710456.0
84           Shaun Livingston   Golden State Warriors    34.0       PG  30.0    6-7   192.0                    NaN   5543725.0
85               Kevon Looney   Golden State Warriors    36.0       SF  20.0    6-9   220.0                   UCLA   1131960.0
86       James Michael McAdoo   Golden State Warriors    20.0       SF  23.0    6-9   240.0         North Carolina    845059.0
87               Brandon Rush   Golden State Warriors     4.0       SF  30.0    6-6   220.0                 Kansas   1270964.0
88          Marreese Speights   Golden State Warriors     5.0        C  28.0   6-10   255.0                Florida   3815000.0
89              Klay Thompson   Golden State Warriors    11.0       SG  26.0    6-7   215.0       Washington State  15501000.0
90           Anderson Varejao   Golden State Warriors    18.0       PF  33.0   6-11   273.0                    NaN    289755.0
91               Cole Aldrich    Los Angeles Clippers    45.0        C  27.0   6-11   250.0                 Kansas   1100602.0
92                 Jeff Ayres    Los Angeles Clippers    19.0       PF  29.0    6-9   250.0          Arizona State    111444.0
93             Jamal Crawford    Los Angeles Clippers    11.0       SG  36.0    6-5   195.0               Michigan   5675000.0
94             Branden Dawson    Los Angeles Clippers    22.0       SF  23.0    6-6   225.0         Michigan State    525093.0
95                 Jeff Green    Los Angeles Clippers     8.0       SF  29.0    6-9   235.0             Georgetown   9650000.0
96              Blake Griffin    Los Angeles Clippers    32.0       PF  27.0   6-10   251.0               Oklahoma  18907726.0
97             Wesley Johnson    Los Angeles Clippers    33.0       SF  28.0    6-7   215.0               Syracuse   1100602.0
···················
···················
···················
import pandas as pd
df = pd.read_csv('E:\\Edge浏览器文件\\nba.csv')
print(df)

输出结果为:

              Name            Team  ...            College     Salary
0    Avery Bradley  Boston Celtics  ...              Texas  7730337.0
1      Jae Crowder  Boston Celtics  ...          Marquette  6796117.0
2     John Holland  Boston Celtics  ...  Boston University        NaN
3      R.J. Hunter  Boston Celtics  ...      Georgia State  1148640.0
4    Jonas Jerebko  Boston Celtics  ...                NaN  5000000.0
..             ...             ...  ...                ...        ...
453   Shelvin Mack       Utah Jazz  ...             Butler  2433333.0
454      Raul Neto       Utah Jazz  ...                NaN   900000.0
455   Tibor Pleiss       Utah Jazz  ...                NaN  2900000.0
456    Jeff Withey       Utah Jazz  ...             Kansas   947276.0
457            NaN             NaN  ...                NaN        NaN

我们也可以使用 to_csv() 方法将 DataFrame 存储为 csv 文件:

import pandas as pd
# 三个字段 name, site, age
nme = ["Google", "Runoob", "Taobao", "Wiki"]
st = ["www.google.com", "www.runoob.com", "www.taobao.com", "www.wikipedia.org"]
ag = [90, 40, 80, 98]
# 字典
dict = {'name': nme, 'site': st, 'age': ag}  
df = pd.DataFrame(dict)
# 保存 dataframe
df.to_csv('site.csv')

执行成功后,我们打开 site.csv 文件,显示结果如下:
在这里插入图片描述

2、数据处理

2.1、head()

head( n ) 方法用于读取前面的 n 行,如果不填参数 n ,默认返回 5 行。
读取前面 5 行

import pandas as pd
df = pd.read_csv('E:\\Edge浏览器文件\\nba.csv')
print(df.head())

输出结果为:

            Name            Team  Number  ... Weight            College     Salary
0  Avery Bradley  Boston Celtics     0.0  ...  180.0              Texas  7730337.0
1    Jae Crowder  Boston Celtics    99.0  ...  235.0          Marquette  6796117.0
2   John Holland  Boston Celtics    30.0  ...  205.0  Boston University        NaN
3    R.J. Hunter  Boston Celtics    28.0  ...  185.0      Georgia State  1148640.0
4  Jonas Jerebko  Boston Celtics     8.0  ...  231.0                NaN  5000000.0
[5 rows x 9 columns]

读取前面 10 行

import pandas as pd
df = pd.read_csv('E:\\Edge浏览器文件\\nba.csv')
print(df.head(10))

输出结果为:

            Name            Team  Number  ... Weight            College      Salary
0  Avery Bradley  Boston Celtics     0.0  ...  180.0              Texas   7730337.0
1    Jae Crowder  Boston Celtics    99.0  ...  235.0          Marquette   6796117.0
2   John Holland  Boston Celtics    30.0  ...  205.0  Boston University         NaN
3    R.J. Hunter  Boston Celtics    28.0  ...  185.0      Georgia State   1148640.0
4  Jonas Jerebko  Boston Celtics     8.0  ...  231.0                NaN   5000000.0
5   Amir Johnson  Boston Celtics    90.0  ...  240.0                NaN  12000000.0
6  Jordan Mickey  Boston Celtics    55.0  ...  235.0                LSU   1170960.0
7   Kelly Olynyk  Boston Celtics    41.0  ...  238.0            Gonzaga   2165160.0
8   Terry Rozier  Boston Celtics    12.0  ...  190.0         Louisville   1824360.0
9   Marcus Smart  Boston Celtics    36.0  ...  220.0     Oklahoma State   3431040.0
[10 rows x 9 columns]

tail()
tail( n ) 方法用于读取尾部的 n 行,如果不填参数 n ,默认返回 5 行,空行各个字段的值返回 NaN。
读取末尾 5 行

import pandas as pd
df = pd.read_csv('E:\\Edge浏览器文件\\nba.csv')
print(df.tail())

输出结果为:

             Name       Team  Number Position  ...  Height Weight  College     Salary
453  Shelvin Mack  Utah Jazz     8.0       PG  ...     6-3  203.0   Butler  2433333.0
454     Raul Neto  Utah Jazz    25.0       PG  ...     6-1  179.0      NaN   900000.0
455  Tibor Pleiss  Utah Jazz    21.0        C  ...     7-3  256.0      NaN  2900000.0
456   Jeff Withey  Utah Jazz    24.0        C  ...     7-0  231.0   Kansas   947276.0
457           NaN        NaN     NaN      NaN  ...     NaN    NaN      NaN        NaN
[5 rows x 9 columns]

读取末尾 10 行

import pandas as pd
df = pd.read_csv('E:\\Edge浏览器文件\\nba.csv')
print(df.tail(10))

输出结果为:

              Name       Team  Number  ... Weight   College      Salary
448  Gordon Hayward  Utah Jazz    20.0  ...  226.0    Butler  15409570.0
449     Rodney Hood  Utah Jazz     5.0  ...  206.0      Duke   1348440.0
450      Joe Ingles  Utah Jazz     2.0  ...  226.0       NaN   2050000.0
451   Chris Johnson  Utah Jazz    23.0  ...  206.0    Dayton    981348.0
452      Trey Lyles  Utah Jazz    41.0  ...  234.0  Kentucky   2239800.0
453    Shelvin Mack  Utah Jazz     8.0  ...  203.0    Butler   2433333.0
454       Raul Neto  Utah Jazz    25.0  ...  179.0       NaN    900000.0
455    Tibor Pleiss  Utah Jazz    21.0  ...  256.0       NaN   2900000.0
456     Jeff Withey  Utah Jazz    24.0  ...  231.0    Kansas    947276.0
457             NaN        NaN     NaN  ...    NaN       NaN         NaN
[10 rows x 9 columns]

info()
info() 方法返回表格的一些基本信息:

import pandas as pd
df = pd.read_csv('E:\\Edge浏览器文件\\nba.csv')
print(df.info())

输出结果为:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 458 entries, 0 to 457
Data columns (total 9 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Name      457 non-null    object 
 1   Team      457 non-null    object 
 2   Number    457 non-null    float64
 3   Position  457 non-null    object 
 4   Age       457 non-null    float64
 5   Height    457 non-null    object 
 6   Weight    457 non-null    float64
 7   College   373 non-null    object 
 8   Salary    446 non-null    float64
dtypes: float64(4), object(5)
memory usage: 32.3+ KB
None

non-null 为非空数据,我们可以看到上面的信息中,总共 458 行,College 字段的空值最多。

  • 14
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值