Importing the data
UK rainfall data: http://data.defra.gov.uk/statistics_2015/env/water/uk_rain_2014.csv
import pandas as pd
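# Import the data
# The first row of uk_rain_2014.csv holds the labels, so it can serve as the column index
df = pd.read_csv('C:\\Users\\cc\\Desktop\\Pandas\\uk_rain_2014.csv', header=0)  # header=0: use the first row as the column index (the default, header='infer', does the same when names is not given)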
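The original path only exists on the author's machine, and the data.defra.gov.uk link may not be reachable. The sketch below therefore passes read_csv an in-memory copy of the first three data rows shown in the printed output, wrapped in io.StringIO. It assumes the file's header line matches the column names pandas printed, and it shows that header=0 and the default header give the same frame.

```python
import io

import pandas as pd

# First three rows of uk_rain_2014.csv, copied from the printed DataFrame
# (assumption: the raw header line matches the printed column names)
csv_text = """Water Year,Rain (mm) Oct-Sep,Outflow (m3/s) Oct-Sep,Rain (mm) Dec-Feb,Outflow (m3/s) Dec-Feb,Rain (mm) Jun-Aug,Outflow (m3/s) Jun-Aug
1980/81,1182,5408,292,7248,174,2212
1981/82,1098,5112,257,7316,242,1936
1982/83,1156,5701,330,8567,124,1802
"""

# Explicit header=0: the first line becomes the column index
df_explicit = pd.read_csv(io.StringIO(csv_text), header=0)
# Default header ("infer"): same result because names is not passed
df_default = pd.read_csv(io.StringIO(csv_text))

print(df_explicit.columns.tolist())
print(df_explicit.equals(df_default))  # True
```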
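pandas.read_csv() reads the contents of a CSV file into a DataFrame.
Note: the fields in a CSV file are separated by commas by default. Other delimiters can be set with sep.
Parameters:
filepath_or_buffer: a string path, or any object with a read() method. The string may be a URL, and valid URL schemes include http, ftp, s3, and file. A plain file name such as "filename.csv" also works.
sep: the delimiter. It defaults to ',', the CSV separator.
header: the row to use as column names (the column index). By default the first row is used: header='infer' behaves like header=0 when names is not passed.
header=None means the first row does not contain column names. pandas then assigns integer column labels 0, 1, 2, 3, 4…
You can supply your own column names with the next parameter.
names: when the CSV file has no header row, use names to provide the column names.
index_col: the column(s) to use as row labels (the index). It accepts an int, a sequence, or False. The default is None, which adds an integer index starting from 0.
To use the first column as the row index, write index_col=0.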
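The sketch below runs sep, header=None, names and index_col against a hypothetical headerless, semicolon-separated sample. Its values are taken from the rainfall table, and the semicolon delimiter is chosen only to show sep.

```python
import io

import pandas as pd

# Hypothetical headerless file using ';' as the delimiter
raw = """1980/81;1182;5408
1981/82;1098;5112
1982/83;1156;5701
"""
labels = ["Water Year", "Rain (mm) Oct-Sep", "Outflow (m3/s) Oct-Sep"]

# header=None: pandas invents integer column labels 0, 1, 2
no_header = pd.read_csv(io.StringIO(raw), sep=";", header=None)
print(no_header.columns.tolist())  # [0, 1, 2]

# names=...: supply our own column labels for a headerless file
named = pd.read_csv(io.StringIO(raw), sep=";", header=None, names=labels)

# index_col=0: use the first column (Water Year) as the row index
indexed = pd.read_csv(
    io.StringIO(raw), sep=";", header=None, names=labels, index_col=0
)
print(indexed.loc["1981/82", "Rain (mm) Oct-Sep"])  # 1098
```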
print(df)
Water Year Rain (mm) Oct-Sep Outflow (m3/s) Oct-Sep Rain (mm) Dec-Feb \
0 1980/81 1182 5408 292
1 1981/82 1098 5112 257
2 1982/83 1156 5701 330
3 1983/84 993 4265 391
4 1984/85 1182 5364 217
5 1985/86 1027 4991 304
6 1986/87 1151 5196 295
7 1987/88 1210 5572 343
8 1988/89 976 4330 309
9 1989/90 1130 4973 470
10 1990/91 1022 4418 305
11 1991/92 1151 4506 246
12 1992/93 1130 5246 308
13 1993/94 1162 5583 422
14 1994/95 1110 5370 484
15 1995/96 856 3479 245
16 1996/97 1047 4019 258
17 1997/98 1169 4953 341
18 1998/99 1268 5824 360
19 1999/00 1204 5665 417
20 2000/01 1239 6092 328
21 2001/02 1185 5402 380
22 2002/03 1021 4366 272
23 2003/04 1165 4275 348
24 2004/05 1095 4547 309
25 2005/06 1046 4059 206
26 2006/07 1387 6391 437
27 2007/08 1225 5497 386
28 2008/09 1139 4941 268
29 2009/10 1103 4738 255
30 2010/11 1053 4521 265
31 2011/12 1285 5500 339
32 2012/13 1090 5329 350
Outflow (m3/s) Dec-Feb Rain (mm) Jun-Aug Outflow (m3/s) Jun-Aug
0 7248 174 2212
1 7316 242 1936
2 8567 124 1802
3 8905 141 1078
4 5813 343 4313
5 7951 229 2595
6 7593 267 2826
7 8456 294 3154
8 6465 200 1440
9 10520 209 1740
10 7120 216 1923
11 5493 280 2118
12 8751 219 2551
13 10109 193 1638
14 11486 103 1231
15 5515 172 1439
16 5770 256 2102
17 7747 285 3206
18 8771 225 2240
19 10021 197 2166
20 9347 236 2142
21 8891 259 3187
22 7093 176 1478
23 7493 315 2959
24 7183 217 1799
25 4578 188 1474
26 10926 357 5168
27 9485 320 3505
28 6690 323 3189
29 6435 244 1958
30 6593 267 2885
31 7630 379 5261
32 9615 187 1797
df.index
RangeIndex(start=0, stop=33, step=1)
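The RangeIndex above shows that no index_col was given, so pandas numbered the 33 rows from 0. The small sketch below uses a three-row stand-in frame, not the full file. It shows that stop equals the row count and that promoting a column to the index (the in-memory counterpart of index_col) replaces the RangeIndex with labels.

```python
import pandas as pd

# Stand-in frame with three of the rainfall rows (not the full 33-row file)
df = pd.DataFrame(
    {
        "Water Year": ["1980/81", "1981/82", "1982/83"],
        "Rain (mm) Oct-Sep": [1182, 1098, 1156],
    }
)

# Default integer index: start=0, stop=number of rows, step=1
print(df.index)
print(len(df.index))  # 3

# Using a column as the index replaces the RangeIndex with its values
by_year = df.set_index("Water Year")
print(by_year.index.tolist())  # ['1980/81', '1981/82', '1982/83']
```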
df.columns
Index([u'Water Year', u'Rain (mm) Oct-Sep', u'Outflow (m3/s) Oct-Sep',
u'Rain (mm) Dec-Feb', u'Outflow (m3/s) Dec-Feb', u'Rain (mm) Jun-Aug',
u'Outflow (m3/s) Jun-Aug'],
dtype='object')
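The u'...' prefixes in the output above come from Python 2. Under Python 3 the same labels print as plain strings. The sketch below uses a two-row stand-in frame and shows that df.columns is an Index of labels. Any of those labels can be used to select a column as a Series.

```python
import pandas as pd

# Two-row stand-in frame using three of the rainfall column names
df = pd.DataFrame(
    {
        "Water Year": ["1980/81", "1981/82"],
        "Rain (mm) Oct-Sep": [1182, 1098],
        "Outflow (m3/s) Oct-Sep": [5408, 5112],
    }
)

# df.columns is an Index; tolist() converts it to a plain Python list
labels = df.columns.tolist()
print(labels)

# A label taken from df.columns selects that column as a Series
rain = df[df.columns[1]]
print(rain.name)  # Rain (mm) Oct-Sep
print(rain.tolist())  # [1182, 1098]
```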
df.values
array([['1980/81', 1182L, 5408L, 292L, 7248L, 174L, 2212L],
['1981/82', 1098L, 5112L, 257L, 7316L, 242L, 1936L],
['1982/83', 1156L, 5701L, 330L, 8567L, 124L, 1802L],
['1983/84', 993L, 4265L, 391L, 8905L, 141L, 1078L],
['1984/85', 1182L, 5364L, 217L, 5813L, 343L, 4313L],
['1985/86', 1027L, 4991L, 304L, 7951L, 229L, 2595L],
['1986/87', 1151L, 5196L, 295L, 7593L, 267L, 2826L],
['1987/88', 1210L, 5572L, 343L, 8456L, 294L, 3154L],
['1988/89', 976L, 4330L, 309L, 6465L, 200L, 1440L],
['1989/90', 1130L, 4973L, 470L, 10520L, 209L, 1740L],
['1990/91', 1022L, 4418L, 305L, 7120L, 216L, 1923L],
['1991/92', 1151L, 4506L, 246L, 5493L, 280L, 2118L],
['1992/93', 1130L, 5246L, 308L, 8751L, 219L, 2551L],
['1993/94', 1162L, 5583L, 422L, 10109L, 193L, 1638L],
['1994/95', 1110L, 5370L, 484L, 11486L, 103L, 1231L],
['1995/96', 856L, 3479L, 245L, 5515L, 172L, 1439L],
['1996/97', 1047L, 4019L, 258L, 5770L, 256L, 2102L],
['1997/98', 1169L, 4953L, 341L, 7747L, 285L, 3206L],
['1998/99', 1268L, 5824L, 360L, 8771L, 225L, 2240L],
['1999/00', 1204L, 5665L, 417L, 10021L, 197L, 2166L],
['2000/01', 1239L, 6092L, 328L, 9347L, 236L, 2142L],
['2001/02', 1185L, 5402L, 380L, 8891L, 259L, 3187L],
['2002/03', 1021L, 4366L, 272L, 7093L, 176L, 1478L],
['2003/04', 1165L, 4275L, 348L, 7493L, 315L, 2959L],
['2004/05', 1095L, 4547L, 309L, 7183L, 217L, 1799L],
['2005/06', 1046L, 4059L, 206L, 4578L, 188L, 1474L],
['2006/07', 1387L, 6391L, 437L, 10926L, 357L, 5168L],
['2007/08', 1225L, 5497L, 386L, 9485L, 320L, 3505L],
['2008/09', 1139L, 4941L, 268L, 6690L, 323L, 3189L],
['2009/10', 1103L, 4738L, 255L, 6435L, 244L, 1958L],
['2010/11', 1053L, 4521L, 265L, 6593L, 267L, 2885L],
['2011/12', 1285L, 5500L, 339L, 7630L, 379L, 5261L],
['2012/13', 1090L, 5329L, 350L, 9615L, 187L, 1797L]], dtype=object)
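df.values returns a NumPy array. The Water Year column holds text, so the whole array falls back to dtype=object, as the output above shows. The trailing L on each number is Python 2's long-integer repr. The sketch below uses a two-row stand-in frame. It compares df.values with DataFrame.to_numpy(), which the pandas documentation recommends over .values, and shows that keeping only the numeric columns gives an integer array.

```python
import pandas as pd

# Two-row stand-in frame: one text column plus two integer columns
df = pd.DataFrame(
    {
        "Water Year": ["1980/81", "1981/82"],
        "Rain (mm) Oct-Sep": [1182, 1098],
        "Outflow (m3/s) Oct-Sep": [5408, 5112],
    }
)

# Mixed text and integer columns -> a single object-dtype array
mixed = df.values
print(mixed.dtype)  # object

# to_numpy() is the recommended spelling of the same conversion
same = df.to_numpy()

# Restricting to numeric columns keeps an integer dtype
numeric = df.select_dtypes("number").to_numpy()
print(numeric.dtype.kind)  # i (signed integer)
print(numeric.shape)  # (2, 2)
```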
# Get some basic summary statistics
df.describe()
| | Rain (mm) Oct-Sep | Outflow (m3/s) Oct-Sep | Rain (mm) Dec-Feb | Outflow (m3/s) Dec-Feb | Rain (mm) Jun-Aug | Outflow (m3/s) Jun-Aug |
|---|---|---|---|---|---|---|
| count | 33.000000 | 33.000000 | 33.000000 | 33.000000 | 33.000000 | 33.000000 |
| mean | 1129.000000 | 5019.181818 | 325.363636 | 7926.545455 | 237.484848 | 2439.757576 |
| std | 101.900074 | 658.587762 | 69.995008 | 1692.800049 | 66.167931 | 1025.914106 |
| min | 856.000000 | 3479.000000 | 206.000000 | 4578.000000 | 103.000000 | 1078.000000 |
| 25% | 1053.000000 | 4506.000000 | 268.000000 | 6690.000000 | 193.000000 | 1797.000000 |
| 50% | 1139.000000 | 5112.000000 | 309.000000 | 7630.000000 | 229.000000 | 2142.000000 |
| 75% | 1182.000000 | 5497.000000 | 360.000000 | 8905.000000 | 280.000000 | 2959.000000 |
| max | 1387.000000 | 6391.000000 | 484.000000 | 11486.000000 | 379.000000 | 5261.000000 |
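To see where one column of this table comes from, the sketch below copies all 33 "Rain (mm) Oct-Sep" values from the printed DataFrame into a Series and calls describe() on it. mean is sum divided by count, std is the sample standard deviation, and the 25%/50%/75% rows are quantiles. These values should reproduce the first column above.

```python
import pandas as pd

# "Rain (mm) Oct-Sep" for all 33 water years, copied from print(df)
rain = pd.Series(
    [1182, 1098, 1156, 993, 1182, 1027, 1151, 1210, 976, 1130, 1022,
     1151, 1130, 1162, 1110, 856, 1047, 1169, 1268, 1204, 1239, 1185,
     1021, 1165, 1095, 1046, 1387, 1225, 1139, 1103, 1053, 1285, 1090],
    name="Rain (mm) Oct-Sep",
)

stats = rain.describe()
print(stats)

# mean is simply the column total divided by the number of values
print(rain.sum() / rain.count())  # 1129.0
```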
df.count()  # count the non-null values in each column
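The describe() output reports count 33 for every numeric column, so the rainfall file has no gaps there. To see count() actually differ from the row count, the sketch below uses a hypothetical frame with missing entries. count() skips None/NaN, while len(df) counts every row.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column
df = pd.DataFrame(
    {
        "Water Year": ["1980/81", "1981/82", None],
        "Rain (mm) Oct-Sep": [1182, np.nan, 1156],
    }
)

counts = df.count()  # non-null values per column
print(counts)
print(len(df))  # 3 rows in total, including the ones with gaps
```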
df