Reading a CSV file:
#!/usr/bin/python
import pandas as pd
df = pd.read_csv('test.csv')
print(df)
out:
              Name            Team  Number  Position   Age  Height  Weight            College     Salary
  0  Avery Bradley  Boston Celtics     0.0        PG  25.0     6-2   180.0              Texas  7730337.0
  1    Jae Crowder  Boston Celtics    99.0        SF  25.0     6-6   235.0          Marquette  6796117.0
  2   John Holland  Boston Celtics    30.0        SG  27.0     6-5   205.0  Boston University        NaN
  3    R.J. Hunter  Boston Celtics    28.0        SG  22.0     6-5   185.0      Georgia State  1148640.0
  4  Jonas Jerebko  Boston Celtics     8.0        PF  29.0    6-10   231.0                NaN  5000000.0
 ..            ...             ...     ...       ...   ...     ...     ...                ...        ...
453   Shelvin Mack       Utah Jazz     8.0        PG  26.0     6-3   203.0             Butler  2433333.0
454      Raul Neto       Utah Jazz    25.0        PG  24.0     6-1   179.0                NaN   900000.0
455   Tibor Pleiss       Utah Jazz    21.0         C  26.0     7-3   256.0                NaN  2900000.0
456    Jeff Withey       Utah Jazz    24.0         C  26.0     7-0   231.0             Kansas   947276.0
457            NaN             NaN     NaN       NaN   NaN     NaN     NaN                NaN        NaN
print(df.to_string())
# to_string() returns the whole DataFrame rendered as a string, so print() shows every row instead of a truncated view
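To make the difference between the truncated display and to_string() concrete, here is a minimal sketch. The 100-row frame and the display.max_rows value of 10 are assumptions chosen for illustration; they are not part of the original notes.

```python
import pandas as pd

# Hypothetical 100-row frame, long enough for pandas to truncate its display
df = pd.DataFrame({"n": range(100)})

# With a small display.max_rows, the default repr elides the middle rows
with pd.option_context("display.max_rows", 10):
    truncated = repr(df)

# to_string() returns a plain str that contains every row
full = df.to_string()

print(type(full).__name__)           # the result is a str, not a DataFrame
print(len(truncated.splitlines()))   # only a handful of lines
print(len(full.splitlines()))        # header line plus one line per row
```

For very large frames the full string can be long, so head() or tail() is often a better fit for a quick look.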
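Parameter: index_col
Selects which column of the file becomes the row index. The default is index_col=None, which creates a 0-based integer index; index_col=False explicitly tells pandas not to use the first column as the index.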
df = pd.read_csv("test.csv", index_col=False)
print(df)
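Output: the same as the default read, with a 0-based integer index, because no column is used as the index.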
df = pd.read_csv("test.csv", index_col=0)
print(df)
Output: the first column (Name) becomes the row index and replaces the default integer index.
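The effect of index_col can be checked without the original file. The sketch below uses a hypothetical three-row sample in place of test.csv, read from an in-memory string, so the column set is an assumption made for brevity.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for test.csv
csv_text = (
    "Name,Team,Age\n"
    "Avery Bradley,Boston Celtics,25\n"
    "Jae Crowder,Boston Celtics,25\n"
    "Shelvin Mack,Utah Jazz,26\n"
)

# Default (index_col=None): pandas generates a 0-based integer index
df_default = pd.read_csv(io.StringIO(csv_text))

# index_col=0: the first column (Name) becomes the row index
df_by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)

# index_col also accepts a column label instead of a position
df_by_label = pd.read_csv(io.StringIO(csv_text), index_col="Name")

print(df_default.index.tolist())
print(df_by_position.index.name)
print(df_by_position.loc["Shelvin Mack", "Team"])
```

Once Name is the index, rows can be looked up by player name with .loc, which is the main reason to set index_col in the first place.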
Parameter: usecols
Selects which columns to read, by position or by name:
df = pd.read_csv("test.csv", usecols=[1,2,3])
print(df)
Output: only the columns at positions 1, 2, and 3 (Team, Number, Position) are read.
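usecols accepts column names as well as positions. The following sketch assumes a two-row sample with the same first four headers as test.csv; the data is abbreviated for illustration.

```python
import io
import pandas as pd

# Hypothetical two-row sample with the first four columns of test.csv
csv_text = (
    "Name,Team,Number,Position\n"
    "Avery Bradley,Boston Celtics,0,PG\n"
    "Jae Crowder,Boston Celtics,99,SF\n"
)

# Select columns by position...
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[1, 2, 3])

# ...or by header name; both give the same frame
by_label = pd.read_csv(io.StringIO(csv_text), usecols=["Team", "Number", "Position"])

# The order inside usecols is ignored: columns come back in file order
reordered = pd.read_csv(io.StringIO(csv_text), usecols=["Position", "Team"])

print(by_position.columns.tolist())
print(reordered.columns.tolist())
```

To get the columns in a specific order, reindex after reading, for example reordered[["Position", "Team"]].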
Parameter: nrows
Limits how many data rows are read (the header line is not counted)
df = pd.read_csv("test.csv", usecols=[1,2,3], nrows=3)
print(df)
Output: the first 3 data rows of the Team, Number, and Position columns.
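A small sketch of how nrows behaves, assuming a four-row sample abbreviated from test.csv. It also shows that asking for more rows than the file contains simply returns every row.

```python
import io
import pandas as pd

# Hypothetical four-row sample abbreviated from test.csv
csv_text = (
    "Name,Team,Number,Position\n"
    "Avery Bradley,Boston Celtics,0,PG\n"
    "Jae Crowder,Boston Celtics,99,SF\n"
    "John Holland,Boston Celtics,30,SG\n"
    "R.J. Hunter,Boston Celtics,28,SG\n"
)

# Read only the first 3 data rows of three selected columns
preview = pd.read_csv(io.StringIO(csv_text), usecols=[1, 2, 3], nrows=3)

# nrows larger than the file is not an error; all rows are returned
everything = pd.read_csv(io.StringIO(csv_text), nrows=100)

print(preview.shape)
print(len(everything))
```

Combining usecols and nrows like this is a cheap way to preview a large file before loading it in full.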
to_csv() writes a DataFrame to a CSV file:
#!/usr/bin/python
import pandas as pd
#df = pd.read_csv('test.csv')
# Three fields: name, site, age
nme = ["Google", "Runoob", "Taobao", "Wiki"]
st = ["www.google.com", "www.runoob.com", "www.taobao.com", "www.wikipedia.org"]
ag = [90, 40, 80, 90]
# Dictionary mapping column names to values (named data to avoid shadowing the built-in dict)
data = {'name': nme, 'site': st, 'age': ag}
df = pd.DataFrame(data)
print(df)
# Save the DataFrame
df.to_csv('site.csv')
out:
     name               site  age
0  Google     www.google.com   90
1  Runoob     www.runoob.com   40
2  Taobao     www.taobao.com   80
3    Wiki  www.wikipedia.org   90
A site.csv file is created in the current working directory.
By default the row index is written too, as an unnamed first column.
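One consequence of writing the index is worth seeing once: reading the file back with default settings produces an extra "Unnamed: 0" column. The sketch below assumes a shortened two-row version of the frame and writes to a temporary directory rather than the working directory.

```python
import os
import tempfile
import pandas as pd

# Shortened, hypothetical version of the frame built above
df = pd.DataFrame({"name": ["Google", "Runoob"], "age": [90, 40]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "site.csv")
    df.to_csv(path)  # index=True by default

    # The raw file starts with an empty header cell for the index column
    with open(path) as f:
        raw = f.read()

    # Reading it back naively turns the saved index into a data column
    reread = pd.read_csv(path)

    # index_col=0 restores the saved index instead
    restored = pd.read_csv(path, index_col=0)

print(raw.splitlines()[0])
print(reread.columns.tolist())
print(restored.columns.tolist())
```

Either read with index_col=0 or write with index=False, depending on whether the index carries meaning.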
Parameter: index
Defaults to index=True, which writes the row index; pass index=False to omit it.
df.to_csv('site.csv', index=False)
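With the index omitted, the file contains:
name,site,age
Google,www.google.com,90
Runoob,www.runoob.com,40
Taobao,www.taobao.com,80
Wiki,www.wikipedia.org,90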
Parameter: columns
Selects which columns to write
df.to_csv('site.csv', index=False, columns=['name','age'])
The file then contains:
name,age
Google,90
Runoob,40
Taobao,80
Wiki,90
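When no path is given, to_csv() returns the CSV text as a string, which is a convenient way to check what index and columns do without touching the disk. This is a minimal sketch on a shortened two-row version of the frame.

```python
import pandas as pd

# Shortened, hypothetical version of the frame built above
df = pd.DataFrame({
    "name": ["Google", "Runoob"],
    "site": ["www.google.com", "www.runoob.com"],
    "age": [90, 40],
})

# No path argument: the CSV content is returned as a str
csv_text = df.to_csv(index=False, columns=["name", "age"])

for line in csv_text.splitlines():
    print(line)
```

The site column is dropped and no index column appears, matching the two parameters described above.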
Reading the first and last five rows:
#!/usr/bin/python
import pandas as pd
df = pd.read_csv('test.csv')
print(df.head(5))
print('\n')
print(df.tail(5))
out:
            Name            Team  Number  Position   Age  Height  Weight            College     Salary
0  Avery Bradley  Boston Celtics     0.0        PG  25.0     6-2   180.0              Texas  7730337.0
1    Jae Crowder  Boston Celtics    99.0        SF  25.0     6-6   235.0          Marquette  6796117.0
2   John Holland  Boston Celtics    30.0        SG  27.0     6-5   205.0  Boston University        NaN
3    R.J. Hunter  Boston Celtics    28.0        SG  22.0     6-5   185.0      Georgia State  1148640.0
4  Jonas Jerebko  Boston Celtics     8.0        PF  29.0    6-10   231.0                NaN  5000000.0


             Name       Team  Number  Position   Age  Height  Weight  College     Salary
453  Shelvin Mack  Utah Jazz     8.0        PG  26.0     6-3   203.0   Butler  2433333.0
454     Raul Neto  Utah Jazz    25.0        PG  24.0     6-1   179.0      NaN   900000.0
455  Tibor Pleiss  Utah Jazz    21.0         C  26.0     7-3   256.0      NaN  2900000.0
456   Jeff Withey  Utah Jazz    24.0         C  26.0     7-0   231.0   Kansas   947276.0
457           NaN        NaN     NaN       NaN   NaN     NaN     NaN      NaN        NaN
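head() and tail() both default to 5 rows, so the explicit 5 above is optional. They also accept negative values, which drop rows from the opposite end. The sketch below uses a hypothetical 10-row frame so the results are easy to verify by eye.

```python
import pandas as pd

# Hypothetical 10-row frame with values 0..9
df = pd.DataFrame({"n": range(10)})

first_default = df.head()        # n defaults to 5: rows 0..4
last_three = df.tail(3)          # rows 7..9
all_but_last_two = df.head(-2)   # negative n: everything except the last 2 rows
all_but_first_two = df.tail(-2)  # negative n: everything except the first 2 rows

print(first_default["n"].tolist())
print(last_three["n"].tolist())
print(all_but_last_two["n"].tolist())
print(all_but_first_two["n"].tolist())
```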
info() prints a summary of the DataFrame (index range, columns, non-null counts, dtypes, memory usage):
#!/usr/bin/python
import pandas as pd
df = pd.read_csv('test.csv')
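# info() prints the summary itself and returns None, so wrapping it in print() also emits the trailing "None"
print(df.info())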
out:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 458 entries, 0 to 457
Data columns (total 9 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Name      457 non-null    object
 1   Team      457 non-null    object
 2   Number    457 non-null    float64
 3   Position  457 non-null    object
 4   Age       457 non-null    float64
 5   Height    457 non-null    object
 6   Weight    457 non-null    float64
 7   College   373 non-null    object
 8   Salary    446 non-null    float64
dtypes: float64(4), object(5)
memory usage: 32.3+ KB
None
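The Non-Null Count column can also be obtained as data rather than printed text. The sketch below assumes a tiny hypothetical frame with a few missing values; count() and isna().sum() give the per-column numbers, and passing a buffer to info() captures the summary as a string.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical three-row frame with missing values
df = pd.DataFrame({
    "Name": ["Avery Bradley", "Jae Crowder", None],
    "Salary": [7730337.0, np.nan, np.nan],
})

non_null = df.count()      # same numbers as info()'s Non-Null Count column
missing = df.isna().sum()  # complementary count of missing values

# info() writes to stdout by default; buf redirects it into a string
buf = io.StringIO()
result = df.info(buf=buf)
summary = buf.getvalue()

print(non_null.to_dict())
print(missing.to_dict())
print(result)  # info() itself returns None
```

Calling df.info() on its own line, without print(), avoids the stray None seen in the output above.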