Working with CSV files in pandas

Reading a CSV file:

#!/usr/bin/python
import pandas as pd

df = pd.read_csv('test.csv')

print(df)

out:
              Name            Team  Number Position   Age Height  Weight            College     Salary
0    Avery Bradley  Boston Celtics     0.0       PG  25.0    6-2   180.0              Texas  7730337.0
1      Jae Crowder  Boston Celtics    99.0       SF  25.0    6-6   235.0          Marquette  6796117.0
2     John Holland  Boston Celtics    30.0       SG  27.0    6-5   205.0  Boston University        NaN
3      R.J. Hunter  Boston Celtics    28.0       SG  22.0    6-5   185.0      Georgia State  1148640.0
4    Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0                NaN  5000000.0
..             ...             ...     ...      ...   ...    ...     ...                ...        ...
453   Shelvin Mack       Utah Jazz     8.0       PG  26.0    6-3   203.0             Butler  2433333.0
454      Raul Neto       Utah Jazz    25.0       PG  24.0    6-1   179.0                NaN   900000.0
455   Tibor Pleiss       Utah Jazz    21.0        C  26.0    7-3   256.0                NaN  2900000.0
456    Jeff Withey       Utah Jazz    24.0        C  26.0    7-0   231.0             Kansas   947276.0
457            NaN             NaN     NaN      NaN   NaN    NaN     NaN                NaN        NaN
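
print(df) abbreviates a long DataFrame, as the row of dots above shows. to_string() returns the entire DataFrame rendered as one string, so printing it displays every row:

print(df.to_string())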
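The difference between the abbreviated display and to_string() can be checked without test.csv. The sketch below is only an illustration: the 100-row frame and its column names are made up, and it assumes pandas' default display.max_rows of 60, which is pinned explicitly so the result does not depend on local settings.

```python
import pandas as pd

# Hypothetical stand-in for test.csv: 100 rows, more than display.max_rows.
df = pd.DataFrame({
    "Number": range(100),
    "Weight": [180.0 + i for i in range(100)],
})

# str(df) is the text that print(df) shows. With max_rows=60 a 100-row
# frame is abbreviated to its first and last few rows.
with pd.option_context("display.max_rows", 60):
    truncated = str(df)

# to_string() ignores that limit and renders every row.
full = df.to_string()

print(len(truncated.splitlines()))  # far fewer than 101 lines
print(len(full.splitlines()))       # header line plus 100 data rows
```

For very large files the full rendering can be slow and hard to read, so it is usually worth limiting the rows first.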

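Parameter: index_col

Selects which column to use as the index (the row labels). The default is index_col=None, in which case pandas normally generates an integer index. index_col=False explicitly tells pandas not to use the first column as the index, which mainly matters for malformed files that end every line with a delimiter.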

df = pd.read_csv("test.csv", index_col=False)
print(df)

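Output: for a well-formed file this is identical to the default read shown earlier, with rows labeled 0 to 457 and Name kept as an ordinary column.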

df = pd.read_csv("test.csv", index_col=0)
print(df)

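Output: the first column (Name) becomes the index, so rows are labeled by player name and the integer labels 0 to 457 no longer appear.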
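To see the effect of index_col without the original file, here is a small sketch. The three-row CSV text is a made-up excerpt in the same layout as test.csv, read through io.StringIO instead of from disk.

```python
import io
import pandas as pd

# Hypothetical three-row excerpt in the same layout as test.csv.
csv_text = """Name,Team,Number
Avery Bradley,Boston Celtics,0
Jae Crowder,Boston Celtics,99
Shelvin Mack,Utah Jazz,8
"""

# Default: pandas generates an integer index and keeps Name as a column.
default_df = pd.read_csv(io.StringIO(csv_text))

# index_col accepts a column position or a column name.
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)
by_name = pd.read_csv(io.StringIO(csv_text), index_col="Name")

print(default_df.index.tolist())                 # [0, 1, 2]
print(by_position.index.tolist())                # the three player names
print(by_position.loc["Jae Crowder", "Number"])  # 99
```

With Name as the index, rows can be looked up by label with .loc, as the last line does.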

Parameter: usecols

Specifies which columns to read, either by zero-based position or by name:

df = pd.read_csv("test.csv", usecols=[1,2,3])
print(df)

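Output: only the columns at positions 1, 2 and 3 (counting from zero) are loaded, namely Team, Number and Position.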
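usecols also accepts column names, which tends to be more robust than positions if the file layout changes. The following sketch uses a made-up two-row excerpt rather than test.csv itself.

```python
import io
import pandas as pd

# Hypothetical excerpt with the first five columns of test.csv.
csv_text = """Name,Team,Number,Position,Age
Avery Bradley,Boston Celtics,0,PG,25
Jae Crowder,Boston Celtics,99,SF,25
"""

by_position = pd.read_csv(io.StringIO(csv_text), usecols=[1, 2, 3])
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["Team", "Number", "Position"])

# The order of the entries in usecols is ignored; columns come back
# in the order they have in the file.
reordered = pd.read_csv(io.StringIO(csv_text), usecols=[3, 1])

print(by_position.columns.tolist())  # ['Team', 'Number', 'Position']
print(by_position.equals(by_name))   # True
print(reordered.columns.tolist())    # ['Team', 'Position']
```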

Parameter: nrows

Specifies how many data rows to read, counting from the top of the file

df = pd.read_csv("test.csv", usecols=[1,2,3], nrows=3)
print(df)

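Output: the first three data rows of the Team, Number and Position columns; the header line does not count toward nrows.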
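nrows is handy for peeking at a large file without loading all of it. A minimal sketch, again using made-up in-memory CSV text in place of test.csv:

```python
import io
import pandas as pd

# Hypothetical four-row excerpt in the same layout as test.csv.
csv_text = """Name,Team,Number
Avery Bradley,Boston Celtics,0
Jae Crowder,Boston Celtics,99
John Holland,Boston Celtics,30
R.J. Hunter,Boston Celtics,28
"""

# Only the first two data rows are parsed; the header is not counted.
first_two = pd.read_csv(io.StringIO(csv_text), nrows=2)

print(len(first_two))              # 2
print(first_two["Name"].tolist())  # ['Avery Bradley', 'Jae Crowder']
```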


to_csv() writes a DataFrame to a CSV file:

#!/usr/bin/python
import pandas as pd

# Three fields: name, site, age
names = ["Google", "Runoob", "Taobao", "Wiki"]
sites = ["www.google.com", "www.runoob.com", "www.taobao.com", "www.wikipedia.org"]
ages = [90, 40, 80, 90]

# Dictionary mapping column names to values
# (called data so that the built-in dict is not shadowed)
data = {'name': names, 'site': sites, 'age': ages}

df = pd.DataFrame(data)
print(df)
# Save the DataFrame; the row index is written as well
df.to_csv('site.csv')

out:
     name               site  age
0  Google     www.google.com   90
1  Runoob     www.runoob.com   40
2  Taobao     www.taobao.com   80
3    Wiki  www.wikipedia.org   90

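This creates a file named site.csv in the current working directory.

By default the row index is written too, as an unnamed first column.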

Parameter: index

Controls whether the row index is written to the file. The default is index=True.

df.to_csv('site.csv', index=False)

With the index omitted, the file contains only the header line name,site,age followed by the four data rows.
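A common reason to pass index=False is that a written index comes back as an extra column when the file is read again. The sketch below illustrates this with a two-row frame; it relies on to_csv returning the CSV text as a string when no path is given, so nothing is written to disk.

```python
import io
import pandas as pd

df = pd.DataFrame({"name": ["Google", "Runoob"], "age": [90, 40]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])     # ,name,age  (empty header for the index)
print(without_index.splitlines()[0])  # name,age

# Reading the indexed version back turns the index into a regular column.
reread = pd.read_csv(io.StringIO(with_index))
print(reread.columns.tolist())        # ['Unnamed: 0', 'name', 'age']
```

If the index does carry meaning, the alternative is to keep it in the file and restore it on reading with index_col=0.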

Parameter: columns

Specifies which columns to write

df.to_csv('site.csv', index=False, columns=['name','age'])

The file then contains only the name and age columns; site is left out.
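The effect of columns can be verified by reading the written text back in. This is a minimal sketch with a shortened two-row version of the data, again using the string returned by to_csv rather than a file on disk.

```python
import io
import pandas as pd

df = pd.DataFrame({
    "name": ["Google", "Runoob"],
    "site": ["www.google.com", "www.runoob.com"],
    "age": [90, 40],
})

# Write only name and age, without the index, and capture the CSV text.
csv_text = df.to_csv(index=False, columns=["name", "age"])
print(csv_text)

# Reading it back confirms that site was not written.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored.columns.tolist())  # ['name', 'age']
```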


Reading the first and last five rows:

#!/usr/bin/python
import pandas as pd

df = pd.read_csv('test.csv')

print(df.head(5))
print('\n')
print(df.tail(5))

out:
            Name            Team  Number Position   Age Height  Weight            College     Salary
0  Avery Bradley  Boston Celtics     0.0       PG  25.0    6-2   180.0              Texas  7730337.0
1    Jae Crowder  Boston Celtics    99.0       SF  25.0    6-6   235.0          Marquette  6796117.0
2   John Holland  Boston Celtics    30.0       SG  27.0    6-5   205.0  Boston University        NaN
3    R.J. Hunter  Boston Celtics    28.0       SG  22.0    6-5   185.0      Georgia State  1148640.0
4  Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0                NaN  5000000.0


             Name       Team  Number Position   Age Height  Weight College     Salary
453  Shelvin Mack  Utah Jazz     8.0       PG  26.0    6-3   203.0  Butler  2433333.0
454     Raul Neto  Utah Jazz    25.0       PG  24.0    6-1   179.0     NaN   900000.0
455  Tibor Pleiss  Utah Jazz    21.0        C  26.0    7-3   256.0     NaN  2900000.0
456   Jeff Withey  Utah Jazz    24.0        C  26.0    7-0   231.0  Kansas   947276.0
457           NaN        NaN     NaN      NaN   NaN    NaN     NaN     NaN        NaN
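Both methods default to five rows, so the argument can be dropped, and both accept a negative count. The ten-row frame below is made up purely to keep the results easy to check.

```python
import pandas as pd

# Hypothetical ten-row frame; Number simply counts 0..9.
df = pd.DataFrame({"Number": range(10)})

print(len(df.head()))                  # 5, the default
print(df.tail(3)["Number"].tolist())   # [7, 8, 9]

# A negative n means "everything except": head(-2) drops the last two rows.
print(df.head(-2)["Number"].tolist())  # [0, 1, 2, 3, 4, 5, 6, 7]
```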

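info() prints a summary of the DataFrame: the index range, each column's non-null count and dtype, and the memory usage. It writes directly to standard output and returns None, so the print() call in the example below adds the trailing None line to the output: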

#!/usr/bin/python
import pandas as pd

df = pd.read_csv('test.csv')

print(df.info())

out:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 458 entries, 0 to 457
Data columns (total 9 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Name      457 non-null    object
 1   Team      457 non-null    object
 2   Number    457 non-null    float64
 3   Position  457 non-null    object
 4   Age       457 non-null    float64
 5   Height    457 non-null    object
 6   Weight    457 non-null    float64
 7   College   373 non-null    object
 8   Salary    446 non-null    float64
dtypes: float64(4), object(5)
memory usage: 32.3+ KB
None
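The non-null counts above are the quickest way to spot missing values: College has 373 entries out of 458 rows, for example. The sketch below uses a made-up three-row frame to show that info() returns None, that its text can be captured through the buf parameter, and that isna().sum() gives the complementary count of missing values per column.

```python
import io
import pandas as pd

# Hypothetical three-row frame with some missing values.
df = pd.DataFrame({
    "Name": ["Avery Bradley", "Raul Neto", None],
    "College": ["Texas", None, None],
    "Salary": [7730337.0, 900000.0, None],
})

# info() writes its summary to buf (stdout by default) and returns None.
buffer = io.StringIO()
result = df.info(buf=buffer)
summary = buffer.getvalue()

print(result)               # None
print(summary)

# Missing values per column, the complement of the non-null counts.
missing = df.isna().sum()
print(missing.to_dict())    # {'Name': 1, 'College': 2, 'Salary': 1}
```

Calling df.info() on its own line, without print(), gives the same summary without the extra None.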

