Working with CSV data in pandas

See also: In machine learning, how to use pandas in Python to extract data and split it into training and test sets

import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('5678'))
df
    A   B   C   D
5   0   1   2   3
6   4   5   6   7
7   8   9  10  11
8  12  13  14  15

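This post covers five ways to access the data in a pandas DataFrame (for example, one loaded from a CSV file):

1) loc: row labels and column labels
2) iloc: row positions and column positions
3) ix: a mix of labels and positions
4) df[]
5) df.x: select a single column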
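The examples in this post build the DataFrame from a NumPy array rather than from a file. As a minimal sketch of how the same table could come from CSV data, the snippet below reads an in-memory string with pd.read_csv. The CSV content, the "id" column name, and the variable names csv_text and df_csv are assumptions made for illustration; they are not part of the original post.

```python
import io

import pandas as pd

# Hypothetical CSV content; the first column holds the row labels 5..8.
csv_text = """id,A,B,C,D
5,0,1,2,3
6,4,5,6,7
7,8,9,10,11
8,12,13,14,15
"""

# index_col=0 uses the first CSV column as the row index.
df_csv = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Cast the labels to strings so they match the '5'..'8' labels used in the post.
df_csv.index = df_csv.index.astype(str)

by_label = df_csv.loc['6', 'B']    # label-based lookup
by_position = df_csv.iloc[1, 1]    # position-based lookup of the same cell
print(by_label, by_position)
```

With a real file, the io.StringIO(...) argument would be replaced by the file path.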

1. loc: row labels and column labels

1.1 Rows

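df.loc['6']           # the row labeled '6', i.e. the 2nd row: 4 5 6 7
A    4
B    5
C    6
D    7
Name: 6, dtype: int64
df.loc['5':'7']       # rows labeled '5' through '7', i.e. the first 3 rows; note that both endpoints are included
   A  B   C   D
5  0  1   2   3
6  4  5   6   7
7  8  9  10  11
df.loc[['5', '7']]    # rows labeled '5' and '7', i.e. the 1st and 3rd rows
   A  B   C   D
5  0  1   2   3
7  8  9  10  11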

1.2 Columns

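df.loc[:, 'B']        # the column labeled 'B', i.e. the 2nd column: 1 5 9 13
5     1
6     5
7     9
8    13
Name: B, dtype: int64
df.loc[:, 'A':'C']    # columns labeled 'A' through 'C', i.e. the first 3 columns; note that both endpoints are included
    A   B   C
5   0   1   2
6   4   5   6
7   8   9  10
8  12  13  14
df.loc[:, ['A', 'C']] # columns labeled 'A' and 'C', i.e. the 1st and 3rd columns
    A   C
5   0   2
6   4   6
7   8  10
8  12  14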

1.3 Blocks

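df.loc['5':'7', 'A':'C']       # rows '5' through '7' and columns 'A' through 'C', i.e. the first 3 rows and first 3 columns
   A  B   C
5  0  1   2
6  4  5   6
7  8  9  10
df.loc[['5', '7'], ['A', 'C']] # rows '5' and '7', columns 'A' and 'C'
   A   C
5  0   2
7  8  10
df.loc['5':'7', ['A', 'C']]    # rows '5' through '7', columns 'A' and 'C'
   A   C
5  0   2
6  4   6
7  8  10
df.loc[['5', '7'], 'A':'C']    # rows '5' and '7', columns 'A' through 'C'
   A  B   C
5  0  1   2
7  8  9  10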

1.4 Single cells

df.loc['5', 'A']   # the cell at row label '5', column label 'A'
0
df.at['5', 'A']    # same as loc, but somewhat faster for a single value
0

2. iloc: row positions and column positions

2.1 Rows

df.iloc[1]          # the 2nd row, i.e. the row labeled '6': 4 5 6 7
A    4
B    5
C    6
D    7
Name: 6, dtype: int64
df.iloc[0:3]        # the first 3 rows
   A  B   C   D
5  0  1   2   3
6  4  5   6   7
7  8  9  10  11
df.iloc[[0, 3]]     # the 1st and 4th rows
    A   B   C   D
5   0   1   2   3
8  12  13  14  15
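The row examples in 1.1 and 2.1 differ in how slice endpoints are treated: a loc slice includes its end label, while an iloc slice stops before its end position, like ordinary Python slicing. A small sketch using the same DataFrame as the post (the variable names label_slice and position_slice are only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('5678'))

# Label-based slice: the end label '7' is part of the result.
label_slice = df.loc['5':'7']

# Position-based slice: position 2 is excluded, so only positions 0 and 1 are returned.
position_slice = df.iloc[0:2]

print(list(label_slice.index))
print(list(position_slice.index))
```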

2.2 Columns

df.iloc[:, 1]       # the 2nd column, i.e. the column labeled 'B': 1 5 9 13
5     1
6     5
7     9
8    13
Name: B, dtype: int64
df.iloc[:, 0:3]     # the first 3 columns
    A   B   C
5   0   1   2
6   4   5   6
7   8   9  10
8  12  13  14
df.iloc[:, [0, 3]]  # the 1st and 4th columns
    A   D
5   0   3
6   4   7
7   8  11
8  12  15

2.3 Blocks

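df.iloc[0:3, 0:3]          # the first 3 rows and first 3 columns
   A  B   C
5  0  1   2
6  4  5   6
7  8  9  10
df.iloc[[0, 3],  [0, 3]]   # the 1st and 4th rows, 1st and 4th columns
    A   D
5   0   3
8  12  15
df.iloc[0:3, [0, 3]]       # the first 3 rows, 1st and 4th columns
   A   D
5  0   3
6  4   7
7  8  11
df.iloc[[0, 3], 0:3]       # the 1st and 4th rows, first 3 columns
    A   B   C
5   0   1   2
8  12  13  14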

2.4 Single cells

df.iloc[1, 1]   # the cell in the 2nd row and 2nd column (positions are 0-based)
5
df.iat[1, 1]    # same as iloc, but somewhat faster for a single value
5
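The four single-cell accessors from sections 1.4 and 2.4 can be compared side by side. The sketch below reads the same cell (row label '6', column 'B', which sits at position 1, 1) through each of them; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('5678'))

value_loc = df.loc['6', 'B']    # label-based, general-purpose indexer
value_at = df.at['6', 'B']      # label-based, scalar access only
value_iloc = df.iloc[1, 1]      # position-based, general-purpose indexer
value_iat = df.iat[1, 1]        # position-based, scalar access only

print(value_loc, value_at, value_iloc, value_iat)
```

at and iat accept only a single row and a single column, so slices and lists still need loc or iloc.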

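3. df.ix: a mix of row/column labels and row/column positions

Note: .ix was deprecated in pandas 0.20.0 and removed in pandas 1.0, so the examples in this section only run on older versions.

3.1 Rows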

df.ix['7']    # a single row (same for the examples that follow)
/home/xiaohaipeng/.local/lib/python3.5/site-packages/ipykernel_launcher.py:1: DeprecationWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated
  """Entry point for launching an IPython kernel.





A     8
B     9
C    10
D    11
Name: 7, dtype: int64
print(df.ix[2])
df.ix[2].values 
A     8
B     9
C    10
D    11
Name: 7, dtype: int64


array([ 8,  9, 10, 11])
df.ix[1: 3]   # multiple rows (same for the examples that follow)
   A  B   C   D
6  4  5   6   7
7  8  9  10  11
df.ix[[1, 3]]
    A   B   C   D
6   4   5   6   7
8  12  13  14  15
df.ix['5':'7']
   A  B   C   D
5  0  1   2   3
6  4  5   6   7
7  8  9  10  11
df.ix[['5','7']]
   A  B   C   D
5  0  1   2   3
7  8  9  10  11
# If the row/column labels are of int type, ix selects only by label, not by position:
df.index = range(1, 5)
df.ix[1]
A    0
B    1
C    2
D    3
Name: 1, dtype: int64
df.index = range(11, 15) 
df.ix[2]   # KeyError: 2 
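As the deprecation warning says, .loc and .iloc are the recommended replacements for .ix. The sketch below shows one possible way to rewrite the row examples above, including the mixed case of selecting a row by position and a column by label. The variable names are illustrative, and the mixed-access patterns (df.index[...] and df.columns.get_loc) are one common approach rather than the only one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('5678'))

row_by_label = df.loc['7']         # replaces df.ix['7']
row_by_position = df.iloc[2]       # replaces df.ix[2]
rows_by_position = df.iloc[1:3]    # replaces df.ix[1:3]

# Mixed access: row at position 2, column labeled 'C'.
mixed_via_loc = df.loc[df.index[2], 'C']                  # translate the position into a label
mixed_via_iloc = df.iloc[2, df.columns.get_loc('C')]      # translate the label into a position

# With an integer index the two indexers stay unambiguous.
df.index = range(11, 15)
int_label_row = df.loc[12]         # the row labeled 12
int_position_row = df.iloc[1]      # the row at position 1, which is the same row

print(mixed_via_loc, mixed_via_iloc)
```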

3.2 Columns

df.ix[:, 'C']   # a single column (same for the examples that follow)
11     2
12     6
13    10
14    14
Name: C, dtype: int64
df.ix[:, 2] 
11     2
12     6
13    10
14    14
Name: C, dtype: int64
df.ix[:, 1: 3]   # multiple columns (same for the examples that follow)
     B   C
11   1   2
12   5   6
13   9  10
14  13  14
df.ix[:, [1, 3]]
     B   D
11   1   3
12   5   7
13   9  11
14  13  15
df.ix[:, 'A':'C']
     A   B   C
11   0   1   2
12   4   5   6
13   8   9  10
14  12  13  14
df.ix[:, ['A','C']]
     A   C
11   0   2
12   4   6
13   8  10
14  12  14

4. df[]

4.1 Rows
Accepts row labels or row positions, but they must be given as a slice.

df[0:1]
    A  B  C  D
11  0  1  2  3
df[0:3]      # the first 3 rows
    A  B   C   D
11  0  1   2   3
12  4  5   6   7
13  8  9  10  11
df[1:3]      # the 2nd and 3rd rows
    A  B   C   D
12  4  5   6   7
13  8  9  10  11

When slicing with integers, if the row index is of int type, the slice is applied by position, not by label. This differs from df.ix. For example:

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=range(5, 9))
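df[5:8]   # Empty DataFrame: positions 5 to 7 do not exist
Empty DataFrame
Columns: [A, B, C, D]
Index: []
df[0:3]   # the 1st through 3rd rows
   A  B   C   D
5  0  1   2   3
6  4  5   6   7
7  8  9  10  11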
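Because plain df[] slicing on an integer index is easy to misread, it can help to state the intent explicitly with iloc or loc. A minimal sketch on the same integer-indexed DataFrame (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=range(5, 9))

by_position = df.iloc[0:3]   # rows at positions 0, 1, 2
by_label = df.loc[5:7]       # rows labeled 5, 6, 7 (the end label is included)
plain_slice = df[5:8]        # treated as positions 5..7, which do not exist here

print(list(by_position.index))
print(list(by_label.index))
print(len(plain_slice))
```

Here by_position and by_label happen to select the same three rows, while the plain slice returns nothing.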

4.2 Columns

Accepts column names only, either a single name or a list of names.

df[['A','B']]
    A   B
5   0   1
6   4   5
7   8   9
8  12  13
df['A']        # column A
5     0
6     4
7     8
8    12
Name: A, dtype: int64
5. df.x

df.A   # column A
5     0
6     4
7     8
8    12
Name: A, dtype: int64
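Attribute access such as df.A is convenient, but it only works when the column name is a valid Python identifier that does not collide with an existing DataFrame attribute or method. The sketch below uses a hypothetical DataFrame with columns named 'count' and 'my col' (both invented for this illustration) to show where bracket access is needed instead.

```python
import pandas as pd

# Hypothetical columns chosen to show the limits of attribute access.
df = pd.DataFrame({'A': [0, 4], 'count': [1, 5], 'my col': [2, 6]})

same = df.A.equals(df['A'])       # attribute access works for a simple name
is_method = callable(df.count)    # 'count' resolves to the DataFrame.count method, not the column
count_col = df['count']           # bracket access always reaches the column
spaced_col = df['my col']         # names containing spaces need brackets

print(same, is_method)
```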
