Processing CSV data with pandas

智能学习者

Article link: https://blog.csdn.net/qq_43790749/article/details/114897569

See also: "In machine learning, how to use pandas in Python to extract data and split it into training and test sets"

import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('5678'))
df

	A	B	C	D
5	0	1	2	3
6	4	5	6	7
7	8	9	10	11
8	12	13	14	15

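This post covers five ways to access data in pandas, for example data loaded from a CSV file:

1) loc: by row label and column label
2) iloc: by row position and column position
3) ix: mixed row/column labels and positions (deprecated in pandas 0.20 and removed in pandas 1.0; use loc or iloc instead)
4) df.x: select a single column
5) df[]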
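The examples in this post build the DataFrame with NumPy rather than reading a file. As a minimal sketch of how the same table could come from CSV, the snippet below parses an in-memory string; `csv_text`, the `id` header, and `df_csv` are hypothetical names, not part of the original post. The index is converted to strings so that it matches the string row labels used here.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """id,A,B,C,D
5,0,1,2,3
6,4,5,6,7
7,8,9,10,11
8,12,13,14,15
"""

# index_col=0 turns the first CSV column into the row labels.
df_csv = pd.read_csv(io.StringIO(csv_text), index_col=0)

# read_csv parses the labels as integers; convert them to strings
# so that label lookups such as df_csv.loc['6'] work as in this post.
df_csv.index = df_csv.index.astype(str)

print(df_csv.loc["6", "B"])   # label-based access
print(df_csv.iloc[1, 1])      # position-based access to the same cell
```

To read a real file, pass its path to `pd.read_csv` instead of the `io.StringIO` object.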

1. loc: by row label and column label

1.1 Rows

df.loc['6']           # the row labeled '6', i.e. the 2nd row: 4 5 6 7

A    4
B    5
C    6
D    7
Name: 6, dtype: int64

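df.loc['5':'7']       # rows labeled '5' through '7', i.e. the first 3 rows; note that both endpoints are included

	A	B	C	D
5	0	1	2	3
6	4	5	6	7
7	8	9	10	11

df.loc[['5', '7']]    # the rows labeled '5' and '7', i.e. the 1st and 3rd rows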

	A	B	C	D
5	0	1	2	3
7	8	9	10	11
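The inclusive end point of a label slice is easy to confuse with the exclusive end point of a positional slice (iloc, covered in section 2). The following sketch contrasts the two on the same DataFrame and also shows what happens with a label that does not exist; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('5678'))

by_label = df.loc['5':'7']    # label slice: the end label '7' is included
by_position = df.iloc[0:2]    # position slice: the end position 2 is excluded

print(len(by_label), len(by_position))

# Asking loc for a label that is not in the index raises KeyError.
try:
    df.loc['9']
except KeyError as exc:
    print('missing label:', exc)
```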

1.2 Columns

df.loc[:, 'B']   # the column labeled 'B', i.e. the 2nd column: 1 5 9 13

5     1
6     5
7     9
8    13
Name: B, dtype: int64

df.loc[:, 'A':'C']    # columns labeled 'A' through 'C', i.e. the first 3 columns; both endpoints are included

	A	B	C
5	0	1	2
6	4	5	6
7	8	9	10
8	12	13	14

df.loc[:, ['A', 'C']] # the columns labeled 'A' and 'C', i.e. the 1st and 3rd columns

	A	C
5	0	2
6	4	6
7	8	10
8	12	14

1.3 Blocks

df.loc['5':'7', 'A':'C']       # rows '5' through '7' and columns 'A' through 'C', i.e. the first 3 rows and first 3 columns

	A	B	C
5	0	1	2
6	4	5	6
7	8	9	10

df.loc[['5', '7'], ['A', 'C']] # rows '5' and '7', columns 'A' and 'C'

	A	C
5	0	2
7	8	10

df.loc['5':'7', ['A', 'C']]    # rows '5' through '7', columns 'A' and 'C'

	A	C
5	0	2
6	4	6
7	8	10

df.loc[['5', '7'], 'A':'C']    # rows '5' and '7', columns 'A' through 'C'

	A	B	C
5	0	1	2
7	8	9	10

1.4 Single cells

df.loc['5', 'A']   # the cell at row label '5', column label 'A'

df.at['5', 'A']    # same as loc for a single cell, but slightly faster
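As a small illustration of the single-cell accessors, the sketch below reads one cell with both loc and at, then writes a new value through at. The value 100 and the variable names are arbitrary choices for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('5678'))

value_loc = df.loc['5', 'A']   # general label-based indexer
value_at = df.at['5', 'A']     # scalar-only accessor for exactly one cell

print(value_loc == value_at)

# at can also assign a single cell in place.
df.at['5', 'A'] = 100
print(df.loc['5', 'A'])
```

Because at only handles one row label and one column label at a time, loc remains the tool for slices and lists.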

2. iloc: by row position and column position

2.1 Rows

df.iloc[1]          # the 2nd row, i.e. the row labeled '6': 4 5 6 7

A    4
B    5
C    6
D    7
Name: 6, dtype: int64

df.iloc[0:3]        # the first 3 rows (the end position 3 is excluded)

	A	B	C	D
5	0	1	2	3
6	4	5	6	7
7	8	9	10	11

df.iloc[[0, 3]]     # the 1st and 4th rows

	A	B	C	D
5	0	1	2	3
8	12	13	14	15

2.2 Columns

df.iloc[:, 1]       # the 2nd column, i.e. the column labeled 'B': 1 5 9 13

5     1
6     5
7     9
8    13
Name: B, dtype: int64

df.iloc[:, 0:3]     # the first 3 columns

	A	B	C
5	0	1	2
6	4	5	6
7	8	9	10
8	12	13	14

df.iloc[:, [0, 3]]  # the 1st and 4th columns

	A	D
5	0	3
6	4	7
7	8	11
8	12	15

2.3 Blocks

df.iloc[0:3, 0:3]          # the first 3 rows and first 3 columns

	A	B	C
5	0	1	2
6	4	5	6
7	8	9	10

df.iloc[[0, 3],  [0, 3]]   # the 1st and 4th rows, 1st and 4th columns

	A	D
5	0	3
8	12	15

df.iloc[0:3, [0, 3]]       # the first 3 rows, 1st and 4th columns

	A	D
5	0	3
6	4	7
7	8	11

df.iloc[[0, 3], 0:3]       # the 1st and 4th rows, first 3 columns

	A	B	C
5	0	1	2
8	12	13	14

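2.4 Single cells

df.iloc[1, 1]   # the cell in the 2nd row, 2nd column (positions are zero-based): 5

df.iat[1, 1]    # same as iloc for a single cell, but slightly faster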
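Since positions are zero-based, `[1, 1]` addresses the second row and second column. The sketch below checks that against the label-based equivalent and also shows negative positions, which count from the end as in ordinary Python sequences; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('5678'))

cell_by_position = df.iat[1, 1]      # 2nd row, 2nd column
cell_by_label = df.at['6', 'B']      # the same cell addressed by labels

last_row = df.iloc[-1]               # negative positions count from the end
last_cell = df.iat[-1, -1]           # bottom-right cell

print(cell_by_position, cell_by_label, last_cell)
```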

3. df.ix: mixed row/column labels and positions

3.1 Rows

df.ix['7']    # a single row; same for the examples below

/home/xiaohaipeng/.local/lib/python3.5/site-packages/ipykernel_launcher.py:1: DeprecationWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated
  """Entry point for launching an IPython kernel.





A     8
B     9
C    10
D    11
Name: 7, dtype: int64

print(df.ix[2])
df.ix[2].values

A     8
B     9
C    10
D    11
Name: 7, dtype: int64


array([ 8,  9, 10, 11])

df.ix[1: 3]   # multiple rows; same for the examples below

	A	B	C	D
6	4	5	6	7
7	8	9	10	11

df.ix[[1, 3]]

	A	B	C	D
6	4	5	6	7
8	12	13	14	15

df.ix['5':'7']

	A	B	C	D
5	0	1	2	3
6	4	5	6	7
7	8	9	10	11

df.ix[['5','7']]

	A	B	C	D
5	0	1	2	3
7	8	9	10	11

# When the row/column labels are integers, df.ix selects only by label, never by position:
df.index = range(1, 5)
df.ix[1]

A    0
B    1
C    2
D    3
Name: 1, dtype: int64

df.index = range(11, 15) 
df.ix[2]   # KeyError: 2
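With loc and iloc the ambiguity above goes away, because each indexer has a single meaning regardless of the index type. A minimal sketch with the same integer index, using illustrative variable names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=range(11, 15))

row_by_label = df.loc[12]       # label 12, which is the 2nd row
row_by_position = df.iloc[2]    # position 2, which is the 3rd row (label 13)

print(row_by_label.name, row_by_position.name)

# loc never falls back to positions, so a missing label still raises KeyError.
try:
    df.loc[2]
except KeyError:
    print('2 is not a row label')
```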

3.2 Columns

df.ix[:, 'C']   # a single column; same for the examples below

11     2
12     6
13    10
14    14
Name: C, dtype: int64

df.ix[:, 2]

11     2
12     6
13    10
14    14
Name: C, dtype: int64

df.ix[:, 1: 3]   # multiple columns; same for the examples below

	B	C
11	1	2
12	5	6
13	9	10
14	13	14

df.ix[:, [1, 3]]

	B	D
11	1	3
12	5	7
13	9	11
14	13	15

df.ix[:, 'A':'C']

	A	B	C
11	0	1	2
12	4	5	6
13	8	9	10
14	12	13	14

df.ix[:, ['A','C']]

	A	C
11	0	2
12	4	6
13	8	10
14	12	14
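On pandas versions where .ix is no longer available, mixed selections (rows by position, columns by label) can be written with loc or iloc by translating one side first. The sketch below shows two possible translations; it is one way to do it, not the only one, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=list('5678'))

# Option 1: turn row positions into row labels, then use loc.
by_label = df.loc[df.index[[1, 3]], ['A', 'C']]

# Option 2: turn column labels into column positions, then use iloc.
col_positions = df.columns.get_indexer(['A', 'C'])
by_position = df.iloc[[1, 3], col_positions]

print(by_label)
print(by_label.equals(by_position))
```

Both options select the 2nd and 4th rows of columns 'A' and 'C'.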

4. df[]

4.1 Rows
Accepts row labels or row positions, but only as a slice.

df[0:1]

	A	B	C	D
11	0	1	2	3

df[0:3]      # the first 3 rows

	A	B	C	D
11	0	1	2	3
12	4	5	6	7
13	8	9	10	11

df[1:3]

	A	B	C	D
12	4	5	6	7
13	8	9	10	11

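When slicing with integers, df[] always slices by position, even if the row index itself is of integer type; it does not slice by label. This differs from df.ix. For example: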

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=range(5, 9))
df[5:8]   # Empty DataFrame

	A	B	C	D

df[0:3]   # the 1st through 3rd rows

	A	B	C	D
5	0	1	2	3
6	4	5	6	7
7	8	9	10	11
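To make the intent explicit when the index holds integers, the positional and label-based slices can be spelled out with iloc and loc. A short sketch on the same DataFrame, with illustrative variable names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=range(5, 9))

positional = df[5:8]         # positions 5..7 do not exist in a 4-row frame: empty result
explicit_pos = df.iloc[0:3]  # positions 0, 1, 2
explicit_label = df.loc[5:8] # labels 5 through 8, both endpoints included

print(len(positional), len(explicit_pos), len(explicit_label))
```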

4.2 Columns

Accepts column labels only, either a single label or a list of labels.

df[['A','B']]

	A	B
5	0	1
6	4	5
7	8	9
8	12	13

df['A']        # column A

5     0
6     4
7     8
8    12
Name: A, dtype: int64
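The two forms above return different types: a single label gives a Series, while a list gives a DataFrame, even when the list holds only one label. A small sketch to make the difference visible (variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list('ABCD'), index=range(5, 9))

as_series = df['A']      # single label: one-dimensional Series
as_frame = df[['A']]     # list with one label: DataFrame with one column

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```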

5. df.x: select a single column

df.A   # column A

5     0
6     4
7     8
8    12
Name: A, dtype: int64
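Attribute access is convenient but has limits: it only works when the column name is a valid Python identifier and does not collide with an existing DataFrame attribute or method. The sketch below uses a hypothetical DataFrame with columns named `count` and `my col` to show the fallback to `df[...]`.

```python
import pandas as pd

# Hypothetical columns chosen to show where attribute access breaks down.
df = pd.DataFrame({'A': [0, 4], 'count': [1, 5], 'my col': [2, 6]})

print(df.A.tolist())            # works: 'A' is a plain identifier

# df.count is the DataFrame.count method, not the 'count' column.
print(callable(df.count))

# Bracket access works for any column name.
print(df['count'].tolist())
print(df['my col'].tolist())
```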
