Indexing NumPy arrays, pandas Series, and DataFrames

Latest recommended article published 2024-07-08 19:59:20

mo926983

Category: python   Tags: python, numpy

Article link: https://blog.csdn.net/mo926983/article/details/106423993

I. NumPy

1. Indexing one-dimensional NumPy arrays

import numpy as np
arr=np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

arr[5] # array_name[index]

5

# Slice indexing: array_name[start:stop]; the stop index is excluded
arr[5:8]

array([5, 6, 7])

arr[5:8]=12   # assigning to a slice modifies the original array in place
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])
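The assignment above changes `arr` itself because a basic slice of a NumPy array is a view on the same memory, not a copy. Here is a minimal sketch of that behaviour; the names `view` and `independent` are illustrative and not part of the original post.

```python
import numpy as np

arr = np.arange(10)

# A basic slice is a view: it shares memory with the original array.
view = arr[5:8]
view[:] = 12              # writes through to arr

# .copy() returns an independent array; changing it leaves arr untouched.
independent = arr[5:8].copy()
independent[:] = -1

print(arr)                # [ 0  1  2  3  4 12 12 12  8  9]
```

Call `.copy()` whenever the slice should be modified without affecting the source array.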

2. Indexing two-dimensional NumPy arrays

arr2=np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

arr2[2] # a single index selects a whole row: the elements at (2,0), (2,1), (2,2)

array([7, 8, 9])

arr2[:1] # a slice selects a range of rows (here only row 0); the result is still 2-D

array([[1, 2, 3]])

arr2[0][2]  # row first, then column; selects one element, same result as arr2[0,2]

3

arr2[:1,:2] # rows before the comma, columns after it; each part may be a single index or a slice

array([[1, 2]])
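The row and column forms above can be put side by side in a short sketch. The variable names (`row`, `col`, `block`, and so on) are chosen here for illustration only.

```python
import numpy as np

arr2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row = arr2[2]          # third row            -> [7 8 9]
col = arr2[:, 2]       # third column         -> [3 6 9]
elem_a = arr2[0][2]    # two steps: take row 0, then element 2 of that row
elem_b = arr2[0, 2]    # one step with a (row, column) pair
block = arr2[:2, 1:]   # rows 0-1, columns 1-2 -> [[2 3] [5 6]]

print(row, col, elem_a, elem_b)
print(block)
```

Both element forms return the same value. `arr2[0, 2]` does it in a single indexing operation, whereas `arr2[0][2]` first builds the intermediate row.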

3. Boolean indexing

names=np.array(['Bob','Joe','Will','Bob','Will','Joe','Joe'])
data=np.random.randn(7,4)
names

array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')

data

array([[-0.29702806,  0.16529614, -0.72684132,  1.34725395],
       [-0.42805177, -1.26251365, -0.18073154, -0.49129737],
       [ 0.29208283,  0.36520126,  0.62905828,  1.00058399],
       [ 0.34855322,  0.28424986,  0.07021468, -0.11893882],
       [ 0.69535789, -0.11105581, -1.44341613,  1.45839566],
       [-0.08256376,  0.49573743, -0.9002769 , -2.49175582],
       [ 0.14072487,  1.57377442,  0.95774634,  0.04137535]])

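# Suppose each name corresponds to one row of data and we want every row for 'Bob'.
# The boolean array must have the same length as the axis it indexes.
names=='Bob'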


array([ True, False, False,  True, False, False, False])

data[names=='Bob']

array([[-0.29702806,  0.16529614, -0.72684132,  1.34725395],
       [ 0.34855322,  0.28424986,  0.07021468, -0.11893882]])

data[data<0]=0  # data<0 is a boolean array; every element where it is True is set to 0
data

array([[0.        , 0.16529614, 0.        , 1.34725395],
       [0.        , 0.        , 0.        , 0.        ],
       [0.29208283, 0.36520126, 0.62905828, 1.00058399],
       [0.34855322, 0.28424986, 0.07021468, 0.        ],
       [0.69535789, 0.        , 0.        , 1.45839566],
       [0.        , 0.49573743, 0.        , 0.        ],
       [0.14072487, 1.57377442, 0.95774634, 0.04137535]])
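The boolean-indexing steps above can be reproduced with a seeded generator so that the output is the same on every run. This is a sketch under assumptions: the seed `0`, the use of `np.random.default_rng`, and the names `is_bob`, `not_bob`, `bob_or_will` and `clipped` are not in the original post.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
rng = np.random.default_rng(0)        # arbitrary seed, only for reproducibility
data = rng.standard_normal((7, 4))

is_bob = names == 'Bob'               # boolean array of length 7, one flag per row
bob_rows = data[is_bob]               # rows 0 and 3
not_bob = data[~is_bob]               # ~ negates the mask: the other five rows

# Combine masks with | and &; the Python keywords `or` / `and` do not work on arrays.
bob_or_will = data[(names == 'Bob') | (names == 'Will')]

clipped = data.copy()                 # work on a copy so data keeps its negative values
clipped[clipped < 0] = 0

print(bob_rows.shape, not_bob.shape, bob_or_will.shape)   # (2, 4) (5, 4) (4, 4)
```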

II. Series

1. Indexing

import pandas as pd
obj=pd.Series(np.arange(4),index=['a','b','c','d'])
obj

a    0
b    1
c    2
d    3
dtype: int32

obj.iloc[1] # select by integer position; older pandas also accepted obj[1], which is deprecated since pandas 2.1

1

obj['b'] # object[label]

1

obj[['c','b']] # to select several labels, pass them together in a list; the result follows the list order

c    2
b    1
dtype: int32

2. Slicing

obj[0:2] # integer (positional) slice; the right bound is excluded

a    0
b    1
dtype: int32

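obj['b':'c']  # label slice; both endpoints are included. The left label must come before the right one in the index: swapping them to 'c':'b' returns no data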

b    1
c    2
dtype: int32
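The two slicing rules for a Series are easier to tell apart when written explicitly with `iloc` (positions) and `loc` (labels). A small sketch follows; `by_position`, `by_label` and `reversed_labels` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4), index=['a', 'b', 'c', 'd'])

by_position = obj.iloc[0:2]          # positions 0 and 1; position 2 is excluded
by_label = obj.loc['b':'c']          # labels 'b' and 'c'; the end label is included
reversed_labels = obj.loc['c':'b']   # 'c' comes after 'b', so nothing is selected

print(list(by_position.index))       # ['a', 'b']
print(list(by_label.index))          # ['b', 'c']
print(len(reversed_labels))          # 0
```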

III. DataFrame

1. Indexing

Data:

import pandas as pd
data=pd.DataFrame(np.arange(16).reshape((4,4)),
                 index=['Ohio','Colorado','Utah','New York'],
                 columns=['one','two','three','four'])
data

          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15

Indexing operations

Indexing a DataFrame with a single value or a sequence selects one or more columns

data['two']

Ohio         1
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32

data[['two','one']]


          two  one
Ohio        1    0
Colorado    5    4
Utah        9    8
New York   13   12

Selecting rows

data[:2]

          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7

data[data['three']>5] # select rows by value

          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
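Row selection by value works through a boolean Series that is aligned on the row index. A sketch of that mechanism, including a combined condition, is given below; the names `mask`, `selected`, `both` and `first_two` are assumptions made for this example.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

mask = data['three'] > 5          # boolean Series, one flag per row
selected = data[mask]             # Colorado, Utah, New York

# Wrap each condition in parentheses and combine with & (and) or | (or).
both = data[(data['three'] > 5) & (data['one'] < 12)]   # Colorado, Utah

first_two = data[:2]              # a slice inside [] selects rows, not columns

print(list(selected.index))
print(list(both.index))
```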

Selecting with loc (axis labels) and iloc (integer positions)

data.loc['Colorado':'Utah','one':'three']
# label slices include both the left and the right bound
          one  two  three
Colorado    4    5      6
Utah        8    9     10

data.iloc[2,[3,0,1]]
# integer-position selection with iloc: row 2 and columns 3, 0, 1; the result is a Series named after the row
four    11
one      8
two      9
Name: Utah, dtype: int32

data.iloc[[1,2],[3,0,1]]

          four  one  two
Colorado     7    4    5
Utah        11    8    9

data.iloc[:,:3][data.three>5]
# data.three>5 is a boolean index applied to the rows
          one  two  three
Colorado    4    5      6
Utah        8    9     10
New York   12   13     14

Integer-position slices (iloc) always exclude the right bound, whereas label slices (loc) include it.
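To make the boundary rule concrete, the sketch below selects the same block once by label and once by position. It also rewrites the chained boolean selection as a single `loc` call. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Label slice: 'Utah' and 'three' are included.
by_label = data.loc['Colorado':'Utah', 'one':'three']

# Position slice: stop positions 3 and 3 are excluded, so rows 1-2 and columns 0-2.
by_position = data.iloc[1:3, 0:3]

# loc accepts a boolean row mask together with column labels in one call.
chained = data.iloc[:, :3][data.three > 5]
single_call = data.loc[data['three'] > 5, ['one', 'two', 'three']]

print(by_label.equals(by_position))   # True
print(chained.equals(single_call))    # True
```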

How the three indexing styles differ on the same Series object

ser=pd.Series(np.arange(5))
ser

0    0
1    1
2    2
3    3
4    4
dtype: int32

ser[2] # returns a scalar value

2

ser.loc[2:2] # returns a Series; loc slices by label and includes the right bound

2    2
dtype: int32

ser.iloc[2:3] # returns a Series; iloc slices by position and excludes the right bound

2    2
dtype: int32
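With the default index 0..4, labels and positions coincide, which hides the difference between `loc` and `iloc`. The sketch below adds a hypothetical Series `shifted`, whose labels start at 10, to separate the two. This variable is not in the original post.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(5))
value = ser[2]                  # label lookup on the integer index -> scalar
loc_slice = ser.loc[2:3]        # labels 2 and 3, right bound included
iloc_slice = ser.iloc[2:3]      # position 2 only, right bound excluded

# Same values, but labels 10..14 so that label and position no longer match.
shifted = pd.Series(np.arange(5), index=[10, 11, 12, 13, 14])
shifted_loc = shifted.loc[12:13]     # labels 12 and 13 -> values 2 and 3
shifted_iloc = shifted.iloc[2:3]     # position 2       -> value 2

print(value, loc_slice.tolist(), iloc_slice.tolist())   # 2 [2, 3] [2]
print(shifted_loc.tolist(), shifted_iloc.tolist())      # [2, 3] [2]
```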
