python pandas basics

eggplant323

Category: python  Tags: python pandas numpy

Article link: https://blog.csdn.net/eggplant323/article/details/128223913

Python notes (6)

This post is only my own study notes on selected Python topics.
This installment summarizes the basics of NumPy.

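Compared with a Python list, NumPy arrays are faster to work with.
Reasons: 1. fixed element type 2. contiguous memory
Some NumPy advantages: ndarray (multidimensional array) / math functions / tools for reading and writing arrays to disk / broader numerical processing features / an API for other libraries to build on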
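
The two reasons above can be checked directly on an array. A minimal sketch (the variable name `values` is illustrative, not from the original notes):

```python
import numpy as np

values = np.arange(5, dtype=np.int64)

# Fixed type: every element shares one dtype and one size in bytes.
print(values.dtype)      # int64
print(values.itemsize)   # 8

# Contiguous memory: the elements sit in one block of itemsize * size bytes.
print(values.flags['C_CONTIGUOUS'])  # True
print(values.nbytes)                 # 40
```

A Python list, by contrast, stores references to separately allocated objects of any type, so each element access involves an extra indirection and a type check.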

creating ndarrays

np.array

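numpy.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)

copy: whether the object is copied (True by default)
order: memory layout of the result; 'C' is row-major, 'F' is column-major, 'A' uses 'F' if the input is Fortran-contiguous and 'C' otherwise, 'K' (the default) keeps the input's layout as closely as possible
subok: if True, subclasses are passed through; if False (the default), the returned array is forced to the base-class type
ndmin: minimum number of dimensions of the generated array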
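
To make the parameters concrete, here is a small sketch of `dtype`, `ndmin`, `order` and the default copy behavior. The variable names are made up for illustration.

```python
import numpy as np

# ndmin pads the shape with leading axes of length 1.
row = np.array([1, 2, 3], ndmin=2)
print(row.shape)  # (1, 3)

# dtype forces the element type instead of letting NumPy infer it.
as_float = np.array([1, 2, 3], dtype=np.float64)
print(as_float.dtype)  # float64

# order='F' requests column-major (Fortran) memory layout.
col_major = np.array([[1, 2], [3, 4]], order='F')
print(col_major.flags['F_CONTIGUOUS'])  # True

# np.array copies by default, so editing the copy leaves the source intact.
source = np.array([1, 2, 3])
copied = np.array(source)
copied[0] = 100
print(source[0])  # 1
```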

import numpy as np
data2=[[1,2,3,4],[5,6,7,8]]
arr2=np.array(data2)
print(arr2)
print(arr2.ndim)
print(arr2.shape)

[[1 2 3 4]
 [5 6 7 8]]
2
(2, 4)

np.dtype

numpy.dtype(object, align, copy)

align: if True, pad the fields the way a C compiler would pad a similar struct
copy: make a new copy of the dtype object; if False, the result may simply be a reference to a built-in data type object
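
The effect of `align` only shows up with structured dtypes. The following is a rough sketch; the field names `flag` and `count` are invented for the example.

```python
import numpy as np

# Two ways of naming the same built-in type compare equal.
print(np.dtype('int32') == np.dtype(np.int32))  # True

# A structured dtype with two named fields (field names are illustrative).
packed = np.dtype([('flag', 'i1'), ('count', 'i4')])
aligned = np.dtype([('flag', 'i1'), ('count', 'i4')], align=True)

# Without alignment the fields are packed back to back: 1 + 4 bytes.
print(packed.itemsize)  # 5

# With align=True, padding is inserted before 'count', so the record grows.
print(aligned.itemsize)
print(aligned.fields['count'][1])  # byte offset of the 'count' field
```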

data3=np.random.randn(2,3)
print(data3)
print(data3*10)
print(data3+data3)
print(data3.shape)
print(data3.dtype)

[[ 1.75302313  1.11403706 -0.44986101]
 [-0.51547876 -0.22752194  0.44738767]]
[[17.53023126 11.14037063 -4.49861006]
 [-5.15478762 -2.2752194   4.47387673]]
[[ 3.50604625  2.22807413 -0.89972201]
 [-1.03095752 -0.45504388  0.89477535]]
(2, 3)
float64

np.empty

numpy.empty(shape, dtype = float, order = 'C')

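Creates an "empty" (uninitialized) array, which is not the same as an all-zeros array: the values are whatever happened to be in memory
shape: shape of the array (tuple)
order: row-major ('C') or column-major ('F')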

a=np.empty((1,2))
a

array([[3.18547019e+283, 4.78629679e-185]])
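
The difference from an all-zeros array can be seen by putting `np.empty` next to `np.zeros`. A minimal sketch, with illustrative variable names:

```python
import numpy as np

uninitialized = np.empty((2, 3))   # contents are arbitrary
zeros = np.zeros((2, 3))           # contents are guaranteed to be 0.0

print(uninitialized.shape, zeros.shape)  # (2, 3) (2, 3)
print(zeros.any())                       # False

# Typical use: allocate with np.empty, then overwrite every element.
uninitialized[:] = 7.0
print(uninitialized)
```

`np.empty` skips the initialization step, so it only makes sense when every element will be assigned before it is read.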

data type

arr3=np.array([1,2,3],dtype=np.float64)
arr4=np.array([1,2,3],dtype=np.int32)
print(arr3.dtype,arr4.dtype)

float64 int32

Use .astype to convert the data type; it returns a new array and leaves the original unchanged

arr3_int=arr3.astype(np.int32)
arr3_int

array([1, 2, 3])
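
A couple of additional conversions, sketched with made-up values, show what `.astype` does when the cast loses information or starts from strings:

```python
import numpy as np

floats = np.array([1.7, -2.9, 3.2])

# Casting float to int drops the fractional part (truncation toward zero).
ints = floats.astype(np.int32)
print(ints)          # [ 1 -2  3]

# astype returns a new array; the original keeps its dtype.
print(floats.dtype)  # float64

# Strings that look like numbers can be converted as well.
numeric = np.array(['1.5', '2', '-3']).astype(np.float64)
print(numeric)
```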

Arithmetic with NumPy Arrays

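Arithmetic on arrays is applied element by element.
Arrays of the same size can also be compared, which yields a boolean array.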

arr5=np.array([[1,2,3],[4,5,6]])
print(arr5*arr5)
print(arr5-arr5)
print(1/arr5)
print(arr5*0.5)
arr5_compare=np.array([[0,4,1],[7,2,12]])
print(arr5>arr5_compare)

[[ 1  4  9]
 [16 25 36]]
[[0 0 0]
 [0 0 0]]
[[1.         0.5        0.33333333]
 [0.25       0.2        0.16666667]]
[[0.5 1.  1.5]
 [2.  2.5 3. ]]
[[ True False  True]
 [False  True False]]
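
As a sanity check on what "element by element" means, the sketch below compares an array expression with an explicit Python loop and counts the `True` entries of a comparison. The names `arr` and `other` are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
other = np.array([[0, 4, 1], [7, 2, 12]])

# The array expression and an explicit Python loop give the same result.
squared = arr * arr
looped = [[x * x for x in row] for row in arr.tolist()]
print(squared.tolist() == looped)  # True

# A comparison returns a boolean array; True counts as 1 when summed.
greater = arr > other
print(greater.sum())  # 3
```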

Indexing and Slicing

one dimension

arr6=np.arange(10)
print(arr6)
print(arr6[4])
print(arr6[4:7])
arr6[5:7]=99  # change the values of the selected items
print(arr6)

[0 1 2 3 4 5 6 7 8 9]
4
[4 5 6]
[ 0  1  2  3  4 99 99  7  8  9]
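
One point worth knowing when assigning through slices: a slice of a NumPy array is a view onto the same memory, not a copy. A short sketch with illustrative names:

```python
import numpy as np

arr = np.arange(10)

window = arr[5:8]        # a view, not a copy
window[0] = 64
print(arr[5])            # 64

independent = arr[5:8].copy()   # an explicit copy
independent[:] = 0
print(arr[5:8])          # [64  6  7]
```

Use `.copy()` whenever the slice should be modified without touching the original array.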

high dimension

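When indexing a higher-dimensional array, a single index no longer returns a scalar but a smaller array.
Individual scalars are reached by indexing repeatedly, so arr[0][1] and arr[0,1] give the same element.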

arr_2d=np.array([[1,2,3],[4,5,6],[7,8,9]])
print(arr_2d[1])
print(arr_2d[0,1])
print(arr_2d[0][1])

[4 5 6]
2
2

3-D

arr_3d=np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])
print(arr_3d)
arr_3d[0]=99
arr_3d

[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]
array([[[99, 99, 99],
        [99, 99, 99]],

       [[ 7,  8,  9],
        [10, 11, 12]]])
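
Each additional index removes one dimension from the result. The sketch below walks through that on the same 3-D data, and also shows one way to restore a block after overwriting it (the name `saved` is illustrative):

```python
import numpy as np

arr_3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

print(arr_3d.shape)        # (2, 2, 3)
print(arr_3d[1].shape)     # (2, 3)  one index -> a 2-D block
print(arr_3d[1, 0])        # [7 8 9] two indices -> a 1-D row
print(arr_3d[1, 0, 2])     # 9       three indices -> a scalar

# Save a block before overwriting it, then restore it.
saved = arr_3d[0].copy()
arr_3d[0] = 99
arr_3d[0] = saved
print(arr_3d[0, 1, 1])     # 5
```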

index with slices

print(arr_2d)
arr_2d[1,:2]  # row 1, first 2 columns

[[1 2 3]
 [4 5 6]
 [7 8 9]]
array([4, 5])

arr_2d[:2,2]  # first two rows, third column

array([3, 6])

arr_2d[:,:1]  # all rows, first column

array([[1],
       [4],
       [7]])

arr_2d[:2,1:]=99
arr_2d

array([[ 1, 99, 99],
       [ 4, 99, 99],
       [ 7,  8,  9]])
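
The shapes in the outputs above differ depending on whether an axis is indexed with an integer or with a slice. A minimal sketch of that rule, using illustrative names:

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

by_index = arr_2d[1, :2]     # integer row index: result is 1-D
by_slice = arr_2d[1:2, :2]   # slice on the row axis: result stays 2-D

print(by_index, by_index.shape)  # [4 5] (2,)
print(by_slice, by_slice.shape)  # [[4 5]] (1, 2)
```

This is why `arr_2d[:,:1]` returns a column of shape (3, 1) while `arr_2d[:2,2]` returns a flat array of shape (2,).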

boolean indexing

Comparing the name array with a value yields a boolean array, which can then be used as an index in later operations

name =np.array(['Bob','Joe','Will','Bob','Will','Joe','Joe'])
data =np.random.randn(7,4)
print(data)
print(name=='Bob')
print(data[name=='Bob'])
print(data[name=='Bob',2:])
print(data[name=='Bob',3])

[[ 0.30105086  0.88459349  0.4694958  -0.132868  ]
 [ 0.36000461  1.1606915  -1.80743514  0.48794727]
 [ 0.12488756  0.55243284  0.39060315  0.53376568]
 [ 1.57220325  0.53586929 -0.71008634  0.32867587]
 [-0.13595332 -0.59006423  0.01553604 -0.65240547]
 [-0.2549401  -0.24149723  0.60159715 -0.22450749]
 [-1.06174782 -1.04370202 -0.14431594 -0.29345469]]
[ True False False  True False False False]
[[ 0.30105086  0.88459349  0.4694958  -0.132868  ]
 [ 1.57220325  0.53586929 -0.71008634  0.32867587]]
[[ 0.4694958  -0.132868  ]
 [-0.71008634  0.32867587]]
[-0.132868    0.32867587]
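
Because the data above is random, the same idea is easier to verify on fixed values. The sketch below uses made-up `names` and `scores` arrays and also shows negating a mask with `~`:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob'])
scores = np.arange(8).reshape(4, 2)   # one row per name

is_bob = names == 'Bob'
print(is_bob)            # [ True False False  True]
print(scores[is_bob])    # rows 0 and 3
print(scores[~is_bob])   # rows 1 and 2, same as names != 'Bob'
```

The boolean array must have the same length as the axis it indexes, here the 4 rows of `scores`.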

Conditions can be combined with | (or) and & (and); the Python keywords and/or do not work on boolean arrays

mask=(name=='Bob')|(name=='Will')
mask

array([ True, False,  True,  True,  True, False, False])
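
A short sketch of both operators on made-up numeric data. Each comparison needs its own parentheses, because `&` and `|` bind more tightly than `>` and `<`.

```python
import numpy as np

values = np.array([3, 8, 12, 5, 20])

# Both conditions must hold.
in_range = (values > 4) & (values < 15)
print(in_range)          # [False  True  True  True False]

# At least one condition must hold.
either = (values < 4) | (values > 15)
print(values[either])    # [ 3 20]
```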

Assigning values through a boolean mask

data[data<0]=0
data

array([[0.30105086, 0.88459349, 0.4694958 , 0.        ],
       [0.36000461, 1.1606915 , 0.        , 0.48794727],
       [0.12488756, 0.55243284, 0.39060315, 0.53376568],
       [1.57220325, 0.53586929, 0.        , 0.32867587],
       [0.        , 0.        , 0.01553604, 0.        ],
       [0.        , 0.        , 0.60159715, 0.        ],
       [0.        , 0.        , 0.        , 0.        ]])
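
The assignment above modifies `data` in place. If the original values should be kept, `np.where` builds a new array instead. A sketch with fixed sample values (not the random data from the notes):

```python
import numpy as np

data = np.array([[0.5, -1.2], [-0.3, 2.0]])

# Take 0.0 where the condition is True, the original value elsewhere.
clipped = np.where(data < 0, 0.0, data)
print(clipped.tolist())  # [[0.5, 0.0], [0.0, 2.0]]

# The source array is untouched.
print(data[0, 1])        # -1.2
```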

fancy indexing

Fancy indexing means indexing a NumPy array with arrays (or lists) of integers
The selected data is stored in a new array

arr7=np.empty((8,4))
for i in range(8):
    arr7[i]=i
print(arr7)
# select rows in the given order
print(arr7[[4,3,0,6]])
# negative indices count from the end
print(arr7[[-3,-5,-7]])

[[0. 0. 0. 0.]
 [1. 1. 1. 1.]
 [2. 2. 2. 2.]
 [3. 3. 3. 3.]
 [4. 4. 4. 4.]
 [5. 5. 5. 5.]
 [6. 6. 6. 6.]
 [7. 7. 7. 7.]]
[[4. 4. 4. 4.]
 [3. 3. 3. 3.]
 [0. 0. 0. 0.]
 [6. 6. 6. 6.]]
[[5. 5. 5. 5.]
 [3. 3. 3. 3.]
 [1. 1. 1. 1.]]
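
That fancy indexing produces a new array, while slicing produces a view, can be checked with `np.shares_memory`. A minimal sketch with illustrative names:

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)

picked = arr[[2, 0]]      # fancy indexing -> new array
picked[:] = -1
print(arr[2])             # [6 7 8], unchanged

sliced = arr[0:2]         # slicing -> view
print(np.shares_memory(arr, picked))  # False
print(np.shares_memory(arr, sliced))  # True
```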

arr8=np.arange(32).reshape(8,4)
print(arr8)
# pick four elements at positions (1,0), (5,3), (7,1), (2,2)
print(arr8[[1,5,7,2],[0,3,1,2]])
# select the given rows, then reorder the columns as required
print(arr8[[1,5,7,2]][:,[0,3,1,2]])

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]
 [24 25 26 27]
 [28 29 30 31]]
[ 4 23 29 10]
[[ 4  7  5  6]
 [20 23 21 22]
 [28 31 29 30]
 [ 8 11  9 10]]
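
The chained form `arr8[rows][:, cols]` creates an intermediate array. As an alternative that is not in the original notes, `np.ix_` selects the same rectangular block in one indexing step:

```python
import numpy as np

arr8 = np.arange(32).reshape(8, 4)
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

chained = arr8[rows][:, cols]          # two indexing steps
with_ix = arr8[np.ix_(rows, cols)]     # one step, same block

print(np.array_equal(chained, with_ix))  # True
print(with_ix[0])                        # [4 7 5 6]
```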