NumPy Notes

最白の白菜

Article link: https://blog.csdn.net/qq_43966129/article/details/121553634

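This article introduces basic usage of the NumPy library: loading data, matrix operations, data selection, array arithmetic, and mathematical transformations of matrices. Worked examples show how to read a CSV file, create and manipulate arrays, select elements by index, multiply matrices, and apply math functions. It also covers data type conversion and array slicing, concatenation, and copying, which provide a foundation for data preprocessing and analysis.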

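How data is laid out: data usually comes as a rectangular table made up of samples. Each row is one sample and each column is one feature of the data. NumPy is built specifically for matrix computation.

The advantage of a notebook is that you can run any code cell at any time. Shift + Enter runs the current cell, and Tab autocompletes code.

A half-hour guide to common Jupyter Notebook usage: https://baijiahao.baidu.com/s?id=1685474425246208044&wfr=spider&for=pc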

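import numpy as np
# Open a data file with NumPy. First argument: the file location; keep the file next to the code or give an absolute path.
# Second argument: the delimiter (fields are separated by commas). Third argument: the dtype used to read the data.
world_alcohol = np.genfromtxt("F:/Python学习/唐宇迪-python数据分析与机器学习实战/课程资料/唐宇迪-机器学习课程资料/Python库代码（4个）/1-科学计算库numpy/world_alcohol.txt",delimiter=",",dtype = str)
print(type(world_alcohol))  # ndarray is NumPy's core data structure
print(world_alcohol)  # looks like a nested list; treat it as a 2-D matrix
# If a function is unfamiliar, e.g. you don't know what genfromtxt does, look up its documentation
help(np.genfromtxt)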

<class 'numpy.ndarray'>
[['Year' 'WHO region' 'Country' 'Beverage Types' 'Display Value']
 ['1986' 'Western Pacific' 'Viet Nam' 'Wine' '0']
 ['1986' 'Americas' 'Uruguay' 'Other' '0.5']
 ...
 ['1987' 'Africa' 'Malawi' 'Other' '0.75']
 ['1989' 'Americas' 'Bahamas' 'Wine' '1.5']
 ['1985' 'Africa' 'Malawi' 'Spirits' '0.31']]

# Create arrays with NumPy
# One-dimensional: a vector
vector = np.array([5, 10, 15, 20])
# Two-dimensional: a matrix
matrix = np.array([[5, 10, 15],[20,25,30],[35,40,45]])
print(vector)
print(matrix)

[ 5 10 15 20]
[[ 5 10 15]
[20 25 30]
[35 40 45]]

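# .shape describes the structure of an ndarray, i.e. how many rows and columns it has
vector = np.array([1,2,3,4])
print(vector.shape)  # a 1-D array, so only the number of elements is reported
matrix = np.array([[5, 10, 15],[20,25,30]])  # two rows, three columns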
print(matrix.shape)

(4,)
(2, 3)

# All elements of a numpy.array must share one data type, unlike a list, which can mix integers and strings
numbers1 = np.array([1,2,3,4])
print(numbers1)
numbers1.dtype
# Changing the type of just one element converts every int to float, so that all elements still share one type
numbers2 = np.array([1,2,3,4.0])
print(numbers2)
numbers2.dtype

[1 2 3 4]
[1. 2. 3. 4.]
dtype('float64')
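To make the single-dtype rule a bit more concrete, here is a small sketch (the variable names are made up for illustration). It checks the resulting dtype for an all-integer list, a list with one float, and a list with one string.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed_float = np.array([1, 2, 3.5])   # the ints are upcast to float
mixed_text = np.array([1, 2, "3"])    # everything is converted to a string

# .dtype.kind is a one-letter code: 'i' integer, 'f' float, 'U' unicode string
print(ints.dtype.kind)        # i
print(mixed_float.dtype)      # float64
print(mixed_text.dtype.kind)  # U
```

The exact integer width (int32 or int64) depends on the platform, which is why the sketch prints only the kind for the integer case.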

# Select data by index; skip_header=1 skips the header row
world_alcohol = np.genfromtxt("F:/Python学习/唐宇迪-python数据分析与机器学习实战/课程资料/唐宇迪-机器学习课程资料/Python库代码（4个）/1-科学计算库numpy/world_alcohol.txt",delimiter=",",dtype = str,skip_header=1)
print(world_alcohol)
# Get the value 0.5 at row index 1, column index 4 (indices start at 0)
uruguay_other_1986 = world_alcohol[1,4]
# Get "Cte d'Ivoire" at row index 2, column index 2
third_country = world_alcohol[2,2]
print(uruguay_other_1986)
print(third_country)

[['1986' 'Western Pacific' 'Viet Nam' 'Wine' '0']
 ['1986' 'Americas' 'Uruguay' 'Other' '0.5']
 ['1985' 'Africa' "Cte d'Ivoire" 'Wine' '1.62']
 ...
 ['1987' 'Africa' 'Malawi' 'Other' '0.75']
 ['1989' 'Americas' 'Bahamas' 'Wine' '1.5']
 ['1985' 'Africa' 'Malawi' 'Spirits' '0.31']]
0.5
Cte d'Ivoire
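If the data file is not at hand, the same loading and indexing steps can be tried on an in-memory stand-in. The sketch below assumes a tiny hypothetical CSV (`csv_text`) that copies only the first rows shown above; it is not the real world_alcohol.txt.

```python
import io
import numpy as np

# Hypothetical stand-in for world_alcohol.txt: header plus two data rows
csv_text = (
    "Year,WHO region,Country,Beverage Types,Display Value\n"
    "1986,Western Pacific,Viet Nam,Wine,0\n"
    "1986,Americas,Uruguay,Other,0.5\n"
)

# dtype=str keeps every field as text; skip_header=1 drops the header row
rows = np.genfromtxt(io.StringIO(csv_text), delimiter=",", dtype=str, skip_header=1)
print(rows.shape)   # (2, 5)
print(rows[1, 4])   # 0.5

# With the header gone, one text column can be converted to numbers
values = rows[:, 4].astype(float)
print(values)       # [0.  0.5]
```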

vector = np.array([5,10,15,20])
# 1-D slice: elements from index 0 up to, but not including, index 3
print(vector[0:3])

[ 5 10 15]

# Select one column of a 2-D array
matrix = np.array([[5, 10, 15],[20,25,30],[35,40,45]])
# ':' selects all rows
print(matrix[:,1])

[10 25 40]

# To select two columns
matrix = np.array([[5, 10, 15],[20,25,30],[35,40,45]])
# All rows, columns 0 and 1. 0:2 is a slice that includes the start but not the end, i.e. indices 0 to 1
print(matrix[:,0:2])

[[ 5 10]
[20 25]
[35 40]]

# Select some rows and some columns at once
matrix = np.array([[5, 10, 15],[20,25,30],[35,40,45]])
# Rows 1 and 2, columns 0 and 1
print(matrix[1:3,0:2])

[[20 25]
[35 40]]

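# NumPy also supports element-wise comparisons
vector = np.array([5, 10, 15, 20])
# Check whether any value equals 10; every element is tested at once, which is far more convenient than a for loop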
vector == 10

array([False, True, False, False])

# The same works for a 2-D array
matrix = np.array([
                    [5, 10, 15], 
                    [20, 25, 30],
                    [35, 40, 45]
                 ])
matrix == 25

array([[False, False, False],
[False, True, False],
[False, False, False]])

# A comparison returns an array of booleans, but we often want the actual values that match
vector = np.array([5, 10, 15, 20])
equal_to_ten = (vector == 10)
print(equal_to_ten)
# Use the boolean array as an index
print(vector[equal_to_ten])

[False True False False]
[10]

matrix = np.array([
                    [5, 10, 15], 
                    [20, 25, 30],
                    [35, 40, 45]
                 ])
# Check which entries of the second column (index 1) equal 25
second_column_25 = (matrix[:,1] == 25)
print(second_column_25)
# Select the rows where the mask is True (i.e. the row containing 25)
print(matrix[second_column_25,:])

[False True False]
[[20 25 30]]

# Combining conditions with AND (&) and OR (|)
vector = np.array([5, 10, 15, 20])
equal_to_ten_and_five = (vector == 10) & (vector == 5)
print(equal_to_ten_and_five)
vector = np.array([5, 10, 15, 20])
equal_to_ten_or_five = (vector == 10) | (vector == 5)
print(equal_to_ten_or_five)

[False False False False]
[ True True False False]
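Boolean masks can also be combined into range tests and used on the left-hand side of an assignment. The sketch below is an illustration with made-up names (`in_range`, `clipped`). The parentheses around each comparison are required because `&` and `|` bind more tightly than `==`, `<`, and `>`.

```python
import numpy as np

vector = np.array([5, 10, 15, 20])

# Elements strictly between 5 and 20
in_range = (vector > 5) & (vector < 20)
print(vector[in_range])   # [10 15]

# ~ negates a mask; a mask can also select the elements to overwrite
clipped = vector.copy()
clipped[~in_range] = 0
print(clipped)            # [ 0 10 15  0]
```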

# Change the data type of the whole array
vector = np.array(["1", "2", "3"])
print(vector.dtype) 
print(vector)
# Convert the element type with astype
vector = vector.astype(float)
print(vector.dtype) 
print(vector)

<U1
['1' '2' '3']
float64
[1. 2. 3.]

# Find the extreme values of an array
vector = np.array([5, 10, 15, 20])
# min() is a method of the array
vector.min()

matrix = np.array([
                    [5, 10, 15], 
                    [20, 25, 30],
                    [35, 40, 45]
                 ])
# Sum each row: axis=1 adds across the columns of every row
matrix.sum(axis=1)
# Sum each column: axis=0 adds down the rows of every column
matrix.sum(axis=0)

array([60, 75, 90])
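Only the last expression of a notebook cell is displayed, so the output above shows just the column sums. A small sketch that prints every result for the same matrix may help keep the two axes apart (the names `row_sums` and `col_sums` are chosen here for illustration):

```python
import numpy as np

matrix = np.array([
    [5, 10, 15],
    [20, 25, 30],
    [35, 40, 45],
])

row_sums = matrix.sum(axis=1)   # one value per row
col_sums = matrix.sum(axis=0)   # one value per column
print(row_sums)                 # [ 30  75 120]
print(col_sums)                 # [60 75 90]
print(matrix.sum())             # 225, no axis means sum everything
print(matrix.min(), matrix.max())  # 5 45

# keepdims=True keeps the reduced axis with length 1
print(matrix.sum(axis=1, keepdims=True).shape)  # (3, 1)
```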

# After constructing an array, we often need to transform it
# Build an array with 15 elements
print(np.arange(15))
# Reshape it into a matrix with 3 rows and 5 columns
a = np.arange(15).reshape(3,5)
a

[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]

array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])

a.shape

(3, 5)

# ndim is the number of dimensions of the array
a.ndim

# The data type of the elements (int32 here; the default integer type depends on the platform)
a.dtype.name

'int32'

a.size

# Initialize a matrix of zeros with 3 rows and 4 columns; the shape is passed as a tuple
np.zeros((3,4))

array([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])

# A 3-D array filled with ones, with an explicit dtype
np.ones((2,3,4),dtype = np.int32)

array([[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]],

[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]]])

# Build a sequence starting at 10 and stopping before 30 (the end is excluded), in steps of 5
np.arange(10,30,5)

array([10, 15, 20, 25])

# Use the random function in the random module to create a 2x3 matrix of random values in [0, 1)
np.random.random((2,3))

array([[0.92493211, 0.86573647, 0.42330079],
[0.34714776, 0.73032038, 0.49993023]])

from numpy import pi
# Given an interval, get 100 evenly spaced numbers across it (both endpoints included)
np.linspace(0,2*pi,100)

array([0. , 0.06346652, 0.12693304, 0.19039955, 0.25386607,
0.31733259, 0.38079911, 0.44426563, 0.50773215, 0.57119866,
0.63466518, 0.6981317 , 0.76159822, 0.82506474, 0.88853126,
0.95199777, 1.01546429, 1.07893081, 1.14239733, 1.20586385,
1.26933037, 1.33279688, 1.3962634 , 1.45972992, 1.52319644,
1.58666296, 1.65012947, 1.71359599, 1.77706251, 1.84052903,
1.90399555, 1.96746207, 2.03092858, 2.0943951 , 2.15786162,
2.22132814, 2.28479466, 2.34826118, 2.41172769, 2.47519421,
2.53866073, 2.60212725, 2.66559377, 2.72906028, 2.7925268 ,
2.85599332, 2.91945984, 2.98292636, 3.04639288, 3.10985939,
3.17332591, 3.23679243, 3.30025895, 3.36372547, 3.42719199,
3.4906585 , 3.55412502, 3.61759154, 3.68105806, 3.74452458,
3.8079911 , 3.87145761, 3.93492413, 3.99839065, 4.06185717,
4.12532369, 4.1887902 , 4.25225672, 4.31572324, 4.37918976,
4.44265628, 4.5061228 , 4.56958931, 4.63305583, 4.69652235,
4.75998887, 4.82345539, 4.88692191, 4.95038842, 5.01385494,
5.07732146, 5.14078798, 5.2042545 , 5.26772102, 5.33118753,
5.39465405, 5.45812057, 5.52158709, 5.58505361, 5.64852012,
5.71198664, 5.77545316, 5.83891968, 5.9023862 , 5.96585272,
6.02931923, 6.09278575, 6.15625227, 6.21971879, 6.28318531])
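The difference between `np.linspace` and `np.arange` is easy to mix up: `linspace` takes the number of points and includes the endpoint by default, while `arange` takes the step and excludes the end. A short sketch that checks this on the interval above:

```python
import numpy as np

points = np.linspace(0, 2 * np.pi, 100)
step = points[1] - points[0]

print(len(points))                           # 100
# 100 points span 99 equal gaps
print(np.isclose(step, 2 * np.pi / 99))      # True
print(np.isclose(points[-1], 2 * np.pi))     # True

# arange is defined by its step and never includes the stop value
print(np.arange(10, 30, 5))                  # [10 15 20 25]
```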

# Arithmetic on arrays
a = np.array([20,30,40,50])
b = np.arange(4)
print(a)
print(b)
# The two arrays have the same shape, so elements at matching positions are subtracted
c = a - b
print(c)
# If the shapes differ, as with a scalar, the scalar is subtracted from every element (broadcasting)
c = c - 1
print(c)
# Square every element of the array
print(b**2)
# Element-wise comparison
print(a<25)

[20 30 40 50]
[0 1 2 3]
[20 29 38 47]
[19 28 37 46]
[0 1 4 9]
[ True False False False]
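Subtracting a scalar is the simplest case of broadcasting. As a slightly larger illustration (with made-up names `col`, `row`, and `grid`), a column of shape (3, 1) and a row of shape (4,) are stretched against each other to give a (3, 4) result:

```python
import numpy as np

col = np.arange(3).reshape(3, 1)     # shape (3, 1): [[0], [1], [2]]
row = np.array([10, 20, 30, 40])     # shape (4,)

# Each row value is added to each column value
grid = col + row
print(grid.shape)   # (3, 4)
print(grid)
# [[10 20 30 40]
#  [11 21 31 41]
#  [12 22 32 42]]
```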

# Two kinds of multiplication: * is element-wise, .dot is the matrix product
A = np.array( [[1,1],
               [0,1]] )
B = np.array( [[2,0],
               [3,4]] )
print(A)
print("--------")
print(B)
print("--------")
# Element-wise multiplication
print(A*B)
print("--------")
# Matrix product: the first row of A times the first column of B, then the second column, and so on
print(A.dot(B))
print("--------")
print(np.dot(A, B))

[[1 1]
 [0 1]]
--------
[[2 0]
 [3 4]]
--------
[[2 0]
 [0 4]]
--------
[[5 4]
 [3 4]]
--------
[[5 4]
 [3 4]]
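Besides `.dot`, the `@` operator is another way to write the matrix product for 2-D arrays. The sketch below checks that it agrees with `A.dot(B)` on the same matrices, and that swapping the operands changes the result:

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])

product = A @ B
print(product)
# [[5 4]
#  [3 4]]
print(np.array_equal(product, A.dot(B)))   # True

# The matrix product is not commutative in general
print(B @ A)
# [[2 2]
#  [3 7]]
```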

# Math functions
import numpy as np
B = np.arange(3)
print(B)
print(np.exp(B))
print(np.sqrt(B))

[0 1 2]
[1. 2.71828183 7.3890561 ]
[0. 1. 1.41421356]

# Matrix manipulation
# Fill a 3x4 matrix with random values in [0, 1), multiply by 10, and round down
a = np.floor(10*np.random.random((3,4)))
print(a)
print("---------")
# reshape turns a vector into a matrix
# ravel flattens a matrix into a vector
print(a.ravel())
print("---------")
# Change it to a matrix with 6 rows and 2 columns
a.shape = (6,2)
print(a)
print("---------")
# Transpose: swap rows and columns
print(a.T)
print("---------")
# Set the number of rows to 3; the number of columns is then determined, so it can be computed automatically by writing -1
a.reshape(3,-1)

[[4. 0. 6. 8.]
 [4. 9. 9. 2.]
 [2. 6. 5. 0.]]
---------
[4. 0. 6. 8. 4. 9. 9. 2. 2. 6. 5. 0.]
---------
[[4. 0.]
 [6. 8.]
 [4. 9.]
 [9. 2.]
 [2. 6.]
 [5. 0.]]
---------
[[4. 6. 4. 9. 2. 5.]
 [0. 8. 9. 2. 6. 0.]]
---------

array([[4., 0., 6., 8.],
       [4., 9., 9., 2.],
       [2., 6., 5., 0.]])
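One detail worth spelling out: `reshape` returns a reshaped array and leaves the original's shape alone, while assigning to `.shape` changes the array itself. A minimal sketch, using `np.arange(12)` instead of random values so the shapes are easy to follow:

```python
import numpy as np

a = np.arange(12)
b = a.reshape(3, -1)      # returns a reshaped array; a keeps its own shape
print(a.shape, b.shape)   # (12,) (3, 4)

a.shape = (2, 6)          # assigning to .shape changes a itself
print(a.shape)            # (2, 6)

print(b.ravel().shape)    # (12,)
print(b.T.shape)          # (4, 3)

# -1 only works when the total size divides evenly
try:
    a.reshape(5, -1)
except ValueError as err:
    print("reshape failed:", err)
```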

# Concatenate two arrays
a = np.floor(10*np.random.random((2,2)))
b = np.floor(10*np.random.random((2,2)))
print(a)
print("---------")
print(b)
print("---------")
# Stack horizontally (side by side)
print(np.hstack((a,b)))
# Stack vertically (one on top of the other)
print(np.vstack((a,b)))

[[5. 8.]
 [4. 5.]]
---------
[[5. 6.]
 [6. 4.]]
---------
[[5. 8. 5. 6.]
 [4. 5. 6. 4.]]
[[5. 8.]
 [4. 5.]
 [5. 6.]
 [6. 4.]]

# Split a matrix
a = np.floor(10*np.random.random((2,12)))
print(a)
print("---------")
# Split the array horizontally (along its columns) into three equal parts
print(np.hsplit(a,3))
print("---------")
# Split at given positions: cut before column index 3 and before column index 4
print(np.hsplit(a,(3,4)))
print("---------")
# Split the array vertically (along its rows) into three equal parts
a = np.floor(10*np.random.random((12,2)))
print(a)
print("---------")
print(np.vsplit(a,3))

[[6. 1. 8. 4. 7. 6. 0. 0. 4. 3. 5. 2.]
 [4. 4. 3. 9. 8. 1. 4. 7. 4. 1. 0. 4.]]
---------
[array([[6., 1., 8., 4.],
       [4., 4., 3., 9.]]), array([[7., 6., 0., 0.],
       [8., 1., 4., 7.]]), array([[4., 3., 5., 2.],
       [4., 1., 0., 4.]])]
---------
[array([[6., 1., 8.],
       [4., 4., 3.]]), array([[4.],
       [9.]]), array([[7., 6., 0., 0., 4., 3., 5., 2.],
       [8., 1., 4., 7., 4., 1., 0., 4.]])]
---------
[[4. 5.]
 [4. 0.]
 [7. 1.]
 [6. 6.]
 [4. 8.]
 [3. 0.]
 [0. 4.]
 [6. 5.]
 [7. 2.]
 [7. 6.]
 [4. 0.]
 [6. 5.]]
---------
[array([[4., 5.],
       [4., 0.],
       [7., 1.],
       [6., 6.]]), array([[4., 8.],
       [3., 0.],
       [0., 4.],
       [6., 5.]]), array([[7., 2.],
       [7., 6.],
       [4., 0.],
       [6., 5.]])]
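`np.hsplit` and `np.vsplit` with an integer require the axis length to divide evenly. When it does not, `np.array_split` is one option that allows unequal parts. A small sketch with a 2x5 array (the names are illustrative):

```python
import numpy as np

a = np.arange(10).reshape(2, 5)

# 5 columns cannot be divided into 3 equal parts
try:
    np.hsplit(a, 3)
except ValueError as err:
    print("hsplit failed:", err)

# array_split accepts an uneven division
parts = np.array_split(a, 3, axis=1)
print([p.shape for p in parts])   # [(2, 2), (2, 2), (2, 1)]

# Stacking the parts back together restores the original
print(np.array_equal(np.hstack(parts), a))   # True
```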

# Copying arrays: plain assignment
a = np.arange(12)
b = a
print(b is a)
b.shape = (3,4)
# After b is changed, a changes too
print(a.shape)
# The two ids are identical: a and b are just different names for the same object
print(id(a))
print(id(b))

True
(3, 4)
1852769593024
1852769593024

# Shallow copy (view)
c = a.view()
print(c is a)
c.shape = (2,6)
print(a.shape)
c[0,4] = 1234
# Changing a value through c also changes a
print(a)
# So c and a are different objects, but they share the same underlying values
print(id(a))
print(id(c))

False
(3, 4)
[[ 0 1 2 3]
[1234 5 6 7]
[ 8 9 10 11]]

# To get a copy that is fully independent of the original
d = a.copy()
print(d is a)
d[0,0] = 9999
print(d)
print(a)

False
[[9999 1 2 3]
[1234 5 6 7]
[ 8 9 10 11]]
[[ 0 1 2 3]
[1234 5 6 7]
[ 8 9 10 11]]
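The three cases (plain assignment, view, copy) can be told apart without comparing ids by hand. The sketch below uses `np.shares_memory` and the `.base` attribute; the variable names are chosen here for illustration.

```python
import numpy as np

a = np.arange(12)
alias = a            # plain assignment: the same object
view = a.view()      # new object, shared data
copy = a.copy()      # new object, separate data

print(alias is a)                            # True
print(view is a, np.shares_memory(view, a))  # False True
print(copy is a, np.shares_memory(copy, a))  # False False

# A view remembers the array that owns its data
print(view.base is a)                        # True

# Slices are views as well, so writing through one changes a
a[:3][0] = 99
print(a[0], copy[0])                         # 99 0
```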

data = np.sin(np.arange(20)).reshape(5,4)
print(data)
# To find where the largest value is, return its index; axis=0 looks down each column
ind = data.argmax(axis=0)
# For the first column the largest value is at row index 2, and so on
print(ind)
# The indices can then be used to pick out the corresponding values
data_max = data[ind,range(data.shape[1])]
print(data_max)

[[ 0. 0.84147098 0.90929743 0.14112001]
[-0.7568025 -0.95892427 -0.2794155 0.6569866 ]
[ 0.98935825 0.41211849 -0.54402111 -0.99999021]
[-0.53657292 0.42016704 0.99060736 0.65028784]
[-0.28790332 -0.96139749 -0.75098725 0.14987721]]
[2 0 3 1]
[0.98935825 0.84147098 0.99060736 0.6569866 ]
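As a sanity check, the values picked out through the `argmax` indices should match `data.max(axis=0)`. The sketch below also shows one way to locate the single largest element of the whole array with `np.unravel_index`; that step is an addition here, not something the example above does.

```python
import numpy as np

data = np.sin(np.arange(20)).reshape(5, 4)

ind = data.argmax(axis=0)
picked = data[ind, np.arange(data.shape[1])]
print(np.array_equal(picked, data.max(axis=0)))   # True

# Without an axis, argmax returns an index into the flattened array
flat_index = data.argmax()
row, col = np.unravel_index(flat_index, data.shape)
print(row, col)   # 3 2
```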

a = np.arange(0, 40, 10)
print(a)
# Tile a: repeat it 2 times along the rows and 2 times along the columns
b = np.tile(a,(2,2))
print(b)
c = np.tile(a,(4,3))
print(c)

[ 0 10 20 30]
[[ 0 10 20 30 0 10 20 30]
[ 0 10 20 30 0 10 20 30]]
[[ 0 10 20 30 0 10 20 30 0 10 20 30]
[ 0 10 20 30 0 10 20 30 0 10 20 30]
[ 0 10 20 30 0 10 20 30 0 10 20 30]
[ 0 10 20 30 0 10 20 30 0 10 20 30]]
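`np.tile` is easy to confuse with `np.repeat`: `tile` repeats the whole array as a block, while `repeat` repeats each element in place. A short comparison on a three-element array:

```python
import numpy as np

a = np.array([0, 10, 20])

print(np.tile(a, 2))      # [ 0 10 20  0 10 20]
print(np.repeat(a, 2))    # [ 0  0 10 10 20 20]

# A tuple gives the repetitions per axis: 2 copies down, 1 across
print(np.tile(a, (2, 1)).shape)   # (2, 3)
```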

a = np.array([[4, 3, 5], [1, 2, 1]])
# Sort the array; axis=1 sorts within each row
b = np.sort(a,axis=1)
print(b)

[[3 4 5]
[1 1 2]]

a = np.array([4, 3, 1, 2])
# Sort the values in ascending order and return the indices of the sorted elements
j = np.argsort(a)
print(j)
print("----------")
# Index the array with those indices to get the sorted result
print(a[j])

[2 3 1 0]
----------
[1 2 3 4]
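The index array from `argsort` is most useful when a second array has to follow the same ordering. In the sketch below, `scores` and `names` are hypothetical parallel arrays; reversing the index array gives descending order.

```python
import numpy as np

scores = np.array([4, 3, 1, 2])
names = np.array(["d", "c", "a", "b"])   # names[i] belongs to scores[i]

order = np.argsort(scores)
print(names[order])          # ['a' 'b' 'c' 'd']

# Reverse the indices for descending order
descending = order[::-1]
print(scores[descending])    # [4 3 2 1]
print(names[descending])     # ['d' 'c' 'b' 'a']
```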
