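NumPy is the core library for scientific computing in Python. Most of Python's scientific stack, including pandas and scikit-learn (sklearn), is built on top of NumPy. Together with SciPy it provides Python's numerical and mathematical computing capabilities, with a rich and extensive set of function interfaces.
In this course we focus on the following points:
1. Defining and using arrays
2. Indexing and selecting array elements
3. Array computations
4. Linear algebra computations
Arrays
An array stores a sequence of values that all have the same type and is indexed by non-negative integers. The number of dimensions is the rank of the array.
We can create an array from a Python list and use square brackets to index into it and retrieve elements.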
import numpy as np
a = np.array([1,3,4,6,10])
print(a)
print(a.size)
print(a.shape)
print(a[2])
[ 1 3 4 6 10]
5
(5,)
4
# 3-D array (the extra outer pair of brackets adds a leading axis of length 1)
b = np.array([[[1,2,3,4],[5,6,7,8]]])
print(b.shape)
#b[0,1]
(1, 2, 4)
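As a small sketch of the rank and same-type points above, `ndim` reports the number of dimensions and `dtype` the shared element type. The variable names `vec`, `mat`, `cube`, and `mixed` are illustrative and not part of the original notebook.

```python
import numpy as np

vec = np.array([1, 3, 4, 6, 10])                  # 1-D: a single axis
mat = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])      # 2-D: rows and columns
cube = np.array([[[1, 2, 3, 4], [5, 6, 7, 8]]])   # 3-D: one extra pair of brackets

print(vec.ndim, mat.ndim, cube.ndim)   # number of dimensions of each array
print(mat.shape, cube.shape)           # length along each axis
print(mat[0, 1])                       # row 0, column 1

# Mixing ints and floats upcasts every element to one common float type.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)
```

Counting the opening brackets at the start of the literal is a quick way to predict `ndim` before running the code.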
Creating arrays
NumPy provides built-in functions for creating some special arrays.
np.zeros(3)
array([ 0., 0., 0.])
np.ones([3,3])
array([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]])
b.shape
(1, 2, 4)
np.zeros_like(b)
array([[[0, 0, 0, 0],
[0, 0, 0, 0]]])
np.eye(3)
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
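The outputs above show floats for `np.zeros` and `np.ones` but integers for `np.zeros_like(b)`. A minimal sketch of why, assuming `b` is the integer array defined earlier (the names `z_default`, `z_int`, `z_like`, and `identity` are illustrative):

```python
import numpy as np

b = np.array([[[1, 2, 3, 4], [5, 6, 7, 8]]])

z_default = np.zeros(3)           # float64 unless a dtype is given
z_int = np.zeros(3, dtype=int)    # request integers explicitly
z_like = np.zeros_like(b)         # copies both the shape and the dtype of b

print(z_default.dtype, z_int.dtype, z_like.dtype)
print(z_like.shape)

# np.eye(n) is the n x n identity matrix: ones on the diagonal, zeros elsewhere.
identity = np.eye(3)
print(identity.sum(), np.trace(identity))
```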
Common array attributes and methods
- Statistical computations
- Sorting
- Finding indices by value (argsort, argmax)
- Conditional lookup
- shape
a = np.random.rand(3,4)
a.shape
(3, 4)
a.size
12
len(a)
3
a
array([[ 0.36963134, 0.12590815, 0.52912576, 0.38604634],
[ 0.98066039, 0.93271032, 0.30694261, 0.58081517],
[ 0.85971519, 0.89180773, 0.39815457, 0.73372857]])
np.sum(a)
np.sum(a,axis = 1)
np.sum(a,axis = 0)
array([ 2.21000691, 1.95042619, 1.23422294, 1.70059008])
np.mean(a)
np.std(a)
0.27852909235886786
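Because the array above is random, the effect of `axis` is easier to check on fixed numbers. The following is a small sketch with a hypothetical array `m`: `axis=0` collapses the rows (one result per column) and `axis=1` collapses the columns (one result per row).

```python
import numpy as np

m = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])   # shape (3, 4)

total = np.sum(m)                 # all 12 elements
row_sums = np.sum(m, axis=1)      # one value per row, shape (3,)
col_sums = np.sum(m, axis=0)      # one value per column, shape (4,)

print(total)        # 78
print(row_sums)     # sums 10, 26, 42
print(col_sums)     # sums 15, 18, 21, 24

# np.mean and np.std accept the same axis argument.
print(np.mean(m, axis=0))
```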
a
array([[ 0.36963134, 0.12590815, 0.52912576, 0.38604634],
[ 0.98066039, 0.93271032, 0.30694261, 0.58081517],
[ 0.85971519, 0.89180773, 0.39815457, 0.73372857]])
# Sort each row (axis=1)
np.sort(a,axis = 1)
array([[ 0.12590815, 0.36963134, 0.38604634, 0.52912576],
[ 0.30694261, 0.58081517, 0.93271032, 0.98066039],
[ 0.39815457, 0.73372857, 0.85971519, 0.89180773]])
# Returns the indices that would sort this array.
a.argsort()
array([[1, 0, 3, 2],
[2, 3, 1, 0],
[2, 3, 0, 1]], dtype=int64)
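One way to see what the `argsort` indices mean is to apply them back to the array. This sketch uses a hypothetical integer array `m` and `np.take_along_axis` (available in NumPy 1.15 and later); it is one possible illustration, not the notebook's own code.

```python
import numpy as np

m = np.array([[3, 1, 2],
              [9, 7, 8]])

order = m.argsort(axis=1)          # per-row positions, smallest value first
print(order)

# Applying the indices row by row reproduces np.sort(m, axis=1).
reordered = np.take_along_axis(m, order, axis=1)
print(reordered)

# Reversing each row of indices gives a descending order instead.
descending = np.take_along_axis(m, order[:, ::-1], axis=1)
print(descending)
```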
a
array([[ 0.36963134, 0.12590815, 0.52912576, 0.38604634],
[ 0.98066039, 0.93271032, 0.30694261, 0.58081517],
[ 0.85971519, 0.89180773, 0.39815457, 0.73372857]])
# Returns the indices of the maximum values along an axis.
np.argmax(a,axis = 1)
array([2, 0, 1], dtype=int64)
np.max(a,axis = 1)
array([ 0.52912576, 0.98066039, 0.89180773])
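`argmax` returns positions while `max` returns values, and the two are linked by indexing. A minimal sketch on a hypothetical array `m` (the helper names `rows`, `best_col`, and `best_val` are illustrative):

```python
import numpy as np

m = np.array([[0.2, 0.9, 0.4],
              [0.7, 0.1, 0.3]])

best_col = np.argmax(m, axis=1)    # column index of the maximum in each row
rows = np.arange(m.shape[0])       # row indices 0, 1
best_val = m[rows, best_col]       # pick one element per row

print(best_col)
print(best_val)
print(np.array_equal(best_val, np.max(m, axis=1)))
```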
a
array([[ 0.36963134, 0.12590815, 0.52912576, 0.38604634],
[ 0.98066039, 0.93271032, 0.30694261, 0.58081517],
[ 0.85971519, 0.89180773, 0.39815457, 0.73372857]])
# Return elements, either from `x` or `y`, depending on `condition`.
# If only `condition` is given, return ``condition.nonzero()``
np.where(a>0.5)
(array([0, 1, 1, 1, 2, 2, 2], dtype=int64),
array([2, 0, 1, 3, 0, 1, 3], dtype=int64))
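The tuple returned by the one-argument `np.where` holds one index array per dimension, so the two arrays pair up as (row, column) coordinates. The sketch below, on a hypothetical array `m`, also shows the three-argument form mentioned in the comment above.

```python
import numpy as np

m = np.array([[0.1, 0.6],
              [0.8, 0.3]])

rows, cols = np.where(m > 0.5)     # one index array per dimension
print(list(zip(rows.tolist(), cols.tolist())))   # (row, col) pairs
print(m[rows, cols])               # the matching values

# Three-argument form: take the 2nd argument where True, otherwise the 3rd.
flags = np.where(m > 0.5, 1.0, 0.0)
print(flags)
```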
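Random numbers
NumPy can generate random numbers according to specified rules. Random numbers are used frequently later on, in probability theory and data mining.
See the official numpy.random documentation page.
Some commonly used functions: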
- rand(d0, d1, …, dn) Random values in a given shape.
- randn(d0, d1, …, dn) Return a sample (or samples) from the “standard normal” distribution.
- randint(low[, high, size, dtype]) Return random integers from low (inclusive) to high (exclusive).
- random([size]) Return random floats in the half-open interval [0.0, 1.0).
- sample([size]) Return random floats in the half-open interval [0.0, 1.0).
- choice(a[, size, replace, p]) Generates a random sample from a given 1-D array
np.random.rand(10)
np.random.rand(3,4)
array([[ 0.66871582, 0.41359784, 0.06186174, 0.91262814],
[ 0.10415888, 0.74117872, 0.28998329, 0.73763488],
[ 0.76904933, 0.92487812, 0.9111976 , 0.00709124]])
np.random.randn(10)
array([ 2.45296079, 1.59713311, 0.84757927, 0.27085421, 0.62772085,
-0.02441075, -1.79474675, -0.869072 , -0.74012579, 0.34411744])
np.random.randint(10)
np.random.randint(1,10,size = (3,4))
array([[4, 6, 5, 8],
[8, 6, 7, 5],
[6, 7, 4, 1]])
np.random.random((2,2))
array([[ 0.75932906, 0.71121568],
[ 0.90087898, 0.48370479]])
np.random.choice(10,(3,4))
array([[5, 7, 2, 6],
[1, 5, 3, 1],
[8, 1, 8, 7]])
np.random.choice([1,4,5,7.08],(3,4))
array([[ 5. , 7.08, 7.08, 4. ],
[ 4. , 4. , 7.08, 4. ],
[ 7.08, 7.08, 7.08, 7.08]])
?np.random.choice
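The outputs above change on every run. A minimal sketch of making them repeatable with `np.random.seed`, plus the `replace` and `p` parameters of `choice` listed earlier. The seed value 42 and the probabilities are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(42)                 # fix the state of the global generator
first = np.random.rand(3)
np.random.seed(42)                 # same seed, same sequence
second = np.random.rand(3)
print(np.array_equal(first, second))

# replace=False draws without repetition; p sets a probability per item.
unique_draw = np.random.choice(10, size=5, replace=False)
weighted = np.random.choice([1, 4, 5], size=1000, p=[0.8, 0.1, 0.1])
print(unique_draw)
print((weighted == 1).mean())      # close to 0.8
```

Seeding is mainly useful for teaching material and tests, where the same numbers should appear on every run.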
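Array indexing
Slicing works much like it does for a list, but an array can have multiple dimensions, so we specify a slice for each dimension.
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]]) # 2-D array, shape = (3, 4)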
a[1:3,0:1]
#a[:,:1]
array([[5],
[9]])
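Two details of slicing are worth a quick sketch: a slice keeps the dimension it is applied to, and a basic slice is a view onto the original data rather than a copy. The names `col_slice`, `col_int`, `block`, and `safe` are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

col_slice = a[:, 0:1]   # slices on both axes: still 2-D, shape (3, 1)
col_int = a[:, 0]       # integer on one axis: that axis is dropped, shape (3,)
print(col_slice.shape, col_int.shape)

# A basic slice is a view: writing through it changes the original array.
block = a[1:3, 0:1]
block[0, 0] = 50
print(a[1, 0])          # 50

# Use .copy() when an independent array is needed.
safe = a[1:3, 0:1].copy()
safe[0, 0] = -1
print(a[1, 0])          # still 50
```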
Integer indexing
a[[1,2],[0,1]]
array([ 5, 10])
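With two index lists, the entries are paired up position by position, so `a[[1,2],[0,1]]` selects elements (1, 0) and (2, 1) rather than a 2 x 2 block. A small sketch of that contrast (the names `picked`, `same`, and `block` are illustrative):

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

picked = a[[1, 2], [0, 1]]               # elements (1, 0) and (2, 1)
same = np.array([a[1, 0], a[2, 1]])
print(picked, np.array_equal(picked, same))

# Contrast: slicing the same rows and columns returns the whole 2 x 2 block.
block = a[1:3, 0:2]
print(block)

# Integer-array indexing returns a copy, so edits do not reach the original.
picked[0] = 99
print(a[1, 0])   # 5
```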
Boolean indexing
a >4
a[a>4]
array([ 5, 6, 7, 8, 9, 10, 11, 12])
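Boolean masks can be counted, combined, and used for assignment as well as selection. A minimal sketch, with `mask`, `even_large`, and `capped` as illustrative names:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

mask = a > 4                       # boolean array with the same shape as a
print(mask.sum())                  # number of True entries: 8

# Combine conditions with & and |, wrapping each comparison in parentheses.
even_large = a[(a > 4) & (a % 2 == 0)]
print(even_large)                  # values 6, 8, 10, 12

# A mask on the left-hand side assigns in place.
capped = a.copy()
capped[capped > 10] = 10
print(capped[2])                   # values 9, 10, 10, 10
```

The result of `a[mask]` is always 1-D, because the selected elements need not form a rectangular block.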
[Figure: indexing illustrated]
Array math
a = np.random.random([3,4])
b = np.random.random([3,4])
a
array([[ 0.76004347, 0.4258463 , 0.08326275, 0.93285095],
[ 0.87100438, 0.89512213, 0.66405053, 0.37225536],
[ 0.59545034, 0.41663924, 0.51195997, 0.77346328]])
a + 2
array([[ 2.76004347, 2.4258463 , 2.08326275, 2.93285095],
[ 2.87100438, 2.89512213, 2.66405053, 2.37225536],
[ 2.59545034, 2.41663924, 2.51195997, 2.77346328]])
a * 10
array([[ 7.60043471, 4.25846302, 0.83262751, 9.32850955],
[ 8.71004379, 8.95122127, 6.64050534, 3.72255362],
[ 5.95450342, 4.16639238, 5.11959969, 7.73463281]])
b
array([[ 0.56674767, 0.83059901, 0.08406071, 0.60134785],
[ 0.68305575, 0.85945331, 0.50625002, 0.65044408],
[ 0.00539243, 0.39640508, 0.43254736, 0.94011285]])
# Elementwise
a + b
a - b
a * b
a / b
array([[ 1.34106147, 0.51269782, 0.99050729, 1.5512668 ],
[ 1.27515856, 1.04150175, 1.31170472, 0.57230955],
[ 110.42337422, 1.05104414, 1.18359286, 0.82273451]])
# Elementwise
np.add(a,b)
np.subtract(a,b)
np.multiply(a,b)
np.divide(a,b)
array([[ 1.34106147, 0.51269782, 0.99050729, 1.5512668 ],
[ 1.27515856, 1.04150175, 1.31170472, 0.57230955],
[ 110.42337422, 1.05104414, 1.18359286, 0.82273451]])
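The elementwise behaviour is easier to verify by hand on small fixed arrays. In this sketch the arrays `x` and `y` are hypothetical; it checks that each operator and its function form give the same result.

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[10.0, 20.0], [30.0, 40.0]])

print(x + 2)    # the scalar is applied to every element
print(x * y)    # element (i, j) of x times element (i, j) of y

# Each operator has a function form that returns the same result.
print(np.array_equal(x + y, np.add(x, y)))
print(np.array_equal(x - y, np.subtract(x, y)))
print(np.array_equal(x * y, np.multiply(x, y)))
print(np.array_equal(x / y, np.divide(x, y)))
```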
* is an elementwise operation, not matrix multiplication. To compute the matrix product (inner product) we use the dot function.
# shape(a) = 3*4 shape(b.T) = 4*3
a.dot(b.T) # (3*4) * (4*3) = 3 * 3
np.dot(a,b.T)
array([[ 1.35242743, 1.53406623, 1.08590637],
[ 1.51680278, 1.94256711, 0.99672314],
[ 1.19168644, 1.52708211, 1.11695854]])
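To make the difference between `*` and `dot` concrete, here is a sketch on two small hypothetical matrices `p` and `q`, followed by the same shape check as above on arrays of ones. The `@` operator shown is standard Python 3.5+ syntax for the matrix product and is not used in the original notebook.

```python
import numpy as np

p = np.array([[1, 2], [3, 4]])
q = np.array([[5, 6], [7, 8]])

elementwise = p * q        # [[1*5, 2*6], [3*7, 4*8]]
matrix = p.dot(q)          # rows of p combined with columns of q
print(elementwise)
print(matrix)

# The @ operator is another spelling of the matrix product.
print(np.array_equal(matrix, p @ q))

# The inner dimensions must agree: (3, 4) times (4, 3) gives (3, 3).
a = np.ones((3, 4))
b = np.ones((3, 4))
print(a.dot(b.T).shape)
```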
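Linear algebra
NumPy and SciPy can perform linear algebra computations, but we have not yet covered the underlying linear algebra theory. This section will therefore be moved to the linear algebra theory chapter and explained there.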