Python Study Notes (3): NumPy, a Powerful Tool for Numerical Computing
(First published: 2018-01-12 18:33:52; updated: 2018-01-12 18:33:56)
The multidimensional array object
1. Creating an ndarray
The simplest way to create an array is with the array function.
Creating an array
import numpy as np
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
arr1
array([ 6. , 7.5, 8. , 0. , 1. ])
Nested sequences, such as a list of equal-length lists, are converted into a multidimensional array:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2
array([[1, 2, 3, 4], [5, 6, 7, 8]])
arr2.shape
(2, 4)
arr2.dtype
dtype('int64')
arr2.ndim
2
arr2.itemsize
8
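Besides np.array, there are many other functions for creating new arrays. For example, zeros and ones create arrays of 0s and 1s, respectively, with a given length or shape. empty creates an array without initializing its values to anything in particular, so its contents are whatever happened to be in memory.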
zero_array = np.zeros(10)
zero_array
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
zero_array2d = np.zeros((2,3))
zero_array2d
array([[ 0., 0., 0.], [ 0., 0., 0.]])
zero_array2d.shape
(2, 3)
zero_array2d.ndim
2
zero_array3d = np.empty((2, 3, 4))
zero_array3d
array([[[ 6.92462113e-310, 4.67814784e-310, 1.05089879e-153,
2.65862903e-260],
[ 1.27827550e-152, 1.39806876e-152, 6.07886271e+247,
1.36722786e+161],
[ 3.55455412e+180, 6.19640467e+223, 2.25563599e-153,
7.20310877e+252]],
[[ 9.01700787e+223, 1.96567740e-062, 1.94299459e-109,
1.35717430e+131],
[ 8.41799468e-053, 5.46454592e-095, 6.01347002e-154,
8.41799468e-053],
[ 5.46454592e-095, 6.01347002e-154, 8.41799468e-053,
5.55994737e+141]]])
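For a 3-D array, the size of the innermost dimension is the rightmost entry of shape, and the size of the outermost dimension is the leftmost entry.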
zero_array3d.shape
(2, 3, 4)
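A minimal sketch of a few related creation functions (np.ones, np.full and np.zeros_like are standard NumPy, but they are not used in the original notes). They follow the same shape conventions as zeros. Because np.empty leaves memory uninitialized, as the output above shows, it is only safe when every element is overwritten before being read:

```python
import numpy as np

ones_2d = np.ones((2, 3))            # 2x3 array of 1.0
sevens = np.full((2, 2), 7)          # 2x2 array filled with 7
zeros_like = np.zeros_like(ones_2d)  # same shape and dtype as ones_2d, all zeros

# np.empty only allocates memory; fill it before reading from it
buf = np.empty((2, 3, 4))
buf[...] = 0.0

print(ones_2d.shape, sevens.shape, zeros_like.shape, buf.shape)
```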
arange is an array-valued version of Python's built-in range function:
arange_array = np.arange(15)
arange_array
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
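# np.arange(15).shape(3,5) raises a TypeError: shape is an attribute (a tuple), not a method; use reshape instead
#arange_array = np.arange(15).shape(3,5)
#arange_array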
arange_array = np.arange(15).reshape(3,5)
arange_array
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
Now try it with two arguments. The first argument is the start and the second is the (exclusive) stop:
arange_array = np.arange(2,15)
arange_array
array([ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
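arange also accepts a third argument, the step. The sketch below shows this and two related conveniences that go beyond the original notes: reshape(-1) and np.linspace. With -1 in a reshape, NumPy infers that dimension. For fractional spacing, np.linspace lets you state the number of points explicitly:

```python
import numpy as np

evens = np.arange(0, 10, 2)          # start, stop (exclusive), step -> [0 2 4 6 8]
countdown = np.arange(5, 0, -1)      # a negative step counts down -> [5 4 3 2 1]
grid = np.arange(15).reshape(3, -1)  # -1: NumPy works out the second size (5)

# with fractional steps, linspace fixes the number of points and includes the stop
halves = np.linspace(0, 1, 5)        # 0, 0.25, 0.5, 0.75, 1.0
print(evens, countdown, grid.shape, halves)
```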
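2. ndarray data types
You surely already know what a data type is and how to set and query one in Python, but it is worth stressing here: dtypes are a large part of what makes NumPy so powerful and flexible. In most cases they map directly onto an underlying machine representation. That makes it easy to read and write binary streams to disk and to interface with code written in low-level languages such as C or Fortran.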
arr1 = np.array([1, 2, 3], dtype=np.float64)
arr2 = np.array([1, 2, 3], dtype=np.int32)
arr1.dtype
arr2.dtype
dtype('float64')
dtype('int32')
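The claim that dtypes map directly onto machine representations can be checked from Python. The sketch below uses the standard itemsize, nbytes, tobytes and frombuffer APIs (not used in the original notes). An int32 array occupies exactly 4 bytes per element, and its raw bytes round-trip as long as the reader knows the dtype:

```python
import numpy as np

arr32 = np.array([1, 2, 3], dtype=np.int32)
print(arr32.itemsize)  # 4 bytes per element
print(arr32.nbytes)    # 3 elements * 4 bytes = 12

# the raw bytes can be written out and read back, provided the dtype is known
raw = arr32.tobytes()
restored = np.frombuffer(raw, dtype=np.int32)
print(restored)        # [1 2 3]
```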
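- Displaying the results of multiple lines
Annoyingly, unless you change a setting, only the last expression in a cell is displayed. So you see only "dtype('int32')", and the dtype of arr1 is never shown. What can you do?
Open a terminal and run "gedit ~/.ipython/profile_default/ipython_config.py". If the file does not exist, create it first with "ipython profile create".
Add: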
c = get_config()
# Run all nodes interactively
c.InteractiveShell.ast_node_interactivity = "all"
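That's it. Restart the Jupyter server and run the statements above again: every result is now displayed!
- Opening help
We often come across functions whose usage or syntax we don't know, and their help can be shown right in the notebook. Put ? in front of a library, method, or variable name to open its documentation: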
?np.arange
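- Running Python files
Excited yet? Here's something even better: running Python files!
What's so exciting about that? Isn't that what we've been doing all along? Copy, paste, run!
No, I mean calling an existing .py file directly instead of pasting its contents in, which is clunky.
%run runs Python code in .py files, as everyone knows. Less well known is that it can also run other Jupyter notebook files, which is very useful.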
%run hello.py
this sentence is printed by hello.py
%run newhello.ipynb
new hello
Stored 'myworldstr' (str)
NameError                                 Traceback (most recent call last)
/media/lucky/B4FE-5315/wt/study/python/DOC/python_basic/newhello.ipynb in ()
      1 del myworldstr
----> 2 myworldstr
NameError: name 'myworldstr' is not defined
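- %load: inserting code from an external script
This replaces the contents of the current cell with an external script. The source can be a file on your computer or a URL.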
# %load hello.py
print('hello')
hello
if __name__ == "__main__":
    print("Hello World!")
Hello World!
- %store: storing variables
%store lets you pass variables between notebooks. See newhello.ipynb.
%store -r myworldstr
myworldstr
'hello lucky'
- %who: list all global variables
%who
arange_array arr1 arr2 data1 data2 myworldstr np zero_array zero_array2d
zero_array3d
%who str
myworldstr
- Timing
%%time
%%time
import time
for _ in range(1000):
    time.sleep(0.01)  # sleep for 0.01 seconds
CPU times: user 40 ms, sys: 16 ms, total: 56 ms
Wall time: 10.1 s
%timeit
%timeit np.random.normal(size=100)
12.2 µs ± 92.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
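%timeit only works inside IPython or Jupyter. In a plain Python script, the standard-library timeit module gives a comparable measurement. The sketch below fixes the loop count at 1000 instead of choosing it automatically, and it reports the best of 7 runs rather than %timeit's mean ± std. dev., so the numbers will not match exactly and will vary by machine:

```python
import timeit

# 7 repetitions of 1000 calls each
times = timeit.repeat(
    "np.random.normal(size=100)",
    setup="import numpy as np",
    repeat=7,
    number=1000,
)
per_call = min(times) / 1000  # best run, in seconds per call
print(f"{per_call * 1e6:.1f} µs per call (best of 7)")
```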
%prun np.random.normal(size=100)
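# put the function you suspect of misbehaving here; with %pdb on, the debugger opens automatically when an exception is raised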
#%pdb
#def pick_and_take():
# picked = np.random.randint(0, 10)
# raise NotImplementedError()
#pick_and_take()
LaTeX formulas
$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
$f(x) = 3x + 7$
Running code with different kernels inside a notebook
%%bash
%%HTML
%%python2
%%python3
%%ruby
%%perl
%%python2
print 'hello'
hello
#%load_ext Cpp
#%load_ext rpy2.ipython
%lsmagic
Available line magics:
%alias %alias_magic %autocall %automagic %autosave %bookmark %cat %cd %clear %colors %config %connect_info %cp %debug %dhist %dirs %doctest_mode %ed %edit %env %gui %hist %history %killbgscripts %ldir %less %lf %lk %ll %load %load_ext %loadpy %logoff %logon %logstart %logstate %logstop %ls %lsmagic %lx %macro %magic %man %matplotlib %mkdir %more %mv %notebook %page %pastebin %pdb %pdef %pdoc %pfile %pinfo %pinfo2 %popd %pprint %precision %profile %prun %psearch %psource %pushd %pwd %pycat %pylab %qtconsole %quickref %recall %rehashx %reload_ext %rep %rerun %reset %reset_selective %rm %rmdir %run %save %sc %set_env %store %sx %system %tb %time %timeit %unalias %unload_ext %who %who_ls %whos %xdel %xmode
Available cell magics:
%%! %%HTML %%SVG %%bash %%capture %%debug %%file %%html %%javascript %%js %%latex %%markdown %%perl %%prun %%pypy %%python %%python2 %%python3 %%ruby %%script %%sh %%svg %%sx %%system %%time %%timeit %%writefile
Automagic is ON, % prefix IS NOT needed for line magics.
%load_ext Cython
%%cython
def multiply_by_2(float x):
    return 2.0 * x
multiply_by_2(23.)
46.0
%prun print('hello')
hello
- Running shell commands
!ls
hello.py pythonstudy_Numpy.ipynb some_array.npy
newhello.ipynb Python study.odt Untitled.ipynb
Python study.docx PythonStudy_Pandas.html
pythonstudy_Numpy.html PythonStudy_Pandas.ipynb
!pip list | grep pandas
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
pandas (0.21.0)
Jumping to an anchor
Hello jump
You can explicitly convert, or cast, an array from one dtype to another with the ndarray astype method:
arr = np.array([1, 2, 3, 4, 5])
arr.dtype
arr.astype(np.float64)  # returns a new array; arr itself is not changed
arr.dtype
float_arr = arr.astype(np.float64)
float_arr.dtype
dtype('int64')
array([ 1., 2., 3., 4., 5.])
dtype('int64')
dtype('float64')
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
arr
arr.astype(np.int32)  # the fractional part is truncated toward zero
arr.dtype
array([ 3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
array([ 3, -1, -2, 0, 12, 10], dtype=int32)
dtype('float64')
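numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)  # np.string_ in older NumPy; removed in NumPy 2.0
numeric_strings
numeric_strings.astype(float)  # np.float was removed in NumPy 1.24; use float or np.float64
array([b'1.25', b'-9.6', b'42'], dtype='|S4')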
array([ 1.25, -9.6 , 42. ])
int_array = np.arange(10)
calibers = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)
int_array.astype(calibers.dtype)
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
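To summarize the astype behavior above in one runnable sketch (values chosen for illustration): by default astype returns a new array, float-to-int casts truncate toward zero rather than rounding, and numeric strings can be parsed directly:

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = ints.astype(np.float64)   # a new array by default
floats[0] = 99.0
print(ints)                        # [1 2 3]: the original is untouched

# float -> int conversion truncates toward zero; it does not round
truncated = np.array([3.7, -1.2, -2.6]).astype(np.int32)
print(truncated)                   # [ 3 -1 -2]

# strings that look like numbers can be parsed with astype
parsed = np.array(['1.25', '-9.6', '42']).astype(float)
print(parsed)
```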
Arithmetic between arrays of the same size is applied element-wise:
int_array = np.arange(10)
int_array2 = int_array * int_array
int_array
int_array2
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
arr = np.arange(10)
arr
arr[5:8] =12
arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
array([ 0, 1, 2, 3, 4, 12, 12, 12, 8, 9])
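An important point about basic slicing: unlike Python list slices, an array slice is a view on the original data, not a copy. Modifying the slice therefore modifies the original array. The sketch below shows this, plus .copy() for getting an independent array:

```python
import numpy as np

arr = np.arange(10)
arr_slice = arr[5:8]   # a view: no data is copied
arr_slice[1] = 12345   # writes through to arr, so arr[6] is now 12345
print(arr)

safe = arr[5:8].copy() # an explicit copy is independent of arr
safe[:] = -1
print(arr[5:8])        # still 5, 12345, 7
```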
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d[2][1]
8
arr2d
arr2d[:2]  # rows 0 and 1: axis 0 up to, but not including, index 2
arr2d[:2,1:]  # axis 0 up to index 2, axis 1 from index 1 to the end
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
array([[1, 2, 3],
[4, 5, 6]])
array([[2, 3],
[5, 6]])
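arr2d[2][1] above indexes in two steps. A single comma-separated index gives the same element. Mixing an integer with a slice drops that dimension, while a slice keeps it. A short sketch:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
a = arr2d[2][1]         # two steps: take row 2, then element 1 of that row
b = arr2d[2, 1]         # one step with an index tuple
print(a, b)             # 8 8

col = arr2d[:, 1]       # every row, column 1 -> 1-D array [2 5 8]
col_2d = arr2d[:, 1:2]  # a slice keeps the dimension -> shape (3, 1)
print(col, col_2d.shape)
```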
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.random.randn(7, 4)
names
data
names == 'Bob'
data[names == 'Bob',1:]
array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'],
dtype='<U4')
array([[ 0.09318088, 2.34566421, -0.36704993, -0.50920889],
[-0.65558474, 0.30480291, -0.71754155, 0.33042249],
[ 0.47818558, -1.02463247, -1.48671785, 0.3632606 ],
[-0.03603778, -0.89355894, -0.60354252, -0.30880452],
[ 1.70789727, -2.13951066, 2.449766 , -0.82311206],
[ 1.23073682, 1.12839591, -0.56178105, 1.93335746],
[ 1.42005936, -1.0114898 , -0.20086095, -0.15490157]])
array([ True, False, False, True, False, False, False], dtype=bool)
array([[ 2.34566421, -0.36704993, -0.50920889],
[-0.89355894, -0.60354252, -0.30880452]])
To select everything except 'Bob', you can either use != or negate the condition with ~. Multiple boolean conditions can be combined with the boolean operators & (and) and | (or).
```python
names != 'Bob'
data[~(names == 'Bob')]
```
array([False, True, True, False, True, True, True], dtype=bool)
array([[-0.65558474, 0.30480291, -0.71754155, 0.33042249],
[ 0.47818558, -1.02463247, -1.48671785, 0.3632606 ],
[ 1.70789727, -2.13951066, 2.449766 , -0.82311206],
[ 1.23073682, 1.12839591, -0.56178105, 1.93335746],
[ 1.42005936, -1.0114898 , -0.20086095, -0.15490157]])
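Combining conditions with | is sketched below. To keep the result reproducible, data is replaced here by a deterministic stand-in (np.arange) instead of the random data above. Each comparison needs its own parentheses, because & and | bind more tightly than ==. Python's `and` and `or` do not work element-wise on arrays:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.arange(28).reshape(7, 4)   # deterministic stand-in for the random data

mask = (names == 'Bob') | (names == 'Will')   # parentheses are required
print(data[mask].shape)              # (4, 4): rows 0, 2, 3 and 4

not_bob = ~(names == 'Bob')          # same as names != 'Bob'
print(not_bob.sum())                 # 5
```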
To set all negative values in data to 0, we only need:
data[data < 0] = 0
data
data[names != 'Joe'] = 7
data
data[names == 'Joe'] = 1
data
array([[ 0.09318088, 2.34566421, 0. , 0. ],
[ 0. , 0.30480291, 0. , 0.33042249],
[ 0.47818558, 0. , 0. , 0.3632606 ],
[ 0. , 0. , 0. , 0. ],
[ 1.70789727, 0. , 2.449766 , 0. ],
[ 1.23073682, 1.12839591, 0. , 1.93335746],
[ 1.42005936, 0. , 0. , 0. ]])
array([[ 7. , 7. , 7. , 7. ],
[ 0. , 0.30480291, 0. , 0.33042249],
[ 7. , 7. , 7. , 7. ],
[ 7. , 7. , 7. , 7. ],
[ 7. , 7. , 7. , 7. ],
[ 1.23073682, 1.12839591, 0. , 1.93335746],
[ 1.42005936, 0. , 0. , 0. ]])
array([[ 7., 7., 7., 7.],
[ 1., 1., 1., 1.],
[ 7., 7., 7., 7.],
[ 7., 7., 7., 7.],
[ 7., 7., 7., 7.],
[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.]])
arr = np.empty((8, 4))
for i in range(8):
arr[i] = i
arr
arr[[5,4,1,7]]
arr[[-1,-2,-4,-3]]
array([[ 0., 0., 0., 0.],
[ 1., 1., 1., 1.],
[ 2., 2., 2., 2.],
[ 3., 3., 3., 3.],
[ 4., 4., 4., 4.],
[ 5., 5., 5., 5.],
[ 6., 6., 6., 6.],
[ 7., 7., 7., 7.]])
array([[ 5., 5., 5., 5.],
[ 4., 4., 4., 4.],
[ 1., 1., 1., 1.],
[ 7., 7., 7., 7.]])
array([[ 7., 7., 7., 7.],
[ 6., 6., 6., 6.],
[ 4., 4., 4., 4.],
[ 5., 5., 5., 5.]])
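Unlike slicing, fancy indexing (indexing with an integer list) always copies the data into a new array. Assigning through a fancy index, however, does write into the original. A small sketch:

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
picked = arr[[1, 5, 7, 2]]   # fancy indexing returns a copy
picked[0, 0] = -100
original_value = arr[1, 0]
print(original_value)        # 4: arr itself is untouched

# assignment through a fancy index, by contrast, writes into arr
arr[[1, 5]] = 0
print(arr[5])                # [0 0 0 0]
```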
arr = np.arange(32).reshape((8, 4))
arr
arr[[1, 5, 7, 2], [0, 3, 1, 2]]  # picks the elements (1,0), (5,3), (7,1), (2,2)
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23],
[24, 25, 26, 27],
[28, 29, 30, 31]])
array([ 4, 23, 29, 10])
arr[[1, 5, 7, 2]]
array([[ 4, 5, 6, 7],
[20, 21, 22, 23],
[28, 29, 30, 31],
[ 8, 9, 10, 11]])
arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]
# looks complicated, but it becomes clear once split into arr_p = arr[[1,5,7,2]] followed by arr_p[:, [0,3,1,2]]
arr_p=arr[[1,5,7,2]]
arr_p[:,[0,3,1,2]]
array([[ 4, 7, 5, 6],
[20, 23, 21, 22],
[28, 31, 29, 30],
[ 8, 11, 9, 10]])
array([[ 4, 7, 5, 6],
[20, 23, 21, 22],
[28, 31, 29, 30],
[ 8, 11, 9, 10]])
As the outputs show, the two forms give the same result.
The following, which uses np.ix_(), does the same thing:
arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]
array([[ 4, 7, 5, 6],
[20, 23, 21, 22],
[28, 31, 29, 30],
[ 8, 11, 9, 10]])
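Why does np.ix_ give a block while two plain lists give only 4 elements? np.ix_ turns the two lists into index arrays of shape (4, 1) and (1, 4). These broadcast against each other to select every row/column combination. The shapes can be checked directly:

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows, cols = np.ix_([1, 5, 7, 2], [0, 3, 1, 2])
print(rows.shape, cols.shape)             # (4, 1) (1, 4)

block = arr[rows, cols]                   # broadcasts to a (4, 4) block
pairs = arr[[1, 5, 7, 2], [0, 3, 1, 2]]   # pairwise: only 4 elements
print(block.shape, pairs.shape)           # (4, 4) (4,)
```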
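1.7 Transposing arrays and swapping axes
Transposing is a special form of reshaping. Like reshaping, it returns a view on the underlying data without copying anything. Arrays have the transpose method and also the special T attribute: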
arr = np.arange(15).reshape((3, 5))
arr
# the T attribute: transpose
arr.T
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
array([[ 0, 5, 10],
[ 1, 6, 11],
[ 2, 7, 12],
[ 3, 8, 13],
[ 4, 9, 14]])
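The claim that the transpose is a view can be checked with np.shares_memory. Writing through arr.T changes arr too:

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))
t = arr.T
print(t.shape)                   # (5, 3)
print(np.shares_memory(arr, t))  # True: no data was copied

t[0, 2] = -1                     # writing through the view...
print(arr[2, 0])                 # -1 ...changes arr as well
```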
Multiplication
arr_multi=arr*arr
arr_multi
arr1=np.arange(10).reshape((2,5))
arr1
#arr_multi = arr*arr1 fails: shapes (3,5) and (2,5) cannot be broadcast together
arr2 = np.arange(15).reshape((5,3))
arr2
#arr_multi = arr*arr2 also fails: the same number of elements but a different shape, (5,3), still cannot be multiplied directly
arr3 = np.arange(15).reshape((3,5))
arr_multi = arr*arr3  # same shape: the product is taken element by element
arr_multi
array([[ 0, 1, 4, 9, 16],
[ 25, 36, 49, 64, 81],
[100, 121, 144, 169, 196]])
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]])
array([[ 0, 1, 4, 9, 16],
[ 25, 36, 49, 64, 81],
[100, 121, 144, 169, 196]])
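The rule behind "can or cannot be multiplied directly" is NumPy broadcasting. Compare the shapes starting from the last dimension: each pair of sizes must be equal, or one of them must be 1. Arrays of different shapes can therefore be combined when they are compatible. The sketch below shows two compatible cases and the failing (3,5) vs (2,5) case from above:

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))
row = np.arange(5)                  # shape (5,)
col = np.arange(3).reshape((3, 1))  # shape (3, 1)

by_row = arr * row   # (3,5) * (5,): row is stretched over the 3 rows
by_col = arr * col   # (3,5) * (3,1): col is stretched over the 5 columns
print(by_row.shape, by_col.shape)   # (3, 5) (3, 5)

broadcast_failed = False
try:
    arr * np.arange(10).reshape((2, 5))   # 3 vs 2 in the first axis: incompatible
except ValueError as err:
    broadcast_failed = True
    print("ValueError:", err)
```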
arr_dot = np.dot(arr.T,arr)
arr_dot
arr_dot = np.dot(arr,arr.T)
arr_dot
array([[125, 140, 155, 170, 185],
[140, 158, 176, 194, 212],
[155, 176, 197, 218, 239],
[170, 194, 218, 242, 266],
[185, 212, 239, 266, 293]])
array([[ 30, 80, 130],
[ 80, 255, 430],
[130, 430, 730]])
arr is a (3,5) matrix and arr.T is a (5,3) matrix, so arr.T dot arr gives a (5,5) matrix, while arr dot arr.T gives a (3,3) matrix.
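Since Python 3.5 (and NumPy 1.10), the @ operator performs the same matrix product as np.dot for 2-D arrays and often reads more naturally. A sketch checking it against the results above:

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))
gram_5x5 = arr.T @ arr   # (5,3) @ (3,5) -> (5,5), same as np.dot(arr.T, arr)
gram_3x3 = arr @ arr.T   # (3,5) @ (5,3) -> (3,3)
print(gram_5x5.shape, gram_3x3.shape)
print(gram_3x3[0, 0])    # 0*0 + 1*1 + 2*2 + 3*3 + 4*4 = 30
```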
arr
arrT = np.transpose(arr)
arrT
arr.transpose((0,1))  # the identity permutation: the axes keep their order, so the result equals arr
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
array([[ 0, 5, 10],
[ 1, 6, 11],
[ 2, 7, 12],
[ 3, 8, 13],
[ 4, 9, 14]])
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
?arr.transpose
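The help text shown by the command above says "their order indicates how the axes are permuted": the arguments of transpose() give the new order of the axes. So what is the axis order? Simply put, it is the order of the entries in shape. Here is an example (I've been wanting to give one):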
arr = np.arange(24).reshape((4, 3, 2))
arr
arr.transpose((1, 0, 2))
array([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15],
[16, 17]],
[[18, 19],
[20, 21],
[22, 23]]])
array([[[ 0, 1],
[ 6, 7],
[12, 13],
[18, 19]],
[[ 2, 3],
[ 8, 9],
[14, 15],
[20, 21]],
[[ 4, 5],
[10, 11],
[16, 17],
[22, 23]]])
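Here arr has shape (4,3,2), and its default axis order is (0,1,2). transpose((1, 0, 2)) changes the axis order from (0,1,2) to (1,0,2), so the result has shape (3,4,2).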
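The axis-order idea can be stated precisely. Result axis k is original axis axes[k], so its length is arr.shape[axes[k]]. For axes=(1, 0, 2), element t[i, j, k] comes from arr[j, i, k]. A sketch that checks both facts:

```python
import numpy as np

arr = np.arange(24).reshape((4, 3, 2))
axes = (1, 0, 2)
t = arr.transpose(axes)

# new axis k takes its length from old axis axes[k]
print(t.shape, tuple(arr.shape[a] for a in axes))  # (3, 4, 2) (3, 4, 2)

# element t[i, j, k] comes from arr[j, i, k]
print(t[2, 3, 1], arr[3, 2, 1])                    # 23 23
```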
arr.transpose((0, 2, 1))
# this simply swaps axes 1 and 2
array([[[ 0, 2, 4],
[ 1, 3, 5]],
[[ 6, 8, 10],
[ 7, 9, 11]],
[[12, 14, 16],
[13, 15, 17]],
[[18, 20, 22],
[19, 21, 23]]])
# there is also a dedicated method for swapping two axes
arr.swapaxes(1, 2)
array([[[ 0, 2, 4],
[ 1, 3, 5]],
[[ 6, 8, 10],
[ 7, 9, 11]],
[[12, 14, 16],
[13, 15, 17]],
[[18, 20, 22],
[19, 21, 23]]])
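The result is the same as transpose((0,2,1)) above, and so is the concept.
All of this really just tests your grasp of multidimensional arrays. Think it through one dimension at a time (1-D, 2-D, 3-D, ...) and these axis swaps become a piece of cake.
1-D: ARR = [A,B,C,D], shape (4,)
Suppose each element is itself an array of equal length, i.e. each element is made up of data from a deeper level: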
A= [a1,a2,a3]
B= [b1,b2,b3]
C= [c1,c2,c3]
D= [d1,d2,d3]
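Substituting these in gives:
ARR = [A,B,C,D]=[[a1,a2,a3],[b1,b2,b3], [c1,c2,c3],[d1,d2,d3]]
And then? We get a 2-D array with shape (4,3): 4 rows and 3 columns. The outer level is axis 0 and the inner level is axis 1. Swapping the axes exchanges the outer and inner levels, so in this example the shape goes from (4,3) to (3,4):
ARRT = [[a1,b1,c1,d1],[a2,b2,c2,d2],[a3,b3,c3,d3]]
Back to ARR: suppose a1 is itself made up of deeper data, a1 = [a11,a12], b1 = [b11,b12], …
This gives a 3-D structure with shape (4,3,2):
ARR = [A,B,C,D]=[[a1,a2,a3],[b1,b2,b3], [c1,c2,c3],[d1,d2,d3]]=[[[a11,a12],[a21,a22],[a31,a32]],[[b11,b12],…],…]
Swapping axes 1 and 2 directly turns (4,3,2) into (4,2,3). The positions of A, B, C, D stay the same, and only their internal structure changes.
In short, every swap is at heart a transpose of the units involved. No matter how many levels there are, look only at the structure of the two axes being swapped; everything else is a distraction.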
arr
arr.transpose((1, 2, 0))  # (4,3,2) -> (3,2,4): think of it as swapping axes 0 and 1 to get (3,4,2), then swapping the last two axes to get (3,2,4)
arr.transpose((1,0,2))
arr.transpose((1,2,0))
array([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15],
[16, 17]],
[[18, 19],
[20, 21],
[22, 23]]])
array([[[ 0, 6, 12, 18],
[ 1, 7, 13, 19]],
[[ 2, 8, 14, 20],
[ 3, 9, 15, 21]],
[[ 4, 10, 16, 22],
[ 5, 11, 17, 23]]])
array([[[ 0, 1],
[ 6, 7],
[12, 13],
[18, 19]],
[[ 2, 3],
[ 8, 9],
[14, 15],
[20, 21]],
[[ 4, 5],
[10, 11],
[16, 17],
[22, 23]]])
array([[[ 0, 6, 12, 18],
[ 1, 7, 13, 19]],
[[ 2, 8, 14, 20],
[ 3, 9, 15, 21]],
[[ 4, 10, 16, 22],
[ 5, 11, 17, 23]]])
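The "two swaps" reading of transpose((1, 2, 0)) in the comment above can be verified by chaining swapaxes calls. The same check confirms that swapaxes(1, 2) equals transpose((0, 2, 1)):

```python
import numpy as np

arr = np.arange(24).reshape((4, 3, 2))

direct = arr.transpose((1, 2, 0))              # (4,3,2) -> (3,2,4)
two_steps = arr.swapaxes(0, 1).swapaxes(1, 2)  # (4,3,2) -> (3,4,2) -> (3,2,4)
print(direct.shape, np.array_equal(direct, two_steps))  # (3, 2, 4) True

# swapaxes(1, 2) is the same permutation as transpose((0, 2, 1))
print(np.array_equal(arr.swapaxes(1, 2), arr.transpose((0, 2, 1))))  # True
```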
2. Fast element-wise array functions
import numpy as np
arr1 = np.arange(10)
arr1
arr2 = np.arange(1,11)
arr2
np.sqrt(arr2)  # unary ufunc
np.maximum(arr1,arr2)  # binary ufunc: element-wise maximum
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
array([ 1. , 1.41421356, 1.73205081, 2. , 2.23606798,
2.44948974, 2.64575131, 2.82842712, 3. , 3.16227766])
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
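A few more ufuncs for comparison, with values chosen for illustration. Some ufuncs, such as np.modf, return several arrays at once. A binary ufunc also accepts a scalar as one operand:

```python
import numpy as np

arr = np.array([1.5, -2.25, 3.0])
magnitudes = np.abs(arr)      # unary ufunc
frac, whole = np.modf(arr)    # fractional and integral parts, both keep the sign
clipped = np.maximum(arr, 0)  # binary ufunc with a scalar: negatives become 0
print(magnitudes, frac, whole, clipped)
```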
points = np.arange(-5, 5, 0.01) # 1000 equally spaced points
xs, ys = np.meshgrid(points, points)
ys
xs
array([[-5. , -5. , -5. , ..., -5. , -5. , -5. ],
[-4.99, -4.99, -4.99, ..., -4.99, -4.99, -4.99],
[-4.98, -4.98, -4.98, ..., -4.98, -4.98, -4.98],
...,
[ 4.97, 4.97, 4.97, ..., 4.97, 4.97, 4.97],
[ 4.98, 4.98, 4.98, ..., 4.98, 4.98, 4.98],
[ 4.99, 4.99, 4.99, ..., 4.99, 4.99, 4.99]])
array([[-5. , -4.99, -4.98, ..., 4.97, 4.98, 4.99],
[-5. , -4.99, -4.98, ..., 4.97, 4.98, 4.99],
[-5. , -4.99, -4.98, ..., 4.97, 4.98, 4.99],
...,
[-5. , -4.99, -4.98, ..., 4.97, 4.98, 4.99],
[-5. , -4.99, -4.98, ..., 4.97, 4.98, 4.99],
[-5. , -4.99, -4.98, ..., 4.97, 4.98, 4.99]])
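The 1000×1000 output above is hard to read. With just 3 points, it becomes clear what meshgrid returns: xs varies along the columns, ys varies along the rows, and (xs[i, j], ys[i, j]) is the grid point in row i, column j:

```python
import numpy as np

points = np.array([0, 1, 2])
xs, ys = np.meshgrid(points, points)
print(xs)                  # every row is [0 1 2]
print(ys)                  # every column is [0 1 2]
print(xs[1, 2], ys[1, 2])  # 2 1
```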
import matplotlib.pyplot as plt
%matplotlib inline
# don't forget %matplotlib inline, otherwise the plot won't be displayed
z = np.sqrt(xs ** 2 + ys ** 2)
z
#z[500]
#plt.subplot(122);
plt.imshow(z, cmap=plt.cm.gray)
plt.colorbar()
plt.title(r"Image plot of $\sqrt{x^2 + y^2}$ for a grid of values")  # raw string so that \s is not treated as an escape
array([[ 7.07106781, 7.06400028, 7.05693985, ..., 7.04988652,
7.05693985, 7.06400028],
[ 7.06400028, 7.05692568, 7.04985815, ..., 7.04279774,
7.04985815, 7.05692568],
[ 7.05693985, 7.04985815, 7.04278354, ..., 7.03571603,
7.04278354, 7.04985815],
...,
[ 7.04988652, 7.04279774, 7.03571603, ..., 7.0286414 ,
7.03571603, 7.04279774],
[ 7.05693985, 7.04985815, 7.04278354, ..., 7.03571603,
7.04278354, 7.04985815],
[ 7.06400028, 7.05692568, 7.04985815, ..., 7.04279774,
7.04985815, 7.05692568]])
<matplotlib.image.AxesImage at 0x7f785f5fd550>
<matplotlib.colorbar.Colorbar at 0x7f785f5bd908>
Text(0.5,1,'Image plot of $\\sqrt{x^2 + y^2}$ for a grid of values')
?plt.imshow
Expressing conditional logic as array operations: numpy.where is a vectorized version of the ternary expression x if condition else y.
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])
result = np.where(cond, xarr, yarr)
result
array([ 1.1, 2.2, 1.3, 1.4, 2.5])
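For comparison, the same selection written as a pure-Python list comprehension. It gives the same values, but it runs element by element in the interpreter, so it is slow for large arrays and only handles 1-D data:

```python
import numpy as np

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

result_py = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]  # pure Python
result_np = np.where(cond, xarr, yarr)                              # vectorized
print(np.array_equal(result_py, result_np))                         # True
```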
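The second and third arguments of np.where do not need to be arrays; one or both can be scalars. A typical use of where in data analysis is to produce a new array of values based on another array. Suppose you have a matrix of randomly generated data and want to replace every positive value with 2 and every negative value with -2. With np.where this is very easy: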
arr = np.random.randn(4, 4)
arr
array([[-0.50860637, -0.85804767, 0.60322409, 0.15165021],
[ 1.11725901, 1.44286186, 0.09640674, 1.0025972 ],
[-0.59096178, -0.7718587 , 0.31794432, -1.19395758],
[-0.13094242, 1.08773177, 0.40416343, -0.92219399]])
np.where(arr > 0, 2, -2)
array([[-2, -2, 2, 2],
[ 2, 2, 2, 2],
[-2, -2, 2, -2],
[-2, 2, 2, -2]])
arr  # arr itself has not been modified
array([[-0.50860637, -0.85804767, 0.60322409, 0.15165021],
[ 1.11725901, 1.44286186, 0.09640674, 1.0025972 ],
[-0.59096178, -0.7718587 , 0.31794432, -1.19395758],
[-0.13094242, 1.08773177, 0.40416343, -0.92219399]])
np.where(arr > 0, 2, arr) # set only the positive values to 2
arr  # arr is still unchanged
array([[-0.50860637, -0.85804767, 2. , 2. ],
[ 2. , 2. , 2. , 2. ],
[-0.59096178, -0.7718587 , 2. , -1.19395758],
[-0.13094242, 2. , 2. , -0.92219399]])
array([[-0.50860637, -0.85804767, 0.60322409, 0.15165021],
[ 1.11725901, 1.44286186, 0.09640674, 1.0025972 ],
[-0.59096178, -0.7718587 , 0.31794432, -1.19395758],
[-0.13094242, 1.08773177, 0.40416343, -0.92219399]])
arr = np.where(arr > 0, 2, arr) # set only the positive values to 2
arr  # now arr has been changed
array([[-0.50860637, -0.85804767, 2. , 2. ],
[ 2. , 2. , 2. , 2. ],
[-0.59096178, -0.7718587 , 2. , -1.19395758],
[-0.13094242, 2. , 2. , -0.92219399]])
cond1 = np.array([True, False, True, True, False])
cond2 = np.array([True, True, False, True, False])
np.where(cond1 & cond2, 0,
np.where(cond1, 1,
np.where(cond2, 2, 3)))
# in other words: if cond1 and cond2 are both True, assign 0; otherwise, if cond1 is True, assign 1; otherwise, if cond2 is True, assign 2; otherwise assign 3
array([0, 2, 1, 0, 3])
result = 1 * (cond1 & ~cond2) + 2 * (cond2 & ~cond1) + 3 * ~(cond1 | cond2)  # arithmetic equivalent of the nested where
result
array([0, 2, 1, 0, 3])
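For this kind of if/elif/else chain, np.select (a standard NumPy function, not used in the original notes) is often easier to read than nested np.where calls. The conditions are checked in order, the first True one chooses the value, and default covers the rest. The result matches the nested np.where above:

```python
import numpy as np

cond1 = np.array([True, False, True, True, False])
cond2 = np.array([True, True, False, True, False])

result = np.select(
    [cond1 & cond2, cond1, cond2],  # conditions, checked in order
    [0, 1, 2],                      # value for each condition
    default=3,                      # used where no condition holds
)
print(result)                       # [0 2 1 0 3]
```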
arr = np.random.randn(5, 4) # 正态分布数据
arr
arr.mean()
arr.std()
arr.sum()
array([[ 0.23259969, 0.25324526, -1.68901485, -0.4363489 ],
[-0.06618925, -0.46966859, -0.47744812, 1.26768173],
[-0.26637815, 0.33650136, -0.78287286, 0.15818271],
[-0.38862792, -0.48476329, 2.28018714, 0.58877072],
[-1.13710048, 0.29967906, 1.88435667, -0.63261904]])
0.023508644921630895
0.93000384516976498
0.47017289843261789
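These methods also take an axis argument to compute the statistic along a given axis:
arr.mean(axis=1)  # axis=1: average across the columns, one result per row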
array([-0.4098797 , 0.06359394, -0.13864173, 0.49889166, 0.10357905])
arr.mean(axis=0)  # axis=0: average down the rows, one result per column
array([-0.32513922, -0.01300124, 0.2430416 , 0.18913345])
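A deterministic sketch of the axis semantics, using np.arange data instead of random numbers. The reduced axis disappears from the shape. keepdims=True keeps it with length 1, which makes the result broadcast back against the original, for example to center each row:

```python
import numpy as np

arr = np.arange(20).reshape(5, 4)
print(arr.mean(axis=1).shape)                 # (5,): one mean per row
print(arr.mean(axis=0).shape)                 # (4,): one mean per column

row_means = arr.mean(axis=1, keepdims=True)   # shape (5, 1)
centered = arr - row_means                    # subtract each row's own mean
print(np.allclose(centered.mean(axis=1), 0))  # True
```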
Functions such as cumsum and cumprod do not aggregate; instead they produce an array of intermediate results:
arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
arr.cumsum(0)  # running sum down axis 0
arr.sum(0)  # equals the last row of cumsum(0)
arr.cumsum(1)
arr.sum(1)
array([[ 0, 1, 2],
[ 3, 5, 7],
[ 9, 12, 15]])
array([ 9, 12, 15])
array([[ 0, 1, 3],
[ 3, 7, 12],
[ 6, 13, 21]])
array([ 3, 12, 21])
arr.cumprod(0)
arr.cumprod(1)
array([[ 0, 1, 2],
[ 0, 4, 10],
[ 0, 28, 80]])
array([[ 0, 0, 0],
[ 3, 12, 60],
[ 6, 42, 336]])
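The relationship between cumsum and sum can be checked directly. The last row of cumsum(0) is sum(0), and the last column of cumsum(1) is sum(1). Without an axis argument, cumsum works on the flattened array:

```python
import numpy as np

arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
print(np.array_equal(arr.cumsum(0)[-1], arr.sum(0)))     # True
print(np.array_equal(arr.cumsum(1)[:, -1], arr.sum(1)))  # True

flat = arr.cumsum()   # no axis: running sum over the flattened array
print(flat)           # [ 0  1  3  6 10 15 21 28 36]
```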
Boolean arrays
from numpy.random import randn
arr = np.random.randn(100)
(arr > 0).sum() # number of positive values
48
from numpy.random import randn
With this import, randn can be called directly from now on without raising a NameError.
bools = np.array([False, False, True, False])
bools.any()
bools.all()
True
False
arr = randn(8)
arr
arr.sort()
arr
array([ 0.58949914, -0.32831395, -0.11372713, -0.86316093, -0.07670326,
0.08122922, 0.97167985, -0.3009604 ])
array([-0.86316093, -0.32831395, -0.3009604 , -0.11372713, -0.07670326,
0.08122922, 0.58949914, 0.97167985])
arr = randn(5, 3)
arr
arr.sort(1)
arr
arr.sort(0)
arr
array([[-0.9933549 , -0.11998081, -0.27950985],
[-0.58477226, 1.19327845, 1.12167901],
[ 1.72336582, 0.2090648 , -0.70501336],
[-0.00379459, 0.33374255, -1.41377977],
[-1.65020743, 0.15346388, -0.74909774]])
array([[-0.9933549 , -0.27950985, -0.11998081],
[-0.58477226, 1.12167901, 1.19327845],
[-0.70501336, 0.2090648 , 1.72336582],
[-1.41377977, -0.00379459, 0.33374255],
[-1.65020743, -0.74909774, 0.15346388]])
array([[-1.65020743, -0.74909774, -0.11998081],
[-1.41377977, -0.27950985, 0.15346388],
[-0.9933549 , -0.00379459, 0.33374255],
[-0.70501336, 0.2090648 , 1.19327845],
[-0.58477226, 1.12167901, 1.72336582]])
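Note the difference between the two ways of sorting. The method arr.sort() sorts in place and returns None, while the function np.sort(arr) returns a sorted copy and leaves arr alone. NumPy sorts in ascending order, so for descending order you can reverse a sorted result. A sketch with fixed values:

```python
import numpy as np

arr = np.array([3.0, -1.0, 2.0, 0.5])
sorted_copy = np.sort(arr)       # function form: returns a sorted copy
before = arr.tolist()            # arr has not been reordered
arr.sort()                       # method form: sorts in place, returns None
descending = np.sort(arr)[::-1]  # reverse a sorted result for descending order
print(before, arr, descending)
```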
unique and other set logic
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
np.unique(names)
array(['Bob', 'Joe', 'Will'],
dtype='<U4')
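Another function, np.isin, tests membership of one array's values in another: it checks whether each value of the first array is contained in the second. Older code uses np.in1d for this, which is deprecated as of NumPy 2.0.
values = np.array([6, 0, 0, 3, 2, 5, 6])
np.isin(values, [2, 3, 6])  # the first value, 6, is in [2, 3, 6], so the first entry is True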
array([ True, False, False, True, True, False, True], dtype=bool)
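NumPy offers several more set functions. All of them return sorted, unique results. A sketch with the values array from above and a second array chosen for illustration:

```python
import numpy as np

values = np.array([6, 0, 0, 3, 2, 5, 6])
other = np.array([2, 3, 6, 7])

uniq = np.unique(values)               # [0 2 3 5 6]
common = np.intersect1d(values, other) # [2 3 6]
either = np.union1d(values, other)     # [0 2 3 5 6 7]
only_values = np.setdiff1d(values, other)  # [0 5]: in values but not in other
print(uniq, common, either, only_values)
```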
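File input and output with arrays
NumPy can save data to disk and load it back, in either text or binary format.
np.save and np.load are the two workhorse functions for efficiently saving and loading array data on disk. By default, arrays are saved in uncompressed raw binary format with the file extension .npy.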
arr = np.arange(10)
np.save('some_array', arr)  # writes some_array.npy (the .npy extension is added automatically)
arr1 = np.load('some_array.npy')
arr1
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
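You can save multiple arrays into a single uncompressed archive (a zip file with the .npz extension) with np.savez, passing the arrays as keyword arguments: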
arr2 = np.arange(0,20,2)
arr2
np.savez('array_archive.npz', a=arr1, b=arr2)  # pack both arrays into one archive under the keys a and b, which are used to retrieve them
arch = np.load('array_archive.npz')
arch['b']
arch['a']
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
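A self-contained sketch of the same round trip. It writes into a temporary directory so no files are left behind; the tempfile usage is an addition for this sketch, not part of the original notes. np.load on an .npz returns a lazily loaded NpzFile. Using it as a context manager closes the underlying file. np.savez_compressed has the same interface if compression is wanted:

```python
import os
import tempfile
import numpy as np

arr1 = np.arange(10)
arr2 = np.arange(0, 20, 2)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "array_archive.npz")
    np.savez(path, a=arr1, b=arr2)  # the keyword names become the keys
    with np.load(path) as arch:     # closes the file before the directory is removed
        keys = sorted(arch.files)   # ['a', 'b']
        b_loaded = arch["b"]

print(keys, b_loaded)
```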
Saving and loading text files
arr = np.arange(20)
arr
np.savetxt('array_ex.txt', arr,delimiter=',')
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19])
arr = np.loadtxt('array_ex.txt', delimiter=',')
arr
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.,
11., 12., 13., 14., 15., 16., 17., 18., 19.])
arr2d = np.arange(20).reshape(5,4)
arr2d
np.savetxt('array_ex2d.txt', arr2d,fmt='%d', delimiter=',', newline='\n', header='', footer='', comments='# ')
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19]])
arr2d= np.loadtxt('array_ex2d.txt', delimiter=',')
arr2d
array([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.],
[ 12., 13., 14., 15.],
[ 16., 17., 18., 19.]])
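loadtxt returns floats by default, which is why the integers come back as 0., 1., .... The sketch below performs the same round trip through an in-memory io.StringIO buffer instead of a file on disk. It passes dtype=int to loadtxt so the integers survive:

```python
import io
import numpy as np

arr2d = np.arange(20).reshape(5, 4)
buf = io.StringIO()                         # an in-memory text file
np.savetxt(buf, arr2d, fmt='%d', delimiter=',')
first_line = buf.getvalue().splitlines()[0]
print(first_line)                           # 0,1,2,3

buf.seek(0)                                 # rewind before reading
back = np.loadtxt(buf, delimiter=',', dtype=int)
print(back.dtype.kind, np.array_equal(back, arr2d))
```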
Linear algebra
Matrix multiplication: dot
x = np.array([[1., 2., 3.], [4., 5., 6.]])#(2,3)
y = np.array([[6., 23.], [-1, 7], [8, 9]])#(3,2)
x
y
z = np.dot(x,y)#(2,2)
z
x.dot(y)
array([[ 1., 2., 3.],
[ 4., 5., 6.]])
array([[ 6., 23.],
[ -1., 7.],
[ 8., 9.]])
array([[ 28., 64.],
[ 67., 181.]])
array([[ 28., 64.],
[ 67., 181.]])
x.T.dot(y.T)
array([[ 98., 27., 44.],
[ 127., 33., 61.],
[ 156., 39., 78.]])
from numpy.linalg import inv, qr
X = randn(5,4)
X
mat = X.T.dot(X)
mat
matInv = inv(mat)  # inverse matrix
mat.dot(matInv)
array([[ 0.58025411, -1.78253711, -0.3809231 , -1.49789588],
[ 0.98564028, -1.48796886, 0.76852488, 1.78383275],
[ 0.95577617, 0.50655144, 0.53523176, 0.20574492],
[ 0.48989695, -1.64701629, -0.4171108 , -1.01323554],
[-0.01362112, 0.58519871, -0.60356926, 0.83877434]])
array([[ 2.46187422, -2.83161604, 0.85189863, 0.57789722],
[-2.83161604, 8.70320444, 0.14037121, 2.27965288],
[ 0.85189863, 0.14037121, 1.5604832 , 1.96799729],
[ 0.57789722, 2.27965288, 1.96799729, 7.19827096]])
array([[ 1.00000000e+00, 1.38777878e-17, 5.55111512e-17,
5.55111512e-17],
[ 1.24900090e-16, 1.00000000e+00, 1.11022302e-16,
0.00000000e+00],
[ -2.22044605e-16, -1.38777878e-16, 1.00000000e+00,
0.00000000e+00],
[ -3.88578059e-16, -1.11022302e-16, 0.00000000e+00,
1.00000000e+00]])
q, r = qr(mat)
q
r
array([[-0.63273728, -0.38670716, 0.65804013, -0.13070689],
[ 0.72776628, -0.56970465, 0.32526197, -0.19999376],
[-0.21895027, -0.34247813, -0.55582815, -0.72513763],
[-0.14852794, -0.63922056, -0.39018317, 0.64582786]])
array([[-3.89083161, 7.75624128, -1.07084141, -0.20664056],
[ 0. , -5.36852487, -2.20182111, -6.79748462],
[ 0. , 0. , -1.02899901, -2.78074852],
[ 0. , 0. , 0. , 2.69032354]])
from numpy.linalg import det
mat
det(mat)
array([[ 2.46187422, -2.83161604, 0.85189863, 0.57789722],
[-2.83161604, 8.70320444, 0.14037121, 2.27965288],
[ 0.85189863, 0.14037121, 1.5604832 , 1.96799729],
[ 0.57789722, 2.27965288, 1.96799729, 7.19827096]])
57.825163887847275
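If the goal is to solve a linear system mat @ x = b, numpy.linalg.solve is generally preferred over forming the inverse explicitly, because it is cheaper and numerically more accurate. The sketch below uses a seeded generator and a made-up right-hand side b, so its numbers differ from the outputs above:

```python
import numpy as np
from numpy.linalg import inv, solve

rng = np.random.default_rng(0)     # seeded for reproducibility
X = rng.standard_normal((5, 4))
mat = X.T @ X                      # symmetric 4x4 matrix, built as in the notes
b = np.arange(4.0)                 # illustrative right-hand side

x_solve = solve(mat, b)            # solves mat @ x = b directly
x_inv = inv(mat) @ b               # same answer via the explicit inverse
print(np.allclose(x_solve, x_inv), np.allclose(mat @ x_solve, b))
```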
Example: random walks
A program that simulates random walks using array operations
import numpy as np
from numpy.random import randint
nsteps = 10
draws = randint(0, 2, size=nsteps)
print('draws =',draws)
steps = np.where(draws > 0, 1, -1)
print('steps =',steps)
walk = steps.cumsum()
print('walk =',walk)
walk.min()
walk.max()
draws = [0 0 0 0 1 0 1 0 1 0]
steps = [-1 -1 -1 -1 1 -1 1 -1 1 -1]
walk = [-1 -2 -3 -4 -3 -4 -3 -4 -3 -4]
-4
-1
#np.abs(walk) >= 10
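(np.abs(walk) >= 2).argmax()  # argmax returns the index of the first maximum of the boolean array, i.e. the first True (it also returns 0 if there is no True at all)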
1
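Because argmax returns 0 when there is no True at all, a "first crossing" result of 0 is ambiguous. The sketch below guards against this with .any(). It uses a seeded np.random.default_rng (an assumption for reproducibility; the notes use numpy.random.randint) and the threshold 10 from the commented-out line above:

```python
import numpy as np

rng = np.random.default_rng(12345)  # seeded so the run is reproducible
nsteps = 1000
steps = np.where(rng.integers(0, 2, size=nsteps) > 0, 1, -1)
walk = steps.cumsum()

hits = np.abs(walk) >= 10
if hits.any():                      # argmax alone would return 0 if nothing hits
    first_crossing = int(hits.argmax())
    print("first |walk| >= 10 at step index", first_crossing)
else:
    first_crossing = None
    print("the walk never reached |walk| >= 10")
```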
Simulating many random walks at once
walkers = 8
nsteps = 10
draws = randint(0,2,size = (walkers,nsteps))
print('draws = \n',draws)
steps = np.where(draws >0,1,-1)
print('steps =\n',steps)
walk = steps.cumsum(1)  # note the argument 1: accumulate along axis 1, i.e. within each row, since each row is one walk
print('walk =\n',walk)
walk.min()
walk.max()
walk.min(1)  # per row, along axis 1
walk.max(1)
draws =
[[0 1 0 1 1 1 1 1 0 1]
[0 1 0 0 0 1 1 0 0 1]
[0 1 1 0 0 0 0 0 0 1]
[0 0 1 1 1 0 0 1 0 1]
[1 0 0 0 0 0 0 0 1 1]
[0 1 0 1 0 1 0 0 1 0]
[1 0 0 0 0 1 1 0 1 0]
[1 0 1 0 1 1 1 0 1 1]]
steps =
[[-1 1 -1 1 1 1 1 1 -1 1]
[-1 1 -1 -1 -1 1 1 -1 -1 1]
[-1 1 1 -1 -1 -1 -1 -1 -1 1]
[-1 -1 1 1 1 -1 -1 1 -1 1]
[ 1 -1 -1 -1 -1 -1 -1 -1 1 1]
[-1 1 -1 1 -1 1 -1 -1 1 -1]
[ 1 -1 -1 -1 -1 1 1 -1 1 -1]
[ 1 -1 1 -1 1 1 1 -1 1 1]]
walk =
[[-1 0 -1 0 1 2 3 4 3 4]
[-1 0 -1 -2 -3 -2 -1 -2 -3 -2]
[-1 0 1 0 -1 -2 -3 -4 -5 -4]
[-1 -2 -1 0 1 0 -1 0 -1 0]
[ 1 0 -1 -2 -3 -4 -5 -6 -5 -4]
[-1 0 -1 0 -1 0 -1 -2 -1 -2]
[ 1 0 -1 -2 -3 -2 -1 -2 -1 -2]
[ 1 0 1 0 1 2 3 2 3 4]]
-6
4
array([-1, -3, -5, -2, -6, -2, -3, 0])
array([4, 0, 1, 1, 1, 0, 1, 4])
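(np.abs(walk) >= 5).argmax()  # without an axis, argmax searches the flattened array: 28 means row 28 // 10 = 2, step 28 % 10 = 8, the first hit in row-major order rather than the earliest step overall
(np.abs(walk) >= 5).argmax(1)  # for each row, the first step where |walk| >= 5 (0 if the row never gets there)
# next, find the earliest step at which any walker reaches |walk| >= 5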
rseq = (np.abs(walk) >= 5).argmax(1)
np.sort(rseq)
print('rseq =\n',rseq)
nrseq = np.sort(rseq)
(nrseq >0).argmax()
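nrseq[(nrseq >0).argmax()]  # the earliest step at which some walk reaches |walk| >= 5
# which walker does this step belong to?
(rseq== nrseq[(nrseq >0).argmax()]).argmax()
# this finds the walker (and the step) that reached the threshold (>= 5) first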
28
array([0, 0, 8, 0, 6, 0, 0, 0])
array([0, 0, 0, 0, 0, 0, 6, 8])
rseq =
[0 0 8 0 6 0 0 0]
6
6
4
nrseq = nrseq[nrseq >0]
nrseq
(rseq == nrseq[0]).argmax()
array([6, 8])
4
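The approach above treats 0 as "never reached". That works here because |walk| is 1 after the first step, but it depends on that fact. A more direct sketch uses .any(axis=1) to mark which walkers ever reach the threshold, then takes the smallest first-crossing step among those. The walk array is copied from the output above so the block runs on its own:

```python
import numpy as np

# the 8 walks printed above
walk = np.array([
    [-1, 0, -1, 0, 1, 2, 3, 4, 3, 4],
    [-1, 0, -1, -2, -3, -2, -1, -2, -3, -2],
    [-1, 0, 1, 0, -1, -2, -3, -4, -5, -4],
    [-1, -2, -1, 0, 1, 0, -1, 0, -1, 0],
    [1, 0, -1, -2, -3, -4, -5, -6, -5, -4],
    [-1, 0, -1, 0, -1, 0, -1, -2, -1, -2],
    [1, 0, -1, -2, -3, -2, -1, -2, -1, -2],
    [1, 0, 1, 0, 1, 2, 3, 2, 3, 4],
])

hits = np.abs(walk) >= 5
reached = hits.any(axis=1)        # walkers that ever reach |walk| >= 5
first_step = hits.argmax(axis=1)  # only meaningful where reached is True
candidates = np.where(reached, first_step, np.iinfo(np.int64).max)
walker = int(candidates.argmin()) # earliest crossing; ties go to the lowest index
print(walker, first_step[walker]) # 4 6
```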