Optional Lab: Python, NumPy and Vectorization

A brief introduction to some of the scientific computing used in this course, in particular the NumPy scientific computing package and its use with Python.

1 Outline

1.1 Goals

In this lab, you will review the features of NumPy and Python that are used in Course 1.

1.2 Useful References

2 Python and NumPy

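Python is the programming language used in this course. It has a set of numeric data types and arithmetic operations. NumPy is a library that extends Python's base capabilities with a richer data set, including more numeric types, vectors, matrices, and many matrix functions.

NumPy and Python work together fairly seamlessly: Python arithmetic operators work on NumPy data types, and many NumPy functions accept Python data types.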

import numpy as np    # it is an unofficial standard to use np for numpy
import time
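
As a small illustration of the interoperability described above, the sketch below passes a plain Python list to a NumPy function and applies Python arithmetic operators to a NumPy array. The variable names are illustrative and are not part of the lab.

```python
import numpy as np

py_list = [1, 2, 3]            # a plain Python list
arr = np.array(py_list)        # the same values as a NumPy array

total = np.sum(py_list)        # a NumPy function accepting a Python list
shifted = arr * 2 + 1          # Python operators applied to a NumPy array

print(f"total = {total}, shifted = {shifted}")
```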

3 Vector

3.1 Abstract

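  • A vector is an ordered array of numbers, denoted by a lowercase bold letter such as $\mathbf{x}$
  • The elements of a vector are all the same type; a vector cannot contain both characters and numbers, for example
  • The number of elements in a vector is often called its dimension, though mathematicians may prefer rank
  • The elements of a vector are indexed from 0 to n - 1 and can be referenced by index. When referenced individually, the index is written as a subscript, such as $x_0$, and the symbol is not bold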

3.2 NumPy Arrays

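NumPy's basic data structure is an indexable, n-dimensional array containing elements of the same type (dtype).
Note that "dimension" is overloaded: above, it meant the number of elements in a vector; here, it means the number of indices of an array.
A one-dimensional (1-D) array has one index. In Course 1, vectors are represented as NumPy 1-D arrays.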

  • 1-D array, shape (n,): n elements indexed [0] through [n-1]
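
To make the two meanings of "dimension" concrete, the sketch below inspects a few standard array attributes. The arrays v and m are illustrative examples, not taken from the lab.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])   # a 1-D array holding three elements
m = np.zeros((2, 3))            # a 2-D array with 2 rows and 3 columns

# ndim is the number of indices; size is the total number of elements
print(f"v: ndim = {v.ndim}, shape = {v.shape}, size = {v.size}, dtype = {v.dtype}")
print(f"m: ndim = {m.ndim}, shape = {m.shape}, size = {m.size}, dtype = {m.dtype}")
```

Here v has three elements but only one index, so it is a 1-D array.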

3.3 Vector Creation

Data creation routines in NumPy generally take a first parameter that is the shape of the object. This can either be a single value for a 1-D result or a tuple (n,m,…) specifying the shape of the result.

# NumPy routines which allocate memory and fill arrays with value
a = np.zeros(4);                print(f"np.zeros(4) :   a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.zeros((4,));             print(f"np.zeros(4,) :  a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.random.random_sample(4); print(f"np.random.random_sample(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

Output:

np.zeros(4) :   a = [0. 0. 0. 0.], a shape = (4,), a data type = float64
np.zeros(4,) :  a = [0. 0. 0. 0.], a shape = (4,), a data type = float64
np.random.random_sample(4): a = [0.38919476 0.38019795 0.86953179 0.1653972 ], a shape = (4,), a data type = float64

Some data creation routines do not take a shape tuple.

# NumPy routines which allocate memory and fill arrays with value but do not accept shape as input argument
a = np.arange(4.);              print(f"np.arange(4.):     a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.random.rand(4);          print(f"np.random.rand(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

Output:

np.arange(4.):     a = [0. 1. 2. 3.], a shape = (4,), a data type = float64
np.random.rand(4): a = [0.56777089 0.44204559 0.45052726 0.41138661], a shape = (4,), a data type = float64

Values can be specified manually as well.

# NumPy routines which allocate memory and fill with user specified values
a = np.array([5,4,3,2]);  print(f"np.array([5,4,3,2]):  a = {a},     a shape = {a.shape}, a data type = {a.dtype}")
a = np.array([5.,4,3,2]); print(f"np.array([5.,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

Output:

np.array([5,4,3,2]):  a = [5 4 3 2],     a shape = (4,), a data type = int32
np.array([5.,4,3,2]): a = [5. 4. 3. 2.], a shape = (4,), a data type = float64

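These are all ways of creating a one-dimensional vector a with four elements.
a.shape returns the dimensions as a tuple; for an array with n rows and m columns, it returns (n, m).
Here, a.shape = (4,) indicates a 1-D array with four elements.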
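
The int32 reported above for integer input depends on the platform and NumPy version; on many systems the default integer type is int64 instead. If a particular type matters, it can be requested explicitly with the dtype argument. The sketch below is illustrative and its variable names are not part of the lab.

```python
import numpy as np

# request the element type explicitly instead of relying on the platform default
a_int = np.array([5, 4, 3, 2], dtype=np.int64)
a_float = np.array([5, 4, 3, 2], dtype=np.float64)

print(f"a_int   = {a_int}, dtype = {a_int.dtype}")
print(f"a_float = {a_float}, dtype = {a_float.dtype}")
```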

3.4 Operations on Vectors

Let’s explore some operations using vectors.

3.4.1 Indexing

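Elements of a vector can be accessed by indexing and slicing. NumPy provides a very complete set of indexing and slicing capabilities.
Only the basics needed for the course are explored here; for more details, see Slicing and Indexing in the NumPy documentation.

Indexing means referring to an element of an array by its position within the array. Slicing means getting a subset of elements from an array based on their indices.

NumPy starts indexing at zero, so the third element of a vector $\mathbf{a}$ is a[2].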

#vector indexing operations on 1-D vectors
a = np.arange(10)
print(a)

#access an element
print(f"a[2].shape: {a[2].shape} a[2]  = {a[2]}, Accessing an element returns a scalar")

# access the last element, negative indexes count from the end
# for example, a[-2] is 8
print(f"a[-1] = {a[-1]}")

#indexes must be within the range of the vector or they will produce an error
try:
    c = a[10]
except Exception as e:
    print("The error message you'll see is:")
    print(e)

Output:

[0 1 2 3 4 5 6 7 8 9]
a[2].shape: () a[2]  = 2, Accessing an element returns a scalar
a[-1] = 9
The error message you'll see is:
index 10 is out of bounds for axis 0 with size 10

3.4.2 Slicing

Slicing creates an array of indices using a set of three values (start:stop:step). A subset of these values is also valid, for example only start and stop.

#vector slicing operations
a = np.arange(10)
print(f"a         = {a}")

#access 5 consecutive elements (start:stop:step)
c = a[2:7:1];     print("a[2:7:1] = ", c)

# access 3 elements separated by two 
c = a[2:7:2];     print("a[2:7:2] = ", c)

# access all elements index 3 and above
c = a[3:];        print("a[3:]    = ", c)

# access all elements below index 3
c = a[:3];        print("a[:3]    = ", c)

# access all elements
c = a[:];         print("a[:]     = ", c)

Output:

a         = [0 1 2 3 4 5 6 7 8 9]
a[2:7:1] =  [2 3 4 5 6]
a[2:7:2] =  [2 4 6]
a[3:]    =  [3 4 5 6 7 8 9]
a[:3]    =  [0 1 2]
a[:]     =  [0 1 2 3 4 5 6 7 8 9]
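
Two further slicing behaviors are worth knowing, although the lab does not cover them: a negative step walks the array backward, and a basic slice is a view onto the original data rather than a copy. The sketch below is illustrative; the variable names are assumptions.

```python
import numpy as np

a = np.arange(10)

reversed_a = a[::-1]      # a negative step walks the array backward
view = a[2:5]             # a basic slice is a view that shares memory with a
view[0] = 99              # writing through the view also changes a[2]

independent = a[5:8].copy()   # .copy() produces separate data
independent[0] = -1           # a[5] is left unchanged

print(f"a = {a}")
print(f"view = {view}, independent = {independent}")
```

Calling .copy() is the usual way to get a slice that can be modified without affecting the source array.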

3.4.3 Single Vector Operations

Many useful operations act on a single vector.

a = np.array([1,2,3,4])
print(f"a             : {a}")
# negate elements of a
b = -a 
print(f"b = -a        : {b}")

# sum all elements of a, returns a scalar
b = np.sum(a) 
print(f"b = np.sum(a) : {b}")

b = np.mean(a)
print(f"b = np.mean(a): {b}")

b = a**2
print(f"b = a**2      : {b}")

Output:

a             : [1 2 3 4]
b = -a        : [-1 -2 -3 -4]
b = np.sum(a) : 10
b = np.mean(a): 2.5
b = a**2      : [ 1  4  9 16]

3.4.4 Vector Vector Element-wise Operations

Most NumPy arithmetic, logical, and comparison operations apply to vectors as well. These operators work element by element. For example, for $i = 0, \dots, n-1$:
$$ (\mathbf{a} + \mathbf{b})_i = a_i + b_i $$

a = np.array([ 1, 2, 3, 4])
b = np.array([-1,-2, 3, 4])
print(f"Binary operators work element wise: {a + b}")

Output:

Binary operators work element wise: [0 0 6 8]

For this to work correctly, the vectors must be the same size:

#try a mismatched vector operation
c = np.array([1, 2])
try:
    d = a + c
except Exception as e:
    print("The error message you'll see is:")
    print(e)

Output:

The error message you'll see is:
operands could not be broadcast together with shapes (4,) (2,) 
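
The error message refers to broadcasting, a NumPy mechanism that this lab does not cover. As a minimal sketch of one broadcasting case: a length-1 array is stretched to match the other operand, while a length-2 array cannot be matched against four elements. The variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
one = np.array([10])             # shape (1,) can be stretched to shape (4,)

stretched = a + one              # 10 is added to every element of a
print(f"a + one = {stretched}")

mismatch_failed = False
try:
    a + np.array([1, 2])         # shape (2,) cannot be matched to shape (4,)
except ValueError as e:
    mismatch_failed = True
    print(f"ValueError: {e}")
```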

3.4.5 Scalar Vector Operations

Vectors can be scaled by scalar values. A scalar value is just a number; the scalar multiplies all the elements of the vector.

a = np.array([1, 2, 3, 4])

# multiply a by a scalar
b = 5 * a 
print(f"b = 5 * a : {b}")

Output:

b = 5 * a : [ 5 10 15 20]

3.4.6 Vector Vector Dot Product

The dot product is a mainstay of linear algebra and NumPy, and it is used extensively in this course.


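The dot product multiplies the values in two vectors element-wise and then sums the result. It requires the two vectors to have the same size.
Using a for loop, implement a function that returns the dot product of two vectors. Given inputs $\mathbf{a}$ and $\mathbf{b}$, the function returns:
$$ x = \sum_{i=0}^{n-1} a_i b_i $$
Assume both a and b are the same shape.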

def my_dot(a, b): 
    """
    Compute the dot product of two vectors
 
    Args:
      a (ndarray (n,)):  input vector 
      b (ndarray (n,)):  input vector with same dimension as a
    
    Returns:
      x (scalar): dot product of a and b
    """
    x = 0
    for i in range(a.shape[0]):
        x = x + a[i] * b[i]
    return x

# test 1-D
a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])
print(f"my_dot(a, b) = {my_dot(a, b)}")

Output:

my_dot(a, b) = 24

Note that the dot product is expected to return a scalar value.
Now try the same operation using np.dot:

# test 1-D
a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])
c = np.dot(a, b)
print(f"NumPy 1-D np.dot(a, b) = {c}, np.dot(a, b).shape = {c.shape} ") 
c = np.dot(b, a)
print(f"NumPy 1-D np.dot(b, a) = {c}, np.dot(a, b).shape = {c.shape} ")

Output:

NumPy 1-D np.dot(a, b) = 24, np.dot(a, b).shape = () 
NumPy 1-D np.dot(b, a) = 24, np.dot(a, b).shape = () 

The results match those of the loop implementation above.
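
For 1-D arrays, the same value can be obtained in a few equivalent ways. The sketch below compares np.dot with the @ operator and with an explicit multiply-then-sum, which mirrors the definition of the dot product. The variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])

via_dot = np.dot(a, b)       # the library routine used in this lab
via_matmul = a @ b           # the @ operator; equivalent for 1-D arrays
via_sum = np.sum(a * b)      # element-wise multiply, then sum

print(f"np.dot: {via_dot}, @: {via_matmul}, sum of products: {via_sum}")
```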

3.4.7 The Need for Speed: Vector vs For-loop

NumPy is used because it improves speed and memory efficiency. A demonstration follows:

np.random.seed(1)
a = np.random.rand(10000000)  # very large arrays
b = np.random.rand(10000000)

tic = time.time()  # capture start time
c = np.dot(a, b)
toc = time.time()  # capture end time

print(f"np.dot(a, b) =  {c:.4f}")
print(f"Vectorized version duration: {1000*(toc-tic):.4f} ms ")

tic = time.time()  # capture start time
c = my_dot(a,b)
toc = time.time()  # capture end time

print(f"my_dot(a, b) =  {c:.4f}")
print(f"loop version duration: {1000*(toc-tic):.4f} ms ")

del(a);del(b)  #remove these big arrays from memory

Output:

np.dot(a, b) =  2501072.5817
Vectorized version duration: 1107.5144 ms 
my_dot(a, b) =  2501072.5817
loop version duration: 4505.1224 ms 

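So, vectorization provides a large speedup in this example. This is because NumPy makes better use of the data parallelism available in the underlying hardware.
GPUs and modern CPUs implement Single Instruction, Multiple Data (SIMD) pipelines, which allow multiple operations to be issued in parallel. This is critical in machine learning, where data sets are often very large.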
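
Timings vary considerably between machines. As a hedged sketch for repeating the comparison quickly, the snippet below uses a smaller array and time.perf_counter, which is generally better suited to measuring short intervals than time.time. The helper loop_dot and the array size are assumptions made for this illustration; no particular speedup is implied.

```python
import time
import numpy as np

def loop_dot(a, b):
    """Dot product with an explicit Python loop (illustrative helper)."""
    x = 0.0
    for i in range(a.shape[0]):
        x += a[i] * b[i]
    return x

rng = np.random.default_rng(1)      # seeded generator for repeatable inputs
a = rng.random(100_000)
b = rng.random(100_000)

tic = time.perf_counter()
vec_result = np.dot(a, b)
vec_ms = 1000 * (time.perf_counter() - tic)

tic = time.perf_counter()
loop_result = loop_dot(a, b)
loop_ms = 1000 * (time.perf_counter() - tic)

print(f"vectorized: {vec_ms:.4f} ms, loop: {loop_ms:.4f} ms")
```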

3.4.8 Vector Vector Operations in Course 1

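Vector-vector operations will appear frequently in Course 1. Here is why:

  • Going forward, our examples will be stored in an array, X_train, of dimension (m,n). Note that this is a 2-D array, or matrix
  • w will be a 1-D vector of shape (n,)
  • We will perform operations by looping through the examples, extracting each example to work on individually by indexing X, for example X[i]
  • X[i] returns a value of shape (n,), a 1-D vector. Consequently, operations involving X[i] are often vector-vector
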
# show common Course 1 example
X = np.array([[1],[2],[3],[4]])
w = np.array([2])
c = np.dot(X[1], w)

print(f"X[1] has shape {X[1].shape}")
print(f"w has shape {w.shape}")
print(f"c has shape {c.shape}")

Output:

X[1] has shape (1,)
w has shape (1,)
c has shape ()
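
The looping pattern described in the list above can be sketched with more than one feature per example. The values of X_train and w below are made up for illustration, and the final comparison against X_train @ w is an addition that is not part of the lab.

```python
import numpy as np

X_train = np.array([[1.0, 2.0],
                    [3.0, 4.0],
                    [5.0, 6.0]])     # m = 3 examples, n = 2 features
w = np.array([0.5, -1.0])           # one weight per feature, shape (n,)

m = X_train.shape[0]
per_example = np.zeros(m)
for i in range(m):
    # X_train[i] has shape (n,), so this is a vector-vector dot product
    per_example[i] = np.dot(X_train[i], w)

print(f"per-example dot products: {per_example}")
print(f"same as X_train @ w:      {X_train @ w}")
```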

4 Matrices

4.1 Abstract

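Matrices are two-dimensional arrays, denoted by an uppercase bold letter such as $\mathbf{X}$. The elements of a matrix are all of the same type.
In the labs, m is usually the number of rows and n the number of columns. The elements of a matrix can be referenced with a two-dimensional index.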

4.2 NumPy Arrays

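Matrices have a two-dimensional (2-D) index [m,n]. In Course 1, 2-D matrices are used to hold training data. Training data is m examples by n features, creating an (m,n) array. Course 1 does not do operations directly on matrices but typically extracts an example as a vector and operates on that.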

4.3 Matrix Creation

The same functions that created 1-D vectors will create 2-D or n-D arrays.
Below, the shape tuple is provided to achieve a 2-D result. Notice how NumPy uses brackets to denote each dimension, and that when printing, NumPy prints one row per line.

a = np.zeros((1, 5))                                       
print(f"a shape = {a.shape}, a = {a}")                     

a = np.zeros((2, 1))                                                                   
print(f"a shape = {a.shape}, a = {a}") 

a = np.random.random_sample((1, 1))  
print(f"a shape = {a.shape}, a = {a}") 

Output:

a shape = (1, 5), a = [[0. 0. 0. 0. 0.]]
a shape = (2, 1), a = [[0.]
 [0.]]
a shape = (1, 1), a = [[0.44236513]]

Values can also be specified manually. Dimensions are specified with additional brackets matching the format in the printing above.

# NumPy routines which allocate memory and fill with user specified values
a = np.array([[5], [4], [3]]);   print(f" a shape = {a.shape}, np.array: a = {a}")
a = np.array([[5],   # One can also
              [4],   # separate values
              [3]]); #into separate rows
print(f" a shape = {a.shape}, np.array: a = {a}")

Output:

a shape = (3, 1), np.array: a = [[5]
 [4]
 [3]]
a shape = (3, 1), np.array: a = [[5]
 [4]
 [3]]

4.4 Operations on Matrices

Let’s explore some operations using matrices.

4.4.1 Indexing

Matrices include a second index. The two indexes describe [row, column]. Access can return either an element or a row/column.

#vector indexing operations on matrices
a = np.arange(6).reshape(-1, 2)   #reshape is a convenient way to create matrices
print(f"a.shape: {a.shape}, \na= {a}")

#access an element
print(f"\na[2,0].shape:   {a[2, 0].shape}, a[2,0] = {a[2, 0]},     type(a[2,0]) = {type(a[2, 0])} Accessing an element returns a scalar\n")

#access a row
print(f"a[2].shape:   {a[2].shape}, a[2]   = {a[2]}, type(a[2])   = {type(a[2])}")

Output:

a.shape: (3, 2), 
a= [[0 1]
 [2 3]
 [4 5]]

a[2,0].shape:   (), a[2,0] = 4,     type(a[2,0]) = <class 'numpy.int32'> Accessing an element returns a scalar

a[2].shape:   (2,), a[2]   = [4 5], type(a[2])   = <class 'numpy.ndarray'>

The last example is worth noting: accessing a matrix by specifying only the row returns a 1-D vector.
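
The text mentions that access can also return a column, which the example above does not show. A minimal sketch, using the same 3 by 2 array; the variable names are illustrative.

```python
import numpy as np

a = np.arange(6).reshape(-1, 2)    # [[0 1], [2 3], [4 5]]

col = a[:, 0]          # all rows, column 0: a 1-D array
col_2d = a[:, 0:1]     # slicing the column index keeps the result 2-D

print(f"a[:, 0]   = {col}, shape = {col.shape}")
print(f"a[:, 0:1] shape = {col_2d.shape}")
```

Indexing with an integer drops that dimension, while slicing with a range keeps it.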

Reshape

The previous example used reshape to shape the array.
a = np.arange(6).reshape(-1, 2)
This line of code first created a 1-D Vector of six elements. It then reshaped that vector into a 2-D array using the reshape command. This could have been written:
a = np.arange(6).reshape(3, 2)
To arrive at the same 3 row, 2 column array.
The -1 argument tells the routine to compute the number of rows given the size of the array and the number of columns.
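
As a sketch of how the -1 argument behaves, the example below lets NumPy infer either dimension and shows that reshape raises a ValueError when the total number of elements cannot fit the requested shape. The variable names are illustrative.

```python
import numpy as np

v = np.arange(6)

rows_inferred = v.reshape(-1, 2)    # 6 elements / 2 columns -> 3 rows
cols_inferred = v.reshape(2, -1)    # 6 elements / 2 rows -> 3 columns

print(f"reshape(-1, 2).shape = {rows_inferred.shape}")
print(f"reshape(2, -1).shape = {cols_inferred.shape}")

reshape_failed = False
try:
    v.reshape(-1, 4)                # 6 is not divisible by 4
except ValueError as e:
    reshape_failed = True
    print(f"ValueError: {e}")
```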

4.4.2 Slicing

Slicing creates an array of indices using a set of three values (start:stop:step). A subset of these values is also valid, for example only start and stop.

#vector 2-D slicing operations
a = np.arange(20).reshape(-1, 10)
print(f"a = \n{a}")

#access 5 consecutive elements (start:stop:step)
print("a[0, 2:7:1] = ", a[0, 2:7:1], ",  a[0, 2:7:1].shape =", a[0, 2:7:1].shape, "a 1-D array")

#access 5 consecutive elements (start:stop:step) in two rows
print("a[:, 2:7:1] = \n", a[:, 2:7:1], ",  a[:, 2:7:1].shape =", a[:, 2:7:1].shape, "a 2-D array")

# access all elements
print("a[:,:] = \n", a[:,:], ",  a[:,:].shape =", a[:,:].shape)

# access all elements in one row (very common usage)
print("a[1,:] = ", a[1,:], ",  a[1,:].shape =", a[1,:].shape, "a 1-D array")
# same as
print("a[1]   = ", a[1],   ",  a[1].shape   =", a[1].shape, "a 1-D array")

Output:

a = 
[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]]
a[0, 2:7:1] =  [2 3 4 5 6] ,  a[0, 2:7:1].shape = (5,) a 1-D array
a[:, 2:7:1] = 
 [[ 2  3  4  5  6]
 [12 13 14 15 16]] ,  a[:, 2:7:1].shape = (2, 5) a 2-D array
a[:,:] = 
 [[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]] ,  a[:,:].shape = (2, 10)
a[1,:] =  [10 11 12 13 14 15 16 17 18 19] ,  a[1,:].shape = (10,) a 1-D array
a[1]   =  [10 11 12 13 14 15 16 17 18 19] ,  a[1].shape   = (10,) a 1-D array

Congratulations!

In this lab you mastered the features of Python and NumPy that are needed for Course 1.
