Python Basics with numpy
This exercise gives you a brief introduction to Python. Even if you’ve used Python before, this will help familiarize you with functions we’ll need. In this assignment you will:
• Learn how to use numpy.
• Implement some basic core deep learning functions such as the softmax, sigmoid, dsigmoid, etc…
• Learn how to handle data by normalizing inputs and reshaping images.
• Recognize the importance of vectorization.
• Understand how python broadcasting works.
Building basic functions with numpy
Sigmoid function, np.exp()
$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}$ is sometimes also known as the logistic function. It is a non-linear function used not only in Machine Learning (Logistic Regression), but also in Deep Learning.
Instructions: x could now be either a real number, a vector, or a matrix. The data structures we use in numpy to represent these shapes (vectors, matrices…) are called numpy arrays. You don’t need to know more for now.
import numpy as np

def sigmoid(x):
    """
    Compute the sigmoid of x.

    Arguments:
    x -- a scalar or numpy array of any size

    Returns:
    s -- sigmoid(x)
    """
    s = 1 / (1 + np.exp(-x))  # np.exp is applied element-wise
    return s
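As a quick check, the sketch below applies a sigmoid written as above to a scalar and to an array. The input values are illustrative choices, not part of the assignment.

```python
import numpy as np

def sigmoid(x):
    # Element-wise sigmoid; works for scalars and numpy arrays alike.
    return 1 / (1 + np.exp(-x))

scalar_out = sigmoid(0)                           # sigmoid(0) is 0.5
vector_out = sigmoid(np.array([1.0, 2.0, 3.0]))   # applied to every entry

print(scalar_out)
print(vector_out)
```

Because np.exp() accepts arrays, the same function body handles both cases without any loop.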
Sigmoid gradient
As you have seen in lecture, you will need to compute gradients to optimize loss functions using backpropagation.
Let’s code your first gradient function.
Exercise: Implement the function sigmoid_derivative() to compute the gradient of the sigmoid function with respect to its input x. The formula is:
$\mathrm{sigmoid\_derivative}(x) = \sigma'(x) = \sigma(x)(1 - \sigma(x))$
You often code this function in two steps:
- Set s to be the sigmoid of x. You might find your sigmoid(x) function useful.
- Compute $\sigma'(x) = s(1 - s)$.
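def sigmoid_derivative(x):
    # Gradient of the sigmoid with respect to its input x.
    s = 1 / (1 + np.exp(-x))  # sigmoid(x); note the minus sign in the exponent
    ds = s * (1 - s)
    return ds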
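One way to gain confidence in a gradient function is to compare it with a finite-difference approximation. The sketch below does this for the sigmoid; the test points and the step size eps are arbitrary choices made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

x = np.array([-2.0, 0.0, 3.0])   # illustrative inputs
eps = 1e-6                       # small step, chosen arbitrarily

analytic = sigmoid_derivative(x)
# Central difference approximation of the derivative at each point.
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)

print(analytic)
print(np.allclose(analytic, numeric, atol=1e-6))
```

At x = 0 the sigmoid equals 0.5, so the derivative there is 0.5 * (1 - 0.5) = 0.25, which is its maximum value.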
Reshaping arrays
Two common numpy functions used in deep learning are np.shape and np.reshape().
• X.shape is used to get the shape (dimension) of a matrix/vector X.
• X.reshape(…) is used to reshape X into some other dimension.
For example, in computer science, an image is represented by a 3D array of shape (length,height,depth = 3). However, when you read an image as the input of an algorithm you convert it to a vector of shape (length ∗ height ∗ 3,1). In other words, you “unroll”, or reshape, the 3D array into a 1D vector.
Exercise: Implement “image2vector()” that takes an input of shape (length, height, 3) and returns a vector of shape (length*height*3, 1). For example, if you would like to reshape an array v of shape (a, b, c) into a vector of shape (a*b, c) you would do:
v = v.reshape((v.shape[0] * v.shape[1], v.shape[2]))
def image2vector(image):
    '''
    :param image: a numpy array of shape (length, height, depth)
    :return: v -- a vector of shape (length*height*depth, 1)
    '''
    # Multiply the three dimensions together to get the length of the column vector
    v = image.reshape((image.shape[0] * image.shape[1] * image.shape[2], 1))
    return v
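The following sketch shows the shapes before and after unrolling. The tiny 2x2 "image" built with np.arange is a stand-in chosen for illustration, not real image data.

```python
import numpy as np

def image2vector(image):
    # Flatten (length, height, depth) into a column vector.
    return image.reshape((image.shape[0] * image.shape[1] * image.shape[2], 1))

image = np.arange(12).reshape(2, 2, 3)   # stand-in for a 2x2 image with 3 channels
v = image2vector(image)

print(image.shape)  # (2, 2, 3)
print(v.shape)      # (12, 1)
```

Reshaping does not change the values or their order; it only changes how they are indexed.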
Normalizing rows
Another common technique we use in Machine Learning and Deep Learning is to normalize our data. It often leads to better performance because gradient descent converges faster after normalization. Here, by normalization we mean changing x to $x / \|x\|$ (dividing each row vector of x by its norm).
Exercise: Implement normalizeRows() to normalize the rows of a matrix. After applying this function to an input matrix x, each row of x should be a vector of unit length (meaning length 1).
def normalizeRows(x):
    '''
    Normalize each row of the matrix x to have unit length.
    :param x: a numpy matrix of shape (n, m)
    :return: x -- the row-normalized numpy matrix. You are allowed to modify x.
    '''
    # Compute the L2 norm of each row; keepdims=True gives shape (n, 1)
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    # Broadcasting divides every entry of a row by that row's norm
    x = x / x_norm
    return x
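To see the effect, the sketch below normalizes a small matrix and then recomputes the row norms. The matrix values are made up for illustration; the first row has norm 5, so its normalized form is easy to check by hand.

```python
import numpy as np

def normalizeRows(x):
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)  # shape (n, 1)
    return x / x_norm                                  # broadcast over columns

x = np.array([[0.0, 3.0, 4.0],
              [2.0, 6.0, 4.0]])
normalized = normalizeRows(x)
row_norms = np.linalg.norm(normalized, axis=1)

print(normalized)
print(row_norms)
```

Note that a row of all zeros has norm 0 and would cause a division by zero; this sketch assumes no such row is present.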
Broadcasting and the softmax function
A very important concept to understand in numpy is “broadcasting”. It is very useful for performing mathematical operations between arrays of different shapes. For the full details on broadcasting, you can read the official broadcasting documentation.
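As a small illustration of broadcasting, the sketch below divides a (2, 3) array by a (2, 1) array. The array values are arbitrary; the point is only how the shapes combine.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])                 # shape (2, 3)
row_sums = np.sum(a, axis=1, keepdims=True)     # shape (2, 1)

# The (2, 1) array is stretched across the 3 columns of the (2, 3) array,
# so each row of a is divided by its own sum.
fractions = a / row_sums

print(row_sums.shape)
print(fractions.shape)
print(fractions.sum(axis=1))
```

Without keepdims=True the sums would have shape (2,), which does not line up with the rows of a (2, 3) array in the same way.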
def softmax(x):
    '''
    Calculates the softmax for each row of the input x.
    Your code should work for a row vector and also for matrices of shape (n, m).
    :param x: a numpy matrix of shape (n, m)
    :return: s -- a numpy matrix equal to the softmax of x, of shape (n, m)
    '''
    # Exponentiate every entry of x
    x_exp = np.exp(x)
    # Sum each row; keepdims=True gives shape (n, 1) so it broadcasts below
    x_sum = np.sum(x_exp, axis=1, keepdims=True)
    # Divide each row by its own sum
    s = x_exp / x_sum
    return s
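A useful sanity check for a row-wise softmax is that every row of the output sums to 1. The sketch below runs that check on made-up inputs.

```python
import numpy as np

def softmax(x):
    x_exp = np.exp(x)
    x_sum = np.sum(x_exp, axis=1, keepdims=True)
    return x_exp / x_sum

x = np.array([[1.0, 2.0, 3.0],
              [0.0, 0.0, 0.0]])
s = softmax(x)

print(s)
print(s.sum(axis=1))
```

The second row has equal inputs, so its softmax is uniform: each entry is 1/3.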
What you need to remember:
• np.exp(x) works for any np.array x and applies the exponential function to every coordinate
• the sigmoid function and its gradient
• image2vector is commonly used in deep learning
• np.reshape is widely used. In the future, you’ll see that keeping your matrix/vector dimensions straight will go toward eliminating a lot of bugs.
• numpy has efficient built-in functions
• broadcasting is extremely useful
Vectorization
The vectorized implementation is much cleaner and more efficient. For bigger vectors/matrices, the differences in running time become even bigger. Note that np.dot() performs a matrix-matrix or matrix-vector multiplication. This is different from np.multiply() and the * operator (which is equivalent to .* in Matlab/Octave), which perform an element-wise multiplication.
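The difference between these operations can be sketched on two short vectors. The vectors x1 and x2 below are illustrative values, and the explicit loop is included only for comparison with the vectorized call.

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([4.0, 5.0, 6.0])

# Explicit loop version of the dot product.
dot_loop = 0.0
for i in range(len(x1)):
    dot_loop += x1[i] * x2[i]

dot_vec = np.dot(x1, x2)            # single number: 32.0
elementwise = np.multiply(x1, x2)   # same as x1 * x2: one product per entry

print(dot_loop, dot_vec)
print(elementwise)
```

For 1D inputs np.dot() returns a scalar, while np.multiply() returns an array of the same shape as its inputs.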
Implement the L1 and L2 loss functions
Exercise: Implement the numpy vectorized version of the L1 loss. You may find the function abs(x) (absolute value of x) useful. Reminder:
• The loss is used to evaluate the performance of your model. The bigger your loss is, the more different your predictions ($\hat{y}$) are from the true values ($y$). In deep learning, you use optimization algorithms like Gradient Descent to train your model and to minimize the cost.
• L1 loss is defined as: $L_1(\hat{y}, y) = \sum_{i=1}^{m} |y_i - \hat{y}_i|$
def L1(yhat, y):
    '''
    :param yhat: vector of size m (predicted labels)
    :param y: vector of size m (true labels)
    :return: loss -- the value of the L1 loss function defined above
    '''
    # Sum of absolute differences, computed without an explicit loop
    loss = np.sum(np.abs(yhat - y))
    return loss
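As a worked example, the sketch below evaluates the L1 loss on five made-up predictions. The values of yhat and y are illustrative only.

```python
import numpy as np

def L1(yhat, y):
    # Sum of absolute differences between predictions and labels.
    return np.sum(np.abs(yhat - y))

yhat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])   # predicted values
y = np.array([1.0, 0.0, 0.0, 1.0, 1.0])      # true labels

l1_loss = L1(yhat, y)
print(l1_loss)
```

By hand, the absolute differences are 0.1, 0.2, 0.1, 0.6 and 0.1, which add up to 1.1 (up to floating-point rounding).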
Exercise: Implement the numpy vectorized version of the L2 loss. There are several ways of implementing the L2 loss, but you may find the function np.dot() useful. As a reminder, if $x = [x_1, x_2, ..., x_n]$, then np.dot(x,x) = $\sum_{j=1}^{n} x_j^2$.
• L2 loss is defined as: $L_2(\hat{y}, y) = \sum_{i=1}^{m} (y_i - \hat{y}_i)^2$
def L2(yhat, y):
    '''
    Implement the numpy vectorized version of the L2 loss.
    :param yhat: vector of size m (predicted labels)
    :param y: vector of size m (true labels)
    :return: loss -- the value of the L2 loss function defined above
    '''
    # Dot product of the difference with itself gives the sum of squared differences
    loss = np.dot(yhat - y, yhat - y)
    return loss
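The sketch below evaluates the L2 loss on made-up values and compares the np.dot() form with an equivalent np.sum() form. The inputs are illustrative, and the np.sum() variant is shown only as a cross-check.

```python
import numpy as np

def L2(yhat, y):
    # Sum of squared differences, written as a dot product.
    diff = yhat - y
    return np.dot(diff, diff)

yhat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])   # predicted values
y = np.array([1.0, 0.0, 0.0, 1.0, 1.0])      # true labels

l2_loss = L2(yhat, y)
l2_check = np.sum((yhat - y) ** 2)           # same quantity via element-wise square

print(l2_loss)
print(np.isclose(l2_loss, l2_check))
```

By hand, the squared differences are 0.01, 0.04, 0.01, 0.36 and 0.01, which add up to 0.43 (up to floating-point rounding).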
What to remember:
• Vectorization is very important in deep learning. It provides computational efficiency and clarity.
• You have reviewed the L1 and L2 loss.
• You are familiar with many numpy functions such as np.sum, np.dot, np.multiply, np.maximum, etc…