Machine Learning-Chapter 1

Chapter 1 : Introduction

1 Machine Learning Introduction

1.1 AI vs ML vs DL

  • AI: Enables machines to mimic human behavior.
  • ML: Uses statistical methods to enable machines to improve with experience.
  • DL: A kind of ML that makes training multi-layer neural networks feasible.

1.2 Machine Learning Process

Data Collection -> Data Preparation -> Training -> Evaluation -> Tuning

1.3 Machine Learning Approaches

  • Supervised Learning
  • Unsupervised Learning
  • Semi-supervised Learning
  • Reinforcement Learning

1.4 Supervised Learning

The goal is to learn the mapping from a set of inputs to their corresponding outputs, using labeled examples.

1.4.1 Classification

The output is a category, i.e. a discrete class label.
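As a minimal sketch of classification, the snippet below predicts a category for a new input by copying the label of its closest training example (a 1-nearest-neighbor rule). The data points, the labels `"small"` and `"large"`, and the function name `nearest_neighbor_predict` are made up for illustration and do not come from the original notes.

```python
# Toy classification: predict a discrete label with a 1-nearest-neighbor rule.
# All data and names here are illustrative assumptions.

def nearest_neighbor_predict(train_points, train_labels, query):
    """Return the label of the training point closest to `query`."""
    best_label, best_dist = None, float("inf")
    for point, label in zip(train_points, train_labels):
        # Squared Euclidean distance is enough for comparing distances.
        dist = sum((p - q) ** 2 for p, q in zip(point, query))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Labeled examples: inputs (2-D points) and their categories.
train_points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 7.5)]
train_labels = ["small", "small", "large", "large"]

label_a = nearest_neighbor_predict(train_points, train_labels, (2.0, 1.0))
label_b = nearest_neighbor_predict(train_points, train_labels, (8.5, 9.0))
print(label_a, label_b)
```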

1.4.2 Regression

The output is a real-valued scalar.
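A minimal regression sketch, assuming a single input feature and a straight-line model $y \approx w x + b$ fitted by ordinary least squares. The data and the helper name `fit_line` are illustrative assumptions, not part of the original notes.

```python
# Toy regression: fit y ≈ w * x + b by ordinary least squares.
# The data below is made up for illustration only.

def fit_line(xs, ys):
    """Return (w, b) minimizing the squared error of y = w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b

# Labeled examples: inputs xs and real-valued outputs ys (exactly y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = fit_line(xs, ys)
prediction = w * 4.0 + b  # predicted output for the unseen input x = 4
print(w, b, prediction)
```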

1.5 Unsupervised Learning

Only input data is provided; there are no labeled example outputs to aim for.

1.5.1 Clustering

The most commonly used approach: creating groups of data points such that each group has different characteristics.
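One common clustering method is k-means. The sketch below runs it on one-dimensional data; the choice of k-means, the data, the initial centers, and the function name `kmeans_1d` are all assumptions made for illustration.

```python
# Toy clustering: k-means on 1-D data (illustrative assumptions throughout).

def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            clusters[nearest].append(p)
        # Update step: keep the old center if a cluster ends up empty.
        centers = [
            sum(c) / len(c) if c else centers[k]
            for k, c in enumerate(clusters)
        ]
    return centers, clusters

points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

No labels are used anywhere: the two groups emerge only from the distances between the inputs.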

1.5.2 Association

Used for recommending or finding related items.

1.5.3 Anomaly Detection

Used to detect and separate out unusual occurrences (outliers).
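A simple way to illustrate anomaly detection is a z-score rule: flag any value that lies more than a chosen number of standard deviations from the mean. The threshold of 2.0, the sample readings, and the function name `find_anomalies` are assumptions for this sketch; practical systems often use more robust methods.

```python
# Toy anomaly detection with a z-score rule (illustrative assumptions).
import statistics

def find_anomalies(values, threshold=2.0):
    """Return values farther than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    return [v for v in values if abs(v - mean) > threshold * stdev]

readings = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 25.0]
anomalies = find_anomalies(readings)
print(anomalies)
```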

1.5.4 Dimensionality Reduction

Aims to find the most important features so that the original set of features can be reduced.

1.6 Semi-supervised Learning

A mix of the supervised and unsupervised approaches.

1.6.1 Generative Adversarial Networks

GANs use two neural networks, a generator and a discriminator; by competing against each other, both become increasingly skilled.

1.7 Reinforcement Learning

In this approach, occasional positive and negative feedback is used to reinforce behavior.

2 Matrix Calculus

2.1 Matrix Calculus

2.1.1 Define Jacobi Matrix

$$f: R^{n} \rightarrow R^{m}, \quad \vec{y}=f(\vec{x}), \quad \vec{y} \in R^{m}, \quad \vec{x} \in R^{n}$$

$\frac{\partial \vec{y}}{\partial \vec{x}} \Rightarrow$ Jacobi Matrix

  1. 2 dimensions:

$$f: R^{2} \rightarrow R, \quad y=f\left(x_{1}, x_{2}\right)$$

$$\nabla f\left(x_{1}, x_{2}\right)=\left[\frac{\partial f\left(x_{1}, x_{2}\right)}{\partial x_{1}}, \quad \frac{\partial f\left(x_{1}, x_{2}\right)}{\partial x_{2}}\right]$$

$$f: R^{2} \rightarrow R^{2}, \quad \vec{y}=\left[\begin{array}{l}y_{1} \\ y_{2}\end{array}\right]=\left[\begin{array}{l}f_{1}\left(x_{1}, x_{2}\right) \\ f_{2}\left(x_{1}, x_{2}\right)\end{array}\right]$$

$$J_{x}=\left[\begin{array}{l}\nabla f_{1}\left(x_{1}, x_{2}\right) \\ \nabla f_{2}\left(x_{1}, x_{2}\right)\end{array}\right]=\left[\begin{array}{ll}\frac{\partial f_{1}\left(x_{1}, x_{2}\right)}{\partial x_{1}} & \frac{\partial f_{1}\left(x_{1}, x_{2}\right)}{\partial x_{2}} \\ \frac{\partial f_{2}\left(x_{1}, x_{2}\right)}{\partial x_{1}} & \frac{\partial f_{2}\left(x_{1}, x_{2}\right)}{\partial x_{2}}\end{array}\right]$$

  2. m dimensions:

$$\vec{y} \in R^{m}, \quad \vec{x} \in R^{n}$$

$$J_{x}=\frac{\partial \vec{y}}{\partial \vec{x}}=\left[\begin{array}{c}\nabla f_{1}(\vec{x}) \\ \nabla f_{2}(\vec{x}) \\ \vdots \\ \nabla f_{m}(\vec{x})\end{array}\right]$$
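The definition can be checked numerically. The sketch below approximates the Jacobian of a map $R^{2} \rightarrow R^{2}$ by central finite differences and compares it with the hand-derived matrix of partial derivatives. The example function `f`, the step size `eps`, and the helper names are assumptions chosen for illustration.

```python
# Numerical check of the Jacobian definition (illustrative assumptions).
import math

def f(x):
    """Example map R^2 -> R^2: y1 = x1^2 * x2, y2 = 5 * x1 + sin(x2)."""
    x1, x2 = x
    return [x1 * x1 * x2, 5.0 * x1 + math.sin(x2)]

def numerical_jacobian(func, x, eps=1e-6):
    """Approximate the m x n Jacobian of func at x by central differences."""
    m = len(func(x))
    n = len(x)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = func(x_plus), func(x_minus)
        for i in range(m):
            # Entry (i, j) is the partial derivative of y_i with respect to x_j.
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac

def analytic_jacobian(x):
    """Rows are the gradients of f1 and f2, derived by hand."""
    x1, x2 = x
    return [[2.0 * x1 * x2, x1 * x1],
            [5.0, math.cos(x2)]]

point = [1.0, 2.0]
J_num = numerical_jacobian(f, point)
J_exact = analytic_jacobian(point)
print(J_num)
print(J_exact)
```

Row $i$ of the result is $\nabla f_{i}$, so the matrix has one row per output and one column per input.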

2.1.2 Vector Sum Reduction

$y=\sum_{i=1}^{n} f_{i}(\vec{x}): R^{n} \rightarrow R$, $\vec{x} \in R^{n}$

For a general $\vec{y}=f(\vec{x})$ with $\vec{x} \in R^{n}$, $\vec{y} \in R^{m}$, i.e. $R^{n} \rightarrow R^{m}$, $J_{x}$ is $m \times n$.

Here $R^{n} \rightarrow R$, so $J_{x}$ is $1 \times n$:

$$\frac{\partial y}{\partial \vec{x}}=\left[\frac{\partial y}{\partial x_{1}}, \frac{\partial y}{\partial x_{2}}, \cdots, \frac{\partial y}{\partial x_{n}}\right]$$

$$=\left[\frac{\partial}{\partial x_{1}} \sum_{i=1}^{n} f_{i}(\vec{x}), \frac{\partial}{\partial x_{2}} \sum_{i=1}^{n} f_{i}(\vec{x}), \cdots, \frac{\partial}{\partial x_{n}} \sum_{i=1}^{n} f_{i}(\vec{x})\right]$$

$$=\left[\sum_{i=1}^{n} \frac{\partial f_{i}(\vec{x})}{\partial x_{1}}, \sum_{i=1}^{n} \frac{\partial f_{i}(\vec{x})}{\partial x_{2}}, \cdots, \sum_{i=1}^{n} \frac{\partial f_{i}(\vec{x})}{\partial x_{n}}\right]$$
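The last step says that the gradient of a sum equals the sum of the gradients. A small numerical sketch of this, assuming the illustrative choice $f_{i}(\vec{x}) = x_{i}^{2} + x_{1}$ (not from the original notes) and central finite differences:

```python
# Check: gradient of a sum equals the sum of gradients (illustrative example).

def f_i(i, x):
    """Assumed example: f_i(x) = x_i^2 + x_1 (index i is 0-based in code)."""
    return x[i] ** 2 + x[0]

def y(x):
    """The sum reduction y = sum_i f_i(x)."""
    return sum(f_i(i, x) for i in range(len(x)))

def partial(func, x, j, eps=1e-6):
    """Central-difference estimate of d func / d x_j at x."""
    x_plus = list(x)
    x_minus = list(x)
    x_plus[j] += eps
    x_minus[j] -= eps
    return (func(x_plus) - func(x_minus)) / (2 * eps)

x = [1.0, 2.0, 3.0]
n = len(x)

# Left-hand side: differentiate the sum directly.
grad_of_sum = [partial(y, x, j) for j in range(n)]

# Right-hand side: differentiate each f_i, then add the results.
sum_of_grads = []
for j in range(n):
    total = 0.0
    for i in range(n):
        total += partial(lambda v, i=i: f_i(i, v), x, j)
    sum_of_grads.append(total)

print(grad_of_sum)
print(sum_of_grads)
```

For this choice of $f_{i}$ the exact gradient is $[2x_{1}+n, 2x_{2}, 2x_{3}]$, which is $[5, 4, 6]$ at the point used above.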

2.1.3 Vector Chain Rules
1) Single-variable Chain Rule: $\frac{d y}{d x}=\frac{d y}{d u} \cdot \frac{d u}{d x}$
2) Single-variable total-derivative Chain Rule:

$$\frac{\partial f\left(x, u_{1}, u_{2}, \cdots, u_{n}\right)}{\partial x}=\frac{\partial f}{\partial x}+\frac{\partial f}{\partial u_{1}} \frac{\partial u_{1}}{\partial x}+\frac{\partial f}{\partial u_{2}} \frac{\partial u_{2}}{\partial x}+\cdots+\frac{\partial f}{\partial u_{n}} \frac{\partial u_{n}}{\partial x}=\frac{\partial f}{\partial x}+\sum_{i=1}^{n} \frac{\partial f}{\partial u_{i}} \frac{\partial u_{i}}{\partial x}$$

3) Vector Chain Rules:

For example, take $f: R \rightarrow R^{2}$, $\vec{y}=\left[\begin{array}{c}f_{1}(x) \\ f_{2}(x)\end{array}\right]$:

$$\vec{y}=\left[\begin{array}{l}y_{1} \\ y_{2}\end{array}\right]=\left[\begin{array}{l}f_{1}(x) \\ f_{2}(x)\end{array}\right]=\left[\begin{array}{l}\ln \left(x^{2}\right) \\ \sin (3 x)\end{array}\right]$$

Introduce the intermediate vector

$$\vec{g}=\left[\begin{array}{l}g_{1}(x) \\ g_{2}(x)\end{array}\right]=\left[\begin{array}{l}x^{2} \\ 3 x\end{array}\right]$$

so that

$$\left[\begin{array}{c}f_{1}(x) \\ f_{2}(x)\end{array}\right]=\left[\begin{array}{c}f_{1}(\vec{g}) \\ f_{2}(\vec{g})\end{array}\right]=\left[\begin{array}{c}\ln \left(g_{1}\right) \\ \sin \left(g_{2}\right)\end{array}\right]$$

Since $R \rightarrow R^{2}$, the Jacobian $J_{x}=\frac{\partial \vec{y}}{\partial x}$ is $2 \times 1$:

$$\frac{\partial \vec{y}}{\partial x}=\left[\begin{array}{l}\frac{\partial f_{1}(\vec{g})}{\partial x} \\ \frac{\partial f_{2}(\vec{g})}{\partial x}\end{array}\right]=\left[\begin{array}{l}\frac{\partial f_{1}}{\partial g_{1}} \cdot \frac{\partial g_{1}}{\partial x}+\frac{\partial f_{1}}{\partial g_{2}} \cdot \frac{\partial g_{2}}{\partial x} \\ \frac{\partial f_{2}}{\partial g_{1}} \cdot \frac{\partial g_{1}}{\partial x}+\frac{\partial f_{2}}{\partial g_{2}} \cdot \frac{\partial g_{2}}{\partial x}\end{array}\right]=\left[\begin{array}{l}\frac{1}{g_{1}} \cdot 2 x+0 \\ 0+\cos \left(g_{2}\right) \cdot 3\end{array}\right]=\left[\begin{array}{l}\frac{2}{x} \\ 3 \cos (3 x)\end{array}\right]$$

In general:

$$\frac{\partial}{\partial x} \vec{f}(\vec{g}(x))=\frac{\partial \vec{f}}{\partial \vec{g}} \cdot \frac{\partial \vec{g}}{\partial x}$$
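The worked example with $\ln(x^{2})$ and $\sin(3x)$ can be verified numerically. The sketch below multiplies the $2 \times 2$ Jacobian $\frac{\partial \vec{f}}{\partial \vec{g}}$ by the $2 \times 1$ Jacobian $\frac{\partial \vec{g}}{\partial x}$ and compares the product with a central finite difference. The evaluation point `x0`, the step size `eps`, and the function names are assumptions for illustration.

```python
# Vector chain rule check for y = [ln(x^2), sin(3x)] (illustrative sketch).
import math

def g(x):
    """Intermediate vector g(x) = [x^2, 3x]."""
    return [x ** 2, 3.0 * x]

def f(gv):
    """Outer function f(g) = [ln(g1), sin(g2)]."""
    return [math.log(gv[0]), math.sin(gv[1])]

def chain_rule_derivative(x):
    """Compute (df/dg) * (dg/dx) as a 2x2 times 2x1 product."""
    g1, g2 = g(x)
    df_dg = [[1.0 / g1, 0.0],
             [0.0, math.cos(g2)]]
    dg_dx = [2.0 * x, 3.0]
    return [sum(df_dg[i][k] * dg_dx[k] for k in range(2)) for i in range(2)]

def finite_difference(x, eps=1e-6):
    """Central-difference estimate of d f(g(x)) / dx."""
    plus = f(g(x + eps))
    minus = f(g(x - eps))
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]

x0 = 1.5
by_chain_rule = chain_rule_derivative(x0)
by_differences = finite_difference(x0)
closed_form = [2.0 / x0, 3.0 * math.cos(3.0 * x0)]
print(by_chain_rule)
print(by_differences)
```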

2.1.4 Matrix Differentiation


Proposition 5:

$\vec{y}=\mathbb{A} \vec{x}, \quad \vec{y} \in R^{m}, \quad \vec{x} \in R^{n}, \quad \mathbb{A} \in R^{m \times n}$, where $\mathbb{A}$ doesn’t depend on $\vec{x}$. Then:

$$\frac{\partial \vec{y}}{\partial \vec{x}}=\mathbb{A}$$
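A quick numerical sanity check of Proposition 5: the finite-difference Jacobian of $\vec{x} \mapsto \mathbb{A}\vec{x}$ should reproduce $\mathbb{A}$ itself. The matrix entries, the point `x`, and the helper names below are assumptions chosen for illustration.

```python
# Check of Proposition 5: the Jacobian of y = A x is A (illustrative sketch).

def matvec(A, x):
    """Matrix-vector product A x for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def numerical_jacobian(func, x, eps=1e-6):
    """Approximate the m x n Jacobian of func at x by central differences."""
    m = len(func(x))
    n = len(x)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = func(x_plus), func(x_minus)
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]          # m = 2, n = 3
x = [0.5, -1.0, 2.0]

J = numerical_jacobian(lambda v: matvec(A, v), x)
print(J)
```

Because the map is linear, the result does not depend on the point `x` at which the Jacobian is evaluated.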

Proposition 6:

$\vec{y}=\mathbb{A} \vec{x}, \quad \vec{y} \in R^{m}, \quad \vec{x} \in R^{n}, \quad \mathbb{A} \in R^{m \times n}$, where $\mathbb{A}$ doesn’t depend on $\vec{x}$. Suppose $\vec{x}$ is a function of $\vec{z}$, and $\mathbb{A}$ is independent of $\vec{z}$.

Then: $\frac{\partial \vec{y}}{\partial \vec{z}}=\mathbb{A} \cdot \frac{\partial \vec{x}}{\partial \vec{z}}$

Pf : $\frac{\partial \vec{y}}{\partial \vec{z}}=\frac{\partial \vec{y}}{\partial \vec{x}} \cdot \frac{\partial \vec{x}}{\partial \vec{z}}=\mathbb{A} \cdot \frac{\partial \vec{x}}{\partial \vec{z}}$

Proposition 7:

$\alpha=\vec{y}^{\top} \mathbb{A} \vec{x}, \quad \alpha \in R, \quad \vec{y} \in R^{m}, \quad \vec{x} \in R^{n}, \quad \mathbb{A} \in R^{m \times n}$, where $\mathbb{A}$ is independent of $\vec{x}$ and $\vec{y}$.

Then : $\frac{\partial \alpha}{\partial \vec{x}}=\vec{y}^{\top} \mathbb{A}$

Pf : By Proposition 5, let $B=\vec{y}^{\top} \mathbb{A}$, so that $\alpha=B \vec{x}$ $\Rightarrow$ $\frac{\partial \alpha}{\partial \vec{x}}=B=\vec{y}^{\top} \mathbb{A}$

Then : $\frac{\partial \alpha}{\partial \vec{y}}=\vec{x}^{\top} \mathbb{A}^{\top}$

Pf : $\alpha$ is a scalar, so $\alpha^{\top}=\alpha$, and $\alpha=\alpha^{\top}=\left(\vec{y}^{\top} \mathbb{A} \vec{x}\right)^{\top}=\vec{x}^{\top} \mathbb{A}^{\top} \vec{y}$. Applying Proposition 5 again:

$$\frac{\partial \alpha}{\partial \vec{y}}=\vec{x}^{\top} \mathbb{A}^{\top}$$
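Both results of Proposition 7 can be checked with finite differences: the gradient of $\alpha=\vec{y}^{\top}\mathbb{A}\vec{x}$ with respect to $\vec{x}$ should be $\vec{y}^{\top}\mathbb{A}$, and with respect to $\vec{y}$ it should be $\vec{x}^{\top}\mathbb{A}^{\top}$. The numbers and helper names below are illustrative assumptions.

```python
# Check of Proposition 7 for alpha = y^T A x (illustrative sketch).

def alpha(y, A, x):
    """Bilinear form y^T A x."""
    return sum(y[i] * A[i][j] * x[j]
               for i in range(len(y)) for j in range(len(x)))

def numerical_gradient(func, v, eps=1e-6):
    """Central-difference gradient of a scalar function of a vector."""
    grad = []
    for j in range(len(v)):
        plus = list(v)
        minus = list(v)
        plus[j] += eps
        minus[j] -= eps
        grad.append((func(plus) - func(minus)) / (2 * eps))
    return grad

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]          # m = 2, n = 3
y = [1.0, -2.0]
x = [0.5, 1.0, -1.0]

# Closed forms from the proposition.
grad_x_exact = [sum(y[i] * A[i][j] for i in range(2)) for j in range(3)]  # y^T A
grad_y_exact = [sum(A[i][j] * x[j] for j in range(3)) for i in range(2)]  # x^T A^T

# Finite-difference estimates.
grad_x_num = numerical_gradient(lambda v: alpha(y, A, v), x)
grad_y_num = numerical_gradient(lambda v: alpha(v, A, x), y)

print(grad_x_exact, grad_x_num)
print(grad_y_exact, grad_y_num)
```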

Proposition :

Let the scalar $\alpha$ be defined by $\alpha=\vec{y}^{\top} \vec{x}, \quad \vec{y} \in R^{n}, \quad \vec{x} \in R^{n}$, where $\vec{y}$ and $\vec{x}$ are functions of the vector $\vec{z}$. Then:

$$\frac{\partial \alpha}{\partial \vec{z}}=\vec{x}^{\top} \frac{\partial \vec{y}}{\partial \vec{z}}+\vec{y}^{\top} \frac{\partial \vec{x}}{\partial \vec{z}}$$

Pf :

$$\frac{\partial \alpha}{\partial \vec{z}}=\frac{\partial \alpha}{\partial \vec{y}} \frac{\partial \vec{y}}{\partial \vec{z}}+\frac{\partial \alpha}{\partial \vec{x}} \frac{\partial \vec{x}}{\partial \vec{z}}=\vec{x}^{\top} \frac{\partial \vec{y}}{\partial \vec{z}}+\vec{y}^{\top} \frac{\partial \vec{x}}{\partial \vec{z}}$$

Proposition 8 :

$\alpha=\vec{x}^{\top} \mathbb{A} \vec{x}, \quad \vec{x} \in R^{n}, \quad \mathbb{A} \in R^{n \times n}$, where $\mathbb{A}$ doesn’t depend on $\vec{x}$. Then:

$$\frac{\partial \alpha}{\partial \vec{x}}=\vec{x}^{\top}\left(\mathbb{A}+\mathbb{A}^{\top}\right)$$

Pf : $\alpha=\vec{x}^{\top} \mathbb{A} \vec{x}$; let $\vec{y}=\vec{x}$, so that

$\alpha=\vec{y}^{\top} \mathbb{A} \vec{x}$, where $\vec{y}$ and $\vec{x}$ are both treated as functions of $\vec{x}$.

Proposition 10 :

$$\frac{\partial \alpha}{\partial \vec{x}}=\frac{\partial \alpha}{\partial \vec{y}} \cdot \frac{\partial \vec{y}}{\partial \vec{x}}+\frac{\partial \alpha}{\partial \vec{x}} \cdot \frac{\partial \vec{x}}{\partial \vec{x}}$$

$$=(\mathbb{A} \vec{x})^{\top}+\vec{y}^{\top} \mathbb{A}=\vec{x}^{\top} \mathbb{A}^{\top}+\vec{x}^{\top} \mathbb{A}=\vec{x}^{\top}\left(\mathbb{A}+\mathbb{A}^{\top}\right)$$
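The quadratic-form result can be checked numerically as well. The sketch below compares $\vec{x}^{\top}(\mathbb{A}+\mathbb{A}^{\top})$ with a finite-difference gradient of $\vec{x}^{\top}\mathbb{A}\vec{x}$, using a deliberately non-symmetric $\mathbb{A}$. The numbers and helper names are illustrative assumptions.

```python
# Check: gradient of x^T A x is x^T (A + A^T) (illustrative sketch).

def quadratic_form(A, x):
    """alpha = x^T A x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def numerical_gradient(func, v, eps=1e-6):
    """Central-difference gradient of a scalar function of a vector."""
    grad = []
    for j in range(len(v)):
        plus = list(v)
        minus = list(v)
        plus[j] += eps
        minus[j] -= eps
        grad.append((func(plus) - func(minus)) / (2 * eps))
    return grad

A = [[1.0, 2.0],
     [3.0, 4.0]]               # not symmetric on purpose
x = [1.0, -1.0]
n = len(x)

# Closed form: entry j is sum_i x_i * (A[i][j] + A[j][i]).
grad_exact = [sum(x[i] * (A[i][j] + A[j][i]) for i in range(n)) for j in range(n)]
grad_num = numerical_gradient(lambda v: quadratic_form(A, v), x)

print(grad_exact)
print(grad_num)
```

When $\mathbb{A}$ is symmetric, the same code returns $2\vec{x}^{\top}\mathbb{A}$, since $\mathbb{A}+\mathbb{A}^{\top}=2\mathbb{A}$.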

Proposition 9 :

If $\mathbb{A}$ is symmetric, then $\mathbb{A}=\mathbb{A}^{\top}$.

The result of Proposition 10 then becomes: $\frac{\partial \alpha}{\partial \vec{x}}=\vec{x}^{\top}\left(\mathbb{A}+\mathbb{A}^{\top}\right)=2 \vec{x}^{\top} \mathbb{A}$

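Proposition 15 :

Let $A$ be an invertible matrix whose entries depend on a scalar parameter $\alpha$. Start from

$$A^{-1} A=I$$

and differentiate both sides with respect to $\alpha$ using the product rule:

$$A^{-1} \frac{\partial A}{\partial \alpha}+\frac{\partial A^{-1}}{\partial \alpha} A=0$$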
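Solving the last identity for the derivative of the inverse (multiplying on the right by $A^{-1}$) gives $\frac{\partial A^{-1}}{\partial \alpha}=-A^{-1} \frac{\partial A}{\partial \alpha} A^{-1}$. This final step is not written out in the notes above; it follows directly from the identity. The sketch below checks both the identity and the solved form numerically for a small matrix that depends on a scalar parameter `t` (playing the role of $\alpha$). The particular matrix `A(t)` and the helper names are assumptions for illustration.

```python
# Check: d(A^{-1})/dt = -A^{-1} (dA/dt) A^{-1} for a 2x2 example (sketch).

def A(t):
    """Assumed example matrix depending on the scalar parameter t."""
    return [[2.0 + t, 1.0],
            [t * t, 3.0]]

def dA(t):
    """Entry-wise derivative of A(t) with respect to t, derived by hand."""
    return [[1.0, 0.0],
            [2.0 * t, 0.0]]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det],
            [-c / det, a / det]]

def matmul(P, Q):
    """Product of two 2x2 matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

t0, eps = 0.5, 1e-6

# Finite-difference derivative of the inverse.
inv_plus = inverse_2x2(A(t0 + eps))
inv_minus = inverse_2x2(A(t0 - eps))
dA_inv_numeric = [[(inv_plus[i][j] - inv_minus[i][j]) / (2 * eps)
                   for j in range(2)] for i in range(2)]

# Closed form: -A^{-1} (dA/dt) A^{-1}.
A_inv = inverse_2x2(A(t0))
product = matmul(matmul(A_inv, dA(t0)), A_inv)
dA_inv_formula = [[-value for value in row] for row in product]

# Residual of the identity A^{-1} dA/dt + d(A^{-1})/dt A = 0.
left = matmul(A_inv, dA(t0))
right = matmul(dA_inv_numeric, A(t0))
residual = [[left[i][j] + right[i][j] for j in range(2)] for i in range(2)]

print(dA_inv_numeric)
print(dA_inv_formula)
print(residual)
```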

