Machine Learning - Chapter 1

Chapter 1: Introduction

1 Machine Learning Introduction

1.1 AI vs ML vs DL

  • AI: Enables machines to mimic human behavior
  • ML: Uses statistical methods to enable machines to improve with experience
  • DL: A kind of ML that makes multi-layer neural networks feasible

1.2 Machine Learning Process

Data Collection → Data Preparation → Training → Evaluation → Tuning
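As a concrete illustration of these five stages, here is a minimal sketch using scikit-learn. The synthetic dataset, logistic-regression model, and tuning grid are illustrative assumptions, not prescriptions from the notes.

```python
# Minimal sketch of the five-stage ML process using scikit-learn.
# Dataset, model, and hyperparameter grid are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# 1. Data collection (synthetic stand-in for a real dataset)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 2. Data preparation: split, then scale inside a pipeline
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# 3. Training
pipe.fit(X_train, y_train)

# 4. Evaluation
print("accuracy:", pipe.score(X_test, y_test))

# 5. Tuning: search over the regularization strength
grid = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)
print("best params:", grid.best_params_, "tuned accuracy:", grid.score(X_test, y_test))
```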

1.3 Machine Learning Approaches

  • Supervised Learning
  • Unsupervised Learning
  • Semi-supervised Learning
  • Reinforcement Learning

1.4 Supervised Learning

The goal is to learn the mapping between a set of inputs and outputs.

1.4.1 Classification

The output is a category (a discrete label).

1.4.2 Regression

The output is a real-valued scalar.

1.5 Unsupervised Learning

Only input data is provided; there are no labeled example outputs to aim for.

1.5.1 Clustering

The most widely used approach; it creates groups whose members share characteristics that distinguish them from other groups.

1.5.2 Association

Used for recommending or finding related items.

1.5.3 Anomaly Detection

Used to separate and detect strange occurrences.

1.5.4 Dimensionality Reduction

Aims to find the most important features in order to reduce the original feature set.

1.6 Semi-supervised Learning

A mix between the supervised and unsupervised approaches.

1.6.1 Generative Adversarial Networks

GANs use two neural networks, a generator and a discriminator; by competing against each other, both become increasingly skilled.

1.7 Reinforcement Learning

In this approach, occasional positive and negative feedback is used to reinforce behavior.

2 Matrix Calculus

2.1 Matrix Calculus

2.1.1 The Jacobian Matrix

$f: R^{n} \rightarrow R^{m}, \quad \vec{y}=f(\vec{x}), \quad \vec{y} \in R^{m}, \quad \vec{x} \in R^{n}$

$\frac{\partial \vec{y}}{\partial \vec{x}}$ is called the Jacobian matrix.

Example:
  1. Two dimensions:

$f: R^{2} \rightarrow R, \quad y=f(x_{1}, x_{2})$

$\nabla f(x_{1}, x_{2})=\left[\frac{\partial f(x_{1}, x_{2})}{\partial x_{1}}, \quad \frac{\partial f(x_{1}, x_{2})}{\partial x_{2}}\right]$

$f: R^{2} \rightarrow R^{2}, \quad \vec{y}=\begin{bmatrix}y_{1} \\ y_{2}\end{bmatrix}=\begin{bmatrix}f_{1}(x_{1}, x_{2}) \\ f_{2}(x_{1}, x_{2})\end{bmatrix}$

$J_{x}=\begin{bmatrix}\nabla f_{1}(x_{1}, x_{2}) \\ \nabla f_{2}(x_{1}, x_{2})\end{bmatrix}=\begin{bmatrix}\frac{\partial f_{1}(x_{1}, x_{2})}{\partial x_{1}} & \frac{\partial f_{1}(x_{1}, x_{2})}{\partial x_{2}} \\ \frac{\partial f_{2}(x_{1}, x_{2})}{\partial x_{1}} & \frac{\partial f_{2}(x_{1}, x_{2})}{\partial x_{2}}\end{bmatrix}$

  2. m dimensions:

$\vec{y} \in R^{m}, \quad \vec{x} \in R^{n}$

$J_{x}=\frac{\partial \vec{y}}{\partial \vec{x}}=\begin{bmatrix}\nabla f_{1}(\vec{x}) \\ \nabla f_{2}(\vec{x}) \\ \vdots \\ \nabla f_{m}(\vec{x})\end{bmatrix}$
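A quick way to sanity-check a Jacobian (a sketch, not part of the original notes) is to approximate it with central finite differences. The test function `f` below is an arbitrary illustrative choice.

```python
# Sketch: approximate the Jacobian J_x of f: R^n -> R^m by central
# finite differences. f and the test point are illustrative assumptions.
import numpy as np

def jacobian(f, x, eps=1e-6):
    """Return the m x n matrix of partial derivatives df_i/dx_j at x."""
    x = np.asarray(x, dtype=float)
    m = len(f(x))
    J = np.zeros((m, len(x)))
    for j in range(len(x)):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (f(x + dx) - f(x - dx)) / (2 * eps)  # row i holds grad f_i
    return J

f = lambda x: np.array([x[0] * x[1], np.sin(x[0])])  # f: R^2 -> R^2
print(jacobian(f, [1.0, 2.0]))  # expect ~[[2, 1], [cos(1), 0]]
```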

2.1.2 Vector Sum Reduction

$y=\sum_{i=1}^{n} f_{i}(\vec{x}): R^{n} \rightarrow R, \quad \vec{x} \in R^{n}$. In general, for $\vec{y}=f(\vec{x})$ with $\vec{x} \in R^{n}$ and $\vec{y} \in R^{m}$ ($R^{n} \rightarrow R^{m}$), the Jacobian has shape $J_{x}: m \times n$.

For $R^{n} \rightarrow R$: $J_{x}: 1 \times n$

$\frac{\partial y}{\partial \vec{x}}=\left[\frac{\partial y}{\partial x_{1}}, \frac{\partial y}{\partial x_{2}}, \cdots, \frac{\partial y}{\partial x_{n}}\right]$

$=\left[\frac{\partial}{\partial x_{1}} \sum_{i=1}^{n} f_{i}(\vec{x}), \frac{\partial}{\partial x_{2}} \sum_{i=1}^{n} f_{i}(\vec{x}), \cdots, \frac{\partial}{\partial x_{n}} \sum_{i=1}^{n} f_{i}(\vec{x})\right]$

$=\left[\sum_{i=1}^{n} \frac{\partial f_{i}(\vec{x})}{\partial x_{1}}, \sum_{i=1}^{n} \frac{\partial f_{i}(\vec{x})}{\partial x_{2}}, \cdots, \sum_{i=1}^{n} \frac{\partial f_{i}(\vec{x})}{\partial x_{n}}\right]$
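A small numerical check of this sum-reduction gradient, under the illustrative assumption $f_{i}(\vec{x})=x_{i}^{2}$, so the expected gradient is $[2x_{1}, \cdots, 2x_{n}]$:

```python
# Sketch: for y = sum_i f_i(x), the gradient collects the summed
# partials. f_i(x) = x_i**2 is an illustrative assumption, so the
# gradient should be [2*x_1, ..., 2*x_n].
import numpy as np

def y(x):
    return np.sum(x ** 2)  # y = sum_i f_i(x) with f_i(x) = x_i^2

x = np.array([1.0, 2.0, 3.0])
eps = 1e-6
grad = np.array([(y(x + eps * e) - y(x - eps * e)) / (2 * eps)
                 for e in np.eye(len(x))])
print(grad)  # expect ~[2. 4. 6.], i.e. 2*x
```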

2.1.3 Vector Chain Rules
1) Single-variable Chain Rule: $\frac{d y}{d x}=\frac{d y}{d u} \cdot \frac{d u}{d x}$
2) Single-variable total-derivative Chain Rule:

$\frac{\partial f(x, u_{1}, u_{2}, \cdots, u_{n})}{\partial x}=\frac{\partial f}{\partial x}+\frac{\partial f}{\partial u_{1}} \frac{\partial u_{1}}{\partial x}+\frac{\partial f}{\partial u_{2}} \frac{\partial u_{2}}{\partial x}+\cdots+\frac{\partial f}{\partial u_{n}} \frac{\partial u_{n}}{\partial x}=\frac{\partial f}{\partial x}+\sum_{i=1}^{n} \frac{\partial f}{\partial u_{i}} \frac{\partial u_{i}}{\partial x}$
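A minimal sketch of this rule with a single intermediate variable, using the illustrative choice $f(x, u)=x \cdot u$ with $u(x)=x^{2}$, so the total derivative is $u+x \cdot 2x=3x^{2}$:

```python
# Sketch of the total-derivative chain rule with one intermediate
# variable: f(x, u) = x * u with u(x) = x**2 (illustrative choices),
# so df/dx = df/dx|_u + (df/du)(du/dx) = u + x * 2x = 3x^2.
u = lambda x: x ** 2
f = lambda x: x * u(x)

x, eps = 1.3, 1e-6
numeric = (f(x + eps) - f(x - eps)) / (2 * eps)
print(numeric, 3 * x ** 2)  # the two values should agree closely
```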

3) Vector Chain Rules:

$f: R \rightarrow R^{2}, \quad \vec{y}=\begin{bmatrix}f_{1}(x) \\ f_{2}(x)\end{bmatrix}$

$\vec{y}=\begin{bmatrix}y_{1} \\ y_{2}\end{bmatrix}=\begin{bmatrix}f_{1}(x) \\ f_{2}(x)\end{bmatrix}=\begin{bmatrix}\ln(x^{2}) \\ \sin(3x)\end{bmatrix}$

$\vec{g}=\begin{bmatrix}g_{1}(x) \\ g_{2}(x)\end{bmatrix}=\begin{bmatrix}x^{2} \\ 3x\end{bmatrix}$

$\begin{bmatrix}f_{1}(x) \\ f_{2}(x)\end{bmatrix}=\begin{bmatrix}f_{1}(\vec{g}) \\ f_{2}(\vec{g})\end{bmatrix}=\begin{bmatrix}\ln(g_{1}) \\ \sin(g_{2})\end{bmatrix}$

$\frac{\partial \vec{y}}{\partial x}$, $R \rightarrow R^{2}$: $J_{x}: 2 \times 1$

$\frac{\partial \vec{y}}{\partial x}=\begin{bmatrix}\frac{\partial f_{1}(\vec{g})}{\partial x} \\ \frac{\partial f_{2}(\vec{g})}{\partial x}\end{bmatrix}=\begin{bmatrix}\frac{\partial f_{1}}{\partial g_{1}} \frac{\partial g_{1}}{\partial x}+\frac{\partial f_{1}}{\partial g_{2}} \frac{\partial g_{2}}{\partial x} \\ \frac{\partial f_{2}}{\partial g_{1}} \frac{\partial g_{1}}{\partial x}+\frac{\partial f_{2}}{\partial g_{2}} \frac{\partial g_{2}}{\partial x}\end{bmatrix}=\begin{bmatrix}\frac{1}{g_{1}} \cdot 2x+0 \\ 0+\cos(g_{2}) \cdot 3\end{bmatrix}=\begin{bmatrix}\frac{2}{x} \\ 3\cos(3x)\end{bmatrix}$

$\frac{\partial}{\partial x} \vec{f}(\vec{g}(x))=\frac{\partial \vec{f}}{\partial \vec{g}} \cdot \frac{\partial \vec{g}}{\partial x}$
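The worked example above can be verified numerically; this sketch compares a finite-difference derivative against the analytic result $[2/x, \; 3\cos(3x)]$ at an arbitrary test point:

```python
# Sketch checking the worked example: y = f(g(x)) with g(x) = (x^2, 3x)
# and f(g) = (ln g1, sin g2). The chain rule predicts dy/dx = [2/x, 3*cos(3x)].
import numpy as np

def yvec(x):
    g1, g2 = x ** 2, 3 * x
    return np.array([np.log(g1), np.sin(g2)])

x, eps = 1.5, 1e-6
numeric = (yvec(x + eps) - yvec(x - eps)) / (2 * eps)
analytic = np.array([2 / x, 3 * np.cos(3 * x)])
print(numeric, analytic)  # the two vectors should agree closely
```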

2.1.4 Matrix Differentiation


Proposition 5:

$\vec{y}=\mathbb{A} \vec{x}, \quad \vec{y} \in R^{m}, \quad \vec{x} \in R^{n}, \quad \mathbb{A} \in R^{m \times n}$, where $\mathbb{A}$ doesn't depend on $\vec{x}$.

$\frac{\partial \vec{y}}{\partial \vec{x}}=\mathbb{A}$
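A numerical sketch of Proposition 5 (the matrix and test point are random illustrative values): the finite-difference Jacobian of $\vec{y}=\mathbb{A}\vec{x}$ should reproduce $\mathbb{A}$ up to rounding.

```python
# Sketch of Proposition 5: for y = A x, the Jacobian dy/dx is A itself.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))           # A in R^{3x4}, illustrative
f = lambda x: A @ x                       # y = A x

x = rng.standard_normal(4)
eps = 1e-6
J = np.column_stack([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(4)])
print(np.allclose(J, A))  # expect True: the Jacobian equals A
```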

Proposition 6:

$\vec{y}=\mathbb{A} \vec{x}, \quad \vec{y} \in R^{m}, \quad \vec{x} \in R^{n}, \quad \mathbb{A} \in R^{m \times n}$, where $\mathbb{A}$ doesn't depend on $\vec{x}$. Suppose $\vec{x}$ is a function of $\vec{z}$, and $\mathbb{A}$ is independent of $\vec{z}$.

Then: $\frac{\partial \vec{y}}{\partial \vec{z}}=\mathbb{A} \cdot \frac{\partial \vec{x}}{\partial \vec{z}}$

Pf: $\frac{\partial \vec{y}}{\partial \vec{z}}=\frac{\partial \vec{y}}{\partial \vec{x}} \cdot \frac{\partial \vec{x}}{\partial \vec{z}}=\mathbb{A} \cdot \frac{\partial \vec{x}}{\partial \vec{z}}$

Proposition 7:

$\alpha=\vec{y}^{\top} \mathbb{A} \vec{x}, \quad \alpha \in R, \quad \vec{y} \in R^{m}, \quad \vec{x} \in R^{n}, \quad \mathbb{A} \in R^{m \times n}$, where $\mathbb{A}$ is independent of $\vec{x}$ and $\vec{y}$.

Then: $\frac{\partial \alpha}{\partial \vec{x}}=\vec{y}^{\top} \mathbb{A}$

Pf: let $B=\vec{y}^{\top} \mathbb{A}$, so $\alpha=B \vec{x}$; by Proposition 5, $\frac{\partial \alpha}{\partial \vec{x}}=B=\vec{y}^{\top} \mathbb{A}$

Then: $\frac{\partial \alpha}{\partial \vec{y}}=\vec{x}^{\top} \mathbb{A}^{\top}$

Pf: $\alpha$ is a scalar, so $\alpha^{\top}=\alpha$, and $\alpha=\alpha^{\top}=\left(\vec{y}^{\top} \mathbb{A} \vec{x}\right)^{\top}=\vec{x}^{\top} \mathbb{A}^{\top} \vec{y}$; applying the first result again,

$\frac{\partial \alpha}{\partial \vec{y}}=\vec{x}^{\top} \mathbb{A}^{\top}$
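A numerical sketch of Proposition 7 with random illustrative values, checking both gradients:

```python
# Sketch of Proposition 7: for alpha = y^T A x, the gradient w.r.t. x
# is y^T A and w.r.t. y is x^T A^T. Values are illustrative.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
y, x = rng.standard_normal(3), rng.standard_normal(4)
alpha = lambda y, x: y @ A @ x

eps = 1e-6
dx = np.array([(alpha(y, x + eps * e) - alpha(y, x - eps * e)) / (2 * eps)
               for e in np.eye(4)])
dy = np.array([(alpha(y + eps * e, x) - alpha(y - eps * e, x)) / (2 * eps)
               for e in np.eye(3)])
print(np.allclose(dx, y @ A), np.allclose(dy, x @ A.T))  # expect True True
```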

Proposition (product rule):

Let the scalar $\alpha$ be defined by $\alpha=\vec{y}^{\top} \vec{x}, \quad \vec{y} \in R^{n}, \quad \vec{x} \in R^{n}$, where $\vec{y}$ and $\vec{x}$ are functions of the vector $\vec{z}$. Then: $\frac{\partial \alpha}{\partial \vec{z}}=\vec{x}^{\top} \frac{\partial \vec{y}}{\partial \vec{z}}+\vec{y}^{\top} \frac{\partial \vec{x}}{\partial \vec{z}}$

Pf: $\frac{\partial \alpha}{\partial \vec{z}}=\frac{\partial \alpha}{\partial \vec{y}} \frac{\partial \vec{y}}{\partial \vec{z}}+\frac{\partial \alpha}{\partial \vec{x}} \frac{\partial \vec{x}}{\partial \vec{z}}=\vec{x}^{\top} \frac{\partial \vec{y}}{\partial \vec{z}}+\vec{y}^{\top} \frac{\partial \vec{x}}{\partial \vec{z}}$
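A sketch checking this product rule on illustrative functions $\vec{y}(\vec{z})$ and $\vec{x}(\vec{z})$ whose Jacobians are easy to write by hand:

```python
# Sketch of the product rule: for alpha = y(z)^T x(z),
# d(alpha)/dz = x^T dy/dz + y^T dx/dz. y(z) and x(z) are illustrative.
import numpy as np

y = lambda z: np.array([z[0] * z[1], np.sin(z[1])])
x = lambda z: np.array([z[0] ** 2, z[1] ** 3])
alpha = lambda z: y(z) @ x(z)

z, eps = np.array([1.2, 0.8]), 1e-6
numeric = np.array([(alpha(z + eps * e) - alpha(z - eps * e)) / (2 * eps)
                    for e in np.eye(2)])

# Analytic Jacobians dy/dz and dx/dz, written out by hand
Jy = np.array([[z[1], z[0]], [0.0, np.cos(z[1])]])
Jx = np.array([[2 * z[0], 0.0], [0.0, 3 * z[1] ** 2]])
analytic = x(z) @ Jy + y(z) @ Jx
print(np.allclose(numeric, analytic))  # expect True
```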

Proposition 8 :

$\alpha=\vec{x}^{\top} \mathbb{A} \vec{x}, \quad \vec{x} \in R^{n}, \quad \mathbb{A} \in R^{n \times n}$, where $\mathbb{A}$ doesn't depend on $\vec{x}$. Then: $\frac{\partial \alpha}{\partial \vec{x}}=\vec{x}^{\top}\left(\mathbb{A}+\mathbb{A}^{\top}\right)$

Pf: set $\vec{y}=\vec{x}$, so that $\alpha=\vec{y}^{\top} \mathbb{A} \vec{x}$ with both $\vec{y}$ and $\vec{x}$ (trivially) functions of $\vec{x}$.

By the product-rule proposition above:

$\frac{\partial \alpha}{\partial \vec{x}}=\frac{\partial \alpha}{\partial \vec{y}} \cdot \frac{\partial \vec{y}}{\partial \vec{x}}+\frac{\partial \alpha}{\partial \vec{x}} \cdot \frac{\partial \vec{x}}{\partial \vec{x}}=(\mathbb{A} \vec{x})^{\top}+\vec{y}^{\top} \mathbb{A}=\vec{x}^{\top} \mathbb{A}^{\top}+\vec{x}^{\top} \mathbb{A}=\vec{x}^{\top}\left(\mathbb{A}+\mathbb{A}^{\top}\right)$

Proposition 9 :

If $\mathbb{A}$ is symmetric, then $\mathbb{A}=\mathbb{A}^{\top}$.

Combining this with Proposition 8, we get Proposition 10: for symmetric $\mathbb{A}$, $\frac{\partial \alpha}{\partial \vec{x}}=2 \vec{x}^{\top} \mathbb{A}$
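A numerical sketch of Propositions 8 and 10 with random illustrative values:

```python
# Sketch of Propositions 8-10: for alpha = x^T A x, the gradient is
# x^T (A + A^T), collapsing to 2 x^T A when A is symmetric.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)
alpha = lambda x: x @ A @ x

eps = 1e-6
grad = np.array([(alpha(x + eps * e) - alpha(x - eps * e)) / (2 * eps)
                 for e in np.eye(4)])
print(np.allclose(grad, x @ (A + A.T)))   # expect True

S = A + A.T                               # a symmetric matrix
alpha_s = lambda x: x @ S @ x
grad_s = np.array([(alpha_s(x + eps * e) - alpha_s(x - eps * e)) / (2 * eps)
                   for e in np.eye(4)])
print(np.allclose(grad_s, 2 * x @ S))     # expect True
```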

Proposition 15 :

$A^{-1} A=I$

Differentiating both sides with respect to a scalar $\alpha$:

$\frac{\partial A^{-1}}{\partial \alpha} A+A^{-1} \frac{\partial A}{\partial \alpha}=0 \quad \Rightarrow \quad \frac{\partial A^{-1}}{\partial \alpha}=-A^{-1} \frac{\partial A}{\partial \alpha} A^{-1}$
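A numerical sketch of Proposition 15, using the illustrative parameterization $A(\alpha)=B+\alpha C$ (so $\partial A / \partial \alpha = C$):

```python
# Sketch of Proposition 15: differentiating A^{-1} A = I gives
# dA^{-1}/da = -A^{-1} (dA/da) A^{-1}. A(a) = B + a*C is an
# illustrative parameterization with random B, C, so dA/da = C.
import numpy as np

rng = np.random.default_rng(3)
B, C = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
A = lambda a: B + a * C                   # dA/da = C

a, eps = 0.7, 1e-6
numeric = (np.linalg.inv(A(a + eps)) - np.linalg.inv(A(a - eps))) / (2 * eps)
Ainv = np.linalg.inv(A(a))
analytic = -Ainv @ C @ Ainv
print(np.allclose(numeric, analytic, atol=1e-4))  # expect True
```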
