Machine Learning (Andrew Ng)

静妮子i

Last modified: 2022-09-27 17:05:17

Category: Andrew Ng course series

First published: 2022-09-26 19:58:08

Article link: https://blog.csdn.net/qq_39848541/article/details/127021123

Introduction

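What is machine learning?
A program learns from experience E with respect to task T and performance measure P if its performance on T, as measured by P, improves with experience E.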

Main categories covered here: supervised learning and unsupervised learning.

Supervised learning

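“Right answers” are given.
For every example in the dataset the correct answer is known, and the goal is to predict that answer for new inputs.
Two kinds of problems: regression and classification.
example

Housing price prediction
Supervised learning: the actual size and price (the “right answer”) are given.
Regression problem: predict a continuous-valued output.

Classification problem: predict a discrete-valued output.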

Unsupervised learning

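Clustering problems: no right answers are given, and the algorithm has to find groups in the data on its own.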

model

example

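Housing price prediction model

Training set of housing prices
Goal: learn from the training set how to predict housing prices.
m: number of training examples
x: input variable / feature
y: output variable / the target variable to predict
(x, y): one training example
$(x^{(i)},y^{(i)})$: the i-th training example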

How it works

The learning algorithm takes the training set and learns a function h that maps x to y.
This function is called the hypothesis.

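How to represent h?
For a regression problem with one input variable, one choice of hypothesis is the linear function $h_\theta(x)=\theta_0+\theta_1 x$.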

define cost function

linear regression example
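Hypothesis: $h_\theta(x)=\theta_0+\theta_1 x$, with model parameters $\theta_0,\theta_1$.

How should the model parameters be chosen?
Choose $\theta_0,\theta_1$ so that $h_\theta(x)$ is close to $y$ for the training examples $(x,y)$.

Minimize the sum of squared differences between the predicted and the actual values over the training set:
$J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$

This cost function is the squared error cost function, which is commonly used for regression.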

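cost function
Fix one parameter ($\theta_0=0$) to study the cost function.
This gives a simplified hypothesis $h_\theta(x)=\theta_1 x$ and its cost function $J(\theta_1)$.

Training set: (1, 1), (2, 2), (3, 3)
Let $\theta_{1}$ take the values 1, 0.5, 0, ...
$J(1) = 0; J(0.5) \approx 0.58; J(0) \approx 2.33$ ...
Plotting these values gives the curve of $J(\theta_1)$: each value of $\theta_1$ corresponds to a different hypothesis and a different cost.
Goal of linear regression: minimize $J(\theta)$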
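
The values of J quoted for this training set can be checked with a few lines of Python. This is only a minimal sketch: the function name `cost` is chosen here for illustration, and it assumes the squared error cost with the $\frac{1}{2m}$ factor and the simplified hypothesis $h_\theta(x)=\theta_1 x$.

```python
def cost(theta1, data):
    """Squared error cost J(theta1) for the simplified hypothesis h(x) = theta1 * x."""
    m = len(data)
    return sum((theta1 * x - y) ** 2 for x, y in data) / (2 * m)


training_set = [(1, 1), (2, 2), (3, 3)]

# Evaluate J at the three values of theta1 used in the notes.
for theta1 in (1, 0.5, 0):
    print(theta1, round(cost(theta1, training_set), 2))
```

The cost is exactly zero at $\theta_1=1$ because the line $y=x$ passes through all three training examples.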

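Studying the cost function with both parameters kept
With the hypothesis $h_\theta(x)=\theta_0+\theta_1 x$ and a given training set, the cost function has two independent variables.
Computing it gives a surface in 3D space, which can also be drawn as a contour plot (contour figure).

All points on the same ellipse have the same value of J.
The closer a point is to the minimum, the better the corresponding hypothesis fits the data.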

gradient descent for minimizing the cost function

Example with two parameters
For the function $J(\theta_{0},\theta_{1})$ we want $\underset{\theta_{0},\theta_{1}}{\min} J(\theta_{0},\theta_{1})$

The gradient descent procedure

Start with some $\theta_{0},\theta_{1}$ (commonly initialized to 0, 0).
keep changing $\theta_{0},\theta_{1}$ to reduce $J(\theta_{0},\theta_{1})$ until we hopefully end up at a minimum.

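gradient descent algorithm
Repeat until convergence: $\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)$ for $j=0$ and $j=1$

:= denotes assignment
$\alpha$: the learning rate, which controls the step size of gradient descent
$\theta_{0},\theta_{1}$ must be updated simultaneously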
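
The simultaneous update can be sketched in Python as follows. The cost function used here, $J=(\theta_0-1)^2+(\theta_1+2)^2$, is a hypothetical example chosen only because its minimum at (1, -2) is obvious; the helper name `gradient_descent_step` is likewise an assumption.

```python
def gradient_descent_step(theta0, theta1, grad0, grad1, alpha):
    """One gradient descent step with a simultaneous update of both parameters."""
    # Both partial derivatives are evaluated at the OLD (theta0, theta1)
    # before either parameter is overwritten.
    temp0 = theta0 - alpha * grad0(theta0, theta1)
    temp1 = theta1 - alpha * grad1(theta0, theta1)
    return temp0, temp1


# Hypothetical cost J = (theta0 - 1)^2 + (theta1 + 2)^2 and its partial derivatives.
def grad0(t0, t1):
    return 2 * (t0 - 1)


def grad1(t0, t1):
    return 2 * (t1 + 2)


theta0, theta1 = 0.0, 0.0  # start at (0, 0)
for _ in range(200):
    theta0, theta1 = gradient_descent_step(theta0, theta1, grad0, grad1, alpha=0.1)
```

Assigning `theta0` first and then using the new value inside the derivative for `theta1` would be a different algorithm, which is why the temporaries are computed before either assignment.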

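Effect of $\alpha$

If $\alpha$ is too small, gradient descent is slow; if $\alpha$ is too large, it can overshoot the minimum and fail to converge, or even diverge.
Local optima: at a local optimum the derivative is zero, so the update leaves $\theta$ unchanged.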
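
How the learning rate changes the behaviour of gradient descent can be seen on a one-parameter toy problem. The sketch below assumes the hypothetical cost $J(\theta)=\theta^2$ (derivative $2\theta$, minimum at 0); the function name `run` and the three values of $\alpha$ are illustrative choices, not values from the course.

```python
def run(alpha, steps=20, theta=1.0):
    """Run gradient descent on the hypothetical cost J(theta) = theta^2."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta  # dJ/dtheta = 2 * theta
    return theta


slow = run(alpha=0.01)      # small steps: still far from the minimum at 0
good = run(alpha=0.1)       # moderate steps: close to the minimum
diverged = run(alpha=1.1)   # steps too large: |theta| grows every iteration
```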

Gradient descent for linear regression

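Linear hypothesis and squared error cost function
Apply gradient descent to minimize the squared error cost function:
$\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)$
$\theta_1 := \theta_1 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x^{(i)}$

The squared error cost function for linear regression is convex: it has no local optima other than the single global optimum.
Batch gradient descent
Batch: each step of gradient descent uses all the training examples.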
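
A minimal sketch of batch gradient descent for one-variable linear regression, assuming the hypothesis $h_\theta(x)=\theta_0+\theta_1 x$ and the squared error cost with the $\frac{1}{2m}$ factor. The function name, learning rate, and iteration count are illustrative assumptions; the data is the small training set (1, 1), (2, 2), (3, 3) used earlier in these notes.

```python
def batch_gradient_descent(data, alpha=0.1, iterations=2000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    m = len(data)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # "Batch": every step uses the errors on ALL training examples.
        errors = [theta0 + theta1 * x - y for x, y in data]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, (x, _) in zip(errors, data)) / m
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


theta0, theta1 = batch_gradient_descent([(1, 1), (2, 2), (3, 3)])
```

On this data the parameters approach $\theta_0=0$, $\theta_1=1$, i.e. the line $y=x$.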

Matrix and Vector

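Definitions

matrix: a rectangular array of numbers; its dimension is (number of rows) × (number of columns)
vector: a matrix with a single column (an n × 1 matrix)

Operations
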
matrix-vector multiplication
matrix-matrix multiplication

matrix multiplication properties

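Matrix multiplication is not commutative ($AB \neq BA$ in general), but it is associative ($(AB)C = A(BC)$).
inverse and transpose

Intuitively, a matrix that has no inverse can be thought of as being close to zero in some sense.

A matrix that has no inverse is called a singular (or degenerate) matrix.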
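
These matrix operations and properties can be tried out with NumPy. The matrices below are arbitrary examples picked for illustration, not values from the course.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
C = np.array([[2.0, 0.0], [0.0, 3.0]])
v = np.array([1.0, 1.0])

Av = A @ v                                            # matrix-vector multiplication
commutes = np.allclose(A @ B, B @ A)                  # generally False
associative = np.allclose((A @ B) @ C, A @ (B @ C))   # always True

A_inv = np.linalg.inv(A)                              # inverse of a non-singular matrix
identity_ok = np.allclose(A @ A_inv, np.eye(2))
A_T = A.T                                             # transpose

# A singular matrix: the second row is a multiple of the first, so there is no inverse.
S = np.array([[1.0, 2.0], [2.0, 4.0]])
is_singular = bool(np.isclose(np.linalg.det(S), 0.0))
```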

multiple feature linear regression

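Regression model with multiple features: $h_\theta(x)=\theta^Tx=\theta_0x_0+\theta_1x_1+\dots+\theta_nx_n$, with $x_0=1$
Gradient descent for the multivariate linear regression model: $\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$, updated simultaneously for $j=0,\dots,n$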

practical tricks for gradient descent

feature scaling
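Feature scaling puts the features on a similar scale so that gradient descent converges faster.
mean normalization
Mean normalization: replace $x_j$ with $\frac{x_j-\mu_j}{s_j}$, where $\mu_j$ is the mean of feature j and $s_j$ is its range or standard deviation.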
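
Mean normalization can be sketched with NumPy as below. The feature matrix (house size and number of bedrooms) is made-up illustrative data, the function name `mean_normalize` is an assumption, and the standard deviation is used as the scale $s_j$ here; the range (max minus min) would work as well.

```python
import numpy as np


def mean_normalize(X):
    """Scale every column of X to zero mean and unit standard deviation."""
    mu = X.mean(axis=0)      # per-feature mean
    sigma = X.std(axis=0)    # per-feature standard deviation, used as the scale s_j
    return (X - mu) / sigma, mu, sigma


# Hypothetical data: columns are size and number of bedrooms.
X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [1534.0, 3.0],
              [852.0, 2.0]])
X_norm, mu, sigma = mean_normalize(X)
```

The returned `mu` and `sigma` should be kept, since a new input has to be scaled with the same values before a prediction is made.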

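learning rate

Plot $J(\theta)$ against the number of iterations to check convergence; in the example plot, gradient descent has converged after about 400 iterations.

If the cost function increases, gradient descent is not working.
In that case, try a smaller learning rate.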

vectorization

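Compute $h_\theta(x)=\theta^Tx$ with a single vector operation instead of an explicit loop over the features.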
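
A small NumPy sketch of what vectorization means in practice: the same hypothesis value computed with a loop and with one inner product, followed by one vectorized gradient descent step for linear regression. All numbers are illustrative assumptions.

```python
import numpy as np

theta = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 0.5, -1.0])  # x[0] = 1 is the intercept feature

# Unvectorized: explicit loop over the features.
h_loop = 0.0
for j in range(len(theta)):
    h_loop += theta[j] * x[j]

# Vectorized: a single inner product theta^T x.
h_vec = theta @ x

# One vectorized gradient descent step for linear regression:
# theta := theta - (alpha / m) * X^T (X theta - y)
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
alpha = 0.1
params = np.zeros(2)
params_new = params - (alpha / len(y)) * (X.T @ (X @ params - y))
```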

classification

logistic regression

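Hypothesis: $h_\theta(x)=g(\theta^Tx)$ with the sigmoid function $g(z)=\frac{1}{1+e^{-z}}$; $h_\theta(x)$ is read as the estimated probability that $y=1$.
decision boundary
Predict $y=1$ when $h_\theta(x)\ge 0.5$, i.e. when $\theta^Tx\ge 0$; the decision boundary is the set of points where $\theta^Tx=0$.
It is a property of the hypothesis function and is determined by its parameter values (the dataset only enters through fitting those parameters).
non-linear decision boundaries

The decision boundary is not a property of the training set but of the hypothesis and its parameters: once the parameter vector $\theta$ is given, the decision boundary is determined.
The training set is only used to fit the parameter vector $\theta$.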
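
The relation between the parameters and the decision boundary can be made concrete with a short sketch. The parameter vector below is a hypothetical choice that gives the linear boundary $x_1+x_2=3$; the function names are assumptions as well.

```python
import math


def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(theta, x):
    """Predict 1 if h_theta(x) >= 0.5, which is the same as theta^T x >= 0."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if sigmoid(z) >= 0.5 else 0


# Hypothetical parameters: theta^T x = -3 + x1 + x2, so the boundary is x1 + x2 = 3.
theta = [-3.0, 1.0, 1.0]

below = predict(theta, [1.0, 1.0, 1.0])  # x1 + x2 = 2, below the boundary
above = predict(theta, [1.0, 2.0, 2.0])  # x1 + x2 = 4, above the boundary
```

No training data appears anywhere in this snippet: the boundary is fixed entirely by `theta`.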

how to fit parameters theta for logistic regression

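Problem setup
Given a training set of m examples $(x^{(i)},y^{(i)})$ with $y\in\{0,1\}$ and the hypothesis $h_\theta(x)=\frac{1}{1+e^{-\theta^Tx}}$, how should $\theta$ be chosen?
Defining the cost function
With the squared error cost, $J(\theta)$ for logistic regression is non-convex: it has many local minima, so gradient descent is not guaranteed to reach the global minimum.
logistic regression cost function

Simplified cost function

$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)})+(1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right]$
gradient descent

The gradient descent update has the same form for linear regression and logistic regression, but the hypothesis $h_\theta(x)$ is different.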
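
A minimal NumPy sketch of the logistic regression cost and its gradient descent update, assuming the cross-entropy cost $-\frac{1}{m}\sum[y\log h+(1-y)\log(1-h)]$. The toy data, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cost(theta, X, y):
    """Logistic regression cost: -(1/m) * sum(y*log(h) + (1-y)*log(1-h))."""
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))


def gradient(theta, X, y):
    """Same form as for linear regression, but h is the sigmoid of X @ theta."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)


# Hypothetical toy data; the first column is the intercept feature x0 = 1.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])

theta = np.zeros(2)
initial_cost = cost(theta, X, y)   # equals log(2) when theta = 0
for _ in range(1000):
    theta = theta - 0.1 * gradient(theta, X, y)
final_cost = cost(theta, X, y)
```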

multiclass classification

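One-vs-all: turn the problem into several binary classification problems and fit one classifier per class.
Each classifier $h_\theta^{(i)}(x)$ is trained to recognize one class against all the others.
Prediction: feed the input x into every classifier and pick the class whose $h_\theta^{(i)}(x)$ (probability) is largest.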
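
The prediction step can be sketched as follows, assuming the per-class classifiers have already been trained. The parameter matrix `Theta` (one row per class) and the input are hypothetical values chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict_one_vs_all(Theta, x):
    """Run every binary classifier on x and return the most probable class."""
    probs = sigmoid(Theta @ x)          # one probability per class
    return int(np.argmax(probs)), probs


# Hypothetical parameters: row i holds the parameter vector of the classifier for class i.
Theta = np.array([[2.0, -1.0, -1.0],
                  [-2.0, 1.0, -1.0],
                  [-2.0, -1.0, 1.0]])
x = np.array([1.0, 3.0, 0.5])           # x[0] = 1 is the intercept feature

label, probs = predict_one_vs_all(Theta, x)
```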

overfitting

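What is overfitting?

With too many features, the hypothesis may fit the training set very well but fail to generalize to new examples.
Overfitting in linear regression
Overfitting in logistic regression

How to address overfitting

Reduce the number of features, or use regularization: keep all the features but reduce the magnitude of the parameters $\theta_j$.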

regularized linear regression

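$J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2+\lambda\sum_{j=1}^{n}\theta_j^2\right]$

A regularization term is added to the linear regression cost function.

gradient descent

$\theta_j := \theta_j\left(1-\alpha\frac{\lambda}{m}\right)-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$ for $j=1,\dots,n$ ($\theta_0$ is not regularized)

Compared with the unregularized update, each update first multiplies $\theta_j$ by a number slightly less than 1 (the factor $1-\alpha \frac{\lambda}{m}$, which is specific to regularization) and then takes the usual gradient step.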
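
The shrink-then-step structure of the regularized update can be sketched with NumPy. The function name and the toy data are assumptions; following the usual convention, the intercept parameter $\theta_0$ is left unregularized.

```python
import numpy as np


def regularized_step(theta, X, y, alpha, lam):
    """One gradient descent step for regularized linear regression."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m              # unregularized gradient
    shrink = np.full_like(theta, 1 - alpha * lam / m)
    shrink[0] = 1.0                               # theta_0 is not regularized
    return theta * shrink - alpha * grad


# Hypothetical data that theta = [0, 1] fits exactly, so the gradient term vanishes
# and only the shrink factor 1 - alpha * lam / m = 0.9 acts on theta_1.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
theta_new = regularized_step(np.array([0.0, 1.0]), X, y, alpha=0.1, lam=3.0)
```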

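normal equation (closed-form solution)
$\theta=\left(X^TX+\lambda L\right)^{-1}X^Ty$, where $L$ is the $(n+1)\times(n+1)$ identity matrix with its top-left entry set to 0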
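
A minimal sketch of the regularized normal equation, assuming the form $\theta=(X^TX+\lambda L)^{-1}X^Ty$ with $L$ equal to the identity matrix except for a zero in the top-left entry (so the intercept is not regularized). The function name and data are illustrative; `np.linalg.solve` is used instead of forming the inverse explicitly.

```python
import numpy as np


def regularized_normal_equation(X, y, lam):
    """Solve (X^T X + lam * L) theta = X^T y for theta."""
    n = X.shape[1]
    L = np.eye(n)
    L[0, 0] = 0.0                        # do not regularize the intercept term
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)


X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

theta_plain = regularized_normal_equation(X, y, lam=0.0)  # ordinary least squares
theta_reg = regularized_normal_equation(X, y, lam=1.0)    # slope is pulled toward 0
```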

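regularized logistic regression

$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)})+(1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$
gradient descent: the update has the same form as for regularized linear regression, with $h_\theta(x)=\frac{1}{1+e^{-\theta^Tx}}$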

neural network

model

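A single neuron is modeled as a simple unit: it takes the inputs $x$ and outputs $h_\theta(x)=g(\theta^Tx)$, where $g$ is the sigmoid activation function.

Vectorized computation of forward propagation

$z^{(l+1)}=\Theta^{(l)}a^{(l)}$, $a^{(l+1)}=g(z^{(l+1)})$, starting from $a^{(1)}=x$ and adding the bias unit $a_0^{(l)}=1$ to every layer before it is multiplied by $\Theta^{(l)}$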
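
Vectorized forward propagation can be sketched in NumPy as below. The weights are hypothetical and chosen so that the small two-layer network computes XNOR of two binary inputs; they are not taken from these notes, and the function names are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, thetas):
    """Forward propagation: a(l+1) = g(Theta(l) @ [1, a(l)]) for every layer."""
    a = np.asarray(x, dtype=float)
    for Theta in thetas:
        a = np.concatenate(([1.0], a))   # add the bias unit a_0 = 1
        a = sigmoid(Theta @ a)           # whole layer computed in one matrix product
    return a


# Hypothetical weights for a 2-2-1 network that computes XNOR.
Theta1 = np.array([[-30.0, 20.0, 20.0],    # hidden unit 1: x1 AND x2
                   [10.0, -20.0, -20.0]])  # hidden unit 2: (NOT x1) AND (NOT x2)
Theta2 = np.array([[-10.0, 20.0, 20.0]])   # output unit: OR of the hidden units

outputs = {(x1, x2): forward([x1, x2], [Theta1, Theta2])[0]
           for x1 in (0, 1) for x2 in (0, 1)}
```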

P47

...

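gradient checking

Suppose we have a cost function $J(\theta)$ and want to estimate its gradient at a point $\theta$.

Numerical approximation: evaluate J at $\theta+\epsilon$ and $\theta-\epsilon$ and connect the two points; the slope of that line approximates the derivative we are looking for, $\frac{J(\theta+\epsilon)-J(\theta-\epsilon)}{2\epsilon}$ (the two-sided difference gives a more accurate estimate).

One-sided difference: $\frac{J(\theta+\epsilon)-J(\theta)}{\epsilon}$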
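
The two difference formulas can be compared numerically. The sketch below assumes the hypothetical cost $J(\theta)=\theta^3$, whose exact derivative $3\theta^2$ is known, and $\epsilon=10^{-4}$; the function names are chosen for illustration.

```python
def two_sided(J, theta, eps=1e-4):
    """Two-sided difference: (J(theta + eps) - J(theta - eps)) / (2 * eps)."""
    return (J(theta + eps) - J(theta - eps)) / (2 * eps)


def one_sided(J, theta, eps=1e-4):
    """One-sided difference: (J(theta + eps) - J(theta)) / eps."""
    return (J(theta + eps) - J(theta)) / eps


def J(theta):
    return theta ** 3   # hypothetical cost; the exact derivative is 3 * theta^2


theta = 2.0
exact = 3 * theta ** 2
error_two_sided = abs(two_sided(J, theta) - exact)
error_one_sided = abs(one_sided(J, theta) - exact)
```

For this function the two-sided error shrinks like $\epsilon^2$ while the one-sided error shrinks only like $\epsilon$, which is why the two-sided estimate is preferred for checking an analytically computed gradient.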