Coursera Study Notes | Machine Learning by Stanford University - Andrew Ng (吴恩达) / Week 1 - 2 /

想念小8的第1621天

Article link: https://blog.csdn.net/yuuko_Z/article/details/123959663

Contents

Chapter 1 - Introduction
Chapter 2 - Linear Regression

Cnblogs (博客园) link: Coursera Study Notes | Machine Learning by Stanford University - Andrew Ng

Chapter 1 - Introduction

1.1 Definition

Arthur Samuel
The field of study that gives computers the ability to learn without being explicitly programmed.
Tom Mitchell
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

1.2 Concepts

1.2.1 Classification of Machine Learning

Supervised Learning: given a labeled data set; we already know what a correct output/result should look like
- Regression: continuous output
- Classification: discrete output
Unsupervised Learning: given an unlabeled data set, or a data set in which every example carries the same label; the structure has to be derived from the data itself
- Clustering: group the data into different clusters
- Non-Clustering
Others: Reinforcement Learning, Recommender Systems…

1.2.2 Model Representation

Training Set

$\begin{matrix} x^{(1)}_1&x^{(1)}_2&\cdots&x^{(1)}_n&&y^{(1)}\\ x^{(2)}_1&x^{(2)}_2&\cdots&x^{(2)}_n&&y^{(2)}\\ \vdots&\vdots&\ddots&\vdots&&\vdots\\ x^{(m)}_1&x^{(m)}_2&\cdots&x^{(m)}_n&&y^{(m)} \end{matrix}$
Notation
$m =$ the number of training examples (number of rows)
$n =$ the number of features (number of columns)
$x =$ input variable/feature
$y =$ output variable/target variable
$(x^{(i)}_j,y^{(i)})$: the $j$-th feature of the $i$-th training example, paired with that example's target value, where $i = 1, \dots, m$ and $j = 1, \dots, n$
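To make the indexing concrete, here is a minimal sketch of a training set stored as plain Python lists. The numbers are made up for illustration, and the helper names `feature` and `target` are hypothetical, not part of the course material. They only translate the notes' 1-based indices into Python's 0-based ones.

```python
# Hypothetical toy training set: m = 3 examples, n = 2 features.
X = [
    [2104.0, 3.0],  # x^(1)
    [1416.0, 2.0],  # x^(2)
    [1534.0, 3.0],  # x^(3)
]
y = [400.0, 232.0, 315.0]  # y^(1), y^(2), y^(3)

m = len(X)     # number of training examples (rows)
n = len(X[0])  # number of features (columns)


def feature(i, j):
    """Return x^(i)_j, using the notes' 1-based indices."""
    return X[i - 1][j - 1]


def target(i):
    """Return y^(i), using the notes' 1-based index."""
    return y[i - 1]


print(m, n)           # size of the training set
print(feature(2, 1))  # first feature of the second example
print(target(3))      # target of the third example
```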

1.2.3 Cost Function

1.2.4 Gradient Descent

Chapter 2 - Linear Regression

$\begin{matrix} x_0&x^{(1)}_1&x^{(1)}_2&\cdots&x^{(1)}_n&&y^{(1)}\\ x_0&x^{(2)}_1&x^{(2)}_2&\cdots&x^{(2)}_n&&y^{(2)}\\ \vdots&\vdots&\vdots&\ddots&\vdots&&\vdots\\ x_0&x^{(m)}_1&x^{(m)}_2&\cdots&x^{(m)}_n&&y^{(m)}\\ \\ \theta_0&\theta_1&\theta_2&\cdots&\theta_n&& \end{matrix}$

2.1 Linear Regression with One Variable

Hypothesis Function

$h_{\theta}(x)=\theta_0+\theta_1x$
Cost Function - Squared Error Cost Function
$J(\theta_0,\theta_1)=\frac{1}{2m}\displaystyle\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2$
Goal

$\min_{(\theta_0,\theta_1)}J(\theta_0,\theta_1)$
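The hypothesis and the squared error cost above can be evaluated directly. The sketch below uses a made-up data set lying exactly on the line $y = x$, so the cost is zero at $\theta_0 = 0$, $\theta_1 = 1$ and positive elsewhere. The function names are illustrative choices.

```python
def hypothesis(theta0, theta1, x):
    """h_theta(x) = theta0 + theta1 * x"""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = 1/(2m) * sum of squared prediction errors."""
    m = len(xs)
    squared_errors = (
        (hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)
    )
    return sum(squared_errors) / (2 * m)


# Hypothetical data lying exactly on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

print(cost(0.0, 1.0, xs, ys))  # perfect fit, so the cost is 0
print(cost(0.0, 0.0, xs, ys))  # (1 + 4 + 9) / 6
```

Minimizing $J$ means searching for the pair $(\theta_0, \theta_1)$ at which this function returns its smallest value.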

2.2 Multivariate Linear Regression

Hypothesis Function

$\theta=\left[\begin{matrix}\theta_0\\ \theta_1\\ \vdots\\ \theta_n\end{matrix}\right],\ x=\left[\begin{matrix}x_0\\ x_1\\ \vdots\\ x_n\end{matrix}\right]$

$\begin{aligned}h_\theta(x)&=\theta_0+\theta_1x_1+\theta_2x_2+\cdots+\theta_nx_n\\ &=\theta^Tx \end{aligned}$
Cost Function

$J(\theta^T)=\frac{1}{2m}\displaystyle\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2$
Goal

$\min_{\theta^T}J(\theta^T)$
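As a rough sketch of the multivariate case, the code below prepends the constant feature $x_0 = 1$ to each example (the usual convention that makes $\theta_0 + \theta_1x_1 + \cdots + \theta_nx_n$ equal to $\theta^Tx$) and then computes the hypothesis as a dot product. The data and the helper `add_intercept` are made up for illustration.

```python
def add_intercept(row):
    """Prepend the constant feature x_0 = 1 to one example."""
    return [1.0] + list(row)


def hypothesis(theta, x):
    """h_theta(x) = theta^T x, where x already includes x_0 = 1."""
    return sum(t * xj for t, xj in zip(theta, x))


def cost(theta, X, y):
    """J = 1/(2m) * sum over examples of (h_theta(x^(i)) - y^(i))^2."""
    m = len(X)
    return sum((hypothesis(theta, xi) - yi) ** 2 for xi, yi in zip(X, y)) / (2 * m)


# Hypothetical data generated from y = 1 + 2*x_1 + 3*x_2.
raw = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]
X = [add_intercept(row) for row in raw]
y = [9.0, 5.0, 10.0]

print(hypothesis([1.0, 2.0, 3.0], X[0]))  # prediction for the first example
print(cost([1.0, 2.0, 3.0], X, y))        # exact parameters, so the cost is 0
print(cost([0.0, 0.0, 0.0], X, y))        # (81 + 25 + 100) / 6
```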

2.3 Algorithm Optimization

2.3.1 Gradient Descent

Algorithm
Repeat until convergence (simultaneous update for each $j = 0, \dots, n$)
$\begin{aligned} \theta_j &:=\theta_j-\alpha{\partial\over\partial\theta_j}J(\theta^T)\\ &:=\theta_j-\alpha{1\over{m}}\displaystyle\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)}_j \end{aligned}$
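The update rule above can be sketched as batch gradient descent in plain Python. Two simplifications are assumptions of this sketch rather than part of the notes: the loop runs for a fixed number of iterations instead of testing for convergence, and the values of `alpha` and `iterations` are picked by hand for this small made-up data set. Each example is assumed to already contain $x_0 = 1$.

```python
def hypothesis(theta, x):
    """h_theta(x) = theta^T x, where x already includes x_0 = 1."""
    return sum(t * xj for t, xj in zip(theta, x))


def gradient_descent(X, y, alpha=0.1, iterations=2000):
    """Batch gradient descent for linear regression.

    alpha and iterations are illustrative defaults, not tuned values.
    """
    m = len(X)
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        # Prediction errors are computed once, from the current theta.
        errors = [hypothesis(theta, xi) - yi for xi, yi in zip(X, y)]
        # Partial derivative of J with respect to every theta_j.
        gradient = [
            sum(e * xi[j] for e, xi in zip(errors, X)) / m
            for j in range(len(theta))
        ]
        # Simultaneous update: all theta_j change together, using the
        # gradient evaluated at the old theta.
        theta = [t - alpha * g for t, g in zip(theta, gradient)]
    return theta


# Hypothetical data generated from y = 1 + 2*x_1.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

theta = gradient_descent(X, y)
print(theta)  # should be close to [1, 2]
```

Building the whole gradient list before touching `theta` is what makes the update simultaneous. Updating `theta[0]` first and then using it to compute the gradient for `theta[1]` would be a different algorithm.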
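Feature Scaling
For each feature $x_j$ ($j = 1, \dots, n$), set $x_j:={{x_j-\mu_j}\over{s_j}}$
where $\mu_j$ is the mean of feature $x_j$ over the $m$ training examples, and $s_j$ is either the range of $x_j$ (maximum minus minimum) or its standard deviation.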
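A minimal sketch of this scaling step, assuming the intercept column $x_0$ has not been added yet and that no feature is constant (otherwise $s_j = 0$). The function name `scale_features` and the `use_std` switch are illustrative. For the standard deviation option the sketch uses the population form (`statistics.pstdev`), and the sample form would serve the same purpose.

```python
from statistics import mean, pstdev


def scale_features(X, use_std=False):
    """Return (scaled X, mu, s) with x_j := (x_j - mu_j) / s_j.

    s_j is the range (max - min) by default, or the population
    standard deviation when use_std is True.
    """
    n = len(X[0])
    columns = [[row[j] for row in X] for j in range(n)]
    mu = [mean(col) for col in columns]
    s = [pstdev(col) if use_std else max(col) - min(col) for col in columns]
    scaled = [[(row[j] - mu[j]) / s[j] for j in range(n)] for row in X]
    return scaled, mu, s


# Hypothetical features on very different scales.
X = [[100.0, 1.0], [200.0, 2.0], [300.0, 3.0]]

scaled, mu, s = scale_features(X)
print(mu)      # per-feature means
print(s)       # per-feature ranges
print(scaled)  # every feature now lies in [-0.5, 0.5]
```

The returned `mu` and `s` should be kept: a new input has to be scaled with the same values before it is passed to the hypothesis.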
Learning Rate

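2.3.2 Normal Equation(s)

Let
$X=\left[\begin{matrix} x_0&x^{(1)}_1&x^{(1)}_2&\cdots&x^{(1)}_n\\ x_0&x^{(2)}_1&x^{(2)}_2&\cdots&x^{(2)}_n\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ x_0&x^{(m)}_1&x^{(m)}_2&\cdots&x^{(m)}_n \end{matrix}\right],\ y=\left[\begin{matrix} y^{(1)}\\ y^{(2)}\\ \vdots\\ y^{(m)} \end{matrix}\right]$

where $X$ is an $m\times(n+1)$ matrix (with $x_0=1$ in every row) and $y$ is an $m$-dimensional column vector. Then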

$\theta=(X^TX)^{-1}X^Ty$

If $X^TX$ is noninvertible, the likely causes are:

Redundant features: two features are linearly dependent, so one of them should be deleted;
Too many features, e.g. $m\leq n$: delete some features, or apply regularization.
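The normal equation can be sketched without any external library. Note that this sketch does not form the inverse $(X^TX)^{-1}$ explicitly. It solves the equivalent linear system $X^TX\,\theta = X^Ty$ by Gaussian elimination with partial pivoting, which gives the same $\theta$ when $X^TX$ is invertible. The function name, the singularity threshold `1e-12` and the data are assumptions made for illustration.

```python
def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y for theta by Gaussian elimination.

    X is a list of m rows, each already containing x_0 = 1.
    Raises ValueError when X^T X is (numerically) noninvertible.
    """
    m, k = len(X), len(X[0])
    # A = X^T X (k x k) and b = X^T y (length k).
    A = [[sum(X[i][r] * X[i][c] for i in range(m)) for c in range(k)]
         for r in range(k)]
    b = [sum(X[i][r] * y[i] for i in range(m)) for r in range(k)]
    # Augmented matrix [A | b].
    M = [A[r] + [b[r]] for r in range(k)]

    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:  # crude absolute threshold (assumption)
            raise ValueError("X^T X is noninvertible")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]

    # Back substitution.
    theta = [0.0] * k
    for r in range(k - 1, -1, -1):
        known = sum(M[r][c] * theta[c] for c in range(r + 1, k))
        theta[r] = (M[r][k] - known) / M[r][r]
    return theta


# Hypothetical data generated from y = 1 + 2*x_1.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

theta = normal_equation(X, y)
print(theta)  # should be close to [1, 2]
```

With a redundant feature, for example a third column that is always twice the second, the elimination runs into a zero pivot and the function raises `ValueError`, which corresponds to the noninvertible case described above.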

2.4 Polynomial Regression

If a linear $h_\theta(x)$ can’t fit the data well, we can change the shape of its curve by making it a quadratic, cubic or square root function (or any other form). The model stays linear in $\theta$: each new term is simply treated as an additional feature.
e.g.

$h_{\theta}(x)=\theta_0+\theta_1x_1+\theta_2x_1^2,\ x_2=x_1^2$
$h_{\theta}(x)=\theta_0+\theta_1x_1+\theta_2x_1^2+\theta_3x_1^3,\ x_2=x_1^2,\ x_3=x_1^3$
$h_{\theta}(x)=\theta_0+\theta_1x_1+\theta_2\sqrt{x_1},\ x_2=\sqrt{x_1}$
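The three examples above amount to building extra features from $x_1$ and then applying the ordinary linear hypothesis. A small sketch, with hypothetical helper names and made-up parameter values:

```python
import math


def polynomial_features(x1, degree):
    """Return [x1, x1^2, ..., x1^degree], i.e. x_2 = x1^2, x_3 = x1^3, ..."""
    return [x1 ** p for p in range(1, degree + 1)]


def sqrt_features(x1):
    """Return [x1, sqrt(x1)], i.e. x_2 = sqrt(x1). Requires x1 >= 0."""
    return [x1, math.sqrt(x1)]


def hypothesis(theta, features):
    """h_theta(x) = theta_0 + theta_1*x_1 + ... over the derived features."""
    return theta[0] + sum(t * f for t, f in zip(theta[1:], features))


print(polynomial_features(2.0, 3))  # features for the cubic model
print(sqrt_features(9.0))           # features for the square root model
# Quadratic model with hypothetical theta = [1, 2, 3] at x1 = 2:
print(hypothesis([1.0, 2.0, 3.0], polynomial_features(2.0, 2)))
```

Because $x_1$, $x_1^2$ and $x_1^3$ can span very different ranges, feature scaling matters more here when the parameters are fitted with gradient descent.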