These are study notes from Andrew Ng's machine learning course series. Course videos:
https://study.163.com/course/introduction.htm?courseId=1004570029#/courseDetail?tab=1
The white-background video screenshots in these notes come from Andrew Ng's course videos; the mind maps are original summaries.
ML basics and linear regression with one variable:
- Introduction
ML: grew out of work in AI; a new capability for computers.
Examples:
-Database mining
large datasets from the growth of automation/the web.
e.g.: web click data, medical records, biology, engineering
-Applications that can't be programmed by hand.
e.g.: autonomous helicopters, handwriting recognition, most of NLP,
Computer Vision
-Self-customizing programs
e.g.: Amazon, Netflix product recommendations
-Understanding human learning (brain, real AI).
-
What's ML
Arthur Samuel (1959): the field of study that gives computers the ability to learn without being explicitly programmed.
Tom Mitchell (1998): a computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
Spam email example: T = classifying emails as spam or not spam; E = watching the user label emails; P = the fraction of emails correctly classified.
ML learning algorithms:
-Two main types: supervised learning / unsupervised learning
-Others: reinforcement learning, recommender systems
- Introduction to supervised learning
Regression problem 回归问题
Deals with continuous values
Classification problem 分类问题
Deals with discrete values
How to allow a computer to deal with an infinite number of features.
- Introduction to unsupervised learning
No labeled data; find some structure in the data (clustering)
e.g. Google News clusters news stories by topic
e.g. grouping individuals by their genes
Used to:
-organize computing clusters
-social network analysis
-market segmentation
-astronomical data analysis
Also used for:
-the cocktail party problem (separating mixed audio sources)
Build the algorithm prototype in Octave first; once the algorithm works, migrate it to C++ or Java.
Summary: - Linear regression with one variable - Model representation
Some symbols:
m = number of training examples
x's = 'input' variables/features
y's = 'output' variable/'target' variable
(x, y) = one training example
(x^(i), y^(i)) = the i-th training example
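With this notation, the model of one-variable linear regression is the hypothesis function used throughout the course:

```latex
h_\theta(x) = \theta_0 + \theta_1 x
```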
-
Linear regression with one variable - Cost function
We use the cost function to find the parameter values that minimize it.
The usual cost function is the squared-error function (MSE – mean squared error).
Others: RMSE, MAE, R-squared
-
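Written out, the squared-error cost over m training examples is:

```latex
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\!\big(x^{(i)}\big) - y^{(i)} \right)^2
```

The extra factor of 1/2 just simplifies the derivative later.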
Linear regression with one variable - Cost function intuition
First simplify the problem by reducing the number of parameters, and plot the hypothesis h_θ(x) and the cost function J(θ) for different values of θ.
Then look at the hypothesis and cost function with two parameters graphically (a bowl-shaped surface plot and contour plots):
Bowl-like shape
Contour plots
-
Linear regression with one variable - Gradient descent
:= is assignment; = is a truth assertion
α = the learning rate
Gradient descent means repeatedly modifying the parameter values: each update subtracts the product of the learning rate and the partial derivative of the cost function with respect to that parameter. If plugging the new parameter values into the cost function lowers its value (i.e. reduces the error), we are getting closer to our goal.
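In symbols, the update rule just described is (repeated until convergence, with both parameters updated simultaneously):

```latex
\theta_j := \theta_j - \alpha \, \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \qquad (j = 0, 1)
```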
Gradient descent can be used to minimize any cost function J.
Gradient descent + the cost function gives an algorithm for linear regression.
Take the partial derivatives with respect to θ0 and θ1.
The cost function of one-variable linear regression is a convex function (bowl-shaped); it has no local optima, only a global optimum.
This variant of gradient descent is called 'batch' gradient descent: each step uses all the training examples.
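The rule above can be sketched in Python (the course prototypes in Octave; this pure-Python translation and its toy data are my own, for illustration only):

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) = (1/2m) * sum((h(x) - y)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    m = len(xs)
    theta0 = theta1 = 0.0
    for _ in range(iterations):
        # Partial derivatives of J with respect to theta0 and theta1,
        # computed over ALL m training examples ("batch" gradient descent).
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: theta_j := theta_j - alpha * dJ/dtheta_j
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]          # exactly y = 2x + 1
theta0, theta1 = gradient_descent(xs, ys)
final_cost = cost(theta0, theta1, xs, ys)
print(round(theta0, 3), round(theta1, 3), final_cost)  # converges near (1, 2)
```

Because the cost is convex (bowl-shaped), the same global optimum is reached regardless of the starting point, provided α is small enough.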
Summary: -
Linear Algebra review - Matrices and vectors
To extend simple linear regression (one feature) to multiple features, we first need some linear algebra.
Matrix – Rectangular array of numbers
Tensor
A tensor is a set of numbers arranged on a regular grid with a variable number of axes. A rank-3 tensor has three indices: the first points to the row, the second to the column, and the third to the axis. For example, V232 points to the second row, third column, and second axis, which in the figure accompanying the original notes is the value 0.
Upper case letters are used to refer to matrices; lower case letters refer to plain numbers (scalars) or vectors.
-
Linear Algebra review - Addition and scalar multiplication
Matrix Addition
Scalar multiplication:
-
Linear Algebra review - Matrix-vector multiplication
Rewrite the problem as a matrix times a vector;
this can be done in one line in Octave,
which simplifies the code and is more efficient.
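As a sketch of the idea in pure Python rather than Octave (the house sizes and parameter values follow the lecture's illustrative example, as I recall it):

```python
def mat_vec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

sizes = [2104, 1416, 1534, 852]   # house sizes in square feet
# Prepend a 1 to each size so that theta0 multiplies a constant term.
X = [[1, s] for s in sizes]
theta = [-40, 0.25]               # h(x) = -40 + 0.25 * x

predictions = mat_vec(X, theta)   # one "matrix times vector" operation
print(predictions)                # [486.0, 314.0, 343.5, 173.0]
```

The loop over all houses disappears into a single product, which is exactly why the Octave version fits in one line.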
-
Linear Algebra review - Matrix-matrix multiplication
The first matrix's column count must equal the second matrix's row count.
For matrix A (m×n) and matrix B (n×o), A*B is a matrix C (m×o); i.e. the result has A's number of rows and B's number of columns.
The house price prediction problem is more concise when expressed with matrices:
By constructing these two matrices, we can apply three hypotheses to all four house sizes
to get all twelve predicted prices as output.
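A sketch of the twelve-prediction computation in pure Python; the three hypotheses below are illustrative values (as I recall them from the lecture):

```python
def mat_mul(A, B):
    """Multiply A (m x n) by B (n x o), giving an m x o matrix."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

X = [[1, 2104], [1, 1416], [1, 1534], [1, 852]]   # 4 x 2: four house sizes
# Each column is one hypothesis: h(x) = theta0 + theta1 * x.
Theta = [[-40, 200, -150],
         [0.25, 0.1, 0.4]]                         # 2 x 3: three hypotheses

predictions = mat_mul(X, Theta)                    # 4 x 3: twelve predictions
print(predictions)
```

Column j of the result holds the four predictions of hypothesis j, so one product evaluates every hypothesis on every house.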
-
Linear Algebra review - Matrix multiplication properties
Matrix multiplication is not commutative.
Matrix multiplication is associative.
The identity matrix commutes with any matrix: A·I = I·A = A.
Additional notes:
1. Commutativity: scalar multiplication is commutative, but matrix multiplication is not. 7 * 3 is the same as 3 * 7, but when we multiply matrices by each other, A * B is in general not the same as B * A.
2. Associativity: both scalar and matrix multiplication are associative. 3(5 * 3) is the same as (3 * 5)3, and A(B * C) is the same as (A * B)C.
3. Distributivity: both scalar and matrix multiplication are distributive. 3(5 + 3) is the same as 3 * 5 + 3 * 3, and A(B + C) is the same as A * B + A * C.
4. Identity matrix: the identity matrix is a special matrix, but first we need to define what an identity is. The number 1 is an identity, because everything multiplied by 1 equals itself. Likewise, every matrix multiplied by the identity matrix equals itself; for example, a matrix A times its identity matrix equals A.
You can spot an identity matrix by the fact that it has ones along the diagonal while every other value is zero. It is also a 'square matrix', meaning its number of rows matches its number of columns.
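The properties above can be checked with a small pure-Python sketch (mat_mul and identity are helpers written for this illustration):

```python
def mat_mul(A, B):
    """Multiply A (m x n) by B (n x o)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def identity(n):
    # Ones along the diagonal, zeros everywhere else.
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
I = identity(2)

print(mat_mul(A, B))   # [[2, 1], [4, 3]]
print(mat_mul(B, A))   # [[3, 4], [1, 2]]  -- A*B != B*A: not commutative
print(mat_mul(A, I) == A and mat_mul(I, A) == A)  # True: I commutes with A
```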
-
Linear Algebra review - Inverse and transpose
Inverse
Transpose
Only square matrices have inverses: rows = columns → square matrix, A is m×m.
Matrices that don't have an inverse are called 'singular' or 'degenerate' matrices.
Transpose – flip the matrix along its 45-degree (main diagonal) axis.
If B = Aᵀ, then A_ij = B_ji.
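A pure-Python sketch of transpose and the 2×2 inverse (the matrix A and the helper functions here are illustrative, not from the course):

```python
def transpose(A):
    """B = A^T, so that B[i][j] == A[j][i]."""
    return [[A[j][i] for j in range(len(A))] for i in range(len(A[0]))]

def inverse_2x2(A):
    """Inverse of a 2x2 matrix; only works when the determinant is nonzero."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular (degenerate) matrix: no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[4, 7], [2, 6]]
A_inv = inverse_2x2(A)

print(transpose(A))        # [[4, 2], [7, 6]]
print(mat_mul(A, A_inv))   # approximately the identity matrix
```

A matrix with a zero determinant (e.g. [[1, 2], [2, 4]]) triggers the singular-matrix error, matching the definition above.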