Machine Learning on Coursera

ab2296939808

Category: NLP Coursera. Tags: MachineLearning, Coursera

Article link: https://blog.csdn.net/ab2296939808/article/details/84750194

Week Six

F Score

With precision $P$ and recall $R$, the F score is their harmonic mean:
$\begin{aligned} F &= \dfrac{2}{\dfrac{1}{P}+\dfrac{1}{R}}\\ &= \dfrac{2PR}{P+R} \end{aligned}$
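As a quick check of the formula, here is a minimal sketch in Python. The function name `f1_score` and the convention of returning 0 when both inputs are 0 are assumptions made for this example, not something stated in the notes.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (returns 0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0  # assumed convention to avoid dividing by zero
    return 2 * precision * recall / (precision + recall)


# A balanced classifier versus one that predicts "positive" for almost everything.
balanced = f1_score(0.5, 0.4)
skewed = f1_score(0.02, 1.0)
print(balanced, skewed)
```

The harmonic mean is dominated by the smaller of the two values, so a classifier cannot score well by pushing recall to 1 while precision collapses.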

Week Seven

Support Vector Machine

Cost Function

Start from the regularized logistic regression objective, multiply by $m$, then divide by $\lambda$ (neither step changes the minimizing $\theta$):
$\begin{aligned} &\min_{\theta}\left[-\dfrac{1}{m}\sum_{y_{i}\in Y, x_{i} \in X}\left(y_{i} \log h(\theta^{T}x_{i})+(1-y_{i})\log (1-h(\theta^{T}x_{i}))\right)+\dfrac{\lambda}{2m} \sum_{\theta_{i} \in \theta}\theta_{i}^{2}\right]\\ &\Rightarrow \min_{\theta}\left[-\sum_{y_{i} \in Y,x_{i} \in X}\left(y_{i} \log h(\theta^{T}x_{i})+(1-y_{i})\log(1-h(\theta^{T}x_{i}))\right)+\dfrac{\lambda}{2}\sum_{\theta_{i} \in \theta }\theta^2_{i}\right]\\ &\Rightarrow\min_{\theta}\left[-C\sum_{y_{i} \in Y,x_{i} \in X}\left(y_{i} \log h(\theta^{T}x_{i})+(1-y_{i})\log(1-h(\theta^{T}x_{i}))\right)+\dfrac{1}{2}\sum_{\theta_{i} \in \theta }\theta^2_{i}\right] \end{aligned}$
C plays the role of $\dfrac{1}{\lambda}$. The SVM then replaces the two $-\log$ terms with piecewise-linear costs.
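The role of C can be illustrated with a small sketch. It assumes hinge-style surrogates `cost1(z) = max(0, 1 - z)` and `cost0(z) = max(0, 1 + z)` in place of the log terms; these exact shapes, the function names, and the toy data are assumptions for illustration. As in the notes, every component of $\theta$ is regularized.

```python
def cost1(z: float) -> float:
    """Assumed surrogate for -log h(z), used when y = 1: zero once z >= 1."""
    return max(0.0, 1.0 - z)


def cost0(z: float) -> float:
    """Assumed surrogate for -log(1 - h(z)), used when y = 0: zero once z <= -1."""
    return max(0.0, 1.0 + z)


def svm_objective(theta, X, y, C):
    """C * (sum of per-example costs) + 0.5 * sum of squared parameters."""
    data_term = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(t * x for t, x in zip(theta, x_i))  # theta^T x_i
        data_term += y_i * cost1(z) + (1 - y_i) * cost0(z)
    reg_term = 0.5 * sum(t * t for t in theta)
    return C * data_term + reg_term


# Toy data: the third example lies inside the margin (theta^T x = 0.5).
X = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1, 0, 1]
theta = [1.0, 0.0]
print(svm_objective(theta, X, y, C=1.0))
print(svm_objective(theta, X, y, C=100.0))
```

With a larger C the margin violation dominates the objective, which is the same effect as a smaller $\lambda$ in logistic regression.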

Large C:
- Lower bias, higher variance
Small C:
- Higher bias, lower variance
Large $\sigma^2$ (Gaussian kernel): features $f_{i}$ vary more smoothly.
- Higher bias, lower variance
Small $\sigma^2$: features $f_{i}$ vary more sharply.
- Lower bias, higher variance

When C is very large, the optimization reduces to the large-margin problem:
$\begin{aligned} \min_{\theta}\ & \dfrac{1}{2} \sum_{\theta_{i} \in \theta}\theta_{i}^2\\ \text{s.t.}\ & \theta^{T}x_{i} \geq 1 && \text{if } y_{i} = 1\\ & \theta^{T}x_{i} \leq -1 && \text{if } y_{i} = 0 \end{aligned}$
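The effect of $\sigma^2$ on a Gaussian-kernel feature $f_i$, noted in the list above, can be sketched as follows. The notes do not spell out the kernel, so the form $f = \exp(-\|x - l\|^2 / (2\sigma^2))$ and the names `gaussian_kernel` and `landmark` are assumptions taken from the usual definition.

```python
import math


def gaussian_kernel(x, landmark, sigma2):
    """Similarity between x and a landmark; equals 1 when they coincide."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-sq_dist / (2.0 * sigma2))


landmark = [0.0, 0.0]
x = [1.0, 1.0]
# Same point, three kernel widths: the feature decays faster when sigma2 is small.
values = [gaussian_kernel(x, landmark, s2) for s2 in (0.1, 1.0, 10.0)]
print(values)
```

A small $\sigma^2$ makes the feature drop to nearly zero a short distance from the landmark (sharp, low bias, high variance), while a large $\sigma^2$ keeps it close to 1 over a wide region (smooth, high bias, low variance).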

PS

If the number of features n is large relative to the number of training examples m, use logistic regression or an SVM without a kernel.

If n is small and m is intermediate, use an SVM with a Gaussian kernel.

If n is small and m is large, add more features, then use logistic regression or an SVM without a kernel.

Week Eight

K-means

Cost Function

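K-means tries to minimize
$\min_{c,\mu}\dfrac{1}{m} \sum_{i=1}^{m} \|x^{(i)} - \mu_{c^{(i)}}\|^2$
Each iteration has two steps. The cluster-assignment step holds the centroids fixed and minimizes the cost over the assignments $c^{(i)}$ by assigning every x in the training set to its nearest centroid. The move-centroid step holds the assignments fixed and minimizes the cost over the centroids by moving each centroid to the mean of the points assigned to it.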
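The two alternating steps can be sketched in plain Python. The function names, the toy points, and the choice to leave an empty cluster's centroid where it is are assumptions for this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def assign_clusters(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k]))
            for x in points]


def move_centroids(points, assignments, centroids):
    """Move step: each centroid becomes the mean of its assigned points."""
    moved = []
    for k, old in enumerate(centroids):
        members = [x for x, c in zip(points, assignments) if c == k]
        if not members:
            moved.append(list(old))  # assumed: an empty cluster keeps its centroid
            continue
        moved.append([sum(col) / len(members) for col in zip(*members)])
    return moved


def distortion(points, assignments, centroids):
    """Average squared distance from each point to its assigned centroid."""
    return sum(sq_dist(x, centroids[c]) for x, c in zip(points, assignments)) / len(points)


points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
centroids = [[0.0, 0.0], [1.0, 1.0]]
costs = []
for _ in range(3):
    assignments = assign_clusters(points, centroids)
    costs.append(distortion(points, assignments, centroids))
    centroids = move_centroids(points, assignments, centroids)
    costs.append(distortion(points, assignments, centroids))
print(costs)
print(centroids)
```

Recording the cost after each step shows that neither step increases it, which is why the procedure converges.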

Initialize

Initialize the centroids randomly: pick k samples at random from the training set and set the centroids to those samples.

K-means can fall into a local minimum, so repeat the random initialization and keep the run with the lowest cost (distortion), until the cost is suitable for your purposes.

K-means always converges, and the cost never increases during training. Adding centroids should decrease the cost. If it does not, K-means has fallen into a local minimum, so reinitialize the centroids until the cost goes down.
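Random restarts can be sketched as below: run K-means from several random initializations and keep the run with the lowest distortion. The names `kmeans` and `best_of_restarts`, the default of 20 restarts, the fixed seed, and the toy data are all assumptions for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points, k, rng, max_iters=100):
    """One K-means run from a random initialization; returns (cost, centroids, assignments)."""
    # Initialize the centroids to k distinct training samples.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iters):
        new_assignments = [min(range(k), key=lambda j: sq_dist(x, centroids[j]))
                           for x in points]
        if new_assignments == assignments:
            break  # assignments stopped changing, so the run has converged
        assignments = new_assignments
        for j in range(k):
            members = [x for x, c in zip(points, assignments) if c == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    cost = sum(sq_dist(x, centroids[c]) for x, c in zip(points, assignments)) / len(points)
    return cost, centroids, assignments


def best_of_restarts(points, k, n_restarts=20, seed=0):
    """Keep the lowest-distortion result over several random initializations."""
    rng = random.Random(seed)
    return min((kmeans(points, k, rng) for _ in range(n_restarts)), key=lambda r: r[0])


# Three well-separated groups of three points each.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
          [20.0, 0.0], [20.0, 1.0], [21.0, 0.0]]
best1 = best_of_restarts(points, 1)
best3 = best_of_restarts(points, 3)
print(best1[0], best3[0])
```

Comparing the best cost for k = 1 and k = 3 also illustrates the last point: with enough restarts, more centroids give a lower cost.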

PCA (Principal Component Analysis)

Choose $k$ so that $x$ reconstructed from $z$ satisfies the inequality below, that is, 99% of the variance is retained:
$1-\dfrac{\dfrac{1}{m} \sum_{i=1}^{m}\|x^{(i)}-x^{(i)}_{approximation}\|^2}{\dfrac{1}{m} \sum_{i=1}^{m} \|x^{(i)}\|^2} \geq 0.99$
PS:
The inequality is equivalent to the condition on the diagonal of $S$ below, where $\Sigma$ is the covariance matrix of the data:
$\begin{aligned} [U, S, D] &= \mathrm{svd}(\Sigma)\\ U_{reduce} &= U(:, 1:k)\\ z &= U_{reduce}^{T} x\\ x_{approximation} &= U_{reduce} z\\\\ S &= \left( \begin{array}{cccc} s_{11}&0&\cdots&0\\ 0&s_{22}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&s_{nn} \end{array} \right)\\\\ \dfrac{\sum_{i=1}^{k}s_{ii}}{\sum_{i=1}^{n} s_{ii}} &\geq 0.99 \end{aligned}$
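Choosing $k$ from the diagonal of $S$ can be sketched without recomputing the projection for every candidate. The sketch takes the diagonal entries as a plain list, sorted in decreasing order as an SVD returns them, and uses the ratio of the sum of the first $k$ entries to the sum of all entries. The function names and the example values are assumptions.

```python
def variance_retained(s_diag, k):
    """Fraction of total variance kept by the first k components."""
    return sum(s_diag[:k]) / sum(s_diag)


def choose_k(s_diag, threshold=0.99):
    """Smallest k whose retained-variance ratio reaches the threshold."""
    total = sum(s_diag)
    running = 0.0
    for k, s in enumerate(s_diag, start=1):
        running += s
        if running / total >= threshold:
            return k
    return len(s_diag)


# Hypothetical diagonal of S for a 5-feature data set.
s_diag = [6.0, 3.0, 0.95, 0.04, 0.01]
k = choose_k(s_diag)
print(k, variance_retained(s_diag, k))
```

Because the SVD is computed once, trying successive values of $k$ costs only a running sum.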

Week Nine

Anomaly Detection

Gaussian Distribution

The multivariate Gaussian distribution takes correlations between features into account:
$p(x;\mu,\Sigma) = \dfrac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}e^{-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)}$
The model built from independent single-variable Gaussians is a special case of the multivariate Gaussian, where $\Sigma$ is diagonal:
$\Sigma = \left(\begin{array}{cccc} \sigma_{11}&&&\\ &\sigma_{22}&&\\ &&\ddots&\\ &&&\sigma_{nn} \end{array}\right)$
To train the anomaly detector, use maximum likelihood estimation:
$\begin{aligned} \mu &= \dfrac{1}{m} \sum_{i=1}^{m}x^{(i)}\\ \Sigma &= \dfrac{1}{m} \sum_{i=1}^{m} (x^{(i)}-\mu)(x^{(i)}-\mu)^{T} \end{aligned}$
The single-variable model is much cheaper to compute than the multivariate one, but you may need to add new features by hand to separate normal examples from anomalous ones.
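The cheaper single-variable model can be sketched as follows: fit a mean and variance per feature by maximum likelihood, then multiply the per-feature densities, which equals the multivariate density with a diagonal $\Sigma$. The threshold `epsilon`, the function names, and the toy data are assumptions; the notes do not say how the threshold is chosen.

```python
import math


def fit_gaussian(X):
    """Per-feature maximum likelihood estimates of mean and variance (1/m normalizer)."""
    m, n = len(X), len(X[0])
    mu = [sum(x[j] for x in X) / m for j in range(n)]
    var = [sum((x[j] - mu[j]) ** 2 for x in X) / m for j in range(n)]
    return mu, var


def density(x, mu, var):
    """Product of univariate Gaussian densities, one per feature."""
    p = 1.0
    for x_j, mu_j, var_j in zip(x, mu, var):
        p *= math.exp(-((x_j - mu_j) ** 2) / (2.0 * var_j)) / math.sqrt(2.0 * math.pi * var_j)
    return p


def is_anomaly(x, mu, var, epsilon):
    """Flag x when its density falls below the assumed threshold epsilon."""
    return density(x, mu, var) < epsilon


X_train = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [2.0, 9.0], [2.0, 13.0]]
mu, var = fit_gaussian(X_train)
print(mu, var)
print(is_anomaly([2.0, 11.0], mu, var, epsilon=1e-3))
print(is_anomaly([8.0, 11.0], mu, var, epsilon=1e-3))
```

Since the features are treated as independent here, a point that is unusual only in the combination of its features would not be flagged; that is the case where a hand-built feature or the full multivariate model is needed.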

Recommender System

Cost Function

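$\begin{aligned} J(X,\Theta) &= \dfrac{1}{2} \sum_{(i,j):r(i,j)=1}\left((\theta^{(j)})^{T}x^{(i)}-y^{(i,j)}\right)^2 + \dfrac{\lambda}{2}\left[\sum_{i=1}^{n_{m}}\sum_{k=1}^{n}(x_k^{(i)})^2 + \sum_{j=1}^{n_u} \sum_{k=1}^n(\theta_{k}^{(j)})^2\right]\\ &= \dfrac{1}{2}\mathrm{Sum}\left\{\left((X\Theta^{T}-Y)\odot R\right)^{\odot 2}\right\} + \dfrac{\lambda}{2}\left(\mathrm{Sum}\{\Theta^{\odot 2}\} + \mathrm{Sum}\{X^{\odot 2}\}\right) \end{aligned}$
Here $\odot$ is the element-wise product (written .* in Octave), $A^{\odot 2}$ squares every entry of $A$, and $\mathrm{Sum}$ adds up all entries of a matrix.
$\begin{aligned} \dfrac{\partial J}{\partial X} &= \left((X\Theta^{T}-Y)\odot R\right) \Theta + \lambda X\\ \dfrac{\partial J}{\partial \Theta} &= \left((X\Theta^{T}-Y)\odot R\right)^{T}X + \lambda \Theta \end{aligned}$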
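The cost and both gradients can be sketched with nested lists, following the sums directly. Shapes are assumed to be X: $n_m \times n$, Theta: $n_u \times n$, and Y, R: $n_m \times n_u$; the function name and the toy values are made up for this example.

```python
def cofi_cost_and_grads(X, Theta, Y, R, lam):
    """Collaborative-filtering cost with gradients with respect to X and Theta."""
    n_m, n_u, n = len(X), len(Theta), len(X[0])
    # Masked prediction error: (X Theta^T - Y), zeroed where r(i, j) = 0.
    E = [[(sum(X[i][k] * Theta[j][k] for k in range(n)) - Y[i][j]) * R[i][j]
          for j in range(n_u)] for i in range(n_m)]
    cost = 0.5 * sum(e * e for row in E for e in row)
    cost += 0.5 * lam * (sum(v * v for row in X for v in row)
                         + sum(v * v for row in Theta for v in row))
    # dJ/dX = E Theta + lam X
    X_grad = [[sum(E[i][j] * Theta[j][k] for j in range(n_u)) + lam * X[i][k]
               for k in range(n)] for i in range(n_m)]
    # dJ/dTheta = E^T X + lam Theta
    Theta_grad = [[sum(E[i][j] * X[i][k] for i in range(n_m)) + lam * Theta[j][k]
                   for k in range(n)] for j in range(n_u)]
    return cost, X_grad, Theta_grad


# Two movies, one user, two features; only the first movie has been rated.
X = [[1.0, 0.0], [0.0, 1.0]]
Theta = [[2.0, 1.0]]
Y = [[3.0], [0.0]]
R = [[1], [0]]
cost, X_grad, Theta_grad = cofi_cost_and_grads(X, Theta, Y, R, lam=1.0)
print(cost, X_grad, Theta_grad)
```

The unrated movie contributes nothing to the squared-error term, but its features are still pulled toward zero by the regularization term, which shows up as the nonzero entry in its gradient row.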
