Machine Learning Notes [Day 01]

多读书好嘛

Tags: machine learning

Article link: https://blog.csdn.net/ztyLOVElearning/article/details/122443316

These notes are drawn mainly from Andrew Ng's online machine learning course.

Machine learning

Grew out of work in AI

New capability for computers

Some applications

Data mining

Applications we can't program by hand

Self-customizing programs

Understanding human learning (brain, real AI)

Definitions

Arthur Samuel (1959): Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.

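Tom Mitchell (1998): Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

In the checkers example, the experience E is the program playing tens of thousands of games against itself, the task T is playing checkers, and the performance measure P is the probability of winning against a new opponent. The program learns if its performance on T, as measured by P, improves with E.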

Machine learning algorithms

The two most commonly used types are supervised learning and unsupervised learning.

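Supervised learning

Definition
"Right answers" given
We give the algorithm a dataset that contains the correct answers, that is, the data is labeled. The algorithm's goal is to produce more correct answers for new inputs.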

Regression
Predict continuous-valued output
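
As a minimal sketch of a regression problem, the following fits a straight line to a few labeled points by ordinary least squares and then predicts a continuous value for a new input. The size/price numbers and the names `fit_line` and `predict` are made up for illustration; they are not from the course.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict a continuous output for input x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical labeled data: each size comes with its "right answer" (price).
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]

model = fit_line(sizes, prices)
estimate = predict(model, 90.0)  # a real number, not a category
```

The output of `predict` can be any real number, which is what separates regression from classification.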

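Classification
Discrete-valued output (0 or 1). (The slide seems to be missing the word "Predict".)
The discrete output is not necessarily limited to 0 and 1; in practice there are often more than two classes.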
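
To make the multi-class point concrete, here is a small sketch of a nearest-centroid classifier with three labels (0, 1, 2). The technique and the toy points are chosen only for illustration and are not something the course uses at this stage; `train_centroids` and `classify` are hypothetical names.

```python
def train_centroids(examples):
    """examples: list of ((x1, x2), label). Returns {label: (mean_x1, mean_x2)}."""
    sums = {}
    for (x1, x2), label in examples:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x1, sy + x2, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def classify(centroids, point):
    """Return the label whose centroid is closest to the point."""
    def sq_dist(c):
        return (c[0] - point[0]) ** 2 + (c[1] - point[1]) ** 2
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical labeled training set with three classes.
training = [
    ((1.0, 1.0), 0), ((1.5, 0.5), 0),
    ((5.0, 5.0), 1), ((5.5, 4.5), 1),
    ((9.0, 1.0), 2), ((8.5, 1.5), 2),
]

centroids = train_centroids(training)
label = classify(centroids, (8.0, 2.0))  # one of the discrete labels 0, 1, 2
```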

Unsupervised learning

Definition
The dataset has no labels; the algorithm finds structure in the data on its own, for example by grouping it into clusters.
Applications
Organize computing clusters
Social network analysis
Market segmentation
Astronomical data analysis
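
As a rough sketch of what grouping unlabeled data can look like, the following runs plain k-means on one-dimensional data, in the spirit of the market segmentation example. The spending figures, the starting centers, and the name `kmeans_1d` are assumptions for illustration; the notes do not name a specific clustering algorithm.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on 1-D data, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical unlabeled data: monthly spend per customer, with no segment labels.
spend = [10, 12, 11, 50, 52, 49, 95, 100, 98]
centers, clusters = kmeans_1d(spend, centers=[0, 50, 100])
```

No labels are passed in; the three groups emerge from the distances between the values alone.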

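Cocktail party problem

Problem description
Picture a party with a room full of people. Many of them are talking at once, the voices blend together, and you can barely hear the person in front of you. To simplify, suppose the cocktail party has only two people, both speaking at the same time, and we place two microphones in the room. Because the microphones sit at different distances from the two speakers, each one records a different combination of the two voices. We can hand these two recordings to an unsupervised learning algorithm, called the "cocktail party algorithm", and let it find the structure in the data; it then separates the two voices that were mixed together.
Algorithm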

[w,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');

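My first thought was to treat the two recordings as two linear equations in two unknowns with different coefficients, aX + bY - c = 0, and solve them simultaneously, but the actual algorithm appears to be more complicated.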
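
The simultaneous-equations idea can be sketched as follows, under the strong assumption that the mixing coefficients are known. In that case each pair of microphone samples is a 2x2 linear system, and solving it recovers the two voices. In the real cocktail party problem the coefficients are unknown, which is why an unsupervised method is needed. The signals, the coefficient matrix `a`, and the names `mix` and `unmix_known` are hypothetical.

```python
def mix(sources, a):
    """Mix two source signals with a 2x2 coefficient matrix a."""
    s1, s2 = sources
    m1 = [a[0][0] * x + a[0][1] * y for x, y in zip(s1, s2)]
    m2 = [a[1][0] * x + a[1][1] * y for x, y in zip(s1, s2)]
    return m1, m2


def unmix_known(mics, a):
    """Recover the sources by inverting the 2x2 system (requires knowing a)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    m1, m2 = mics
    s1 = [(a[1][1] * p - a[0][1] * q) / det for p, q in zip(m1, m2)]
    s2 = [(a[0][0] * q - a[1][0] * p) / det for p, q in zip(m1, m2)]
    return s1, s2


# Hypothetical voices and mixing coefficients (one row per microphone).
speaker1 = [1.0, 0.0, -1.0, 0.0]
speaker2 = [0.5, 0.5, -0.5, -0.5]
a = [[0.8, 0.2],
     [0.3, 0.7]]

mics = mix((speaker1, speaker2), a)
rec1, rec2 = unmix_known(mics, a)
```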

Other topics

Reinforcement learning
Recommender systems

In-class questions

P1

Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam. What is the task T in this setting?

A. Classifying emails as spam or not spam.

B. Watching you label email as spam or not spam.

C. The number (or fraction) of emails correctly classified as spam/not spam.

D. None of the above; this is not a machine learning problem.

A is the task T; B is the experience E; C is the performance measure P. The answer is therefore A.
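
The performance measure P from option C can be computed with a few lines. This is only a sketch: the labels and filter outputs below are invented, and `accuracy` is a hypothetical helper name.

```python
def accuracy(predictions, labels):
    """Performance measure P: fraction of emails classified correctly."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Experience E: what the user marked (True means spam). Hypothetical data.
user_labels = [True, False, True, True, False]
# Task T: the filter's own spam / not-spam decisions on the same emails.
filter_output = [True, False, False, True, False]

p = accuracy(filter_output, user_labels)  # 4 of 5 decisions match
```

By Mitchell's definition, the filter is learning if `p` rises as it sees more of the user's labels.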

P2

You’re running a company, and you want to develop learning algorithms to address each of two problems.
Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months.

Problem 2: You’d like software to examine individual customer accounts, and for each account decide if it has been hacked/compromised.

Should you treat these as classification or as regression problems?

A. Treat both as classification problems.
B. Treat problem 1 as a classification problem, problem 2 as a regression problem.
C. Treat problem 1 as a regression problem, problem 2 as a classification problem.
D. Treat both as regression problems.

Problem 1 predicts a quantity (the number of items sold), so it is a regression problem. Problem 2 makes a yes/no decision for each account, so it is a classification problem. The answer is C.

P3
Of the following examples, which would you address using an unsupervised learning algorithm? (Check all that apply.)

A. Given email labeled as spam/not spam, learn a spam filter.
B. Given a set of news articles found on the web, group them into sets of articles about the same story.
C. Given a database of customer data, automatically discover market segments and group customers into different market segments.
D. Given a database of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not.

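A: Telling spam from non-spam using labeled email. Clearly wrong; this is a supervised learning problem.
B: Grouping news articles from the web by story. Correct.
C: The market segmentation example. Correct.
D: Predicting diabetes from labeled diagnoses. Wrong; this is also supervised learning.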