3.1 Machine learning
3.1.1-1.3 Machine learning definition
Example: classifying email as spam or not spam.
Arthur Samuel (1959): machine learning is the field of study that gives computers the ability to learn without being explicitly programmed.
Tom Mitchell (1998): well-posed learning problem. A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.
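Mitchell's T, E and P can be made concrete with the spam example. The sketch below is a toy word-counting filter, not a method from these notes. The emails, the counting rule and the names `train`, `predict` and `accuracy` are all hypothetical. T is labelling an email as spam or not. E is a list of labelled emails. P is the fraction of held-out emails classified correctly.

```python
from collections import Counter

# Hypothetical labelled emails: experience E is a list of (text, is_spam) pairs.
EMAILS = [
    ("win cash now", True),
    ("meeting at noon", False),
    ("cheap cash prize", True),
    ("lunch at noon tomorrow", False),
    ("win a prize now", True),
    ("project meeting notes", False),
]
# Hypothetical held-out emails used to measure performance P.
TEST = [("cash prize now", True), ("noon meeting", False)]


def train(examples):
    """Learn from experience E: count word occurrences in spam and non-spam emails."""
    spam, ham = Counter(), Counter()
    for text, is_spam in examples:
        (spam if is_spam else ham).update(text.split())
    return spam, ham


def predict(model, text):
    """Task T: call an email spam if its words were seen more often in spam."""
    spam, ham = model
    words = text.split()
    return sum(spam[w] for w in words) > sum(ham[w] for w in words)


def accuracy(model, examples):
    """Performance measure P: fraction of emails classified correctly."""
    correct = sum(predict(model, text) == is_spam for text, is_spam in examples)
    return correct / len(examples)


for k in (0, 2, len(EMAILS)):
    print(k, "emails seen -> P =", accuracy(train(EMAILS[:k]), TEST))
```

With no experience, the filter labels every email as not spam and scores P = 0.5 on the two test emails. After it has seen labelled emails, it scores P = 1.0. In Mitchell's sense, this improvement in P on T with more E is the learning.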
Machine learning algorithms:
Supervised learning
Unsupervised learning
Others: reinforcement learning, recommender systems
3.1.4 Supervised learning
Examples:
Regression problem: the goal is to predict a continuous-valued output, e.g. housing price prediction.
Classification problem: the goal is to predict a discrete-valued output, e.g. classifying a breast tumor as malignant or benign.
The term supervised learning refers to the fact that we give the algorithm a data set in which the “right answers” are given.
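Both kinds of supervised problem start from labelled examples. They differ only in the kind of output predicted. The sketch below uses k-nearest neighbours, a technique these notes have not introduced, chosen here only because one procedure can produce either kind of output. All data points are hypothetical, and so are the helper names `knn_regress` and `knn_classify`.

```python
from collections import Counter


def k_nearest(train, x, k):
    """Return the k labelled examples (x, y) whose input is closest to x."""
    return sorted(train, key=lambda pair: abs(pair[0] - x))[:k]


def knn_regress(train, x, k=2):
    """Regression: predict a continuous value, the mean output of the k neighbours."""
    neighbours = k_nearest(train, x, k)
    return sum(y for _, y in neighbours) / len(neighbours)


def knn_classify(train, x, k=3):
    """Classification: predict a discrete label by majority vote of the k neighbours."""
    votes = Counter(label for _, label in k_nearest(train, x, k))
    return votes.most_common(1)[0][0]


# Hypothetical "right answers": (size in feet^2, price in $1000s).
houses = [(850, 180.0), (1000, 210.0), (1250, 250.0), (1600, 320.0)]
# Hypothetical "right answers": (tumor size in cm, diagnosis).
tumors = [(1.0, "benign"), (1.5, "benign"), (2.0, "benign"),
          (4.0, "malignant"), (4.5, "malignant"), (5.0, "malignant")]

print(knn_regress(houses, 1100))   # 230.0, a continuous value
print(knn_classify(tumors, 4.2))   # malignant, one of two discrete labels
```

The regression call averages the prices of the two nearest houses (1000 and 1250 feet²). The classification call takes a vote among the three nearest tumors.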
3.1.5 Unsupervised learning
Clustering algorithms: e.g. Google News (grouping related news stories together) and genomics (grouping individuals by their genes).
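Cocktail party problem: separate the individual voices in recordings made by several microphones while several people are talking at the same time. In Octave, the algorithm can be written in a single line, where x is the matrix of recorded signals:
[W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');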
In unsupervised learning we are given data without labels and ask the algorithm: “Here is the dataset; can you find some structure in it?”
Applications: organizing large computer clusters (deciding which machines tend to work together so the cluster runs more efficiently), social network analysis (identifying cohesive groups of friends), market segmentation (grouping customers into different market segments), and astronomical data analysis (informing theories of how galaxies are formed).
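To make "find some structure" concrete, the sketch below clusters unlabelled 2-D points with k-means. This is one common clustering algorithm, not one these notes describe. The points, the starting centroids and the function name `kmeans` are hypothetical, and the number of clusters is fixed in advance by the caller.

```python
def kmeans(points, centroids, iterations=10):
    """Cluster unlabelled 2-D points with k-means, starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for px, py in points:
            dists = [(px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centroids]
            clusters[dists.index(min(dists))].append((px, py))
        # Update step: move each centroid to the mean of its points
        # (a centroid with no points stays where it is).
        centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabelled data: no "right answers", just points.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

No labels are given, yet the two groups of nearby points come out as two clusters. This is the kind of structure the applications above look for, for example customers in market segmentation.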
3.2 Model representation
3.2.1 Linear regression
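Example: housing prices in Portland, OR, where the task is to predict a house's price from its size.
Training set → learning algorithm → h (the hypothesis), a function that takes the size of a house x and outputs an estimated price y.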
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$, written $h(x)$ for short.
This model is called univariate linear regression (linear regression with one variable).
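The hypothesis is just a straight-line function of the input. The sketch below evaluates $h_\theta(x) = \theta_0 + \theta_1 x$ for a few house sizes. The parameter values are hypothetical and not fitted to any data. Choosing good values for them is the job of the learning algorithm.

```python
def h(theta0, theta1, x):
    """Univariate linear regression hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


# Hypothetical parameters: a base price of 50 plus 0.25 per square foot
# (prices in $1000s, sizes in feet^2). These values are made up, not learned.
theta0, theta1 = 50.0, 0.25
for size in (1000, 1500, 2000):
    print(size, "->", h(theta0, theta1, size))
```

Different choices of $\theta_0$ (the intercept) and $\theta_1$ (the slope) give different lines. That is why the next step is a way to measure how well a given pair fits the training set.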
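Notation
n: number of features
m: number of training examples
x: input variables / features
y: output (“target”) variable
(x, y): one training example
$(x^{(i)}, y^{(i)})$: the i-th training example (i-th row of the training set); the superscript (i) is an index, not an exponent
$x_j^{(i)}$: value of feature j in the i-th training example
h: the hypothesis, which maps from x to y
$\theta_i$: parameters of the model (for univariate linear regression, $\theta_0$ and $\theta_1$)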
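The notation maps directly onto a list of rows. The sketch below uses a hypothetical training set with n = 2 features per example. Univariate regression would have n = 1, but two features are used here so that $x_j^{(i)}$ has something to index. The helper names `example` and `feature` are illustrative. They keep the 1-based indices of the notes, while Python lists are 0-based.

```python
# Hypothetical training set with n = 2 features per example.
# Each row is (x^(i), y^(i)) with x^(i) = [x_1^(i), x_2^(i)] = [size in feet^2, bedrooms].
training_set = [
    ([1000, 2], 200.0),
    ([1500, 3], 290.0),
    ([800, 1], 150.0),
]

m = len(training_set)        # number of training examples
n = len(training_set[0][0])  # number of features


def example(i):
    """Return (x^(i), y^(i)); i is 1-based, as in the notation above."""
    return training_set[i - 1]


def feature(i, j):
    """Return x_j^(i), the value of feature j in the i-th training example (1-based)."""
    x, _ = example(i)
    return x[j - 1]


print(m, n)           # 3 2
print(example(2))     # ([1500, 3], 290.0)
print(feature(2, 1))  # 1500
```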
3.2.2-2.4 Cost function
J(θ_i)
Idea, choose θ_i so that h(