types of machine learning:
- predictive or supervised learning
- descriptive or unsupervised learning
- reinforcement learning
supervised learning:
classification:
inputs: x
outputs: y
where $y \in \{1, \dots, C\}$, with C being the number of classes (C = 2: binary classification; C > 2: multiclass classification)
goal: assume $y = f(\mathbf{x})$ for some unknown function f; learn an estimate $\hat{f}$ of f from the labeled training set, then make predictions using $\hat{y} = \hat{f}(\mathbf{x})$ (the hat symbol denotes an estimate).
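to make "learn $\hat{f}$, then predict $\hat{y} = \hat{f}(\mathbf{x})$" concrete, here is a minimal sketch that uses a nearest-centroid rule as $\hat{f}$. the nearest-centroid rule, the function name `fit_nearest_centroid`, and the toy data are illustrative assumptions, not something these notes prescribe.

```python
# a minimal sketch of classification as function approximation:
# learn f_hat from labeled pairs (x_i, y_i) with y_i in {1, ..., C},
# then predict y_hat = f_hat(x).
# the nearest-centroid rule is an illustrative choice (hypothetical helper name).
from math import dist


def fit_nearest_centroid(X, y):
    """Return f_hat, a function mapping an input vector to a class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    # one centroid (mean input vector) per class
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def f_hat(x):
        # predict the class whose centroid is closest in Euclidean distance
        return min(centroids, key=lambda c: dist(x, centroids[c]))

    return f_hat


# made-up training set with C = 3 classes
X_train = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9], [9.0, 1.0], [8.8, 1.3]]
y_train = [1, 1, 2, 2, 3, 3]

f_hat = fit_nearest_centroid(X_train, y_train)
print(f_hat([1.1, 0.9]))  # 1
print(f_hat([8.5, 1.0]))  # 3
```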
real-world applications:
- document classification and email spam filtering
- classifying flowers
- image classification and handwriting recognition
- face detection and recognition
regression:
regression is just like classification except that the response variable is continuous (e.g. $y \in \mathbb{R}$).
unsupervised learning:
in unsupervised learning, only the inputs are given, without any outputs; the goal is to find interesting structure in the data (also called knowledge discovery).
applications:
- discovering clusters
- discovering latent factors
- discovering graph structure
- image inpainting
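"discovering clusters" can be illustrated with k-means, one common clustering algorithm (chosen here for illustration; the notes do not name a specific method). note that only the inputs are used, never any labels. this is a minimal sketch assuming Euclidean distance, a fixed number of clusters k, and the simplistic initialisation "first k points"; the data are made up.

```python
# a minimal k-means sketch: group unlabeled points into k clusters.
# assumptions: Euclidean distance, fixed k, centres initialised with the first k points.
from math import dist


def kmeans(points, k, iters=10):
    centres = [list(p) for p in points[:k]]
    for _ in range(iters):
        # assignment step: each point joins the cluster of its nearest centre
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centres[i]))
            clusters[nearest].append(p)
        # update step: move each centre to the mean of its assigned points
        for i, members in enumerate(clusters):
            if members:
                centres[i] = [sum(coord) / len(members) for coord in zip(*members)]
    labels = [min(range(k), key=lambda i: dist(p, centres[i])) for p in points]
    return centres, labels


# unlabeled inputs only: no y is given
points = [[0.0, 0.0], [10.0, 10.0], [0.5, 0.2], [9.5, 10.2], [0.1, 0.4], [10.3, 9.8]]
centres, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

the cluster indices themselves are arbitrary; what the algorithm discovers is which points belong together.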
basic concepts in ML:
- parametric vs non-parametric models: a model with a fixed number of parameters is parametric; a model whose number of parameters grows with the amount of training data is non-parametric.
parametric models: faster to use, but they make stronger assumptions about the nature of the data distribution.
non-parametric models: more flexible, but computationally more expensive, especially for large datasets.
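- linear regression: the response is modelled as a linear function of the inputs plus Gaussian noise, $y(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \sigma^2)$, or equivalently $p(y \mid \mathbf{x}, \boldsymbol{\theta}) = \mathcal{N}(y \mid \mathbf{w}^T\mathbf{x}, \sigma^2)$.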
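- logistic regression: this generalizes linear regression to the (binary) classification setting by making two changes. first, we replace the Gaussian distribution for y with a Bernoulli distribution, which is more suitable when the response is binary, $y \in \{0, 1\}$:
$p(y \mid \mathbf{x}, \mathbf{w}) = \mathrm{Ber}(y \mid \mu(\mathbf{x}))$, where $\mu(\mathbf{x}) = \mathbb{E}[y \mid \mathbf{x}] = p(y = 1 \mid \mathbf{x})$.
second, we still compute a linear combination of the inputs, but then pass it through a nonlinear function that ensures $0 \le \mu(\mathbf{x}) \le 1$, by defining $\mu(\mathbf{x}) = \mathrm{sigm}(\mathbf{w}^T\mathbf{x})$, where $\mathrm{sigm}(\eta) = \frac{1}{1 + e^{-\eta}}$ is the sigmoid (logistic) function.
putting both changes together gives the logistic regression model (note: despite its name, it is a classification method, not a regression method):
$p(y \mid \mathbf{x}, \mathbf{w}) = \mathrm{Ber}(y \mid \mathrm{sigm}(\mathbf{w}^T\mathbf{x}))$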
- overfitting: when fitting highly flexible models, avoid trying to model every minor variation in the input, since such variation is more likely to be noise than true signal.
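the parametric vs non-parametric distinction can be illustrated with K-nearest neighbours (KNN), a standard non-parametric classifier picked here for illustration. "fitting" just stores the whole training set, so the model's state grows with the number of training points, whereas a parametric model such as linear regression keeps a fixed-size weight vector. this is a minimal sketch assuming Euclidean distance and majority voting; the class name `KNNClassifier` is hypothetical.

```python
# a minimal KNN sketch: a non-parametric model whose stored state grows with N.
# assumptions: Euclidean distance, majority vote, hypothetical class name.
from collections import Counter
from math import dist


class KNNClassifier:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # no parameters are estimated: the training data itself is the model
        self.X = [list(x) for x in X]
        self.y = list(y)
        return self

    def predict(self, x):
        # take the k training points closest to x and return their majority label
        neighbours = sorted(range(len(self.X)), key=lambda i: dist(x, self.X[i]))[: self.k]
        votes = Counter(self.y[i] for i in neighbours)
        return votes.most_common(1)[0][0]


# made-up data: two well-separated groups
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]

knn = KNNClassifier(k=3).fit(X_train, y_train)
print(knn.predict([0.5, 0.5]))  # 0
print(knn.predict([5.5, 5.5]))  # 1
```

every prediction scans all stored points, which is where the extra computation of non-parametric models comes from.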
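the linear regression model $y(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + \epsilon$ with Gaussian noise can be fitted by least squares, which gives the maximum-likelihood estimate of $\mathbf{w}$ under that noise model. the sketch below assumes NumPy, synthetic data generated from made-up true weights, and a bias term absorbed into $\mathbf{w}$ by prepending a column of ones to the inputs.

```python
# a minimal linear-regression sketch on synthetic data (made-up true weights).
import numpy as np

rng = np.random.default_rng(0)

N, D = 200, 2
X = rng.normal(size=(N, D))
X1 = np.hstack([np.ones((N, 1)), X])  # prepend ones so w_0 acts as the bias
w_true = np.array([1.0, 2.0, -3.0])
y = X1 @ w_true + rng.normal(scale=0.1, size=N)  # eps ~ N(0, 0.1^2)

# under Gaussian noise, the maximum-likelihood w is the least-squares solution
w_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(np.round(w_hat, 2))
```

with 200 points and small noise, `w_hat` should land close to `w_true`.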
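the logistic regression model $p(y \mid \mathbf{x}, \mathbf{w}) = \mathrm{Ber}(y \mid \mathrm{sigm}(\mathbf{w}^T\mathbf{x}))$ can be fitted by maximising the Bernoulli log-likelihood. the sketch below uses plain gradient ascent (one simple option among several), a made-up 1D dataset with a bias column, and the common decision rule "predict 1 if $p(y = 1 \mid \mathbf{x}) > 0.5$". the learning rate, step count, and helper names are assumptions.

```python
# a minimal logistic-regression sketch: gradient ascent on the Bernoulli log-likelihood.
# assumptions: made-up data, fixed learning rate and step count, 0.5 threshold.
import numpy as np


def sigm(eta):
    # sigmoid: maps any real eta into (0, 1)
    return 1.0 / (1.0 + np.exp(-eta))


def fit_logistic(X, y, lr=0.1, steps=2000):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        mu = sigm(X @ w)  # mu(x_i) = p(y_i = 1 | x_i, w)
        grad = X.T @ (y - mu)  # gradient of sum_i log Ber(y_i | mu(x_i))
        w += lr * grad / len(y)
    return w


def predict(x_new, w, threshold=0.5):
    # returns (predicted class, p(y = 1 | x_new))
    p = sigm(np.array([1.0, x_new]) @ w)
    return int(p > threshold), p


# made-up, not linearly separable 1D data
x = np.array([-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])
X = np.column_stack([np.ones_like(x), x])  # bias column, so w = (w_0, w_1)

w = fit_logistic(X, y)
print(predict(2.5, w))
print(predict(-2.5, w))
```

the output is a probability in (0, 1), which is why the Bernoulli plus sigmoid combination suits binary labels better than a Gaussian.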
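overfitting can be seen by fitting polynomials of increasing degree to a few noisy points (polynomial curve fitting is an illustrative choice; the signal, noise level, and degrees are assumptions). a degree-9 polynomial through 10 points can pass through every noisy sample, driving training error to essentially zero by modelling the noise; comparing against the noise-free signal on a dense grid typically shows it generalizing worse than a lower degree.

```python
# a minimal overfitting sketch with NumPy polynomial fits on synthetic data.
import numpy as np

rng = np.random.default_rng(1)

# 10 noisy samples of a smooth signal
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=x_train.size)

# dense grid of the noise-free signal, used only for evaluation
x_grid = np.linspace(0, 1, 200)
y_grid = np.sin(2 * np.pi * x_grid)


def fit_and_score(degree):
    # least-squares polynomial fit; returns (training MSE, MSE against the true signal)
    coeffs = np.polyfit(x_train, y_train, degree)
    train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    heldout = np.mean((np.polyval(coeffs, x_grid) - y_grid) ** 2)
    return train, heldout


for d in (1, 3, 9):
    print(d, fit_and_score(d))
```

training error alone can only decrease as the degree grows, so it cannot detect overfitting; the held-out comparison is what reveals it.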