Notes on Machine Learning
Ⅰ.Main Steps
Step 1. Model
Create some functions and put them in a function set. For example, the linear model y = b + w · x_cp gives one function for every choice of parameters, where:
b: bias
w: weight
y: predicted output for the current input x_cp
So, now there are many functions in our function set, we'll test them later to find out which one is the best.
Step 2. Goodness of Function
Collect training data.
Test the functions created previously with a Loss Function, e.g. L(f) = Σₙ (ŷⁿ − f(xⁿ))².
ŷⁿ: actual value of the nth output.
Estimation error: the smaller, the better. A minimal sketch of the model and loss follows.
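As a concrete illustration, here is a minimal Python sketch of Steps 1 and 2, assuming the 1-D linear model y = b + w · x_cp above and the squared-error loss; the training data values are hypothetical.

```python
import numpy as np

x_train = np.array([10.0, 20.0, 30.0, 40.0])   # hypothetical inputs (x_cp)
y_train = np.array([15.0, 25.0, 33.0, 46.0])   # hypothetical actual values (y hat)

def predict(w, b, x):
    """One member of the function set: y = b + w * x."""
    return b + w * x

def loss(w, b, x, y):
    """Sum of squared estimation errors over the training data."""
    return np.sum((y - predict(w, b, x)) ** 2)

print(loss(1.0, 0.0, x_train, y_train))  # evaluate one candidate function
```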
Step 3. Best Function (Gradient Descent)
If the Loss Function is differentiable, we can use Gradient Descent.
How does it work?
Consider a Loss Function L(w) with one parameter w:
1. (Randomly) pick an initial value w⁰.
2. Compute the derivative dL/dw at w = w⁰:
(1) if it is negative, increase w;
(2) if it is positive, decrease w.
How much should we increase it (the step size)? It depends on the current derivative value and on η (positive correlation). η is called the "learning rate": a value set in advance that determines how far each update moves the parameter.
3. Compute the update w¹ = w⁰ − η · (dL/dw)|w=w⁰.
4. Iterate many times... until we find the global optimum (there is no local optimum in Linear Regression, since the loss function is convex, so don't worry about that), and we obtain a preliminary function.
5. Compute the average error on the training data.
6. Collect another n data points as testing data and compute the average error on the testing data.
7. Observe the results and test the other functions in our function set (maybe a more complicated model?). Usually a more complex model yields lower error on the training data, but its error on the testing data may be higher (overfitting), so we should select a suitable model.
8. Collect more testing data; if our function doesn't fit it well, go back to Step 1 and redesign it.
9. Go back to Step 2 and add regularization (to prevent overfitting), e.g. L = Σₙ (ŷⁿ − (b + Σᵢ wᵢxᵢ))² + λ Σᵢ (wᵢ)².
λ: a value you adjust yourself; the larger λ is, the less weight the training error gets.
Functions with smaller wᵢ are better (their graphs are smoother); we prefer smooth functions, but not too smooth. A gradient-descent sketch of this step follows.
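Here is a minimal sketch of the whole of Step 3 for the 1-D linear model, including the L2 regularizer from point 9; the data, learning rate η, λ, and iteration count are illustrative assumptions, not values from the notes.

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0])   # hypothetical training inputs
y = np.array([15.0, 25.0, 33.0, 46.0])   # hypothetical targets

w, b = 0.0, 0.0        # step 1: pick initial values
eta = 1e-4             # learning rate (eta), chosen small enough to converge
lam = 0.1              # regularization strength (lambda), illustrative

for _ in range(10000):                 # step 4: many iterations
    err = y - (b + w * x)              # estimation error per example
    grad_w = -2 * np.sum(err * x) + 2 * lam * w   # dL/dw, including the regularizer
    grad_b = -2 * np.sum(err)                     # dL/db
    w -= eta * grad_w                  # steps 2-3: move against the gradient
    b -= eta * grad_b

print(w, b)   # parameters of the preliminary function
```

Note that the bias b is left out of the regularizer: we want a smooth function, i.e. small wᵢ, and penalizing b would not make the function smoother.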
Ⅱ.Analysis of Error
1. Sources of error
Error is due to "bias" and "variance".
A simple model has a Large Bias but a Small Variance; a complex model has a Small Bias but a Large Variance.
Overfitting: error mainly from variance;
Underfitting: error mainly from bias.
2. Diagnosis
If our model cannot even fit the training examples, then we have large bias (underfitting);
If it can fit the training data but has large error on the testing data, then we probably have large variance (overfitting). A small decision-rule sketch follows.
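The diagnosis can be read as a tiny decision rule. The sketch below assumes the average errors have already been measured, and the tolerance threshold is an arbitrary illustration.

```python
def diagnose(train_error, test_error, tolerance=1.0):
    """Rough bias/variance diagnosis from average train/test errors."""
    if train_error > tolerance:
        return "large bias (underfitting): redesign the model"
    if test_error > train_error + tolerance:
        return "large variance (overfitting): more data or regularization"
    return "bias and variance look balanced"

print(diagnose(train_error=5.0, test_error=6.0))
```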
3. Solution
For large bias, we should redesign our model:
· Add more features as input
· A more complex model
For large variance:
· Collect more data: very effective, but not always practical; alternatively we can generate synthetic data, e.g. converting female voices to male voices.
· Regularization: makes the function smoother, but may hurt bias.
4. Model Selection
· There is usually a trade-off between bias and variance.
· Select a model that balances the two kinds of error to minimize total error.
Use Cross Validation or N-fold Cross Validation, as sketched below.
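A minimal sketch of N-fold cross validation; `fit` and `error` here are hypothetical stand-ins for training a model (Steps 1-3) and measuring its average error.

```python
import numpy as np

def n_fold_cv(x, y, fit, error, n_folds=3):
    """Average validation error of a model over n_folds random splits."""
    idx = np.random.permutation(len(x))
    folds = np.array_split(idx, n_folds)
    errors = []
    for i in range(n_folds):
        val = folds[i]                                   # held-out fold
        trn = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        model = fit(x[trn], y[trn])                      # train on the rest
        errors.append(error(model, x[val], y[val]))      # validate on the fold
    return np.mean(errors)   # pick the model that minimizes this
```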
Ⅲ.Gradient Descent
Ⅳ.Classification
If the observed data are truly sampled from the Generative Model, then fitting the parameters of the generative model to maximize the data likelihood is a common method. However, since most statistical models are only approximations to the true distribution, if the model's application is to infer about a subset of variables conditional on known values of others, then it can be argued that the approximation makes more assumptions than are necessary to solve the problem at hand. In such cases, it can be more accurate to model the conditional density functions directly using a Discriminative Model, although application-specific details will ultimately dictate which approach is most suitable in any particular case.
1.Generative Model
A generative model is a model we assume by observing the training data. It also has three steps, similar to the ML framework above.
Generative models are used in machine learning for either modeling data directly (i.e., modeling observations drawn from a probability density function), or as an intermediate step to forming a conditional probability density function. Generative models are typically probabilistic, specifying a joint probability distribution over observation and target (label) values. A conditional distribution can be formed from a generative model through Bayes' rule.
Step 1. Function Set(Model)
Use Bayes' rule to compute the posterior probability: P(C₁|x) = P(x|C₁)P(C₁) / (P(x|C₁)P(C₁) + P(x|C₂)P(C₂)).
If it > 0.5, output: Class 1;
Otherwise, output: Class 2.
There are many different kinds of probability distribution in the function set, such as the Gaussian Distribution, the Bernoulli Distribution, etc. We choose one ourselves; for example, for binary features we can assume they come from a Bernoulli Distribution.
Step 2. Goodness of a Function
Take the Gaussian Distribution as an example. In this distribution, what we must estimate are the mean μ and the covariance matrix Σ.
Here we use Maximum Likelihood.
(μ*, Σ*) is the pair that maximizes the likelihood L(μ, Σ) over all (μ, Σ).
And it is easy to find: take the partial derivatives of L(μ, Σ) with respect to (μ, Σ) and solve for the (μ, Σ) where they are zero, which gives μ* = (1/N) Σₙ xⁿ and Σ* = (1/N) Σₙ (xⁿ − μ*)(xⁿ − μ*)ᵀ.
If we assume all the dimensions are independent, then we are using a Naive Bayes Classifier.
Step 3. Best Function
In the Generative Model, we estimate N₁, N₂, ..., μ¹, μ², ..., Σ.
With a shared Σ, the posterior turns out to be P(C₁|x) = σ(w · x + b), so these estimates give us w and b; see the sketch below.
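A minimal sketch of Steps 2-3 of the generative model for two classes, assuming Gaussian class-conditionals with a shared covariance matrix; the sample data are hypothetical. Under that assumption, the closed forms w = Σ⁻¹(μ¹ − μ²) and b = −½ μ¹ᵀΣ⁻¹μ¹ + ½ μ²ᵀΣ⁻¹μ² + ln(N₁/N₂) follow from Bayes' rule.

```python
import numpy as np

x1 = np.random.randn(100, 2) + np.array([2.0, 2.0])   # hypothetical Class 1 data
x2 = np.random.randn(80, 2)                           # hypothetical Class 2 data
n1, n2 = len(x1), len(x2)

# Step 2: maximum likelihood estimates of the means and shared covariance
mu1, mu2 = x1.mean(axis=0), x2.mean(axis=0)
cov = (n1 * np.cov(x1, rowvar=False, bias=True)
       + n2 * np.cov(x2, rowvar=False, bias=True)) / (n1 + n2)

# Step 3: closed-form w and b for the posterior P(C1|x) = sigma(w.x + b)
inv = np.linalg.inv(cov)
w = inv @ (mu1 - mu2)
b = -0.5 * mu1 @ inv @ mu1 + 0.5 * mu2 @ inv @ mu2 + np.log(n1 / n2)

def posterior(x):
    """P(C1|x); output > 0.5 means Class 1, otherwise Class 2."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

print(posterior(np.array([1.5, 1.5])))
```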
Here you may wonder: if our purpose is to estimate w and b, why not do it directly?
Actually, we have such a way: the Discriminative Model.
2.Discriminative Model
Discriminative models, also called conditional models, are a class of models used in machine learning for modeling the dependence of unobserved (target) variables y on observed variables x. Within a probabilistic framework, this is done by modeling the conditional probability distribution P(y|x), which can be used for predicting y from x.
In the Discriminative Model, we don't have to assume what the model will be like; instead, we directly estimate w and b.
Step 1. Function Set(Model)
Function set: f(x) = σ(Σᵢ wᵢxᵢ + b), one function for each choice of (w, b).
Step 2. Goodness of a Function (Loss Function)
H(p, q) = −Σₓ p(x) ln q(x) is the cross entropy between distributions p and q; it describes how close the two distributions are. With 0/1 targets, p is deterministic, so H(p, q) reaches its minimum of zero exactly when q matches p.
Compared with Linear Regression: the output σ(w · x + b) lies between 0 and 1 instead of taking any real value, and the loss is cross entropy instead of squared error, yet the gradient-descent update ends up having the same form.
Step 3. Best Function
We also use Gradient Descent.
For Logistic Regression, the value of the target ŷⁿ is 0 or 1, while the value of the output is between 0 and 1. A minimal training sketch follows.
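A minimal sketch of the discriminative model (logistic regression) trained by gradient descent on the cross-entropy loss; the data and learning rate are hypothetical illustrations.

```python
import numpy as np

X = np.vstack([np.random.randn(100, 2) + 2.0,   # hypothetical Class 1 examples
               np.random.randn(100, 2)])        # hypothetical Class 2 examples
t = np.concatenate([np.ones(100), np.zeros(100)])  # targets y hat: 1 or 0

w, b = np.zeros(2), 0.0
eta = 0.1   # learning rate, illustrative

for _ in range(1000):
    f = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid output, always in (0, 1)
    w -= eta * X.T @ (f - t) / len(t)        # gradient of cross entropy w.r.t. w
    b -= eta * np.sum(f - t) / len(t)        # gradient w.r.t. b

print(((f > 0.5) == t).mean())   # training accuracy of the learned function
```

Note how the update w −= η Σₙ (f(xⁿ) − ŷⁿ) xⁿ has the same form as linear regression's, as mentioned above; only the output f is passed through a sigmoid.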
3.Generative vs. Discriminative
In probability and statistics, a generative model is a model for generating all values for a phenomenon, both those that can be observed in the world and "target" variables that can only be computed from those observed. By contrast, discriminative models provide a model only for the target variable(s), generating them by analyzing the observed variables. In simple terms, discriminative models infer outputs based on inputs, while generative models generate both inputs and outputs, typically given some hidden parameters.
Usually, the w and b generated from these two different models are different.
Benefit of Generative Models:
· With the assumption of a probability distribution, less training data is needed;
· With the assumption of a probability distribution, the model is more robust to noise;
· Priors and class-dependent probabilities can be estimated from different sources. For example, in speech recognition the Generative Model is the main method and the Discriminative Model is only one part of it, because we still need a prior probability (the probability of a sentence being spoken), and estimating that doesn't need any audio data.
Benefit of Discriminative Models:
Discriminative models, as opposed to generative models, do not allow one to generate samples from the joint distribution of observed and target variables. However, for tasks such as classification and regression that do not require the joint distribution, discriminative models can yield superior performance (in part because they have fewer variables to compute). On the other hand, generative models are typically more flexible than discriminative models in expressing dependencies in complex learning tasks. In addition, most discriminative models are inherently supervised and cannot easily support unsupervised learning. Application-specific details ultimately dictate the suitability of selecting a discriminative versus generative model.
Keep studying, keep stepping forward.