We derive the standard linear model here from a Bayesian point of view (MAP). The training set is $\mathcal{D} = (X, y) = \{(x_i, y_i) \mid i = 1, \dots, n\}$, where $x$ denotes an input vector of dimension $D$ and $y$ denotes the corresponding scalar target.
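The likelihood function is $p(y \mid X, w) = \mathcal{N}(X^T w, \sigma_n^2 I)$, where $X$ is the $D \times n$ matrix whose columns are the inputs $x_i$ and $\sigma_n^2$ is the noise variance.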
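We further specify the prior over the parameter $w$, set as a zero-mean Gaussian with covariance matrix $\Sigma_p$, i.e., $w \sim \mathcal{N}(0, \Sigma_p)$. The posterior of the Bayesian linear model is then $p(w \mid X, y) = \mathcal{N}(\bar{w}, A^{-1})$ with $A = \sigma_n^{-2} X X^T + \Sigma_p^{-1}$ and $\bar{w} = \sigma_n^{-2} (\sigma_n^{-2} X X^T + \Sigma_p^{-1})^{-1} X y$; because the posterior is Gaussian, its mean $\bar{w}$ is also the MAP estimate.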
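To make predictions at a test input $x = x_*$, we average $f_* = x_*^T w$ over the posterior of $w$. In fact, $p(f_* \mid x_*, X, y) = \mathcal{N}\left(\frac{1}{\sigma_n^2} x_*^T (\sigma_n^{-2} X X^T + \Sigma_p^{-1})^{-1} X y,\; x_*^T (\sigma_n^{-2} X X^T + \Sigma_p^{-1})^{-1} x_*\right)$.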
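The posterior mean and the predictive distribution can be checked numerically. The following is a minimal NumPy sketch, assuming $X$ is stored as a $D \times n$ array with one input per column (as the $XX^T$ term suggests). The function names `posterior` and `predict` and the synthetic data are illustrative assumptions, not part of the derivation.

```python
import numpy as np

def posterior(X, y, sigma_n, Sigma_p):
    """Posterior mean and covariance of w; X has shape (D, n), one input per column."""
    # A = sigma_n^-2 X X^T + Sigma_p^-1
    A = X @ X.T / sigma_n**2 + np.linalg.inv(Sigma_p)
    A_inv = np.linalg.inv(A)
    # Posterior mean, which is also the MAP estimate
    w_bar = A_inv @ X @ y / sigma_n**2
    return w_bar, A_inv

def predict(x_star, w_bar, A_inv):
    """Mean and variance of the noise-free prediction f_* at test input x_star."""
    return x_star @ w_bar, x_star @ A_inv @ x_star

# Synthetic data, assumed purely for illustration: D = 3, n = 200
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(3, 200))
sigma_n = 0.1
y = X.T @ w_true + sigma_n * rng.normal(size=200)

w_bar, A_inv = posterior(X, y, sigma_n, Sigma_p=np.eye(3))
mean, var = predict(np.array([1.0, 1.0, 1.0]), w_bar, A_inv)
print(w_bar, mean, var)
```

With many observations and small noise, `w_bar` should land close to `w_true`, and the predictive variance shrinks as more data arrive.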
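In a non-Bayesian setting, the negative log prior is usually thought of as a penalty term $\frac{1}{2} w^T \Sigma_p^{-1} w$ added to the loss; when $\Sigma_p$ is a multiple of the identity, this is ridge regression (L2 regularization).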
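To make the correspondence concrete, the sketch below assumes an isotropic prior $\Sigma_p = \tau^2 I$ (the symbol $\tau$ and the helper names are assumptions introduced here). Minimizing $\frac{1}{2\sigma_n^2}\|y - X^T w\|^2 + \frac{1}{2} w^T \Sigma_p^{-1} w$ then reduces to ridge regression with penalty weight $\lambda = \sigma_n^2 / \tau^2$, so its closed-form solution should agree with the MAP estimate up to floating-point error.

```python
import numpy as np

def map_estimate(X, y, sigma_n, Sigma_p):
    """MAP estimate: solve (sigma_n^-2 X X^T + Sigma_p^-1) w = sigma_n^-2 X y."""
    A = X @ X.T / sigma_n**2 + np.linalg.inv(Sigma_p)
    return np.linalg.solve(A, X @ y) / sigma_n**2

def ridge_estimate(X, y, lam):
    """Ridge solution: minimizer of ||y - X^T w||^2 + lam * ||w||^2."""
    D = X.shape[0]
    return np.linalg.solve(X @ X.T + lam * np.eye(D), X @ y)

# Random data, assumed purely for illustration: D = 4, n = 50
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 50))
y = rng.normal(size=50)
sigma_n, tau = 0.5, 2.0

w_map = map_estimate(X, y, sigma_n, tau**2 * np.eye(4))
w_ridge = ridge_estimate(X, y, lam=sigma_n**2 / tau**2)
print(np.allclose(w_map, w_ridge))
```

A general (non-isotropic) $\Sigma_p$ corresponds to a weighted L2 penalty rather than plain ridge regression.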