1.1.9 Orthogonal Matching Pursuit
OMP stands for Orthogonal Matching Pursuit. Matching Pursuit (MP) is a widely used algorithm in the field of sparse representation; OMP additionally orthogonalizes the full set of selected atoms at every step of the decomposition, which makes it converge faster than plain MP at the same level of accuracy.
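As a minimal sketch of how OMP is exposed in scikit-learn (the random dictionary, the 3-sparse coefficient vector, and the chosen atom indices below are illustrative assumptions, not from the original text):

import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.RandomState(0)
n_features, n_atoms = 50, 100
D = rng.randn(n_features, n_atoms)        # random over-complete dictionary
w_true = np.zeros(n_atoms)
w_true[[5, 23, 71]] = [1.5, -2.0, 0.8]    # 3-sparse coefficient vector (hypothetical)
y = D.dot(w_true)                         # noiseless observed signal

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3)  # select at most 3 atoms
omp.fit(D, y)
print(np.nonzero(omp.coef_)[0])           # should recover atoms 5, 23, 71

With a noiseless signal and a random Gaussian dictionary, OMP typically recovers the exact support of the sparse code.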
1.1.10 Bayesian Regression
1.1.10.1 Bayesian Ridge Regression
Example code for Bayesian Ridge Regression:
>>> from sklearn import linear_model
>>> X = [[0., 0.], [1., 1.], [2., 2.], [3., 3.]]
>>> Y = [0., 1., 2., 3.]
>>> clf = linear_model.BayesianRidge()
>>> clf.fit(X, Y)
BayesianRidge(alpha_1=1e-06, alpha_2=1e-06, compute_score=False, copy_X=True,
       fit_intercept=True, lambda_1=1e-06, lambda_2=1e-06, n_iter=300,
       normalize=False, tol=0.001, verbose=False)
>>> clf.predict([[1, 0.]])
array([ 0.50000013])
>>> clf.coef_
array([ 0.49999993,  0.49999993])
Note: besides Bayesian ridge regression, there are also Bayesian linear regression, Bayesian logistic regression, and others.
1.1.10.2 Automatic Relevance Determination
Commentary: ARD stands for Automatic Relevance Determination regression. It is closely related to Bayesian ridge regression, but instead of one shared precision for all weights it fits a separate precision per coefficient, which tends to prune away irrelevant features.
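A minimal sketch of ARD in scikit-learn, using the ARDRegression estimator on a small made-up dataset (the data and coefficients are illustrative assumptions):

import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
# only the first two features actually matter (hypothetical setup)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.randn(100)

clf = ARDRegression()
clf.fit(X, y)
print(np.round(clf.coef_, 3))  # weights of the 3 irrelevant features shrink toward zero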
1.1.11 Logistic regression
Logistic regression is a linear model, used for classification rather than regression. With an L2 regularization term, the logistic regression cost function to minimize is

$$\min_{w, c} \frac{1}{2} w^T w + C \sum_{i=1}^{n} \log\left(\exp\left(-y_i \left(X_i^T w + c\right)\right) + 1\right)$$

and with an L1 regularization term it becomes

$$\min_{w, c} \|w\|_1 + C \sum_{i=1}^{n} \log\left(\exp\left(-y_i \left(X_i^T w + c\right)\right) + 1\right)$$
Note: one-versus-rest means that during training, each class in turn is treated as the positive class while the samples of all remaining classes are grouped into a single negative class.
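As a small illustration of the one-versus-rest scheme (the iris data and wrapper below are an assumed setup, not from the original text), scikit-learn's OneVsRestClassifier fits one binary logistic regression per class:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
clf = OneVsRestClassifier(LogisticRegression(solver='liblinear'))
clf.fit(X, y)
print(len(clf.estimators_))  # 3: one binary classifier per class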
The solvers implemented in the LogisticRegression class include "liblinear", "newton-cg", "lbfgs", and "sag". An example follows:
Examples: L1 Penalty and Sparsity in Logistic Regression
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

digits = datasets.load_digits()
X, y = digits.data, digits.target
X = StandardScaler().fit_transform(X)

# classify small against large digits
y = (y > 4).astype(int)  # np.int is deprecated; use the built-in int

# Set regularization parameter
for i, C in enumerate((100, 1, 0.01)):
    # turn down tolerance for short training time;
    # the liblinear solver supports both the L1 and L2 penalties
    clf_l1_LR = LogisticRegression(C=C, penalty='l1', solver='liblinear', tol=0.01)
    clf_l2_LR = LogisticRegression(C=C, penalty='l2', solver='liblinear', tol=0.01)
    clf_l1_LR.fit(X, y)
    clf_l2_LR.fit(X, y)

    coef_l1_LR = clf_l1_LR.coef_.ravel()
    coef_l2_LR = clf_l2_LR.coef_.ravel()

    # coef_l1_LR contains zeros due to the
    # L1 sparsity inducing norm
    sparsity_l1_LR = np.mean(coef_l1_LR == 0) * 100
    sparsity_l2_LR = np.mean(coef_l2_LR == 0) * 100

    print("C=%.2f" % C)
    print("Sparsity with L1 penalty: %.2f%%" % sparsity_l1_LR)
    print("score with L1 penalty: %.4f" % clf_l1_LR.score(X, y))
    print("Sparsity with L2 penalty: %.2f%%" % sparsity_l2_LR)
    print("score with L2 penalty: %.4f" % clf_l2_LR.score(X, y))

    l1_plot = plt.subplot(3, 2, 2 * i + 1)
    l2_plot = plt.subplot(3, 2, 2 * (i + 1))
    if i == 0:
        l1_plot.set_title("L1 penalty")
        l2_plot.set_title("L2 penalty")

    l1_plot.imshow(np.abs(coef_l1_LR.reshape(8, 8)), interpolation='nearest',
                   cmap='binary', vmax=1, vmin=0)
    l2_plot.imshow(np.abs(coef_l2_LR.reshape(8, 8)), interpolation='nearest',
                   cmap='binary', vmax=1, vmin=0)
    plt.text(-8, 3, "C = %.2f" % C)

    l1_plot.set_xticks(())
    l1_plot.set_yticks(())
    l2_plot.set_xticks(())
    l2_plot.set_yticks(())

plt.show()
The output is as follows:
C=100.00
Sparsity with L1 penalty: 6.25%
score with L1 penalty: 0.9098
Sparsity with L2 penalty: 4.69%
score with L2 penalty: 0.9098
C=1.00
Sparsity with L1 penalty: 9.38%
score with L1 penalty: 0.9104
Sparsity with L2 penalty: 4.69%
score with L2 penalty: 0.9093
C=0.01
Sparsity with L1 penalty: 85.94%
score with L1 penalty: 0.8603
Sparsity with L2 penalty: 4.69%
score with L2 penalty: 0.8915
Commentary:
[1] fit_transform(X, y=None, **fit_params): fits the transformer to X (and optionally y) and returns the transformed version of X, combining fit and transform in a single call.
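For instance, with the StandardScaler used in the example above, fit_transform gives the same result as calling fit and then transform (a small illustrative check; the toy matrix is an assumption):

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[0., 10.], [1., 20.], [2., 30.]])
Xa = StandardScaler().fit_transform(X)      # fit and transform in one call
Xb = StandardScaler().fit(X).transform(X)   # same result in two steps
print(np.allclose(Xa, Xb))                  # True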
1.1.12 Stochastic Gradient Descent - SGD
Commentary: gradient descent methods include SGD, BGD, and MBGD, as listed below (see the sketch after this list):
[1] Stochastic gradient descent (SGD): performs one parameter update per training sample.
[2] Batch gradient descent (BGD): performs one parameter update per pass over the entire training set.
[3] Mini-batch gradient descent (MBGD): performs one parameter update per small subset (mini-batch) of samples.
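A minimal numpy sketch contrasting the three update schemes on least-squares linear regression (the step size, epoch count, and synthetic data are illustrative assumptions):

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
y = X.dot(np.array([1.0, -2.0, 0.5])) + 0.01 * rng.randn(200)

def grad(w, Xb, yb):
    # gradient of the mean squared error on the batch (Xb, yb)
    return 2.0 * Xb.T.dot(Xb.dot(w) - yb) / len(yb)

def descend(batches, lr=0.1, epochs=50):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for idx in batches():
            w -= lr * grad(w, X[idx], y[idx])
    return w

n = len(y)
w_bgd = descend(lambda: [np.arange(n)])                           # BGD: one update per full pass
w_sgd = descend(lambda: [[i] for i in rng.permutation(n)])        # SGD: one update per sample
w_mbgd = descend(lambda: np.array_split(rng.permutation(n), 10))  # MBGD: one update per mini-batch
print(w_bgd, w_sgd, w_mbgd)                                       # all approach [1.0, -2.0, 0.5]

scikit-learn's SGDClassifier and SGDRegressor implement the per-sample variant.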
1.1.13 Perceptron
Commentary: in machine learning, the perceptron is a linear model for binary classification and a supervised learning algorithm. It is a discriminative model and a building block of both neural networks and support vector machines.
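A minimal sketch of scikit-learn's Perceptron class on a toy binary problem (the synthetic dataset is an illustrative assumption):

from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron

# toy binary classification problem (hypothetical data)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = Perceptron(tol=1e-3, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy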