March 2015_lxqlxq21

[Original] Machine Learning (big data)

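Batch gradient: compute the gradient over all data examples. Stochastic gradient: compute the gradient on a single example. Small-group (mini-batch) gradient: in between the two; compute the gradient over 10-100 examples. When the data set is large, batch gradient is unsuitable because it is too slow to compute. Stochastic gradient converge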
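A minimal pure-Python sketch of the three variants, assuming a hypothetical one-feature linear regression h(x) = θ0 + θ1·x with squared-error cost. The only difference between batch, stochastic and small-group (mini-batch) gradient here is the `batch_size` argument; the function names, learning rate `alpha` and epoch count are illustrative choices, not taken from the post.

```python
import random


def batch_gradient(theta, batch):
    """Average gradient of the cost (1/2)*(h(x) - y)^2 over a batch,
    for the hypothetical model h(x) = theta[0] + theta[1] * x."""
    g0 = g1 = 0.0
    for x, y in batch:
        err = theta[0] + theta[1] * x - y
        g0 += err
        g1 += err * x
    return g0 / len(batch), g1 / len(batch)


def gradient_descent(data, batch_size, alpha=0.1, epochs=200, seed=0):
    """batch_size == len(data): batch gradient; batch_size == 1: stochastic
    gradient; anything in between (e.g. 10-100): small-group / mini-batch."""
    rng = random.Random(seed)
    data = list(data)
    theta = [0.0, 0.0]
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a new random order each epoch
        for start in range(0, len(data), batch_size):
            g0, g1 = batch_gradient(theta, data[start:start + batch_size])
            theta[0] -= alpha * g0
            theta[1] -= alpha * g1
    return theta


# Noise-free toy data on the line y = 1 + 2x.
data = [(i / 100, 1 + 2 * (i / 100)) for i in range(100)]
print(gradient_descent(data, batch_size=10))  # close to [1.0, 2.0]
```

In this toy run, `batch_size=len(data)` makes only one update per full pass over the data, so it needs many more epochs (about 5000 instead of 200) to get as close to θ = (1, 2) as the mini-batch run.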

2015-03-08 20:47:29, 725 views

[Original] Machine Learning: Gaussian Mixture Model

Data set split for fraud detection: 10000 negative examples (good parts) = 6000 training set + 2000 CV + 2000 test; 20 positive examples (faulty parts) = 0 training set + 10 CV + 10 test. *The threshold $\epsilon$ can be chosen using the CV set. Anomaly detection (unsupervised)
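A minimal sketch of this setup, assuming the common anomaly-detection model in which each feature is an independent Gaussian fitted only on normal (negative) examples, which is why the training set needs no positive examples. The threshold ε is then scanned over the labeled CV set and the value with the best F1 score is kept. The product-of-Gaussians model is a simplification and not necessarily the mixture model in the post title; all function names, the toy data, and the use of F1 as the CV criterion are assumptions.

```python
import math


def fit_gaussian(X):
    """Per-feature mean and variance, estimated from normal-only training examples."""
    m, n = len(X), len(X[0])
    mu = [sum(x[j] for x in X) / m for j in range(n)]
    var = [sum((x[j] - mu[j]) ** 2 for x in X) / m for j in range(n)]
    return mu, var


def density(x, mu, var):
    """p(x) = product over features of N(x_j; mu_j, var_j) (assumes independent features)."""
    p = 1.0
    for xj, mj, vj in zip(x, mu, var):
        p *= math.exp(-(xj - mj) ** 2 / (2 * vj)) / math.sqrt(2 * math.pi * vj)
    return p


def select_epsilon(p_cv, y_cv, steps=1000):
    """Scan thresholds between min and max p on the CV set; flag p < eps as an
    anomaly (y = 1) and keep the eps with the highest F1 score."""
    best_eps, best_f1 = 0.0, 0.0
    lo, hi = min(p_cv), max(p_cv)
    for i in range(1, steps + 1):
        eps = lo + (hi - lo) * i / steps
        pred = [p < eps for p in p_cv]
        tp = sum(1 for pr, y in zip(pred, y_cv) if pr and y == 1)
        fp = sum(1 for pr, y in zip(pred, y_cv) if pr and y == 0)
        fn = sum(1 for pr, y in zip(pred, y_cv) if not pr and y == 1)
        if tp == 0:
            continue  # precision/recall undefined or zero: skip this eps
        prec, rec = tp / (tp + fp), tp / (tp + fn)
        f1 = 2 * prec * rec / (prec + rec)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1


train = [[0.0], [1.0], [-1.0], [0.5], [-0.5]]  # normal examples only
cv_x = [[0.0], [0.3], [-0.2], [5.0], [-4.0]]
cv_y = [0, 0, 0, 1, 1]                           # 1 = anomaly
mu, var = fit_gaussian(train)
eps, best_f1 = select_epsilon([density(x, mu, var) for x in cv_x], cv_y)
print(eps, best_f1)
```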

2015-03-08 20:26:23, 690 views

[Original] Machine Learning PCA

$\Sigma=\frac{1}{m}\sum_{i=1}^{m}x^{(i)}(x^{(i)})^{T}$ [U,S,V]=svd(Sigma); Ureduce=U(:,1:k); z=Ureduce'*x; Parameter choice: 1. k: $\frac{\frac{1}{m}\sum_{i=1}^{m}\|x^{(i)}-x^{(i)}_{approx}\|^{2}}{\frac{1}{m}\sum_{i=1}^{m}\|x^{(i)}\|^{2}}\le 0.01$
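A dependency-free Python sketch of the PCA steps and the k criterion. Instead of calling svd, it uses a hypothetical helper that finds the top eigenvectors of Σ by power iteration with deflation; because Σ is symmetric positive semidefinite, these agree (up to sign) with the first k columns of U from svd(Sigma). The inputs are assumed to be mean-normalized already, and k is taken as the smallest value that meets the 0.01 bound (i.e. 99% of the variance retained).

```python
import math
import random


def covariance(X):
    """Sigma = (1/m) * sum_i x_i x_i^T; X is assumed to be mean-normalized."""
    m, n = len(X), len(X[0])
    return [[sum(x[a] * x[b] for x in X) / m for b in range(n)] for a in range(n)]


def top_eigenvectors(S, k, iters=500, seed=0):
    """Hypothetical stand-in for U(:,1:k): top-k eigenvectors of a symmetric PSD
    matrix via power iteration, deflating S after each vector is found."""
    n = len(S)
    S = [row[:] for row in S]
    rng = random.Random(seed)
    vectors = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
        for _ in range(iters):
            w = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:
                break
            v = [c / norm for c in w]
        lam = sum(v[i] * S[i][j] * v[j] for i in range(n) for j in range(n))
        vectors.append(v)
        S = [[S[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return vectors  # each entry is one column of Ureduce


def project(u_reduce, x):
    """z = Ureduce' * x"""
    return [sum(u[i] * x[i] for i in range(len(x))) for u in u_reduce]


def reconstruct(u_reduce, z):
    """x_approx = Ureduce * z"""
    n = len(u_reduce[0])
    return [sum(z[j] * u_reduce[j][i] for j in range(len(z))) for i in range(n)]


def error_ratio(X, u_reduce):
    """Average squared projection error / average squared norm (the 1/m factors cancel)."""
    num = sum(sum((a - b) ** 2 for a, b in zip(x, reconstruct(u_reduce, project(u_reduce, x))))
              for x in X)
    den = sum(sum(a * a for a in x) for x in X)
    return num / den


def choose_k(X, threshold=0.01):
    """Smallest k whose error ratio is <= threshold."""
    sigma = covariance(X)
    n = len(sigma)
    for k in range(1, n + 1):
        if error_ratio(X, top_eigenvectors(sigma, k)) <= threshold:
            return k
    return n


# Mean-normalized points lying almost on the line y = 2x.
X = [[t, 2 * t + e] for t, e in zip([-2.0, -1.0, 0.0, 1.0, 2.0], [0.01, -0.01, 0.0, 0.01, -0.01])]
print(choose_k(X))  # 1
```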

2015-03-08 11:03:29, 324 views

[Original] Logistic Regression vs. SVM

n = number of features, m = number of training examples. If n is large relative to m: use logistic regression or an SVM with a linear kernel. If n is small and m is intermediate: use an SVM with a Gaussian kernel. If n is small and m is large: Create mo

2015-03-07 18:33:35, 374 views

[Original] Machine Learning: SVM

Parameters: $C=\frac{1}{\lambda}$. Large C: lower bias, higher variance. Small C: higher bias, lower variance. $\sigma^{2}$ large: higher bias, lower variance; $\sigma^{2}$ small: lower bias, higher variance. *Before using the Gaussian kernel, don't forget
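To make the σ² behaviour concrete, here is a small sketch of the Gaussian kernel, assuming the usual form f = exp(-‖x - l‖² / (2σ²)) for an example x and a landmark l (the function name and the sample points are hypothetical). For the same pair of points, a larger σ² gives a similarity closer to 1, so the feature changes more smoothly with x: the higher-bias, lower-variance end described above.

```python
import math


def gaussian_kernel(x, l, sigma2):
    """Similarity f = exp(-||x - l||^2 / (2 * sigma^2)) between example x and landmark l."""
    dist2 = sum((a - b) ** 2 for a, b in zip(x, l))
    return math.exp(-dist2 / (2 * sigma2))


# Same pair of points (squared distance 1) under three values of sigma^2.
x, l = [1.0, 2.0], [2.0, 2.0]
for sigma2 in (0.1, 1.0, 10.0):
    print(sigma2, round(gaussian_kernel(x, l, sigma2), 4))
# 0.1 0.0067
# 1.0 0.6065
# 10.0 0.9512
```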

2015-03-07 18:12:22, 358 views

[Original] Machine Learning: Threshold Tradeoff

Precision = $\frac{\text{true positive}}{\text{true positive}+\text{false positive}}$ Recall = $\frac{\text{true positive}}{\text{true positive}+\text{false negative}}$ Fs
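A short sketch of the tradeoff, assuming a classifier that outputs a score (for example a logistic-regression probability) and predicts positive when the score is at least a threshold; the labels and scores are made-up toy values. Since the excerpt breaks off at the F score, the F1 formula 2PR/(P+R) used here is the standard one and an assumption about what the post meant.

```python
def precision_recall(y_true, scores, threshold):
    """Predict 1 when score >= threshold; y = 1 is the (rare) positive class."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
for threshold in (0.3, 0.5, 0.7):
    p, r = precision_recall(y_true, scores, threshold)
    print(threshold, round(p, 3), round(r, 3), round(f1(p, r), 3))
```

On this toy data, raising the threshold from 0.3 to 0.7 moves precision up (0.75 to 1.0) and recall down (1.0 to about 0.667); a single number such as F1 is what lets you compare the thresholds.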

2015-03-07 16:58:16, 281 views

[Original] Machine Learning: Choosing Parameters

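Parameters to determine: 1. The training set size m. 2. The regularization parameter $\lambda$. 3. The number of polynomial terms, i.e. the number n of parameters $\theta$. Procedure: 1. Split the data set into three parts: training set 60%, CV set 20%, test set 20%. 2. Use the training set to find the optimal $\theta$. 3. Use C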
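A minimal sketch of the 60/20/20 split and of using the CV set to pick λ, assuming a hypothetical one-parameter ridge model h(x) = θx whose regularized least-squares fit has the closed form θ = Σxy / (Σx² + λ). The post's step 3 is cut off, so choosing λ by the lowest CV error follows the usual practice rather than anything the excerpt states; the test set is left untouched for a final error estimate.

```python
import random


def split_data(data, seed=0):
    """Shuffle, then split 60% / 20% / 20% into training, CV and test sets."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n = len(data)
    n_train, n_cv = n * 60 // 100, n * 20 // 100  # integer arithmetic, no float rounding
    return data[:n_train], data[n_train:n_train + n_cv], data[n_train + n_cv:]


def fit_ridge_1d(train, lam):
    """Hypothetical model h(x) = theta * x: minimizing
    sum (theta*x - y)^2 + lam * theta^2 gives theta = sum(x*y) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return sxy / (sxx + lam)


def mse(theta, dataset):
    """Mean squared error of h(x) = theta * x on a data set."""
    return sum((theta * x - y) ** 2 for x, y in dataset) / len(dataset)


def select_lambda(train, cv, candidates):
    """Fit theta on the training set for each lambda; keep the lambda with the lowest CV error."""
    return min(candidates, key=lambda lam: mse(fit_ridge_1d(train, lam), cv))


data = [(x, 3 * x) for x in range(1, 101)]
train, cv, test = split_data(data)
best_lam = select_lambda(train, cv, [10.0, 0.0, 1.0])
print(len(train), len(cv), len(test), best_lam)
```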

2015-03-07 12:19:13, 299 views

lxqlxq21's Column