1 SVM
- Lagrangian Theory
Given an optimization problem with the objective function $f(w)$ and equality constraints $g_i(w)=0$:
$$L(w,\alpha)=f(w)-\sum_{i=1}^{n}\alpha_i\, g_i(w)$$

$$L(w,b,\alpha)=\frac{1}{2}\|w\|^2-\sum_{i=1}^{N}\alpha_i\left(y_i(w\cdot x_i+b)-1\right)$$
- Karush-Kuhn-Tucker (KKT) conditions
For a solution of a nonlinear minimization or maximization problem to be optimal, the following conditions must hold:
$$\frac{\partial L(w,\alpha)}{\partial w}=0,\qquad g_i(w^\ast)\geq 0,\qquad \alpha_i\geq 0,\qquad \alpha_i\, g_i(w^\ast)=0$$
- Support Vector
Points not on the margin boundary do not contribute to the solution, so their $\alpha_i$ will be 0. Substituting the KKT conditions back into the Lagrangian (eliminating $w$ and $b$) gives the dual form:
$$L(w,b,\alpha)=\sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i,j=1}^{n}\alpha_i\alpha_j\, y_i y_j\, x_i\cdot x_j$$
- Kernel Trick
- Project the data into a much higher-dimensional space so that a separating hyperplane can be found, while eliminating the computationally expensive operations in that higher dimension: the kernel computes the inner product of the projected points directly from the original features.
- For example, a degree-2 polynomial kernel on two-dimensional inputs is equivalent to projecting into a six-dimensional feature space (see the sketch below).
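A minimal sketch of the kernel trick under these assumptions: 2-D inputs and a degree-2 polynomial kernel $K(x,z)=(1+x\cdot z)^2$; the helper names `phi` and `poly_kernel` are illustrative, not part of the notes above.

```python
import numpy as np

def phi(x):
    """Explicit 6-dimensional feature map for the degree-2 polynomial kernel."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1,
                     np.sqrt(2) * x2,
                     x1 ** 2,
                     x2 ** 2,
                     np.sqrt(2) * x1 * x2])

def poly_kernel(x, z):
    """Kernel value computed directly in the original 2-D space."""
    return (1.0 + np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Both quantities agree: the kernel gives the high-dimensional inner product
# without ever constructing the 6-dimensional vectors.
print(np.dot(phi(x), phi(z)))   # 4.0
print(poly_kernel(x, z))        # 4.0
```

In practice a library SVM such as scikit-learn's `SVC(kernel="poly", degree=2)` applies the kernel internally, and only the support vectors, the points with nonzero $\alpha_i$, are kept in the fitted model.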
2 Random forest
- Bagging (Bootstrap Aggregation)
- The training datasets are different while the same algorithm is used
- The random forest takes advantage of this by allowing each individual tree to randomly sample from the dataset with replacement, resulting in different trees. This process is known as bagging.
- Random Forest
- The training dataset is the same while the algorithms are different
- Each tree in a random forest can pick only from a random subset of features. This forces even more variation amongst the trees in the model and ultimately results in lower correlation across trees and more diversification.
- The basic theory of random forests
- The reason for this wonderful effect is that the trees protect each other from their individual errors
- Two prerequisites: there must be actual signal in the features, and the predictions made by the individual trees must have low correlation with each other (see the sketch after this list)
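A minimal sketch of the two sources of randomness described above, assuming scikit-learn's `RandomForestClassifier` and a toy dataset; the parameter values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees in the ensemble
    bootstrap=True,       # bagging: each tree sees a bootstrap sample of the data
    max_features="sqrt",  # each split considers only a random subset of features
    random_state=0,
)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))
```

Bootstrap sampling and per-split feature subsampling are what keep the trees weakly correlated, which is the second prerequisite listed above.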
3 Tips
- Conduct validation: hold out part of the training data to evaluate the model
- Conduct cross-validation: useful when the training set is small
- Conduct leave-one-out: train on all the data except one sample, test on that sample, and repeat the process for every sample (see the sketch below)
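A minimal sketch of these validation strategies, assuming scikit-learn and its built-in iris dataset; the model choice is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, LeaveOneOut
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="rbf")

# 5-fold cross-validation: rotate which fold is held out for evaluation.
print(cross_val_score(clf, X, y, cv=5).mean())

# Leave-one-out: train on all samples but one, test on the held-out sample,
# and repeat for every sample.
print(cross_val_score(clf, X, y, cv=LeaveOneOut()).mean())
```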