1. Optimization Objective
- logistic regression
let
- support vector machine
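
For reference, the two objectives are usually written as below. This is a sketch in the standard course notation; the symbols $h_\theta$, $\mathrm{cost}_1$, $\mathrm{cost}_0$, $\lambda$ and $C$ are assumed here, since the original formulas did not survive in these notes.

```latex
% Regularized logistic regression, with h_\theta(x) = 1 / (1 + e^{-\theta^T x})
\min_\theta \; \frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \big( -\log h_\theta(x^{(i)}) \big)
  + (1 - y^{(i)}) \big( -\log (1 - h_\theta(x^{(i)})) \big) \Big]
  + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2

% Support vector machine
\min_\theta \; C \sum_{i=1}^{m} \Big[ y^{(i)} \, \mathrm{cost}_1(\theta^T x^{(i)})
  + (1 - y^{(i)}) \, \mathrm{cost}_0(\theta^T x^{(i)}) \Big]
  + \frac{1}{2} \sum_{j=1}^{n} \theta_j^2
```

Here $\mathrm{cost}_1(z)$ is zero for $z \ge 1$ and $\mathrm{cost}_0(z)$ is zero for $z \le -1$. $C$ plays a role similar to $1/\lambda$: it weights the data-fit term instead of the regularization term.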
2. Large Margin Intuition
3. The mathematics behind large margin classification (optional)
- vector inner product
- SVM decision boundary
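
The link between the inner product and the margin can be sketched as follows. This assumes $\theta_0 = 0$ and perfectly separable data, and writes $p^{(i)}$ for the signed length of the projection of $x^{(i)}$ onto $\theta$ (notation assumed, not taken from these notes).

```latex
% Inner product as projection: p is the signed length of the projection of v onto u
u^T v = p \cdot \lVert u \rVert

% Applied to the SVM with \theta_0 = 0
\theta^T x^{(i)} = p^{(i)} \cdot \lVert \theta \rVert

% The separable-case problem then becomes
\min_\theta \; \frac{1}{2} \lVert \theta \rVert^2
\quad \text{s.t.} \quad
p^{(i)} \lVert \theta \rVert \ge 1 \ \text{if } y^{(i)} = 1, \qquad
p^{(i)} \lVert \theta \rVert \le -1 \ \text{if } y^{(i)} = 0
```

Keeping $\lVert \theta \rVert$ small while satisfying the constraints requires the projections $\lvert p^{(i)} \rvert$ to be large, which is what yields a large-margin boundary.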
4. Kernels I
- Given $x$, compute new features depending on proximity to landmarks
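
A minimal sketch of this idea, assuming the Gaussian kernel as the similarity measure. The function names, the `sigma` parameter, and the landmark coordinates are hypothetical and chosen only for illustration.

```python
import math


def gaussian_similarity(x, landmark, sigma=1.0):
    """Gaussian kernel: exp(-||x - l||^2 / (2 * sigma^2))."""
    sq_dist = sum((xi - li) ** 2 for xi, li in zip(x, landmark))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_features(x, landmarks, sigma=1.0):
    """Map x to one feature per landmark: f_i = similarity(x, l_i)."""
    return [gaussian_similarity(x, l, sigma) for l in landmarks]


# Hypothetical landmarks, chosen only for illustration.
landmarks = [(0.0, 0.0), (3.0, 4.0)]

# x coincides with the first landmark and lies at distance 5 from the second.
features = kernel_features((0.0, 0.0), landmarks)
```

A feature is close to 1 when $x$ is near the corresponding landmark and close to 0 when it is far away. A larger `sigma` makes the feature fall off more slowly with distance.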
5. Kernels II
- SVM with kernels
- how to choose the landmarks?
- how to compute ?
- Reference: http://blog.csdn.net/abcjennifer/article/details/7849812
- SVM parameters
- Large $C$: lower bias, higher variance
- Small $C$: higher bias, lower variance
6. Using an SVM
- Use an SVM software package (e.g. liblinear, libsvm, ...) to solve for the parameters $\theta$; you need to specify:
- choice of parameter $C$
- choice of kernel (similarity function)
- e.g. no kernel ("linear kernel")
- Gaussian kernel
- Many off-the-shelf kernels available
- polynomial kernel
- more esoteric: string kernel, chi-square kernel, histogram intersection kernel, ...
- Multi-class classification
- many SVM packages already have built-in multi-class classification functionality
- otherwise, use one-vs.-all method
- logistic regression vs. SVM
- $n$ = number of features ($x \in \mathbb{R}^{n+1}$), $m$ = number of training examples
- if $n$ is large (relative to $m$), use logistic regression, or SVM without a kernel ("linear kernel").
- if $n$ is small and $m$ is intermediate, use SVM with Gaussian kernel.
- if $n$ is small and $m$ is large, create/add more features, then use logistic regression or SVM without a kernel.
- A neural network is likely to work well for most of these settings, but may be slower to train.
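
The rule of thumb above can be sketched as a small helper. This is only an illustration: the function name, the cutoffs `small_n` and `large_m`, and the reading of "large relative to" as "at least as many features as examples" are all assumptions, not values stated in these notes.

```python
def suggest_model(n_features, m_examples, small_n=1000, large_m=50000):
    """Suggest a model family from the number of features and examples.

    small_n and large_m are illustrative cutoffs (assumptions).
    """
    linear = "logistic regression or linear-kernel SVM"

    # Many features relative to examples: a linear boundary is usually enough.
    if n_features >= m_examples:
        return linear

    # Few features, moderate number of examples: Gaussian kernel is affordable.
    if n_features <= small_n and m_examples < large_m:
        return "SVM with Gaussian kernel"

    # Few features, very many examples: kernels get slow, so add features first.
    if n_features <= small_n:
        return "add features, then " + linear

    # Case not covered by the notes; falling back to a linear model is an assumption.
    return linear


print(suggest_model(10000, 1000))
print(suggest_model(100, 5000))
print(suggest_model(100, 100000))
```

The cutoffs are exposed as keyword arguments so they can be adjusted to the dataset and the training time that is acceptable.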