This article was written by Sean Owen, Director, Data Science @ Cloudera
I think one needs to have a competent knowledge of 1-2 techniques in:
- Regression
- Classification
- Clustering
- Collaborative filtering
- (Bonus) Inference via graphical models
Certainly, it's valuable and important to understand simple linear regression.
Gradient descent is important because it underpins common classifier techniques like logistic regression. Also: the support vector machine.
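To make the link between gradient descent and logistic regression concrete, here is a minimal sketch in plain Python. The function names (`fit_logistic`, `predict`), the learning rate, the epoch count, and the toy data are all assumptions chosen for illustration, not anything prescribed above.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the linear score is (prediction - label)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the averaged gradient
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Made-up, linearly separable toy data: one feature, label flips between 2 and 3
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print([predict(w, b, x) for x in X])
```

The same loop works for other differentiable losses by changing how `err` is computed, which is why gradient descent is worth learning once and reusing.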
I also strongly encourage people to have a working knowledge of random forest classification / regression. It's inherently an ensemble method, effective, and has different properties from the above.
K-means++ clustering is a must.
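What the "++" adds is the seeding step: the first center is picked uniformly at random, and each later center is picked with probability proportional to its squared distance from the nearest center chosen so far. A rough sketch follows, assuming the data contains at least k distinct points; the function names, iteration count, and sample points are illustrative assumptions.

```python
import random

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeanspp_init(points, k, rng):
    """Pick k starting centers using the k-means++ D^2 weighting."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by squared distance to its nearest chosen center
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd iterations started from k-means++ seeds."""
    rng = random.Random(seed)
    centers = kmeanspp_init(points, k, rng)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty
        centers = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers

# Made-up points forming two well-separated groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = kmeans(points, k=2)
print(sorted(centers))
```

Spreading the seeds out this way makes it much less likely that two starting centers land in the same true cluster, which is the usual failure mode of purely random initialization.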
For collaborative filtering, neighborhood methods are simple enough that they almost don't deserve mention. I would try to understand latent factor models based on low-rank matrix factorization like the singular value decomposition or simple alternating least squares ( http://yifanhu.net/PUB/c
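As a rough illustration of alternating least squares, the sketch below factorizes a small set of explicit ratings: it holds the item factors fixed and solves a small ridge regression per user, then swaps roles. This is a generic textbook formulation with L2 regularization, not necessarily the variant described at the link above; the function names, rank, regularization strength, and toy ratings are all assumptions.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def update_factors(obs_by_row, other, rank, lam):
    """Ridge-regress each row's factor vector on the fixed factors of the other side."""
    out = []
    for obs in obs_by_row:  # obs: list of (index into other, rating)
        A = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for idx, r in obs:
            v = other[idx]
            for i in range(rank):
                b[i] += r * v[i]
                for j in range(rank):
                    A[i][j] += v[i] * v[j]
        out.append(solve(A, b))
    return out

def als(ratings, n_users, n_items, rank=2, lam=0.1, iters=30, seed=0):
    """ratings: list of (user, item, value) triples for the observed entries only."""
    rng = random.Random(seed)
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    items = [[rng.random() for _ in range(rank)] for _ in range(n_items)]
    users = [[0.0] * rank for _ in range(n_users)]
    for _ in range(iters):
        users = update_factors(by_user, items, rank, lam)  # items held fixed
        items = update_factors(by_item, users, rank, lam)  # users held fixed
    return users, items

def predict(users, items, u, i):
    return sum(a * b for a, b in zip(users[u], items[i]))

# Made-up ratings generated from a rank-1 pattern, with one entry (3, 2) held out
u_true, v_true = [1.0, 2.0, 3.0, 4.0], [1.0, 0.5, 2.0, 1.0]
ratings = [(u, i, u_true[u] * v_true[i])
           for u in range(4) for i in range(4) if (u, i) != (3, 2)]
users, items = als(ratings, n_users=4, n_items=4)
print(predict(users, items, 3, 2))
```

Each half-step is an ordinary least-squares problem with a closed-form solution, which is what makes the method simple to implement and easy to parallelize across users or items.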
Bonus: MCMC methods (Markov chain Monte Carlo) for graphical models.
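One of the simplest MCMC methods for a graphical model is Gibbs sampling: resample each variable in turn from its conditional distribution given its neighbors. The sketch below does this on a tiny, hypothetical chain of +/-1 variables with pairwise potentials exp(coupling * x_i * x_j); the model, parameter values, and function name are assumptions picked so the estimate can be checked against a closed form.

```python
import math
import random

def gibbs_ising_chain(n_nodes=3, coupling=0.5, sweeps=20000, burn_in=1000, seed=0):
    """Estimate P(x_0 == x_1) on a chain of +/-1 variables by Gibbs sampling."""
    rng = random.Random(seed)
    x = [rng.choice((-1, 1)) for _ in range(n_nodes)]
    agree = 0
    for sweep in range(sweeps + burn_in):
        for i in range(n_nodes):
            # Local field: sum of the neighbours of node i in the chain
            field = 0
            if i > 0:
                field += x[i - 1]
            if i < n_nodes - 1:
                field += x[i + 1]
            # P(x_i = +1 | neighbours) is a logistic function of the local field
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * coupling * field))
            x[i] = 1 if rng.random() < p_plus else -1
        # Only count samples after the burn-in period
        if sweep >= burn_in and x[0] == x[1]:
            agree += 1
    return agree / sweeps

estimate = gibbs_ising_chain()
exact = 1.0 / (1.0 + math.exp(-2.0 * 0.5))  # closed form for this small chain
print(estimate, exact)
```

For this chain the exact answer is available, so the sampler is only a demonstration; the approach earns its keep on larger models where summing over all configurations is infeasible.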