Scaling up machine learning is arguably more pressing in industrial practice than in research. Real-world applications involve so much data that efficiency becomes a serious problem: how to solve a problem within a reasonable amount of time, while making the most of the resources at hand, is an urgent question.
Here is a tutorial from KDD 2011 worth a look.
Scaling Up Machine Learning, the Tutorial, KDD 2011
Ron Bekkerman, Misha Bilenko and John Langford
Part I slides (Powerpoint) Introduction
Part II.a slides (Powerpoint) Use of Trees
Part II.b slides (Powerpoint) Graphical models
Part III slides (Summary + GPU learning + Terascale linear learning)
This tutorial gives a broad view of modern approaches for scaling up machine learning and data mining methods on parallel/distributed platforms. Demand for scaling up machine learning is task-specific: for some tasks it is driven by the enormous dataset sizes, for others by model complexity or by the requirement for real-time prediction. Selecting a task-appropriate parallelization platform and algorithm requires understanding their benefits, trade-offs and constraints. This tutorial focuses on providing an integrated overview of state-of-the-art platforms and algorithm choices. These span a range of hardware options (from FPGAs and GPUs to multi-core systems and commodity clusters), programming frameworks (including CUDA, MPI, MapReduce, and DryadLINQ), and learning settings (e.g., semi-supervised and online learning). The tutorial is example-driven, covering a number of popular algorithms (e.g., boosted trees, spectral clustering, belief propagation) and diverse applications (e.g., speech recognition and object recognition in vision).
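A core idea behind the MapReduce-style approaches the abstract mentions is data-parallel learning: each worker computes a partial result on its shard of the data, and the results are combined into one update. A toy sketch of this pattern (not code from the tutorial; the least-squares objective and helper names are illustrative assumptions) for one synchronous gradient step:

```python
# Toy data-parallel gradient descent: the "map" phase computes a gradient
# per shard, the "reduce" phase averages them. Illustrative sketch only.

def shard_gradient(w, shard):
    """Gradient of mean squared error (w*x - y)^2 over one data shard."""
    g = 0.0
    for x, y in shard:
        g += 2.0 * (w * x - y) * x
    return g / len(shard)

def parallel_step(w, shards, lr=0.1):
    """One synchronous step: map (per-shard gradients), then reduce (average)."""
    grads = [shard_gradient(w, s) for s in shards]  # map phase
    g = sum(grads) / len(grads)                     # reduce phase
    return w - lr * g

# Data drawn from y = 3x, split across two shards (two "workers").
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = 0.0
for _ in range(100):
    w = parallel_step(w, shards)
print(round(w, 3))  # converges toward the true slope 3.0
```

In a real MapReduce or MPI setting the shards live on different machines and the reduce is a network aggregation, but the synchronous map/reduce structure is the same.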
The tutorial is based on (but not limited to) the material from our upcoming Cambridge U. Press edited book, which is currently in production and will be available in December 2011.
Presenters
Ron Bekkerman is a senior research scientist at LinkedIn where he develops machine learning and data mining algorithms to enhance LinkedIn products. Prior to LinkedIn, he was a researcher at HP Labs. Ron completed his PhD in Computer Science at the University of Massachusetts Amherst in 2007. He holds BSc and MSc degrees from the Technion---Israel Institute of Technology. Ron has published on various aspects of clustering, including multimodal clustering, semi-supervised clustering, interactive clustering, consensus clustering, one-class clustering, and clustering parallelization.
Misha Bilenko is a researcher in the Machine Learning and Intelligence group at Microsoft Research, which he joined in 2006 after receiving his PhD from the University of Texas at Austin. His current research interests include large-scale machine learning methods, adaptive similarity functions and personalized advertising.
John Langford is a senior researcher at Yahoo! Research. He studied Physics and Computer Science at the California Institute of Technology, earning a double bachelor's degree in 1997, and received his PhD from Carnegie Mellon University in 2002. Previously, he was affiliated with the Toyota Technological Institute and IBM's Watson Research Center. He is the author of the popular Machine Learning weblog, hunch.net. John's research focuses on the fundamentals of learning, including sample complexity, learning reductions, active learning, learning with exploration, and the limits of efficient optimization.
=========
There is also a tutorial by Blei on LDA and related topics.
Many of you have asked for the slides from the tutorial at KDD. I posted them on this page:
http://www.cs.princeton.edu/~
The link to the PDF of the slides is:
http://www.cs.princeton.edu/~
If any of you have comments or suggestions, please email me. I hope to use these slides again.