Courses on Data Mining, Machine Learning and Pattern Recognition

Data Mining

The subject of Knowledge Discovery and Data Mining (KDD) concerns the extraction of useful information from data. Since this is also the essence of many sub-areas of computer science, as well as of the field of statistics, KDD can be said to lie at the intersection of statistics, machine learning, databases, pattern recognition, information retrieval and artificial intelligence.

The subject matter of data mining is vast, which makes learning about the subject itself something of a data mining task! The one-semester course that I teach emphasizes the theory and algorithms of data mining. Such algorithms are concerned with deriving global models and local patterns, with visualization, and with retrieval by content. Related courses that I teach, on pattern recognition and machine learning, cover many other topics related to data mining.

Textbook: Principles of Data Mining by D. Hand, H. Mannila and P. Smyth, MIT Press, 2001.


Lectures

The following are the presentation slides used in a classroom setting, given here as links to PDF files. Since these slides are continually updated, you may wish to revisit them. The course is being taught in Spring 2010.

  1. Introduction to Data Mining
    1. Introduction to Data Mining
  2. Measurement and Data
    1. Measurements and Distances
  3. Visualization of Data
    1. Summarizing Data, Histograms, Scatter Plots
    2. Principal Components Analysis and Multidimensional Scaling
  4. Data Analysis and Uncertainty
    1. Random Variables
    2. Estimation
    3. Hypothesis Testing/Sampling
  5. Systematic Overview of Data Mining Algorithms
    1. Decision Trees and MLP
    2. Association Rules and Text Retrieval
  6. Models and Patterns
    1. Prediction Models
    2. Probability Models and Graphical Models
    3. Structured Data: Markov Models
    4. Pattern Structures
  7. Content-Based Information Retrieval
    1. Precision and Recall
    2. Text Retrieval: Term Frequency and Inverse Document Frequency (see the sketch after this list)
    3. Text Retrieval: Latent Semantic Indexing and Probabilistic Retrieval
    4. Content-Based Image Retrieval
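As a small illustration of topic 7.2 above, here is a minimal sketch of term-frequency/inverse-document-frequency (TF-IDF) weighting in Python. The toy documents, the variable names and the particular idf variant used are my own choices for illustration and are not taken from the course slides.

```python
import math
from collections import Counter

# Toy document collection (hypothetical, for illustration only).
docs = [
    "data mining extracts patterns from data",
    "machine learning builds models from examples",
    "text retrieval ranks documents by relevance",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)

# Document frequency: the number of documents containing each term.
df = Counter()
for terms in tokenized:
    df.update(set(terms))

def tf_idf(term, terms):
    """TF-IDF weight of a term within one tokenized document."""
    tf = terms.count(term) / len(terms)   # normalized term frequency
    idf = math.log(N / df[term])          # one common idf variant
    return tf * idf

# Example: the weight of "data" in the first toy document.
print(round(tf_idf("data", tokenized[0]), 3))
```

In a full retrieval system these weights would form document vectors that are compared against a query vector, for example with cosine similarity, to rank documents by relevance.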


============================================================================================

Machine Learning

Machine learning is an exciting subject about designing machines that can learn from examples. The course covers the necessary theory, principles and algorithms for machine learning. The methods are based on statistics and probability, which have now become essential to designing systems exhibiting artificial intelligence. The course emphasizes Bayesian techniques and probabilistic graphical models (PGMs). The material is complementary to a course on Data Mining, where statistical concepts are used to analyze data for human, rather than machine, use.
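As a concrete, deliberately tiny illustration of the Bayesian viewpoint that the course emphasizes, the sketch below performs a conjugate Beta-Bernoulli posterior update in Python. The prior parameters and the toy observations are hypothetical and are not drawn from the lecture material.

```python
# Minimal Beta-Bernoulli update (illustrative sketch only).
# Prior: theta ~ Beta(a, b); data: a sequence of 0/1 outcomes.
# Posterior: theta ~ Beta(a + number of ones, b + number of zeros).

def beta_bernoulli_update(a, b, outcomes):
    """Return the posterior Beta parameters after observing 0/1 outcomes."""
    ones = sum(outcomes)
    zeros = len(outcomes) - ones
    return a + ones, b + zeros

a, b = 2.0, 2.0                      # weakly informative prior (assumed)
data = [1, 0, 1, 1, 0, 1, 1, 1]      # toy observations (assumed)
a_post, b_post = beta_bernoulli_update(a, b, data)
print(f"posterior: Beta({a_post}, {b_post}), "
      f"mean = {a_post / (a_post + b_post):.3f}")
```

The same prior-times-likelihood reasoning, scaled up to many variables, is what the probabilistic graphical model sections of the course formalize.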

The textbooks for different parts of the course are "Pattern Recognition and Machine Learning" by Chris Bishop (Springer 2006) and "Probabilistic Graphical Models" by Daphne Koller and Nir Friedman (MIT Press 2009).

Lecture Slides for Machine Learning and Probabilistic Graphical Models

Following are course topics with pointers to lecture overhead slides and some lecture video files. Previously taught as a single-semester course, the material is now divided into two successive courses taught during the Fall and Spring semesters.

Note about slides and videos: The slides for Sections 1-7 are from Fall 2010, and the slides for Sections 8, 9 and 11 are from Spring 2011. The videos for Sections 1-7 are from Fall 2008, while the videos for Sections 8, 9 and 11 are from Spring 2011, so the correspondence between slides and videos may be somewhat off. Lecture slides are frequently updated as the course progresses. Sections 1-14 (whose topic titles appear in red on the original page) are the more recently taught versions.


  1. Introduction
    1. Machine Learning-Overview(3MB)
    2. Text Classification Example(225KB)
    3. Regression Example(562KB)
    4. Probability Theory(950KB)
    5. Decision-Theory(212KB)
    6. Information Theory(160KB)
    7. MATLAB Introduction(347KB)
  2. Probability Distributions
    1. Discrete Distributions(291KB)
    2. Gaussian Distribution(833KB)
  3. Linear Models for Regression
    1. Regression with Basis Functions(1.2MB)
    2. Bias-Variance(616KB)
    3. Bayesian Regression(1MB)
    4. Bayesian Model Comparison(300KB)
    5. Evidence Approximation(650KB)
  4. Linear Models for Classification
    1. Introduction(88KB)
    2. Discriminant Functions(2.4MB)
    3. Generative Models(1.4MB)
    4. Logistic Regression(1.4MB)
    5. Laplace Approximation (519KB)
    6. Bayesian Logistic Regression(845KB)
  5. Neural Networks
    1. Introduction(551KB)
    2. Training(848KB)
    3. Error Backpropagation(821KB)
    4. The Hessian Matrix(288KB)
    5. Regularization in Neural Networks(1.2MB)
    6. Mixture Density Networks (634KB)
    7. Bayesian Neural Networks(716KB)
  6. Kernel Methods
    1. Kernel Methods(1MB)
    2. Radial Basis Function Networks(549KB)
    3. Gaussian Processes(1.4 MB)
  7. Sparse Kernel Machines
    1. Support Vector Machines(958KB)
  8. Probabilistic Graphical Models (Directed)
    1. Bayesian Networks(1.4MB)
    2. Querying Probability Distributions(460KB)
    3. Genetic Inheritance Example(207KB)
    4. Graphs and Distributions(305KB)
    5. Reasoning Patterns & D-Separation(393KB)
    6. Conditional Independence(830KB)
    7. Semantics of Bayesian Networks(385KB)
  9. Probabilistic Graphical Models (Undirected)
    1. Undirected Graphical Models(690KB)
    2. Independencies in Markov Networks(526KB)
    3. Constructing Markov Networks(251KB)
    4. Alternate Parameterizations of MNs(2.1MB)
    5. MRFs in Computer Vision(1.2MB)
    6. From BNs to MNs(144KB)
    7. Partially Directed Models & CRFs(916KB)
  10. Inference in Graphical Models
    1. Introduction(1MB)
    2. Factor Graphs(1.1MB)
    3. Max Sum Algorithm(734KB)
    4. Loopy Belief Propagation(78KB)
  11. Learning Graphical Models
    1. Learning PGMs: Overview(695KB)
    2. Learning as Optimization(398KB)
    3. Parameter Estimation(1MB)
    4. Bayesian Estimation in Bay.Nets(980KB)
  12. Mixture Models and EM
    1. K-means Clustering(1.1MB) (see the sketch after this topic list)
    2. Mixtures of Gaussians(1MB)
    3. Latent Variable View of EM(516KB)
  13. Approximate Inference
    1. Approximate Inference(3.2MB)
  14. Sampling Methods
    1. Basic Sampling Methods(375KB)
    2. Monte Carlo Methods(426KB)
  15. Continuous Latent Variables
    1. Principal Components Analysis
    2. Nonlinear Latent Variable Models
  16. Sequential Data
    1. Markov Models(433KB)
    2. Hidden Markov Models(1.3MB)
    3. Extensions to HMMs(287KB)
    4. Linear Dynamical Systems(217KB)
    5. Conditional Random Fields(1.6MB)
  17. Combining Models
    1. Boosting(pdf, 156KB)
  18. Concept Learning
    1. Hypothesis Space (pdf, 111KB)
    2. Candidate Elimination (pdf,236KB)
  19. Decision Trees
    1. Information Gain and ID3(pdf, 286KB)
    2. Data Sets and Data Mining(pdf, 332KB)
    3. Overfitting and Pruning(pdf, 536KB)
  20. Computational Learning Theory
    1. PAC Learning(pdf, 98KB)
    2. VC Dimension(pdf, 321KB)
    3. Mistake Bound(pdf, 51KB)
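The sketch below, referenced from topic 12.1 above, is a minimal Python rendering of K-means (Lloyd's algorithm) on one-dimensional toy data. The data values, the fixed iteration count, the random initialization and the empty-cluster fallback are my own simplifications and are not taken from the slides.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-means: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # random initial centroids
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x in points:
            j = min(range(k), key=lambda c: abs(x - centroids[c]))
            clusters[j].append(x)
        # Update step: move each centroid to the mean of its cluster,
        # keeping the old centroid if a cluster happens to be empty.
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids, clusters

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7, 9.9, 10.2, 10.1]   # toy 1-D data
centroids, clusters = kmeans(data, k=3)
print(sorted(centroids))
```

A production implementation would also monitor the within-cluster sum of squares and stop once the assignments no longer change.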

Lecture videos (zip files):

Lec 1.1 video (zip, 138MB)
Lec 1.3 video (zip, 134MB)
Lec 1.4 video (zip, 133MB)
Lec 1.5 video (zip, 135MB)
Lec 1.6 video (zip, 135MB)
Lec 1.7 video (zip, 277MB)
Lec 3 video (zip, 315MB)
Lec 4 video, parts 1 and 2 (zip, 268MB and 286MB)
Lec 5.1 video (zip, 140MB)
Lec 5.3 video (zip, 144MB)
Lec 5.5 video (zip, 138MB)
Lec 6.1 video (zip, 145MB)
Lec 7.1 video (zip, 311MB)
Lec 8.1 video (zip, 134MB)
Lec 8.2 video (zip, 127MB)
Lec 8.3/8.4 video (zip, 140MB)
Lec 8.5 video (zip, 135MB)
Lec 8.6 video (zip, 150MB)
Lec 10.1 video (zip, 132MB)
Lec 12.3 video (zip, 152MB)
Lec 13.1 video (zip, 108MB)
Lec 14.1 video (zip, 145MB)

See Data Mining Course Slides.

============================================================================================

Pattern Recognition

This is the website for a first-year graduate course on pattern recognition (CSE555). The material presented here is complete enough that it can also serve as a tutorial on the topic.

Pattern recognition techniques are concerned with the theory and algorithms of putting abstract objects, e.g., measurements made on physical objects, into categories. Typically the categories are assumed to be known in advance, although there are techniques to learn the categories (clustering). Methods of pattern recognition are useful in many applications such as information retrieval, data mining, document image analysis and recognition, computational linguistics, forensics, biometrics and bioinformatics. You may find the websites of related courses that I teach on Data Mining and Machine Learning useful as supplementary material.

Most of the topics concern statistical classification methods. They include generative methods such as those based on Bayes decision theory and the related techniques of parameter estimation and density estimation. Next come discriminative methods such as nearest-neighbor classification and support vector machines. Artificial neural networks, classifier combination and clustering are other major components of pattern recognition.
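As a minimal illustration of the discriminative, distance-based methods mentioned above, here is a sketch of a 1-nearest-neighbor classifier in Python. The toy training points and labels are hypothetical and are used only to show the mechanics.

```python
# 1-nearest-neighbor classification (illustrative sketch only).

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def nearest_neighbor(train, query):
    """Assign the query the label of its closest training example."""
    features, label = min(train, key=lambda fx: euclidean(fx[0], query))
    return label

train = [((1.0, 1.1), "class A"), ((0.9, 1.3), "class A"),
         ((4.0, 3.8), "class B"), ((4.2, 4.1), "class B")]
print(nearest_neighbor(train, (1.2, 1.0)))   # expected: "class A"
```

A generative classifier would instead fit a class-conditional density p(x | class) for each class and apply the Bayes decision rule, which is the route taken in the Bayes decision theory and parameter estimation lectures.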

A course in probability is helpful as a prerequisite.

Applications of pattern recognition techniques are demonstrated by projects in fingerprint recognition, handwriting recognition and handwriting verification.

Reference Textbooks:
(i) Pattern Classification (2nd Edition) by R. O. Duda, P. E. Hart and D. Stork, Wiley 2002, 
(ii) Pattern Recognition and Machine Learning by C. Bishop, Springer 2006, and 
(iii) Statistics and the Evaluation of Evidence for Forensic Scientists by C. Aitken and F. Taroni, Wiley, 2004.


Lectures

Following are the lecture overheads used in class, as PDF files.
The lecture slides are frequently updated. This course was last taught in Spring 2007.

  1. Introduction
  2. Bayes Decision Theory
    1. Bayes Decision Rule (stated briefly after this outline)
    2. Minimum Error Rate Classification
    3. Normal Density and Discriminant Functions
    4. Error Integrals and Bounds
    5. Bayesian Networks, Compound Decision Theory
  3. Generative Methods
    1. Maximum-Likelihood and Bayesian Parameter Estimation
      1. Maximum-Likelihood Estimation
      2. Bayesian Parameter Estimation
      3. Sufficient Statistics
      4. Some Common Statistical Distributions
      5. Dimensionality and Computational Complexity
      6. Principal Components Analysis
      7. Fisher Linear Discriminant
      8. Expectation Maximization
      9. Sequential Data and Hidden Markov Models
    2. Nonparametric Techniques
      1. Density Estimation
  4. Discriminative Methods
    1. Distance-based Methods
      1. Nearest-Neighbor Classification
      2. Metrics and Tangent Distance
      3. Fuzzy Classification
    2. Linear Discriminant Functions
      1. Hyperplane Geometry
      2. Gradient Descent and Perceptrons
      3. Minimum Squared Error Procedures
      4. Support Vector Machines
    3. Artificial Neural Networks
      1. Biological Motivation and Back-Propagation
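For reference (topic 2.1 above), the minimum-error-rate decision rule that underlies these lectures can be written, in standard textbook notation, as:

```latex
% Minimum-error-rate (Bayes) decision rule: assign x to the class
% whose posterior probability is largest.
\begin{align*}
\hat{\omega}(x) &= \arg\max_{i} P(\omega_i \mid x) \\
                &= \arg\max_{i} \, p(x \mid \omega_i)\, P(\omega_i)
\end{align*}
```

The second line follows from Bayes' theorem, P(ω_i | x) = p(x | ω_i) P(ω_i) / p(x), because the evidence p(x) is common to all classes and does not affect the comparison.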



http://www.cedar.buffalo.edu/~srihari/

The sea of learning is boundless; turning back is the shore. O(∩_∩)O~
