Pattern Discovery Basic Concepts
@(Pattern Discovery in Data Mining)[Pattern Discovery]
This note introduces the basic concepts of pattern mining.
Pattern: A set of items, subsequences, or substructures that occur frequently together (or are strongly correlated) in a data set.
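To make "occur frequently together" concrete, here is a minimal Python sketch that counts how often pairs of items co-occur across a few transactions. The transaction data, the function name `frequent_pairs`, and the `min_count` threshold are assumptions made for illustration; they do not come from the course material.

```python
from collections import Counter
from itertools import combinations

# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"beer", "nuts", "diaper"},
    {"beer", "coffee", "diaper"},
    {"beer", "diaper", "eggs"},
    {"nuts", "eggs", "milk"},
    {"nuts", "coffee", "diaper", "eggs", "milk"},
]


def frequent_pairs(transactions, min_count):
    """Return item pairs that co-occur in at least `min_count` transactions."""
    counts = Counter()
    for t in transactions:
        # Sort the items so each pair has a single canonical ordering.
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


result = frequent_pairs(transactions, min_count=3)
print(result)  # {('beer', 'diaper'): 3}
```

With this toy data, only the pair (beer, diaper) appears together in at least three transactions, so it is the only pattern reported at that threshold. Counting every pair by brute force is fine for a small example but does not scale to large item sets.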
Motivation for pattern discovery in data:
* What products are often purchased together?
* What are the subsequent purchases after buying an iPad?
* What code segments likely contain copy-and-paste bugs?
* What word sequences likely form phrases in this corpus?
* What kinds of events tend to follow the posting of certain news?
* …
In summary, pattern discovery is important for the following reasons:
* It finds inherent regularities in a data set
* It is the foundation for many essential data mining tasks
    * Association, correlation, and causality analysis
    * Mining sequential and structural (e.g., sub-graph) patterns
    * Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
    * Classification: discriminative pattern-based analysis
    * Cluster analysis: pattern-based subspace clustering
* It has broad applications
    * Market basket analysis, cross-marketing, catalog design, sales campaign analysis, Web log analysis, biological sequence analysis
TODO: elaborate on the specific applications listed above
Frequent Pattern and Association Rule
Itemset: A set of one or more items
k-itemset: X=x1,...