Association rules

(gender=male) and (wealth=rich) → (age=old)

BODY → HEAD

(BODY: LHS, “Left-Hand-Side”, antecedent) (HEAD: RHS, “Right-Hand-Side”, consequent)

If a rule has no body, it can still have a default head: the most common value (MCV).
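A minimal sketch of the default head, assuming records are Python dicts mapping attribute names to values. `default_head` and the toy `records` are hypothetical, chosen only to mirror the example rule above.

```python
from collections import Counter

def default_head(records, attribute):
    """Return the most common value (MCV) of `attribute`.

    This is the head a rule predicts when it has no body at all.
    """
    counts = Counter(r[attribute] for r in records)
    value, _ = counts.most_common(1)[0]
    return value

# Hypothetical toy records, for illustration only
records = [
    {"gender": "male", "wealth": "rich", "age": "old"},
    {"gender": "female", "wealth": "poor", "age": "young"},
    {"gender": "male", "wealth": "poor", "age": "old"},
]
print(default_head(records, "age"))  # old
```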

Definitions

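SUPPORT: the number of records in the data that satisfy the body; some definitions instead count the records that satisfy both body and head
HITS: the number of records that satisfy both body and head (= SUPPORT(B ∧ H))
SCORE: a function of hits and support used to evaluate the utility of a rule; also called the confidence (how likely the rule is to be correct)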

The simplest score function

score = fraction = hits / (body support), i.e. P(head | body); this is usually called the confidence of the rule
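The definitions above can be sketched as follows, assuming records are Python dicts and a body or head is a dict of attribute=value conditions. `matches`, `rule_stats` and the toy records are hypothetical; the sketch uses the body-only definition of SUPPORT as the denominator.

```python
def matches(record, conditions):
    """True if the record satisfies every attribute=value condition."""
    return all(record.get(k) == v for k, v in conditions.items())

def rule_stats(records, body, head):
    """Return (support, hits, confidence) for the rule body -> head.

    support    = records satisfying the body
    hits       = records satisfying both body and head
    confidence = hits / support, an estimate of P(head | body)
    (returns 0.0 confidence when no record satisfies the body)
    """
    support = sum(1 for r in records if matches(r, body))
    hits = sum(1 for r in records if matches(r, body) and matches(r, head))
    confidence = hits / support if support else 0.0
    return support, hits, confidence

# Hypothetical toy records, for illustration only
records = [
    {"gender": "male", "wealth": "rich", "age": "old"},
    {"gender": "male", "wealth": "rich", "age": "young"},
    {"gender": "male", "wealth": "rich", "age": "old"},
    {"gender": "female", "wealth": "rich", "age": "old"},
]
body = {"gender": "male", "wealth": "rich"}
head = {"age": "old"}
print(rule_stats(records, body, head))  # (3, 2, 0.6666666666666666)
```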

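Drawback: it cannot distinguish between the following two rules, which both score 6/10 = 600/1000 = 0.6 even though Rule2 is backed by far more records: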

Rule1: support=10, hits=6
Rule2: support=1000, hits=600

So the score is refined with a confidence interval (the confidence-interval score method).
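The notes do not say which confidence interval is used. As an assumption, this sketch scores a rule by the lower bound of the Wilson score interval for the proportion hits/support (z = 1.96, roughly 95%); `wilson_lower_bound` is a hypothetical helper, not necessarily the method the notes have in mind.

```python
import math

def wilson_lower_bound(hits, support, z=1.96):
    """Lower bound of the Wilson score interval for hits/support.

    Small support widens the interval, so rules backed by little
    evidence get a lower score than rules with the same fraction
    but more records.
    """
    if support == 0:
        return 0.0
    p = hits / support
    n = support
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom

rule1 = wilson_lower_bound(6, 10)
rule2 = wilson_lower_bound(600, 1000)
print(f"Rule1: {rule1:.3f}, Rule2: {rule2:.3f}")  # Rule1: 0.313, Rule2: 0.569
```

Both rules have fraction 0.6, but the lower bound separates them: Rule2's interval is much narrower, so its score stays close to 0.6.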

The above concerns a single rule. How can a whole set of rules be obtained automatically?

A PRIORI (Apriori) algorithm (see Wikipedia)

1. Find support values for all item-sets of Size=1

Suppose a database D contains 4 transactions:

TID	Items
T1	I1,I3,I4
T2	I2,I3,I5
T3	I1,I2,I3,I5
T4	I2,I5

Itemset	Support count
{I1}	2
{I2}	3
{I3}	3
{I4}	1
{I5}	3
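Step 1 on the database D above can be sketched with `collections.Counter`, assuming each transaction is stored as a Python set of item names.

```python
from collections import Counter

# Transaction database D from the table above
D = {
    "T1": {"I1", "I3", "I4"},
    "T2": {"I2", "I3", "I5"},
    "T3": {"I1", "I2", "I3", "I5"},
    "T4": {"I2", "I5"},
}

# Step 1: support count of every single item (item-sets of size 1)
support_1 = Counter(item for items in D.values() for item in items)
for item in sorted(support_1):
    print(f"{{{item}}}\t{support_1[item]}")
```

This prints the same five rows as the support table.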

2. Prune item-sets of current size with support < minimal_support (user-defined)

Here the minimum support is set to minSupport = 2, so {I4} (support 1) is pruned:

Itemset	Support count
{I1}	2
{I2}	3
{I3}	3
{I5}	3
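Step 2 is a simple filter. A minimal sketch, with the size-1 support counts copied from the table in step 1 and `min_support` standing in for minSupport:

```python
min_support = 2  # user-defined minimum support (minSupport above)

# Support counts of the size-1 item-sets, copied from the step-1 table
support_1 = {"I1": 2, "I2": 3, "I3": 3, "I4": 1, "I5": 3}

# Step 2: keep only item-sets whose support is not below min_support
frequent_1 = {item: count for item, count in support_1.items() if count >= min_support}
print(frequent_1)  # {'I1': 2, 'I2': 3, 'I3': 3, 'I5': 3}
```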

3. If no item-sets of the current size survive pruning, then stop; Otherwise continue
4. Increment item-set size: Size = Size + 1 (here from size 1 to size 2, which gives the candidate item-sets below)

Itemset
{I1,I2}
{I1,I3}
{I1,I5}
{I2,I3}
{I2,I5}
{I3,I5}
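For Size = 2 the candidates are simply all pairs of surviving items, which `itertools.combinations` produces directly. A minimal sketch, with the surviving items copied from the pruned table:

```python
from itertools import combinations

# Size-1 items that survived pruning (minSupport = 2)
frequent_items = ["I1", "I2", "I3", "I5"]

# Step 4: with Size = 2, every pair of surviving items is a candidate
candidates_2 = list(combinations(frequent_items, 2))
for pair in candidates_2:
    print("{" + ",".join(pair) + "}")
```

This prints the six candidate item-sets listed above, in the same order. For larger sizes, plain combinations of single items would generate many candidates that Apriori never needs to count.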

5. Find support values for all available item-sets of the current Size (while doing so do not waste time checking item-sets consisting of already pruned components)
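Putting steps 1 to 5 together, a simplified Apriori loop might look like the sketch below. The `apriori` and `count_support` functions are hypothetical helpers, not a library API. The join step here naively unions every pair of surviving item-sets instead of the ordered prefix join used in the original algorithm, but it applies the same rule of skipping candidates that contain an already pruned subset.

```python
from itertools import combinations

# Transaction database D from the table above, one set per transaction
D = [
    {"I1", "I3", "I4"},
    {"I2", "I3", "I5"},
    {"I1", "I2", "I3", "I5"},
    {"I2", "I5"},
]

def count_support(candidates, transactions):
    """Support count of each candidate item-set (a frozenset)."""
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}

def apriori(transactions, min_support):
    """Return {frozenset: support} for every frequent item-set."""
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    size = 1
    while candidates:
        # Steps 1/5: count support of the current candidates
        counts = count_support(candidates, transactions)
        # Step 2: prune candidates below the minimum support
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        # Step 3: stop when nothing of the current size survives
        if not survivors:
            break
        frequent.update(survivors)
        # Step 4: build candidates one item larger from the survivors
        size += 1
        joined = {a | b for a in survivors for b in survivors if len(a | b) == size}
        # Skip candidates containing any pruned (infrequent) subset
        candidates = {
            c for c in joined
            if all(frozenset(s) in survivors for s in combinations(c, size - 1))
        }
    return frequent

result = apriori(D, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("{" + ",".join(sorted(itemset)) + "}", count)
```

With minSupport = 2, the frequent 2-item-sets are {I1,I3}, {I2,I3}, {I2,I5} and {I3,I5}. The only size-3 candidate that survives the subset check is {I2,I3,I5}, with support 2. No size-4 candidate can be built, so the loop stops.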