The Apriori Algorithm

1. Association analysis

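Association analysis is the task of finding relationships in large datasets. These relationships take two forms: frequent itemsets and association rules. A frequent itemset is a collection of items that often appear together. An association rule suggests that a strong relationship may exist between two items.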

The support of an itemset is defined as the fraction of records in the dataset that contain that itemset.
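As a minimal sketch of this definition, the helper below computes the support of an itemset over a small list of transactions. The function name `support` is an assumption made for illustration; the sample data is the same four-transaction list returned by `loadDataSet` later in this post.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / float(len(transactions))

# Four sample transactions; each inner list is one record
transactions = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]

print(support([2, 5], transactions))  # 3 of the 4 transactions contain both 2 and 5
print(support([1, 2], transactions))  # only 1 of the 4 transactions contains both
```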

Confidence is defined for an association rule such as {diapers} -> {wine}. The confidence of this rule is support({diapers, wine}) / support({diapers}).
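The ratio above can be sketched in a few lines. The function names `support` and `confidence` and the shopping baskets are hypothetical, chosen only to show that confidence depends on the direction of the rule. The sketch assumes the left-hand side occurs in at least one basket; otherwise the division would fail.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset.issubset(t)) / float(len(transactions))

def confidence(antecedent, consequent, transactions):
    """confidence(A -> B) = support(A union B) / support(A)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support(both, transactions) / support(a, transactions)

# Hypothetical shopping baskets (made up for this example)
baskets = [
    {"diapers", "wine", "milk"},
    {"diapers", "wine"},
    {"diapers", "milk"},
    {"diapers", "bread"},
    {"wine", "bread"},
]

# The two directions of the rule give different confidences
print(confidence({"diapers"}, {"wine"}, baskets))  # (2/5) / (4/5)
print(confidence({"wine"}, {"diapers"}, baskets))  # (2/5) / (3/5)
```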

2. How the Apriori algorithm works

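The Apriori principle states that if an itemset is frequent, then all of its subsets are also frequent. Read the other way around: if an itemset is infrequent, then all of its supersets are infrequent as well. This means that once an itemset falls below the minimum support, none of its supersets needs to be counted.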
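To make the pruning idea concrete, here is a small sketch that discards candidate pairs containing an infrequent item without ever counting them. The helper `has_infrequent_subset` and the variable names are assumptions for illustration and are not part of the code in this post.

```python
from itertools import combinations

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]

def support(itemset):
    """Fraction of transactions that contain `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / float(len(transactions))

def has_infrequent_subset(candidate, frequent_smaller):
    """True if some (k-1)-subset of `candidate` is not in `frequent_smaller`."""
    k = len(candidate)
    return any(frozenset(s) not in frequent_smaller
               for s in combinations(candidate, k - 1))

min_support = 0.5
items = sorted(set().union(*transactions))
frequent_1 = {frozenset([i]) for i in items
              if support(frozenset([i])) >= min_support}

# Item 4 is infrequent, so every pair containing 4 can be skipped
all_pairs = [frozenset(p) for p in combinations(items, 2)]
kept = [p for p in all_pairs if not has_infrequent_subset(p, frequent_1)]
pruned = [p for p in all_pairs if has_infrequent_subset(p, frequent_1)]
print("pruned without scanning:", [sorted(p) for p in pruned])

# Sanity check: every pruned pair really is below the minimum support
assert all(support(p) < min_support for p in pruned)
```

Only the pairs in `kept` still need a pass over the data to compute their actual support.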

# Build a small sample dataset: each inner list is one transaction
def loadDataSet():
    return [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
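# Build C1, the list of candidate itemsets that each contain a single item
def createC1(dataSet):
    c1 = []
    for transaction in dataSet:
        for item in transaction:
            if [item] not in c1:
                c1.append([item])
    c1.sort()
    # frozenset is hashable, so each candidate can be used as a dict key;
    # list(...) keeps the result reusable under Python 3, where map is lazy
    return list(map(frozenset, c1))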
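# Scan the transactions D, count how often each candidate in Ck occurs,
# and keep the candidates whose support reaches minSupport
def scanD(D, Ck, minSupport):
    ssCnt = {}
    for tid in D:
        for can in Ck:
            if can.issubset(tid):
                # dict.has_key was removed in Python 3; get() works in both
                ssCnt[can] = ssCnt.get(can, 0) + 1
    numItems = float(len(D))
    retList = []
    supportData = {}
    for key in ssCnt:
        support = ssCnt[key] / numItems
        if support >= minSupport:
            retList.insert(0, key)
        # Record the support of every counted candidate, frequent or not
        supportData[key] = support
    return retList, supportData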
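# Build candidate k-itemsets by merging pairs of frequent (k-1)-itemsets
# that share the same first k-2 items
def aprioriGen(Lk, k):
    retList = []
    lenLk = len(Lk)
    for i in range(lenLk):
        for j in range(i + 1, lenLk):
            # Sort before slicing so the shared prefix does not depend on
            # the iteration order of the sets
            l1 = sorted(Lk[i])[:k-2]
            l2 = sorted(Lk[j])[:k-2]
            if l1 == l2:
                retList.append(Lk[i] | Lk[j])
    return retList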
# Apriori: grow the frequent itemsets level by level until a level is empty
def apriori(dataSet, minSupport=0.5):
    C1 = createC1(dataSet)
    # Materialize the transactions as a list, because D is scanned once per level
    D = list(map(set, dataSet))
    L1, supportData = scanD(D, C1, minSupport)
    L = [L1]
    k = 2
    while len(L[k-2]) > 0:
        Ck = aprioriGen(L[k-2], k)
        Lk, supK = scanD(D, Ck, minSupport)
        supportData.update(supK)
        L.append(Lk)
        k += 1
    return L, supportData
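
One way to sanity-check the result of `apriori` on a tiny dataset is to compare it against a brute-force enumeration of every possible itemset. The sketch below is not Apriori: it does no pruning and is exponential in the number of distinct items. The function name `frequent_itemsets_bruteforce` is hypothetical.

```python
from itertools import combinations

def frequent_itemsets_bruteforce(transactions, min_support=0.5):
    """Enumerate every itemset and keep those with enough support.

    Exponential in the number of distinct items; only for tiny inputs.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    n = float(len(transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            sup = sum(1 for t in transactions if candidate <= t) / n
            if sup >= min_support:
                result[candidate] = sup
    return result

data = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
found = frequent_itemsets_bruteforce(data, min_support=0.5)

# Print the itemsets ordered by size, then by content
for itemset in sorted(found, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), found[itemset])
```

On the sample data with a minimum support of 0.5, the itemsets found this way should match the non-empty levels of `L` returned by `apriori`.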

3. Association rules

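Rules built from the same frequent itemset can be pruned in a similar way: if a rule does not meet the minimum confidence, then no rule whose left-hand side is a subset of that rule's left-hand side meets it either. For example, if the rule {0,1,2} -> {3} does not meet the minimum confidence, then any rule from the same itemset whose left-hand side is a subset of {0,1,2}, such as {0,1} -> {2,3}, fails as well. The reason is that confidence is the support of the whole itemset divided by the support of the left-hand side, and a smaller left-hand side can only have equal or higher support.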
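
The property can be checked numerically on the sample transactions. This sketch lists every rule that can be formed from the frequent itemset {2, 3, 5} and asserts that shrinking the left-hand side never raises the confidence. The helper `rule_confidences` is an assumption for illustration only.

```python
from itertools import combinations

transactions = [frozenset(t) for t in ([1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5])]

def support(itemset):
    """Fraction of transactions that contain `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / float(len(transactions))

def rule_confidences(freq_set):
    """Map each non-empty proper left-hand side of `freq_set` to its confidence."""
    confs = {}
    for size in range(1, len(freq_set)):
        for lhs in combinations(sorted(freq_set), size):
            lhs = frozenset(lhs)
            confs[lhs] = support(freq_set) / support(lhs)
    return confs

freq_set = frozenset([2, 3, 5])
confs = rule_confidences(freq_set)

# Print rules with larger left-hand sides first
for lhs in sorted(confs, key=lambda s: (-len(s), sorted(s))):
    print(sorted(lhs), "->", sorted(freq_set - lhs), "conf:", round(confs[lhs], 3))

# Within one itemset, a smaller left-hand side never has higher confidence
for lhs, conf in confs.items():
    for smaller in confs:
        if smaller < lhs:
            assert confs[smaller] <= conf
```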

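# Compute the confidence of each rule (freqSet - consq) ---> consq and
# return the consequents whose rule meets minConf
def calcConf(freqSet, H, supportData, brl, minConf=0.7):
    prunedH = []
    for consq in H:
        conf = supportData[freqSet] / supportData[freqSet - consq]
        if conf >= minConf:
            # A single-argument print(...) runs under both Python 2 and 3
            print('%s ---> %s conf: %s' % (freqSet - consq, consq, conf))
            brl.append((freqSet - consq, consq, conf))
            prunedH.append(consq)
    return prunedH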
    
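# Recursively try larger consequents built by merging the consequents in H
def rulesFromConseq(freqSet, H, supportData, brl, minConf=0.7):
    m = len(H[0])
    # Continue only if an (m+1)-item consequent leaves a non-empty left-hand side
    if len(freqSet) > (m + 1):
        Hmp1 = aprioriGen(H, m + 1)
        Hmp1 = calcConf(freqSet, Hmp1, supportData, brl, minConf)
        # At least two surviving consequents are needed to merge again
        if len(Hmp1) > 1:
            rulesFromConseq(freqSet, Hmp1, supportData, brl, minConf)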
    
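# Generate association rules from the frequent itemsets in L
def generateRules(L, supportData, minConf=0.7):
    bigRuleList = []
    # L[0] holds single-item sets, which cannot form a rule, so start at index 1
    for i in range(1, len(L)):
        for freqSet in L[i]:
            H1 = [frozenset([item]) for item in freqSet]
            # Evaluate single-item consequents first, for itemsets of every size,
            # then let rulesFromConseq try larger consequents built from the survivors
            H1 = calcConf(freqSet, H1, supportData, bigRuleList, minConf)
            if i > 1 and len(H1) > 1:
                rulesFromConseq(freqSet, H1, supportData, bigRuleList, minConf)
    return bigRuleList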


