Apriori algorithm example in Python: implementing Apriori in Python | 学步园

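This article presents a Python implementation of the Apriori algorithm for finding frequent itemsets and association rules in transaction data. It first loads the data, then recursively builds larger candidate itemsets with the Apriori algorithm and checks each one against the original transactions. The main topics are data preprocessing, frequent itemset computation, and association rule mining.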

import sys

# apriori() below calls itself once per itemset size; the limit is raised as in the original script.
sys.setrecursionlimit(1000000)

'''
author liuzhenhua
date   20131113
the apriori algorithm for python
'''

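# original: transaction ID -> list of items in that transaction
original = {}
# mp: item -> number of occurrences across all transactions
mp = {}

def loaddata(dgree):
    """Read D:/apriori.txt and return the counts of items seen more than dgree times."""
    with open("D:/apriori.txt", "r") as f:
        for st in f:
            # Each line is "<transaction id>\t<item>#<item>#..."
            strs = st.rstrip("\n").split("\t")
            if len(strs) < 2:
                continue  # skip blank or malformed lines
            lvals = strs[1].split("#")
            original[strs[0]] = lvals
            for word in lvals:
                mp[word] = mp.get(word, 0) + 1
    # Drop infrequent items; iterate over a copy because the dict is modified.
    # The original test was "== dgree", which is only correct when dgree is 1.
    for h in list(mp.keys()):
        if mp[h] <= dgree:
            del mp[h]
    # Disabled in the original: keep only the first int(len(mp) * dgree) items.
    return mp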

def isContain(list1, list2):
    """Return True if every element of list2 also appears in list1."""
    return all(m in list1 for m in list2)

def isOriginal(ori, list1):
    """Return True if at least one transaction in ori contains every item of list1."""
    return any(isContain(items, list1) for items in ori.values())

'''
sz is the size of the frequent itemsets to return
dgree is the support threshold: itemsets that occur dgree times or fewer are not frequent
'''
def apriori(dic, dicty, ori, sz, dgree):
    jie = {}     # candidate itemset (sorted, joined item names) -> support count
    dup = set()  # candidates that have already been counted
    # Item names are assumed to be single characters: list(key) splits a key back into items.
    for single in dicty:
        for itemset in dic:
            if isContain(list(itemset), list(single)):
                continue  # the item is already part of this itemset
            teml = sorted(single + itemset)
            tem = ''.join(teml)
            if tem in dup:
                continue
            dup.add(tem)
            # Count the transactions that contain every item of the candidate.
            for cc in ori:
                if isContain(ori[cc], teml):
                    jie[tem] = jie.get(tem, 0) + 1
    # Prune infrequent candidates (the original tested "== dgree").
    for d in list(jie.keys()):
        if jie[d] <= dgree:
            del jie[d]
    nu = stop(jie)
    print("frequency item:", nu, "items:", jie)
    # Stop at the requested size, or when no candidate is frequent any more
    # (the original kept recursing in that case).
    if nu == sz or not jie:
        return jie
    return apriori(jie, dicty, ori, sz, dgree)

def stop(res):
    """Return the length of the first key in res (the current itemset size), or 0 if res is empty."""
    for rh in res:
        return len(rh)
    return 0

bp = loaddata(1)
print("the original data:", original)
print("frequency item:", 1, "items:", bp)
apriori(bp, bp, original, 3, 1)
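
The listing above stores each itemset as a string of single-character item names and grows it by appending one frequent item at a time. As a rough alternative sketch, the same level-wise search can be written with `frozenset` keys, which also allows multi-character item names. The function name `frequent_itemsets` and the `min_count` parameter are assumptions made for illustration and are not part of the original script; `min_count=2` plays the role of `dgree = 1` (an itemset must occur more than once).

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Return {size: {itemset: count}} for itemsets found in at least min_count transactions."""
    baskets = [frozenset(t) for t in transactions]

    def count(candidates):
        # Support count = number of baskets that contain the whole candidate.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    level = count({frozenset([item]) for b in baskets for item in b})
    result, size = {}, 1
    while level:
        result[size] = level
        size += 1
        # Join step: union pairs of frequent itemsets that yield a set of the next size.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size - 1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
        level = count(candidates)
    return result


# The four sample transactions used in this article.
transactions = [["A", "C", "D"], ["B", "C", "E"], ["A", "B", "C", "E"], ["B", "E"]]
found = frequent_itemsets(transactions, min_count=2)
for size, itemsets in sorted(found.items()):
    print(size, {"".join(sorted(s)): n for s, n in itemsets.items()})
```

On the sample data the only frequent itemset of size 3 is {B, C, E}, with a count of 2. The loop ends by itself once a level has no frequent itemsets, so no target size has to be passed in.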

Contents of apriori.txt (the transaction ID and the item list are separated by a tab):

10	A#C#D

20	B#C#E

30	A#B#C#E

40	B#E
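
The summary mentions association rules, while the script itself stops at frequent itemsets. One possible way to derive rules from a frequent itemset is to split it into an antecedent and a consequent and keep the splits whose confidence, support(itemset) / support(antecedent), is high enough. The sketch below does this for the four sample transactions; the names `support_count` and `rules_from` and the 0.7 confidence threshold are assumptions chosen for illustration, not taken from the original.

```python
from itertools import combinations

# The four sample transactions, hard-coded so the example is self-contained.
transactions = [set("ACD"), set("BCE"), set("ABCE"), set("BE")]


def support_count(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)


def rules_from(itemset, min_confidence):
    """List (antecedent, consequent, confidence) rules derived from one frequent itemset."""
    items = sorted(itemset)
    whole = support_count(items)
    rules = []
    for r in range(1, len(items)):
        for antecedent in combinations(items, r):
            consequent = tuple(i for i in items if i not in antecedent)
            # Confidence of "antecedent -> consequent".
            confidence = whole / support_count(antecedent)
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, confidence))
    return rules


for antecedent, consequent, confidence in rules_from("BCE", min_confidence=0.7):
    print(",".join(antecedent), "->", ",".join(consequent), round(confidence, 2))
# prints:
# B,C -> E 1.0
# C,E -> B 1.0
```

For example, {B, C} appears in two transactions and {B, C, E} in the same two, so the rule B,C -> E has confidence 2 / 2 = 1.0, while B -> C,E has confidence 2 / 3 and is dropped at this threshold.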
