python实现apriori_Apriori算法的Python实现

Apriori算法是数据挖掘中频发模式挖掘的鼻祖,从60年代就开始流行,其算法思想也十分简单朴素,首先挖掘出长度为1的频繁模式,然后k=2

将这些频繁模式合并组成长度为k的频繁模式,算出它们的频繁次数,而且要保证其所有k-1长度的子集也是频繁的,值得注意的是,为了避免重复,合并的时候,只合并那些前k-2个字符都相同,而k-1的字符一边是少于另一边的。

以下是算法的Python实现:__author__ = 'linfuyuan'

min_frequency = int(raw_input('please input min_frequency:'))

file_name = raw_input('please input the transaction file:')

transactions = []

def has_infrequent_subset(candidate, Lk):

for i in range(len(candidate)):

subset = candidate[:-1]

subset.sort()

if not ''.join(subset) in Lk:

return False

lastitem = candidate.pop()

candidate.insert(0, lastitem)

return True

def countFrequency(candidate, transactions):

count = 0

for transaction in transactions:

if transaction.issuperset(candidate):

count += 1

return count

with open(file_name) as f:

for line in f.readlines():

line = line.strip()

tokens = line.split(',')

if len(tokens) > 0:

transaction = set(tokens)

transactions.append(transaction)

currentFrequencySet = {}

for transaction in transactions:

for item in transaction:

time = currentFrequencySet.get(item, 0)

currentFrequencySet[item] = time + 1

Lk = set()

for (itemset, count) in currentFrequencySet.items():

if count >= min_frequency:

Lk.add(itemset)

print ', '.join(Lk)

while len(Lk) > 0:

newLk = set()

for itemset1 in Lk:

for itemset2 in Lk:

cancombine = True

for i in range(len(itemset1)):

if i < len(itemset1) - 1:

cancombine = itemset1[i] == itemset2[i]

if not cancombine:

break

else:

cancombine = itemset1[i] < itemset2[i]

if not cancombine:

break

if cancombine:

newitemset = []

for char in itemset1:

newitemset.append(char)

newitemset.append(itemset2[-1])

if has_infrequent_subset(newitemset, Lk) and countFrequency(newitemset, transactions) >= min_frequency:

newLk.add(''.join(newitemset))

print ', '.join(newLk)

Lk = newLk

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值