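Starting from the Eclat implementation in this blog post (https://www.cnblogs.com/infaraway/p/6774521.html) and making a few modifications, I ended up with the following Python code for the dEclat algorithm, which I am sharing here: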
def dEclat(prefix, diffsets, transaction_num, min_support, freq_items):
    '''
    Input:
        prefix: the items of the current prefix, type: list
        diffsets: (item, diffset) pairs, where each diffset is the set of IDs
            of the transactions that do NOT contain the item,
            type: list of tuples, example: [(item_1, {diffid1, diffid4}), ...]
        transaction_num: the number of transactions, type: int
        min_support: the minimum support, compared against the absolute
            support count, type: int or float
        freq_items: frequent itemsets found so far,
            type: dict, example: {itemset_1: [sup, relative_sup], ...}
    Output:
        freq_items: frequent itemsets,
            type: dict, example: {itemset_1: [sup, relative_sup], ...}
    '''
    while diffsets:
        # fetch an item and its diffset
        item, diffset = diffsets.pop()
        # support of the itemset = number of transactions minus the diffset size
        key_support = transaction_num - len(diffset)
        if key_support >= min_support:
            # record the itemset with its absolute and relative support
            freq_items[frozenset(sorted(prefix + [item]))] = [key_support, round(key_support / transaction_num, 2)]
            suffix = []  # candidate extensions of the current itemset
            for other_item, other_diffset in diffsets:
                # diffset of the current itemset combined with the other item
                new_diffset = diffset.union(other_diffset)
                # keep the combination only if its own support (computed from
                # new_diffset, not from diffset) reaches the minimum support
                if (transaction_num - len(new_diffset)) >= min_support:
                    suffix.append((other_item, new_diffset))
            # find frequent itemsets that start with the current itemset
            dEclat(prefix + [item], sorted(suffix, key=lambda pair: len(pair[1]), reverse=True), transaction_num, min_support, freq_items)
    return freq_items
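
The function above appears to rest on two identities, written out here as a sketch. The notation is my own and does not come from the referenced post: $T$ is the set of all transaction IDs, $N = |T|$ corresponds to `transaction_num`, $t(X)$ is the tidset of itemset $X$, and $d(X)$ is its diffset. This reading assumes that each diffset is taken relative to the whole database, which is what `transaction_num - len(diffset)` suggests. The original dEclat formulation instead defines diffsets relative to the current prefix.

```latex
\begin{align*}
d(X) &= T \setminus t(X), \\
\operatorname{sup}(X) &= |t(X)| = N - |d(X)|, \\
t(XY) &= t(X) \cap t(Y), \\
d(XY) &= T \setminus \bigl(t(X) \cap t(Y)\bigr) = d(X) \cup d(Y), \\
\operatorname{sup}(XY) &= N - |d(X) \cup d(Y)|.
\end{align*}
```

The fourth line is De Morgan's law, which is why the code combines diffsets with `union` where plain Eclat intersects tidsets.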
Note: apart from the diffsets, the data passed to the function can be prepared in the same way as in the blog post referenced above.
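
Since the diffsets are the one input that differs, here is a minimal sketch of how they might be built from raw transactions. The helper name `build_diffsets`, the use of list indices as transaction IDs, and the sample data are all assumptions made for illustration. None of them come from the referenced post.

```python
def build_diffsets(transactions):
    """Return a list of (item, diffset) tuples.

    Each diffset is the set of IDs of the transactions that do NOT contain
    the item. Transaction IDs are assumed to be the list indices.
    """
    all_tids = set(range(len(transactions)))

    # First collect the ordinary tidsets: item -> IDs of transactions containing it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    # The diffset is the complement of the tidset within all transaction IDs.
    diffsets = [(item, all_tids - tids) for item, tids in tidsets.items()]

    # Use the same ordering as the recursive call: largest diffset first.
    diffsets.sort(key=lambda pair: len(pair[1]), reverse=True)
    return diffsets


# Hypothetical sample data: four transactions over the items a, b, c.
transactions = [
    ['a', 'b', 'c'],
    ['a', 'c'],
    ['b', 'c'],
    ['a', 'b', 'c'],
]
diffsets = build_diffsets(transactions)
print(diffsets)
```

With this input, a call would look like `dEclat([], diffsets, len(transactions), 2, {})`, where `2` is an absolute minimum support count. Note that `dEclat` consumes the list through `pop()`, so pass a copy if the diffsets are needed afterwards.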