Python market basket analysis for large transaction data

Latest recommended article published on 2023-01-16 09:33:26

weixin_39812142


Article tags: python market

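The apriori algorithm takes a list of lists, where each inner list is one transaction. Are you passing a list of transactions? For example: transactions = [['milk', 'bread', 'water'], ['coffee', 'sugar'], ['burgers', 'eggs']]

That is a list of transactions, and you can then pass it to apriori.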

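If the data starts out as flat (transaction ID, item) rows rather than one list per transaction, it has to be grouped first. Here is a minimal sketch in plain Python. The rows and the helper name `rows_to_transactions` are hypothetical, made up for illustration; the result has the list-of-lists shape described above and can be handed to whichever apriori implementation is in use.

```python
from collections import defaultdict


def rows_to_transactions(rows):
    """Group (transaction_id, item) pairs into one item list per transaction."""
    grouped = defaultdict(list)
    for transaction_id, item in rows:
        # Skip repeated items within one transaction; only presence matters here.
        if item not in grouped[transaction_id]:
            grouped[transaction_id].append(item)
    # Dicts preserve insertion order, so transactions come out in first-seen order.
    return list(grouped.values())


# Hypothetical flat export: one row per purchased item.
rows = [
    (1, "milk"), (1, "bread"), (1, "water"),
    (2, "coffee"), (2, "sugar"),
    (3, "burgers"), (3, "eggs"), (3, "eggs"),
]
transactions = rows_to_transactions(rows)
print(transactions)
```

Each inner list is one transaction, and the duplicate "eggs" row in transaction 3 is collapsed to a single entry.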

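As for the minimum support threshold and the time the apriori algorithm takes to return results: with a small minimum support value there will be a lot of association rules, so the algorithm needs time to compute them. This is one of the well-known limitations of the algorithm.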
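To make the effect of the threshold concrete, here is a small brute-force sketch on toy data. The data and the helper name `count_frequent_itemsets` are assumptions for illustration, and this is not apriori itself: it simply counts how many itemsets of up to three items reach a given minimum support.

```python
from itertools import combinations


def count_frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force count of itemsets (up to max_size items) meeting min_support."""
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))
    n = len(baskets)
    count = 0
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            # Support = fraction of transactions containing every item of the candidate.
            support = sum(1 for b in baskets if b.issuperset(candidate)) / n
            if support >= min_support:
                count += 1
    return count


# Toy data, made up for illustration.
toy = [
    ["milk", "bread", "eggs"],
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread", "eggs", "sugar"],
]

for threshold in (0.75, 0.5, 0.25):
    print(threshold, count_frequent_itemsets(toy, threshold))
```

On these four transactions the count goes from 3 to 6 to 11 as the threshold drops from 0.75 to 0.25. On real data with thousands of distinct items the growth is far steeper.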

Comprehensive explanations of how the apriori algorithm works are available elsewhere; some of the highlights are:

Apriori uses a "bottom-up" approach, where frequent subsets are extended one item at a time (known as candidate generation). Then groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found.

Apriori uses breadth-first search and a hash tree structure to count candidate itemsets efficiently. It generates candidate itemsets of length k from itemsets of length k-1. Then it prunes the candidates that have an infrequent subpattern. According to the downward closure lemma, the candidate set contains all frequent k-length itemsets. After that, it scans the transaction database to determine frequent itemsets among the candidates.
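The level-wise procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation of any particular library. The function name `apriori_frequent_itemsets` and the toy data are assumptions, and candidates are counted with plain loops instead of a hash tree to keep the sketch short.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support_of(candidates):
        # One pass over the data counts every candidate of the current length.
        counts = {c: 0 for c in candidates}
        for basket in baskets:
            for c in candidates:
                if c <= basket:
                    counts[c] += 1
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    # Level 1: frequent single items.
    current = support_of({frozenset([item]) for b in baskets for item in b})
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Pruning (downward closure): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Database scan: keep only the candidates that meet min_support.
        current = support_of(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Toy data, made up for illustration.
toy = [
    ["milk", "bread", "eggs"],
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread", "eggs", "sugar"],
]
result = apriori_frequent_itemsets(toy, min_support=0.5)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

With `min_support=0.5` the loop stops at k = 3: the only surviving candidate, {milk, bread, eggs}, appears in one of the four transactions and is dropped, so no further extensions are possible.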

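As we can see, for datasets with a large number of frequent items or a low support value, the candidate itemsets will always be very large.

These large sets require a lot of memory to store. In addition, the apriori algorithm scans the database multiple times to count the frequency of the itemsets among the k-itemset candidates. As a result, the apriori algorithm can be very slow and inefficient, mainly when memory capacity is limited and the number of transactions is large.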
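A quick piece of arithmetic shows how fast the candidate sets can grow. Assuming n single items are frequent, at most C(n, k) candidates of length k can be formed from them. This is a worst-case upper bound, since pruning usually removes many of them. The helper name `max_candidates` is hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def max_candidates(num_frequent_items, k):
    """Upper bound on k-item candidates that can be built from the frequent single items."""
    return comb(num_frequent_items, k)


for n in (10, 100, 1000):
    # Worst case: every pair / triple of frequent items becomes a candidate.
    print(n, max_candidates(n, 2), max_candidates(n, 3))
```

With 1,000 frequent items the bound is already 499,500 pairs and 166,167,000 triples, each of which has to be counted against every transaction.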

For example, I ran the apriori algorithm on 25,900 transactions with a min_support value of 0.004. The algorithm took about 2.5 hours to produce its output.
