Analyzing the trie implementation of the Apriori algorithm

Latest recommended article published on 2022-11-24 21:06:06

wineceramic

Tags: algorithms, databases, data analysis, delete, tree, chat

Article link: https://blog.csdn.net/wineceramic/article/details/666207

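Put simply, a trie is an ordered tree whose ordering can be alphabetical or numeric, and it is recursive. This structure both compresses strings that share a prefix very heavily and makes merging subtrees easy. Generating the Apriori candidates and the pruning that follows can then be done on a single tree. The algorithm and code come from the paper "A fast APRIORI implementation" by Ferenc Bodon. He originally said in another article that the database, once read into memory, would also be stored in a trie, so that the sample database could be shrunk after each round of k-itemset generation, but because of "lack of time" this was never done -_-! Well, what can I say, that is academic research for you: getting the paper out is what counts. The implementation? Ahem, time to hire a grad student, perhaps?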
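To make the "candidates and pruning on one tree" idea concrete, here is a minimal Python sketch. It is not Bodon's code: the names `Node`, `count_basket`, `prune`, `generate`, `collect` and `apriori` are made up for illustration, items are assumed to be comparable (numbers or strings), and no attempt is made at the paper's optimizations.

```python
from itertools import combinations


class Node:
    """One trie node; the path from the root to it spells a sorted itemset."""

    def __init__(self):
        self.children = {}  # item -> Node
        self.count = 0      # support counter of the itemset ending here


def count_basket(node, basket, start, depth_left):
    """Increment the counter of every depth-k itemset contained in a sorted basket."""
    if depth_left == 0:
        node.count += 1
        return
    for i in range(start, len(basket)):
        child = node.children.get(basket[i])
        if child is not None:
            count_basket(child, basket, i + 1, depth_left - 1)


def prune(node, depth_left, min_support):
    """Delete the depth-k leaves whose support is below min_support."""
    for item in list(node.children):
        child = node.children[item]
        if depth_left > 1:
            prune(child, depth_left - 1, min_support)
        elif child.count < min_support:
            del node.children[item]


def contains(root, itemset):
    """Return True if the sorted itemset is a path starting at the root."""
    node = root
    for item in itemset:
        node = node.children.get(item)
        if node is None:
            return False
    return True


def generate(node, prefix, depth_left, root):
    """Join sibling depth-k leaves into (k+1)-candidates, stored in the same trie."""
    if depth_left > 1:
        for item, child in node.children.items():
            generate(child, prefix + (item,), depth_left - 1, root)
        return
    items = sorted(node.children)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            cand = prefix + (a, b)
            # Apriori property: every k-subset must already be frequent.
            if all(contains(root, s) for s in combinations(cand, len(cand) - 1)):
                node.children[a].children[b] = Node()


def collect(node, prefix, depth_left):
    """Return {itemset: support} for all itemsets stored at the given depth."""
    if depth_left == 0:
        return {prefix: node.count}
    found = {}
    for item, child in node.children.items():
        found.update(collect(child, prefix + (item,), depth_left - 1))
    return found


def apriori(baskets, min_support):
    """Level-wise Apriori: count, prune and extend, all on one trie."""
    baskets = [sorted(set(b)) for b in baskets]
    root = Node()
    for basket in baskets:  # 1-item candidates
        for item in basket:
            root.children.setdefault(item, Node())
    frequent, k = {}, 1
    while True:
        for basket in baskets:
            count_basket(root, basket, 0, k)
        prune(root, k, min_support)
        level = collect(root, (), k)
        if not level:
            return frequent
        frequent.update(level)
        generate(root, (), k, root)
        k += 1
```

Calling `apriori(baskets, min_support)` with a list of item lists returns a dict that maps each frequent itemset (as a sorted tuple) to its support count. Because infrequent leaves are deleted before the next level is generated, the subset check in `generate` is just a path lookup in the same tree.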

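The reason I came back to this implementation is a chat with a classmate about how there are now laptops on the market with 2T of memory and a 3T hard disk that boot into XP in an instant, for roughly 10,000 dollars. So now that hardware is this cheap, what should we do with it? Compute things, of course! Large-scale data analysis that used to be possible only for a government bureaucracy or a university can now be done by a hacker at home. Ten thousand dollars is expensive, but far cheaper than a minicomputer. Imagine using one T to hold the trie of a database, and suppose the compression ratio reaches ten (nothing unusual if the data is personal names): that is equivalent to 10995116277760 bytes, roughly 11 trillion bytes of raw data, or trillions of characters depending on the encoding, which is enough for any analysis you like. From this angle, the limits on computing's progress show up more and more in software.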
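The byte figure above can be checked in a couple of lines of Python. This is only a back-of-the-envelope sketch: it assumes "T" means 2^40 bytes and takes the compression ratio of ten from the text as given; the variable names are made up for illustration.

```python
TIB = 2 ** 40                  # assumption: one "T" is 2**40 bytes
memory_bytes = 1 * TIB         # memory set aside for the database trie
compression_ratio = 10         # the ratio supposed in the text

# Amount of raw, uncompressed data that the trie would represent.
effective_bytes = memory_bytes * compression_ratio
print(effective_bytes)
```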

A better solution would be to apply
trie, because map does not make use of the fact that two
baskets can have the same prefixes. Hence insertion of a
basket would be faster, and the memory need would be
smaller, since the same prefixes would be stored just once.
Because of the lack of time trie-based basket storing was
not implemented and we do not delete a reduced basket
from the map if it did not contain any candidate during some scan.

Our APRIORI implementation can be further improved
if trie is used to store reduced basket, and a reduced basket
is removed if it does not contain any candidate.
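The trie-based basket storage that the quoted passages describe but leave unimplemented might look roughly like the Python sketch below. It is an illustration under my own assumptions, not anything from the paper: `BasketNode`, `add_basket`, `node_count` and `iter_baskets` are hypothetical names, and only the storage side is shown, not the removal of reduced baskets that contain no candidate.

```python
class BasketNode:
    """Trie node for basket storage; shared prefixes are stored only once."""

    def __init__(self):
        self.children = {}  # item -> BasketNode
        self.ends = 0       # how many baskets end exactly at this node


def add_basket(root, basket):
    """Insert one basket (deduplicated and sorted) into the trie."""
    node = root
    for item in sorted(set(basket)):
        node = node.children.setdefault(item, BasketNode())
    node.ends += 1


def node_count(node):
    """Number of nodes below this one, i.e. the number of items actually stored."""
    return sum(1 + node_count(child) for child in node.children.values())


def iter_baskets(node, prefix=()):
    """Yield (basket, multiplicity) pairs in sorted order."""
    if node.ends:
        yield prefix, node.ends
    for item in sorted(node.children):
        yield from iter_baskets(node.children[item], prefix + (item,))


root = BasketNode()
for basket in [[1, 2, 3], [1, 2, 4], [1, 2], [1, 2, 3]]:
    add_basket(root, basket)
```

In this small example the four baskets hold 11 items in total, while the trie needs only 4 nodes because every basket starts with the prefix 1, 2; identical baskets collapse into one path with a multiplicity counter.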
