The Core of Big Data Solutions: Map Technology

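Maps are everywhere in C++ programs, and the map is often the bottleneck that limits a program's performance. This is especially true with big data, and in cases where the business logic is so tightly coupled that the data cannot be distributed or processed in parallel. There, map performance becomes the most critical piece of technology.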

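Throughout my work in the telecommunications and information security industries I have dealt with low-level big data. The data in information security is the most complex of all, and none of it can be handled without a map.
Examples: lookups in IP tables, MAC tables, telephone number tables, DNS resolution tables and ID card number tables, cloud-based scanning for virus and trojan signatures, and so on.
The STL map finds keys by binary search over an ordered tree and has the worst performance. Google's hash map currently offers the best speed and memory use, but it carries a probability of collisions. Big data applications today mostly avoid maps that have any chance of collision.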
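To make the comparison above concrete, here is a minimal sketch of an IP-table lookup against the two standard containers. It uses `std::unordered_map` only as a stand-in for a hash map (it is not Google's implementation), and the helper names `make_ipv4`, `count_hits` and `time_lookups_us` are made up for this illustration. Note that a standard hash table still compares the full key on lookup, so collisions cost time rather than producing wrong answers. Timings depend heavily on machine, compiler flags and data, so none are quoted here.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: pack a dotted IPv4 address into a 32-bit key.
inline std::uint32_t make_ipv4(unsigned a, unsigned b, unsigned c, unsigned d) {
    assert(a < 256 && b < 256 && c < 256 && d < 256);
    return (a << 24) | (b << 16) | (c << 8) | d;
}

// Count how many queried addresses are present in a map-like container.
// Works for both std::map (ordered tree) and std::unordered_map (hash table).
template <typename Map>
std::size_t count_hits(const Map& table,
                       const std::vector<std::uint32_t>& queries) {
    std::size_t hits = 0;
    for (std::uint32_t ip : queries) {
        if (table.find(ip) != table.end()) ++hits;
    }
    return hits;
}

// Time one pass of lookups in microseconds and report the hit count
// through hits_out. The numbers are only meaningful relative to each
// other on the same machine and the same data.
template <typename Map>
long long time_lookups_us(const Map& table,
                          const std::vector<std::uint32_t>& queries,
                          std::size_t& hits_out) {
    const auto start = std::chrono::steady_clock::now();
    hits_out = count_hits(table, queries);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
        .count();
}
```

Filling a `std::map<std::uint32_t, std::string>` and a `std::unordered_map<std::uint32_t, std::string>` with the same addresses and calling `time_lookups_us` on each gives a rough first comparison. Both must report the same hit count.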

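I am now publishing the pwwMap algorithm. You can test and compare it yourself: my algorithm has zero probability of collision, yet it outperforms hash algorithms. Even the general-purpose map performs about on par with Google's.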

The most direct benefit of using my map is that a solution that used to need ten servers now needs only one.

Download links (the second one carries the continuously updated code):
http://download.csdn.net/detail/pww71/9379828
http://sourceforge.net/projects/pwwhashmap/?source=navbar

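pwwMap contains three kinds of map.
1. memmap: supports insert, update, delete, query and other everyday operations. Suitable for any scenario. It uses a distinctive indexing technique, and its performance and memory efficiency are more than 100 times better than the STL map's.
2. hashmap: supports queries only. Suitable for high-speed lookup scenarios. It is a perfect hash algorithm with no probability of collision, and it is 100 times faster than the current Google hash algorithm.
3. diskmap: supports insert, update, delete, query and other everyday operations, with the functionality of a single-machine NoSQL store. It can hold up to tens of billions of records while query performance stays strong. Compared with Google's leveldb, its advantage is clear.
In my opinion, maps 1 and 2 are the core. Map 3 mainly uses map 1 as its index, which is why its performance surpasses Google's leveldb.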
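For readers unfamiliar with the idea behind a query-only, collision-free map, the following is a minimal sketch of a static perfect hash table. It is not the pwwMap algorithm, whose internals are not described here. It uses a textbook trick instead: for a fixed set of n distinct keys, keep trying seeds for a hash function over n*n slots until every key lands in its own slot (the same idea FKS hashing applies to its second-level buckets). The class name `StaticPerfectTable`, the mixing function and the seed limit are all assumptions made for this illustration, and the quadratic space makes it practical only for small key sets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Illustrative read-only table. NOT the pwwMap algorithm.
class StaticPerfectTable {
public:
    // Build from distinct keys. Searches for a seed that puts every key
    // in a slot of its own, so lookups never have to resolve a collision.
    explicit StaticPerfectTable(
        const std::vector<std::pair<std::uint64_t, int>>& items)
        : slots_(items.empty() ? 1 : items.size() * items.size()) {
        for (seed_ = 1; seed_ < kMaxSeeds; ++seed_) {
            if (try_place(items)) return;
        }
        throw std::runtime_error("no collision-free seed found (duplicate keys?)");
    }

    // Lookup: one hash, one slot read, one key comparison.
    std::optional<int> find(std::uint64_t key) const {
        const Slot& s = slots_[index(key)];
        if (s.used && s.key == key) return s.value;
        return std::nullopt;
    }

private:
    struct Slot {
        bool used = false;
        std::uint64_t key = 0;
        int value = 0;
    };
    static constexpr std::uint64_t kMaxSeeds = 100000;  // arbitrary limit

    // Seeded 64-bit mixer (splitmix64-style finalizer), reduced to a slot.
    std::size_t index(std::uint64_t key) const {
        assert(!slots_.empty());
        std::uint64_t x = key + seed_ * 0x9E3779B97F4A7C15ULL;
        x ^= x >> 30; x *= 0xBF58476D1CE4E5B9ULL;
        x ^= x >> 27; x *= 0x94D049BB133111EBULL;
        x ^= x >> 31;
        return static_cast<std::size_t>(x % slots_.size());
    }

    // Try the current seed; fail as soon as two keys want the same slot.
    bool try_place(const std::vector<std::pair<std::uint64_t, int>>& items) {
        for (Slot& s : slots_) s = Slot{};
        for (const auto& [key, value] : items) {
            Slot& s = slots_[index(key)];
            if (s.used) return false;
            s = Slot{true, key, value};
        }
        return true;
    }

    std::vector<Slot> slots_;
    std::uint64_t seed_ = 0;
};
```

Once built, the table cannot accept inserts or deletes without a rebuild, which mirrors the query-only restriction described for hashmap above. Production perfect-hash schemes reach linear space with more elaborate constructions.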

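My algorithm is a perfect hash algorithm. Its key indexing and compression work on principles unlike other approaches: the underlying structure is completely different, so the way key indexes are compressed differs fundamentally. For background, see the following article: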
http://blog.csdn.net/chixinmuzi/article/details/1727195

You are welcome to join the QQ technical group: 14773392

Title: The core of big data solutions: Map
Author: pengwenwei
Address: No.17-18 of XiangGangbatang Community, Xiangtan City of Hunan Province, China
Language: C++
Platform: Windows, Linux
Technology: Perfect hash algorithm
Level: Advanced
Description: A high-performance map algorithm
Section: MFC, C++, map, STL
SubSection: C++ algorithm
License: GPLv3

Maps are widely used in C++ programs, and their performance is critical to a program's performance, especially with big data and in scenarios where data distribution and parallel processing are not possible.

I have worked on big data analysis for many years in the telecommunications and information security industries. The analysis is so complicated that it cannot be done without a map. In information security especially, the data is far more complicated than elsewhere: IP tables, MAC tables, telephone number tables, DNS tables and so on.

Currently the STL map and Google's hash map are the most popular maps, but both have disadvantages. The STL map is based on binary search over an ordered tree, which gives poor performance. Google's hash map has the best performance at present, but it has a probability of collision. For big data analysis, that collision probability is unacceptable.

Now I would like to publish pwwMap. It includes three different maps for different scenarios:
1. Memory map (memMap): it has good access speed, but its size is limited by available memory.
2. Hard disk map (diskMap): it stores data on the hard disk, so it can hold much more data than the memory map.
3. Hash map (hashMap): it has the best performance and very fast lookups, but it has no 'insert' or 'delete' functionality.

A memMap or diskMap can be converted to a hashMap with the functions memMap2HashMap and diskMap2HashMap. According to the test results, my algorithms' collision probability is zero. As for performance, memMap is comparable to Google's hash map, and hashMap is 100 times faster than Google's hash map.

In summary, the pwwhash algorithms are perfect hash algorithms with zero collision probability. For the theory behind the key index and compression algorithm, refer to the following article: http://blog.csdn.net/chixinmuzi/article/details/1727195

Source code and documents: https://sourceforge.net/projects/pwwhashmap/files/?source=navbar