Anthill: 一种基于MapReduce的分布式DBMS

MapReduce is a parallel computing model proposed by Google for large data sets, it’s proved to have high availability, good scalability and fault tolerance, and has been widely used in recent years. However, there is a voice from traditional database community, D. J. DeWitt et al argue that MapReduce loses the performance and efficiency from DBMS, and it’s a step backwards of data analysis techniques. After that, HadoopDB article presents a new type of MapReduce based distributed database implementation, but it has the following disadvantages: (1) It doesn’t add any query execution engine for Hadoop client, HadoopDB is only a pilot project; (2) HadoopDB uses PostgreSQL as its underlying database, where redundancies and backups which Hadoop’s underlying file system HDFS provides is short of, thus it loses Hadoop’s original high availability; (3) There is not a complete partition mechanism, its tables are manually partitioned, this is not practical; (4) Table joining in HadoopDB is assumed in the ideal state, that two tables’ partitions are on the same node, but it’s not the case in the real world.
This paper overcomes the above problems, our implementation named Anthill can keep both advantages of shared-nothing MPP architecture databases and the MapReduce. Experiments show that: (1) Anthill provides better scalability than MPP databases, it can deploy more than 100 nodes, even 500 nodes; (2) Since Anthill uses the column-oriented database MonetDB as its underlying database, where I/O is effectively reduced, it achieves better performance than Hadoop and HadoopDB; (3) Anthill adds a complete query execution engine with parser, optimizer and planner, thus it can more intelligently handle the diversity of data and produce better query plans compared to HadoopDB. In short, Anthill is a new type of distributed database system with high commercial values.

Keywords: Distributed Database; Anthill; MapReduce; Hadoop

[flash=425,355]http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=anthill-100511101418-phpapp01&stripped_title=anthill-a-distributed-dbms-based-on-mapreduce[/flash]
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值