HBase itself already provides real-time query features, such as Bloom filters. So what would adding MR (MapReduce) on top of it buy us?
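Some context on why Bloom filters help real-time reads: HBase can keep a Bloom filter for each store file, so a point lookup can skip files that definitely do not contain the requested row key. The Java sketch below is a minimal, generic Bloom filter. It assumes double hashing over the UTF-8 bytes of the row key, and the class name `BloomFilterSketch` and the hash choices are illustrative assumptions, not HBase's internal implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical, minimal Bloom filter (not HBase's implementation).
// It answers "definitely absent" or "possibly present" for a row key.
class BloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomFilterSketch(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 32-bit FNV-1a hash, used as the second base hash (assumed choice).
    private static int fnv1a(byte[] key) {
        int hash = 0x811C9DC5;
        for (byte b : key) {
            hash ^= (b & 0xff);
            hash *= 0x01000193;
        }
        return hash;
    }

    // Compute the k bit positions for a key by double hashing: h1 + i * h2.
    private int[] positions(String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        int h1 = Arrays.hashCode(key);
        int h2 = fnv1a(key);
        int[] pos = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            pos[i] = Math.floorMod(h1 + i * h2, numBits);
        }
        return pos;
    }

    // Record a row key, e.g. when it is written into a store file.
    void add(String rowKey) {
        for (int p : positions(rowKey)) {
            bits.set(p);
        }
    }

    // false means the key was never added; true may be a false positive.
    boolean mightContain(String rowKey) {
        for (int p : positions(rowKey)) {
            if (!bits.get(p)) {
                return false;
            }
        }
        return true;
    }
}
```

A `false` from `mightContain` is definitive, so a read can skip that file without touching disk. A `true` can be a false positive, which is why the file still has to be checked in that case.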
Using MR to run non-real-time statistical analysis over data stored in HBase is feasible, but an alternative for that already exists: Hive.
This is much like Pig being built on top of Hadoop: the MR business logic is abstracted away.
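To make the "non-real-time statistical analysis" case concrete, here is an in-memory Java sketch of the shape such a job has: the map step sees one scanned row at a time and emits `(category, 1)`, and the reduce step sums the counts per key. Everything here is hypothetical (the `RowCountByCategory` and `Row` classes and the `category` column are invented for illustration). In a real job the map step would be a subclass of HBase's `TableMapper`, wired to a table and a `Scan` via `TableMapReduceUtil.initTableMapperJob`, and Hadoop would perform the shuffle between the two phases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for an MR job over an HBase table.
// Not the Hadoop/HBase API: a real job would extend TableMapper and read rows via a Scan.
class RowCountByCategory {

    // One scanned row: a row key plus the value of one (hypothetical) column.
    static final class Row {
        final String rowKey;
        final String category;

        Row(String rowKey, String category) {
            this.rowKey = rowKey;
            this.category = category;
        }
    }

    // Map phase: for every scanned row, emit (category, 1).
    static List<Map.Entry<String, Integer>> map(List<Row> scannedRows) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (Row row : scannedRows) {
            emitted.add(Map.entry(row.category, 1));
        }
        return emitted;
    }

    // Shuffle + reduce phase: group emitted pairs by key and sum their counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> emitted) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : emitted) {
            totals.merge(kv.getKey(), kv.getValue(), Integer::sum);
        }
        return totals;
    }

    static Map<String, Integer> run(List<Row> scannedRows) {
        return reduce(map(scannedRows));
    }
}
```

Hive lets you express the same aggregation as a `GROUP BY` query instead of hand-writing the map and reduce steps.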
From the HBase documentation:
HBase provides:
- Linear and modular scalability.
- Strictly consistent reads and writes.
- Automatic and configurable sharding of tables
- Automatic failover support between RegionServers.
- Convenient base classes for backing Hadoop MapReduce jobs with HBase tables.
- Easy to use Java API for client access.
- Block cache and Bloom Filters for real-time queries.
- Query predicate push down via server side Filters
- Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options
- Extensible jruby-based (JIRB) shell
- Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX
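The "query predicate push down via server side Filters" item can be illustrated with a small simulation: when the predicate runs next to the data, only matching rows leave the server, whereas filtering on the client ships every row first. The classes below (`FilterPushdownSketch`, `RegionServer`, `Cell`) are hypothetical stand-ins, not HBase code. With the real client API, the predicate is attached to the query through `Scan.setFilter(...)`, for example with a `SingleColumnValueFilter`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical simulation of predicate push-down (not HBase code).
class FilterPushdownSketch {

    // A stored row: row key plus one integer value.
    static final class Cell {
        final String rowKey;
        final int value;

        Cell(String rowKey, int value) {
            this.rowKey = rowKey;
            this.value = value;
        }
    }

    // Simulated server that counts how many rows it sends to the client.
    static final class RegionServer {
        private final List<Cell> rows;
        int rowsSent = 0;

        RegionServer(List<Cell> rows) {
            this.rows = rows;
        }

        // Server-side filtering: the predicate is evaluated where the data lives,
        // so only matching rows are "sent" over the network.
        List<Cell> scan(Predicate<Cell> pushedDownFilter) {
            List<Cell> result = new ArrayList<>();
            for (Cell c : rows) {
                if (pushedDownFilter.test(c)) {
                    result.add(c);
                    rowsSent++;
                }
            }
            return result;
        }
    }

    // Client-side filtering: fetch every row, then filter locally.
    static List<Cell> clientSideFilter(RegionServer server, Predicate<Cell> filter) {
        List<Cell> all = server.scan(c -> true);
        List<Cell> result = new ArrayList<>();
        for (Cell c : all) {
            if (filter.test(c)) {
                result.add(c);
            }
        }
        return result;
    }
}
```

With ten rows holding values 0 through 9 and the predicate `value >= 8`, the pushed-down scan sends 2 rows while the client-side version sends all 10; both end up with the same 2 rows.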