Tutorials for HBase: concepts, architecture, mapreduce, etc.

Latest recommended article published on 2022-12-31 23:23:10

macyang

Category: distributed system. Tags: mapreduce hbase tutorials cassandra schema performance

Article link: https://blog.csdn.net/macyang/article/details/6151570

I still remember my 'column family' aha moment two years ago. It has been quite a challenging journey from RDBMS to BigTable. Here are some good materials to get you started:

Treating a ColumnFamily as a multi-dimensional map is a great way to carry existing knowledge over to a new field. I especially like the way the author explains how rowkey, family, and qualifier work.
HBase schema design model: another concrete example that compares modeling the same data in an RDBMS and in HBase.
WTF is a SuperColumn? An Intro to the Cassandra Data Model
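
To make the "multi-dimensional map" idea concrete, here is a minimal sketch in plain Python. It is only a conceptual model, not the HBase client API: the `put`, `get`, and `scan` functions, the `info` family, and the sample row keys are all made up for illustration, and real HBase works on byte arrays rather than strings. The point is the nesting: rowkey, then family, then qualifier, then timestamp, with rows read back in sorted key order.

```python
# Conceptual model only: plain dicts standing in for an HBase table.
# table[rowkey][family][qualifier][timestamp] -> value
table = {}


def put(rowkey, family, qualifier, value, timestamp):
    """Store one cell version under rowkey / family / qualifier / timestamp."""
    versions = (table.setdefault(rowkey, {})
                     .setdefault(family, {})
                     .setdefault(qualifier, {}))
    versions[timestamp] = value


def get(rowkey, family, qualifier):
    """Return the newest version of a cell, or None if the cell is absent."""
    versions = table.get(rowkey, {}).get(family, {}).get(qualifier)
    if not versions:
        return None
    return versions[max(versions)]


def scan(start_row, stop_row):
    """Yield (rowkey, row) in sorted key order; start inclusive, stop exclusive."""
    for rowkey in sorted(table):
        if start_row <= rowkey < stop_row:
            yield rowkey, table[rowkey]


# Hypothetical sample data: two versions of one cell, plus two other rows.
put("com.example/a", "info", "title", "old", timestamp=1)
put("com.example/a", "info", "title", "new", timestamp=2)
put("com.example/b", "info", "title", "b", timestamp=1)
put("org.example/c", "info", "title", "c", timestamp=1)

latest = get("com.example/a", "info", "title")
com_rows = [rowkey for rowkey, _ in scan("com.", "com/")]
```

Because the rows come back sorted by key, a range such as `"com."` to `"com/"` picks up every row whose key starts with `com.`, which is why row key design matters so much in this model.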

More in-depth information to get started with HBase:

HBase shell and 0.18 programming API: the API usage is a bit out of date, but the concepts are still valid.
Official HBase Architecture: unlike the storage article below, this one focuses on the physical design, such as where data is located. A must-read for serious HBase performance tuning and sound schema design. The "descending" byte order of the physical layout is key to understanding the "pagination" link below.
HBase Architecture: Storage: an in-depth article on how HBase uses HDFS, with details on region server communication.
HBase pagination like SQL's LIMIT/OFFSET: the key is to create a composite row key and use a scanner to return the results within a range, tracked with an old-faithful counter.
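
The pagination idea can be sketched without a cluster. The snippet below is my own rough illustration in plain Python, not code from the linked article and not the HBase scanner API: `make_key` and `scan_page` are hypothetical helpers, the `user|sequence` key layout is an assumption, and a sorted list stands in for the table's ordered row keys. It walks a key range and uses a counter to skip `offset` rows before collecting `limit` rows.

```python
from bisect import bisect_left


def make_key(user, seq):
    """Hypothetical composite row key: user id plus a zero-padded sequence number."""
    return f"{user}|{seq:010d}"


def scan_page(sorted_keys, start_key, stop_key, offset, limit):
    """Walk keys in [start_key, stop_key); skip `offset` rows, return up to `limit`."""
    page = []
    counter = 0  # rows seen so far inside the range
    for i in range(bisect_left(sorted_keys, start_key), len(sorted_keys)):
        key = sorted_keys[i]
        if key >= stop_key:
            break  # left the range, stop scanning
        if counter >= offset:
            page.append(key)
            if len(page) == limit:
                break
        counter += 1
    return page


# Stand-in for a table: five rows each for two made-up users, in sorted order.
keys = sorted(make_key(u, n) for u in ("alice", "bob") for n in range(5))

# "alice|" to "alice}" covers every key with the "alice|" prefix,
# because "}" is the character right after "|".
first_page = scan_page(keys, "alice|", "alice}", offset=0, limit=2)
second_page = scan_page(keys, "alice|", "alice}", offset=2, limit=2)
```

Zero-padding the sequence number keeps string order and numeric order in agreement. Skipping rows with a counter still reads every skipped row, so for deep pages it is usually cheaper to remember the last key of the previous page and start the next scan from there.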

Using HBase with Hadoop MapReduce: