I still remember my 'column family' aha moment two years ago. It has been quite a challenging journey from RDBMS to BigTable. Here are some good materials to get you started:
- Treat ColumnFamily as multi-dimensional maps is a great way to carry existing knowledge over to a new field. I especially like the way the author explains how rowkey, family, and qualifier work.
- HBase schema design model : another concrete example that compares solving the same data model with an RDBMS and with HBase.
- WTF is a SuperColumn? An Intro to the Cassandra Data Model
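To make the "multi-dimensional map" idea concrete, here is a minimal sketch in plain Python. It is not the HBase client API; the names `new_table`, `put`, and `get` are made up for illustration, and the sketch assumes each cell is addressed by rowkey, family, and qualifier, with a timestamp for versions.

```python
from collections import defaultdict

# Conceptual layout only:
#   table[rowkey][family][qualifier][timestamp] = value
def new_table():
    return defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

def put(table, rowkey, family, qualifier, value, timestamp):
    """Store one version of a cell."""
    table[rowkey][family][qualifier][timestamp] = value

def get(table, rowkey, family, qualifier):
    """Return the newest version of a cell, or None if the cell is absent."""
    versions = table.get(rowkey, {}).get(family, {}).get(qualifier, {})
    if not versions:
        return None
    return versions[max(versions)]

# Hypothetical data: two versions of the same cell, plus a second family.
table = new_table()
put(table, "user1", "info", "email", "old@example.com", timestamp=1)
put(table, "user1", "info", "email", "new@example.com", timestamp=2)
put(table, "user1", "stats", "logins", 42, timestamp=1)

latest_email = get(table, "user1", "info", "email")
```

The point of the nesting is that a "column" is not a fixed schema slot: any row can hold any qualifier inside a family, which is why the map picture transfers better than the table picture from an RDBMS.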
More in-depth information to get started with HBase:
- HBase shell and 0.18 programming API : the API usage is a bit out of date, but the concepts are still valid.
- Official HBase Architecture : unlike the one below, this one focuses on the physical design of data location, etc. A must-read for serious HBase performance tuning and sound schema design. The "descending" byte order of the physical layout is key to understanding the "pagination" link below.
- HBase Architecture: Storage : an in-depth article on how HBase uses HDFS, with details on region server communication.
- HBase pagination like SQL's LIMIT/OFFSET : the key is to create a composite key and use a scanner to show the results within a range, using an old-faithful counter.
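The pagination idea can be sketched without HBase at all. The following is a plain-Python simulation, not the HBase scanner API: a sorted list of keys stands in for the table, and `composite_key` and `scan_page` are hypothetical helpers. It assumes keys are kept in sorted order, so a page is simply "start at a key, count rows until the limit".

```python
import bisect

def composite_key(user, seq):
    """Build a composite row key; zero-padding keeps string order equal to numeric order."""
    return f"{user}:{seq:08d}"

def scan_page(sorted_keys, start_key, limit):
    """Return up to `limit` keys >= start_key, plus the key to resume from (or None)."""
    i = bisect.bisect_left(sorted_keys, start_key)
    page = []
    count = 0  # the old-faithful counter
    while i < len(sorted_keys) and count < limit:
        page.append(sorted_keys[i])
        count += 1
        i += 1
    next_start = sorted_keys[i] if i < len(sorted_keys) else None
    return page, next_start

# Hypothetical data: seven rows for one user, read three at a time.
keys = sorted(composite_key("alice", n) for n in range(1, 8))
page1, resume1 = scan_page(keys, composite_key("alice", 0), 3)
page2, resume2 = scan_page(keys, resume1, 3)
page3, resume3 = scan_page(keys, resume2, 3)
```

Instead of an OFFSET, each page remembers the key where the next one should start, so the scan never has to skip over rows it has already shown.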
Use HBase with Hadoop MapReduce: