RDBMS
From the surface down to the internals:
- An ecosystem with many tools (JDBC, JPA, ...)
- Language support: SQL
- Structured data (tables, columns, etc.)
- Normalized to improve integrity and save space, at the cost of joins
- Transactional consistency (a single commit can span multiple tables/objects)
An all-in-one box that saves a lot of the effort of managing ACID yourself.
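As a minimal sketch of the multi-table commit point, the example below uses Python's built-in sqlite3 module as a stand-in for any RDBMS. The accounts and transfers tables and the transfer function are hypothetical names made up for illustration. The credit is applied before the debit on purpose, so that a failed debit shows the earlier credit being rolled back.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Credit dst, debit src, and log the transfer as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            conn.execute(
                "INSERT INTO transfers (src, dst, amount) VALUES (?, ?, ?)",
                (src, dst, amount),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; nothing was committed.
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("CREATE TABLE transfers (src INTEGER, dst INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()

ok = transfer(conn, 1, 2, 60)       # enough funds: both tables are updated
failed = transfer(conn, 1, 2, 60)   # overdraft: the credit to account 2 is undone
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

The application code never has to undo the partial credit itself; the database does it, which is the effort the all-in-one box saves.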
RDBMS Scaling
- Move the database to a dedicated server
- Too many reads: add a cache to reduce read pressure (reads are no longer ACID, since the cache can serve stale data)
- Too many writes: scale up by adding more hardware
- Features getting complicated: complex joins -> denormalize the data to reduce joins
- Writes getting slower and slower: drop indexes and triggers
- Partitioning/sharding
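To illustrate why cached reads fall outside ACID, here is a rough cache-aside sketch. A plain dict stands in for the database, and the class and method names (CachedStore, write_without_invalidation) are hypothetical, chosen only for this example.

```python
class CachedStore:
    """Cache-aside reads in front of a 'database' (a plain dict here)."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def read(self, key):
        # On a miss, load from the database and remember the value.
        if key not in self.cache:
            self.cache[key] = self.db[key]
        return self.cache[key]

    def write_without_invalidation(self, key, value):
        # The database changes but the cache is not told about it.
        self.db[key] = value

    def write(self, key, value):
        # Update the database, then drop the cached copy.
        self.db[key] = value
        self.cache.pop(key, None)

store = CachedStore({"user:1": "alice"})
first = store.read("user:1")                      # miss, loads "alice"
store.write_without_invalidation("user:1", "bob")
stale = store.read("user:1")                      # hit, still the old value
store.write("user:1", "carol")
fresh = store.read("user:1")                      # miss again, loads the new value
```

Even with invalidation on every write, another reader can still see the old value in the window between the database update and the cache eviction, so the cache trades consistency for read throughput.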
Yet it is hard to scale out. Even with sharding/partitioning, significant effort and thought are needed to support the functions that an RDBMS provides natively:
- Finding the right owner to operate on (partition routing)
- Retrieving all necessary information (data locality, master/meta data, application-level joins)
- Transaction consistency (avoid cross-partition transactions, or implement distributed transactions)
This simply means taking what the RDBMS offers and reimplementing it on your own.
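A toy sketch of what that reimplementation looks like, assuming hash-based sharding by user id. Dicts stand in for the individual database servers, and every name here (NUM_SHARDS, shard_for, add_order, ...) is hypothetical.

```python
import zlib

NUM_SHARDS = 4
# Each shard is an independent "database" holding its own users and orders.
shards = [{"users": {}, "orders": []} for _ in range(NUM_SHARDS)]

def shard_for(user_id):
    """Partition routing: a stable hash of the key picks the owning shard."""
    return shards[zlib.crc32(user_id.encode()) % NUM_SHARDS]

def add_user(user_id, name):
    shard_for(user_id)["users"][user_id] = name

def add_order(user_id, item):
    # Data locality: an order is stored on the same shard as its user,
    # so a per-user join never has to leave that shard.
    shard_for(user_id)["orders"].append((user_id, item))

def orders_for(user_id):
    """Single-shard join of users and orders, done in application code."""
    shard = shard_for(user_id)
    name = shard["users"][user_id]
    return [(name, item) for uid, item in shard["orders"] if uid == user_id]

def all_orders():
    """Cross-shard query: scatter to every shard, gather, merge in the app."""
    rows = []
    for shard in shards:
        for uid, item in shard["orders"]:
            rows.append((shard["users"][uid], item))
    return sorted(rows)

add_user("u1", "alice")
add_user("u2", "bob")
add_order("u1", "book")
add_order("u2", "pen")
add_order("u1", "lamp")
```

Routing, joins, and result merging all moved into the application. A write that must touch two users on different shards atomically is not covered here at all; that is where distributed transactions come in.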
HBase
https://mapr.com/blog/in-depth-look-hbase-architecture/
http://hbase.apache.org/book.html
Column Family Oriented Database
- Table -> Row Keys Partition -> Regions
- Region -> Split -> Regions
- Region -> Column Families -> HFiles -> HDFS
- HFiles contain Cells and metadata
- Cell = (Row Key, Column Family, Column Qualifier, Timestamp) -> Value, i.e. a (Key, Value) pair
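The cell model above can be sketched as a small in-memory map. This is only an illustration of the key structure, not HBase's storage code or its client API; CellKey and ToyTable are hypothetical names. It assumes the usual HBase ordering in which newer timestamps sort first within a column.

```python
from typing import NamedTuple

class CellKey(NamedTuple):
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int

class ToyTable:
    """Cells stored as a flat map from CellKey to value."""

    def __init__(self):
        self.cells = {}

    def put(self, row, family, qualifier, timestamp, value):
        # A new timestamp adds a version; it does not overwrite older ones.
        self.cells[CellKey(row, family, qualifier, timestamp)] = value

    def get(self, row, family, qualifier):
        """Return the newest version of one column, or None if absent."""
        versions = [
            (key.timestamp, value)
            for key, value in self.cells.items()
            if (key.row, key.family, key.qualifier) == (row, family, qualifier)
        ]
        if not versions:
            return None
        return max(versions, key=lambda tv: tv[0])[1]

    def scan(self):
        """All cells ordered by row, family, qualifier, then newest first."""
        ordered = sorted(
            self.cells,
            key=lambda k: (k.row, k.family, k.qualifier, -k.timestamp),
        )
        return [(key, self.cells[key]) for key in ordered]

table = ToyTable()
table.put(b"row1", b"cf", b"name", 1, b"alice")
table.put(b"row1", b"cf", b"name", 2, b"alicia")
table.put(b"row0", b"cf", b"name", 1, b"zed")
latest = table.get(b"row1", b"cf", b"name")
scanned = table.scan()
```

Because the timestamp is part of the key, an update is just another (Key, Value) pair, which is what lets the files on disk stay immutable.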
Architectural Components
HBase RegionServers
Serve data for reads and writes, and are colocated with HDFS DataNodes.
HBase HMaster
Handles region assignment and DDL operations.
ZooKeeper
A separate distributed coordination service (not part of HDFS) that maintains live cluster state.
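To tie the components back to the Table -> Regions mapping, here is a rough sketch of how a row key resolves to a region and its RegionServer, and how a split adds a region. It assumes each region covers a contiguous, sorted row-key range. ToyRegionMap and the server names are hypothetical; in a real cluster this mapping is kept in HBase's own metadata, not in a Python list.

```python
import bisect

class ToyRegionMap:
    """Regions as sorted start keys; region i covers [start_i, start_i+1)."""

    def __init__(self, first_server):
        # A new table starts as one region covering the whole key space.
        self.start_keys = [b""]
        self.servers = [first_server]

    def locate(self, row_key):
        """Return (region start key, server) for the region owning row_key."""
        i = bisect.bisect_right(self.start_keys, row_key) - 1
        return self.start_keys[i], self.servers[i]

    def split(self, split_key, new_server):
        """Split the region containing split_key; the upper half moves."""
        i = bisect.bisect_left(self.start_keys, split_key)
        if i < len(self.start_keys) and self.start_keys[i] == split_key:
            raise ValueError("a region already starts at this key")
        self.start_keys.insert(i, split_key)
        self.servers.insert(i, new_server)

regions = ToyRegionMap("rs1")
before = regions.locate(b"zebra")    # the single region serves everything
regions.split(b"m", "rs2")           # keys >= b"m" now belong to a new region
low = regions.locate(b"apple")
high = regions.locate(b"zebra")
boundary = regions.locate(b"m")
```

Since rows are sorted by key, routing is a binary search over region boundaries instead of a hash, which keeps range scans local to a few regions.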