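Stonebraker says in the Red Book that data warehouses now only need column-store databases. I'm not sure whether that's true. He doesn't seem to give a reason.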
Row-store database: each row is serialized as a unit. An index in Row:Column form is then built on a column to make lookups easier.
The primary key is the row id.
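A minimal sketch of the row-store layout described above. The table contents, the function names `serialize_rows` and `build_index`, and the text-based serialization format are all made up for illustration; the index here is assumed to map a column value to its row id, and a real engine would use a binary page format.

```python
# Hypothetical sample table keyed by row id; names and values are invented.
rows = {
    1: ("Smith", "Joe", 60000),
    2: ("Jones", "Mary", 80000),
    3: ("Johnson", "Cathy", 94000),
}


def serialize_rows(table):
    """Row-store layout: all values of one row are adjacent, keyed by row id."""
    return ";".join(
        f"{row_id:03d}:" + ",".join(str(v) for v in values)
        for row_id, values in sorted(table.items())
    )


def build_index(table, column):
    """Secondary index on one column: (value, row id) pairs sorted by value."""
    return sorted((values[column], row_id) for row_id, values in table.items())


row_layout = serialize_rows(rows)
salary_index = build_index(rows, 2)  # index on the third column

print(row_layout)
print(salary_index)
```

Fetching a whole row is one contiguous read here, while a lookup by salary goes through the separate index first and then back to the row by its id.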
Column-store database: each column is serialized together with the row ids. It looks a little like an index.
The primary key is the data.
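A small sketch of the column-store layout, using an invented table and a hypothetical `serialize_columns` helper. Each column is written out as value:row id pairs, which is why it resembles an index. The text format is only an assumption for readability.

```python
# Hypothetical sample table keyed by row id; names and values are invented.
rows = {
    1: ("Smith", "Joe", 60000),
    2: ("Jones", "Mary", 80000),
    3: ("Johnson", "Cathy", 94000),
}


def serialize_columns(table):
    """Column-store layout: one string per column, holding value:row_id pairs."""
    n_cols = len(next(iter(table.values())))
    return [
        ",".join(
            f"{values[c]}:{row_id:03d}" for row_id, values in sorted(table.items())
        )
        for c in range(n_cols)
    ]


columns = serialize_columns(rows)

# An aggregate over one column only touches that column's data.
total_salary = sum(int(pair.split(":")[0]) for pair in columns[2].split(","))

print(columns[2])
print(total_salary)
```

The aggregate reads only the third column and never touches the name columns, which is the usual argument for this layout in analytical workloads.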
For sparse tables, a column store is still the better fit: the many empty columns are simply skipped.
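To make the sparse-table point concrete, here is a rough sketch with an invented table where `None` marks a missing value. The helper `to_columns` is hypothetical; it keeps only the (row id, value) pairs that actually exist, so an entirely empty column stores nothing at all.

```python
# Hypothetical sparse table: None marks a missing value.
sparse_rows = {
    1: {"a": 10, "b": None, "c": None},
    2: {"a": None, "b": None, "c": 7},
    3: {"a": 30, "b": None, "c": None},
}


def to_columns(table):
    """Store each column as (row id, value) pairs, skipping missing values."""
    columns = {}
    for row_id, row in sorted(table.items()):
        for name, value in row.items():
            if value is not None:
                columns.setdefault(name, []).append((row_id, value))
    return columns


cols = to_columns(sparse_rows)

# Cells a naive row layout must represent vs. cells the column layout keeps.
row_cells = sum(len(row) for row in sparse_rows.values())
col_cells = sum(len(pairs) for pairs in cols.values())

print(cols)
print(row_cells, col_cells)
```

Column `b` never appears in the column layout, whereas a naive row layout has to represent every empty cell in some way.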
Partitioning, indexing, caching, views, OLAP cubes, and transactional systems such as write-ahead logging or multiversion concurrency control all dramatically affect the physical organization of either system. That said, online transaction processing (OLTP)-focused RDBMS systems are more row-oriented, while online analytical processing (OLAP)-focused systems are a balance of row-oriented and column-oriented.
Quoted from: wiki page