Today I looked through Google's BigTable slides (2005-10-18: Jeff Dean gave a talk at the University of Washington about Big Table - their system for storing large amounts of data in a semi-structured manner), and nothing in them struck me as particularly new:
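1. Tables are partitioned horizontally, the usual approach in parallel databases.
2. A cell holds multi-dimensional information; in practice, it holds snapshots of one piece of data at several points in time.
3. Column families are stored separately from ordinary columns, since the set of columns within a family can vary (?).
4. The way tablets are located is really nice; it reminds me a bit of Linux's multi-level (inode) index.
5. Most of the rest is standard distributed-database machinery, such as a lock service and a master.
6. Compression: I don't know much about it, so I have nothing to say here.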
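Points 1 and 2 can be illustrated with a small in-memory sketch. A table is cut into contiguous row ranges (tablets), and each cell keeps several timestamped versions. `Tablet`, `find_tablet`, the row keys, and the `contents:` column are hypothetical names chosen for illustration. They are not BigTable's actual API.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Tablet:
    """Hypothetical model of one tablet: the rows in [start_row, end_row)."""
    start_row: str
    end_row: str  # "" stands for "until the end of the table"
    # (row, "family:qualifier") -> list of (timestamp, value), newest first
    cells: dict = field(default_factory=dict)

    def put(self, row, column, timestamp, value):
        versions = self.cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # keep newest first

    def get(self, row, column, as_of=None):
        """Newest version with timestamp <= as_of (newest overall if as_of is None)."""
        for ts, value in self.cells.get((row, column), []):
            if as_of is None or ts <= as_of:
                return value
        return None


def find_tablet(tablets, row):
    """Horizontal partitioning: pick the tablet whose row range contains `row`.
    `tablets` must be sorted by start_row, with the first one starting at ""."""
    starts = [t.start_row for t in tablets]
    return tablets[bisect.bisect_right(starts, row) - 1]


tablets = [Tablet("", "m"), Tablet("m", "")]
t = find_tablet(tablets, "com.cnn.www")
t.put("com.cnn.www", "contents:", 3, "<html>v3")
t.put("com.cnn.www", "contents:", 5, "<html>v5")
print(t.get("com.cnn.www", "contents:"))           # <html>v5
print(t.get("com.cnn.www", "contents:", as_of=4))  # <html>v3
```

Keeping versions sorted newest-first means a plain read is just the first element. A time-travel read stops at the first version that is not newer than `as_of`.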
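Point 4's lookup can be sketched as three hops, assuming the scheme described in the BigTable paper. A file in the lock service points at the root tablet, the root tablet indexes the METADATA tablets, and each METADATA tablet maps (table, end row) to the server holding a user tablet. `IndexTablet`, `locate`, `MAX_KEY`, and the server names are hypothetical, and client-side caching is left out. Each level narrows the search the way an inode's indirect blocks do.

```python
import bisect

MAX_KEY = "\uffff"  # sentinel: sorts after every key used below


class IndexTablet:
    """Hypothetical simplified index level: sorted (end_key, target) pairs.
    A key belongs to the first entry whose end_key >= key."""

    def __init__(self, entries):
        self.entries = sorted(entries, key=lambda e: e[0])
        self.end_keys = [end for end, _ in self.entries]

    def lookup(self, key):
        # The last entry should use MAX_KEY so every key falls into some range.
        return self.entries[bisect.bisect_left(self.end_keys, key)][1]


def locate(chubby_file, table, row):
    """Three hops, much like following multi-level indirect blocks from an inode."""
    key = f"{table}:{row}"
    root = chubby_file["root_tablet"]  # level 1: lock-service file -> root tablet
    metadata = root.lookup(key)        # level 2: root tablet -> METADATA tablet
    return metadata.lookup(key)        # level 3: METADATA tablet -> tablet server


meta_a = IndexTablet([("webtable:m", "server-1"), ("webtable:" + MAX_KEY, "server-2")])
meta_b = IndexTablet([(MAX_KEY, "server-3")])
root = IndexTablet([("webtable:" + MAX_KEY, meta_a), (MAX_KEY, meta_b)])
chubby_file = {"root_tablet": root}
print(locate(chubby_file, "webtable", "com.cnn.www"))  # server-1
```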
One part I didn't understand: "When a machine goes down, the master redistributes its log chunks to other machines to process (and these machines store the processed results locally). The machines that pick up the tablets then query the master for the location of the processed results (to update their recently acquired tablet) and then go directly to the machine for their data."
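If a machine goes down, the master hands its log out to other machines. But the master does not hold that machine's log, so how can it distribute it? Are all the logs kept in one place? Yet the slides say there is one log per machine (and one machine can serve multiple tablets).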
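The recovery step in the quoted slide text can be sketched as follows. The sketch assumes, as the BigTable paper describes, that each machine's single commit log lives in the shared GFS file system rather than only on the dead machine's disk. It also assumes that entries for all of that machine's tablets are interleaved in this one log.

- The master cuts the log into chunks.
- Workers sort their chunk by (table, row, sequence number) and group it by tablet.
- Each new tablet owner gathers only its own entries from every worker.

`LogEntry`, `split_into_chunks`, `process_chunk`, `replay_for_tablet`, and the row-based `tablet_of` rule are hypothetical names for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    table: str
    row: str
    seq: int  # log sequence number, increasing in write order
    mutation: str


def split_into_chunks(log, chunk_size):
    """Master side: cut one machine's shared log into chunks, one per worker."""
    return [log[i:i + chunk_size] for i in range(0, len(log), chunk_size)]


def process_chunk(chunk, tablet_of):
    """Worker side: sort a chunk by (table, row, seq) and group it by tablet.
    The result would stay on the worker until a new tablet owner asks for it."""
    grouped = defaultdict(list)
    for entry in sorted(chunk, key=lambda e: (e.table, e.row, e.seq)):
        grouped[tablet_of(entry)].append(entry)
    return grouped


def replay_for_tablet(processed_chunks, tablet):
    """New owner of `tablet`: fetch its entries from every worker, replay in seq order."""
    entries = [e for grouped in processed_chunks for e in grouped.get(tablet, [])]
    return sorted(entries, key=lambda e: e.seq)


# Hypothetical assignment: rows before "m" belong to tablet T1, the rest to T2.
def tablet_of(entry):
    return "T1" if entry.row < "m" else "T2"


log = [
    LogEntry("webtable", "b", 1, "set contents:"),
    LogEntry("webtable", "x", 2, "set anchor:"),
    LogEntry("webtable", "a", 3, "delete contents:"),
    LogEntry("webtable", "y", 4, "set contents:"),
]
processed = [process_chunk(c, tablet_of) for c in split_into_chunks(log, 2)]
print([e.seq for e in replay_for_tablet(processed, "T1")])  # [1, 3]
```

Grouping by tablet is what lets one log per machine still serve many tablets. Each new owner reads only its own slice instead of replaying the whole log.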