Happy Apache Cassandra 2: File Store Format

Original post, 2011-11-28 14:37:57

h3. Model Review

Cassandra data model

Column Family



Super Column Family



h3. File Format

h4. Keyspace

  • Each keyspace (e.g. Lobs) is stored in its own directory
  • Each column family (e.g. object) is stored in its own set of SSTable files
    • ColumnFamilyName-version-#-Data.db
    • ColumnFamilyName-version-#-Index.db
    • ColumnFamilyName-version-#-Filter.db
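
To make the naming scheme concrete, here is a minimal sketch that builds the three component paths for one SSTable generation. The helper `sstable_components`, the data directory `/var/lib/cassandra/data`, and the version string `h` are assumptions chosen for illustration; only the `ColumnFamilyName-version-#-Component.db` pattern and the `Lobs`/`object` names come from the notes above.

```python
from pathlib import PurePosixPath

# Component suffixes listed in the notes above.
COMPONENTS = ("Data", "Index", "Filter")

def sstable_components(data_dir, keyspace, column_family, version, generation):
    """Return the expected component paths for one SSTable generation.

    Hypothetical helper: it only formats names, it does not touch the disk.
    """
    base = PurePosixPath(data_dir) / keyspace  # one directory per keyspace
    return [
        base / f"{column_family}-{version}-{generation}-{name}.db"
        for name in COMPONENTS
    ]

# Assumed data directory and version string, for illustration only.
paths = sstable_components("/var/lib/cassandra/data", "Lobs", "object", "h", 1)
for path in paths:
    print(path)
```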

h4. Index File 

  • Each entry is [Key, Position], ordered by key
  • Index summary (preloaded into memory): samples 1 of every 128 index entries
  • Key cache: caches an index entry after it has been accessed
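
The sampling idea behind the index summary can be sketched as follows. This is a simplified illustration, not Cassandra's implementation: the index is a plain Python list, keys are compared as ordinary strings, and the names `build_summary` and `lookup` are made up for the example.

```python
import bisect

INDEX_INTERVAL = 128  # one sampled entry per 128 index entries, as noted above

def build_summary(index_entries, interval=INDEX_INTERVAL):
    """Sample every `interval`-th (key, position) entry of a sorted index."""
    return index_entries[::interval]

def lookup(key, index_entries, summary, interval=INDEX_INTERVAL):
    """Return the data-file position of `key`, or None if it is absent."""
    sampled_keys = [k for k, _ in summary]
    # The rightmost sampled key <= key selects the index block to scan.
    slot = bisect.bisect_right(sampled_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first index entry
    start = slot * interval
    # Scan at most `interval` entries instead of the whole index.
    for k, position in index_entries[start:start + interval]:
        if k == key:
            return position
        if k > key:
            break  # entries are sorted, so the key cannot appear later
    return None

# A toy index of 1000 sorted keys; positions are invented byte offsets.
index = [(f"key{i:05d}", i * 100) for i in range(1000)]
summary = build_summary(index)
position = lookup("key00300", index, summary)
```

Only the summary has to stay in memory; the scan of one 128-entry block stands in for the disk read of the corresponding part of the index file.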


h4. data file

  • Each entry is a [Row], ordered by key
  • Each row contains columns, ordered by column name
  • Each row contains a column index (over ranges of column names)


h4. others

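  • Bloom filter file
    • In general: given an element, answers whether it is contained in a set
    • Here: given a key, answers whether it may be contained in the current SSTable file (false positives are possible, false negatives are not)
  • Commit log file
    • Used on the write path to avoid data loss
  • Statistics file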
h3. CRUD

h4. write

The write path follows the BigTable design:

  • First, write to the commit log on disk (sequential I/O)
  • Then update the appropriate memtables
  • Memtables are flushed to disk as SSTables (immutable)
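
The three write steps can be sketched with a toy store. Everything here is an assumption made for illustration: the class `TinyStore`, the JSON file formats, the file names, and the row-count flush threshold are not Cassandra's.

```python
import json
import os
import tempfile

class TinyStore:
    """Toy write path: commit log -> memtable -> immutable SSTable file."""

    def __init__(self, directory, flush_threshold=2):
        self.directory = directory
        self.flush_threshold = flush_threshold  # rows held before flushing
        self.memtable = {}                      # row key -> {column: [value, ts]}
        self.generation = 0
        self.log_path = os.path.join(directory, "commitlog.txt")

    def write(self, row_key, column, value, timestamp):
        # 1. Append to the commit log first (sequential write) for durability.
        with open(self.log_path, "a") as log:
            log.write(json.dumps([row_key, column, value, timestamp]) + "\n")
        # 2. Apply the change to the in-memory memtable.
        self.memtable.setdefault(row_key, {})[column] = [value, timestamp]
        # 3. Flush once the memtable holds enough rows.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the memtable, sorted by row key, to a new file and clear it."""
        self.generation += 1
        path = os.path.join(self.directory, f"sstable-{self.generation}.json")
        with open(path, "w") as out:
            json.dump(sorted(self.memtable.items()), out)
        self.memtable = {}
        return path

data_dir = tempfile.mkdtemp()
store = TinyStore(data_dir)
store.write("k2", "name", "bob", 1)
store.write("k1", "name", "alice", 2)   # second row triggers a flush
store.write("k3", "name", "carol", 3)   # stays in the memtable for now
```

The flushed file is never modified afterwards; later writes to `k1` or `k2` would end up in a newer file.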


h4. read

  • Check the row cache
  • On a miss, Cassandra consults every SSTable of that column family
    • Use each SSTable's Bloom filter to determine whether it may contain the key
    • Use the SSTable's index to locate the data (checking the key cache first)
    • Read the row from the data file
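
The read steps above can be put together in a short sketch. It is deliberately simplified: the `SSTable` class keeps everything in memory, a plain `set` stands in for the Bloom filter (so it never gives false positives), the memtable is left out, and all names are invented for the example.

```python
class SSTable:
    """In-memory stand-in for one SSTable (data, index, and key filter)."""

    def __init__(self, name, rows):
        self.name = name
        ordered = sorted(rows.items())
        self.data = [columns for _, columns in ordered]                  # "data file"
        self.index = {key: pos for pos, (key, _) in enumerate(ordered)}  # "index file"
        self.key_filter = set(rows)  # stand-in for the Bloom filter

    def might_contain(self, key):
        return key in self.key_filter

def read_row(key, sstables, row_cache, key_cache):
    """Return {column: (value, timestamp)} for `key`, or None if not found."""
    if key in row_cache:                      # 1. row cache
        return row_cache[key]
    merged = {}
    for table in sstables:
        if not table.might_contain(key):      # 2. filter says "not here"
            continue
        cache_id = (table.name, key)
        position = key_cache.get(cache_id)    # 3. key cache, then index
        if position is None:
            position = table.index.get(key)
            if position is None:
                continue                      # would be a filter false positive
            key_cache[cache_id] = position
        # 4. read from the "data file"; the newest timestamp wins per column
        for column, (value, ts) in table.data[position].items():
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    if not merged:
        return None
    row_cache[key] = merged
    return merged

old = SSTable("cf-1", {"k1": {"name": ("alice", 1), "city": ("paris", 1)}})
new = SSTable("cf-2", {"k1": {"city": ("rome", 5)}, "k2": {"name": ("bob", 2)}})
row_cache, key_cache = {}, {}
result = read_row("k1", [old, new], row_cache, key_cache)
```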

h4. update

  • Same sequence as a write
    • SSTables are immutable
    • An update is written into a new SSTable
  • Column reconciliation
    • Read the columns from all the SSTables
    • Merge (reduce) columns with the same name; see CollationController
      • Compare timestamps and return the latest column
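
Column reconciliation by timestamp can be sketched in a few lines. The function `reconcile` and its input shape are assumptions for illustration; it is not the CollationController code, and it does not model how equal timestamps are resolved.

```python
def reconcile(versions):
    """Merge row fragments and keep the newest version of each column.

    `versions` is an iterable of dicts mapping column name -> (value, timestamp),
    one dict per SSTable that holds part of the row.
    """
    latest = {}
    for columns in versions:
        for name, (value, timestamp) in columns.items():
            if name not in latest or timestamp > latest[name][1]:
                latest[name] = (value, timestamp)
    return latest

# The same row as found in two SSTables: an original write and a later update.
fragment_old = {"email": ("a@example.com", 100), "city": ("Paris", 100)}
fragment_new = {"email": ("b@example.com", 250)}
row = reconcile([fragment_old, fragment_new])
```

Because the newest timestamp wins regardless of the order in which fragments are visited, an update never has to touch the older file.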

h4. delete

  • Same sequence as a write
    • SSTables are immutable
    • A delete is written into a new SSTable
  • Deleting a column
    • The column is written with its delete flag set (a tombstone)
  • Deleting a row
    • markedForDeleteAt is set to the deletion timestamp
    • On read, each column's timestamp is compared with markedForDeleteAt while columns are reduced
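
Both kinds of delete can be shown in one sketch of the read-side filtering. The `Column` dataclass, the `NO_DELETION` sentinel, and `live_columns` are hypothetical names for this example; only the delete flag and the markedForDeleteAt comparison come from the notes above.

```python
from dataclasses import dataclass

NO_DELETION = -1  # assumed sentinel meaning "this fragment holds no row delete"

@dataclass
class Column:
    name: str
    value: str
    timestamp: int
    deleted: bool = False  # the per-column delete flag (tombstone)

def live_columns(fragments):
    """Return {name: value} for the columns still visible after reconciliation.

    `fragments` is a list of (marked_for_delete_at, [Column, ...]) pairs,
    one pair per SSTable that holds part of the row.
    """
    row_deleted_at = max((marked for marked, _ in fragments), default=NO_DELETION)
    newest = {}
    for _, columns in fragments:
        for col in columns:
            current = newest.get(col.name)
            if current is None or col.timestamp > current.timestamp:
                newest[col.name] = col
    return {
        col.name: col.value
        for col in newest.values()
        # A column survives only if it is not a tombstone and was written
        # after the row-level deletion.
        if not col.deleted and col.timestamp > row_deleted_at
    }

fragments = [
    (NO_DELETION, [Column("a", "1", 10), Column("b", "2", 10)]),
    (20, []),                                    # row deleted at timestamp 20
    (NO_DELETION, [Column("a", "3", 30), Column("c", "x", 40, deleted=True)]),
]
visible = live_columns(fragments)
```

Column `a` was rewritten after the row delete and stays visible, `b` predates the row delete, and `c` is hidden by its own tombstone.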

h4. drop

  • Dropping a column family
    • Take a snapshot (the files are moved to the snapshot directory)
    • Remove the column family definition
  • Dropping a keyspace
    • Take a snapshot (the files are moved to the snapshot directory)
    • Remove the keyspace definition

h4. Compaction

Periodically, data files are merge-sorted into a new data file (and a new index is created):
  • Merge keys
  • Combine columns
  • Discard tombstones
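
The three compaction steps can be sketched as a merge over sorted inputs. This is a toy under stated assumptions: each SSTable is a Python list of `(key, columns)` pairs sorted by key, each cell is a `(value, timestamp, deleted)` tuple, and tombstones are dropped immediately, whereas Cassandra keeps them for a configurable grace period before discarding them.

```python
import heapq

def compact(sstables):
    """Merge sorted SSTables into one sorted list of live rows."""
    merged = []
    # Merge keys: heapq.merge walks all inputs in key order without
    # loading and re-sorting everything at once.
    for key, columns in heapq.merge(*sstables, key=lambda row: row[0]):
        if merged and merged[-1][0] == key:
            target = merged[-1][1]
        else:
            target = {}
            merged.append((key, target))
        # Combine columns: the newest timestamp wins for each column name.
        for name, cell in columns.items():
            if name not in target or cell[1] > target[name][1]:
                target[name] = cell
    # Discard tombstones, and rows that have no live columns left.
    result = []
    for key, columns in merged:
        live = {name: cell for name, cell in sorted(columns.items()) if not cell[2]}
        if live:
            result.append((key, live))
    return result

table_1 = [("a", {"x": ("1", 1, False)}), ("c", {"y": ("2", 1, False)})]
table_2 = [
    ("a", {"x": ("9", 5, False)}),
    ("b", {"z": ("", 3, True)}),      # tombstone only
    ("c", {"w": ("3", 2, False)}),
]
compacted = compact([table_1, table_2])
```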


References

http://jonathanhui.com/how-cassandra-read-persists-data-and-maintain-consistency
