Hypertable - Overview

guxch

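[Note: I recently needed to use a distributed database and took a look at Hypertable. Here is the relevant documentation from the official website, for everyone's reference.]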

Overview

(http://hypertable.com/documentation/)

Hypertable is a high performance, open source, massively scalable database modeled after Bigtable, Google's proprietary, massively scalable database. This page provides a brief overview of Hypertable, comparing it with a relational database, highlighting some of its unique features, and illustrating how it scales.

Comparison to a Relational Database

Hypertable is similar to a relational database in that it represents data as tables of information, with rows and columns, but that's about as far as the analogy goes. The following is a list of some of the main differences:

Row keys are UTF-8 strings
No support for data types, values are treated as opaque byte sequences
No support for joins
No support for transactions

Tables in Hypertable can be thought of as massive tables of data, sorted by a single primary key, the row key.
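To make the "sorted by a single primary key" point concrete, here is a minimal Python sketch. It is only an in-memory stand-in, not Hypertable code: the row keys, the byte values, and the `scan` helper are all made up for illustration. It shows why keeping rows sorted by row key makes a range scan cheap: two binary searches bound a contiguous slice.

```python
import bisect

# Hypothetical rows: (row key, opaque value). Values are plain bytes,
# mirroring the "no data types" point above.
rows = sorted([
    ("com.example/index", b"\x00\x01"),
    ("com.example/about", b"opaque"),
    ("org.sample/home", b"42"),
])
keys = [key for key, _ in rows]

def scan(start, end):
    """Return all rows with start <= row key < end."""
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, end)
    return rows[lo:hi]

# All row keys beginning with "com." ("/" is the character after ".").
result = scan("com.", "com/")
print([key for key, _ in result])
```

Because there are no joins or secondary keys in this model, the row key design carries most of the weight: rows that should be read together need keys that sort next to each other.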

Physical Layout

A relational database assumes that each column defined in the table schema will have a value for each row that is present in the table. NULL values are usually represented with a special marker (e.g. \N). The primary key and column identifier are implicitly associated with each cell based on its physical position within the layout. The following diagram illustrates how a relational database table might be laid out on disk.

Hypertable (and Bigtable) takes its design from the Log Structured Merge Tree. It flattens out the table structure into a sorted list of key/value pairs, each one representing a cell in the table. The key includes the full row and column identifier, which means each cell is provided complete addressing information. Cells that are NULL are simply not included in the list, which makes this design particularly well-suited for sparse data. The following diagram illustrates how Hypertable stores table data on disk.
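The flattening step can be sketched in a few lines of Python. This is an illustration of the idea only, under the assumption that a key is simply a (row, column) pair; the sample table and the `flatten` function are hypothetical and do not reflect Hypertable's actual key encoding.

```python
# Hypothetical sparse table; None stands for a NULL cell.
table = {
    "row2": {"name": "bob", "email": None, "phone": "555-0101"},
    "row1": {"name": "alice", "email": "a@example.com", "phone": None},
}

def flatten(table):
    """Return a sorted list of ((row, column), value) cells, skipping NULLs."""
    cells = [
        ((row, column), value)
        for row, columns in table.items()
        for column, value in columns.items()
        if value is not None  # NULL cells are simply left out
    ]
    return sorted(cells)

cells = flatten(table)
for key, value in cells:
    print(key, value)
```

Six logical cells become four stored cells, and each stored cell carries its full address, so no positional layout is needed to interpret it.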

Though there can be a fair amount of redundancy in the row keys and column identifiers, Hypertable employs key-prefix and block data compression, which considerably mitigates this problem.
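As a rough illustration of why key-prefix compression helps with sorted keys, the sketch below stores each key as the number of leading characters it shares with the previous key plus the remaining suffix. This is a generic front-coding scheme chosen for illustration; the source does not describe Hypertable's actual on-disk format, and the function names and sample keys are assumptions.

```python
import os

def prefix_compress(sorted_keys):
    """Encode each key as (shared_prefix_length, suffix) relative to the previous key."""
    encoded, prev = [], ""
    for key in sorted_keys:
        shared = len(os.path.commonprefix([prev, key]))
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded

def prefix_decompress(encoded):
    """Rebuild the original keys from (shared_prefix_length, suffix) pairs."""
    keys, prev = [], ""
    for shared, suffix in encoded:
        prev = prev[:shared] + suffix
        keys.append(prev)
    return keys

keys = ["row1:email", "row1:name", "row2:name", "row2:phone"]
encoded = prefix_compress(keys)
print(encoded)
```

Because neighbouring keys in a sorted list tend to share the row key and often the column family, most of each key collapses into a small integer.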

Cell Versions

Hypertable extends the traditional two-dimensional table model by adding a third dimension: timestamp. This timestamp dimension can be thought of as representing different versions of each table cell, as illustrated in the following diagram.

When queried, the most recent cell version is returned first. By default, all cell versions are retained for each column, but the number of versions retained can be capped by specifying the MAX_VERSIONS option to the column specification in the CREATE TABLE statement. The timestamp can be supplied by the application at insert time, or can be auto-generated (default).
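The versioning behaviour described above can be modelled with a small Python class. This is a toy model, not Hypertable's implementation: the `VersionedCells` class, its `max_versions` argument (standing in for MAX_VERSIONS), and the use of `time.time_ns()` for auto-generated timestamps are all assumptions made for the sketch.

```python
import time
from collections import defaultdict

class VersionedCells:
    """Toy store: each (row, column) keeps its versions ordered newest first."""

    def __init__(self, max_versions=None):
        self.max_versions = max_versions  # None means keep every version
        self._cells = defaultdict(list)

    def insert(self, row, column, value, timestamp=None):
        if timestamp is None:
            timestamp = time.time_ns()  # auto-generated when not supplied
        versions = self._cells[(row, column)]
        versions.append((timestamp, value))
        versions.sort(key=lambda v: v[0], reverse=True)
        if self.max_versions is not None:
            del versions[self.max_versions:]  # drop the oldest versions

    def get(self, row, column):
        return list(self._cells.get((row, column), []))

store = VersionedCells(max_versions=2)
store.insert("row1", "tag", "first", timestamp=100)
store.insert("row1", "tag", "third", timestamp=300)
store.insert("row1", "tag", "second", timestamp=200)
latest = store.get("row1", "tag")
print(latest)
```

With a cap of two, the version with timestamp 100 is discarded, and the remaining versions come back newest first regardless of insertion order.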

Column Qualifiers

This feature provides a way for users to introduce sparse column data that can be easily selected with Hypertable Query Language (HQL) or any of the other query interfaces.

A column specification in the Hypertable CREATE TABLE statement actually defines a set of related columns known as a column family. Users may supply an optional column qualifier and specify the qualified column as family:qualifier. The qualifier is a NUL-terminated string. For example, if a column family tag is specified in a CREATE TABLE statement, as shown below,

CREATE TABLE Info (
  tag
);

then qualified columns such as the following may be created/inserted into the table.

tag:bigtable

tag:nosql

tag:bigdata
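The family:qualifier naming can be illustrated with a short Python sketch. It only models the naming convention; the `split_column` and `select` helpers and the sample cells are hypothetical and are not part of any Hypertable client API.

```python
def split_column(name):
    """Split 'family:qualifier' into (family, qualifier); qualifier is '' if absent."""
    family, _, qualifier = name.partition(":")
    return family, qualifier

# Hypothetical cells: (row key, qualified column, value).
cells = [
    ("page1", "tag:bigtable", "1"),
    ("page1", "tag:nosql", "1"),
    ("page2", "tag:bigdata", "1"),
]

def select(cells, family, qualifier=None):
    """Return cells in a column family, optionally restricted to one qualifier."""
    matches = []
    for row, column, value in cells:
        col_family, col_qualifier = split_column(column)
        if col_family == family and qualifier in (None, col_qualifier):
            matches.append((row, column, value))
    return matches

whole_family = select(cells, "tag")
only_nosql = select(cells, "tag", "nosql")
print(only_nosql)
```

Selecting by family alone returns every qualified column under it, while adding a qualifier narrows the result to a single sparse column; new qualifiers need no schema change.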

Namespaces

Namespaces provide a way to logically group tables together and are analogous to the directory hierarchy in a modern filesystem. Namespaces allow you to organize your tables into related groups, keeping table names simple, as table names need only be unique within the namespace in which they are created. All Hypertable instances have a built-in default root namespace "/". The following diagram illustrates an example namespace hierarchy.
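The directory analogy can be sketched as a small tree structure in Python. This models only the uniqueness rule stated above; the `Namespace` class, its methods, and the namespace names "prod" and "test" are invented for illustration.

```python
class Namespace:
    """Toy namespace node: holds child namespaces and a set of table names."""

    def __init__(self, name="/"):
        self.name = name
        self.children = {}
        self.tables = set()

    def create_namespace(self, name):
        if name not in self.children:
            self.children[name] = Namespace(name)
        return self.children[name]

    def create_table(self, name):
        # Table names need only be unique within their own namespace.
        if name in self.tables:
            raise ValueError(f"table {name!r} already exists in {self.name!r}")
        self.tables.add(name)

root = Namespace()  # the built-in root namespace "/"
prod = root.create_namespace("prod")
test = root.create_namespace("test")
prod.create_table("users")
test.create_table("users")  # same name, different namespace: allowed
```

Calling `prod.create_table("users")` a second time raises `ValueError`, while the table of the same name under "test" is unaffected.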

How Scaling Works

This section illustrates how Hypertable scales. Let's say the system has been loaded with the following two tables, a session ID table and a crawl database table.

Over time, Hypertable will break these tables into ranges and distribute them to what are known as RangeServer processes. These processes manage ranges of table data and run on all slave server machines in the cluster. For example, assuming there are three slave servers, the following diagram shows what the system might look like over time. As can be seen in the diagram, the three servers are filled to capacity.

Adding more capacity is a simple matter of adding new commodity class servers and starting RangeServer processes on the new machines. Hypertable will detect that there are new servers available with plenty of spare capacity and will automatically migrate ranges from the overloaded machines onto the new ones.
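The migration step can be pictured with a simple rebalancing loop in Python. The source does not describe the actual balancing policy, so this sketch assumes the simplest one: move ranges from the server holding the most ranges to the one holding the fewest until the counts differ by at most one. The server names and range labels are made up.

```python
def rebalance(servers):
    """Even out range counts across servers (count-based policy, an assumption)."""
    while True:
        busiest = max(servers, key=lambda s: len(servers[s]))
        idlest = min(servers, key=lambda s: len(servers[s]))
        if len(servers[busiest]) - len(servers[idlest]) <= 1:
            return servers
        # Migrate one range from the most loaded to the least loaded server.
        servers[idlest].append(servers[busiest].pop())

# Three full RangeServers, each holding four hypothetical ranges.
servers = {
    "rs1": ["session[0]", "session[1]", "crawl[0]", "crawl[1]"],
    "rs2": ["session[2]", "session[3]", "crawl[2]", "crawl[3]"],
    "rs3": ["session[4]", "session[5]", "crawl[4]", "crawl[5]"],
}
servers["rs4"] = []  # a new machine joins with spare capacity
rebalance(servers)
print({name: len(ranges) for name, ranges in servers.items()})
```

After the new server joins, each of the four servers ends up with three ranges. A real system would presumably weigh data size and load rather than a plain range count.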