Hypertable - Installation - MapR

Latest recommended article published 2020-05-21 06:15:52

guxch


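[Hypertable can be installed in several ways, listed below:

1. Standalone: installed on a single machine, using the local filesystem

2. Hadoop: distributed installation on top of Hadoop (HDFS)

3. MapR: distributed installation on top of MapR

4. ThriftBroker: a ThriftBroker installed on an application server]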

MapR

(http://hypertable.com/documentation/installation/mapr/)

MapR is a scalable filesystem written in C++ that is modeled after the Google File System and is 100% API compatible with Apache Hadoop. Hypertable ships with a MapR filesystem broker that communicates with the MapR servers directly and allows Hypertable to run efficiently on top of the MapR filesystem for maximum performance. This document describes how to get Hypertable up and running on top of the MapR filesystem.

Prerequisites

Step 1 - Install MapR

Step 2 - Install Capistrano

Step 3 - Edit Capistrano Capfile

Step 4 - Install Hypertable Binaries

Step 5 - FHS-ize Installation

Step 6 - Create and Distribute hypertable.cfg

Step 7 - Set "current" link

Step 8 - Synchronize Clocks

Step 9 - Start Hypertable

Step 10 - Verify Installation

Step 11 - Stop Hypertable

What Next?

Prerequisites

Before you get started with the installation, there are some general system requirements that need to be satisfied. These requirements are described in the following list.

admin machine - You should designate one of the machines in your Hypertable cluster as the admin machine (admin1 in the examples below). This is the machine from which you will be administering the cluster. It can be the same machine as the master or any machine of your choosing. There are no special hardware requirements for this machine, but it needs to have Internet access (at least temporarily) to get the recommended cluster management tool, Capistrano, installed on it. It is possible to install Capistrano without Internet access, but it's challenging and could take you half a day to get it working.
password-less ssh - For ease of administration, we recommend using Capistrano, which requires password-less ssh login access [Translator's note: a login method that uses a public/private key pair instead of a password] from the admin machine to all other machines in the cluster (masters, hyperspace replicas, range servers, etc). See Password-less SSH Login (http://hypertable.com/documentation/misc/password_less_ssh_login/) for details on how to set this up.
ssh MaxStartups - sshd on the admin machine needs to be configured to allow simultaneous connections from all of the machines in the Hypertable cluster. The simultaneous connection limit, MaxStartups, defaults to 10. See SSH Connection Limit (http://hypertable.com/documentation/misc/ssh_maxstartups/) for details on how to increase this limit.
firewall - The Hypertable processes use TCP and UDP to communicate with one another and with client applications. Firewalls can block this traffic and prevent Hypertable from operating properly. Any firewall that blocks traffic between the Hypertable machines should be disabled, or the appropriate ports should be opened up to allow Hypertable communication. See Hypertable Firewall Requirements (http://hypertable.com/documentation/misc/firewall_requirements/) for instructions on how to do this.
open file limit - Most operating systems limit the number of files that a single process can have open at the same time. If you plan to load a large number of files into Hypertable, you may need to raise this limit. See Open File Limit (http://hypertable.com/documentation/misc/how_to_increase_open_file_limit/) for details on how to increase it.
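
The password-less ssh requirement can be checked from the admin machine before going further. The following is a minimal sketch, not part of the Hypertable distribution: the function name check_ssh_hosts and the SSH override variable are hypothetical, and the host names in the usage line are the example names used later in this guide. It relies only on the standard OpenSSH options BatchMode and ConnectTimeout.

```shell
#!/bin/sh
# Sketch: verify password-less ssh from the admin machine.
# SSH may be overridden (for example for a dry run); it defaults to the ssh client.

# check_ssh_hosts HOST...
# Prints one line per host and returns 1 if any host cannot be reached
# without a password prompt.
check_ssh_hosts() {
    csh_failed=0
    for csh_host in "$@"; do
        # BatchMode=yes makes ssh fail instead of asking for a password.
        if ${SSH:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$csh_host" true \
                </dev/null >/dev/null 2>&1; then
            echo "ok $csh_host"
        else
            echo "FAILED $csh_host"
            csh_failed=1
        fi
    done
    return "$csh_failed"
}
```

A call such as `check_ssh_hosts master hyperspace001 slave001` should print an "ok" line for every host once the key setup is complete. The open file limit of the current shell can be inspected separately with `ulimit -n`.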

Step 1 - Install MapR

The first step in getting Hypertable up and running on top of MapR is to install MapR and configure it to work with Hypertable. The Hypertable 64-bit packages contain a MapR broker that allows Hypertable to run on top of the MapR filesystem (MapR is currently unavailable for 32-bit platforms). Follow the MapR Installation Instructions (http://www.mapr.com/doc/display/MapR/Installation+Guide) and be sure to co-locate the MapR mfs processes with the Hypertable RangeServer machines.

To configure MapR to work properly with Hypertable, you will need to tell the warden process to launch the mfs processes with a smaller heap size than the default. To do this, add the following line to the warden configuration file (/opt/mapr/conf/warden.conf):

service.command.mfs.heapsize.percent=20
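
If the setting has to be applied on several machines, the edit can be scripted. The sketch below is only an illustration: the function name ensure_mfs_heapsize is hypothetical, and replacing an already present line for the same key (rather than appending a second one) is an assumption about what is wanted, not something the MapR documentation is quoted on here.

```shell
#!/bin/sh
# Sketch: make sure warden.conf carries the mfs heap size setting exactly once.

# ensure_mfs_heapsize FILE [PERCENT]   (PERCENT defaults to 20)
ensure_mfs_heapsize() {
    emh_conf=$1
    emh_percent=${2:-20}
    emh_key=service.command.mfs.heapsize.percent
    if grep -q "^${emh_key}=" "$emh_conf"; then
        # Key already present: rewrite its value. A temp file is used
        # because in-place editing with sed is not portable.
        sed "s|^${emh_key}=.*|${emh_key}=${emh_percent}|" "$emh_conf" \
            > "$emh_conf.tmp" && mv "$emh_conf.tmp" "$emh_conf"
    else
        # Key absent: append it, as the text above describes.
        printf '%s=%s\n' "$emh_key" "$emh_percent" >> "$emh_conf"
    fi
}
```

For example, `ensure_mfs_heapsize /opt/mapr/conf/warden.conf` run as a user allowed to write that file leaves the file with a single heap size line set to 20.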

Then launch the MapR filesystem. Once you have it up and running, create a volume called "hypertable" and mount it at /hypertable with the following command:

$ maprcli volume create -name hypertable -path /hypertable

Step 2 - Install Capistrano

The Hypertable distribution comes with a number of scripts to start and stop the various servers that make up a Hypertable cluster. You can use your own cluster management tool to launch these scripts and deploy new binaries. However, if you're not already using a cluster management tool, we recommend Capistrano. The distribution comes with a Capistrano config file (conf/Capfile.cluster) that makes deploying and launching Hypertable a breeze.

Capistrano is a simple tool for automating the remote execution of tasks. It uses ssh to do the remote execution. To ease deployment, you should have password-less ssh access (i.e. public key) to all of the machines in your cluster. Installing Capistrano is pretty simple. On most systems you just need to execute the following commands (Internet access required):

$ sudo gem update

$ sudo gem install capistrano

After this installation step you should have the cap program in your path:

$ cap --version

Capistrano v2.9.0
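
When the installation is scripted, the presence of the tools can be checked before any cap task is run. This is a small sketch; the helper name require_tool is hypothetical and not part of Capistrano or Hypertable.

```shell
#!/bin/sh
# Sketch: fail early if a required program is not on the PATH.

# require_tool NAME
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found $1 at $(command -v "$1")"
    else
        echo "missing $1" >&2
        return 1
    fi
}
```

A typical use would be `require_tool gem && require_tool cap && cap --version`.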

Step 3 - Edit Capistrano Capfile

Once you have Capistrano installed, copy the conf/Capfile.cluster that comes with the Hypertable distribution to your working directory (e.g. home directory) on admin1, rename it to Capfile, and tailor it for your environment. The cap command reads the file Capfile in the current working directory by default. Some variables set at the top of the file need to be modified for your particular environment. The following shows the variables at the top of the Capfile that need modification:

set :source_machine, "admin1"

set :install_dir, "/opt/hypertable"

set :hypertable_version, "0.9.5.5"

set :default_pkg, "/tmp/hypertable-0.9.5.5-linux-x86_64.rpm"

set :default_dfs, "mapr"

set :default_config, "/root/hypertable.cfg"

Here's a brief description of each variable:

Table 2. Hypertable Capistrano Variables
Variable	Description
source_machine	Machine from which you will build the binaries, distribute them to the other machines, and launch the service.
install_dir	Directory on source_machine where you have installed Hypertable. It is also the directory on the remote machines where the installation will get rsync'ed to.
hypertable_version	Version of Hypertable you are deploying.
default_pkg	Path to the binary package file (.dmg, .rpm, or .tar.bz2) on the source machine.
default_dfs	Distributed file system you are running Hypertable on top of. For MapR use "mapr".
default_config	Location of the default Hypertable configuration file that you plan to use.
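
Before running any cap task it can help to confirm that the paths named in the Capfile really exist on the source machine. The following sketch assumes the `set :name, "value"` line format shown above; the function names capfile_var and check_capfile_paths are hypothetical helpers, not Capistrano commands.

```shell
#!/bin/sh
# Sketch: read quoted values out of a Capfile and check the package and
# config paths on the local (source) machine.

# capfile_var CAPFILE NAME
# Prints the value of the first line of the form: set :NAME, "value"
capfile_var() {
    sed -n "s/^[[:space:]]*set[[:space:]]*:$2,[[:space:]]*\"\\([^\"]*\\)\".*/\\1/p" "$1" \
        | head -n 1
}

# check_capfile_paths CAPFILE
# Returns 1 if default_pkg or default_config does not name an existing file.
check_capfile_paths() {
    ccp_status=0
    for ccp_name in default_pkg default_config; do
        ccp_path=$(capfile_var "$1" "$ccp_name")
        if [ -n "$ccp_path" ] && [ -f "$ccp_path" ]; then
            echo "ok $ccp_name=$ccp_path"
        else
            echo "missing $ccp_name=$ccp_path"
            ccp_status=1
        fi
    done
    return "$ccp_status"
}
```

Run from the directory that holds the Capfile, `check_capfile_paths Capfile` reports which of the two files still has to be put in place.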

In addition to the above variables, you also need to define three roles, one for the machine that will run the master processes, one for the machines that will run the Hyperspace replicas, and one for the machines that will run the RangeServers. Edit the following lines:

role :source, "admin1"

role :master, "master"

role :hyperspace, "hyperspace001","hyperspace002", "hyperspace003"

role :slave, "slave001", "slave002", "slave003","slave004", "slave005", "slave006","slave007", "slave008"

role :localhost, "admin1"

role :thriftbroker

role :spare

The following table describes each role.

Table 3. Hypertable Capistrano Roles
Role	Description
source	The machine from which you will be distributing the binaries (admin1 in this example).
master	The machine that will run the Hypertable master process as well as a DFS broker. Ideally this machine is high quality and somewhat lightly loaded (e.g. not running a RangeServer). Typically you would have a high quality machine running the Hypertable master, a Hyperspace replica, and the HDFS NameNode.
hyperspace	The machines that will run Hyperspace replicas. There should be at least one machine defined for this role. The machines that take on this role should be somewhat lightly loaded (e.g. not running a RangeServer).
slave	The machines that will run RangeServers. Hypertable is designed to run on a filesystem like HDFS. In fact, the system works best from a performance standpoint when the RangeServers are run on the same machines as the HDFS DataNodes. This role will also launch a DFS broker and a ThriftBroker.
localhost	The name of the machine that you're administering the cluster from (admin1 in this example).
thriftbroker	Additional machines that will be running a ThriftBroker (e.g. web servers). NOTE: You do not have to add the slave machines to this role, since a ThriftBroker is automatically started on each slave machine to support MapReduce.
spare	Machines that will act as standbys. They will be kept current with the latest binaries.
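
The host lists in the role lines are easy to get wrong on larger clusters, so it can be useful to print them back out. This is a sketch under the assumption that each role sits on a single line in the format shown above; capfile_role_hosts is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list the hosts assigned to one role in a Capfile, one per line.

# capfile_role_hosts CAPFILE ROLE
capfile_role_hosts() {
    # Select the "role :ROLE" line, then pull out every double-quoted name.
    grep -E "^[[:space:]]*role[[:space:]]+:$2([,[:space:]]|\$)" "$1" \
        | grep -o '"[^"]*"' \
        | tr -d '"'
}
```

For instance, `capfile_role_hosts Capfile slave | wc -l` gives the number of RangeServer machines, and a role without hosts (such as spare in the example above) prints nothing.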

Step 4 - Install Hypertable Binaries

The Hypertable binaries can either be downloaded prepackaged, or you can compile them from source code. To install the prepackaged version, download the Hypertable package (.dmg, .rpm, or .tar.bz2) you want to install and put it somewhere accessible on the source machine (admin1 in this example). Modify the hypertable_version and default_pkg variables at the top of the Capfile to contain the version of Hypertable you are installing and the absolute path to the package file on the source machine, respectively. For example, if you're upgrading to version 0.9.5.5 and using the RPM package, set the variables as follows.

set :hypertable_version, "0.9.5.5"

set :default_pkg, "/tmp/hypertable-0.9.5.5-linux-x86_64.rpm"

To distribute and install the binary package on all necessary machines, issue the following command. This command will cause the package to get rsync'ed to all participating machines and installed with the appropriate package manager (rpm, dpkg, or tar) depending on the package type.

$ cap install_package

If you prefer compiling the binaries from source, you can use Capistrano to distribute the binaries with rsync. On admin1, be sure Hypertable is installed in the location specified by the install_dir variable at the top of the Capfile and that the hypertable_version variable at the top of the Capfile matches the version you are installing (/opt/hypertable and 0.9.5.5 in this example). Then distribute the binaries with the following command:

$ cap dist

Step 5 - FHS-ize Installation

See Filesystem Hierarchy Standard (http://hypertable.com/documentation/misc/filesystem_hierarchy_standard_fhs/) for an introduction to FHS. If you're running as a user other than root, first create the directories /etc/opt/hypertable and /var/opt/hypertable on all machines in the cluster and change ownership to the user account under which the binaries will be run. For example:

$ sudo cap shell

cap> mkdir /etc/opt/hypertable /var/opt/hypertable

cap> chown chris:staff /etc/opt/hypertable /var/opt/hypertable
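
Whether the ownership change took effect can be checked on any machine by testing the directories as the account that will run Hypertable. A minimal sketch follows; check_writable_dirs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: confirm that directories exist and are writable by the current user.

# check_writable_dirs DIR...
check_writable_dirs() {
    cwd_status=0
    for cwd_dir in "$@"; do
        if [ -d "$cwd_dir" ] && [ -w "$cwd_dir" ]; then
            echo "ok $cwd_dir"
        else
            echo "not writable $cwd_dir"
            cwd_status=1
        fi
    done
    return "$cwd_status"
}
```

Run as the Hypertable user, `check_writable_dirs /etc/opt/hypertable /var/opt/hypertable` should print two "ok" lines.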

Then FHS-ize the installation with the following command:

$ cap fhsize

Step 6 - Create and Distribute hypertable.cfg

The next step is to create a hypertable.cfg file that is specific to your deployment. A basic hypertable.cfg file can be found in the conf/ subdirectory of your Hypertable installation; it can be copied and modified as needed. The following table shows the minimum set of required and recommended properties that you need to modify.

Table 1. Recommended and Required Properties
Property	Description
Hyperspace.Replica.Host	Hostname of Hyperspace replica
Hypertable.RangeServer.Monitoring.DataDirectories	This property is optional, but recommended. It contains a list of directories that are the mount points of the HDFS data node storage volumes. By setting this property appropriately, the Hypertable monitoring system will be able to provide accurate disk usage information.

You can leave all other properties at their default values. Hypertable is designed to adapt to the hardware on which it runs and to dynamically adapt to changes in workload, so no special configuration is needed beyond the basic properties listed in the above table. For example, the following shows the changes we made to the hypertable.cfg file for our test cluster.

Hyperspace.Replica.Host=hyperspace001

Hyperspace.Replica.Host=hyperspace002

Hyperspace.Replica.Host=hyperspace003

Hypertable.RangeServer.Monitoring.DataDirectories="/data/1,/data/2,/data/3,/data/4"

See hypertable-example.cfg (http://www.hypertable.org/pub/hypertable-example-cfg.txt)
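
Since Hyperspace.Replica.Host is repeated once per replica, a quick way to review a finished config is to print every value recorded for a property. The sketch below assumes the `Property=value` line format shown in the example above; cfg_values is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print every value that a hypertable.cfg file assigns to a property.

# cfg_values CFG_FILE PROPERTY
cfg_values() {
    # Strip "PROPERTY=" from matching lines and print what remains.
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1"
}
```

With the test cluster settings above, `cfg_values hypertable.cfg Hyperspace.Replica.Host` would list hyperspace001, hyperspace002 and hyperspace003, one per line, which can then be compared with the hyperspace role in the Capfile.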

Once you've created the hypertable.cfg file for your cluster, put it on the source machine (admin1) and set the absolute pathname referenced in the default_config Capfile variable to point to this file (e.g. /root/hypertable.cfg). Then distribute the custom config file with the following command.

$ cap push_config

If you ever need to make changes to the config file, make the changes, re-run cap push_config, and then restart Hypertable (see Steps 9 and 11, below).

Step 7 - Set "current" link

To make the latest version of Hypertable referenceable from a well-known location, create a "current" link that points to the latest installation. This can be accomplished with the following command.

$ cap set_current

Step 8 - Synchronize Clocks

The system cannot operate correctly unless the clocks on all machines are synchronized. Use the Network Time Protocol (ntp) to ensure that the clocks get synchronized and remain in sync. Run the 'date' command on all machines to make sure they are in sync. The following Capistrano shell session shows the output of a cluster with properly synchronized clocks.

cap> date

[establishing connection(s) to master, hyperspace001, hyperspace002, hyperspace003, slave001, slave002, slave003, slave004, slave005, slave006, slave007, slave008]

** [out ::master] Sat Jan 3 18:05:33 PST 2009

** [out ::hyperspace001] Sat Jan 3 18:05:33 PST 2009

** [out ::hyperspace002] Sat Jan 3 18:05:33 PST 2009

** [out ::hyperspace003] Sat Jan 3 18:05:33 PST 2009

** [out ::slave001] Sat Jan 3 18:05:33 PST 2009

** [out ::slave002] Sat Jan 3 18:05:33 PST 2009

** [out ::slave003] Sat Jan 3 18:05:33 PST 2009

** [out ::slave004] Sat Jan 3 18:05:33 PST 2009

** [out ::slave005] Sat Jan 3 18:05:33 PST 2009

** [out ::slave007] Sat Jan 3 18:05:33 PST 2009

** [out ::slave008] Sat Jan 3 18:05:33 PST 2009
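
Comparing a dozen date strings by eye gets tedious, so the check can be reduced to a single number. The following is a rough sketch, not part of Hypertable: collect_epochs and max_skew are hypothetical helper names, the SSH override variable is an assumption for testing, and `date +%s` (seconds since the epoch) is assumed to be available on every machine. Because the hosts are queried one after another, the time spent on each ssh round trip is included in the result, so the number is an upper estimate of the real skew.

```shell
#!/bin/sh
# Sketch: estimate the largest clock difference across a set of hosts.

# collect_epochs HOST...
# Prints the current epoch seconds reported by each host, one per line.
collect_epochs() {
    for ce_host in "$@"; do
        ${SSH:-ssh} -o BatchMode=yes "$ce_host" date +%s </dev/null
    done
}

# max_skew SECONDS...
# Prints the difference between the largest and smallest value given.
max_skew() {
    printf '%s\n' "$@" | awk '
        NR == 1 { min = max = $1 }
        { if ($1 < min) min = $1; if ($1 > max) max = $1 }
        END { print max - min }'
}
```

For example, `max_skew $(collect_epochs master slave001 slave002)` prints the spread in seconds; a value of more than a few seconds suggests that ntp is not running on at least one machine.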

Step 9 - Start Hypertable

The following commands should be run from the directory containing the Capfile. To start all of the Hypertable servers:

$ cap start

If you want to launch the service using a config file other than the default (e.g. /home/chris/alternate.cfg):

$ cap -S config=/home/chris/alternate.cfg start

You'll need to specify the same config file when running Hypertable commands such as the command shell, for example:

$ /opt/hypertable/current/bin/ht shell --config=/home/chris/alternate.cfg

Step 10 - Verify Installation

Create a table.

echo "USE '/'; CREATE TABLE foo ( c1, c2 ); GET LISTING;" \
| /opt/hypertable/current/bin/ht shell --batch

The output of this command should look like:

foo

sys (namespace)

Load some data.

echo "USE '/'; INSERT INTO foo VALUES('001', 'c1', 'very'), \
('000', 'c1', 'Hypertable'), ('001', 'c2', 'easy'), ('000', 'c2', 'is');" \
| /opt/hypertable/current/bin/ht shell --batch

Dump the table.

echo "USE '/'; SELECT * FROM foo;" \
| /opt/hypertable/current/bin/ht shell --batch

The output of this command should look like:

000 c1 Hypertable

000 c2 is

001 c1 very

001 c2 easy
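
The comparison with the expected output can also be automated, for example in a post-install smoke test. The sketch below assumes that the shell prints one row per line with the row key, column and value separated by spaces or tabs, as shown above; verify_dump, expected_rows and normalize_rows are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: compare the dump of table foo with the four rows loaded above.

# The rows this guide expects, in the order shown above.
expected_rows() {
    cat <<'EOF'
000 c1 Hypertable
000 c2 is
001 c1 very
001 c2 easy
EOF
}

# Collapse runs of spaces or tabs to one space and drop blank lines.
normalize_rows() {
    awk 'NF { $1 = $1; print }'
}

# verify_dump: reads the SELECT output on stdin, returns 1 on a mismatch.
verify_dump() {
    vd_actual=$(normalize_rows)
    if [ "$vd_actual" = "$(expected_rows)" ]; then
        echo "table foo looks as expected"
    else
        echo "unexpected table contents:" >&2
        printf '%s\n' "$vd_actual" >&2
        return 1
    fi
}
```

It would be used by piping the SELECT command into it: `echo "USE '/'; SELECT * FROM foo;" | /opt/hypertable/current/bin/ht shell --batch | verify_dump`.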

Step 11 - Stop Hypertable

To stop the service, shutting down all servers:

$ cap stop

If you want to wipe your database clean, removing all namespaces and tables:

$ cap cleandb

What Next?

Congratulations! Now that you have successfully installed Hypertable, we recommend that you walk through the HQL Tutorial to get familiar with using the system.