Table of Contents
4 Prepare to Start the Hadoop Cluster
6 Pseudo-Distributed Operation
1 Purpose
This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).
Important: all production Hadoop clusters use Kerberos to authenticate callers and secure access to HDFS data, as well as restricting access to computation services (YARN etc.).
These instructions do not cover integration with any Kerberos services; everyone bringing up a production cluster should include connecting to their organisation’s Kerberos infrastructure as a key part of the deployment.
See Security for details on how to secure a cluster.
2 Prerequisites
Supported Platforms
- GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
Required Software
Required software for Linux includes:
- Java™ must be installed. Recommended Java versions are described at HadoopJavaVersions.
- ssh must be installed and sshd must be running in order to use the Hadoop scripts that manage remote Hadoop daemons, if the optional start and stop scripts are to be used. Additionally, it is recommended that pdsh also be installed for better ssh resource management.
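Before moving on, the presence of these tools can be checked from the shell. The sketch below assumes a POSIX shell; `check_tool` is a hypothetical helper and not part of Hadoop itself. Note that `sshd` typically lives in `/usr/sbin`, which may not be on a non-root user's `PATH`, so adjust the search accordingly.

```shell
# check_tool: hypothetical helper that reports whether a command
# is available on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# Report on each tool this guide requires or recommends.
# (sshd may need /usr/sbin added to PATH before this finds it.)
for tool in java ssh sshd pdsh; do
    check_tool "$tool"
done
```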
Installing Software
If your cluster doesn’t have the requisite software, you will need to install it.
For example on Ubuntu Linux:
$ sudo apt-get install ssh
$ sudo apt-get install pdsh
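A JDK can usually be installed from the package manager as well. The package name below is an assumption and varies by Ubuntu release; check HadoopJavaVersions for which Java versions your Hadoop release supports before choosing one.

```shell
# Assumed package name; newer Ubuntu releases may ship
# openjdk-11-jdk or later instead.
$ sudo apt-get install openjdk-8-jdk
```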
3 Download
To get a Hadoop distribution, download a