Hadoop Installation Tutorial (Hadoop 2.x)

Hadoop 2, or YARN, is the new version of Hadoop. It adds the YARN resource manager in addition to the HDFS and MapReduce components. Hadoop MapReduce is a programming model and software framework for writing applications; it is an open-source variant of the MapReduce framework originally designed and implemented by Google for processing and generating large data sets. HDFS is Hadoop's underlying data persistency layer, loosely modeled after the Google File System (GFS). Many cloud computing services, such as Amazon EC2, provide MapReduce functions. Although MapReduce has its limitations, it is an important framework for processing large data sets.

This tutorial introduces how to set up a Hadoop 2.x (YARN) environment on a cluster. We set up a Hadoop (YARN) cluster in which one node runs as the NameNode and the ResourceManager, and the other nodes run as NodeManagers and DataNodes (the slaves). If you are not familiar with these names, please take a look at the YARN architecture first.

First, we assume that you have created a Linux user "hadoop" on each node that you will use for running Hadoop, and that the "hadoop" user's home directory is /home/hadoop/.

Configure hostnames

Hadoop uses hostnames to identify nodes by default. So you should first give each host a name and make each hostname resolvable to its IP from the other nodes. The simplest way may be to add the hostname-to-IP mappings to every node's /etc/hosts file. For a larger cluster, you may use a DNS service. Here, we use 3 nodes as the example and add these lines to the /etc/hosts file on each node:

10.0.3.29   hofstadter
10.0.3.30   snell
10.0.3.31   biot

Enable the "hadoop" user to log in to the slaves via password-less SSH

Just for our convenience, make sure the "hadoop" user on the NameNode and ResourceManager node can SSH to the slaves without a password, so that we do not need to type the password every time.

Details about password-less SSH login can be found in Enabling Password-less ssh Login.

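As a quick sketch of one common way to set this up (assuming OpenSSH, and that the "hadoop" account already exists on every slave), generate a key pair as the "hadoop" user on the NameNode/ResourceManager node and copy the public key to each slave:

ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ''
ssh-copy-id hadoop@hofstadter
ssh-copy-id hadoop@snell
ssh-copy-id hadoop@biot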

Install software needed by Hadoop

Besides Hadoop itself, the software needed to install Hadoop is Java (we use the JDK here).

Java JDK

The Oracle Java JDK can be downloaded from the JDK webpage. You need to install the JDK (actually, just copy the JDK directory) on all nodes of the Hadoop cluster.

As an example in this tutorial, the JDK is installed into

/usr/java/default/

You may need to make /usr/java/default a soft link to the actual location where you installed the JDK.

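For example (the version-specific directory name below is hypothetical; use your actual install path):

ln -s /usr/java/jdk1.7.0_67 /usr/java/default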

Add these 2 lines to the "hadoop" user’s ~/.bashrc on all nodes:

export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH

Hadoop

The Hadoop software can be downloaded from the Hadoop website. In this tutorial, we use Hadoop 2.5.0.

You can unpack the tarball to a directory. In this example, we unpack it to

/home/hadoop/hadoop/

which is a directory under the "hadoop" Linux user's home directory.

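For example, a minimal sketch, assuming the downloaded tarball hadoop-2.5.0.tar.gz sits in the "hadoop" user's home directory (the rename matches the directory layout used in this tutorial):

cd /home/hadoop
tar xzf hadoop-2.5.0.tar.gz
mv hadoop-2.5.0 hadoop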

The Hadoop directory needs to be duplicated to all nodes after configuration. Remember to do it after you finish the configuration below.

Configure environment variables for the "hadoop" user

We assume the "hadoop" user uses bash as its shell.

Add these lines at the bottom of ~/.bashrc on all nodes:

export HADOOP_COMMON_HOME=$HOME/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME
export HADOOP_HDFS_HOME=$HADOOP_COMMON_HOME
export YARN_HOME=$HADOOP_COMMON_HOME
export PATH=$PATH:$HADOOP_COMMON_HOME/bin
export PATH=$PATH:$HADOOP_COMMON_HOME/sbin

The last 2 lines add Hadoop's bin and sbin directories to the PATH so that we can run Hadoop's commands directly without specifying the full path.

Configure Hadoop

For our installation here, the configuration files for Hadoop are under $HADOOP_COMMON_HOME/etc/hadoop. The content below is added to the .xml files between <configuration> and </configuration>.

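That is, each of the files edited below keeps the standard skeleton, and the property snippets go inside the configuration element:

<?xml version="1.0"?>
<configuration>
  <!-- property snippets from the sections below go here -->
</configuration>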

core-site.xml

Here the NameNode runs on biot.

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://biot/</value>
  <description>NameNode URI</description>
</property>

yarn-site.xml

The YARN ResourceManager runs on biot and supports MapReduce shuffle.

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>biot</value>
  <description>The hostname of the ResourceManager</description>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <description>shuffle service for MapReduce</description>
</property>

hdfs-site.xml

The configuration here is optional. Add the following settings if you need them. The descriptions explain the purpose of each setting.

<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///home/hadoop/hdfs/</value>
  <description>DataNode directory for storing data chunks.</description>
</property>

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///home/hadoop/hdfs/</value>
  <description>NameNode directory for namespace and transaction logs storage.</description>
</property>

<property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Number of replications for each chunk.</description>
</property>

mapred-site.xml

First copy mapred-site.xml.template to mapred-site.xml and add the following content.

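For example:

cd /home/hadoop/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml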

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <description>Execution framework.</description>
</property>

slaves

Delete localhost and add the names of all the slave nodes (the DataNode/NodeManager hosts), each on its own line. For example:

hofstadter
snell
biot

Duplicate Hadoop configuration files to all nodes

Now we duplicate the Hadoop directory, together with the configuration files under etc/hadoop, to all nodes. You may use this script to duplicate the hadoop directory:

cd
for i in `cat hadoop/etc/hadoop/slaves`; do 
  echo $i; rsync -avxP --exclude=logs hadoop/ $i:hadoop/; 
done

By now, we have finished copying the Hadoop software and configuring Hadoop. Now let's have some fun with it.

Format a new distributed filesystem

Format a new distributed file system by:

hdfs namenode -format

Start HDFS/YARN

Manually

Start HDFS with the following command, run on the designated NameNode:

hadoop-daemon.sh --script hdfs start namenode

Run this script to start a DataNode on each slave:

hadoop-daemon.sh --script hdfs start datanode

Start YARN with the following command, run on the designated ResourceManager:

yarn-daemon.sh start resourcemanager

Run this script to start a NodeManager on each slave:

yarn-daemon.sh start nodemanager

By scripts
start-dfs.sh
start-yarn.sh

Check status

Check HDFS status
hdfs dfsadmin -report

It should show a report like:

Configured Capacity: 158550355968 (147.66 GB)
Present Capacity: 11206017024 (10.44 GB)
DFS Remaining: 11205943296 (10.44 GB)
DFS Used: 73728 (72 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Live datanodes (3):

Name: 10.0.3.30:50010 (snell)
Hostname: snell
Decommission Status : Normal
Configured Capacity: 52850118656 (49.22 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 49800732672 (46.38 GB)
DFS Remaining: 3049361408 (2.84 GB)
DFS Used%: 0.00%
DFS Remaining%: 5.77%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Sep 05 07:50:17 GMT 2014

Name: 10.0.3.31:50010 (biot)
Hostname: biot
Decommission Status : Normal
Configured Capacity: 52850118656 (49.22 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 49513574400 (46.11 GB)
DFS Remaining: 3336519680 (3.11 GB)
DFS Used%: 0.00%
DFS Remaining%: 6.31%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Sep 05 07:50:17 GMT 2014

Name: 10.0.3.29:50010 (hofstadter)
Hostname: hofstadter
Decommission Status : Normal
Configured Capacity: 52850118656 (49.22 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 48030068736 (44.73 GB)
DFS Remaining: 4820025344 (4.49 GB)
DFS Used%: 0.00%
DFS Remaining%: 9.12%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Sep 05 07:50:17 GMT 2014

Check YARN status
yarn node -list

It should show a report like:

Total Nodes:3
         Node-Id         Node-State Node-Http-Address   Number-of-Running-Containers
hofstadter:43469            RUNNING   hofstadter:8042                              0
     snell:57039            RUNNING        snell:8042                              0
      biot:52834            RUNNING         biot:8042                              0

Run basic tests

Let's play with grep as a basic test of Hadoop YARN/HDFS/MapReduce.

First, create the HDFS directories to store data for the hadoop user.

hadoop fs -mkdir /user
hadoop fs -mkdir /user/hadoop

Then, put the configuration file directory into HDFS as the input.

hadoop fs -put /home/hadoop/hadoop/etc/hadoop /user/hadoop/hadoop-config

or simply refer to the directory under the hadoop user's HDFS home (check the discussion; thanks to Thirumal Venkat for this tip):

hadoop fs -put /home/hadoop/hadoop/etc/hadoop hadoop-config

Let’s ls it to check the content:

hdfs dfs -ls /user/hadoop/hadoop-config

It should print out results like the following.

Found 26 items
-rw-r--r--   3 hadoop supergroup       3589 2014-09-05 07:53 /user/hadoop/hadoop-config/capacity-scheduler.xml
-rw-r--r--   3 hadoop supergroup       1335 2014-09-05 07:53 /user/hadoop/hadoop-config/configuration.xsl
-rw-r--r--   3 hadoop supergroup        318 2014-09-05 07:53 /user/hadoop/hadoop-config/container-executor.cfg
-rw-r--r--   3 hadoop supergroup        917 2014-09-05 07:53 /user/hadoop/hadoop-config/core-site.xml
-rw-r--r--   3 hadoop supergroup       3589 2014-09-05 07:53 /user/hadoop/hadoop-config/hadoop-env.cmd
-rw-r--r--   3 hadoop supergroup       3443 2014-09-05 07:53 /user/hadoop/hadoop-config/hadoop-env.sh
-rw-r--r--   3 hadoop supergroup       2490 2014-09-05 07:53 /user/hadoop/hadoop-config/hadoop-metrics.properties
-rw-r--r--   3 hadoop supergroup       1774 2014-09-05 07:53 /user/hadoop/hadoop-config/hadoop-metrics2.properties
-rw-r--r--   3 hadoop supergroup       9201 2014-09-05 07:53 /user/hadoop/hadoop-config/hadoop-policy.xml
-rw-r--r--   3 hadoop supergroup        775 2014-09-05 07:53 /user/hadoop/hadoop-config/hdfs-site.xml
-rw-r--r--   3 hadoop supergroup       1449 2014-09-05 07:53 /user/hadoop/hadoop-config/httpfs-env.sh
-rw-r--r--   3 hadoop supergroup       1657 2014-09-05 07:53 /user/hadoop/hadoop-config/httpfs-log4j.properties
-rw-r--r--   3 hadoop supergroup         21 2014-09-05 07:53 /user/hadoop/hadoop-config/httpfs-signature.secret
-rw-r--r--   3 hadoop supergroup        620 2014-09-05 07:53 /user/hadoop/hadoop-config/httpfs-site.xml
-rw-r--r--   3 hadoop supergroup      11118 2014-09-05 07:53 /user/hadoop/hadoop-config/log4j.properties
-rw-r--r--   3 hadoop supergroup        918 2014-09-05 07:53 /user/hadoop/hadoop-config/mapred-env.cmd
-rw-r--r--   3 hadoop supergroup       1383 2014-09-05 07:53 /user/hadoop/hadoop-config/mapred-env.sh
-rw-r--r--   3 hadoop supergroup       4113 2014-09-05 07:53 /user/hadoop/hadoop-config/mapred-queues.xml.template
-rw-r--r--   3 hadoop supergroup        887 2014-09-05 07:53 /user/hadoop/hadoop-config/mapred-site.xml
-rw-r--r--   3 hadoop supergroup        758 2014-09-05 07:53 /user/hadoop/hadoop-config/mapred-site.xml.template
-rw-r--r--   3 hadoop supergroup         22 2014-09-05 07:53 /user/hadoop/hadoop-config/slaves
-rw-r--r--   3 hadoop supergroup       2316 2014-09-05 07:53 /user/hadoop/hadoop-config/ssl-client.xml.example
-rw-r--r--   3 hadoop supergroup       2268 2014-09-05 07:53 /user/hadoop/hadoop-config/ssl-server.xml.example
-rw-r--r--   3 hadoop supergroup       2178 2014-09-05 07:54 /user/hadoop/hadoop-config/yarn-env.cmd
-rw-r--r--   3 hadoop supergroup       4567 2014-09-05 07:54 /user/hadoop/hadoop-config/yarn-env.sh
-rw-r--r--   3 hadoop supergroup       1007 2014-09-05 07:54 /user/hadoop/hadoop-config/yarn-site.xml

Now, let's run the grep job:

cd
hadoop jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep /user/hadoop/hadoop-config /user/hadoop/output 'dfs[a-z.]+'

It will print status like the following if everything works well.

14/09/05 07:54:36 INFO client.RMProxy: Connecting to ResourceManager at biot/10.0.3.31:8032
14/09/05 07:54:37 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
14/09/05 07:54:37 INFO input.FileInputFormat: Total input paths to process : 26
14/09/05 07:54:37 INFO mapreduce.JobSubmitter: number of splits:26
14/09/05 07:54:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1409903409779_0001
14/09/05 07:54:37 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
14/09/05 07:54:38 INFO impl.YarnClientImpl: Submitted application application_1409903409779_0001
14/09/05 07:54:38 INFO mapreduce.Job: The url to track the job: http://biot:8088/proxy/application_1409903409779_0001/
14/09/05 07:54:38 INFO mapreduce.Job: Running job: job_1409903409779_0001
14/09/05 07:54:45 INFO mapreduce.Job: Job job_1409903409779_0001 running in uber mode : false
14/09/05 07:54:45 INFO mapreduce.Job:  map 0% reduce 0%
14/09/05 07:54:50 INFO mapreduce.Job:  map 23% reduce 0%
14/09/05 07:54:52 INFO mapreduce.Job:  map 81% reduce 0%
14/09/05 07:54:53 INFO mapreduce.Job:  map 100% reduce 0%
14/09/05 07:54:56 INFO mapreduce.Job:  map 100% reduce 100%
14/09/05 07:54:56 INFO mapreduce.Job: Job job_1409903409779_0001 completed successfully
14/09/05 07:54:56 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=319
        FILE: Number of bytes written=2622017
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=65815
        HDFS: Number of bytes written=405
        HDFS: Number of read operations=81
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=26
        Launched reduce tasks=1
        Data-local map tasks=26
        Total time spent by all maps in occupied slots (ms)=116856
        Total time spent by all reduces in occupied slots (ms)=3000
        Total time spent by all map tasks (ms)=116856
        Total time spent by all reduce tasks (ms)=3000
        Total vcore-seconds taken by all map tasks=116856
        Total vcore-seconds taken by all reduce tasks=3000
        Total megabyte-seconds taken by all map tasks=119660544
        Total megabyte-seconds taken by all reduce tasks=3072000
    Map-Reduce Framework
        Map input records=1624
        Map output records=23
        Map output bytes=566
        Map output materialized bytes=469
        Input split bytes=3102
        Combine input records=23
        Combine output records=12
        Reduce input groups=10
        Reduce shuffle bytes=469
        Reduce input records=12
        Reduce output records=10
        Spilled Records=24
        Shuffled Maps =26
        Failed Shuffles=0
        Merged Map outputs=26
        GC time elapsed (ms)=363
        CPU time spent (ms)=15310
        Physical memory (bytes) snapshot=6807674880
        Virtual memory (bytes) snapshot=32081272832
        Total committed heap usage (bytes)=5426970624
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=62713
    File Output Format Counters
        Bytes Written=405
14/09/05 07:54:56 INFO client.RMProxy: Connecting to ResourceManager at biot/10.0.3.31:8032
14/09/05 07:54:56 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
14/09/05 07:54:56 INFO input.FileInputFormat: Total input paths to process : 1
14/09/05 07:54:56 INFO mapreduce.JobSubmitter: number of splits:1
14/09/05 07:54:56 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1409903409779_0002
14/09/05 07:54:56 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
14/09/05 07:54:56 INFO impl.YarnClientImpl: Submitted application application_1409903409779_0002
14/09/05 07:54:56 INFO mapreduce.Job: The url to track the job: http://biot:8088/proxy/application_1409903409779_0002/
14/09/05 07:54:56 INFO mapreduce.Job: Running job: job_1409903409779_0002
14/09/05 07:55:02 INFO mapreduce.Job: Job job_1409903409779_0002 running in uber mode : false
14/09/05 07:55:02 INFO mapreduce.Job:  map 0% reduce 0%
14/09/05 07:55:07 INFO mapreduce.Job:  map 100% reduce 0%
14/09/05 07:55:12 INFO mapreduce.Job:  map 100% reduce 100%
14/09/05 07:55:13 INFO mapreduce.Job: Job job_1409903409779_0002 completed successfully
14/09/05 07:55:13 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=265
        FILE: Number of bytes written=193601
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=527
        HDFS: Number of bytes written=179
        HDFS: Number of read operations=7
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=2616
        Total time spent by all reduces in occupied slots (ms)=2855
        Total time spent by all map tasks (ms)=2616
        Total time spent by all reduce tasks (ms)=2855
        Total vcore-seconds taken by all map tasks=2616
        Total vcore-seconds taken by all reduce tasks=2855
        Total megabyte-seconds taken by all map tasks=2678784
        Total megabyte-seconds taken by all reduce tasks=2923520
    Map-Reduce Framework
        Map input records=10
        Map output records=10
        Map output bytes=239
        Map output materialized bytes=265
        Input split bytes=122
        Combine input records=0
        Combine output records=0
        Reduce input groups=5
        Reduce shuffle bytes=265
        Reduce input records=10
        Reduce output records=10
        Spilled Records=20
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=20
        CPU time spent (ms)=2090
        Physical memory (bytes) snapshot=415334400
        Virtual memory (bytes) snapshot=2382364672
        Total committed heap usage (bytes)=401997824
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=405
    File Output Format Counters
        Bytes Written=179

After the grep execution finishes, we can check the results. We can list the content of the output directory by:

hdfs dfs -ls /user/hadoop/output/

It should print output as follows.

Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2014-09-05 07:55 /user/hadoop/output/_SUCCESS
-rw-r--r--   3 hadoop supergroup        179 2014-09-05 07:55 /user/hadoop/output/part-r-00000

part-r-00000 contains the result. Let's cat it to check the content.

hdfs dfs -cat /user/hadoop/output/part-r-00000

It should print output as follows.

6   dfs.audit.logger
4   dfs.class
3   dfs.server.namenode.
2   dfs.period
2   dfs.audit.log.maxfilesize
2   dfs.audit.log.maxbackupindex
1   dfsmetrics.log
1   dfsadmin
1   dfs.servers
1   dfs.file

Stop HDFS/YARN

Manually

On the ResourceManager node and on each NodeManager node, respectively:

yarn-daemon.sh stop resourcemanager
yarn-daemon.sh stop nodemanager

On the NameNode:

hadoop-daemon.sh --script hdfs stop namenode

On each DataNode:

hadoop-daemon.sh --script hdfs stop datanode

With scripts
stop-yarn.sh
stop-dfs.sh

Debug information

You may check the logs for debugging. The logs on each node are under:

hadoop/logs/

You may want to clean up everything on all nodes. The following removes the data directories on the DataNodes (if you did not set hdfs-site.xml to choose the directories yourself). (Be careful: the following script deletes everything under /tmp that your current user can delete. Adapt it if you store useful data under /tmp.)

rm -rf /tmp/* ~/hadoop/logs/*
for i in `cat hadoop/etc/hadoop/slaves`; do 
  echo $i; ssh $i 'rm -rf /tmp/* ~/hadoop/logs/'; 
done

More readings on HDFS/YARN/Hadoop

Here are links to some good articles related to Hadoop on the Web:

Default Hadoop configuration values: https://www.systutorials.com/qa/749/hadoop-2-yarn-default-configuration-values

Official cluster setup tutorial: https://hadoop.apache.org/docs/r2.5.0/hadoop-project-dist/hadoop-common/ClusterSetup.html

Guides to configure Hadoop:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html
http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/

Additional notes

Control the number of containers

You may want to configure the number of containers managed by YARN on each node. You can refer to the example below.

The following lines are added to yarn-site.xml to specify that each node offers 3072 MB of memory to YARN and each container uses at least 1536 MB of memory. That is, at most 2 containers run on each node.

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1536</value>
</property>

Translated from: https://www.systutorials.com/hadoop-installation-tutorial-hadoop-2-x/
