Hadoop Environment Setup from the Official Documentation (Part 1): Pseudo-Distributed Setup

盖世英雄来了

Article link: https://blog.csdn.net/qq_40513633/article/details/99174471

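When learning to set up a Hadoop cluster, the best approach is still to read the official documentation closely. It covers the setup procedure for each environment and the reasons behind each step.

Official documentation: https://hadoop.apache.org/docs/r2.5.2/

Download hadoop 2.5.0: http://archive.apache.org/dist/hadoop/common/

(Every historical Hadoop release is available here; 2.5.0 is a relatively stable release and well suited to learning.)

First, note that Hadoop can be set up in four ways. (Before installing Hadoop, install a JDK and configure its environment variables.)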

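I. Local mode

Local (Standalone) Mode

Local mode needs no Hadoop configuration; unpacking the archive is enough. Below is the example from the official documentation, which runs a MapReduce grep job against the local filesystem. The commands come from the r2.5.2 docs, so with the 2.5.0 download the examples jar is named hadoop-mapreduce-examples-2.5.0.jar.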

  $ mkdir input
  $ cp etc/hadoop/*.xml input
  $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar grep input output 'dfs[a-z.]+'
  $ cat output/*

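II. Pseudo-distributed mode

Pseudo-Distributed Mode
1. First, configure the HDFS filesystem
Before configuring anything else, set the JAVA_HOME environment variable in these three files: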
etc/hadoop/mapred-env.sh
etc/hadoop/yarn-env.sh
etc/hadoop/hadoop-env.sh
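
Setting JAVA_HOME in the three files can be scripted. The following is a minimal sketch, not the official procedure: `set_java_home` is a hypothetical helper, and both the Hadoop directory and the JDK path are values you supply yourself. It puts the export on the first line of each file so the variable is already set when the rest of the script runs.

```shell
# Put an explicit JAVA_HOME export on the first line of each env script.
# Usage: set_java_home <hadoop_dir> <jdk_dir>   (both paths are assumptions you supply)
set_java_home() {
  hadoop_dir=$1
  jdk_dir=$2
  for f in hadoop-env.sh yarn-env.sh mapred-env.sh; do
    target="$hadoop_dir/etc/hadoop/$f"
    # Write the export first, then the original content, then swap the files.
    { printf 'export JAVA_HOME=%s\n' "$jdk_dir"; cat "$target"; } > "$target.tmp" &&
      mv "$target.tmp" "$target"
  done
}
```

For example, `set_java_home /opt/modules/hadoop-2.5.0 /path/to/your/jdk`, where the JDK path is whatever `echo $JAVA_HOME` prints on your machine.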
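1. core-site.xml (note: change localhost to your host name)

Configure the NameNode host name and port, and the base directory for temporary files (hadoop.tmp.dir):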

<configuration>
    <!-- NameNode host name and port -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:8020</value>
    </property>
    <!-- base directory for Hadoop's temporary files -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/modules/hadoop-2.5.0/data/tmp</value>
    </property>
</configuration>

2. Configure hdfs-site.xml (this sets the block replication factor; pseudo-distributed mode runs on a single machine, so set it to 1)

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
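
A misspelled property name in these files is silently ignored, so it helps to read the values back after editing. The sketch below is a plain text scan, not a real XML parser: `get_prop` is a hypothetical helper, and it assumes each `<value>` sits on the same line as its `<name>` or on a line shortly after it, as in the snippets above.

```shell
# Print the value of one property from a Hadoop *-site.xml file.
# Usage: get_prop <config_file> <property_name>
get_prop() {
  awk -v key="$2" '
    # Remember that the wanted <name> line has been seen.
    index($0, "<name>" key "</name>") { found = 1 }
    # The first <value> at or after that line is taken as the answer.
    found && match($0, /<value>[^<]*<\/value>/) {
      # Strip the 7-character opening tag and the 8-character closing tag.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}
```

For example, `get_prop etc/hadoop/core-site.xml fs.defaultFS` should print the NameNode URI, and an empty result points to a typo in the name. Hadoop's own `bin/hdfs getconf -confKey fs.defaultFS` reports the value the installation actually resolves.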

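3. Format the filesystem, following the documentation

Format the filesystem: (Avoid reformatting later on, because it easily causes errors: a reformat gives the NameNode a new cluster ID that no longer matches the one stored by the DataNode. Before formatting a second time, clear the logs and the configured temporary directory.)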
```
  $ bin/hdfs namenode -format
```
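
The cleanup before a second format can be sketched as follows. This is an illustration under stated assumptions, not an official tool: `clean_before_reformat` is a hypothetical helper, and it assumes hadoop.tmp.dir is `<hadoop_home>/data/tmp` (as configured in core-site.xml above) and that logs are in `<hadoop_home>/logs`. It deletes all HDFS data stored under that temporary directory, so stop the daemons with sbin/stop-dfs.sh first and run it only when a reformat is really intended.

```shell
# Empty the temporary directory and the log directory before reformatting.
# Usage: clean_before_reformat <hadoop_home>
clean_before_reformat() {
  # ${1:?...} aborts when no argument is given, so rm never sees "/data/tmp".
  hadoop_home=${1:?usage: clean_before_reformat <hadoop_home>}
  rm -rf "$hadoop_home/data/tmp" "$hadoop_home/logs"
  # Recreate both directories empty so later steps find them in place.
  mkdir -p "$hadoop_home/data/tmp" "$hadoop_home/logs"
}
```

After `clean_before_reformat /opt/modules/hadoop-2.5.0`, run `bin/hdfs namenode -format` again.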
Start NameNode daemon and DataNode daemon: (this starts HDFS)
```
  $ sbin/start-dfs.sh
```
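
One way to confirm that the daemons really came up is to look at the output of `jps`, the process lister that ships with the JDK. The helper below is a sketch: `check_daemons` is a hypothetical name, and it assumes each line of `jps` output has the form `<pid> <name>`.

```shell
# Report which expected daemons are absent from a jps listing.
# Usage: check_daemons "$(jps)" NameNode DataNode SecondaryNameNode
check_daemons() {
  jps_output=$1
  shift
  missing=0
  for daemon in "$@"; do
    # Compare the second field exactly, so SecondaryNameNode does not count as NameNode.
    if ! printf '%s\n' "$jps_output" |
        awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }'; then
      echo "not running: $daemon"
      missing=1
    fi
  done
  return "$missing"
}
```

The function returns 0 when every named daemon is listed. If one is missing, the log files in the directory described below are the first place to look.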
The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
Browse the web interface for the NameNode; by default it is available at: (open port 50070 in a browser, replacing localhost with your host's IP address)
- NameNode - http://localhost:50070/
Make the HDFS directories required to execute MapReduce jobs: (this creates directories in HDFS; they can be checked in the web interface)
```
  $ bin/hdfs dfs -mkdir /user
  $ bin/hdfs dfs -mkdir /user/<username>
```
Copy the input files into the distributed filesystem: (this uploads files to HDFS)
```
  $ bin/hdfs dfs -put etc/hadoop input
```

Run some of the examples provided:

  $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar grep input output 'dfs[a-z.]+'

Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them: (this downloads the results from HDFS)
```
  $ bin/hdfs dfs -get output output
  $ cat output/*
```
or

View the output files on the distributed filesystem: (this prints the contents of the output files)
```
  $ bin/hdfs dfs -cat output/*
```
When you're done, stop the daemons with: (this stops HDFS)

  $ sbin/stop-dfs.sh

3. Configure YARN in yarn-site.xml

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

<!-- replace bigdata-senior01 with your own host name -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>bigdata-senior01</value>
</property>

<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>

<!-- how long aggregated logs are kept, in seconds -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>640800</value>
</property>
</configuration>

4. Configure mapred-site.xml (created by renaming mapred-site.xml.template)

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

<property>
<name>mapreduce.jobhistory.address</name>
<value>bigdata-senior01:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>bigdata-senior01:19888</value>
</property>

</configuration>

Start YARN:

  $ sbin/start-yarn.sh

Then open port 8088 in a browser to check the ResourceManager web interface.
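
The mapred-site.xml settings give the JobHistory server its addresses, but none of the steps start that server. A minimal sketch of doing so, assuming the `sbin/mr-jobhistory-daemon.sh` script that Hadoop 2.x distributions ship with (`start_history_server` itself is a hypothetical wrapper):

```shell
# Start the MapReduce JobHistory server of a given Hadoop installation.
# Usage: start_history_server <hadoop_home>
start_history_server() {
  hadoop_home=${1:?usage: start_history_server <hadoop_home>}
  # "start historyserver" is the daemon script's subcommand and service name.
  "$hadoop_home/sbin/mr-jobhistory-daemon.sh" start historyserver
}
```

Once it is running, its web interface should be reachable on the port set in mapreduce.jobhistory.webapp.address (19888 in the configuration above).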