Hadoop 2.x Pseudo-Distributed Environment Setup

1. Plan the directory layout on the Linux system

2. Upload the required installation packages

3. Unpack the JDK and configure the environment variables

$ tar -zxf jdk-7u67-linux-x64.tar.gz -C /opt/modules/

sudo vi /etc/profile    # the system-wide environment variable configuration file

To make the file take effect:

su - root

source /etc/profile

Verify: $ java -version
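For reference, a minimal sketch of the lines appended to /etc/profile (assuming the JDK was unpacked to /opt/modules as in the tar command above; adjust the path to your own installation):

# JDK environment, appended at the end of /etc/profile
export JAVA_HOME=/opt/modules/jdk1.7.0_67
export PATH=$PATH:$JAVA_HOME/bin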

4. Unpack and install Hadoop 2.5.0

If disk space is tight, the doc directory can be deleted; it only contains the official English documentation.

$ rm -rf ./doc/

5. Set the Java installation directory

etc/hadoop/hadoop-env.sh

etc/hadoop/yarn-env.sh

etc/hadoop/mapred-env.sh

export JAVA_HOME=/opt/modules/jdk1.7.0_67

Prepare to Start the Hadoop Cluster

Unpack the downloaded Hadoop distribution. In the distribution, edit the file etc/hadoop/hadoop-env.sh to define some parameters as follows:

  # set to the root of your Java installation
  export JAVA_HOME=/usr/java/latest
 
  # Assuming your installation directory is /usr/local/hadoop
  export HADOOP_PREFIX=/usr/local/hadoop

Try the following command:

  $ bin/hadoop

This will display the usage documentation for the hadoop script.

 

6. Modify the site-specific configuration files

Configure core-site.xml

Specify the host and RPC port of the NameNode master node:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop01.com:8020</value>
</property>

Change the default temporary directory path of hadoop.tmp.dir.

By default hadoop.tmp.dir points to /tmp/hadoop-${user.name} under the system temp directory. It mainly stores the fsimage and log files; when the system clears its temporary files, Hadoop can no longer find them.

<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/modules/hadoop-2.5.0/data/tmp</value>
</property>
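Putting the two properties together, core-site.xml ends up with both inside the <configuration> element:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop01.com:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/modules/hadoop-2.5.0/data/tmp</value>
  </property>
</configuration>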

Configure etc/hadoop/slaves

Specify the hosts of the DataNode slave nodes in the slaves file.

Note that slaves stands for both the DN (DataNode) and the NM (NodeManager) hosts.

hadoop01.com

Configure etc/hadoop/hdfs-site.xml

Specify the number of replicas:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

Configure the default block size (134217728 bytes = 128 MB); not necessary in a test environment:

<property>
  <name>dfs.blocksize</name>
  <value>134217728</value>
</property>

 

7. Format the NameNode

$ bin/hdfs namenode -format

Recommendation: format only once; formatting multiple times will cause errors.

If you need to format again, or a re-format fails, first go to the directory configured as hadoop.tmp.dir, delete its contents, and then format again.
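For example, a re-format after a failed attempt would look like this (the path is the hadoop.tmp.dir value configured above):

$ rm -rf /opt/modules/hadoop-2.5.0/data/tmp/*
$ bin/hdfs namenode -format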

8. Start the related daemon processes

$ sbin/hadoop-daemon.sh start namenode

$ sbin/hadoop-daemon.sh start datanode
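To confirm both daemons are up, jps should list a NameNode and a DataNode process (the PIDs below are illustrative and will differ on your machine):

$ jps
3073 NameNode
3160 DataNode
3241 Jps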

9. Access the web management UI in a browser on port 50070

hadoop01.com:50070

10. Test read/write, upload and download against the HDFS file system:

$ bin/hdfs dfs -mkdir -p tmp/conf

$ bin/hdfs dfs -put etc/hadoop/core-site.xml /user/frank/tmp/conf

$ bin/hdfs dfs -cat /user/frank/tmp/conf/core-site.xml

$ bin/hdfs dfs -get /user/frank/tmp/conf/core-site.xml /home/frank/bf-site.xml

11. When an error occurs, check the error messages in the log files first.

The logs live under hadoop-2.5.0/logs/; look at the relevant log file, checking the files that end in .log.

12. Configure YARN

etc/hadoop/yarn-site.xml

How the reducers fetch data (the shuffle service):

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

Specify the location of the ResourceManager:

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hadoop01.com</value>
</property>

13. Configure etc/hadoop/mapred-site.xml

Specify that MapReduce runs on YARN:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.

 

Start YARN

$ sbin/yarn-daemon.sh start resourcemanager

$ sbin/yarn-daemon.sh start nodemanager
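Once both daemons are running, the ResourceManager web UI is available on port 8088, analogous to the HDFS UI on port 50070 above:

hadoop01.com:8088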

 

Package the MapReduce program as a jar and run it on YARN

$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /user/frank/mapreduce/wordcount/input /user/frank/mapreduce/wordcount/output2

 

View the results

bin/hdfs dfs -cat /user/frank/mapreduce/wordcount/output2/part*

Another commonly used viewing command is -text:

bin/hdfs dfs -text /user/frank/mapreduce/wordcount/output2/part*

The output path cannot be reused between runs; to run the job again, specify a new output path, otherwise it will fail with an error.

 

Log aggregation

Configuration file: etc/hadoop/yarn-site.xml

Specify whether log aggregation is enabled:

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>

Set how long the logs are kept on HDFS, typically 7 days (7 days = 604800 seconds):

<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>

After the configuration is done, the YARN daemons need to be restarted (the history server must be restarted as well); then run a job again and the logs of past runs can be viewed.

sbin/yarn-daemon.sh stop resourcemanager

sbin/yarn-daemon.sh stop nodemanager

sbin/mr-jobhistory-daemon.sh stop historyserver
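After the stop commands above, the daemons are started again with the matching start commands (the history server uses the mr-jobhistory-daemon.sh script):

sbin/yarn-daemon.sh start resourcemanager

sbin/yarn-daemon.sh start nodemanager

sbin/mr-jobhistory-daemon.sh start historyserver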

 

HDFS file permission checking

To disable permission checking on the HDFS file system, modify hdfs-site.xml:

<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>

After this change, restart the HDFS-related daemons.
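For example, using the same daemon script as in step 8:

$ sbin/hadoop-daemon.sh stop datanode
$ sbin/hadoop-daemon.sh stop namenode
$ sbin/hadoop-daemon.sh start namenode
$ sbin/hadoop-daemon.sh start datanode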

 

To change Hadoop's static web user name, modify core-site.xml.

dr.who is the default static user name.

<property>
  <name>hadoop.http.staticuser.user</name>
  <value>frank</value>
</property>

Notes:

[Create a directory]
sudo mkdir /data
cd /
ls -all
[Change the owner]
sudo chown neworigin:neworigin /data/

[Copy]
cp /mnt/hgfs/BigData/第四天/jdk-8u121-linux-x64.tar.gz /data/
[Unpack]
tar -xzvf jdk-8u121-linux-x64.tar.gz

[Check the path]
pwd

[/etc/environment]

>sudo nano /etc/environment
JAVA_HOME=/data/jdk1.8.0_121
PATH="$PATH:/data/jdk1.8.0_121/bin"

>source /etc/environment

[Check the environment]
>java -version


Hadoop configuration
[Copy]
cp /mnt/hgfs/BigData/第四天/hadoop-2.7.0.tar.gz /data/
[Unpack]
tar -xzvf hadoop-2.7.0.tar.gz
[etc/environment]
HADOOP_HOME=/data/hadoop-2.7.0
PATH=$PATH:/data/hadoop-2.7.0/bin:/data/hadoop-2.7.0/sbin

[Test]
>hadoop version

[Configuration files]
>cd /data/hadoop-2.7.0/etc/hadoop


Configuration files
<?xml version="1.0"?>
<!-- core-site.xml -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost/</value>
</property>
</configuration>
<?xml version="1.0"?>
<!-- hdfs-site.xml -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
<?xml version="1.0"?>
<!-- mapred-site.xml -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
<?xml version="1.0"?>
<!-- yarn-site.xml -->
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>


Important Hadoop Daemon Properties

Example 10-1. A typical core-site.xml configuration file
<?xml version="1.0"?>
<!-- core-site.xml -->
<configuration>
<property>

<name>fs.defaultFS</name>
<value>hdfs://namenode/</value>
</property>
</configuration>
Example 10-2. A typical hdfs-site.xml configuration file
<?xml version="1.0"?>
<!-- hdfs-site.xml -->
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/disk1/hdfs/name,/remote/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/disk1/hdfs/data,/disk2/hdfs/data</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/disk1/hdfs/namesecondary,/disk2/hdfs/namesecondary</value>
</property>
</configuration>
Example 10-3. A typical yarn-site.xml configuration file
<?xml version="1.0"?>
<!-- yarn-site.xml -->
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>resourcemanager</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/disk1/nm-local-dir,/disk2/nm-local-dir</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>16384</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>

<value>16</value>
</property>
</configuration>

Notes:

[Cluster setup]


3 virtual machines (s100, s101, s102)
s100 --   master
s101 --   slave1
s102 --   slave2

[Configure the network]
/etc/network/interfaces

[Configure the host name mapping] (see the /etc/hosts sketch after this list)
/etc/hosts

[Change the host name]
/etc/hostname
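As a sketch, /etc/hosts on each of the three machines would map the host names roughly like this (the IP addresses below are placeholders; use whatever your virtual machines are actually assigned):

192.168.1.100   s100
192.168.1.101   s101
192.168.1.102   s102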

------------------
[Create the directory]
>sudo mkdir /data    (s100, s101, s102)
>sudo chown neworigin:neworigin /data




[Passwordless SSH login]

[s100, s101, s102]
>sudo apt-get install ssh    // install
>rm -rf ~/.ssh

[s100]
>ssh-keygen -t rsa -f ~/.ssh/id_rsa    // generate the key pair on s100
>cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

>ssh-copy-id s101    // copy s100's public key to s101
>ssh-copy-id s102    // copy s100's public key to s102

>ssh localhost
>exit
>ssh s101
>exit
>ssh s102
>exit


[s100]
cat ~/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDEy23ilBVz3NmX5SniIBtxgLT/aFDCCxdc5eTApyjfXg4ISHYcfXsYxDAtqtW9SJQD7KIRvVmRn9hO4nA5MWQVAmPINP96bh7k1eDp8i+1ObKxTd1GXBAhG3dUg3Z7NqOjFBZCMJpwovsR6opajI02g5a27d6YAxZqbBP7RCzIgfuaVEuHqn2HtOA5f7A+eXcNpyb3bvJxmbMe4gUrPQtP+gIS9T13wBKK0EibojpQ52ZKEZUXJFMpX5EThymhBanSVe4KUr8/jmHGQRTMsQMqv2sPNRyL4Sq/C3KsneX4lJt8j8ubPZvzdMOiwQxdYFDn32qsp19BOjlioZpv2JkZ neworigin@s100


[s101]
cat ~/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDEy23ilBVz3NmX5SniIBtxgLT/aFDCCxdc5eTApyjfXg4ISHYcfXsYxDAtqtW9SJQD7KIRvVmRn9hO4nA5MWQVAmPINP96bh7k1eDp8i+1ObKxTd1GXBAhG3dUg3Z7NqOjFBZCMJpwovsR6opajI02g5a27d6YAxZqbBP7RCzIgfuaVEuHqn2HtOA5f7A+eXcNpyb3bvJxmbMe4gUrPQtP+gIS9T13wBKK0EibojpQ52ZKEZUXJFMpX5EThymhBanSVe4KUr8/jmHGQRTMsQMqv2sPNRyL4Sq/C3KsneX4lJt8j8ubPZvzdMOiwQxdYFDn32qsp19BOjlioZpv2JkZ neworigin@s100

[s102]
cat ~/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDEy23ilBVz3NmX5SniIBtxgLT/aFDCCxdc5eTApyjfXg4ISHYcfXsYxDAtqtW9SJQD7KIRvVmRn9hO4nA5MWQVAmPINP96bh7k1eDp8i+1ObKxTd1GXBAhG3dUg3Z7NqOjFBZCMJpwovsR6opajI02g5a27d6YAxZqbBP7RCzIgfuaVEuHqn2HtOA5f7A+eXcNpyb3bvJxmbMe4gUrPQtP+gIS9T13wBKK0EibojpQ52ZKEZUXJFMpX5EThymhBanSVe4KUr8/jmHGQRTMsQMqv2sPNRyL4Sq/C3KsneX4lJt8j8ubPZvzdMOiwQxdYFDn32qsp19BOjlioZpv2JkZ neworigin@s100



[Run the same command on multiple hosts]
[/usr/local/bin/]
>sudo nano xcall
#!/bin/bash
# get the number of arguments
pcount=$#
if((pcount<1));then
  echo no args;
  exit;
fi

# run the given command on s100, s101 and s102 over ssh
for((host=100;host<103;host=host+1));do
  echo ------------s$host-----------------
  ssh s$host $@
done
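After saving the script, make it executable; any command can then be run on all three hosts at once, for example:

>sudo chmod a+x /usr/local/bin/xcall
>xcall hostname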





[Send files]
[scp]
>scp -r /home/neworigin/Desktop/1.txt neworigin@s101:/home/neworigin/Desktop/

[rsync]
A remote synchronization tool, mainly used for backup and mirroring; it supports links, devices, and so on; it is fast because it skips file data that is already identical on the target; it does not support copying between two remote hosts.
>rsync -rvl /home/neworigin/Desktop/1.txt neworigin@s101:/home/neworigin/Desktop/

#!/bin/bash
# get the number of arguments
pcount=$#
if((pcount<1));then
  echo no args
  exit
fi

# first argument: the file or directory to distribute
p1=$1
fname=`basename $p1`
#echo $fname

# resolve the absolute path of its parent directory
pdir=`cd -P $(dirname $p1);pwd`
#echo $pdir

# sync it as the current user to s101 and s102 under the same path
cuser=`whoami`
for((host=101;host<103;host=host+1));do
  echo -------------s$host---------------
  rsync -rvl $pdir/$fname $cuser@s$host:$pdir
done
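This appears to be the xsync script invoked later in these notes; presumably it is saved next to xcall (for example as /usr/local/bin/xsync) and made executable the same way:

>sudo nano /usr/local/bin/xsync
>sudo chmod a+x /usr/local/bin/xsync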

[Passwordless sudo]
>sudo passwd
>su root
>sudo nano /etc/sudoers
neworigin       ALL=(ALL:ALL)   NOPASSWD:ALL

[Install the JDK]
>xsync /data/jdk/

[/etc/environment]
JAVA_HOME=/data/jdk1.8.0_121
PATH="$PATH:/data/jdk1.8.0_121/bin"


[Copy]
>cd /data/hadoop-2.7.0/etc
>cp -rf hadoop/ hadoop_tmp

[core-site.xml]
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://s100/</value>    
</property>
</configuration>

[hdfs-site.xml]
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/hadoop/hdfs/name</value>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>/hadoop/hdfs/data</value>
</property>

<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/hadoop/hdfs/namesecondary</value>
</property>
</configuration>

[s100, s101, s102]
>sudo mkdir -p /hadoop/hdfs/name
>sudo mkdir -p /hadoop/hdfs/data
>sudo mkdir -p /hadoop/hdfs/namesecondary
 
[yarn-site.xml]
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>s100</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/hadoop/nm-local-dir</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>16384</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>16</value>
</property>
</configuration>

[slaves]
s100
s101
s102

[Distribute]
>xsync /data/hadoop-2.7.0

[Configure the environment on s101, s102]
HADOOP_HOME=/data/hadoop-2.7.0
PATH="$PATH:/data/hadoop-2.7.0/bin:/data/hadoop-2.7.0/sbin"

[Fix directory ownership on s100, s101, s102]
>sudo chown neworigin:neworigin /hadoop -R 
>sudo chmod 777 /hadoop -R

[Start up on s100]
>hdfs namenode -format
>start-all.sh    // start

>stop-all.sh    // stop
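With the cluster up, the xcall helper from earlier gives a quick view of which daemons are running on each node (the master daemons on s100, plus a DataNode/NodeManager on every host listed in slaves):

>xcall jps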
