Hadoop Cluster Setup

Hadoop pseudo-distributed (single-node) cluster setup

1. Install JDK 1.7 or later
2. Install Hadoop 2.8.5
3. Edit /etc/profile and add:

JAVA_HOME=/opt/module/jdk1.8.0_221
HADOOP_HOME=/opt/module/hadoop-2.8.5
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export JAVA_HOME HADOOP_HOME PATH

# Apply the updated profile

source /etc/profile
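
To confirm the new environment variables are in effect, a quick check (assuming the paths above):

# Both commands should print version information once /etc/profile has been sourced
java -version
hadoop version
echo $HADOOP_HOME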

4. Configure the hostname and hosts mapping
vi /etc/hosts

192.168.228.128 bigdata
192.168.228.129 bigdata02
192.168.228.130 bigdata03

vi /etc/sysconfig/network

HOSTNAME=bigdata

vi /etc/hostname

bigdata
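
On CentOS 7 the hostname can also be applied immediately with hostnamectl, instead of waiting for a re-login or reboot to pick up /etc/hostname; a minimal sketch:

# Set the hostname without rebooting, then verify it
hostnamectl set-hostname bigdata
hostname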

5. Disable the firewall and SELinux

# Stop the firewall:
systemctl stop firewalld.service
# Disable the firewall at boot:
systemctl disable firewalld.service
# Check the firewall status:
systemctl status firewalld.service
# Permanently disable SELinux:
vi /etc/selinux/config    # change SELINUX=enforcing to SELINUX=disabled
# or:
sed -i 's/SELINUX=.*/SELINUX=disabled/g' /etc/sysconfig/selinux
# Temporarily disable SELinux:
setenforce 0
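
To confirm SELinux is really off, a quick check:

# Prints "Permissive" right after setenforce 0, or "Disabled" after a reboot with SELINUX=disabled
getenforce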

6. Configure a static IP

[root@bigdata hadoop]# cat /etc/sysconfig/network-scripts/ifcfg-ens33 
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
# Use static IP assignment
BOOTPROTO="static"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
#UUID="b0c93b25-6ab9-44a6-8f42-825e018cd065"
DEVICE="ens33"
# yes means the interface is brought up at boot
ONBOOT="yes"
# The lines below are added manually
IPADDR0=192.168.228.128
PREFIX0=24
GATEWAY0=192.168.228.1
DNS1=8.8.8.8
DNS2=8.8.4.4
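
After editing ifcfg-ens33, the network service has to be restarted for the static IP to take effect; a minimal sketch (CentOS 7, assuming the legacy network service is in use):

# Restart networking and confirm the new address on ens33
systemctl restart network
ip addr show ens33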

7. Set up passwordless SSH between hosts (if there are multiple servers, run this on each one)

ssh-keygen -t rsa    # press Enter three times to accept the defaults
# then copy the public key to each host, e.g.:
ssh-copy-id bigdata
#ssh-copy-id bigdata02
#ssh-copy-id bigdata03
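
A quick way to confirm passwordless login works (each command should print the remote hostname without prompting for a password):

ssh bigdata hostname
#ssh bigdata02 hostname
#ssh bigdata03 hostname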

8. Create the itstar user and working directories

 adduser itstar
 passwd itstar
 mkdir -p /opt/module/hadoop-2.8.5/data/tmp
 mkdir /opt/module/hadoop-2.8.5/logs

Give the itstar user root (sudo) privileges:
vi /etc/sudoers    # around line 92, find "root ALL=(ALL) ALL" and duplicate that line as:

itstar ALL=(ALL) ALL
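
Editing with visudo is safer than vi, since it syntax-checks the file before saving. To verify the new entry, a minimal check:

# Switch to itstar and confirm sudo works; whoami should print root
su - itstar
sudo whoami
exit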

9. Configure Hadoop. Config path: <Hadoop install dir>/etc/hadoop. Add the content below to each of the following files.
Cluster plan:

--------+-------------------+---------------+---------------
        | bigdata           | bigdata02     | bigdata03
--------+-------------------+---------------+---------------
HDFS    | NameNode          | DataNode      | DataNode
        | SecondaryNameNode |               |
        | DataNode          |               |
--------+-------------------+---------------+---------------
YARN    | ResourceManager   | NodeManager   | NodeManager
        | NodeManager       |               |
--------+-------------------+---------------+---------------

1) core-site.xml

<property>
	<name>fs.defaultFS</name>
	<value>hdfs://bigdata:9000</value>
</property>
<property>
	<name>hadoop.tmp.dir</name>
	<value>/opt/module/hadoop-2.8.5/data/tmp</value>
</property>

Note: create the /opt/module/hadoop-2.8.5/data/tmp directory manually if it does not exist.
"bigdata" here is the hostname; the same applies in the files below.
The <property> blocks go inside each file's existing <configuration> element.

2) hdfs-site.xml

	<property>
		<name>dfs.replication</name>
		<value>3</value>
	</property>
	<property>
		<name>dfs.namenode.secondary.http-address</name>
		<value>bigdata:50090</value>
	</property>

	<property>
		<name>dfs.permissions</name>
		<value>false</value>
	</property>

3) yarn-site.xml

	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>bigdata</value>
	</property>
	<property>
		<name>yarn.log-aggregation-enable</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.log-aggregation.retain-seconds</name>
		<value>604800</value>
	</property>

4) mapred-site.xml

	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
	<property>
		<name>mapreduce.jobhistory.address</name>
		<value>bigdata:10020</value>
	</property>
	<property>
		<name>mapreduce.jobhistory.webapp.address</name>
		<value>bigdata:19888</value>
	</property>

5) Append the JAVA_HOME setting to the end of hadoop-env.sh, yarn-env.sh and mapred-env.sh

export JAVA_HOME=/opt/module/jdk1.8.0_221
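
If preferred, the same line can be appended to all three scripts in one go (a sketch, assuming the install path used above):

cd /opt/module/hadoop-2.8.5/etc/hadoop
# Append the JAVA_HOME export to each env script
for f in hadoop-env.sh yarn-env.sh mapred-env.sh; do
  echo 'export JAVA_HOME=/opt/module/jdk1.8.0_221' >> $f
done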

6) Edit the slaves file and replace localhost with the appropriate hostname(s)

[root@bigdata hadoop]# cat slaves 
bigdata

10. Format the Hadoop runtime data directory (NameNode format)

hdfs namenode -format
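
If the format succeeded, the NameNode metadata directory is created under hadoop.tmp.dir; a quick check (the dfs/name subpath is the default location derived from the hadoop.tmp.dir configured above):

# The VERSION file should exist and contain a clusterID after a successful format
cat /opt/module/hadoop-2.8.5/data/tmp/dfs/name/current/VERSION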

11. Start Hadoop

start-all.sh

Note: if anything went wrong in the configuration above, steps 10 and 11 both need to be redone (empty the manually created data directory before reformatting).

Check the running processes with jps (there should be 5 processes, not counting jps itself):

[root@bigdata hadoop]# jps
29504 ResourceManager
29348 SecondaryNameNode
29609 NodeManager
29210 DataNode
29115 NameNode
29964 Jps

To shut everything down, run:

stop-all.sh

12. Check the result
Browse to http://IP:50070, here http://192.168.228.128:50070
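
If no browser is available on the machine, the NameNode web UI can also be probed from the shell; a minimal sketch:

# Expect HTTP status 200 when the NameNode UI is up
curl -s -o /dev/null -w "%{http_code}\n" http://192.168.228.128:50070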

13. Upload a file
# Leave safe mode first, otherwise uploads will fail

hadoop dfsadmin -safemode leave
hadoop fs -put slaves /

This uploads the slaves file to the / directory.
To view the uploaded file: in the web UI, go to Utilities / Browse the file system.
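
The upload can also be confirmed from the command line:

# List the HDFS root; the slaves file should appear
hadoop fs -ls /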

Converting the pseudo-distributed setup (one server) into a fully distributed cluster (at least 3 servers)

First clone bigdata into bigdata02 and bigdata03.

1. Make sure passwordless SSH trust is set up between all three hosts
2. Change the slaves file to:

bigdata
bigdata02
bigdata03

3. Delete the contents of the data and logs directories (on every node; see the sketch after the commands below)

rm -rf data/* 
rm -rf logs/*
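
With three nodes, the data and logs directories must be emptied on every host before reformatting; a sketch that uses the SSH trust set up earlier (assuming the install path above):

for h in bigdata bigdata02 bigdata03; do
  ssh $h 'rm -rf /opt/module/hadoop-2.8.5/data/* /opt/module/hadoop-2.8.5/logs/*'
done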

4. Reformat the Hadoop runtime data directory

hdfs namenode -format

5. Start the cluster; run this only on node 1 (bigdata)

start-all.sh

Note: if the NameNode and the ResourceManager are not on the same server, start them separately on their respective servers with start-dfs.sh and start-yarn.sh.

Using Hadoop

# Leave safe mode first, otherwise uploads will fail
hadoop dfsadmin -safemode leave

1. Upload a file
# Upload test01 to the / directory on HDFS

hadoop fs -put test01 /

2. View the contents of the uploaded file

hadoop fs -cat /test01

3. Delete the uploaded file

hadoop fs -rm /test01
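
A few other commonly used HDFS commands, for reference (the /mydir path is just an example):

hadoop fs -ls /            # list a directory
hadoop fs -mkdir /mydir    # create a directory
hadoop fs -get /test01 .   # download a file to the local working directory
hadoop fs -rm -r /mydir    # delete a directory recursively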

4. Hadoop's first "hello world": wordcount

[root@bigdata data]# vi test02
w h
w h x
linux
apache
mysql
mysql
hive

Pig
calary
kafka

kafka
kafka
kafka
kafka



nux
apache
mysql
mysql
hive
Pig
calary
w h
w h x
linux
apache
mysql
mysql
hive
Pig
calary

# Upload the file

hadoop fs -put test02 /

# View the uploaded file in the web UI (Utilities / Browse the file system)
# Run the MapReduce wordcount example; the word counts are written to /test02_

cd /opt/module/hadoop-2.8.5/share/hadoop/mapreduce
hadoop jar ./hadoop-mapreduce-examples-2.8.5.jar wordcount /test02 /test02_ 

# The job run looks like this:

[root@bigdata mapreduce]# hadoop jar ./hhadoop-mapreduce-examples-2.8.5.jar wordcount /test02 /test02_ 
JAR does not exist or is not a normal file: /opt/module/hadoop-2.8.5/share/hadoop/mapreduce/hhadoop-mapreduce-examples-2.8.5.jar
[root@bigdata mapreduce]# hadoop jar ./hadoop-mapreduce-examples-2.8.5.jar wordcount /test02 /test02_ 
19/09/15 23:11:02 INFO client.RMProxy: Connecting to ResourceManager at bigdata/192.168.228.128:8032
19/09/15 23:11:04 INFO input.FileInputFormat: Total input files to process : 1
19/09/15 23:11:04 INFO mapreduce.JobSubmitter: number of splits:1
19/09/15 23:11:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1568594271375_0001
19/09/15 23:11:06 INFO impl.YarnClientImpl: Submitted application application_1568594271375_0001
19/09/15 23:11:06 INFO mapreduce.Job: The url to track the job: http://bigdata:8088/proxy/application_1568594271375_0001/
19/09/15 23:11:06 INFO mapreduce.Job: Running job: job_1568594271375_0001
19/09/15 23:11:18 INFO mapreduce.Job: Job job_1568594271375_0001 running in uber mode : false
19/09/15 23:11:18 INFO mapreduce.Job:  map 0% reduce 0%
19/09/15 23:11:28 INFO mapreduce.Job:  map 100% reduce 0%
19/09/15 23:11:38 INFO mapreduce.Job:  map 100% reduce 100%
19/09/15 23:11:39 INFO mapreduce.Job: Job job_1568594271375_0001 completed successfully
19/09/15 23:11:39 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=123
                FILE: Number of bytes written=315923
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=266
                HDFS: Number of bytes written=73
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=6871
                Total time spent by all reduces in occupied slots (ms)=8115
                Total time spent by all map tasks (ms)=6871
                Total time spent by all reduce tasks (ms)=8115
                Total vcore-milliseconds taken by all map tasks=6871
                Total vcore-milliseconds taken by all reduce tasks=8115
                Total megabyte-milliseconds taken by all map tasks=7035904
                Total megabyte-milliseconds taken by all reduce tasks=8309760
        Map-Reduce Framework
                Map input records=34
                Map output records=36
                Map output bytes=315
                Map output materialized bytes=123
                Input split bytes=91
                Combine input records=36
                Combine output records=11
                Reduce input groups=11
                Reduce shuffle bytes=123
                Reduce input records=11
                Reduce output records=11
                Spilled Records=22
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=204
                CPU time spent (ms)=2360
                Physical memory (bytes) snapshot=308465664
                Virtual memory (bytes) snapshot=4167356416
                Total committed heap usage (bytes)=165810176
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=175
        File Output Format Counters 
                Bytes Written=73

The test02_ directory now appears in the web UI (Utilities / Browse the file system).
# View the result file on HDFS

[root@bigdata data]# hadoop fs -cat /test02_/part-r-00000
Pig     3
apache  3
calary  3
h       4
hive    3
kafka   5
linux   2
mysql   6
nux     1
w       4
x       2
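
Note that MapReduce refuses to start if the output directory already exists, so to rerun the example the old output has to be removed first:

# Remove the previous output directory before running wordcount again
hadoop fs -rm -r /test02_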