Hadoop pseudo-distributed (single-node) cluster setup
–1. Install JDK 1.7 or later
–2. Install Hadoop 2.8.5
–3. Edit /etc/profile and append:
JAVA_HOME=/opt/module/jdk1.8.0_221
HADOOP_HOME=/opt/module/hadoop-2.8.5
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export PATH
# Reload the profile so the changes take effect
source /etc/profile
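As a quick sanity check, you can preview what the new PATH entries expand to before relying on them; the paths below are the install locations assumed above.

```shell
# Preview the PATH additions from /etc/profile; the install paths below
# match the ones configured above.
JAVA_HOME=/opt/module/jdk1.8.0_221
HADOOP_HOME=/opt/module/hadoop-2.8.5
echo "$PATH:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin"
```

On the real node, `java -version` and `hadoop version` should both resolve once the profile has been sourced.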
–4. Configure the hostname
vi /etc/hosts
192.168.228.128 bigdata
192.168.228.129 bigdata02
192.168.228.130 bigdata03
vi /etc/sysconfig/network
HOSTNAME=bigdata
vi /etc/hostname
bigdata
–5. Disable the firewall
# Stop the firewall:
systemctl stop firewalld.service
# Disable it at boot:
systemctl disable firewalld.service
# Check its status:
systemctl status firewalld.service
# Permanently disable SELinux:
vi /etc/selinux/config    # change SELINUX=enforcing to SELINUX=disabled
# or:
sed -i 's/SELINUX=.*/SELINUX=disabled/g' /etc/sysconfig/selinux
# Disable it temporarily:
setenforce 0
–6.配置静态IP
[root@bigdata hadoop]# cat /etc/sysconfig/network-scripts/ifcfg-ens33
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
# Use static IP assignment
BOOTPROTO="static"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
#UUID="b0c93b25-6ab9-44a6-8f42-825e018cd065"
DEVICE="ens33"
# yes: bring the interface up at boot
ONBOOT="yes"
# Manually added entries below
IPADDR0=192.168.228.128
PREFIX0=24
GATEWAY0=192.168.228.1
DNS1=8.8.8.8
DNS2=8.8.4.4
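A quick check, sketched in plain shell, that the address and gateway above actually sit in the same /24 network (a common cause of "network unreachable" after this edit):

```shell
# For PREFIX0=24 the first three octets of IPADDR0 and GATEWAY0 must match.
ip=192.168.228.128
gw=192.168.228.1
if [ "${ip%.*}" = "${gw%.*}" ]; then
  echo "OK: $ip can reach gateway $gw on a /24"
else
  echo "MISMATCH: $ip and $gw are on different /24 networks"
fi
```

After editing the file, apply the change with `systemctl restart network` and verify with `ip addr show ens33`.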
–7. Set up passwordless SSH (if there are multiple servers, run this on each one)
ssh-keygen -t rsa    # press Enter three times at the prompts
# e.g.:
ssh-copy-id bigdata
#ssh-copy-id bigdata02
#ssh-copy-id bigdata03
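With three nodes the key can be distributed in a loop; shown here as a dry run (the `echo` only prints each command — remove it to actually copy the key). The hostnames assume the /etc/hosts entries from step 4.

```shell
# Dry run: print the ssh-copy-id command for each node in the cluster.
for h in bigdata bigdata02 bigdata03; do
  echo ssh-copy-id "$h"
done
```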
–8. Create the itstar user and directories
adduser itstar
passwd itstar
mkdir -p /opt/module/hadoop-2.8.5/data/tmp
mkdir /opt/module/hadoop-2.8.5/logs
# Grant the itstar user root privileges
vi /etc/sudoers    # around line 92, find `root ALL=(ALL) ALL` and add a copy of that line:
itstar ALL=(ALL) ALL
–9. Configure Hadoop (the files live in <hadoop install dir>/etc/hadoop); add the following content to each file listed below
Cluster plan:
--------+---------------------+---------------+---------------
        + bigdata             + bigdata02     + bigdata03
--------+---------------------+---------------+---------------
HDFS    + NameNode            + DataNode      + DataNode
        + SecondaryNameNode   +               +
        + DataNode            +               +
--------+---------------------+---------------+---------------
YARN    + ResourceManager     + NodeManager   + NodeManager
        + NodeManager         +               +
--------+---------------------+---------------+---------------
–1)core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://bigdata:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-2.8.5/data/tmp</value>
</property>
Note: create the /opt/module/hadoop-2.8.5/data/tmp directory manually if it does not exist.
bigdata here is the hostname; the same applies in the files below.
–2)hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>bigdata:50090</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
Note: with only one DataNode, a replication factor of 3 cannot actually be satisfied; it takes effect once the cluster is expanded to three nodes later in this guide.
–3)yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>bigdata</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
–4)mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>bigdata:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>bigdata:19888</value>
</property>
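Note that the Hadoop 2.x tarball does not ship a mapred-site.xml, only mapred-site.xml.template; copy it before adding the properties above. The step is simulated here in a temp directory so it can be previewed anywhere; on the real node run the `cp` inside $HADOOP_HOME/etc/hadoop.

```shell
# Simulate the template copy in a temp dir (stand-in for $HADOOP_HOME/etc/hadoop).
conf=$(mktemp -d)
touch "$conf/mapred-site.xml.template"    # the file the tarball ships
cp "$conf/mapred-site.xml.template" "$conf/mapred-site.xml"
ls "$conf"
```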
–5) Append the JAVA_HOME setting to the end of hadoop-env.sh, yarn-env.sh and mapred-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_221
–6) Edit slaves, replacing localhost with the appropriate hostname
[root@bigdata hadoop]# cat slaves
bigdata
–10. Format the Hadoop temporary data directory
hdfs namenode -format
–11. Start Hadoop
start-all.sh
Note: if anything went wrong in the configuration above, repeat steps 10 and 11 (empty the manually created data directory before re-formatting).
Check the running daemons with jps (5 processes, not counting jps itself):
[root@bigdata hadoop]# jps
29504 ResourceManager
29348 SecondaryNameNode
29609 NodeManager
29210 DataNode
29115 NameNode
29964 Jps
To stop the cluster, run:
stop-all.sh
–12. Check the result
Browse to http://IP:50070; here that is http://192.168.228.128:50070
–13. Upload a file
# Leave safe mode first, otherwise uploads fail
hadoop dfsadmin -safemode leave
hadoop fs -put slaves /
# This uploads the slaves file to the / directory
To view the uploaded file in the web UI: Utilities / Browse the file system
**Converting the pseudo-distributed setup (one server) into a fully distributed cluster (at least 3 servers)**
First clone bigdata into bigdata02 and bigdata03.
–1. Make sure SSH trust is configured among all three machines
–2. Change the slaves file to:
bigdata
bigdata02
bigdata03
–3. Delete the contents of the data and logs directories
rm -rf data/*
rm -rf logs/*
–4. Format the Hadoop temporary data directory again
hdfs namenode -format
–5. Start the cluster; run this only on node 1 (bigdata)
start-all.sh
Note: if the NameNode and ResourceManager are not on the same server, start them separately on their own hosts with start-dfs.sh and start-yarn.sh.
-----------------Using Hadoop-------------------------
# Leave safe mode first, otherwise files cannot be uploaded
hadoop dfsadmin -safemode leave
–1. Upload a file
# Upload test01 to the / directory on HDFS
hadoop fs -put test01 /
–2. View the contents of the uploaded file
hadoop fs -cat /test01
–3. Delete the uploaded file
hadoop fs -rm /test01
–4. Hadoop's first "hello world": wordcount
[root@bigdata data]# vi test02
w h
w h x
linux
apache
mysql
mysql
hive
Pig
calary
kafka
kafka
kafka
kafka
kafka
nux
apache
mysql
mysql
hive
Pig
calary
w h
w h x
linux
apache
mysql
mysql
hive
Pig
calary
# Upload the file
hadoop fs -put test02 /
# Verify the upload
# Run the MapReduce wordcount example; the result is written to /test02_
cd /opt/module/hadoop-2.8.5/share/hadoop/mapreduce
hadoop jar ./hadoop-mapreduce-examples-2.8.5.jar wordcount /test02 /test02_
# The job run and its output:
[root@bigdata mapreduce]# hadoop jar ./hadoop-mapreduce-examples-2.8.5.jar wordcount /test02 /test02_
19/09/15 23:11:02 INFO client.RMProxy: Connecting to ResourceManager at bigdata/192.168.228.128:8032
19/09/15 23:11:04 INFO input.FileInputFormat: Total input files to process : 1
19/09/15 23:11:04 INFO mapreduce.JobSubmitter: number of splits:1
19/09/15 23:11:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1568594271375_0001
19/09/15 23:11:06 INFO impl.YarnClientImpl: Submitted application application_1568594271375_0001
19/09/15 23:11:06 INFO mapreduce.Job: The url to track the job: http://bigdata:8088/proxy/application_1568594271375_0001/
19/09/15 23:11:06 INFO mapreduce.Job: Running job: job_1568594271375_0001
19/09/15 23:11:18 INFO mapreduce.Job: Job job_1568594271375_0001 running in uber mode : false
19/09/15 23:11:18 INFO mapreduce.Job: map 0% reduce 0%
19/09/15 23:11:28 INFO mapreduce.Job: map 100% reduce 0%
19/09/15 23:11:38 INFO mapreduce.Job: map 100% reduce 100%
19/09/15 23:11:39 INFO mapreduce.Job: Job job_1568594271375_0001 completed successfully
19/09/15 23:11:39 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=123
FILE: Number of bytes written=315923
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=266
HDFS: Number of bytes written=73
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=6871
Total time spent by all reduces in occupied slots (ms)=8115
Total time spent by all map tasks (ms)=6871
Total time spent by all reduce tasks (ms)=8115
Total vcore-milliseconds taken by all map tasks=6871
Total vcore-milliseconds taken by all reduce tasks=8115
Total megabyte-milliseconds taken by all map tasks=7035904
Total megabyte-milliseconds taken by all reduce tasks=8309760
Map-Reduce Framework
Map input records=34
Map output records=36
Map output bytes=315
Map output materialized bytes=123
Input split bytes=91
Combine input records=36
Combine output records=11
Reduce input groups=11
Reduce shuffle bytes=123
Reduce input records=11
Reduce output records=11
Spilled Records=22
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=204
CPU time spent (ms)=2360
Physical memory (bytes) snapshot=308465664
Virtual memory (bytes) snapshot=4167356416
Total committed heap usage (bytes)=165810176
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=175
File Output Format Counters
Bytes Written=73
The test02_ directory now appears in the web UI.
# View the result file on HDFS
[root@bigdata data]# hadoop fs -cat /test02_/part-r-00000
Pig 3
apache 3
calary 3
h 4
hive 3
kafka 5
linux 2
mysql 6
nux 1
w 4
x 2
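The counts above can be sanity-checked without the cluster: the same split-sort-count logic expressed with coreutils, shown here on a small sample (run the same pipeline on a local copy of test02 and the output should match part-r-00000).

```shell
# Emulate wordcount locally: one token per line, sort, count, print "word<TAB>count".
printf 'w h\nw h x\nkafka\nkafka\n' > /tmp/wc_sample.txt
tr -s ' \t' '\n' < /tmp/wc_sample.txt | grep -v '^$' | sort | uniq -c | awk '{print $2"\t"$1}'
```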