1. Hadoop installation
hadoop-2.6.4; when picking a version, the second-newest stable release is generally the safe choice.
Work on the dashuju174 machine.
su - hadoop to switch into the hadoop user's home directory (by default /home/hadoop), then:
mkdir application
cd application
Upload hadoop-2.6.4.tar.gz and run:
tar zxvf hadoop-2.6.4.tar.gz
ln -s hadoop-2.6.4 hadoop   # create a symlink so the path stays stable across upgrades
2. Setting environment variables
vi /etc/profile and add the following. After editing, be sure to run source /etc/profile to make it take effect.
export TERM=xterm
JAVA_HOME=/usr/java/jdk1.8.0_151
export JAVA_HOME
JRE_HOME=/usr/java/jdk1.8.0_151/jre
export JRE_HOME
CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export CLASSPATH
PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH:.
export PATH
# zoo hadoop hbase
export ZOOKEEPER_HOME=/application/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf
export HADOOP_HOME=/application/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HBASE_HOME=/application/hbase
export PATH=$PATH:$HBASE_HOME/bin
export HBASE_LIBRARY_PATH=$HBASE_HOME/lib/native/Linux-amd64-64
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HBASE_HOME/lib/*
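After running source /etc/profile, a quick sanity check confirms each *_HOME directory actually exists (a sketch; check_home is a throwaway helper, and the fallback paths are the ones used above):

```shell
# Throwaway helper: report whether a *_HOME directory exists
check_home() {
  # $1 = variable name (for the message), $2 = path to test
  if [ -d "$2" ]; then
    echo "$1 OK"
  else
    echo "$1 missing: $2"
  fi
}
check_home JAVA_HOME   "${JAVA_HOME:-/usr/java/jdk1.8.0_151}"
check_home HADOOP_HOME "${HADOOP_HOME:-/application/hadoop}"
check_home HBASE_HOME  "${HBASE_HOME:-/application/hbase}"
```

Once PATH is in effect, hadoop version should also resolve and print the release.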
3. Hadoop configuration
Hadoop's configuration files live under ${HADOOP_HOME}/etc/hadoop.
3.1 core-site.xml
One point worth noting here is HDFS checkpointing; the checkpoint settings appear at the end of this file.
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://bwsc65:9000</value>
<description>bwsc65</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/appdata/hadoop/tmp</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131702</value>
</property>
<!-- Two users are allowed to proxy HDFS access here: root and hadoop. For any additional user, add matching properties in the same style. -->
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
<description>Allow user root to impersonate members of any group</description>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
<description>User root may connect from any host to impersonate users</description>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
<description>Allow user hadoop to impersonate members of any group</description>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
<description>User hadoop may connect from any host to impersonate users</description>
</property>
<!-- Enable the Snappy compression codec -->
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<!-- Checkpoint settings -->
<property>
<name>fs.checkpoint.period</name>
<value>36000</value>
<description>The number of seconds between two periodic checkpoints.</description>
</property>
<property>
<name>fs.checkpoint.size</name>
<value>67108864</value>
<description>The size of the current edit log (in bytes) that triggers a periodic checkpoint even if the fs.checkpoint.period hasn't expired. </description>
</property>
</configuration>
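A quick way to eyeball what a *-site.xml file actually sets is to dump its name/value pairs (a sketch; dump_props is a throwaway helper, and it assumes each <name> and <value> tag sits on its own line, as in the file above):

```shell
# Print name<TAB>value pairs from a Hadoop *-site.xml (one tag per line assumed)
dump_props() {
  # extract the text of each <name> and <value> line, then pair consecutive lines
  sed -n 's:.*<name>\([^<]*\)</name>.*:\1:p; s:.*<value>\([^<]*\)</value>.*:\1:p' "$1" |
    paste - -
}
# Example: dump_props "$HADOOP_HOME/etc/hadoop/core-site.xml"
```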
3.2 hadoop-env.sh
Mind the PID-file path: by default the pid files land in /tmp, where periodic cleanup can delete them, after which stop-all.sh can no longer stop the cluster. See the note "unable to stop the hadoop cluster (stop-all.sh)".
export JAVA_HOME=/usr/java/jdk1.7.0_79   # find it with `which java`; point at the JDK install root (the profile above uses jdk1.8.0_151; keep the two consistent). Hadoop does not work with the distro's preinstalled OpenJDK.
export HADOOP_SSH_OPTS="-p 2222"         # only needed when ssh listens on a non-default port; omit if ssh uses port 22
export HADOOP_PID_DIR=$HADOOP_HOME/pids
3.3 hdfs-site.xml
dfs.webhdfs.enabled is set to true because Apache Hue, introduced later, depends heavily on this setting.
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/platform/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/platform/hadoop/hdfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
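With dfs.webhdfs.enabled=true, HDFS can be reached over plain HTTP on the namenode's web port, which is exactly what Hue builds on. A sketch of the REST URL format (webhdfs_url is a hypothetical helper; bwsc65:50070 is this guide's namenode web address):

```shell
# Build a WebHDFS REST URL (hypothetical helper)
webhdfs_url() {
  # $1 = HDFS path, $2 = operation name
  echo "http://bwsc65:50070/webhdfs/v1$1?op=$2"
}
# Example: list the HDFS root as JSON
# curl -s "$(webhdfs_url / LISTSTATUS)"
```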
The full configuration is as follows.
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/appdata/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/appdata/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.http.address</name>
<value>bwsc65:50070</value>
<description>
The address and the base port where the dfs namenode web ui will listen on. If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>bwsc66:50090</value>
</property>
<!-- impala -->
<!-- <property>
<name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/run/hdfs-sockets/dn</value>
</property>
<property>
<name>dfs.client.file-block-storage-locations.timeout.millis</name>
<value>10000</value>
</property>
<property>
<name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
<value>true</value>
</property> -->
<!-- tuning, 2018-03-01 -->
<property>
<name>dfs.datanode.socket.write.timeout</name>
<value>600000</value>
</property>
<property>
<name>dfs.client.socket-timeout</name>
<value>300000</value>
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>4096</value>
</property>
<property>
<name>dfs.datanode.du.reserved</name>
<value>10737418240</value>
</property>
<property>
<name>dfs.namenode.fs-limits.max-directory-items</name>
<value>6400000</value>
</property>
</configuration>
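One of the tuning values reads more easily in human units: dfs.datanode.du.reserved is specified in bytes, and the value above keeps 10 GiB per data volume out of HDFS's reach for non-HDFS use:

```shell
# 10737418240 bytes -> GiB (dfs.datanode.du.reserved is a per-volume reservation)
echo $((10737418240 / 1024 / 1024 / 1024))   # prints 10
```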
As the configuration above shows, bwsc66 is the secondary namenode; all three hadoop nodes share the same configuration.
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>bwsc66:50090</value>
</property>
3.4 mapred-site.xml
If mapred-site.xml does not exist, create it with cp mapred-site.xml.template mapred-site.xml. Snappy compression is configured for map output as well.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>bwsc65:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>bwsc65:19888</value>
</property>
<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.output.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
</configuration>
3.5 slaves
A Hadoop cluster has two kinds of nodes, namenode and datanode; the slaves file lists the datanodes.
dashuju172
dashuju173
3.6 yarn-site.xml
See "Hadoop 2.x common ports and how to check them": that table shows yarn.nodemanager.webapp.address defaulting to port 8042 and yarn.resourcemanager.webapp.address to 8088, which makes the relationship between the ports clear. Given that YARN has these defaults, does yarn-site.xml still need explicit configuration?
The configuration below points YARN at the 65 machine, which is also the namenode. The namenode orchestrates data operations across the datanodes, while YARN schedules compute resources for MapReduce; so do the two have to live on the same machine? Since YARN abstracts the cluster's resources and allocates them to applications and services under a scheduling policy, a central YARN node makes sense: better a single coordinator than every machine running its own YARN and scheduling for itself, with no way to balance the cluster.
Cluster administration makes the relationship equally clear: the ResourceManager manages the NodeManagers. The namenode has a SecondaryNameNode to back it up, so what happens if the ResourceManager dies? This, of course, is a consequence of Hadoop's master/slaves architecture.
For the meaning of each setting, see the yarn-site.xml configuration reference.
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>bwsc65</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>bwsc65:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>bwsc65:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>bwsc65:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>bwsc65:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>bwsc65:8088</value>
</property>
<property>
<name>yarn.web-proxy.address</name>
<value>bwsc65:8888</value>
</property>
<!-- tuning, 2018-03-01 -->
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>256</value>
</property>
<property>
<name>yarn.resourcemanager.zk-timeout-ms</name>
<value>120000</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>6144</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>6144</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>6</value>
</property>
<property>
<name>yarn.nodemanager.container-monitor.interval-ms</name>
<value>300000</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.scheduler.fair.preemption</name>
<value>true</value>
</property>
</configuration>
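The memory settings above bound what each node can run: a NodeManager offers 6144 MB, and containers are sized between 256 MB and 6144 MB, so one node hosts at most 24 minimum-size containers (or a single maximum-size one):

```shell
# Maximum number of minimum-size containers per NodeManager
echo $((6144 / 256))   # prints 24
```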
4. Hadoop configuration on 172 and 173
To configure hadoop on 172 and 173, run the following on 174 to sync the files over:
scp -P 2222 -r /home/hadoop/application/hadoop-2.6.4 hadoop@dashuju172:/home/hadoop/application/
scp -P 2222 -r /home/hadoop/application/hadoop-2.6.4 hadoop@dashuju173:/home/hadoop/application/
After the copy finishes, run ln -s hadoop-2.6.4 hadoop on 172 and 173 to create the symlink.
Then vi /etc/profile on 172 and 173 to set the environment variables, and remember to source it to take effect:
export HADOOP_HOME=/home/hadoop/application/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
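The two scp commands plus the remote symlink step can be wrapped in one loop (a sketch; sync_node is a hypothetical helper, using the non-default ssh port 2222 from this guide; ln -sfn makes the symlink step safe to repeat):

```shell
# Push the build to one slave and (re)create the symlink there (hypothetical helper)
sync_node() {
  # $1 = target host
  scp -P 2222 -r /home/hadoop/application/hadoop-2.6.4 "hadoop@$1:/home/hadoop/application/" &&
    ssh -p 2222 "hadoop@$1" 'cd /home/hadoop/application && ln -sfn hadoop-2.6.4 hadoop'
}
# for h in dashuju172 dashuju173; do sync_node "$h"; done
```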
5. Starting Hadoop
First format the HDFS filesystem. Note: run this as the hadoop user, on the namenode.
hdfs namenode -format   # as on Windows, a newly created volume needs formatting once; never casually re-format an existing filesystem
If you skip this step, the namenode fails to start with:
2019-07-19 14:12:57,177 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /applications/hadoop/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:314)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:202)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1022)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:741)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:538)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:597)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:764)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:748)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1441)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1507)
Once the format completes, the name directory exists. Now start the daemons:
start-dfs.sh
start-yarn.sh
yarn-daemon.sh start proxyserver
mr-jobhistory-daemon.sh start historyserver   # start the job history server
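Once the start scripts return, a quick jps-based liveness check confirms the daemons actually came up (a sketch; expect_daemon is a hypothetical helper; run it on the node whose role you are checking):

```shell
# Report whether a daemon name appears in jps output (hypothetical helper)
expect_daemon() {
  if jps 2>/dev/null | grep -qw "$1"; then
    echo "$1 running"
  else
    echo "$1 NOT running"
  fi
}
expect_daemon NameNode
expect_daemon ResourceManager
expect_daemon JobHistoryServer
```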
Open http://192.168.5.174:50070 to check the cluster's status.
If startup fails, set the following and re-run, then inspect the logs under ${HADOOP_HOME}/logs:
export HADOOP_ROOT_LOGGER=DEBUG,console
You can also check the daemons with jps on each role in turn: the namenode, the secondarynamenode, and the datanodes. If a process is missing, read its log to locate the problem.
Occasionally jps prints:
xxx – process information unavailable
See "Linux server: jps reports process information unavailable": find the matching entry under /tmp/hsperfdata_* and rm it.
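That cleanup can be scripted (a sketch; stale_hsperf is a hypothetical helper; only delete entries whose pid is really dead):

```shell
# Locate the hsperfdata file(s) for a given pid (hypothetical helper)
stale_hsperf() {
  # $1 = pid that jps reports as "process information unavailable"
  find /tmp -maxdepth 2 -path '/tmp/hsperfdata_*' -name "$1" 2>/dev/null
}
# Verify the pid is really dead before deleting:
# kill -0 12345 2>/dev/null || stale_hsperf 12345 | xargs -r rm -f
```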
[root@bwsc151 logs]# hdfs dfs -ls /
ls: Call From bwsc151.580kp.com/172.19.123.151 to bwsc151:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Run netstat -tlpn to check the listening ports.
In a healthy cluster, port 9000 belongs to the NameNode process, as specified in core-site.xml. (The screenshot of the correct output is omitted here.)
Check the firewall status; here the firewall is not running:
[root@bwsc151 logs]# service iptables status
iptables: Firewall is not running.
Some suggest running hdfs namenode -format again, but that command destroys all data, which would be madness. See "Things to watch out for when re-formatting the Hadoop namenode".
Checking the logs showed the namenode failing after about half an hour: it ran out of memory.
2019-09-05 11:23:59,237 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Edits file /application/hadoop/dfs/name/current/edits_0000000000041603393-0000000000041603394 of size 42 edits # 2 loaded in 0 seconds
2019-09-05 11:23:59,237 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@35289160 expecting start txid #41603395
2019-09-05 11:23:59,237 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Start loading edits file /application/hadoop/dfs/name/current/edits_0000000000041603395-0000000000041603396
2019-09-05 11:24:07,104 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:254)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.closeAllStreams(FSEditLog.java:1544)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:841)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:684)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1022)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:741)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:538)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:597)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:764)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:748)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1441)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1507)
2019-09-05 11:24:07,109 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2019-09-05 11:24:07,110 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
So I tried manually freeing memory (see "CentOS: memory full, freeing memory"), then restarted Hadoop and watched:
sync
echo 3 > /proc/sys/vm/drop_caches
free -h