1. What is Hadoop
1.1 A brief history of Hadoop
Doug Cutting is the creator of Apache Lucene. The Apache Nutch project started in 2002 as part of Apache Lucene. By 2005 all of Nutch's major algorithms had been ported to run on MapReduce and NDFS, and in February 2006 MapReduce and NDFS were moved out of Nutch into a Lucene subproject named Hadoop.
Hadoop is not an acronym but a made-up word. Project creator Doug Cutting explained the name: "The name my kid gave a brownish-yellow stuffed elephant. My naming criteria: short, easy to pronounce and spell, not very meaningful, and not used elsewhere. Kids are good at generating such names."
1.2 Hadoop in the narrow sense
In my view, Hadoop in the narrow sense is the Apache Hadoop project itself, which consists of the following modules:
- Hadoop Common: common components and interfaces for distributed file systems and general I/O
- Hadoop Distributed File System (HDFS): a distributed file system
- Hadoop YARN: a job scheduling and cluster resource management framework
- Hadoop MapReduce: a programming model for parallel, distributed processing of large data sets
Hadoop in the narrow sense therefore solves three problems: HDFS provides distributed storage, YARN handles job scheduling and resource management, and MapReduce gives developers a programming model for writing offline big-data processing jobs.
1.3 Hadoop in the broad sense
In my view, Hadoop in the broad sense is the whole Hadoop ecosystem, made up of many subprojects, each created to solve a particular class of problems; the main components are shown in the figure below:
2. Two Hadoop cluster deployment modes
2.1 namenode + secondarynamenode mode, supported by both hadoop1.x and hadoop2.x
2.2 active namenode + standby namenode mode, supported only by hadoop2.x
2.3 Cluster setup in the official Hadoop documentation
1) Single-node Hadoop setup
http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html
2) Cluster setup
Mode 1 (namenode + secondarynamenode, supported by both hadoop1.x and hadoop2.x)
http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/ClusterSetup.html
Mode 2 (active namenode + standby namenode, hadoop2.x only, also known as Hadoop HA). The documentation covers HDFS HA and YARN HA separately.
HDFS HA (zookeeper + journalnode): http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
HDFS HA (zookeeper + NFS): http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailability
YARN HA (zookeeper): http://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
Production environments mostly use HDFS HA with zookeeper + journalnode (active NameNode + standby NameNode + JournalNode + DFSZKFailoverController + DataNode) together with YARN HA with zookeeper (active ResourceManager + standby ResourceManager + NodeManager). This article, however, covers the namenode + secondarynamenode mode supported by both hadoop1.x and hadoop2.x. That mode is mainly used for learning and experimentation because it needs fewer machines, but the namenode remains a single point of failure.
3. Hadoop installation
3.1 Required software
- Java 1.7.x is required; the Sun (Oracle) JDK is recommended. Hadoop 2.7.1 no longer works with JDK 1.6 (verified below), so JDK 1.7 is used here. Download: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
- ssh must be installed and sshd must be running, so that the Hadoop scripts can manage the remote Hadoop daemons.
- Hadoop download: http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
3.2 Environment
- Operating system: Red Hat Enterprise Linux Server release 5.8 (Tikanga)
- Master and slave servers: Master 192.168.181.66, Slave1 192.168.88.21, Slave2 192.168.88.22
3.3 Passwordless SSH login
SSH must be installed on every Linux node (Hadoop connects to the other machines in the cluster over SSH), so install it first. Hadoop logs in to each node via SSH to manage it; the hadoop user is used here. Generate a key pair on every server and merge the public keys into authorized_keys.
1. CentOS does not enable passwordless SSH login by default. Uncomment the following two lines in /etc/ssh/sshd_config on every server. Before:
- #RSAAuthentication yes
- #PubkeyAuthentication yes
After (run service sshd restart afterwards):
- RSAAuthentication yes
- PubkeyAuthentication yes
For the remaining steps, see http://aperise.iteye.com/blog/2253544
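A rough sketch of that key setup (assuming the hadoop user and the hosts listed in section 3.2; adjust to your environment):
- # on every node, as the hadoop user, generate a key pair without a passphrase
- ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
- # on the master, collect every node's public key into authorized_keys ...
- cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- ssh hadoop@192.168.88.21 "cat ~/.ssh/id_rsa.pub" >> ~/.ssh/authorized_keys
- ssh hadoop@192.168.88.22 "cat ~/.ssh/id_rsa.pub" >> ~/.ssh/authorized_keys
- # ... then push the merged file back to each slave and tighten permissions
- scp ~/.ssh/authorized_keys hadoop@192.168.88.21:~/.ssh/
- scp ~/.ssh/authorized_keys hadoop@192.168.88.22:~/.ssh/
- chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys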
3.4 Installing the JDK
Hadoop 2.7 requires JDK 7; with JDK 1.6, Hadoop fails at startup with the following error:
- [hadoop@nmsc1 bin]# ./hdfs namenode -format
- Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/hadoop/hdfs/server/namenode/NameNode : Unsupported major.minor version 51.0
- at java.lang.ClassLoader.defineClass1(Native Method)
- at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
- at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
- at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
- at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
- at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
- at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
- at java.security.AccessController.doPrivileged(Native Method)
- at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
- at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
- at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
- at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
- Could not find the main class: org.apache.hadoop.hdfs.server.namenode.NameNode. Program will exit.
1. Download jdk-7u65-linux-x64.gz and place it at /opt/java/jdk-7u65-linux-x64.gz.
2. Extract it: tar -zxvf jdk-7u65-linux-x64.gz.
3. Edit /etc/profile and append the following:
- export JAVA_HOME=/opt/java/jdk1.7.0_65
- export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
- export PATH=$PATH:$JAVA_HOME/bin
4. Apply the changes: source /etc/profile
5. Run java -version to check that the JDK is configured correctly.
- [hadoop@nmsc2 java]# java -version
- java version "1.7.0_65"
- Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
- Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
- [hadoop@nmsc2 java]#
3.5 Installing Hadoop 2.7
1. Download hadoop-2.7.1.tar.gz on the master only and place it at /opt/hadoop-2.7.1.tar.gz.
2. Extract it: tar -xzvf hadoop-2.7.1.tar.gz.
3. Create the data directories under /home: hadoop/tmp, hadoop/hdfs, hadoop/hdfs/data and hadoop/hdfs/name, for example as shown below.
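A sketch of that step (assuming /home/hadoop belongs to the hadoop user):
- mkdir -p /home/hadoop/tmp /home/hadoop/hdfs/name /home/hadoop/hdfs/data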
4. Configure core-site.xml under /opt/hadoop-2.7.1/etc/hadoop:
- <?xml version="1.0" encoding="UTF-8"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!--
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
- http://www.apache.org/licenses/LICENSE-2.0
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License. See accompanying LICENSE file.
- -->
- <!-- Put site-specific property overrides in this file. -->
- <configuration>
- <!-- Enable the HDFS trash: deleted files go to the trash first and are kept for at most 1440 minutes (one day) before being removed permanently -->
- <property>
- <name>fs.trash.interval</name>
- <value>1440</value>
- </property>
- <property>
- <!-- URI of the NameNode, in the form hdfs://host:port/ -->
- <name>fs.defaultFS</name>
- <value>hdfs://192.168.181.66:9000</value>
- </property>
- <property>
- <!-- hadoop.tmp.dir is the base directory that many other Hadoop paths depend on. If the NameNode and DataNode directories are not configured in hdfs-site.xml, they default to locations under this path -->
- <name>hadoop.tmp.dir</name>
- <value>file:/home/hadoop/tmp</value>
- </property>
- <property>
- <!-- io.file.buffer.size sets the read/write buffer size used by Hadoop I/O (for example in SequenceFiles). Larger buffers give higher throughput for disk and network I/O at the cost of more memory and latency. The value is in bytes and should be a multiple of the system page size; the default is 4 KB, 64 KB (65536) is common, and 128 KB is used here -->
- <name>io.file.buffer.size</name>
- <value>131072</value>
- </property>
- <property>
- <name>dfs.namenode.handler.count</name>
- <value>200</value>
- <description>The number of server threads for the namenode.</description>
- </property>
- <property>
- <name>dfs.datanode.handler.count</name>
- <value>100</value>
- <description>The number of server threads for the datanode.</description>
- </property>
- </configuration>
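Once Hadoop is unpacked and this file is in place, the values above can be spot-checked with hdfs getconf (the same command used again in section 3.8); for example:
- cd /opt/hadoop-2.7.1/bin
- ./hdfs getconf -confKey fs.defaultFS
- ./hdfs getconf -confKey fs.trash.interval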
- <?xml version="1.0" encoding="UTF-8"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <configuration>
- <property>
- <!-- dfs.namenode.name.dir: the local filesystem path where the NameNode stores the filesystem metadata. Only the NameNode uses it; DataNodes do not -->
- <name>dfs.namenode.name.dir</name>
- <value>file:/home/hadoop/hdfs/name</value>
- </property>
- <property>
- <!-- dfs.datanode.data.dir: the local filesystem path(s) where a DataNode stores its block data. The path does not have to be identical on every DataNode, since machines may differ, but using the same path everywhere makes administration simpler -->
- <name>dfs.datanode.data.dir</name>
- <value>file:/home/hadoop/hdfs/data</value>
- </property>
- <property>
- <!-- dfs.replication: the number of replicas kept for each block. For real deployments it should be 3 (there is no hard upper limit, but extra replicas mainly just use more space); fewer than three replicas reduces reliability, since data may be lost when nodes fail -->
- <name>dfs.replication</name>
- <value>3</value>
- </property>
- <property>
- <!-- Many tutorials put the NameNode and the SecondaryNameNode on the same machine. That is acceptable for experiments but risky in practice: if that host fails, the whole filesystem can be damaged and in the worst case all files are lost. For real use, configure the SecondaryNameNode on a different machine -->
- <name>dfs.namenode.secondary.http-address</name>
- <value>192.168.181.66:9001</value>
- </property>
- <property>
- <!-- dfs.webhdfs.enabled must be set to true in the NameNode's hdfs-site.xml, otherwise WebHDFS operations such as LISTSTATUS that list file or directory status will not work, because that information is held by the NameNode.
- WebHDFS uses port 50070 on the NameNode and port 50075 on the DataNodes: file and directory metadata is accessed through the NameNode's IP and port 50070, while opening, uploading, modifying and downloading file content goes through a DataNode's IP and port 50075. To issue all WebHDFS operations against the NameNode's IP and port only, dfs.webhdfs.enabled must also be set to true in hdfs-site.xml on every DataNode -->
- <name>dfs.webhdfs.enabled</name>
- <value>true</value>
- </property>
- <property>
- <name>dfs.datanode.du.reserved</name>
- <value>107374182400</value>
- <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
- </description>
- </property>
- <property>
- <!-- Client socket timeout for HDFS. Increase it as much as reasonable; otherwise HBase may later crash frequently because of this timeout -->
- <name>dfs.client.socket-timeout</name>
- <value>600000</value>
- </property>
- <property>
- <!-- Maximum number of files a DataNode may serve concurrently. The default is 4096; if it is not raised you may see "xcievers exceeded" errors -->
- <name>dfs.datanode.max.transfer.threads</name>
- <value>409600</value>
- </property>
- </configuration>
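Because dfs.webhdfs.enabled is turned on above, one quick way to confirm HDFS is serving metadata (assuming the cluster is already running and curl is available) is the WebHDFS REST interface on the NameNode's port 50070:
- # list the HDFS root directory through WebHDFS
- curl -i "http://192.168.181.66:50070/webhdfs/v1/?op=LISTSTATUS"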
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <configuration>
- <property>
- <!-- The framework used to run MapReduce jobs. The new framework also supports third-party MapReduce runtimes (non-YARN architectures such as SmartTalk/DGSG), but normally this is set to yarn. If it is not set, submitted jobs run in local mode rather than in distributed mode -->
- <name>mapreduce.framework.name</name>
- <value>yarn</value>
- </property>
- <!-- Hadoop ships with a history server through which finished MapReduce jobs can be inspected: number of map and reduce tasks, submission time, start time, completion time, and so on. The history server is not started by default; it has to be started manually (see the sketch after this listing).
- The related parameters live in mapred-site.xml. mapreduce.jobhistory.address and mapreduce.jobhistory.webapp.address default to 0.0.0.0:10020 and 0.0.0.0:19888 respectively and take host:port values; adjust them to your environment. After changing them, restart the history server, and job history can then be browsed at the host and port configured in mapreduce.jobhistory.webapp.address -->
- <property>
- <name>mapreduce.jobhistory.address</name>
- <value>192.168.181.66:10020</value>
- </property>
- <property>
- <name>mapreduce.jobhistory.webapp.address</name>
- <value>192.168.88.21:19888</value>
- </property>
- </configuration>
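The comment above mentions starting the history server manually; in Hadoop 2.7 this is done with the mr-jobhistory-daemon.sh script under sbin, run on the host configured in mapreduce.jobhistory.address (a sketch, assuming the cluster is already up):
- cd /opt/hadoop-2.7.1/sbin
- ./mr-jobhistory-daemon.sh start historyserver
- # stop it again with: ./mr-jobhistory-daemon.sh stop historyserver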
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <configuration>
- <property>
- <!-- Auxiliary shuffle service required by MapReduce applications -->
- <name>yarn.nodemanager.aux-services</name>
- <value>mapreduce_shuffle</value>
- </property>
- <property>
- <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
- <value>org.apache.hadoop.mapred.ShuffleHandler</value>
- </property>
- <property>
- <!-- Address on which NodeManagers (and clients) contact the ResourceManager -->
- <name>yarn.resourcemanager.address</name>
- <value>192.168.181.66:8032</value>
- </property>
- <property>
- <!-- Address of the ResourceManager scheduler service that NodeManagers need to know -->
- <name>yarn.resourcemanager.scheduler.address</name>
- <value>192.168.181.66:8030</value>
- </property>
- <property>
- <!-- NodeManagers report task status to the ResourceManager for resource tracking, so they need the ResourceManager's tracker address -->
- <name>yarn.resourcemanager.resource-tracker.address</name>
- <value>192.168.181.66:8031</value>
- </property>
- <property>
- <!-- Address the ResourceManager exposes to administrators; admin commands are sent here. Default: ${yarn.resourcemanager.hostname}:8033 -->
- <name>yarn.resourcemanager.admin.address</name>
- <value>192.168.181.66:8033</value>
- </property>
- <property>
- <!-- External web UI address of the ResourceManager, where cluster information can be viewed in a browser. Default: ${yarn.resourcemanager.hostname}:8088 -->
- <name>yarn.resourcemanager.webapp.address</name>
- <value>192.168.181.66:8088</value>
- </property>
- <property>
- <!-- Total physical memory, in MB, that this NodeManager may use. It cannot be changed at runtime once set. The default is 8192 MB, and YARN assumes that much memory even if the machine has less, so always set it explicitly (Apache is working on making it dynamically adjustable). If the value is below 1024 MB the NodeManager will not start and reports:
- NodeManager from slavenode2 doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the NodeManager. -->
- <name>yarn.nodemanager.resource.memory-mb</name>
- <value>2048</value>
- </property>
- </configuration>
8. Set JAVA_HOME in /opt/hadoop-2.7.1/etc/hadoop/hadoop-env.sh (and yarn-env.sh):
- export JAVA_HOME=/opt/java/jdk1.7.0_65
9. Edit /opt/hadoop-2.7.1/etc/hadoop/slaves and list the slave (DataNode) hosts:
- #localhost
- 192.168.88.22
- 192.168.88.21
10. Grant permissions on the data and installation directories, then copy them to the two slaves:
- chmod -R 777 /home/hadoop
- chmod -R 777 /opt/hadoop-2.7.1
- scp -r /opt/hadoop-2.7.1 192.168.88.22:/opt/
- scp -r /home/hadoop 192.168.88.22:/home
- scp -r /opt/hadoop-2.7.1 192.168.88.21:/opt/
- scp -r /home/hadoop 192.168.88.21:/home
11. Format HDFS and start the cluster.
(1) Format the NameNode:
- [hadoop@nmsc2 bin]# cd /opt/hadoop-2.7.1/bin/
- [hadoop@nmsc2 bin]# ls
- container-executor hadoop hadoop.cmd hdfs hdfs.cmd mapred mapred.cmd rcc test-container-executor yarn yarn.cmd
- [hadoop@nmsc2 bin]# ./hdfs namenode -format
- 15/09/23 16:03:17 INFO namenode.NameNode: STARTUP_MSG:
- /************************************************************
- STARTUP_MSG: Starting NameNode
- STARTUP_MSG: host = nmsc2/127.0.0.1
- STARTUP_MSG: args = [-format]
- STARTUP_MSG: version = 2.7.1
- STARTUP_MSG: classpath = /opt/hadoop-2.7.1/etc/hadoop:/opt/hadoop-2.7.1/share/hadoop/common/lib/httpclient-4.2.5.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/slf4j-api-1.7.10.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jersey-json-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/xz-1.0.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-logging-1.1.3.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-cli-1.2.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-compress-1.4.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/htrace-core-3.1.0-incubating.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/curator-framework-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jersey-server-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/curator-client-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/avro-1.7.4.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/gson-2.2.4.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/curator-recipes-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/mockito-all-1.8.5.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jsr305-3.0.0.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/paranamer-2.3.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/guava-11.0.2.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jsch-0.1.42.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/api-asn1-api-1.0.0-M20.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jsp-api-2.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/servlet-api-2.5.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-lang-2.6.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/hadoop-annotations-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/activation-1.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-codec-1.4.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-io-2.4.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/httpcore-4.2.5.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/xmlenc-0.52.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-collections-3.2.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-httpclient-3.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/netty-3.6.2.Final.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-net-3.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-math3-3.1.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jersey-core-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/zookeeper-3.4.6.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-digester-1.8.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jettison-1.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/junit-4.11.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jetty-util-6.1.26.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/stax-api-1.0-2.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-configuration-1.6.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/api-util-1.0.0-M20.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/hadoop-auth-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/log4j-1.2.17.jar:/opt/hadoop-2.7.1/share/hado
op/common/lib/protobuf-java-2.5.0.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jets3t-0.9.0.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/asm-3.2.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/hamcrest-core-1.3.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jetty-6.1.26.jar:/opt/hadoop-2.7.1/share/hadoop/common/hadoop-nfs-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/hadoop-common-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/hadoop-common-2.7.1-tests.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/xml-apis-1.3.04.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/htrace-core-3.1.0-incubating.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/jersey-server-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/jsr305-3.0.0.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/guava-11.0.2.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/commons-io-2.4.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/netty-3.6.2.Final.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/netty-all-4.0.23.Final.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/asm-3.2.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/xercesImpl-2.9.1.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/leveldbjni-all-1.8.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/hadoop-hdfs-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/hadoop-hdfs-2.7.1-tests.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/hadoop-hdfs-nfs-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jersey-json-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/xz-1.0.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/commons-logging-1.1.3.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/commons-cli-1.2.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/commons-compress-1.4.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jackson-mapper-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jersey-server-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/guice-3.0.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jsr305-3.0.0.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/guava-11.0.2.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/zookeeper-3.4.6-tests.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/aopalliance-1.0.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/servlet-api-2.5.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/commons-lang-2.6.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/javax.inject-1.jar:/opt/hadoop-2.7.1/share/h
adoop/yarn/lib/activation-1.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/commons-codec-1.4.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/commons-io-2.4.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/commons-collections-3.2.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/netty-3.6.2.Final.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jersey-core-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jaxb-api-2.2.2.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/zookeeper-3.4.6.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jettison-1.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jackson-core-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jetty-util-6.1.26.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jersey-client-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/stax-api-1.0-2.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jaxb-impl-2.2.3-1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/log4j-1.2.17.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jackson-xc-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jackson-jaxrs-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/asm-3.2.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jetty-6.1.26.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/leveldbjni-all-1.8.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-common-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-server-tests-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-server-common-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-registry-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-api-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-client-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/xz-1.0.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/jersey-guice-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/commons-compress-1.4.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/jersey-server-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/avro-1.7.4.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/guice-3.0.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/aopalliance-1.0.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/hadoop-annotations-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/javax.inject-1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/commons-io-2.4.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/netty-3.6.2.Final.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/jersey-core-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/jackson-core-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/junit-4.11.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/snappy-java-1.0
.4.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/asm-3.2.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/hamcrest-core-1.3.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/leveldbjni-all-1.8.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1-tests.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.7.1.jar:/contrib/capacity-scheduler/*.jar
- STARTUP_MSG: build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a; compiled by 'jenkins' on 2015-06-29T06:04Z
- STARTUP_MSG: java = 1.7.0_65
- ************************************************************/
- 15/09/23 16:03:17 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
- 15/09/23 16:03:17 INFO namenode.NameNode: createNameNode [-format]
- 15/09/23 16:03:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
- Formatting using clusterid: CID-695216ce-3c4e-47e4-a31f-24a7e40d8791
- 15/09/23 16:03:18 INFO namenode.FSNamesystem: No KeyProvider found.
- 15/09/23 16:03:18 INFO namenode.FSNamesystem: fsLock is fair:true
- 15/09/23 16:03:18 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
- 15/09/23 16:03:18 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
- 15/09/23 16:03:18 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
- 15/09/23 16:03:18 INFO blockmanagement.BlockManager: The block deletion will start around 2015 Sep 23 16:03:18
- 15/09/23 16:03:18 INFO util.GSet: Computing capacity for map BlocksMap
- 15/09/23 16:03:18 INFO util.GSet: VM type = 64-bit
- 15/09/23 16:03:18 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
- 15/09/23 16:03:18 INFO util.GSet: capacity = 2^21 = 2097152 entries
- 15/09/23 16:03:18 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
- 15/09/23 16:03:18 INFO blockmanagement.BlockManager: defaultReplication = 1
- 15/09/23 16:03:18 INFO blockmanagement.BlockManager: maxReplication = 512
- 15/09/23 16:03:18 INFO blockmanagement.BlockManager: minReplication = 1
- 15/09/23 16:03:18 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
- 15/09/23 16:03:18 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks = false
- 15/09/23 16:03:18 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
- 15/09/23 16:03:18 INFO blockmanagement.BlockManager: encryptDataTransfer = false
- 15/09/23 16:03:18 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
- 15/09/23 16:03:18 INFO namenode.FSNamesystem: fsOwner = hadoop(auth:SIMPLE)
- 15/09/23 16:03:18 INFO namenode.FSNamesystem: supergroup = supergroup
- 15/09/23 16:03:18 INFO namenode.FSNamesystem: isPermissionEnabled = true
- 15/09/23 16:03:18 INFO namenode.FSNamesystem: HA Enabled: false
- 15/09/23 16:03:18 INFO namenode.FSNamesystem: Append Enabled: true
- 15/09/23 16:03:18 INFO util.GSet: Computing capacity for map INodeMap
- 15/09/23 16:03:18 INFO util.GSet: VM type = 64-bit
- 15/09/23 16:03:18 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
- 15/09/23 16:03:18 INFO util.GSet: capacity = 2^20 = 1048576 entries
- 15/09/23 16:03:18 INFO namenode.FSDirectory: ACLs enabled? false
- 15/09/23 16:03:18 INFO namenode.FSDirectory: XAttrs enabled? true
- 15/09/23 16:03:18 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
- 15/09/23 16:03:18 INFO namenode.NameNode: Caching file names occuring more than 10 times
- 15/09/23 16:03:18 INFO util.GSet: Computing capacity for map cachedBlocks
- 15/09/23 16:03:18 INFO util.GSet: VM type = 64-bit
- 15/09/23 16:03:18 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
- 15/09/23 16:03:18 INFO util.GSet: capacity = 2^18 = 262144 entries
- 15/09/23 16:03:18 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
- 15/09/23 16:03:18 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
- 15/09/23 16:03:18 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
- 15/09/23 16:03:18 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
- 15/09/23 16:03:18 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
- 15/09/23 16:03:18 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
- 15/09/23 16:03:18 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
- 15/09/23 16:03:18 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
- 15/09/23 16:03:18 INFO util.GSet: Computing capacity for map NameNodeRetryCache
- 15/09/23 16:03:18 INFO util.GSet: VM type = 64-bit
- 15/09/23 16:03:18 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
- 15/09/23 16:03:18 INFO util.GSet: capacity = 2^15 = 32768 entries
- 15/09/23 16:03:18 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1469452028-127.0.0.1-1442995398776
- 15/09/23 16:03:18 INFO common.Storage: Storage directory /home/hadoop/dfs/name has been successfully formatted.
- 15/09/23 16:03:19 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
- 15/09/23 16:03:19 INFO util.ExitUtil: Exiting with status 0
- 15/09/23 16:03:19 INFO namenode.NameNode: SHUTDOWN_MSG:
- /************************************************************
- SHUTDOWN_MSG: Shutting down NameNode at nmsc2/127.0.0.1
- ************************************************************/
- [hadoop@nmsc2 bin]#
(2) Start all Hadoop daemons with start-all.sh:
- [hadoop@nmsc1 bin]# cd /opt/hadoop-2.7.1/sbin/
- [hadoop@nmsc1 sbin]# ./start-all.sh
- This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
- 15/09/23 16:48:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
- Starting namenodes on [192.168.88.21]
- 192.168.88.21: starting namenode, logging to /opt/hadoop-2.7.1/logs/hadoop-hadoop-namenode-nmsc1.out
- 192.168.88.22: starting datanode, logging to /opt/hadoop-2.7.1/logs/hadoop-hadoop-datanode-nmsc2.out
- Starting secondary namenodes [192.168.88.21]
- 192.168.88.21: starting secondarynamenode, logging to /opt/hadoop-2.7.1/logs/hadoop-hadoop-secondarynamenode-nmsc1.out
- 15/09/23 16:48:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
- starting yarn daemons
- resourcemanager running as process 5881. Stop it first.
- 192.168.88.22: starting nodemanager, logging to /opt/hadoop-2.7.1/logs/yarn-hadoop-nodemanager-nmsc2.out
- [hadoop@nmsc1 sbin]#
(3) To stop everything, run ./stop-all.sh
The call relationships among the Hadoop shell scripts are shown in the figure below:
(4) Run jps to see the running daemons:
- [hadoop@nmsc1 sbin]# jps
- 14201 Jps
- 5881 ResourceManager
- 13707 NameNode
- 13924 SecondaryNameNode
- [hadoop@nmsc1 sbin]#
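Besides jps, you can confirm from the master which NodeManagers have registered with the ResourceManager (a quick check, assuming YARN is up):
- cd /opt/hadoop-2.7.1/bin
- ./yarn node -list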
12. Verify the installation.
(1) Make sure the firewall does not block access:
systemctl stop firewalld.service (CentOS)
chkconfig iptables on (Red Hat: enable the firewall)
chkconfig iptables off (Red Hat: disable the firewall)
(2) Open http://192.168.181.66:8088/ in a browser (the ResourceManager web UI, which shows cluster-wide information)
(3) Open http://192.168.181.66:50070/ (the NameNode web UI)
(4) Open http://192.168.181.66:9001 (the SecondaryNameNode)
13. Installation is complete. This is only the start of working with big data; the next step is to write programs against the Hadoop APIs, according to your own needs, and put HDFS and MapReduce to work.
3.6 Problems encountered
1) At startup, the DataNode reports "Problem connecting to server"
Fix: edit /etc/hosts; see http://blog.csdn.net/renfengjun/article/details/25320043 for details.
2) When starting YARN, the NodeManager fails with "doesn't satisfy minimum allocations, Sending SHUTDOWN signal". The cause is that yarn.nodemanager.resource.memory-mb in yarn-site.xml gives the NodeManager too little memory; it must be at least 1024 MB.
3) HBase reports: util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using built
Fix:
sudo rm -r /opt/hbase-1.2.1/lib/native
sudo mkdir /opt/hbase-1.2.1/lib/native
sudo mkdir /opt/hbase-1.2.1/lib/native/Linux-amd64-64
sudo cp -r /opt/hadoop-2.7.1/lib/native/* /opt/hbase-1.2.1/lib/native/Linux-amd64-64/
4) Creating an HBase table with a compression setting fails with "Compression algorithm 'snappy' previously failed test. Set hbase.table.sanity.checks to false at conf"
The create-table statement was: create 'signal1', { NAME => 'info', COMPRESSION => 'SNAPPY' }, SPLITS => ['00000000000000000000000000000000','10000000000000000000000000000000','20000000000000000000000000000000','30000000000000000000000000000000','40000000000000000000000000000000','50000000000000000000000000000000','60000000000000000000000000000000','70000000000000000000000000000000','80000000000000000000000000000000','90000000000000000000000000000000']
Fix: add the following to hbase-site.xml:
<property>
<name>hbase.table.sanity.checks</name>
<value>false</value>
</property>
5) The NodeManager fails to start with the following error:
- 2016-01-26 18:45:10,891 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registering with RM using containers :[]
- 2016-01-26 18:45:11,778 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Unexpected error starting NodeStatusUpdater
- org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: NodeManager from slavery01 doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the NodeManager.
- at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:270)
- at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:196)
- at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
- at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
- at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:271)
- at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
- at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:486)
- at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:533)
- 2016-01-26 18:45:11,781 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: NodeManager from slavery01 doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the NodeManager.
Fix: raise yarn.nodemanager.resource.memory-mb in yarn-site.xml:
- <property>
- <!-- must not be less than 1024 -->
- <name>yarn.nodemanager.resource.memory-mb</name>
- <value>2048</value>
- </property>
- </configuration>
6) WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Fix: see http://zhidao.baidu.com/link?url=_cOK3qt3yzgWwifuMGuZhSOTUyKTiYZfyHr3Xd1id345B9SvSIGsJ-mGLDsk4QseWmBnY5LjxgwHwjKQ4UTFtm8IV6J2im4QfSRh__MhzpW
7) Many people running single-node Hadoop hit the error "Hadoop hostname: Unknown host"
Fix: use ifconfig to find the local IP and hostname to find the host name, say 192.168.55.128 and hadoop. Add the line "192.168.55.128 hadoop" to /etc/hosts, change localhost to hadoop in core-site.xml and mapred-site.xml, run ./hdfs namenode -format, and then sbin/start-all.sh.
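A sketch of that fix (192.168.55.128 and the host name hadoop are just the illustrative values from above; substitute your own):
- # map the host name to the real IP
- echo "192.168.55.128 hadoop" >> /etc/hosts
- # replace localhost with the host name in the configuration files
- sed -i 's/localhost/hadoop/g' /opt/hadoop-2.7.1/etc/hadoop/core-site.xml /opt/hadoop-2.7.1/etc/hadoop/mapred-site.xml
- # re-format HDFS and start the daemons
- cd /opt/hadoop-2.7.1 && bin/hdfs namenode -format && sbin/start-all.sh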
3.7 A summary by another user on installing hadoop2.7 + hbase1.0 + hive1.2 can be found in the attachment "我学大数据技术(hadoop2.7+hbase1.0+hive1.2).pdf".
Other well-written articles:
Hadoop 2.7.1 distributed installation, preparation: http://my.oschina.net/allman90/blog/485352
Hadoop 2.7.1 distributed installation, installation: http://my.oschina.net/allman90/blog/486117
3.8 Common shell commands
- #list the files and directories under the HDFS path /user/
- bin/hdfs dfs -ls /user/
- #upload the local file /opt/smsmessage.txt to the HDFS directory /user/
- bin/hdfs dfs -put /opt/smsmessage.txt /user/
- #download the HDFS file /user/smsmessage.txt to the local directory /opt/
- bin/hdfs dfs -get /user/smsmessage.txt /opt/
- #print the contents of the HDFS text file /opt/smsmessage.txt
- bin/hdfs dfs -cat /opt/smsmessage.txt
- #print the contents of the HDFS file /user/smsmessage.txt
- bin/hdfs dfs -text /user/smsmessage.txt
- #delete the HDFS file /user/smsmessage.txt
- bin/hdfs dfs -rm /user/smsmessage.txt
- #before running a balance operation, you can limit the network bandwidth it uses, e.g. 10 MB = 10*1024*1024
- bin/hdfs dfsadmin -setBalancerBandwidth <bandwidth in bytes per second>
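- # for example (an illustrative value, not from the original): cap balancer traffic at 10 MB/s, i.e. 10*1024*1024 bytes per second
- bin/hdfs dfsadmin -setBalancerBandwidth 10485760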
- #run the bundled WordCount example; /input must already exist in HDFS and contain files, /output is created by the job
- bin/hadoop jar /opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input /output
- #check the health of the whole file system; note it does not actively restore missing block replicas, which is handled asynchronously by a separate NameNode thread
- cd /opt/hadoop-2.7.1/bin
- ./hdfs fsck /
- #set the replication factor for everything under the HDFS root directory /
- cd /opt/hadoop-2.7.1/bin
- ./hadoop fs -setrep -R 2 /
- #or, equivalently
- ./hdfs dfs -setrep -R 2 /
- #print detailed information about every block of a file, including the DataNode rack information
- cd /opt/hadoop-2.7.1/bin
- ./hadoop fsck /user/distribute-hadoop-boss/tmp/pgv/20090813/1000000103/input/JIFEN.QQ.COM.2009-08-13-18.30 -files -blocks -locations -racks
- #look up the values of dfs.client.block.write.replace-datanode-on-failure.enable and dfs.client.block.write.replace-datanode-on-failure.policy from hdfs-site.xml
- cd /opt/hadoop-2.7.1/bin
- ./hdfs getconf -confKey dfs.client.block.write.replace-datanode-on-failure.enable
- ./hdfs getconf -confKey dfs.client.block.write.replace-datanode-on-failure.policy
- #start HDFS; this reads the slaves file and the configuration and starts the HDFS daemons on all nodes
- cd /opt/hadoop-2.7.1/sbin
- ./start-dfs.sh
- #start YARN; this reads the slaves file and the configuration and starts the YARN daemons on all nodes
- cd /opt/hadoop-2.7.1/sbin
- ./start-yarn.sh
- #start or stop a single daemon (namenode, secondarynamenode, journalnode or datanode) on the local machine only
- ./hadoop-daemon.sh start/stop namenode
- ./hadoop-daemon.sh start/stop secondarynamenode
- ./hadoop-daemon.sh start/stop journalnode
- ./hadoop-daemon.sh start/stop datanode
- #check whether HDFS is in safe mode
- [hadoop@nmsc2 bin]$ cd /opt/hadoop-2.7.1/bin
- [hadoop@nmsc2 bin]$ ./hdfs dfsadmin -safemode get
- Safe mode is OFF
- [hadoop@nmsc2 bin]$
- #leave safe mode
- [hadoop@nmsc2 bin]$ cd /opt/hadoop-2.7.1/bin
- [hadoop@nmsc2 bin]$ ./hdfs dfsadmin -safemode leave
- Safe mode is OFF
- [hadoop@nmsc2 bin]$
- #look up the values of specific configuration keys
- [hadoop@nmsc1 bin]$ cd /opt/hadoop-2.7.1/bin
- [hadoop@nmsc1 bin]$ ./hdfs getconf -confKey dfs.datanode.handler.count
- 100
- [hadoop@nmsc1 bin]$ ./hdfs getconf -confKey dfs.namenode.handler.count
- 200
- [hadoop@nmsc1 bin]$ ./hdfs getconf -confKey dfs.namenode.avoid.read.stale.datanode
- false
- [hadoop@nmsc1 bin]$ ./hdfs getconf -confKey dfs.namenode.avoid.write.stale.datanode
- false
- [hadoop@nmsc1 bin]$