Installation Preparation
Installing the JDK
1. Move jdk-8u151-linux-x64.tar.gz into /opt and run:
tar -xzvf jdk-8u151-linux-x64.tar.gz
2. Configure the environment variables by appending to /etc/profile:
export JAVA_HOME=/opt/jdk1.8.0_151
export PATH=$PATH:$JAVA_HOME/bin
Run source /etc/profile to make the changes take effect.
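Before relying on the new variables, the two profile lines above can be sanity-checked in the current shell. This is a minimal sketch, assuming the JDK really was unpacked to /opt/jdk1.8.0_151:

```shell
# Same two lines as appended to /etc/profile above.
export JAVA_HOME=/opt/jdk1.8.0_151
export PATH=$PATH:$JAVA_HOME/bin

# Confirm the JDK bin directory actually landed on PATH.
case ":$PATH:" in
  *":$JAVA_HOME/bin:"*) echo "JAVA_HOME is on PATH" ;;
  *)                    echo "JAVA_HOME missing from PATH" ;;
esac
# prints "JAVA_HOME is on PATH"
```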
3. Switch the Java version: repoint the java and javac links, which currently point at the system's bundled OpenJDK, to the newly installed binaries:
a. Run which java and which javac to locate the link files, then delete them (rm -rf).
b. Run the following commands:
ln -s $JAVA_HOME/bin/javac /usr/bin/javac
ln -s $JAVA_HOME/bin/java /usr/bin/java
c. Verify by running:
java -version
The reported version should be 1.8.0_151.
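The relinking in step 3 can be collapsed into one idempotent script. The sketch below uses temporary stand-in directories for the JDK and /usr/bin so it runs without root; with the real paths you would prefix the ln commands with sudo:

```shell
# Stand-ins so the demo needs no root; use the real paths in practice.
JAVA_HOME=$(mktemp -d)   # stands in for /opt/jdk1.8.0_151
BIN_DIR=$(mktemp -d)     # stands in for /usr/bin
mkdir -p "$JAVA_HOME/bin"
touch "$JAVA_HOME/bin/java" "$JAVA_HOME/bin/javac"

# ln -sfn replaces an existing link in one step, so no prior rm is needed.
ln -sfn "$JAVA_HOME/bin/java"  "$BIN_DIR/java"
ln -sfn "$JAVA_HOME/bin/javac" "$BIN_DIR/javac"

readlink "$BIN_DIR/java"   # prints the $JAVA_HOME/bin/java path
```

Because -f overwrites an existing link, the script can be re-run safely after a JDK upgrade.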
Installing and Configuring Hadoop
1. Move hadoop-2.6.0-cdh5.6.0.tar.gz into /opt and run:
tar -zxvf hadoop-2.6.0-cdh5.6.0.tar.gz
2. Enter hadoop-2.6.0-cdh5.6.0/etc/hadoop and edit the configuration files:
# gedit core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
# gedit hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
(Note: dfs.replication sets the HDFS replication factor, which defaults to 3; for a pseudo-distributed setup it must be lowered to 1.)
# cp mapred-site.xml.template mapred-site.xml
# gedit mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/data/hadoop/staging</value>
</property>
</configuration>
(Note: mapreduce.framework.name selects the framework MapReduce jobs run on. Hadoop reads mapred-site.xml, not the .template file, so make sure the edited copy is named mapred-site.xml.)
# gedit yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Hadoop</value>
</property>
</configuration>
(Note: YARN settings; yarn.resourcemanager.hostname should match this machine's hostname, here Hadoop.)
3. Configure the Hadoop environment variables by appending to /etc/profile:
export HADOOP_HOME=/opt/hadoop-2.6.0-cdh5.6.0
export PATH=$PATH:$HADOOP_HOME/bin
Run source /etc/profile to make the changes take effect.
4. Format HDFS by running:
hdfs namenode -format
5. Enter /opt/hadoop-2.6.0-cdh5.6.0/etc/hadoop and run:
vi hadoop-env.sh
Declare JAVA_HOME explicitly once more by adding:
export JAVA_HOME=/opt/jdk1.8.0_151
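The edit in step 5 can also be made idempotent, appending the export only when hadoop-env.sh does not already declare JAVA_HOME. ENV_FILE below points at a temporary file so the sketch runs anywhere; in practice it would be /opt/hadoop-2.6.0-cdh5.6.0/etc/hadoop/hadoop-env.sh:

```shell
# Stand-in for /opt/hadoop-2.6.0-cdh5.6.0/etc/hadoop/hadoop-env.sh.
ENV_FILE=$(mktemp)

# Append the export only if no JAVA_HOME declaration exists yet,
# so re-running the setup never duplicates the line.
grep -q '^export JAVA_HOME=' "$ENV_FILE" ||
  echo 'export JAVA_HOME=/opt/jdk1.8.0_151' >> "$ENV_FILE"

grep '^export JAVA_HOME=' "$ENV_FILE"   # prints the single export line
```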
Starting Hadoop and Verifying the Installation
1. Enter /opt/hadoop-2.6.0-cdh5.6.0/sbin and run:
./start-all.sh
(You may be prompted for your password several times.)
Afterwards, running jps should show:
NameNode <pid>
DataNode <pid>
ResourceManager <pid>
NodeManager <pid>
SecondaryNameNode <pid>
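The jps check above can be scripted. This sketch parses a captured sample listing rather than calling jps live, so JPS_OUTPUT here is an assumed fixture; on a running cluster you would set JPS_OUTPUT=$(jps):

```shell
# Captured sample of a healthy pseudo-distributed node; on a live
# machine use: JPS_OUTPUT=$(jps)
JPS_OUTPUT="1234 NameNode
2345 DataNode
3456 ResourceManager
4567 NodeManager
5678 SecondaryNameNode"

# Flag any of the five expected daemons that is absent from the listing.
missing=0
for d in NameNode DataNode ResourceManager NodeManager SecondaryNameNode; do
  echo "$JPS_OUTPUT" | grep -qw "$d" || { echo "missing: $d"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all daemons running"
# prints "all daemons running"
```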
Open a browser and go to:
http://localhost:50070/
If the status page loads, Hadoop was installed successfully.
2. Verify by running the WordCount example.
Create two test files, file1.txt and file2.txt:
$ vi file1.txt
welcome to hadoop
hello world!
$ vi file2.txt
hadoop hello
Create the input directory on HDFS:
$ hdfs dfs -mkdir /input
Upload file1.txt and file2.txt to the HDFS input directory:
$ hdfs dfs -put file1.txt /input
$ hdfs dfs -put file2.txt /input
List the two files just uploaded:
$ hdfs dfs -ls /input
14/10/25 14:43:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 1 hadoop supergroup 31 2014-10-25 14:43 /input/file1.txt
-rw-r--r-- 1 hadoop supergroup 13 2014-10-25 14:43 /input/file2.txt
Run the WordCount program that ships with Hadoop to count the words.
Enter /opt/hadoop-2.6.0-cdh5.6.0/share/hadoop/mapreduce and run:
$ hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.6.0.jar wordcount /input /output
If it completes without errors, view the result:
$ hdfs dfs -ls /output
$ hdfs dfs -cat /output/part-r-00000
14/10/25 14:54:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hadoop 2
hello 2
to 1
welcome 1
world! 1
The counts are correct!
Pseudo-distributed Hadoop installation on CentOS 7 on succeeded!
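As a quick cross-check, the same tally can be reproduced locally with coreutils alone, using the contents of the two test files created above:

```shell
# Recreate the two input files from the example.
printf 'welcome to hadoop\nhello world!\n' > file1.txt
printf 'hadoop hello\n' > file2.txt

# One word per line, then count duplicates -- the same tally WordCount emits:
# hadoop 2, hello 2, to 1, welcome 1, world! 1.
cat file1.txt file2.txt | tr -s ' ' '\n' | sort | uniq -c
```

Agreement between this local count and the part-r-00000 output confirms the MapReduce job processed both files completely.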