Without further ado: Hadoop has quite a few pitfalls for newcomers, so this post records the various problems I hit while using Hadoop, both as notes and for discussion.
Downloading Hadoop
Hadoop is open source; download it straight from the Apache site:
http://hadoop.apache.org/
Click the download link to see the available releases; take care to distinguish the 2.x line from the 3.x line.
I use version 2.10.1, though for beginners most tutorials recommend a 2.7.x release instead; the site also keeps an archive of past releases: release archive
Make sure to download the binary package; the source package contains only source code.
Taking 2.10.1 as the example, extract the downloaded archive into a directory of your choice. Mine lives at:
/opt/hadoop_2.10/hadoop-2.10.1
After extraction, configure the installation.
For reference, see this blog post: installation tutorial
Follow the configuration steps there when installing Hadoop, then edit the files below yourself to set up a pseudo-distributed cluster:
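One step that trips up many first-time installs before the site files are even touched: Hadoop needs to know where Java lives, via JAVA_HOME in etc/hadoop/hadoop-env.sh. A minimal sketch of that change (the JDK path below is an assumption; substitute the path to your own JDK):

```shell
# etc/hadoop/hadoop-env.sh
# /usr/lib/jvm/java-8-openjdk-amd64 is an example path, not a given;
# find yours with: dirname "$(dirname "$(readlink -f "$(which java)")")"
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```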
vim ./etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/bbihh/hadoopWork/hdfs</value> <!-- file:path/hdfs, where path is the directory you use for HDFS data -->
</property>
</configuration>
vim ./etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/bbihh/hadoopWork/tmp/dfs/name</value> <!-- file:path/tmp/dfs/name, where path is the directory you use for HDFS data -->
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/bbihh/hadoopWork/tmp/dfs/data</value> <!-- file:path/tmp/dfs/data, where path is the directory you use for HDFS data -->
</property>
</configuration>
Starting Hadoop
First, set up passwordless SSH login. This post assumes you already have it working; if not, plenty of other posts cover it.
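For completeness, the usual passwordless-login setup boils down to a few commands; a sketch, assuming OpenSSH is installed and sshd is running on this machine:

```shell
# One-time setup for passwordless SSH to localhost
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
# Generate an RSA key pair with an empty passphrase, unless one already exists
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N '' -f "$HOME/.ssh/id_rsa" -q
# Authorize the public key for logins to this machine
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```

After this, `ssh localhost` should connect without prompting for a password.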
With the configuration above in place, log in with ssh localhost.
This gives you a localhost session in the terminal; use it to operate Hadoop.
First cd into the directory where Hadoop was extracted: /opt/hadoop_2.10/hadoop-2.10.1
cd /opt/hadoop_2.10/hadoop-2.10.1/
Then format the NameNode:
bin/hdfs namenode -format
Start the HDFS daemons:
sbin/start-dfs.sh
Once they are up, you can view the NameNode status in a browser at localhost:50070.
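Besides the web UI, a quick way to confirm HDFS is actually up is to list the Java daemons and touch the filesystem. A sketch, guarded so it degrades gracefully on a machine where the Hadoop binaries are not on the PATH (run the bin/ forms from the Hadoop directory otherwise):

```shell
# NameNode, DataNode and SecondaryNameNode should appear among the JVMs
{ command -v jps >/dev/null 2>&1 && jps; } || true
# Create a home directory in HDFS and list it
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p "/user/$(whoami)"
  hdfs dfs -ls /user
else
  echo "hdfs not on PATH; use bin/hdfs from the Hadoop directory instead"
fi
```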
Start the YARN daemons:
sbin/start-yarn.sh
Once started, visit localhost:8088 for the ResourceManager web UI.
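Both web UIs can also be probed from the terminal; a small sketch using curl (harmless if the daemons are not running, it just reports them as unreachable):

```shell
# Probe the NameNode (50070) and ResourceManager (8088) web UIs
for url in http://localhost:50070 http://localhost:8088; do
  if curl -fs -o /dev/null --max-time 2 "$url"; then
    echo "$url is up"
  else
    echo "$url is not reachable"
  fi
done
```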
At this point Hadoop is basically deployed.
When you are done, shut the daemons down before closing the session:
sbin/stop-dfs.sh
sbin/stop-yarn.sh
(sbin/stop-all.sh also still works in 2.x, but is deprecated.)
Then exit the SSH session with exit.