I: Install Hadoop
1. Download the binary release:
http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.6.2/hadoop-2.6.2.tar.gz
2. This setup runs Hadoop 2.6 on JDK 1.7, while macOS ships with JDK 1.6; see the previous article for how to upgrade.
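Before going further you can check which Java is active. This is a minimal sketch, assuming a Bash shell and that the `java` on your PATH is the JDK Hadoop will use. `java_at_least_17` and `current_java_version` are hypothetical helper names, and the check only enforces the 1.7 lower bound this guide assumes:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a Java version string is 1.7 or newer.
# Handles the old "1.x.y_z" scheme (1.6.0_65, 1.7.0_45) and the newer
# "major.minor.patch" scheme (11.0.2).
java_at_least_17() {
  local version="$1" major
  case "$version" in
    1.*) major="${version#1.}"; major="${major%%.*}" ;;
    *)   major="${version%%[._]*}" ;;
  esac
  # A non-numeric major makes [ fail (error suppressed) instead of passing.
  [ "$major" -ge 7 ] 2>/dev/null
}

# Hypothetical helper: print the quoted version from `java -version`,
# which writes a line such as: java version "1.7.0_45" to stderr.
current_java_version() {
  java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }'
}
```

For example: `java_at_least_17 "$(current_java_version)" || echo "Upgrade the JDK first"`.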
3. Unpack the archive:
tar -zxvf hadoop-2.6.2.tar.gz
4. Go to the configuration directory and set JAVA_HOME in hadoop-env.sh:
cd hadoop-2.6.2/etc/hadoop
vi hadoop-env.sh
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home
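The JDK directory name above (jdk1.7.0_45) changes with every JDK update. On macOS, the /usr/libexec/java_home tool can look up an installed JDK by version instead. A sketch of the corresponding hadoop-env.sh lines, assuming a 1.7 JDK is actually installed:

```shell
# etc/hadoop/hadoop-env.sh on macOS (assumes a 1.7 JDK is installed):
# ask java_home for it rather than hard-coding the versioned directory.
if [ -x /usr/libexec/java_home ]; then
  export JAVA_HOME="$(/usr/libexec/java_home -v 1.7)"
fi
```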
5. vi core-site.xml
<configuration>
<!-- Address of the HDFS master (NameNode) -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<!-- Base directory for files Hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/username/opt/hadoop-2.6.2/tmp</value>
</property>
</configuration>
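Instead of editing each *-site.xml in vi, the files can also be generated from a script. This is a minimal Bash sketch; `write_site_xml` is a hypothetical helper, and it does not XML-escape values, so it only suits plain values like the ones in this article:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Hadoop *-site.xml from NAME VALUE pairs.
# Usage: write_site_xml FILE NAME1 VALUE1 [NAME2 VALUE2 ...]
# Note: values are written verbatim (no escaping of <, > or &).
write_site_xml() {
  local file="$1"; shift
  if [ $(( $# % 2 )) -ne 0 ]; then
    echo "write_site_xml: expected NAME VALUE pairs" >&2
    return 1
  fi
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    while [ $# -gt 0 ]; do
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
      shift 2
    done
    echo '</configuration>'
  } > "$file"
}
```

Run from etc/hadoop, `write_site_xml core-site.xml fs.defaultFS hdfs://localhost:9000 hadoop.tmp.dir /Users/username/opt/hadoop-2.6.2/tmp` produces the same two properties as the file above.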
6. vi hdfs-site.xml
<configuration>
<!-- Number of HDFS block replicas (1 is enough on a single node) -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
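Once Hadoop is on your PATH, `hdfs getconf -confKey dfs.replication` reports the value Hadoop actually resolved. For a quick look before that, here is a rough Bash/awk sketch; `get_site_property` is a hypothetical helper that assumes, as in this article, that each `<name>` and `<value>` element sits on its own line. It is not a real XML parser:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <value> that follows <name>KEY</name>
# in a Hadoop *-site.xml. Assumes one element per line; not an XML parser.
get_site_property() {
  local file="$1" key="$2"
  awk -v key="$key" '
    /<name>/ {
      name = $0
      sub(/.*<name>[ \t]*/, "", name)
      sub(/[ \t]*<\/name>.*/, "", name)
      found = (name == key)   # is this the property we are looking for?
      next
    }
    found && /<value>/ {
      value = $0
      sub(/.*<value>[ \t]*/, "", value)
      sub(/[ \t]*<\/value>.*/, "", value)
      print value
      exit
    }
  ' "$file"
}
```

For example, `get_site_property hdfs-site.xml dfs.replication` should print 1 for the file above.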
7. Create mapred-site.xml from its template, then edit it:
mv mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
<configuration>
<!-- Run MapReduce jobs on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
8. vi yarn-site.xml
<configuration>
<!-- Reducers fetch map output through the mapreduce_shuffle auxiliary service -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
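The mapred-site.xml and yarn-site.xml settings belong in two different files, and pasting both `<configuration>` blocks into one file is an easy mistake. A rough Bash sketch that catches it; `check_site_xml` is a hypothetical helper that counts matching lines with grep, so it assumes one tag per line as in this article:

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a *-site.xml. Requires exactly one
# <configuration> line and as many <property> lines as </property> lines.
check_site_xml() {
  local file="$1" roots opens closes
  if [ ! -r "$file" ]; then
    echo "$file: not readable" >&2
    return 1
  fi
  roots=$(grep -c '<configuration>' "$file")
  opens=$(grep -c '<property>' "$file")
  closes=$(grep -c '</property>' "$file")
  if [ "$roots" -ne 1 ]; then
    echo "$file: expected one <configuration>, found $roots" >&2
    return 1
  fi
  if [ "$opens" -ne "$closes" ]; then
    echo "$file: $opens <property> vs $closes </property>" >&2
    return 1
  fi
}
```

From etc/hadoop: `for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do check_site_xml "$f"; done`.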
9. Add Hadoop to your environment variables so that the hadoop command is on your PATH.
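A minimal sketch for ~/.bash_profile, assuming Hadoop was unpacked to $HOME/opt/hadoop-2.6.2 (the location used in core-site.xml above); adjust the path to match yours. Adding sbin as well lets you run start-dfs.sh and start-yarn.sh from any directory:

```shell
# ~/.bash_profile: assumed install location, change it to match yours.
export HADOOP_HOME="$HOME/opt/hadoop-2.6.2"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```

Run `source ~/.bash_profile` (or open a new terminal) for the change to take effect.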
10. Verify the installation:
$ hadoop version
Hadoop 2.6.2
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 0cfd050febe4a30b1ee1551dcc527589509fb681
Compiled by jenkins on 2015-10-22T00:42Z
Compiled with protoc 2.5.0
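11. Initialize
Format HDFS (the NameNode); this is only needed before the first start:
hadoop namenode -format
(This still works in Hadoop 2.x but prints a deprecation warning; hdfs namenode -format is the current form.)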
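Formatting again later wipes the HDFS namespace and gives the NameNode a new cluster ID, after which an existing DataNode can refuse to start because its cluster ID no longer matches. A hedged Bash sketch that only formats while the NameNode directory is still uninitialized; `format_namenode_once` is a hypothetical helper, and it assumes the default dfs.namenode.name.dir, which is ${hadoop.tmp.dir}/dfs/name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run `hdfs namenode -format` only if the NameNode
# metadata directory has not been initialized yet. A formatted directory
# contains a current/ subdirectory (holding a VERSION file).
format_namenode_once() {
  local name_dir="$1"
  if [ -d "$name_dir/current" ]; then
    echo "NameNode already formatted at $name_dir; skipping" >&2
    return 0
  fi
  hdfs namenode -format
}
```

With the hadoop.tmp.dir used above: `format_namenode_once /Users/username/opt/hadoop-2.6.2/tmp/dfs/name`.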
12. Start Hadoop (from the hadoop-2.6.2 directory)
Start HDFS first:
sbin/start-dfs.sh
Then start YARN:
sbin/start-yarn.sh
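13. Check the running Java processes with jps; NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager should all be listed:
jps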
27408 NameNode
28218 Jps
27643 SecondaryNameNode
28066 NodeManager
27803 ResourceManager
27512 DataNode
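Checking that list by eye gets tedious after every restart. A small Bash sketch that reads jps output on stdin and reports any of the five daemons that is missing; `check_daemons` is a hypothetical helper. It compares the second column exactly, so NameNode is not mistaken for SecondaryNameNode:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jps` output on stdin and fail, naming the
# missing daemons on stderr, unless all five Hadoop daemons are running.
check_daemons() {
  local jps_output missing=0 daemon
  jps_output="$(cat)"
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Exact match on the process-name column.
    if ! printf '%s\n' "$jps_output" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "missing: $daemon" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `jps | check_daemons && echo "all daemons up"`.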
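14. Run one of the bundled examples, wordcount. The input /data/data1.txt must already be in HDFS (for example, uploaded with hdfs dfs -put), and the output directory out2/ must not exist yet, otherwise the job fails: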
hadoop jar /Users/username/opt/hadoop-2.6.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount /data/data1.txt out2/
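To sanity-check the job's result, you can compute roughly the same counts locally and compare them with the reducer output (e.g. `hdfs dfs -cat out2/part-r-00000`). A sketch with awk; `local_wordcount` is a hypothetical helper that splits on spaces and tabs, which approximates, but is not identical to, the whitespace tokenizing in the WordCount example:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count words in the given files (or stdin) and print
# "word<TAB>count" lines sorted by byte order, similar to WordCount output.
local_wordcount() {
  awk '{ for (i = 1; i <= NF; i++) counts[$i]++ }
       END { for (w in counts) printf "%s\t%d\n", w, counts[w] }' "$@" |
    LC_ALL=C sort
}
```

With a local copy of the input: `diff <(local_wordcount data1.txt) <(hdfs dfs -cat out2/part-r-00000)`.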
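15. Set up passwordless SSH
start-dfs.sh and start-yarn.sh log in to each node over SSH, including localhost, so without this they prompt for passwords.
Generate the key pair:
cd ~/.ssh/
ssh-keygen -t rsa (press Enter four times to accept the defaults)
This creates two files: id_rsa (private key) and id_rsa.pub (public key).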
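Copy the public key to the machine you want to log in to without a password:
scp id_rsa.pub username@119.81.91.251:/home/username/.ssh/
Then, on that machine, append it to authorized_keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
To log in to your own machine without a password as well, run the same command locally:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys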
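Finally, restrict the permissions. This step is important: by default sshd ignores authorized_keys if it is writable by group or others, and you will still be asked for a password.
chmod 600 ~/.ssh/authorized_keys
(id_rsa.pub itself can stay readable, e.g. chmod 644 id_rsa.pub.)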
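The key generation and append-and-chmod steps above can be scripted so that they are safe to rerun. A Bash sketch with two hypothetical helpers: `ensure_ssh_key` uses ssh-keygen's -N (passphrase) and -f (output file) options to avoid the Enter presses and never overwrites an existing key, and `authorize_key` appends a key only if that exact line is not already present. The empty passphrase is what makes unattended logins possible:

```shell
#!/usr/bin/env bash
# Hypothetical helper: create an RSA key pair with an empty passphrase,
# unless the key file already exists (an existing key is never overwritten).
ensure_ssh_key() {
  local key="${1:-$HOME/.ssh/id_rsa}"
  if [ -f "$key" ]; then
    echo "$key already exists; not overwriting" >&2
    return 0
  fi
  mkdir -p "$(dirname "$key")"
  chmod 700 "$(dirname "$key")"
  ssh-keygen -t rsa -N "" -f "$key" -q
}

# Hypothetical helper: append a public key to authorized_keys only if that
# exact line is not there yet, and set the permissions sshd checks.
authorize_key() {
  local pubkey="$1" ssh_dir="${2:-$HOME/.ssh}"
  local auth="$ssh_dir/authorized_keys"
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$auth"
  chmod 600 "$auth"
  if ! grep -qxF "$(cat "$pubkey")" "$auth"; then
    cat "$pubkey" >> "$auth"
  fi
}
```

On a single machine, `ensure_ssh_key && authorize_key ~/.ssh/id_rsa.pub` followed by `ssh localhost` should log in without a password; on macOS, Remote Login must be enabled under System Preferences > Sharing. For a remote host, `ssh-copy-id username@119.81.91.251`, where available, copies and appends the key in one command.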