Sources:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
http://www.cnblogs.com/kinglau/p/3794433.html
http://www.linuxidc.com/Linux/2014-05/101693p3.htm
Hadoop Common is version 2.4.1.
Spark installation
http://book.51cto.com/art/201408/448467.htm
VirtualBox shared folders
sudo mount -t vboxsf ccc /mnt/shared
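To make the mount survive reboots, one option is an /etc/fstab entry; a sketch, assuming the share is named ccc as above and the VirtualBox Guest Additions are installed in the guest:
# /etc/fstab entry (hypothetical) for the ccc share
ccc    /mnt/shared    vboxsf    defaults    0    0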
http://blog.sina.com.cn/s/blog_62c493b101010ehd.html
scp /usr/local/hadoop/etc/hadoop/* anycom@192.168.56.103:/usr/local/hadoop/etc/hadoop/
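The same configuration usually has to land on every node, so a small loop saves retyping; a sketch, where the second IP 192.168.56.104 is an assumption, substitute your own node list:
# push the Hadoop config to every slave (node list assumed)
for node in 192.168.56.103 192.168.56.104; do
  scp /usr/local/hadoop/etc/hadoop/* anycom@$node:/usr/local/hadoop/etc/hadoop/
done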
Note: changing the hostname on CentOS
Two files need to be edited: /etc/sysconfig/network and /etc/hosts; editing only one of them leaves the system misbehaving at boot. First switch to the root user.
- /etc/sysconfig/network
Open the file in any editor you like. It contains a line HOSTNAME=localhost.localdomain (if still at the default); change localhost.localdomain to your hostname.
- /etc/hosts
Open the file; it contains a line 127.0.0.1 localhost.localdomain localhost. Here 127.0.0.1 is the loopback address, localhost.localdomain is the hostname you want to change, and localhost is an alias for the hostname (it shows up in the Konsole prompt). Change the second field to your hostname; the third field is optional.
Editing the two files does not take effect immediately. For an immediate but temporary change, run hostname your-hostname; that change is lost on reboot, whereas the file edits are permanent and take effect on the next boot.
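The same edits can be scripted; a minimal sketch, assuming the new hostname is namenode and you are running as root:
# permanent: rewrite the two files (the hostname 'namenode' is an assumption)
sed -i 's/^HOSTNAME=.*/HOSTNAME=namenode/' /etc/sysconfig/network
sed -i 's/^127\.0\.0\.1.*/127.0.0.1 namenode localhost/' /etc/hosts
# temporary: takes effect immediately, lost on reboot
hostname namenode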
~/etc/hadoop/core-site.xml
Add the following properties inside the <configuration> element:
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://namenode:9000</value>
</property>
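Note that fs.default.name is the deprecated 1.x spelling of fs.defaultFS; Hadoop 2.x still honors it but logs a warning. To confirm the file is actually being picked up, hdfs getconf can echo a key back; run from the Hadoop directory:
bin/hdfs getconf -confKey fs.default.name    # should print hdfs://namenode:9000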
~/etc/hadoop/hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/hdfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/hdfs/data</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>namenode:9001</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
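The name, data, and tmp directories referenced in core-site.xml and hdfs-site.xml must exist and be writable by the user running the daemons before formatting; a sketch, assuming that user is anycom (as in the scp line above):
sudo mkdir -p /usr/local/hadoop/hdfs/name /usr/local/hadoop/hdfs/data /usr/local/hadoop/hdfs/tmp
sudo chown -R anycom /usr/local/hadoop/hdfs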
~/etc/hadoop/yarn-site.xml
<property>
<name>yarn.resourcemanager.address</name>
<value>namenode:18040</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>namenode:18030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>namenode:18088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>namenode:18025</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>namenode:18141</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
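Once the daemons are started (see start-dfs.sh/start-yarn.sh below), the ResourceManager web UI should answer on the webapp port configured above; a quick check from any machine that can resolve namenode:
curl -s http://namenode:18088/ | head    # yarn.resourcemanager.webapp.address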
Copy or create ~/etc/hadoop/mapred-site.xml
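Hadoop 2.4.1 ships only a template for this file, so copying it is the usual route; from the Hadoop directory:
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
Then add the following properties inside <configuration>: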
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>file:/usr/local/hadoop/mapred/system</value>
<final>true</final>
</property>
<property>
<name>mapred.local.dir</name>
<value>file:/usr/local/hadoop/mapred/local</value>
<final>true</final>
</property>
<property>
<name>mapred.job.tracker</name>
<value>192.168.11.7:9001</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>4</value>
<description>
As a rule of thumb, use 10x the number of slaves (i.e., the number of TaskTrackers).
</description>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>4</value>
<description>
As a rule of thumb, use 2x the number of slave processors (i.e., the number of TaskTrackers).
</description>
</property>
Format HDFS
sudo rm -rf /usr/local/hadoop/hdfs/data/*
sudo rm -rf /usr/local/hadoop/hdfs/tmp
sudo rm -rf /usr/local/hadoop/hdfs/name/*
bin/hdfs namenode -format
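Formatting only initializes the NameNode metadata; the daemons still have to be started before any fs command will work. The stock scripts, run from the Hadoop directory on the namenode:
sbin/start-dfs.sh     # NameNode, SecondaryNameNode, DataNodes
sbin/start-yarn.sh    # ResourceManager, NodeManagers
jps                   # sanity check: the daemon processes should be listed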
bin/hadoop fs -copyFromLocal README.txt input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount input output
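If the job succeeds, the counts land in output/ on HDFS; reducer output files are conventionally named part-r-NNNNN:
bin/hadoop fs -cat output/part-r-00000    # one "word count" pair per line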
Spark test
From the Hadoop directory:
hadoop fs -put README.txt /
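The Scala lines below are typed into spark-shell. A sketch of launching it against a standalone master, assuming the master runs on Master.Hadoop at the default port 7077 (adjust to your deployment):
bin/spark-shell --master spark://Master.Hadoop:7077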
val file = sc.textFile("hdfs://Master.Hadoop:9000/README.txt") // the path may not be right; check with hadoop fs -ls / (it differs on Ubuntu)
val count = file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
count.collect