Environment: Ubuntu 12.10, Hadoop 1.1.1, JDK 1.7
Preparation:
1. Download Ubuntu 12.10 from http://www.ubuntu.com/download
2. Download the JDK from http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
3. Download Hadoop from http://apache.dataguru.cn/hadoop/common/hadoop-1.1.1/
Detailed steps:
1. Install and configure the JDK
1) Create a work folder under /home/hotye/ (note: hotye is this machine's user name; substitute your own)
2) Open a terminal with Ctrl+Alt+T and copy the downloaded JDK archive into /home/hotye/work
3) Unpack the JDK
#sudo tar -xzvf jdk-7u17-linux-i586.tar.gz
4) Rename the extracted directory (the archive unpacks to jdk1.7.0_17)
#sudo mv jdk1.7.0_17 jdk1.7
5) Configure the JDK environment variables
#sudo vi /etc/profile
Add the following lines:
export JAVA_HOME=/home/hotye/work/jdk1.7
PATH=$JAVA_HOME/bin:$PATH
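/etc/profile is only read at login, so the new variables are not yet visible in the current shell. Assuming the lines above were saved, reload the file before testing:
#source /etc/profile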
Then verify the configuration:
#java -version
The version shown should be the JDK 1.7 you just installed.
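For reference, with JDK 7u17 the command should report something like this (build details may vary):
java version "1.7.0_17"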
#type java
If the displayed path points into the directory you extracted, the JDK is installed and configured correctly.
2. Install and configure Hadoop
1) Copy the downloaded hadoop-1.1.1 archive into /home/hotye/work
2) Unpack it in /home/hotye/work; #sudo tar -xzvf hadoop-1.1.1.tar.gz
3) Rename the extracted directory to hadoop; #sudo mv hadoop-1.1.1 hadoop
4) Enter the conf folder under hadoop; #cd /home/hotye/work/hadoop/conf
5) Edit core-site.xml as follows (fs.default.name is the URI at which clients reach the NameNode):
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
6) Edit hdfs-site.xml as follows (a replication factor of 1 is appropriate for a single-node setup):
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
7) Edit mapred-site.xml as follows (mapred.job.tracker is the JobTracker address; it cannot share port 9000 with the NameNode, so the conventional 9001 is used here):
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
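One more file worth checking: Hadoop's start-up scripts read JAVA_HOME from conf/hadoop-env.sh, not from /etc/profile. If start-all.sh later complains that JAVA_HOME is not set, uncomment the export line in that file and point it at the JDK installed above:
#sudo vi /home/hotye/work/hadoop/conf/hadoop-env.sh
export JAVA_HOME=/home/hotye/work/jdk1.7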
8) Format a new distributed file system
#cd /home/hotye/work/hadoop
#sudo bin/hadoop namenode -format
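If the format succeeds, the command's output should include a line ending in "has been successfully formatted."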
9) Start Hadoop
#sudo bin/start-all.sh
10) Check whether the configuration and start-up succeeded
http://localhost:50030 MapReduce administration page (JobTracker)
http://localhost:50070 HDFS status page (NameNode)
http://localhost:50060 TaskTracker status page
If all three pages open, Hadoop is configured and running correctly.
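You can also verify from the terminal with the JDK's jps tool. Since the daemons were started with sudo they belong to root, so run jps as root too (using the full JDK path, as sudo may not inherit your PATH); it should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker:
#sudo /home/hotye/work/jdk1.7/bin/jps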
11) Stop Hadoop
#sudo bin/stop-all.sh
3. Configure SSH (note: start-all.sh launches the daemons over ssh to localhost, so this should be in place before starting Hadoop)
1) Install SSH; #sudo apt-get install ssh
2) Then generate a new SSH key pair to enable passwordless login
#ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
#cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3) Test with the following command
#ssh localhost
If it succeeds, you will not be prompted for a password.
4. Build an example
1) Create a workspace;
#cd /home/hotye/work
#mkdir workspace
#cd workspace
#mkdir wordcount
#cd wordcount
#mkdir wordcount_classes
#mkdir input
2) Create a WordCount.java file in the wordcount folder
#vi WordCount.java
package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    // Mapper: splits each input line into tokens and emits (word, 1) per token.
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer: sums the counts for each word and emits (word, total).
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        // The reducer doubles as a combiner, since word counting is associative.
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
3) Compile WordCount.java (in Hadoop 1.x the core jar is named hadoop-core-<version>.jar and sits at the top of the Hadoop directory)
#javac -classpath /home/hotye/work/hadoop/hadoop-core-1.1.1.jar -d wordcount_classes WordCount.java
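If the compile succeeds, the output directory should now contain the package path with three class files (Map and Reduce are compiled as inner classes):
#ls wordcount_classes/org/myorg
This should list WordCount.class, WordCount$Map.class and WordCount$Reduce.class.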
4) Package the compiled class files into a jar
#jar -cvf /home/hotye/work/workspace/wordcount/wordcount.jar -C wordcount_classes/ .
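To double-check the packaging, list the jar's contents; the classes should appear under org/myorg/:
#jar -tf /home/hotye/work/workspace/wordcount/wordcount.jar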
5) Create an example.txt in the input folder
#cd /home/hotye/work/workspace/wordcount/input
#vi example.txt
Enter the following content:
hello world hello bye hello world bye
good hello world good
6) Go into the hadoop folder and copy the input folder you just created into HDFS
#sudo bin/hadoop dfs -put /home/hotye/work/workspace/wordcount/input input
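To confirm the upload, list the directory in HDFS; it should show input/example.txt:
#sudo bin/hadoop dfs -ls input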
7) Run the WordCount program
#sudo bin/hadoop jar /home/hotye/work/workspace/wordcount/wordcount.jar org.myorg.WordCount input output
8) View the results (one word per line, sorted by key, TAB-separated from its count)
#sudo bin/hadoop dfs -cat output/part-00000
bye	2
good	2
hello	4
world	3
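Note: Hadoop refuses to start a job whose output directory already exists, so to rerun the example, first remove the output directory from HDFS:
#sudo bin/hadoop dfs -rmr output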