1. Environment
Three virtual machines:
IP addresses:
10.10.236.190 master
10.10.236.191 slave-A
10.10.236.192 slave-B
OS: CentOS 5.6
JDK 1.6.0_37
hadoop-1.0.4-1.x86_64.rpm
2. Configure passwordless SSH login
First, go to root's home directory on the master and generate a key pair:
root@master:~# ssh-keygen -t rsa
Go into the .ssh directory and copy the public key to authorized_keys:
root@master:~/.ssh# cp id_rsa.pub authorized_keys
On each of the remaining datanode machines,
create a .ssh directory:
root@slave-A:~# mkdir .ssh
Then copy authorized_keys over from the namenode (master):
root@master:~/.ssh# scp authorized_keys slave-A:/root/.ssh/
Do the same for slave-B. Test SSH and make sure that logging in from the master to each slave node no longer asks for a password.
3. Install the JDK and configure the JDK environment variables
10.10.236.190 master
10.10.236.191 slave-A
10.10.236.192 slave-B
Install the JDK on all three machines and set the environment variables:
export JAVA_HOME=/opt/jre1.6.0_37
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
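To confirm on each node that the variables actually took effect, a tiny check program can be used (a minimal sketch; the class name JdkCheck is just an example):

public class JdkCheck {
    public static void main(String[] args) {
        // Prints which JVM the shell's PATH actually resolves to.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.home    = " + System.getProperty("java.home"));
    }
}

If the full JDK is installed, compile and run it and check that the reported version matches 1.6.0_37; otherwise java -version gives the same information.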
4. Install the Hadoop package
Install the Hadoop RPM on all three machines with:
rpm -ivh hadoop-1.0.4-1.x86_64.rpm
5. Edit the Hadoop configuration
Go to the Hadoop configuration directory:
cd /etc/hadoop
Edit core-site.xml and add the following:
<property>
<name>fs.default.name</name>
<value>hdfs://10.10.236.190:54310</value> <!-- this is what actually determines the namenode -->
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hdfs/tmp</value> <!-- temporary files; can be deleted when something goes wrong -->
<description>A base for other temporary directories.</description>
</property>
Edit hdfs-site.xml and add the following:
<property>
<name>dfs.name.dir</name>
<value>/data/hdfs/name</value> <!-- local path where the namenode persistently stores the namespace and transaction logs -->
</property>
<property>
<name>dfs.data.dir</name>
<value>/data/hdfs/data</value> <!-- path where the datanodes store their data -->
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>4096</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value> <!-- number of data replicas; the default is 3 -->
</property>
Edit mapred-site.xml and add the following:
<property>
<name>mapred.job.tracker</name> <!-- host of the JobTracker -->
<value>10.10.236.190:54311</value>
</property>
Edit masters; this determines which machine becomes the secondary namenode:
10.10.236.190
Edit slaves; this lists all of the datanode machines:
10.10.236.191
10.10.236.192
6. Edit the hadoop-env.sh configuration and set JAVA_HOME:
export JAVA_HOME=/opt/jre1.6.0_37
7. Copy the configuration to the slave nodes so that all three nodes stay consistent:
scp -rp /etc/hadoop 10.10.236.191:/etc/hadoop
scp -rp /etc/hadoop 10.10.236.192:/etc/hadoop
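Once core-site.xml, hdfs-site.xml and mapred-site.xml are in place, a quick way to confirm that a client will actually pick up these values is to load them with the Configuration API (a minimal sketch; the class name ConfCheck is just an example, and it assumes the files live in /etc/hadoop as above and that the Hadoop jars are on the classpath when it is run):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class ConfCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Load the cluster configuration explicitly from /etc/hadoop.
        conf.addResource(new Path("/etc/hadoop/core-site.xml"));
        conf.addResource(new Path("/etc/hadoop/hdfs-site.xml"));
        conf.addResource(new Path("/etc/hadoop/mapred-site.xml"));

        System.out.println("fs.default.name    = " + conf.get("fs.default.name"));
        System.out.println("dfs.replication    = " + conf.get("dfs.replication"));
        System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
    }
}

The printed values should match the namenode and JobTracker addresses configured above.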
8. Fix the permissions of the startup scripts
The startup scripts on the master node may not have execute permission,
so change the permissions:
chmod 755 /usr/sbin/s*.sh
9. Format HDFS
/usr/sbin/hadoop namenode -format
10. Start Hadoop
start-all.sh
Finally, verify that Hadoop was installed successfully. Open a browser and visit the following URLs:
http://10.10.236.190:50030 (the MapReduce web UI)
http://10.10.236.190:50070 (the HDFS web UI)
If both pages load, the installation succeeded.
Hadoop divides the hosts into two kinds of roles from three different angles:
First, hosts are divided into master and slave.
Second, from the HDFS point of view, hosts are divided into namenode and datanode (in a distributed file system, managing the directory tree is essential; whoever manages it is effectively the owner, and the namenode is that directory manager).
Third, from the MapReduce point of view, hosts are divided into JobTracker and TaskTracker (a job is usually split into multiple tasks, which makes the relationship between the two easy to understand).
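As a concrete illustration of how a client sees these roles, the sketch below writes a small file to HDFS and then asks the namenode which datanodes hold its blocks (a minimal sketch only: the class name and the test path /tmp/roles-demo.txt are made up for the example, and it assumes the cluster is running and that the configuration from /etc/hadoop and the Hadoop jars are on the client's classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRolesDemo {
    public static void main(String[] args) throws Exception {
        // fs.default.name from core-site.xml points the client at the namenode.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Writing a file: the namenode records the metadata,
        // while the block contents are stored on the datanodes.
        Path p = new Path("/tmp/roles-demo.txt");  // hypothetical test path
        FSDataOutputStream out = fs.create(p, true);
        out.writeBytes("hello hdfs\n");
        out.close();

        // Ask the namenode which datanodes hold the file's blocks.
        FileStatus st = fs.getFileStatus(p);
        for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
            for (String host : loc.getHosts()) {
                System.out.println("block replica on datanode: " + host);
            }
        }

        fs.delete(p, false);
        fs.close();
    }
}

With dfs.replication set to 1 as above, each block should be reported on exactly one datanode.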
HDFS can store and read local files, and Hadoop ships with a sample program, wordcount, that can be used to verify the MapReduce framework.
Enter the following commands:
cd /opt
mkdir input
cd input
cp ../*.txt .
cd ..
[root@master input]# ls
README test.txt THIRDPARTYLICENSEREADME.txt Welcome.txt
A few text files have been copied into the input directory; each contains a large amount of English text.
Enter the following command to copy the files into HDFS:
hadoop fs -put input/ input
cd /usr/share/hadoop
hadoop jar hadoop-examples-1.0.4.jar wordcount input output
View the results of the run:
hadoop fs -ls output
hadoop fs -cat output/part-r-00000
The expected output looks something like the following:
...
we 15
web 3
webmaster 1
website. 2
well 3
well-defined 1
were 2
what 5
whatever 2
whatsoever 2
whatsoever, 1
when 5
where 7
whereas 1
wherever 13
wherewithal 1
whether 25
which 39
which, 1
while 1
who 8
whole 5
whole, 9
whole. 1
wholly 1
whom 9
whose 2
wide 1
widely 1
widest 1
will 23
willing 1
wish 3
wish); 1
wish.) 1
with 181
with, 2
with. 1
within 35
without 95
wo 1
work 66
work, 11
work. 8
works 10
works, 1
works. 2
worldwide 1
worldwide, 12
would 6
write 4
writing 7
writing, 12
written 52
wrote 2
years, 1
you 112
you, 1
you. 2
you; 1
your 52
zlib 1
zlib.h 1
zlib; 1
~~~~~ 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1
漏 1
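The same result can also be read through the HDFS Java API instead of hadoop fs -cat (a minimal sketch; the class name CatOutput is just an example, and it assumes the job was run as root so that the relative path resolves under the user's HDFS home directory):

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CatOutput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Relative paths resolve against the user's HDFS home directory.
        Path part = new Path("output/part-r-00000");
        BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(part)));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);  // each line is "word<TAB>count"
        }
        in.close();
        fs.close();
    }
}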
For reference, the source of the bundled wordcount example is as follows:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  // Mapper: splits each input line into tokens and emits (word, 1).
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sums the counts for each word.
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
(omitted)