This example uses a pseudo-distributed configuration on node01 and a high-availability configuration on node02 through node05; the programming practice runs on node01. The example assumes these environments have already been set up.
Setting up the Hadoop environment
Pseudo-distributed (single-node) configuration changes
(1) mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
(2) yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
Fully distributed configuration changes
(1) mv mapred-site.xml.template mapred-site.xml // rename the template file
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
(2) Configure yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster1</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>node04</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>node05</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>node03:2181,node04:2181,node05:2181</value>
</property>
Distribute the configuration files to the other nodes:
scp yarn-site.xml mapred-site.xml root@node03:`pwd`
(3) Passwordless SSH between the ResourceManager nodes
node04 and node05 need passwordless login to each other.
node04 distributes its public key to node05, where it is appended to the authorized_keys file:
scp /root/.ssh/id_rsa.pub root@node05:`pwd`/node04.pub
cat node04.pub >> authorized_keys
Do the same in the other direction on node05.
Finally, verify the passwordless login:
(4) Startup
Start ZooKeeper on node03, node04, and node05: zkServer.sh start
Start HDFS: start-dfs.sh
Start YARN: start-yarn.sh
Start the ResourceManagers on node04 and node05: yarn-daemon.sh start resourcemanager
(5) Check in a browser (use ss -nal to confirm the listening ports, then visit node04:8088)
(6) Run wordcount
cd /opt/jxxy/hadoop-2.6.5/share/hadoop/mapreduce/
hadoop jar hadoop-mapreduce-examples-2.6.5.jar wordcount /user/root/test.txt /wordcount
View the result:
MapReduce programming practice
(1) Create a new Eclipse project, import mapred-site.xml, yarn-site.xml, hdfs-site.xml, and core-site.xml, and add the Hadoop jars to the build path
(2) Create the main class com.jxxy.mr.test.WordCount
Configuration conf = new Configuration();
try {
    // create a new job
    Job job = Job.getInstance(conf);
    job.setJarByClass(WordCount.class); // class whose jar will be shipped
    job.setJobName("myjob");
    Path inPath = new Path("/user/root/test.txt");
    FileInputFormat.addInputPath(job, inPath);
    // org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    Path outPath = new Path("/output/wordcount");
    // if the output path already exists, delete it first
    if (outPath.getFileSystem(conf).exists(outPath))
        outPath.getFileSystem(conf).delete(outPath, true);
    FileOutputFormat.setOutputPath(job, outPath);
    // org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
    // MyMapper and MyReducer are the two classes created below
    job.setMapperClass(MyMapper.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class); // not setOutputValueClass
    job.setReducerClass(MyReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // submit the job and wait for it to finish
    job.waitForCompletion(true);
} catch (Exception e) {
    e.printStackTrace();
}
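The driver deletes the output path up front because FileOutputFormat refuses to write into an existing output directory. The same guard can be sketched against the local filesystem instead of HDFS (class and path names here are illustrative, not part of the original code):

```java
import java.io.File;

// Mirrors the outPath.getFileSystem(conf).delete(outPath, true) guard,
// but on the local filesystem: remove the output dir recursively if present.
public class OutputGuard {
    static void deleteRecursively(File dir) {
        File[] children = dir.listFiles();
        if (children != null) {
            for (File c : children) deleteRecursively(c);
        }
        dir.delete();
    }

    public static void main(String[] args) {
        File outDir = new File("wordcount-out"); // illustrative path
        if (outDir.exists()) deleteRecursively(outDir); // same role as the HDFS delete
        System.out.println(outDir.exists()); // prints false
    }
}
```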
(3) Create MyMapper, a subclass of Mapper
public class MyMapper extends Mapper<Object,Text,Text,IntWritable> {
private final static IntWritable one=new IntWritable(1);
private Text word=new Text();
public void map(Object key,Text value,Context context) throws IOException,InterruptedException{
StringTokenizer str=new StringTokenizer(value.toString());
while(str.hasMoreTokens()){
word.set(str.nextToken());
context.write(word,one);
}
}
}
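The map logic above can be exercised outside Hadoop: StringTokenizer splits each input line on whitespace, and one (word, 1) pair is emitted per token. A minimal sketch in plain Java, with no Hadoop dependencies (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Simulates MyMapper.map(): emit one (word, 1) pair per whitespace-separated token.
public class MapSketch {
    static List<String> mapLine(String line) {
        List<String> pairs = new ArrayList<String>();
        StringTokenizer str = new StringTokenizer(line);
        while (str.hasMoreTokens()) {
            // same shape as context.write(word, one)
            pairs.add(str.nextToken() + "\t1");
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(mapLine("hello world hello"));
    }
}
```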
(4) Create MyReducer, a subclass of Reducer
public class MyReducer extends Reducer<Text,IntWritable,Text,IntWritable>{
//iterate over the grouped values and sum them
private IntWritable result=new IntWritable();
public void reduce(Text key,Iterable<IntWritable> values,Context context) throws IOException,InterruptedException{
int sum=0;
for(IntWritable val:values){
sum+=val.get();
}
result.set(sum);
context.write(key,result);
}
}
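The framework's shuffle groups all (word, 1) pairs by key before reduce() runs, so the reducer only has to sum. The same aggregation can be sketched in plain Java, with a map standing in for the shuffle's grouping (names are illustrative, and the code avoids post-1.7 APIs to match the JDK used in this example):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simulates shuffle + MyReducer.reduce(): group occurrences by word, then sum.
public class ReduceSketch {
    static Map<String, Integer> reduce(List<String> words) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String w : words) {
            Integer prev = counts.get(w);
            // each occurrence contributes 1, like sum += val.get()
            counts.put(w, prev == null ? 1 : prev + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(reduce(Arrays.asList("hello", "world", "hello")));
        // prints {hello=2, world=1}
    }
}
```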
(5) Deploy: build the jar and upload it to the cluster (placed in /root/ in this example)
(6) Run the program
hadoop jar wc.jar com.jxxy.mr.test.WordCount
View the result in the file system:
Note that the JDK on the cluster must not be older than the JDK used to compile the jar; otherwise the class files will not run, and you need to switch JDK versions. This example runs on JDK 1.7 (x64); JDK 1.7 can be downloaded from https://www.oracle.com/cn/java/technologies/javase/javase7-archive-downloads.html
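To see which JDK a node is actually running before deploying the jar, a quick check using standard Java system properties (a jar compiled for a newer version than the one reported here fails with UnsupportedClassVersionError):

```java
// Print the runtime's Java version and install location.
public class JdkCheck {
    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.home    = " + System.getProperty("java.home"));
    }
}
```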