- Modify Driver.java
Job job = Job.getInstance(); // wrap the job configuration in a Job object
// tell Hadoop which jar contains the main class (the Driver)
job.setJarByClass(Driver.class);
job.setMapOutputKeyClass(Text.class); // key type of the map output
job.setMapOutputValueClass(IntWritable.class); // value type of the map output
job.setOutputKeyClass(Text.class); // key type of the reducer output
job.setOutputValueClass(IntWritable.class); // value type of the reducer output
job.setMapperClass(WordCountMapper.class); // set the Mapper class
job.setReducerClass(WordCountReducer.class); // set the Reducer class
FileInputFormat.setInputPaths(job, new Path(args[0])); // path of the input file(s)
FileOutputFormat.setOutputPath(job, new Path(args[1])); // path for the output results
job.waitForCompletion(true); // block until the job finishes, then exit
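The classes wired in above (WordCountMapper emitting (word, 1) pairs, WordCountReducer summing them) implement the classic word-count logic. A minimal plain-Java sketch of that logic outside Hadoop, for reference — the class and method names here are illustrative, not from the project:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCountLocal {

    // Map phase: split each line on whitespace into words.
    // Reduce phase: sum the counts per word.
    // Both phases are collapsed into one in-memory pass here for illustration.
    public static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum); // same summation the reducer performs
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(List.of("hello world", "hello hadoop"));
        System.out.println(counts); // prints the counts (HashMap order is unspecified)
    }
}
```

On a cluster the map and reduce steps run in separate tasks and the shuffle groups the (word, 1) pairs by key; this local version only shows the arithmetic.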
- Maven configuration (pom.xml)
<build>
  <plugins>
    <plugin>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>2.3.2</version>
      <configuration>
        <source>1.8</source>
        <target>1.8</target>
      </configuration>
    </plugin>
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
          <manifest>
            <!-- fully qualified name of the Driver class -->
            <mainClass>mapreduce.Driver</mainClass>
          </manifest>
        </archive>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
- Package
In the Maven panel on the right side of IDEA: Lifecycle -> package -> run with the green triangle.
A target directory appears in the project, containing two jar files; choose the one without bundled dependencies and copy it to the cluster.
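The same packaging step can also be run from the command line instead of the IDEA panel. A sketch, assuming the pom above (the exact jar file names depend on the project's artifactId and version):

```shell
# Compile and package; the assembly plugin is bound to the package phase,
# so this produces both the plain jar and the *-jar-with-dependencies.jar under target/
mvn clean package
```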
- Cluster
- Start the Hadoop daemons first
- Run the job
Command format:
hadoop jar <full path to the jar> <fully qualified Driver class> <input path> <output path>
hadoop jar wordCount.jar mapreduce.Driver /hduser/input/wc.input /hduser/output1
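After the job finishes, the results can be inspected on HDFS. A quick check, assuming the output path used above; reducer output files follow the standard part-r-NNNNN naming:

```shell
# List the job output; an empty _SUCCESS file marks a completed job
hdfs dfs -ls /hduser/output1

# Print the word counts produced by the reducer
hdfs dfs -cat /hduser/output1/part-r-00000
```

Note that the output directory must not exist before the job runs, or FileOutputFormat will fail the job.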