I. Write the code
1. Create the Maven project
mvn archetype:generate
[WBQ@westgisB064 one]$ mvn archetype:generate
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------< org.apache.maven:standalone-pom >-------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] --------------------------------[ pom ]---------------------------------
[INFO]
[INFO] >>> maven-archetype-plugin:3.2.1:generate (default-cli) > generate-sources @ standalone-pom >>>
[INFO]
[INFO] <<< maven-archetype-plugin:3.2.1:generate (default-cli) < generate-sources @ standalone-pom <<<
[INFO]
[INFO]
[INFO] --- maven-archetype-plugin:3.2.1:generate (default-cli) @ standalone-pom ---
[INFO] Generating project in Interactive mode
[WARNING] No archetype found in remote catalog. Defaulting to internal catalog
[INFO] No archetype defined. Using maven-archetype-quickstart (org.apache.maven.archetypes:maven-archetype-quickstart:1.0)
Choose archetype:
1: internal -> org.apache.maven.archetypes:maven-archetype-archetype (An archetype which contains a sample archetype.)
2: internal -> org.apache.maven.archetypes:maven-archetype-j2ee-simple (An archetype which contains a simplifed sample J2EE application.)
3: internal -> org.apache.maven.archetypes:maven-archetype-plugin (An archetype which contains a sample Maven plugin.)
4: internal -> org.apache.maven.archetypes:maven-archetype-plugin-site (An archetype which contains a sample Maven plugin site.
This archetype can be layered upon an existing Maven plugin project.)
5: internal -> org.apache.maven.archetypes:maven-archetype-portlet (An archetype which contains a sample JSR-268 Portlet.)
6: internal -> org.apache.maven.archetypes:maven-archetype-profiles ()
7: internal -> org.apache.maven.archetypes:maven-archetype-quickstart (An archetype which contains a sample Maven project.)
8: internal -> org.apache.maven.archetypes:maven-archetype-site (An archetype which contains a sample Maven site which demonstrates
some of the supported document types like APT, XDoc, and FML and demonstrates how
to i18n your site. This archetype can be layered upon an existing Maven project.)
9: internal -> org.apache.maven.archetypes:maven-archetype-site-simple (An archetype which contains a sample Maven site.)
10: internal -> org.apache.maven.archetypes:maven-archetype-webapp (An archetype which contains a sample Maven Webapp project.)
Choose a number or apply filter (format: [groupId:]artifactId, case sensitive contains): 7:
Define value for property 'groupId': ten
Define value for property 'artifactId': nine
Define value for property 'version' 1.0-SNAPSHOT: : 1.0
Define value for property 'package' ten: : eight
Confirm properties configuration:
groupId: ten
artifactId: nine
version: 1.0
package: eight
Y: :
[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Old (1.x) Archetype: maven-archetype-quickstart:1.1
[INFO] ----------------------------------------------------------------------------
[INFO] Parameter: basedir, Value: /home/WBQ/code/maven/one
[INFO] Parameter: package, Value: eight
[INFO] Parameter: groupId, Value: ten
[INFO] Parameter: artifactId, Value: nine
[INFO] Parameter: packageName, Value: eight
[INFO] Parameter: version, Value: 1.0
[INFO] project created from Old (1.x) Archetype in dir: /home/WBQ/code/maven/one/nine
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 39.874 s
[INFO] Finished at: 2023-05-30T22:30:42+08:00
[INFO] ------------------------------------------------------------------------
[WBQ@westgisB064 one]$
2. Configure the project's pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>ten</groupId>
  <artifactId>nine</artifactId>
  <version>1.0</version>
  <packaging>jar</packaging>
  <name>nine</name>
  <url>http://maven.apache.org</url>
  <!-- Dependency version management -->
  <properties>
    <maven.compiler.source>8</maven.compiler.source>
    <maven.compiler.target>8</maven.compiler.target>
    <java.version>1.8</java.version>
    <hadoop.version>3.1.3</hadoop.version>
    <log4j.version>1.2.14</log4j.version>
    <junit.version>4.8.2</junit.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
      <version>${log4j.version}</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>${junit.version}</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <!-- Write the entry point into the jar manifest -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <configuration>
          <archive>
            <manifest>
              <mainClass>eight.dailyAccessCount</mainClass>
            </manifest>
          </archive>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
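The mainClass entry in the maven-jar-plugin configuration ends up as the Main-Class attribute of the jar's META-INF/MANIFEST.MF, which is why the hadoop jar command later in this post does not need a class name on the command line. A minimal plain-Java sketch for checking that attribute follows. ManifestCheck and its two methods are hypothetical names, and the in-memory jar only stands in for target/nine-1.0.jar.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical helper for illustration; not part of the project above.
class ManifestCheck {

    // Builds a tiny jar in memory whose manifest names the given main class.
    static byte[] jarWithMainClass(String mainClass) {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version must be present, otherwise the attributes are not written.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes, manifest)) {
            // Nothing else to add: the constructor already wrote the manifest entry.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns the Main-Class attribute of a jar, or null if the jar declares none.
    static String readMainClass(byte[] jarBytes) {
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            Manifest manifest = jar.getManifest();
            return manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] jar = jarWithMainClass("eight.dailyAccessCount");
        System.out.println(readMainClass(jar));
    }
}
```

For the real artifact, opening the file with java.util.jar.JarFile and calling getManifest() should give the same kind of Manifest object to inspect.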
3. Write the code
package eight;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

// Finds the highest score for each course.
// The class name is kept as-is because pom.xml references it as mainClass.
public class dailyAccessCount {

    // Mapper: turns each "course score" line into a (course, score) pair.
    public static class FindMaxMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        // Reused across calls to avoid allocating new objects per record.
        private final Text course = new Text();
        private final IntWritable score = new IntWritable();

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Split on any run of whitespace so tabs or repeated spaces do not break parsing.
            String[] values = value.toString().trim().split("\\s+");
            if (values.length < 2) {
                return; // skip blank or incomplete lines
            }
            course.set(values[0]);
            score.set(Integer.parseInt(values[1]));
            context.write(course, score);
        }
    }

    // Reducer: receives all scores of one course and emits the largest.
    public static class FindMaxReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            // Start from the smallest int so that negative scores are handled too.
            int maxScore = Integer.MIN_VALUE;
            for (IntWritable score : values) {
                maxScore = Math.max(maxScore, score.get());
            }
            context.write(key, new IntWritable(maxScore));
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: FindMax <input> <output>");
            System.exit(-1);
        }
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "findmax");
        job.setJarByClass(dailyAccessCount.class);
        job.setMapperClass(FindMaxMapper.class);
        job.setReducerClass(FindMaxReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // A single reducer gives one output file with all courses.
        job.setNumReduceTasks(1);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Delete any previous output directory; the job fails if it already exists.
        FileSystem.get(conf).delete(new Path(args[1]), true);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Prints 0 if the job succeeded and 1 if it failed.
        System.out.println(job.waitForCompletion(true) ? 0 : 1);
    }
}
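The map and reduce steps above can be tried without a cluster. The following is a minimal plain-Java sketch of the same logic, not the Hadoop job itself. LocalFindMax and maxPerCourse are hypothetical names, a TreeMap stands in for the shuffle that groups and sorts keys, and skipping blank or malformed lines is an added assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local stand-in for the mapper and reducer; it does not use Hadoop.
class LocalFindMax {

    // Parses "course score" lines and keeps the highest score seen per course.
    static Map<String, Integer> maxPerCourse(List<String> lines) {
        // TreeMap keeps keys sorted, loosely mimicking the sorted reducer input.
        Map<String, Integer> max = new TreeMap<>();
        for (String line : lines) {
            // "Map" step: split the line into course and score.
            String[] fields = line.trim().split("\\s+");
            if (fields.length != 2) {
                continue; // assumption: ignore blank or malformed lines
            }
            int score;
            try {
                score = Integer.parseInt(fields[1]);
            } catch (NumberFormatException e) {
                continue; // assumption: ignore non-numeric scores
            }
            // "Reduce" step: keep the larger of the stored and the new score.
            max.merge(fields[0], score, Integer::max);
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "语文 102", "数学 30", "英语 88", "语文 120", "数学 100", "英语 67");
        maxPerCourse(lines).forEach((course, score) -> System.out.println(course + "\t" + score));
    }
}
```

Running the sketch on the sample data is a quick way to know what the cluster job should produce before submitting it.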
4. Compile
mvn compile
5. Package (mvn install builds target/nine-1.0.jar and also copies it into the local Maven repository)
mvn install
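II. Submit the jar to the cluster and run it
1. Prepare the data (each line is a course name and a score; 语文 = Chinese, 数学 = math, 英语 = English)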
vi score.txt
语文 102
数学 30
英语 88
语文 120
数学 100
英语 67
2. Start the cluster
[WBQ@westgisB064 ~]$ $HADOOP_HOME/sbin/start-dfs.sh
Starting namenodes on [westgisB064]
Starting datanodes
Starting secondary namenodes [westgisB064]
[WBQ@westgisB064 ~]$ $HADOOP_HOME/sbin/start-yarn.sh
Starting resourcemanager
Starting nodemanagers
[WBQ@westgisB064 ~]$
3. Upload the data
[WBQ@westgisB064 ~]$ hdfs dfs -mkdir /input
[WBQ@westgisB064 ~]$ hdfs dfs -put /home/WBQ/code/maven/one/nine/score.txt /input
2023-05-30 22:51:56,341 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
[WBQ@westgisB064 ~]$
4. Run the jar
[WBQ@westgisB064 ~]$ hadoop jar /home/WBQ/code/maven/one/nine/target/nine-1.0.jar /input /output
2023-05-30 23:00:18,488 INFO client.RMProxy: Connecting to ResourceManager at westgisB064/10.103.105.64:8032
2023-05-30 23:00:19,047 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2023-05-30 23:00:19,078 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/WBQ/.staging/job_1685458797877_0001
2023-05-30 23:00:19,222 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-05-30 23:00:19,679 INFO input.FileInputFormat: Total input files to process : 1
2023-05-30 23:00:19,711 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-05-30 23:00:20,029 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-05-30 23:00:20,055 INFO mapreduce.JobSubmitter: number of splits:1
2023-05-30 23:00:20,163 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-05-30 23:00:20,195 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1685458797877_0001
2023-05-30 23:00:20,195 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-05-30 23:00:20,385 INFO conf.Configuration: resource-types.xml not found
2023-05-30 23:00:20,386 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2023-05-30 23:00:20,659 INFO impl.YarnClientImpl: Submitted application application_1685458797877_0001
2023-05-30 23:00:20,700 INFO mapreduce.Job: The url to track the job: http://westgisB064:8088/proxy/application_1685458797877_0001/
2023-05-30 23:00:20,701 INFO mapreduce.Job: Running job: job_1685458797877_0001
2023-05-30 23:00:28,860 INFO mapreduce.Job: Job job_1685458797877_0001 running in uber mode : false
2023-05-30 23:00:28,861 INFO mapreduce.Job: map 0% reduce 0%
2023-05-30 23:00:33,927 INFO mapreduce.Job: map 100% reduce 0%
2023-05-30 23:00:39,968 INFO mapreduce.Job: map 100% reduce 100%
2023-05-30 23:00:40,984 INFO mapreduce.Job: Job job_1685458797877_0001 completed successfully
2023-05-30 23:00:41,094 INFO mapreduce.Job: Counters: 53
File System Counters
FILE: Number of bytes read=84
FILE: Number of bytes written=436247
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=167
HDFS: Number of bytes written=32
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=2778
Total time spent by all reduces in occupied slots (ms)=3010
Total time spent by all map tasks (ms)=2778
Total time spent by all reduce tasks (ms)=3010
Total vcore-milliseconds taken by all map tasks=2778
Total vcore-milliseconds taken by all reduce tasks=3010
Total megabyte-milliseconds taken by all map tasks=2844672
Total megabyte-milliseconds taken by all reduce tasks=3082240
Map-Reduce Framework
Map input records=6
Map output records=6
Map output bytes=66
Map output materialized bytes=84
Input split bytes=104
Combine input records=0
Combine output records=0
Reduce input groups=3
Reduce shuffle bytes=84
Reduce input records=6
Reduce output records=3
Spilled Records=12
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=142
CPU time spent (ms)=1890
Physical memory (bytes) snapshot=658104320
Virtual memory (bytes) snapshot=5683085312
Total committed heap usage (bytes)=857735168
Peak Map Physical memory (bytes)=358383616
Peak Map Virtual memory (bytes)=2836721664
Peak Reduce Physical memory (bytes)=299720704
Peak Reduce Virtual memory (bytes)=2846363648
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=63
File Output Format Counters
Bytes Written=32
0
[WBQ@westgisB064 ~]$
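Three of the counters above can be checked by hand against the sample data: Bytes Read=63, Map output bytes=66 and Bytes Written=32. The sketch below recomputes them in plain Java. CounterCheck and its methods are hypothetical names, and the per-record map output size assumes the usual Writable encodings (a one-byte length prefix for a short Text, four bytes for an IntWritable).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper that redoes the byte arithmetic behind three job counters.
class CounterCheck {

    // The six input records from score.txt.
    static final String[][] INPUT = {
        {"语文", "102"}, {"数学", "30"}, {"英语", "88"},
        {"语文", "120"}, {"数学", "100"}, {"英语", "67"}
    };

    // The three output records written by the reducer.
    static final String[][] OUTPUT = {
        {"数学", "100"}, {"英语", "88"}, {"语文", "120"}
    };

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Size of a text file with one "key<separator>value<newline>" line per record,
    // where the separator (space or tab) and the newline take one byte each.
    static int textFileBytes(String[][] records) {
        int total = 0;
        for (String[] r : records) {
            total += utf8Length(r[0]) + 1 + utf8Length(r[1]) + 1;
        }
        return total;
    }

    // Serialized (Text, IntWritable) pairs, assuming keys shorter than 128 bytes:
    // 1 length byte + the key's UTF-8 bytes + 4 bytes for the int.
    static int mapOutputBytes(String[][] records) {
        int total = 0;
        for (String[] r : records) {
            total += 1 + utf8Length(r[0]) + 4;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("Bytes Read       = " + textFileBytes(INPUT));
        System.out.println("Map output bytes = " + mapOutputBytes(INPUT));
        System.out.println("Bytes Written    = " + textFileBytes(OUTPUT));
    }
}
```

Each Chinese character takes three bytes in UTF-8, so a line such as "语文 102" is 6 + 1 + 3 + 1 = 11 bytes, and the six input lines add up to 63.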
5. View the results
[WBQ@westgisB064 ~]$ hdfs dfs -cat /output/*
2023-05-30 23:01:27,454 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
数学 100
英语 88
语文 120
[WBQ@westgisB064 ~]$
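The output lists 数学, 英语, 语文 rather than the input order because reducer keys arrive sorted, and Text keys are compared by their raw UTF-8 bytes. A minimal plain-Java sketch of that ordering follows. Utf8Order, compareUtf8 and sortLikeTextKeys are hypothetical names, and the comparator is a stand-in for Hadoop's byte comparison, not a copy of it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of byte-wise key ordering; it does not use Hadoop.
class Utf8Order {

    // Compares two strings by their UTF-8 bytes, treating each byte as unsigned.
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xff) - (y[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        // If one key is a prefix of the other, the shorter key sorts first.
        return x.length - y.length;
    }

    // Returns a sorted copy of the keys, leaving the input list untouched.
    static List<String> sortLikeTextKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Utf8Order::compareUtf8);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortLikeTextKeys(Arrays.asList("语文", "数学", "英语")));
    }
}
```

With the three course names from score.txt this gives the same order as the job output, since 数 (U+6570) sorts before 英 (U+82F1), which sorts before 语 (U+8BED).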
6. When finished, shut down the cluster
[WBQ@westgisB064 ~]$ $HADOOP_HOME/sbin/stop-yarn.sh
Stopping nodemanagers
westgisB065: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
Stopping resourcemanager
[WBQ@westgisB064 ~]$ $HADOOP_HOME/sbin/stop-dfs.sh
Stopping namenodes on [westgisB064]
Stopping datanodes
Stopping secondary namenodes [westgisB064]
[WBQ@westgisB064 ~]$ ps aux|grep java
WBQ 27055 0.0 0.0 112712 980 pts/2 R+ 23:02 0:00 grep --color=auto java
[WBQ@westgisB064 ~]$
Note: part of the code comes from 不争气大王.