Hadoop Study 1: Notes on the jar Commands Encountered When Running WordCount from the Command Line


1. jar cf WordCount.jar WordCount*.class

Usage context:

Compile the WordCount.java file. The command for compiling Java source files is javac; see the screenshot below:

(Screenshot: compiling WordCount.java)
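For reference, a typical compile invocation looks like the following. This is a sketch, not the exact command in the screenshot: the classpath argument is an assumption (on Hadoop 2.x, `hadoop classpath` prints the jars of the local installation; older releases pass the hadoop-core jar explicitly).

javac -classpath "$(hadoop classpath)" WordCount.java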

At this point, three .class files are generated in the workspace folder: WordCount.class, plus WordCount$TokenizerMapper.class and WordCount$IntSumReducer.class for the two inner classes.

(Screenshot: the generated .class files)

Once compilation succeeds, the three .class files can be packaged into a jar file:

(Screenshot: packaging the .class files into a jar)

After the command runs successfully, WordCount.jar is generated in the workspace folder:

(Screenshot: jar packaging complete)


jar cf WordCount.jar WordCount*.class

c: create a new jar archive;

f: specify the jar file name;

WordCount.jar: the [jar-file] argument, i.e. the JAR archive to be created, viewed, updated, or extracted; it is the operand attached to the f option;

WordCount*.class: every .class file whose name begins with WordCount; the shell glob matches WordCount.class itself as well as the inner-class files WordCount$TokenizerMapper.class and WordCount$IntSumReducer.class. A quick check of the result is shown below.
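To confirm what went into the archive, list its contents with the standard jar tool's t (table of contents) option:

jar tf WordCount.jar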

2. bin/hadoop jar workspace/WordCount.jar WordCount input output

Usage context:

Create an input folder under /usr/local/hadoop to hold the input data:

(Screenshot: creating the input folder)

Then cd into the input folder and run the commands shown below, which write 'Hello World Bye World' into file01 and 'Hello Hadoop Goodbye Hadoop' into file02:

(Screenshot: creating the input data)
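The commands in the screenshot amount to the usual shell redirects (a sketch matching the text above):

echo 'Hello World Bye World' > file01
echo 'Hello Hadoop Goodbye Hadoop' > file02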

Finally, run the program:

(Screenshot: running the program)


bin/hadoop jar workspace/WordCount.jar WordCount input output

Similar invocations:

hadoop jar WordCount.jar WordCount input output

hadoop jar WordCount.jar WordCount /tmp/input /tmp/output

/usr/local/hadoop/bin/hadoop jar WordCount.jar WordCount input output

/usr/local/hadoop/bin/hadoop jar WordCount.jar org.apache.hadoop.examples.WordCount input output

(Because some programs declare a package, the class name on the command line must be written out in full, e.g. org.apache.hadoop.examples.WordCount; the first line of code in such programs is: package org.apache.hadoop.examples)


bin/hadoop: i.e. /usr/local/hadoop/bin/hadoop. This is the path to the hadoop executable, a file rather than a folder; it is another layer of wrapping around the java command and can be thought of as Hadoop's shell-side launch script;

jar: run a job whose code is packaged in the given jar file;

workspace/WordCount.jar: the location of WordCount.jar; relative to the /usr/local/hadoop directory mentioned above, the full path is /usr/local/hadoop/workspace/WordCount.jar;

WordCount: the main class of the job, written fully qualified if the source declares a package;

input: the data input directory in HDFS;

output: the data output directory in HDFS; it must not already exist, or the job will refuse to start. A full run under these conventions is sketched below.
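Putting the pieces together, an end-to-end run from /usr/local/hadoop might look like the following. This is a sketch assuming a running HDFS on Hadoop 2.x (older releases use bin/hadoop fs instead of bin/hdfs dfs); the paths follow the setup above.

bin/hdfs dfs -mkdir -p input
bin/hdfs dfs -put input/file01 input/file02 input
bin/hadoop jar workspace/WordCount.jar WordCount input output
bin/hdfs dfs -cat output/part-r-00000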


Code example:

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  // Mapper: split each input line into tokens and emit a (word, 1) pair per token.
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sum the counts for each word.
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
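With the two input files created above, a single reducer should produce one output file (conventionally output/part-r-00000) whose contents, sorted by key and tab-separated, work out to:

Bye	1
Goodbye	1
Hadoop	2
Hello	2
World	2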



