Steps to create a Hadoop Java project: an expert shows you how to build a Hadoop project with Maven in three minutes

    output, Reporter reporter) throws IOException {
                // Sum all counts emitted for this key.
                int sum = 0;
                while (values.hasNext()) {
                    sum += values.next().get();
                }
                result.set(sum);
                output.collect(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            // HDFS input directory and output directory of the job.
            String input = "hdfs://192.168.1.210:9000/user/hdfs/o_t_account";
            String output = "hdfs://192.168.1.210:9000/user/hdfs/o_t_account/result";

            JobConf conf = new JobConf(WordCount.class);
            conf.setJobName("WordCount");
            conf.addResource("classpath:/hadoop/core-site.xml");
            conf.addResource("classpath:/hadoop/hdfs-site.xml");
            conf.addResource("classpath:/hadoop/mapred-site.xml");

            // Types of the keys and values written by the job.
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            // The reducer only sums counts, so it also serves as the combiner.
            conf.setMapperClass(WordCountMapper.class);
            conf.setCombinerClass(WordCountReducer.class);
            conf.setReducerClass(WordCountReducer.class);

            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);

            FileInputFormat.setInputPaths(conf, new Path(input));
            FileOutputFormat.setOutputPath(conf, new Path(output));

            JobClient.runJob(conf);
            System.exit(0);
        }
    }
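To make the reduce step above concrete without a Hadoop cluster, here is a minimal plain-Java sketch of the same summing logic. It is only an illustration: the class name ReduceSketch, the methods sumCounts and reduceAll, and the sample data are hypothetical and not part of the original project, and java.util collections stand in for Hadoop's Text, IntWritable and OutputCollector.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the reducer; this is not Hadoop code.
public class ReduceSketch {

    // Mirrors the loop in reduce(): add up every count emitted for one key.
    static int sumCounts(Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    // Applies sumCounts to each key's grouped values, the way the framework
    // calls reduce() once per key.
    static Map<String, Integer> reduceAll(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), sumCounts(e.getValue().iterator()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample input: each key with the 1s a mapper emitted for it.
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        grouped.put("hello", Arrays.asList(1, 1, 1));
        grouped.put("hadoop", Arrays.asList(1));
        System.out.println(reduceAll(grouped)); // prints {hello=3, hadoop=1}
    }
}
```

Because addition is associative and commutative, partial sums can be computed on the map side first, which is why the driver can register the same WordCountReducer class as both combiner and reducer.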

Run the program as a Java application.

Console error:

    2013-9-30 19:25:02 org.apache.hadoop.util.NativeCodeLoader WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    2013-9-30 19:25:02 org.apache.hadoop.security.UserGroupInformation doAs SEVERE: PriviledgedActionException as:Administrator cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Administrator\mapred\staging\Administrator1702422322\.staging to 0700
    Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Administrator\mapred\staging\Administrator1702422322\.staging to 0700
        at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
        at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
        at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
        at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
        at org.conan.myhadoop.mr.WordCount.main(WordCount.java:78)

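This error is specific to developing on Windows. It is a file-permission problem: Hadoop fails to set the permissions of the local staging directory. The same program runs normally on Linux.

The fix is to modify the file /hadoop-1.0.3/src/core/org/apache/hadoop/fs/FileUtil.java.

Comment out lines 688-692, then recompile the source and build a new hadoop.jar package.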

    685 private static void checkReturnValue(boolean rv, File p,
    686                                      FsPermission permission
    687                                      ) throws IOException {
    688   /*if (!rv) {
    689     throw new IOException("Failed to set permissions of path: " + p +
    690                           " to " +
    691                           String.format("%04o", permission.toShort()));
    692   }*/
    693 }
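The "0700" at the end of the error message comes from the String.format("%04o", ...) call in this method, which prints the permission bits in octal. Here is a minimal sketch of that formatting using only the JDK. The class name OctalPermission and the helper toOctal are hypothetical, and a plain short stands in for the value returned by FsPermission.toShort().

```java
public class OctalPermission {

    // Formats permission bits the same way the exception message does:
    // octal, zero-padded to at least four digits.
    static String toOctal(short bits) {
        return String.format("%04o", bits);
    }

    public static void main(String[] args) {
        short ownerOnly = 0700; // rwx for the owner, nothing for group and others
        System.out.println(toOctal(ownerOnly)); // prints 0700
    }
}
```

So the failing call was an attempt to restrict the staging directory to its owner (rwx------), which is the operation that fails on the Windows file system in this setup.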

I built my own hadoop-core-1.0.3.jar and placed it under lib.

We also need to replace the Hadoop library in the local Maven repository.

~ cp lib/hadoop-core-1.0.3.jar C:\Users\Administrator\.m2\repository\org\apache\hadoop\hadoop-core\1.0.3\hadoop-core-1.0.3.jar

Run the Java application again. The console output is:

    2013-9-30 19:50:49 org.apache.hadoop.util.NativeCodeLoader WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    2013-9-30 19:50:49 org.apache.hadoop.mapred.JobClient copyAndConfigureFiles WARNING: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2013-9-30 19:50:49 org.apache.hadoop.mapred.JobClient copyAndConfigureFiles WARNING: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
    2013-9-30 19:50:49 org.apache.hadoop.io.compress.snappy.LoadSnappy WARNING: Snappy native library not loaded
    2013-9-30 19:50:49 org.apache.hadoop.mapred.FileInputFormat listStatus INFO: Total input paths to process : 4
    2013-9-30 19:50:50 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Running job: job_local_0001
    2013-9-30 19:50:50 org.apache.hadoop.mapred.Task initialize INFO: Using ResourceCalculatorPlugin : null
    2013-9-30 19:50:50 org.apache.hadoop.mapred.MapTask runOldMapper INFO: numReduceTasks: 1
    2013-9-30 19:50:50 org.apache.hadoop.mapred.MapTask$MapOutputBuffer INFO: io.sort.mb = 100
    2013-9-30 19:50:50 org.apache.hadoop.mapred.MapTask$MapOutputBuffer INFO: data buffer = 79691776/99614720
    2013-9-30 19:50:50 org.apache.hadoop.mapred.MapTask$MapOutputBuffer INFO: record buffer = 262144/327680
    2013-9-30 19:50:50 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush INFO: Starting flush of map output
    2013-9-30 19:50:50 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill INFO: Finished spill 0
    2013-9-30 19:50:50 org.apache.hadoop.mapred.Task done INFO: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
    2013-9-30 19:50:51 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: map 0% reduce 0%
    2013-9-30 19:50:53 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO: hdfs://192.168.1.210:9000/user/hdfs/o_t_account/part-m-00003:0+119
    2013-9-30 19:50:53 org.apache.hadoop.mapred.Task sendDone INFO: Task 'attempt_local_0001_m_000000_0' done.
    2013-9-30 19:50:53 org.apache.hadoop.mapred.Task initialize INFO: Using ResourceCalculatorPlugin : null
    2013-9-30 19:50:53 org.apache.hadoop.mapred.MapTask runOldMapper INFO: numReduceTasks: 1
    2013-9-30 19:50:53 org.apache.hadoop.mapred.MapTask$MapOutputBuffer INFO: io.sort.mb = 100
    2013-9-30 19:50:53 org.apache.hadoop.mapred.MapTask$MapOutputBuffer INFO: data buffer = 79691776/99614720
    2013-9-30 19:50:53 org.apache.hadoop.mapred.MapTask$MapOutputBuffer INFO: record buffer = 262144/327680
    2013-9-30 19:50:53 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush INFO: Starting flush of map output
    2013-9-30 19:50:53 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill INFO: Finished spill 0
    2013-9-30 19:50:53 org.apache.hadoop.mapred.Task done INFO: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
    2013-9-30 19:50:54 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: map 100% reduce 0%
    2013-9-30 19:50:56 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO: hdfs://192.168.1.210:9000/user/hdfs/o_t_account/part-m-00000:0+113
    2013-9-30 19:50:56 org.apache.hadoop.mapred.Task sendDone INFO: Task 'attempt_local_0001_m_000001_0' done.
    2013-9-30 19:50:56 org.apache.hadoop.mapred.Task initialize INFO: Using ResourceCalculatorPlugin : null
    2013-9-30 19:50:56 org.apache.hadoop.mapred.MapTask runOldMapper INFO: numReduceTasks: 1
    2013-9-30 19:50:56 org.apache.hadoop.mapred.MapTask$MapOutputBuffer INFO: io.sort.mb = 100
    2013-9-30 19:50:56 org.apache.hadoop.mapred.MapTask$MapOutputBuffer INFO: data buffer = 79691776/99614720
    2013-9-30 19:50:56 org.apache.hadoop.mapred.MapTask$MapOutputBuffer INFO: record buffer = 262144/327680
    2013-9-30 19:50:56 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush INFO: Starting flush of map output
    2013-9-30 19:50:56 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill INFO: Finished spill 0
    2013-9-30 19:50:56 org.apache.hadoop.mapred.Task done INFO: Task:attempt_local_0001_m_000002_0 is done. And is in the process of commiting
    2013-9-30 19:50:59 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO: hdfs://192.168.1.210:9000/user/hdfs/o_t_account/part-m-00001:0+110
    2013-9-30 19:50:59 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO: hdfs://192.168.1.210:9000/user/hdfs/o_t_account/part-m-00001:0+110
    2013-9-30 19:50:59 org.apache.hadoop.mapred.Task sendDone INFO: Task 'attempt_local_0001_m_000002_0' done.
    2013-9-30 19:50:59 org.apache.hadoop.mapred.Task initialize INFO: Using ResourceCalculatorPlugin : null
    2013-9-30 19:50:59 org.apache.hadoop.mapred.MapTask runOldMapper INFO: numReduceTasks: 1
    2013-9-30 19:50:59 org.apache.hadoop.mapred.MapTask$MapOutputBuffer INFO: io.sort.mb = 100
    2013-9-30 19:50:59 org.apache.hadoop.mapred.MapTask$MapOutputBuffer INFO: data buffer = 79691776/99614720
    2013-9-30 19:50:59 org.apache.hadoop.mapred.MapTask$MapOutputBuffer INFO: record buffer = 262144/327680
    2013-9-30 19:50:59 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush INFO: Starting flush of map output
    2013-9-30 19:50:59 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill INFO: Finished spill 0
    2013-9-30 19:50:59 org.apache.hadoop.mapred.Task done INFO: Task:attempt_local_0001_m_000003_0 is done. And is in the process of commiting
    2013-9-30 19:51:02 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO: hdfs://192.168.1.210:9000/user/hdfs/o_t_account/part-m-00002:0+79
    2013-9-30 19:51:02 org.apache.hadoop.mapred.Task sendDone INFO: Task 'attempt_local_0001_m_000003_0' done.
    2013-9-30 19:51:02 org.apache.hadoop.mapred.Task initialize INFO: Using ResourceCalculatorPlugin : null
    2013-9-30 19:51:02 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO:
    2013-9-30 19:51:02 org.apache.hadoop.mapred.Merger$MergeQueue merge INFO: Merging 4 sorted segments
    2013-9-30 19:51:02 org.apache.hadoop.mapred.Merger$MergeQueue merge INFO: Down to the last merge-pass, with 4 segments left of total size: 442 bytes
    2013-9-30 19:51:02 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO:
    2013-9-30 19:51:02 org.apache.hadoop.mapred.Task done INFO: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
    2013-9-30 19:51:02 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO:
    2013-9-30 19:51:02 org.apache.hadoop.mapred.Task commit INFO: Task attempt_local_0001_r_000000_0 is allowed to commit now
    2013-9-30 19:51:02 org.apache.hadoop.mapred.FileOutputCommitter commitTask INFO: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://192.168.1.210:9000/user/hdfs/o_t_account/result
    2013-9-30 19:51:05 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO: reduce > reduce
    2013-9-30 19:51:05 org.apache.hadoop.mapred.Task sendDone INFO: Task 'attempt_local_0001_r_000000_0' done.
    2013-9-30 19:51:06 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: map 100% reduce 100%
    2013-9-30 19:51:06 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Job complete: job_local_0001
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: Counters: 20
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: File Input Format Counters
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: Bytes Read=421
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: File Output Format Counters
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: Bytes Written=348
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: FileSystemCounters
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: FILE_BYTES_READ=7377
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: HDFS_BYTES_READ=1535
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: FILE_BYTES_WRITTEN=209510
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: HDFS_BYTES_WRITTEN=348
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: Map-Reduce Framework
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: Map output materialized bytes=458
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: Map input records=11
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: Reduce shuffle bytes=0
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: Spilled Records=30
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: Map output bytes=509
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: Total committed heap usage (bytes)=1838546944
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: Map input bytes=421
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: SPLIT_RAW_BYTES=452
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: Combine input records=22
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: Reduce input records=15
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: Reduce input groups=13
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: Combine output records=15
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: Reduce output records=13
    2013-9-30 19:51:06 org.apache.hadoop.mapred.Counters log INFO: Map output records=22

The wordcount program ran successfully. We can inspect the output with the following commands:

    ~ hadoop fs -ls hdfs://192.168.1.210:9000/user/hdfs/o_t_account/result
    Found 2 items
    -rw-r--r-- 3 Administrator supergroup 0 2013-09-30 19:51 /user/hdfs/o_t_account/result/_SUCCESS
    -rw-r--r-- 3 Administrator supergroup 348 2013-09-30 19:51 /user/hdfs/o_t_account/result/part-00000

    ~ hadoop fs -cat hdfs://192.168.1.210:9000/user/hdfs/o_t_account/result/part-00000
    1,abc@163.com,2013-04-22 1
    10,ade121@sohu.com,2013-04-23 1
    11,addde@sohu.com,2013-04-23 1
    17:21:24.0 5
    2,dedac@163.com,2013-04-22 1
    20:21:39.0 6
    3,qq8fed@163.com,2013-04-22 1
    4,qw1@163.com,2013-04-22 1
    5,af3d@163.com,2013-04-22 1
    6,ab34@163.com,2013-04-22 1
    7,q8d1@gmail.com,2013-04-23 1
    8,conan@gmail.com,2013-04-23 1
    9,adeg@sohu.com,2013-04-23 1
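As a quick cross-check, the counts in part-00000 should add up to the number of records emitted by the map phase (Map output records=22 in the log), and the number of rows should match Reduce output records=13. Here is a minimal sketch of that check in plain Java, assuming each output line is a key and a count separated by a tab, which is the default separator of TextOutputFormat. The class name OutputCheck and its helpers are hypothetical, and the rows are copied from the listing above.

```java
import java.util.Arrays;
import java.util.List;

public class OutputCheck {

    // Rows copied from part-00000; "\t" is the assumed key/count separator.
    static final List<String> ROWS = Arrays.asList(
        "1,abc@163.com,2013-04-22\t1",
        "10,ade121@sohu.com,2013-04-23\t1",
        "11,addde@sohu.com,2013-04-23\t1",
        "17:21:24.0\t5",
        "2,dedac@163.com,2013-04-22\t1",
        "20:21:39.0\t6",
        "3,qq8fed@163.com,2013-04-22\t1",
        "4,qw1@163.com,2013-04-22\t1",
        "5,af3d@163.com,2013-04-22\t1",
        "6,ab34@163.com,2013-04-22\t1",
        "7,q8d1@gmail.com,2013-04-23\t1",
        "8,conan@gmail.com,2013-04-23\t1",
        "9,adeg@sohu.com,2013-04-23\t1"
    );

    // Parses the count that follows the last tab of one output row.
    static int countOf(String row) {
        return Integer.parseInt(row.substring(row.lastIndexOf('\t') + 1).trim());
    }

    // Adds up the counts of all rows.
    static int totalCount(List<String> rows) {
        int total = 0;
        for (String row : rows) {
            total += countOf(row);
        }
        return total;
    }

    public static void main(String[] args) {
        // prints rows=13, total=22
        System.out.println("rows=" + ROWS.size() + ", total=" + totalCount(ROWS));
    }
}
```

Both numbers agree with the counters in the job log: 13 distinct keys were written, and the 22 tokens emitted by the mappers are all accounted for.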

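With this, we have a working development setup on Windows 7: Maven manages the Hadoop dependencies, the MapReduce program is written in Eclipse, and it is launched as a Java application. The job reads its input from the remote HDFS and writes its result back to it, and the job log is printed in the Eclipse console. Note that in the run shown above the job ID is job_local_0001, the status updates come from LocalJobRunner, and the log warns "No job jar file set", so the map and reduce tasks ran inside the local JVM rather than being packaged into a jar and submitted to the remote Hadoop cluster.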
