Finally Got My First Hadoop Program Running

 

I had only ever played around on Windows, but for Hadoop's sake I had to migrate to Ubuntu~

A record of my journey:

 

1. Installing Ubuntu from Windows

The method: install via wubi. It is fully automated and totally foolproof, perfect for a Linux newbie like me!

wubi can be downloaded straight from the official site. I didn't know that at the time, so I used the copy extracted from the iso instead, haha.

Download link: http://www.ubuntu.com/download/desktop/windows-installer

Pick whichever version suits you.

2. All kinds of configuration: the JDK, Hadoop, and of course installing a pinyin input method....

For installing the JDK and configuring Hadoop, I followed this guide to the letter: http://os.51cto.com/art/201211/364167.htm

Note: only after walking through those steps did I realize this is actually a pseudo-distributed setup~
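For context, pseudo-distributed mode mostly comes down to three small files under conf/. Here is a minimal sketch for Hadoop 1.x, assuming the standard ports most tutorials use (the 9000 matches the hdfs://localhost:9000 that shows up in the error log below):

conf/core-site.xml:

<configuration>
  <property>
    <!-- address of the HDFS NameNode -->
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

conf/hdfs-site.xml:

<configuration>
  <property>
    <!-- a single node only needs one replica -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

conf/mapred-site.xml:

<configuration>
  <property>
    <!-- address of the JobTracker; 9001 is the usual tutorial choice -->
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>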

 

3. Let's run an example program~

Hadoop's bundled example programs sit right inside the extracted hadoop folder.
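Tip: running the examples jar with no arguments makes it print the list of bundled programs (wordcount among them), handy for seeing what else there is to try:

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar hadoop-examples-1.2.0.jar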

But at first I kept running into a very strange problem. The error was as follows:

 

13/07/06 20:10:59 ERROR security.UserGroupInformation: PriviledgedActionException as:hadoop cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/usr/local/hadoop/input
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/usr/local/hadoop/input
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1054)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1071)
    at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
    at org.apache.hadoop.examples.WordCount.main(WordCount.java:82)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)

 

Only after a lot of googling did I learn that the input folder had not been added to HDFS yet. (The job reads from HDFS, not the local disk: the /usr/local/hadoop/input in the error was being looked up inside HDFS, where it doesn't exist. A relative path like input resolves under your HDFS home directory, /user/hadoop/ here.)

Type the command:

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -put input input

After that, the input folder finally shows up via the dfs -ls command:

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -ls
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2013-07-06 20:12 /user/hadoop/input
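If you don't have a local input folder yet, the whole round trip from nothing to uploaded input looks roughly like this (the conf/*.xml files are just an example; any text files will do for wordcount):

hadoop@ubuntu:/usr/local/hadoop$ mkdir -p input
hadoop@ubuntu:/usr/local/hadoop$ cp conf/*.xml input/
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -put input input
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -ls input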

 

 

 

Running the example program again at this point, the earlier exception no longer appears:

Input path does not exist: hdfs://localhost:9000/usr/local/hadoop/input
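One more gotcha worth knowing: the output directory passed to the job must not already exist, or FileOutputFormat will refuse to start the job. If you need to re-run, pick a fresh name, as with wc_output2 below, or delete the old output first:

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -rmr wc_output2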

 

The log of the successful run, at last:

 

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar hadoop-examples-1.2.0.jar wordcount input wc_output2

13/07/06 20:12:39 INFO input.FileInputFormat: Total input paths to process : 1
13/07/06 20:12:39 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/07/06 20:12:39 WARN snappy.LoadSnappy: Snappy native library not loaded
13/07/06 20:12:40 INFO mapred.JobClient: Running job: job_201307061937_0006
13/07/06 20:12:41 INFO mapred.JobClient: map 0% reduce 0%
13/07/06 20:12:46 INFO mapred.JobClient: map 100% reduce 0%
13/07/06 20:12:53 INFO mapred.JobClient: map 100% reduce 33%
13/07/06 20:12:55 INFO mapred.JobClient: map 100% reduce 100%
13/07/06 20:12:55 INFO mapred.JobClient: Job complete: job_201307061937_0006
13/07/06 20:12:56 INFO mapred.JobClient: Counters: 29
13/07/06 20:12:56 INFO mapred.JobClient: Job Counters
13/07/06 20:12:56 INFO mapred.JobClient: Launched reduce tasks=1
13/07/06 20:12:56 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=4785
13/07/06 20:12:56 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/06 20:12:56 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/07/06 20:12:56 INFO mapred.JobClient: Launched map tasks=1
13/07/06 20:12:56 INFO mapred.JobClient: Data-local map tasks=1
13/07/06 20:12:56 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8917
13/07/06 20:12:56 INFO mapred.JobClient: File Output Format Counters
13/07/06 20:12:56 INFO mapred.JobClient: Bytes Written=24591
13/07/06 20:12:56 INFO mapred.JobClient: FileSystemCounters
13/07/06 20:12:56 INFO mapred.JobClient: FILE_BYTES_READ=34471
13/07/06 20:12:56 INFO mapred.JobClient: HDFS_BYTES_READ=46619
13/07/06 20:12:56 INFO mapred.JobClient: FILE_BYTES_WRITTEN=180119
13/07/06 20:12:56 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=24591
13/07/06 20:12:56 INFO mapred.JobClient: File Input Format Counters
13/07/06 20:12:56 INFO mapred.JobClient: Bytes Read=46510
13/07/06 20:12:56 INFO mapred.JobClient: Map-Reduce Framework
13/07/06 20:12:56 INFO mapred.JobClient: Map output materialized bytes=34471
13/07/06 20:12:56 INFO mapred.JobClient: Map input records=561
13/07/06 20:12:56 INFO mapred.JobClient: Reduce shuffle bytes=34471
13/07/06 20:12:56 INFO mapred.JobClient: Spilled Records=4992
13/07/06 20:12:56 INFO mapred.JobClient: Map output bytes=77170
13/07/06 20:12:56 INFO mapred.JobClient: Total committed heap usage (bytes)=219152384
13/07/06 20:12:56 INFO mapred.JobClient: CPU time spent (ms)=2450
13/07/06 20:12:56 INFO mapred.JobClient: Combine input records=7804
13/07/06 20:12:56 INFO mapred.JobClient: SPLIT_RAW_BYTES=109
13/07/06 20:12:56 INFO mapred.JobClient: Reduce input records=2496
13/07/06 20:12:56 INFO mapred.JobClient: Reduce input groups=2496
13/07/06 20:12:56 INFO mapred.JobClient: Combine output records=2496
13/07/06 20:12:56 INFO mapred.JobClient: Physical memory (bytes) snapshot=280190976
13/07/06 20:12:56 INFO mapred.JobClient: Reduce output records=2496
13/07/06 20:12:56 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2009939968
13/07/06 20:12:56 INFO mapred.JobClient: Map output records=7804

Finally, view the results once the job is done:

 

bin/hadoop dfs -cat wc_output2/* | more
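You can also pull the whole result down to the local filesystem; the reduce output file is typically named part-r-00000:

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -get wc_output2 ./wc_output2
hadoop@ubuntu:/usr/local/hadoop$ more wc_output2/part-r-00000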

 

 

 

Notes:

A few web GUI admin URLs [the ones I know of so far]:

http://localhost:50070/dfshealth.jsp (NameNode / HDFS health)

http://localhost:50030/jobtracker.jsp (JobTracker / MapReduce jobs)
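Besides the web UIs, a quick sanity check that the daemons are up is the JDK's jps tool; on a working pseudo-distributed node you should see something like NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker (plus Jps itself):

hadoop@ubuntu:/usr/local/hadoop$ jps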

 

 
