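Recently at work I have often run into big-data processing problems such as running out of memory, running out of disk space, and slow processing. While looking for solutions, I found that Hadoop might be a very good choice.
Cloud computing is becoming increasingly popular in the industry for big-data processing, so out of curiosity I took a peek at Hadoop's source code and wrote this article.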
1. Environment setup
JDK:
hadoop@foreveryy:~/tmp$ java -version
java version "1.6.0_25"
Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
Java HotSpot(TM) Client VM (build 20.0-b11, mixed mode, sharing)
System platform:
hadoop@foreveryy:~/tmp$ uname -a
Linux foreveryy 3.2.0-24-generic #39-Ubuntu SMP Mon May 21 16:51:22 UTC 2012 i686 i686 i386 GNU/Linux
Hadoop version:
hadoop@foreveryy:~/tmp$ hadoop version
Hadoop 1.1.1-SNAPSHOT
Subversion http://192.168.0.103/svn/hadoop-release-1.1.1/trunk -r 222
Compiled by aheroboy on Sat Nov 17 07:10:49 PST 2012
From source with checksum e4a97c0dc6dfa7eebc2f68e043c85801
2. Code
Hello World code:
package org.aheroboy.hadoop;

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello hadoop!");
    }
}
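Export the project as a jar, and remember to specify the Main class when exporting. The reason is visible in the following excerpt from the Hadoop source: Hadoop opens the jar named on the command line and reads the Main-Class attribute from its manifest to find the class to run.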
    // The first argument is the path of the jar to run.
    String fileName = args[firstArg++];
    File file = new File(fileName);
    String mainClassName = null;

    JarFile jarFile;
    try {
      jarFile = new JarFile(fileName);
    } catch(IOException io) {
      throw new IOException("Error opening job jar: " + fileName)
        .initCause(io);
    }

    // Look up the Main-Class attribute in the jar's manifest, if there is one.
    Manifest manifest = jarFile.getManifest();
    if (manifest != null) {
      mainClassName = manifest.getMainAttributes().getValue("Main-Class");
    }
    jarFile.close();
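To try the manifest lookup outside Hadoop, the following is a minimal sketch that uses only the standard java.util.jar API. The class name ManifestMainClassDemo and the helpers writeJar and readMainClass are made up for this illustration and are not part of Hadoop. It writes a temporary jar whose manifest declares a Main-Class, then reads the attribute back the same way the excerpt above does.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical demo class, not part of Hadoop.
public class ManifestMainClassDemo {

    // Writes a temporary jar that contains only a manifest declaring the given Main-Class.
    public static File writeJar(String mainClass) throws IOException {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // A well-formed manifest starts with Manifest-Version, so set it explicitly.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);

        File jar = File.createTempFile("manifest-demo", ".jar");
        jar.deleteOnExit();
        JarOutputStream out = new JarOutputStream(new FileOutputStream(jar), manifest);
        out.close();
        return jar;
    }

    // Returns the Main-Class attribute, or null if the jar has no manifest or no such attribute.
    public static String readMainClass(File jar) throws IOException {
        JarFile jarFile = new JarFile(jar);
        try {
            Manifest manifest = jarFile.getManifest();
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue("Main-Class");
        } finally {
            jarFile.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File jar = writeJar("org.aheroboy.hadoop.HelloWorld");
        System.out.println(readMainClass(jar));
    }
}
```

Running main prints the class name stored in the manifest. The same check can be pointed at a real exported jar by passing its File to readMainClass; a null result would mean the Main class was not recorded during export.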
3. Running HelloWorld in the Hadoop environment
Upload the jar to the Linux machine (the file was saved under the name HelloWrold.jar):
hadoop@foreveryy:~/tmp$ ls -ltr
total 4
-rw-rw-r-- 1 hadoop hadoop 1348 Nov 24 04:40 HelloWrold.jar
Run the jar in the Hadoop environment:
hadoop@foreveryy:~/tmp$ hadoop jar HelloWrold.jar HelloWorld
Warning: $HADOOP_HOME is deprecated.
Hello hadoop!
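The command above passes the short name HelloWorld, yet the class actually lives in the package org.aheroboy.hadoop. A plausible explanation, sketched below, is that the Main-Class from the manifest takes precedence and the extra word is simply handed to main as an ordinary argument. This is a simplified model under an assumption: the fallback to the first command-line argument when the manifest has no Main-Class is not shown in the excerpt in section 2 and should be checked against the Hadoop source. The class MainClassResolution and its methods are hypothetical names for this illustration only.

```java
import java.util.Arrays;

// Hypothetical model of main-class selection, not Hadoop code.
public class MainClassResolution {

    // Picks the class to run: the manifest value if present, otherwise
    // (assumed fallback) the first argument after the jar path.
    public static String resolveMainClass(String manifestMainClass, String[] argsAfterJar) {
        if (manifestMainClass != null) {
            return manifestMainClass;
        }
        if (argsAfterJar.length == 0) {
            throw new IllegalArgumentException("no main class given");
        }
        return argsAfterJar[0];
    }

    // Returns the arguments that would be passed on to the selected class's main method.
    public static String[] remainingArgs(String manifestMainClass, String[] argsAfterJar) {
        // One argument is consumed as the class name only when the manifest gave none.
        int consumed = (manifestMainClass != null) ? 0 : 1;
        int from = Math.min(consumed, argsAfterJar.length);
        return Arrays.copyOfRange(argsAfterJar, from, argsAfterJar.length);
    }

    public static void main(String[] args) {
        String[] afterJar = {"HelloWorld"};
        String manifestValue = "org.aheroboy.hadoop.HelloWorld";
        System.out.println(resolveMainClass(manifestValue, afterJar));
        System.out.println(Arrays.toString(remainingArgs(manifestValue, afterJar)));
    }
}
```

Under this model, the jar from this article runs org.aheroboy.hadoop.HelloWorld with the single argument HelloWorld, which the program ignores. If the jar had been exported without a Main class, the same model suggests that the fully qualified name would be needed on the command line.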