[hadoop] Compiling and running Java files
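Hadoop JAR files:
It is best to add all of Hadoop's JAR files to Java's CLASSPATH environment variable. That way you do not have to add the JAR dependencies by hand every time you compile or run something.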
CLASSPATH=$CLASSPATH:$HADOOP_HOME/hadoop-core-0.20.203.0.jar
export CLASSPATH
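To check what the JVM actually sees on its classpath after the export above, a small pure-JDK program can list the entries of the java.class.path system property and probe whether a given class can be loaded. This is a minimal sketch: ClasspathCheck is a hypothetical helper, not part of Hadoop, and the default probe class org.apache.hadoop.conf.Configuration is only an example taken from the code later in this post.

```java
import java.io.File;

// Hypothetical helper (not part of Hadoop): prints the JVM's classpath entries
// and reports whether a class is visible to the application class loader.
public class ClasspathCheck {

    // Returns true if the class can be found on the current classpath.
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only check visibility, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // java.class.path comes from -classpath if given, else $CLASSPATH, else "."
        String cp = System.getProperty("java.class.path");
        for (String entry : cp.split(File.pathSeparator)) {
            System.out.println("classpath entry: " + entry);
        }
        // Example probe class; pass another fully qualified name as args[0] to override
        String probe = args.length > 0 ? args[0] : "org.apache.hadoop.conf.Configuration";
        System.out.println(probe + " loadable: " + isLoadable(probe));
    }
}
```

Run it with java -cp "$CLASSPATH:." ClasspathCheck. The explicit . matters: once CLASSPATH is set, the current directory is no longer searched by default.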
1. Compiling into a JAR file:
mkdir -p /home/admin/WordCount
javac -classpath /home/admin/hadoop/hadoop-core-0.20.203.0.jar WordCount.java -d /home/admin/WordCount
jar cvf WordCount.jar -C /home/admin/WordCount/ .
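Run it with hadoop jar WordCount.jar WordCount args[0] args[1]. The main class name (here WordCount) may be omitted only if the JAR's manifest declares a Main-Class, which a plain jar cvf does not add.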
This is similar to running the wordcount example that ships with Hadoop:
hadoop jar hadoop-examples-0.20.203.0.jar wordcount inputfile outputfile
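The rule for choosing the class to run can be sketched with the JDK's java.util.jar API: if the JAR's manifest has a Main-Class attribute, use it and pass all arguments through; otherwise treat the first argument as the class name. This is a hedged illustration of that rule, not Hadoop's own RunJar code; MainClassResolver and its methods are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical sketch of how a "hadoop jar"-style launcher can pick the class to run.
public class MainClassResolver {

    // Returns the manifest's Main-Class attribute, or null if there is none.
    static String mainClassFromManifest(File jar) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            Manifest mf = jf.getManifest(); // null when the JAR has no manifest at all
            return mf == null ? null : mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    // Returns {mainClass, programArgs...}.
    static String[] resolve(File jar, String[] args) throws IOException {
        String main = mainClassFromManifest(jar);
        if (main != null) {
            // The manifest names the class, so every argument goes to the program
            String[] out = new String[args.length + 1];
            out[0] = main;
            System.arraycopy(args, 0, out, 1, args.length);
            return out;
        }
        if (args.length == 0) {
            throw new IllegalArgumentException("no Main-Class in manifest; give the class name first");
        }
        // No Main-Class: args[0] is the class name, the rest are program arguments
        return args.clone();
    }

    public static void main(String[] args) throws IOException {
        // args[0] = JAR path, remaining args = what would follow it on the command line
        String[] r = resolve(new File(args[0]), Arrays.copyOfRange(args, 1, args.length));
        System.out.println("main class: " + r[0] + ", program args: "
                + Arrays.toString(Arrays.copyOfRange(r, 1, r.length)));
    }
}
```

Run against the WordCount.jar built above (no Main-Class), the class name has to come first, as in java MainClassResolver WordCount.jar WordCount in out.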
2. Compiling into class files:
FileSystemCat.java:
import java.net.URI;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FSDataInputStream;
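public class FileSystemCat {
    public static void main(String[] args) throws Exception {
        // args[0]: a file URI such as hdfs://host/path, or a plain path on the default file system
        String uri = args[0];
        Configuration conf = new Configuration();
        // Get the FileSystem implementation that matches the URI's scheme
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(new Path(uri));
            // Print the whole file; false = leave the streams open afterwards
            IOUtils.copyBytes(in, System.out, 4096, false);
            // Seek to byte offset 1 (the second byte) and print from there to the end again
            in.seek(1);
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}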
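To see what the copy, seek, copy sequence prints without a Hadoop cluster, here is a local-file analog that uses java.io.RandomAccessFile in place of FSDataInputStream and a hand-written 4096-byte copy loop in place of IOUtils.copyBytes. It is a sketch under that substitution only; LocalSeekCat is a hypothetical name and no Hadoop classes are involved.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;

// Hypothetical local analog of FileSystemCat: print the file, seek, print again.
public class LocalSeekCat {

    // Copies from the current position to end of file with a 4096-byte buffer.
    static void copyToEnd(RandomAccessFile in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        out.flush();
    }

    // Prints the whole file, seeks to absolute offset pos, then prints from pos to EOF.
    static void catTwice(String path, long pos, OutputStream out) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(path, "r")) {
            copyToEnd(in, out);
            in.seek(pos); // offset counted from the start of the file
            copyToEnd(in, out);
        }
    }

    public static void main(String[] args) throws IOException {
        catTwice(args[0], 1, System.out); // same offset as in.seek(1) above
    }
}
```

For a file containing hello, this writes hello followed by ello, which is the same pattern FileSystemCat produces for that file.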
Compile FileSystemCat.java into FileSystemCat.class in the current directory, then run it:
javac -classpath $HADOOP_HOME/hadoop-core-0.20.203.0.jar FileSystemCat.java -d .
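The same compilation can also be driven from Java through the standard javax.tools API, which makes the roles of -classpath and -d explicit. This is a minimal sketch, assuming it runs on a JDK (on a bare JRE there is no system compiler); CompileWithClasspath is a hypothetical helper name.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical helper: programmatic equivalent of
//   javac -classpath <classpath> -d <outDir> <sourceFile>
public class CompileWithClasspath {

    // Returns javac's exit code; 0 means the compilation succeeded.
    static int compile(String classpath, String outDir, String sourceFile) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            // getSystemJavaCompiler() returns null when no compiler is available (e.g. a JRE)
            throw new IllegalStateException("no system Java compiler; run this on a JDK");
        }
        // null streams mean System.in, System.out and System.err
        return javac.run(null, null, null, "-classpath", classpath, "-d", outDir, sourceFile);
    }

    public static void main(String[] args) {
        // e.g. args: $HADOOP_HOME/hadoop-core-0.20.203.0.jar . FileSystemCat.java
        System.exit(compile(args[0], args[1], args[2]));
    }
}
```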
To run the class from the current directory, modify the exec line in the bin/hadoop script:
exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH:." $CLASS "$@"
This adds the current directory (.) to the classpath. Without it, the following exception is thrown:
Exception in thread "main" java.lang.NoClassDefFoundError: FileSystemCat
Caused by: java.lang.ClassNotFoundException: FileSystemCat
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: FileSystemCat. Program will exit.
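The exception arises because the JVM started by bin/hadoop searches only the entries on its -classpath, so a FileSystemCat.class sitting in the current directory is never looked at unless . is one of those entries. The pure-JDK sketch below illustrates the same lookup rule with a URLClassLoader restricted to a single directory; DirClassLoading is a hypothetical name and nothing here is Hadoop code.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical demo: a class is found only if some classpath entry contains it.
public class DirClassLoading {

    // Tries to load className with `dir` as the only classpath entry.
    // Parent null = bootstrap loader, so JDK classes are still found by delegation,
    // but nothing from the launching JVM's own classpath is visible.
    static boolean canLoadFrom(File dir, String className) throws IOException {
        URL[] urls = { dir.toURI().toURL() };
        try (URLClassLoader loader = new URLClassLoader(urls, null)) {
            loader.loadClass(className); // throws ClassNotFoundException if no entry has it
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(args.length > 0 ? args[0] : ".");
        String name = args.length > 1 ? args[1] : "FileSystemCat";
        System.out.println(name + " found under " + dir + ": " + canLoadFrom(dir, name));
    }
}
```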
Run command: hadoop FileSystemCat readme.txt (bin/hadoop treats a command name it does not recognize as the name of a class to run)