Please credit the source when reposting: http://blog.csdn.net/lastsweetop/article/details/8964520
1. Introduction
At Google I/O 2013 I was blown away by Android Studio; I had not expected IntelliJ IDEA to become this powerful. I have long been a loyal Eclipse user, but IDEA has won me over, so I promptly downloaded it and gave it a try. It really is impressive, except that there is no Hadoop plugin, which was a small disappointment since Hadoop is exactly what I have been studying lately. So I decided to build my own remote-debugging setup; all the code is hosted on GitHub.
The project is managed with Maven. If you are not very familiar with Maven, have a look at my Maven column.
The project layout is as follows. (Screenshot of the project tree omitted in this copy.)
2. Step 1: configure SSH
This setup is documented all over the web, so I will describe it only briefly.
Run:
ssh-keygen -t rsa
This generates the public key file ~/.ssh/id_rsa.pub.
Copy this file to the namenode with scp:
scp ~/.ssh/id_rsa.pub hadoop@namenode:~/.ssh/
Log in to the namenode:
ssh hadoop@namenode
Append the development machine's id_rsa.pub to authorized_keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Passwordless SSH login is now in place.
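One thing that often trips this step up: sshd refuses keys when the permissions on ~/.ssh are too loose. Below is a minimal sketch of the append step together with the usual permission fixes. It runs against a temporary directory rather than the real ~/.ssh, so it is safe to try; the key content and paths are stand-ins, not real credentials.

```shell
# Simulate appending a public key to authorized_keys with the
# permissions sshd expects (~/.ssh: 700, authorized_keys: 600).
# A temp dir stands in for the real ~/.ssh so this is safe to run.
SSHDIR=$(mktemp -d)/.ssh
mkdir -p "$SSHDIR" && chmod 700 "$SSHDIR"
echo "ssh-rsa AAAAFAKEKEY dev@workstation" > "$SSHDIR/id_rsa.pub"  # stand-in key
cat "$SSHDIR/id_rsa.pub" >> "$SSHDIR/authorized_keys"              # note >>: append
chmod 600 "$SSHDIR/authorized_keys"
```

On the real namenode the same three commands apply, just with ~/.ssh in place of the temp directory.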
3. Step 2: write the scripts
The deploy.sh script:
#!/bin/sh
echo "deploy jar"
scp ../target/styhadoop-ch2-1.0.0-SNAPSHOT.jar hadoop@namenode:~/test/
echo "deploy run.sh"
scp run.sh hadoop@namenode:~/test/
echo "change authority"
ssh hadoop@namenode "chmod 755 ~/test/run.sh"
echo "start run.sh"
ssh hadoop@namenode "~/test/run.sh"
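One weakness of the script above is that it keeps going even if a copy fails. A variant that fails fast might look like the sketch below; the HOST and JAR variables and the DRY_RUN switch are my own additions for illustration, not part of the original setup.

```shell
# A hardened sketch of deploy.sh: set -e aborts on the first failed
# scp/ssh, and DRY_RUN=1 prints the commands instead of running them.
deploy() {
    set -e
    RUN=${DRY_RUN:+echo}     # expands to "echo" when DRY_RUN is set
    $RUN scp "$JAR" "$HOST:~/test/"
    $RUN scp run.sh "$HOST:~/test/"
    $RUN ssh "$HOST" "chmod 755 ~/test/run.sh && ~/test/run.sh"
}
```

Invoke it as, e.g., HOST=hadoop@namenode JAR=../target/styhadoop-ch2-1.0.0-SNAPSHOT.jar deploy, or with DRY_RUN=1 first to preview what would run.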
The run.sh script:
#!/bin/sh
echo "add jar to classpath"
export HADOOP_CLASSPATH=~/test/styhadoop-ch2-1.0.0-SNAPSHOT.jar
echo "run hadoop task"
~/hadoop/bin/hadoop com.sweetop.styhadoop.MaxTemperature input/ output/
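HADOOP_CLASSPATH here holds a single jar. If the job later grows extra dependency jars, they are joined with colons; a small helper for building such a value from a lib directory is sketched below (the helper and the lib-directory layout are my own illustration, not part of the original scripts).

```shell
# Build a colon-separated classpath from every jar in a directory,
# suitable for export as HADOOP_CLASSPATH.
build_classpath() {
    cp=""
    for j in "$1"/*.jar; do
        [ -e "$j" ] || continue        # skip the literal glob if the dir has no jars
        cp="${cp:+$cp:}$j"             # join with ':' after the first entry
    done
    printf '%s\n' "$cp"
}
```

Usage would be something like: export HADOOP_CLASSPATH=$(build_classpath ~/test/lib).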
4. Step 3: configure pom.xml
Use the maven-antrun-plugin to execute the script, binding it to the verify phase of the lifecycle:
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-antrun-plugin</artifactId>
<version>1.7</version>
<executions>
<execution>
<id>hadoop remote run</id>
<phase>verify</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<target name="test">
<exec dir="${basedir}/shell" executable="bash">
<arg value="deploy.sh"></arg>
</exec>
</target>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
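With this binding in place, mvn verify packages the jar and then hands control to deploy.sh. When you want to build without touching the cluster, the plugin's run goal also has a skip parameter that can be wired to a property; a sketch follows (the property name deploy.skip is my own choice, and you should confirm that your plugin version supports skip):

```
<!-- in <properties>: give the switch a default -->
<deploy.skip>false</deploy.skip>

<!-- in the antrun <configuration>, next to <target>: -->
<skip>${deploy.skip}</skip>

<!-- then "mvn verify -Ddeploy.skip=true" builds without deploying -->
```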
5. Preparing the input files on HDFS
[hadoop@namenode test]$hadoop fs -mkdir /user
[hadoop@namenode test]$hadoop fs -mkdir /user/hadoop/
[hadoop@namenode test]$hadoop fs -put input /user/hadoop/
[hadoop@namenode test]$hadoop fs -lsr /user/hadoop
6. Execution result
Running mvn verify on the development machine produces the following output:
test:
[exec] deploy jar
[exec] deploy run.sh
[exec] change authority
[exec] start run.sh
[exec] add jar to classpath
[exec] run hadoop task
[exec] 13/05/23 11:36:28 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
[exec] 13/05/23 11:36:28 INFO input.FileInputFormat: Total input paths to process : 2
[exec] 13/05/23 11:36:28 INFO util.NativeCodeLoader: Loaded the native-hadoop library
[exec] 13/05/23 11:36:28 WARN snappy.LoadSnappy: Snappy native library not loaded
[exec] 13/05/23 11:36:29 INFO mapred.JobClient: Running job: job_201305032210_0003
[exec] 13/05/23 11:36:30 INFO mapred.JobClient: map 0% reduce 0%
[exec] 13/05/23 11:36:46 INFO mapred.JobClient: map 100% reduce 0%
[exec] 13/05/23 11:37:04 INFO mapred.JobClient: map 100% reduce 100%
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Job complete: job_201305032210_0003
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Counters: 29
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Job Counters
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Launched reduce tasks=1
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=19771
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Launched map tasks=2
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Data-local map tasks=2
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=13494
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: File Output Format Counters
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Bytes Written=8
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: FileSystemCounters
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: FILE_BYTES_READ=131296
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: HDFS_BYTES_READ=1777394
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: FILE_BYTES_WRITTEN=327106
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=8
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: File Input Format Counters
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Bytes Read=1777168
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Map-Reduce Framework
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Map output materialized bytes=131302
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Map input records=13130
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Reduce shuffle bytes=65656
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Spilled Records=26258
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Map output bytes=105032
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: CPU time spent (ms)=6030
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Total committed heap usage (bytes)=379518976
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Combine input records=0
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: SPLIT_RAW_BYTES=226
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Reduce input records=13129
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Reduce input groups=1
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Combine output records=0
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Physical memory (bytes) snapshot=469196800
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Reduce output records=1
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1723944960
[exec] 13/05/23 11:37:09 INFO mapred.JobClient: Map output records=13129