http://www.it165.net/admin/html/201505/5427.html
I. On Windows 7
(1) Installation environment and packages
Windows 7 (32-bit)
JDK 7
eclipse-java-juno-SR2-win32.zip
hadoop-2.2.0.tar.gz
hadoop-eclipse-plugin-2.2.0.jar
hadoop-common-2.2.0-bin.rar
(2) Installation
This assumes the JDK and Eclipse are already installed and that Hadoop has already been configured in pseudo-distributed mode.
1. Copy the hadoop-eclipse-plugin-2.2.0.jar plugin into the plugins subdirectory of the Eclipse installation directory, then restart Eclipse.
2. Set the environment variables (typically HADOOP_HOME pointing at the unpacked Hadoop directory, with its bin folder added to PATH).
3. Configure the Hadoop installation directory in Eclipse
Unpack hadoop-2.2.0.tar.gz and point Eclipse at the resulting directory (Window -> Preferences -> Hadoop Map/Reduce).
4. Unpack hadoop-common-2.2.0-bin.rar
Copy the files it contains (winutils.exe and the accompanying Windows native libraries) into the bin folder of the Hadoop installation directory.
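If Eclipse still cannot find the Hadoop binaries (a common symptom is a "Could not locate executable ...\bin\winutils.exe" error), a frequently used workaround is to set the hadoop.home.dir system property before any Hadoop class runs. This is an illustrative sketch, not from the original post; the install path is an assumption you must adapt:

public class WindowsBootstrap {
    public static void main(String[] args) throws Exception {
        // Must run before the first Hadoop class initializes its shell utilities.
        // The path is an example; point it at your unpacked hadoop-2.2.0 directory
        // (the one whose bin folder now contains winutils.exe).
        System.setProperty("hadoop.home.dir", "D:\\hadoop-2.2.0");
        // ... then proceed with normal job setup, e.g. the WordCount driver below.
    }
}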
(3) Running MapReduce on YARN from Windows 7
Create a new project.
Click Window -> Show View -> Map/Reduce Locations.
Click New Hadoop Location…
Fill in the location parameters (the Map/Reduce Master and DFS Master host and ports of your cluster) and click Finish.
From this point on you can browse the contents of HDFS directly from Eclipse.
Write the MapReduce program.
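The original post does not reproduce the program itself. Below is a minimal WordCount sketch consistent with the input and output shown later in this article (tab-separated counts for hadoop, hello, me, you); the class name is an illustrative assumption, and packaged as test.jar it would match the yarn jar commands used further down:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (word, 1) for every whitespace-separated token in the line.
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result); // e.g. "hello<TAB>3"
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. hdfs://liguodong:8020/hello
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. hdfs://liguodong:8020/output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}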
Add a log4j.properties file under the src directory with the following content:
log4j.rootLogger=debug,appender1
log4j.appender.appender1=org.apache.log4j.ConsoleAppender
log4j.appender.appender1.layout=org.apache.log4j.TTCCLayout
Run the program and check the result in the console.
II. On Linux
(1) MapReduce on YARN
Run:
[root@liguodong Documents]# yarn jar test.jar hdfs://liguodong:8020/hello hdfs://liguodong:8020/output
15/05/03 03:16:12 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
………………
15/05/03 03:16:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1430648117067_0001
15/05/03 03:16:13 INFO impl.YarnClientImpl: Submitted application application_1430648117067_0001 to ResourceManager at /0.0.0.0:8032
15/05/03 03:16:13 INFO mapreduce.Job: The url to track the job: http://liguodong:8088/proxy/application_1430648117067_0001/
15/05/03 03:16:13 INFO mapreduce.Job: Running job: job_1430648117067_0001
15/05/03 03:16:21 INFO mapreduce.Job: Job job_1430648117067_0001 running in uber mode : false
15/05/03 03:16:21 INFO mapreduce.Job:  map 0% reduce 0%
15/05/03 03:16:40 INFO mapreduce.Job:  map 100% reduce 0%
15/05/03 03:16:45 INFO mapreduce.Job:  map 100% reduce 100%
15/05/03 03:16:45 INFO mapreduce.Job: Job job_1430648117067_0001 completed successfully
15/05/03 03:16:45 INFO mapreduce.Job: Counters: 43
    File System Counters
        FILE: Number of bytes read=98
        FILE: Number of bytes written=157289
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=124
        HDFS: Number of bytes written=28
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=16924
        Total time spent by all reduces in occupied slots (ms)=3683
    Map-Reduce Framework
        Map input records=3
        Map output records=6
        Map output bytes=80
        Map output materialized bytes=98
        Input split bytes=92
        Combine input records=0
        Combine output records=0
        Reduce input groups=4
        Reduce shuffle bytes=98
        Reduce input records=6
        Reduce output records=4
        Spilled Records=12
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=112
        CPU time spent (ms)=12010
        Physical memory (bytes) snapshot=211070976
        Virtual memory (bytes) snapshot=777789440
        Total committed heap usage (bytes)=130879488
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=32
    File Output Format Counters
        Bytes Written=28
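As a side note, the counters printed at the end of this log can also be read programmatically if the driver keeps hold of the Job object (as in the WordCount sketch above). A hedged sketch, not from the original post, using the standard Hadoop 2.x counters API:

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

public class CounterPeek {
    // Call after job.waitForCompletion(true) in the driver.
    public static void printMapOutputRecords(Job job) throws Exception {
        long records = job.getCounters()
                .findCounter(TaskCounter.MAP_OUTPUT_RECORDS).getValue();
        System.out.println("Map output records = " + records); // 6 in the run above
    }
}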
View the results:
[root@liguodong Documents]# hdfs dfs -ls /
Found 3 items
-rw-r--r--   2 root supergroup         32 2015-05-03 03:15 /hello
drwxr-xr-x   - root supergroup          0 2015-05-03 03:16 /output
drwx------   - root supergroup          0 2015-05-03 03:16 /tmp
[root@liguodong Documents]# hdfs dfs -ls /output
Found 2 items
-rw-r--r--   2 root supergroup          0 2015-05-03 03:16 /output/_SUCCESS
-rw-r--r--   2 root supergroup         28 2015-05-03 03:16 /output/part-r-00000
[root@liguodong Documents]# hdfs dfs -text /output/pa*
hadoop  1
hello   3
me      1
you     1
Problems encountered
File /output/……… could only be replicated to 0 nodes instead of minReplication (=1).
There are 1 datanode(s) running and no node(s) are excluded in this operation.
I tried many of the fixes suggested online without success. Reading the message itself, it says the block could only be replicated to 0 nodes rather than the required minimum of one replica.
At first I set dfs.replication.min to 0, but unfortunately later runs showed that the value must be greater than 0, so I changed it back to 1.
Then I added several extra paths to dfs.datanode.data.dir, in effect keeping several copies on the same machine, and after that the job succeeded.
The setting is as follows; add it to hdfs-site.xml:
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/dn,file://${hadoop.tmp.dir}/dfs/dn1,file://${hadoop.tmp.dir}/dfs/dn2</value>
</property>
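To sanity-check the effect, the replication of the output file can be queried through the FileSystem API. This is an illustrative sketch, not part of the original post; it assumes fs.defaultFS points at hdfs://liguodong:8020 as elsewhere in this article:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckReplication {
    public static void main(String[] args) throws Exception {
        // Reads core-site.xml/hdfs-site.xml from the classpath.
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus st = fs.getFileStatus(new Path("/output/part-r-00000"));
        System.out.println("replication = " + st.getReplication());
    }
}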
(2) MapReduce in local mode
Add the following configuration to mapred-site.xml:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>local</value>
    </property>
</configuration>
With this setting the ResourceManager and NodeManager do not need to be running; the whole job executes in a single local JVM.
Run:
[root@liguodong Documents]# hadoop jar test.jar hdfs://liguodong:8020/hello hdfs://liguodong:8020/output
III. MapReduce run modes
Both are selected in mapred-site.xml.
1) Local mode (the default)
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>local</value>
    </property>
</configuration>
2) Running on YARN
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
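The same switch can also be made per job in the driver instead of cluster-wide in mapred-site.xml. A minimal sketch using the standard Configuration API (the class and method names here are illustrative, the property name is the documented one):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ModeSelect {
    // Pick the framework for a single job: mode is "local" or "yarn".
    public static Job newJob(String mode) throws Exception {
        Configuration conf = new Configuration();
        conf.set("mapreduce.framework.name", mode);
        return Job.getInstance(conf, "word count");
    }
}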
IV. Uber Mode
Uber mode is an optimization in Hadoop 2.x for small MapReduce jobs: instead of requesting a separate container for each task, the job's tasks run inside the ApplicationMaster's JVM (a form of JVM reuse).
A small job here means one whose input data is smaller than one HDFS block (128 MB by default).
Uber mode is disabled by default.
Enable it in mapred-site.xml:
<property>
    <name>mapreduce.job.ubertask.enable</name>
    <value>true</value>
</property>
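Enabling the switch alone only makes a job a candidate: in Hadoop 2.x the ubertask decision also honours size thresholds (by default at most 9 maps, at most 1 reduce, and input within one block, via mapreduce.job.ubertask.maxmaps, mapreduce.job.ubertask.maxreduces, and mapreduce.job.ubertask.maxbytes). A hedged sketch of setting these per job, with the documented property names:

import org.apache.hadoop.conf.Configuration;

public class UberConfig {
    // Build a per-job configuration with uber mode enabled (Hadoop 2.x names).
    public static Configuration uberEnabled() {
        Configuration conf = new Configuration();
        conf.setBoolean("mapreduce.job.ubertask.enable", true);
        conf.setInt("mapreduce.job.ubertask.maxmaps", 9);    // default 9
        conf.setInt("mapreduce.job.ubertask.maxreduces", 1); // default 1
        // mapreduce.job.ubertask.maxbytes defaults to the HDFS block size.
        return conf;
    }
}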