Hadoop on YARN Getting Started, Part 1: Building a Pseudo-Distributed Environment and Running WordCount

Since this is for learning, we go straight to a pseudo-distributed setup.
To build the environment, download Hadoop 2.6.0 along with its source code.
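For reference, one way to fetch the release and source tarballs (the Apache archive URLs below are one option; any mirror works equally well):

wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0-src.tar.gz
tar -xzf hadoop-2.6.0.tar.gz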
1 Configuration files:
1.1 hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>  <!-- only one machine, so a single replica -->
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
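To double-check the value the daemons will actually see, hdfs getconf can read a key back from the effective configuration (an optional sanity check):

hdfs getconf -confKey dfs.replication    # should print 1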

1.2 mapred-site.xml: declare YARN as the MapReduce framework; the last two (job history) properties are optional

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>localhost:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>localhost:19888</value>
    </property>
</configuration>
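Note that start-all.sh does not launch the JobHistory server; if you keep the two jobhistory properties above, start it separately (a quick sketch, assuming the sbin scripts from the Hadoop distribution are on your PATH):

mr-jobhistory-daemon.sh start historyserver
jps    # should now also list JobHistoryServer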

1.3 core-site.xml: specify the HDFS address

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/Users/shenyb/hadoop_datanode/data</value>
        <description>A base for other temporary directories.</description>
    </property>
</configuration>
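The hadoop.tmp.dir directory must exist and be writable before the NameNode is formatted; creating it up front avoids a common startup failure (the path is the one configured above):

mkdir -p /Users/shenyb/hadoop_datanode/data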

1.4 yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>http://localhost:19888/jobhistory/logs</value>
    </property>
    <property>
        <name>yarn.nodemanager.delete.debug-delay-sec</name>
        <value>1200</value>
    </property>
</configuration>
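yarn.nodemanager.delete.debug-delay-sec keeps finished containers' local working directories and logs around for 1200 seconds, which is handy when debugging failed tasks. Once the cluster is up, a quick way to confirm the NodeManager registered with the ResourceManager:

yarn node -list    # should report one RUNNING node on localhost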

**Note: it is best to set JAVA_HOME in hadoop-env.sh to an absolute path, since the default /usr/java location differs between macOS and Linux.
Also, if the data path is left at its default (under /tmp), the system often cleans it out; use a custom directory instead.**
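A minimal sketch of the corresponding hadoop-env.sh line on macOS (the hard-coded path is only an example; adjust it to your installed JDK):

# macOS can resolve the JDK home to an absolute path:
export JAVA_HOME=$(/usr/libexec/java_home)
# or hard-code it, e.g. (example path, adjust to your JDK):
# export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home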

2 Running WordCount
Format the NameNode, start Hadoop on YARN, upload the input files to HDFS, then run the job:
➜  hadoop-2.6.0 git:(master) ✗ hadoop namenode -format

➜  hadoop-2.6.0 git:(master) ✗ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
localhost: starting namenode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-shenyb-namenode-shenybdeMacBook-Pro.local.out
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
localhost: starting datanode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-shenyb-datanode-shenybdeMacBook-Pro.local.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /usr/local/share/hadoop-2.6.0/logs/hadoop-shenyb-secondarynamenode-shenybdeMacBook-Pro.local.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/share/hadoop-2.6.0/logs/yarn-shenyb-resourcemanager-shenybdeMacBook-Pro.local.out
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
localhost: starting nodemanager, logging to /usr/local/share/hadoop-2.6.0/logs/yarn-shenyb-nodemanager-shenybdeMacBook-Pro.local.out
➜  hadoop-2.6.0 git:(master) ✗ jps
4506 NameNode
4848 NodeManager
4890 Jps
4770 ResourceManager
4667 SecondaryNameNode
4578 DataNode
➜  hadoop-2.6.0 git:(master) ✗ hadoop hdfs -mkdir /input
Error: Could not find or load main class hdfs
➜  hadoop-2.6.0 git:(master) ✗ hadoop fs -mkdir /input
➜  hadoop-2.6.0 git:(master) ✗ hadoop fs -put input/*.txt /input
➜  hadoop-2.6.0 git:(master) ✗ hadoop fs -ls /input
Found 2 items
-rw-r--r--   1 shenyb supergroup         36 2016-11-24 22:38 /input/a.txt
-rw-r--r--   1 shenyb supergroup         30 2016-11-24 22:38 /input/b.txt
➜  hadoop-2.6.0 git:(master) ✗ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output3
(If a '/bin/java' not-found error appears here, change the Java path in hadoop-env.sh to an absolute path and restart, as noted above.)
16/11/24 22:39:41 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/11/24 22:39:42 INFO input.FileInputFormat: Total input paths to process : 2
16/11/24 22:39:42 INFO mapreduce.JobSubmitter: number of splits:2
16/11/24 22:39:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1479998223646_0001
16/11/24 22:39:42 INFO impl.YarnClientImpl: Submitted application application_1479998223646_0001
16/11/24 22:39:42 INFO mapreduce.Job: The url to track the job: http://shenybdeMacBook-Pro.local:8088/proxy/application_1479998223646_0001/
16/11/24 22:39:42 INFO mapreduce.Job: Running job: job_1479998223646_0001
16/11/24 22:39:49 INFO mapreduce.Job: Job job_1479998223646_0001 running in uber mode : false
16/11/24 22:39:49 INFO mapreduce.Job:  map 0% reduce 0%
16/11/24 22:39:55 INFO mapreduce.Job:  map 50% reduce 0%
16/11/24 22:39:56 INFO mapreduce.Job:  map 100% reduce 0%
16/11/24 22:40:01 INFO mapreduce.Job:  map 100% reduce 100%
16/11/24 22:40:01 INFO mapreduce.Job: Job job_1479998223646_0001 completed successfully
16/11/24 22:40:01 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=131
        FILE: Number of bytes written=317610
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=262
        HDFS: Number of bytes written=45
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=7268
        Total time spent by all reduces in occupied slots (ms)=2729
        Total time spent by all map tasks (ms)=7268
        Total time spent by all reduce tasks (ms)=2729
        Total vcore-seconds taken by all map tasks=7268
        Total vcore-seconds taken by all reduce tasks=2729
        Total megabyte-seconds taken by all map tasks=7442432
        Total megabyte-seconds taken by all reduce tasks=2794496
    Map-Reduce Framework
        Map input records=10
        Map output records=17
        Map output bytes=133
        Map output materialized bytes=137
        Input split bytes=196
        Combine input records=17
        Combine output records=13
        Reduce input groups=8
        Reduce shuffle bytes=137
        Reduce input records=13
        Reduce output records=8
        Spilled Records=26
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=60
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=533200896
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=66
    File Output Format Counters
        Bytes Written=45
➜  hadoop-2.6.0 git:(master) ✗ ls
LICENSE.txt  README.txt   etc          input        job.xml      lib          logs         sbin
NOTICE.txt   bin          include      job.jar      jobSubmitDir libexec      output       share
➜  hadoop-2.6.0 git:(master) ✗ hadoop fs -ls /output3
Found 2 items
-rw-r--r--   1 shenyb supergroup          0 2016-11-24 22:39 /output3/_SUCCESS
-rw-r--r--   1 shenyb supergroup         45 2016-11-24 22:39 /output3/part-r-00000
➜  hadoop-2.6.0 git:(master) ✗ hadoop fs -get /output3 output3
➜  hadoop-2.6.0 git:(master) ✗ ls
LICENSE.txt  README.txt   etc          input        job.xml      lib          logs         output3      share
NOTICE.txt   bin          include      job.jar      jobSubmitDir libexec      output       sbin
➜  hadoop-2.6.0 git:(master) ✗ cat output3/part-r-00000
and 2
do  1
i   2
love    4
me  2
we  1
what    1
you 4
➜  hadoop-2.6.0 git:(master) ✗
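Note that MapReduce refuses to start a job whose output directory already exists, which is why a fresh path (/output3) is used above. To re-run the job under the same output path, remove the old directory first:

hadoop fs -rm -r /output3
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output3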

If your machine is 32-bit, you can use the official pre-built Hadoop directly; on 64-bit you need to build it yourself, because the bundled native libraries are 32-bit.
Follow BUILDING.txt in the source tree and the build should go through. The relevant excerpts:

Building on OS/X

----------------------------------------------------------------------------------

A one-time manual step is required to enable building Hadoop on OS X with Java 7;
it must be repeated every time the JDK is updated.
see: https://issues.apache.org/jira/browse/HADOOP-9350

$ sudo mkdir `/usr/libexec/java_home`/Classes
$ sudo ln -s `/usr/libexec/java_home`/lib/tools.jar `/usr/libexec/java_home`/Classes/classes.jar

----------------------------------------------------------------------------------

Building on Windows

----------------------------------------------------------------------------------
Requirements:

* Windows System
* JDK 1.6+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer
* Windows SDK or Visual Studio 2010 Professional
* Unix command-line tools from GnuWin32 or Cygwin: sh, mkdir, rm, cp, tar, gzip
* zlib headers (if building native code bindings for zlib)
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)

If using Visual Studio, it must be Visual Studio 2010 Professional (not 2012).
Do not use Visual Studio Express.  It does not support compiling for 64-bit,
which is problematic if running a 64-bit system.  The Windows SDK is free to
download here:

http://www.microsoft.com/en-us/download/details.aspx?id=8279

----------------------------------------------------------------------------------
Building:

Keep the source code tree in a short path to avoid running into problems related
to Windows maximum path length limitation.  (For example, C:\hdc).

Run builds from a Windows SDK Command Prompt.  (Start, All Programs,
Microsoft Windows SDK v7.1, Windows SDK 7.1 Command Prompt.)

JAVA_HOME must be set, and the path must not contain spaces.  If the full path
would contain spaces, then use the Windows short path instead.

You must set the Platform environment variable to either x64 or Win32 depending
on whether you're running a 64-bit or 32-bit system.  Note that this is
case-sensitive.  It must be "Platform", not "PLATFORM" or "platform".
Environment variables on Windows are usually case-insensitive, but Maven treats
them as case-sensitive.  Failure to set this environment variable correctly will
cause msbuild to fail while building the native code in hadoop-common.

set Platform=x64 (when building on a 64-bit system)
set Platform=Win32 (when building on a 32-bit system)

Several tests require that the user must have the Create Symbolic Links
privilege.

All Maven goals are the same as described above with the exception that
native code is built by enabling the 'native-win' Maven profile. -Pnative-win
is enabled by default when building on Windows since the native components
are required (not optional) on Windows.

If native code bindings for zlib are required, then the zlib headers must be
deployed on the build machine.  Set the ZLIB_HOME environment variable to the
directory containing the headers.

set ZLIB_HOME=C:\zlib-1.2.7

At runtime, zlib1.dll must be accessible on the PATH.  Hadoop has been tested
with zlib 1.2.7, built using Visual Studio 2010 out of contrib\vstudio\vc10 in
the zlib 1.2.7 source tree.
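
For reference, the Maven invocation from BUILDING.txt that produces a binary distribution with native code (run from the source root; the first build downloads many dependencies):

mvn package -Pdist,native -DskipTests -Dtar

The resulting tarball should appear under hadoop-dist/target/.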