Developing MapReduce locally on Windows and submitting to a cluster

  1. Overview
  2. Preparation
    1. JDK installation and environment variables

Reference: https://jingyan.baidu.com/article/f96699bb163475894e3c1be4.html

    2. Download the Hadoop package

Download link: https://archive.apache.org/dist/hadoop/common/

Note: I went with hadoop-2.6.5.tar.gz.

    3. Hadoop environment variables

Set HADOOP_HOME to:

D:\soft\developsoft\Hadoop\hadoop-2.6.5

 

Append %HADOOP_HOME%\bin;%HADOOP_HOME%\sbin; to Path.
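
To confirm the variables are visible to a freshly started JVM, a quick check can be run (a minimal sketch; the class name is mine):

// Sketch: print the Hadoop-related environment variables as seen by the JVM.
public class EnvCheck {
    public static void main(String[] args) {
        System.out.println("HADOOP_HOME = " + System.getenv("HADOOP_HOME"));
        String path = System.getenv("Path"); // env lookup is case-insensitive on Windows
        // Expects the expanded %HADOOP_HOME%\bin entry to be present.
        System.out.println("Path has hadoop bin: " + (path != null && path.contains("hadoop-2.6.5\\bin")));
    }
}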

    4. Edit the configuration files
      1. Go to the directory D:\HadoopCilent\hadoop-2.6.5\etc\hadoop.
      2. Edit core-site.xml; pointing it at one node of the cluster is enough:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://10.111.32.165:8020</value>
    </property>
</configuration>
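
With fs.defaultFS in place, a quick client-side sanity check is to open the default FileSystem and print its URI (a minimal sketch, assuming the Hadoop client jars from the pom later in this post are on the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class FsCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.getUri()); // should print hdfs://10.111.32.165:8020
    }
}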

 

      3. Edit hdfs-site.xml:

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>

 

      4. Edit mapred-site.xml:

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

 

      5. Edit yarn-site.xml; look up the IP of the test cluster's ResourceManager first:

<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>10.111.32.166</value>
</property>

 

    5. Install winutils
      1. With only the steps above, the following error appears:

java.io.IOException: Could not locate executable D:\HadoopCilent\hadoop-2.6.5\bin\winutils.exe in the Hadoop binaries.
        at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:378)
        at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:393)
        at org.apache.hadoop.util.Shell.<clinit>(Shell.java:386)
        at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:438)
        at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:484)
        at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
        at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
19/01/17 17:07:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

 

      2. Download winutils from https://github.com/steveloughran/winutils. Put the matching winutils.exe under D:\HadoopCilent\hadoop-2.6.5\bin, then run the following command to verify it works:

 

>hadoop fs -ls /
19/01/17 17:10:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 12 items
drwxrwxrwx   - yarn   hadoop          0 2018-05-16 11:56 /app-logs
drwxr-xr-x   - hdfs   hdfs            0 2017-11-22 02:33 /apps
drwxr-xr-x   - yarn   hadoop          0 2017-11-22 02:30 /ats
drwxrwxrwx   - hdfs   hdfs            0 2018-01-03 21:42 /flume
drwxr-xr-x   - hdfs   hdfs            0 2017-11-22 02:30 /hdp
drwx------   - livy   hdfs            0 2018-03-23 11:36 /livy2-recovery
drwxr-xr-x   - hdfs   hdfs            0 2018-04-27 12:25 /log
drwxr-xr-x   - mapred hdfs            0 2017-11-22 02:30 /mapred
drwxrwxrwx   - mapred hadoop          0 2017-11-22 02:30 /mr-history
drwxrwxrwx   - spark  hadoop          0 2019-01-17 17:10 /spark2-history
drwxrwxrwx   - hdfs   hdfs            0 2019-01-17 14:45 /tmp
drwxr-xr-x   - hdfs   hdfs            0 2018-05-16 11:55 /user

 

    6. Run a WordCount
      1. First set up the IP mappings: open C:\Windows\System32\drivers\etc\hosts and add:

10.111.32.165 hdp165.tmtgeo.com
10.111.32.166 hdp166.tmtgeo.com
10.111.32.168 hdp168.tmtgeo.com

 

      2. Copy a demo from here: https://hadoop.apache.org/docs/r2.7.7/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html; a condensed version is sketched below.
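
For reference, here is a condensed version of the tutorial's word count (the package and class name are chosen to match the mainClass configured in the pom below; the tutorial's full WordCount2 additionally supports case folding and skip patterns):

package com.geotmt.dw;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount2 {

    // Emits (word, 1) for every token in the input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Sums the counts for each word; also used as the combiner.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount2.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
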
      3. The pom file:

 

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.6.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.6.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs-client</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
        <version>2.6.5</version>
    </dependency>
    <dependency>
        <groupId>commons-cli</groupId>
        <artifactId>commons-cli</artifactId>
        <version>1.2</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <appendAssemblyId>false</appendAssemblyId>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
                <archive>
                    <manifest>
                        <!-- Specify the class whose main method is the entry point -->
                        <mainClass>com.geotmt.dw.WordCount2</mainClass>
                    </manifest>
                </archive>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>assembly</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <!-- The plugin element holds the information needed to describe a plugin. -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <!-- The Compiler plugin has goals for compiling sources and test sources. -->
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.3</version>
            <configuration>
                <source>1.7</source>
                <target>1.7</target>
                <!-- Fixes the "unmappable character for encoding GBK" problem -->
                <encoding>utf-8</encoding>
            </configuration>
        </plugin>
    </plugins>
</build>
</project>

 

 

      4. Add the following to the main function:

System.setProperty("hadoop.home.dir", "D:\\soft\\developsoft\\Hadoop\\hadoop-2.6.5");

System.setProperty("HADOOP_USER_NAME", "hdfs");

conf.set("mapreduce.app-submission.cross-platform", "true");

conf.set("mapreduce.job.ubertask.enable", "true");

conf.set("fs.defaultFS","hdfs://10.111.32.165:8020");

conf.set("mapreduce.app-submission.cross-platform", "true");

conf.set("mapreduce.job.ubertask.enable", "true");

conf.set("fs.defaultFS","hdfs://10.111.32.165:8020");

 

      5. Copy the test cluster's configuration files into the project's resources directory; a loading sketch follows.
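
new Configuration() picks up core-site.xml from the classpath on its own (and hdfs-site.xml once the HDFS client classes initialize); the remaining files can be added explicitly. A minimal sketch, assuming the files were copied into src/main/resources:

import org.apache.hadoop.conf.Configuration;

public class ClusterConf {
    public static Configuration load() {
        Configuration conf = new Configuration(); // core-site.xml is loaded from the classpath by default
        // Pull in the other cluster files copied into src/main/resources.
        conf.addResource("yarn-site.xml");
        conf.addResource("mapred-site.xml");
        return conf;
    }
}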

      6. Run it, and you can see:

2019-01-17 16:22:00,924 WARN  [main] shortcircuit.DomainSocketFactory (DomainSocketFactory.java:<init>(117)) - The short-circuit local reads feature cannot be used because UNIX Domain sockets are not available on Windows.
2019-01-17 16:22:02,191 INFO  [main] impl.TimelineClientImpl (TimelineClientImpl.java:serviceInit(292)) - Timeline service address: http://hdp166.tmtgeo.com:8188/ws/v1/timeline/
2019-01-17 16:22:02,529 INFO  [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at hdp166.tmtgeo.com/10.111.32.166:8050
2019-01-17 16:22:03,008 WARN  [main] mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(64)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2019-01-17 16:22:12,404 INFO  [main] input.FileInputFormat (FileInputFormat.java:listStatus(281)) - Total input paths to process : 1
2019-01-17 16:22:12,845 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(199)) - number of splits:1
2019-01-17 16:22:13,212 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(288)) - Submitting tokens for job: job_1529566523883_6787
2019-01-17 16:22:13,728 INFO  [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(251)) - Submitted application application_1529566523883_6787
2019-01-17 16:22:13,771 INFO  [main] mapreduce.Job (Job.java:submit(1301)) - The url to track the job: http://hdp166.tmtgeo.com:8088/proxy/application_1529566523883_6787/
2019-01-17 16:22:13,772 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1346)) - Running job: job_1529566523883_6787
2019-01-17 16:22:22,007 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1367)) - Job job_1529566523883_6787 running in uber mode : false
2019-01-17 16:22:22,011 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1374)) -  map 0% reduce 0%
2019-01-17 16:22:29,271 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1374)) -  map 100% reduce 0%
2019-01-17 16:22:36,436 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1374)) -  map 100% reduce 100%
2019-01-17 16:22:38,516 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1385)) - Job job_1529566523883_6787 completed successfully
2019-01-17 16:22:38,628 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1392)) - Counters: 49
        File System Counters
                FILE: Number of bytes read=87
                FILE: Number of bytes written=285801
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=136
                HDFS: Number of bytes written=45
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=42024
                Total time spent by all reduces in occupied slots (ms)=75872
                Total time spent by all map tasks (ms)=5253
                Total time spent by all reduce tasks (ms)=4742
                Total vcore-milliseconds taken by all map tasks=5253
                Total vcore-milliseconds taken by all reduce tasks=4742
                Total megabyte-milliseconds taken by all map tasks=43032576
                Total megabyte-milliseconds taken by all reduce tasks=77692928
        Map-Reduce Framework
                Map input records=9
                Map output records=9
                Map output bytes=63
                Map output materialized bytes=87
                Input split bytes=109
                Combine input records=9
                Combine output records=9
                Reduce input groups=9
                Reduce shuffle bytes=87
                Reduce input records=9
                Reduce output records=9
                Spilled Records=18
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=63
                CPU time spent (ms)=2300
                Physical memory (bytes) snapshot=2806378496
                Virtual memory (bytes) snapshot=23102570496
                Total committed heap usage (bytes)=6315048960
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=27
        File Output Format Counters
                Bytes Written=45

Process finished with exit code 0

 
