Developing MapReduce Locally on Windows and Submitting to a Cluster

  1. Overview
  2. Preparation
    1. JDK installation and environment variables

Reference: https://jingyan.baidu.com/article/f96699bb163475894e3c1be4.html

    2. Download the Hadoop package

Link: https://archive.apache.org/dist/hadoop/common/

Note: I chose hadoop-2.6.5.tar.gz.

    3. Hadoop environment variables

HADOOP_HOME
D:\soft\developsoft\Hadoop\hadoop-2.6.5

Append %HADOOP_HOME%\bin;%HADOOP_HOME%\sbin; to Path.
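To confirm the variables are actually visible to the JVM (restart the IDE or terminal after changing them), here is a minimal sketch of a quick check from any Java main:

import java.util.Objects;

public class EnvCheck {
    public static void main(String[] args) {
        // Both should print the paths configured above; a null means the
        // variable was not picked up and the IDE/terminal needs a restart.
        System.out.println("JAVA_HOME   = " + Objects.toString(System.getenv("JAVA_HOME"), "<not set>"));
        System.out.println("HADOOP_HOME = " + Objects.toString(System.getenv("HADOOP_HOME"), "<not set>"));
    }
}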

    4. Modify the configuration files
      1. Go to the directory D:\HadoopCilent\hadoop-2.6.5\etc\hadoop.
      2. Modify core-site.xml (just grab the value from any one of the cluster nodes):

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://10.111.32.165:8020</value>
    </property>
</configuration>

 

      3. Modify hdfs-site.xml:

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>

 

      4. Modify mapred-site.xml:

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

 

      5. Modify yarn-site.xml; look up the IP of the test cluster's ResourceManager first:

<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>10.111.32.166</value>
</property>
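For reference, the same client-side settings can also be applied programmatically on a Configuration, instead of (or on top of) the XML files. A minimal sketch using the cluster addresses from above:

import org.apache.hadoop.conf.Configuration;

public class ClusterConf {
    // Builds a Configuration equivalent to the core-site/hdfs-site/
    // mapred-site/yarn-site edits above; values set here override
    // anything loaded from *-site.xml files on the classpath.
    public static Configuration create() {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://10.111.32.165:8020");
        conf.set("dfs.replication", "1");
        conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.hostname", "10.111.32.166");
        return conf;
    }
}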

 

    5. Install winutils
      1. Following only the steps above, you will hit an error like this:

java.io.IOException: Could not locate executable D:\HadoopCilent\hadoop-2.6.5\bin\winutils.exe in the Hadoop binaries.
        at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:378)
        at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:393)
        at org.apache.hadoop.util.Shell.<clinit>(Shell.java:386)
        at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:438)
        at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:484)
        at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
        at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
19/01/17 17:07:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

 

      2. Download winutils from https://github.com/steveloughran/winutils. Put the matching winutils.exe into D:\HadoopCilent\hadoop-2.6.5\bin, then run the following command to confirm it works:

 

>hadoop fs -ls /
19/01/17 17:10:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 12 items
drwxrwxrwx   - yarn   hadoop          0 2018-05-16 11:56 /app-logs
drwxr-xr-x   - hdfs   hdfs            0 2017-11-22 02:33 /apps
drwxr-xr-x   - yarn   hadoop          0 2017-11-22 02:30 /ats
drwxrwxrwx   - hdfs   hdfs            0 2018-01-03 21:42 /flume
drwxr-xr-x   - hdfs   hdfs            0 2017-11-22 02:30 /hdp
drwx------   - livy   hdfs            0 2018-03-23 11:36 /livy2-recovery
drwxr-xr-x   - hdfs   hdfs            0 2018-04-27 12:25 /log
drwxr-xr-x   - mapred hdfs            0 2017-11-22 02:30 /mapred
drwxrwxrwx   - mapred hadoop          0 2017-11-22 02:30 /mr-history
drwxrwxrwx   - spark  hadoop          0 2019-01-17 17:10 /spark2-history
drwxrwxrwx   - hdfs   hdfs            0 2019-01-17 14:45 /tmp
drwxr-xr-x   - hdfs   hdfs            0 2018-05-16 11:55 /user
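The same check can be done from Java, which exercises the client configuration end to end. A minimal sketch, assuming the fs.defaultFS configured above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListRoot {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://10.111.32.165:8020");
        // Equivalent to `hadoop fs -ls /`: list the root of the cluster's HDFS.
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
    }
}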

 

    6. Run a WordCount
      1. First set up the IP mappings: open C:\Windows\System32\drivers\etc\hosts and add:

10.111.32.165 hdp165.tmtgeo.com
10.111.32.166 hdp166.tmtgeo.com
10.111.32.168 hdp168.tmtgeo.com

 

      2. Copy a demo from here: https://hadoop.apache.org/docs/r2.7.7/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
      3. The POM file:

 

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.6.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.6.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs-client</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
        <version>2.6.5</version>
    </dependency>
    <dependency>
        <groupId>commons-cli</groupId>
        <artifactId>commons-cli</artifactId>
        <version>1.2</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <appendAssemblyId>false</appendAssemblyId>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
                <archive>
                    <manifest>
                        <!-- specify the class with the main-method entry point here -->
                        <mainClass>com.geotmt.dw.WordCount2</mainClass>
                    </manifest>
                </archive>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>assembly</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <!-- the plugin element holds the information needed to describe a plugin -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <!-- the Compiler plugin provides the goals that compile source and test code -->
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.3</version>
            <configuration>
                <source>1.7</source>
                <target>1.7</target>
                <!-- fixes the "unmappable character for encoding GBK" problem -->
                <encoding>utf-8</encoding>
            </configuration>
        </plugin>
    </plugins>
</build>
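With this in place, mvn package produces a single runnable jar (jar-with-dependencies, with the assembly id suppressed) whose manifest entry point is com.geotmt.dw.WordCount2. Note that newer versions of maven-assembly-plugin deprecate the assembly goal in favor of single, but it still works with the versions shown here.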

 

 

      4. Append the following to the main function:

System.setProperty("hadoop.home.dir", "D:\\soft\\developsoft\\Hadoop\\hadoop-2.6.5");

System.setProperty("HADOOP_USER_NAME", "hdfs");

conf.set("mapreduce.app-submission.cross-platform", "true");

conf.set("mapreduce.job.ubertask.enable", "true");

conf.set("fs.defaultFS","hdfs://10.111.32.165:8020");

conf.set("mapreduce.app-submission.cross-platform", "true");

conf.set("mapreduce.job.ubertask.enable", "true");

conf.set("fs.defaultFS","hdfs://10.111.32.165:8020");

 

      5. Copy the test cluster's configuration files into the project's resources directory (see the note after this step).
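Hadoop's Configuration loads core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml automatically when they are on the classpath, so files copied into src/main/resources are picked up with no extra code. If you keep them somewhere else, they can be added explicitly; a minimal sketch (the D:/cluster-conf directory is a hypothetical location, not from the original setup):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class LoadClusterConfig {
    public static Configuration load() {
        Configuration conf = new Configuration();
        // Only needed for files that are NOT on the classpath; classpath copies
        // of the standard *-site.xml files are discovered automatically.
        conf.addResource(new Path("D:/cluster-conf/core-site.xml"));  // hypothetical path
        conf.addResource(new Path("D:/cluster-conf/yarn-site.xml"));  // hypothetical path
        return conf;
    }
}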

      6. Run it and you should see:

2019-01-17 16:22:00,924 WARN  [main] shortcircuit.DomainSocketFactory (DomainSocketFactory.java:<init>(117)) - The short-circuit local reads feature cannot be used because UNIX Domain sockets are not available on Windows.
2019-01-17 16:22:02,191 INFO  [main] impl.TimelineClientImpl (TimelineClientImpl.java:serviceInit(292)) - Timeline service address: http://hdp166.tmtgeo.com:8188/ws/v1/timeline/
2019-01-17 16:22:02,529 INFO  [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at hdp166.tmtgeo.com/10.111.32.166:8050
2019-01-17 16:22:03,008 WARN  [main] mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(64)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2019-01-17 16:22:12,404 INFO  [main] input.FileInputFormat (FileInputFormat.java:listStatus(281)) - Total input paths to process : 1
2019-01-17 16:22:12,845 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(199)) - number of splits:1
2019-01-17 16:22:13,212 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(288)) - Submitting tokens for job: job_1529566523883_6787
2019-01-17 16:22:13,728 INFO  [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(251)) - Submitted application application_1529566523883_6787
2019-01-17 16:22:13,771 INFO  [main] mapreduce.Job (Job.java:submit(1301)) - The url to track the job: http://hdp166.tmtgeo.com:8088/proxy/application_1529566523883_6787/
2019-01-17 16:22:13,772 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1346)) - Running job: job_1529566523883_6787
2019-01-17 16:22:22,007 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1367)) - Job job_1529566523883_6787 running in uber mode : false
2019-01-17 16:22:22,011 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1374)) -  map 0% reduce 0%
2019-01-17 16:22:29,271 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1374)) -  map 100% reduce 0%
2019-01-17 16:22:36,436 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1374)) -  map 100% reduce 100%
2019-01-17 16:22:38,516 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1385)) - Job job_1529566523883_6787 completed successfully
2019-01-17 16:22:38,628 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1392)) - Counters: 49

        File System Counters
                FILE: Number of bytes read=87
                FILE: Number of bytes written=285801
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=136
                HDFS: Number of bytes written=45
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=42024
                Total time spent by all reduces in occupied slots (ms)=75872
                Total time spent by all map tasks (ms)=5253
                Total time spent by all reduce tasks (ms)=4742
                Total vcore-milliseconds taken by all map tasks=5253
                Total vcore-milliseconds taken by all reduce tasks=4742
                Total megabyte-milliseconds taken by all map tasks=43032576
                Total megabyte-milliseconds taken by all reduce tasks=77692928
        Map-Reduce Framework
                Map input records=9
                Map output records=9
                Map output bytes=63
                Map output materialized bytes=87
                Input split bytes=109
                Combine input records=9
                Combine output records=9
                Reduce input groups=9
                Reduce shuffle bytes=87
                Reduce input records=9
                Reduce output records=9
                Spilled Records=18
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=63
                CPU time spent (ms)=2300
                Physical memory (bytes) snapshot=2806378496
                Virtual memory (bytes) snapshot=23102570496
                Total committed heap usage (bytes)=6315048960
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=27
        File Output Format Counters
                Bytes Written=45

 

Process finished with exit code 0
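Finally, the job's output can be read back from HDFS to verify the counts. A minimal sketch (the /user/hdfs/output path is a placeholder; use the output directory your job actually wrote to):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PrintOutput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://10.111.32.165:8020");
        FileSystem fs = FileSystem.get(conf);
        // Placeholder path: substitute the output directory passed to the job.
        for (FileStatus part : fs.listStatus(new Path("/user/hdfs/output"))) {
            if (!part.getPath().getName().startsWith("part-")) continue; // skip _SUCCESS
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(fs.open(part.getPath())))) {
                reader.lines().forEach(System.out::println);
            }
        }
    }
}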

 
