Hadoop MapReduce Java Example

An introduction to MapReduce:

http://blog.csdn.net/admin1973/article/details/60956943

Parts of this article are adapted from:

http://my.oschina.net/itblog/blog/275294

Analyzing the MapReduce Execution Process

When a MapReduce job runs, the Mapper tasks read the data files from HDFS, process the data with their own map method, and emit output. The Reducer tasks receive the Mapper output as their input, process it with their own reduce method, and finally write the result to a file in HDFS. The overall flow is outlined below.


Mapper Task Execution in Detail

Each Mapper task is a Java process. It reads a file from HDFS, parses it into many key-value pairs, processes them with the map method we override, and emits new key-value pairs as output. The work of a Mapper task can be broken down into the following stages:

  1. In the first stage, the input files are divided into input splits (InputSplit) according to a fixed rule; each input split has a fixed size. By default, the size of an input split equals the size of an HDFS block (Block). Suppose the block size is 64 MB (the default in Hadoop 1.x; Hadoop 2.x defaults to 128 MB) and there are two input files, one of 32 MB and one of 72 MB. The small file becomes one input split, while the large file spans two blocks and therefore becomes two input splits, for a total of three. Each input split is processed by one Mapper task, so these three splits are handled by three Mapper tasks.

  2. In the second stage, the records in each input split are parsed into key-value pairs according to a rule. The default rule turns each line of text into one pair: the key is the starting byte offset of the line within the file, and the value is the text of the line.

  3. In the third stage, the map method of the Mapper class is called once for every key-value pair produced in the second stage. If there are 1,000 key-value pairs, map is called 1,000 times. Each call to map may emit zero or more key-value pairs.

  4. In the fourth stage, the key-value pairs emitted in the third stage are partitioned according to a rule based on the key. For example, if the keys represent provinces (Beijing, Shanghai, Shandong, and so on), the pairs can be partitioned by province, so that all pairs for the same province end up in the same partition. By default there is a single partition. The number of partitions equals the number of Reducer tasks that will run, and by default there is only one Reducer task.

  5. In the fifth stage, the key-value pairs within each partition are sorted: first by key, and, for pairs with equal keys, by value. For example, given the three pairs <2,2>, <1,3>, <2,1>, where both keys and values are integers, the sorted result is <1,3>, <2,1>, <2,2>. If the sixth stage is configured, processing continues there; otherwise the sorted output is written directly to a local Linux file.

  6. The sixth stage performs a local reduction of the data, i.e. reduce-style processing: one reduce call is made for each group of pairs with equal keys, which shrinks the data volume. The reduced output is written to a local Linux file. This stage does not exist by default; the user has to add the code for it explicitly (a sketch follows below).
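A minimal sketch of this optional sixth stage, written against the max-temperature job developed later in this article: because taking a maximum is associative and commutative, map output can safely be pre-aggregated on the map side before the shuffle. The class name MaxTempCombiner and its registration are illustrative additions, not part of the original program.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTempCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text year, Iterable<IntWritable> temps, Context context)
            throws IOException, InterruptedException {
        int max = Integer.MIN_VALUE;
        for (IntWritable t : temps) {
            max = Math.max(max, t.get());
        }
        // Emit a single <year, local maximum> pair per map task, so far less
        // data has to be shuffled to the Reducer tasks.
        context.write(year, new IntWritable(max));
    }
}

It would be enabled with job.setCombinerClass(MaxTempCombiner.class); since the TempReducer in the example below has identical input and output types, job.setCombinerClass(TempReducer.class) would work just as well.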

Reducer Task Execution in Detail

Each Reducer task is also a Java process. A Reducer task receives the output of the Mapper tasks, reduces it, and writes the result to HDFS. Its work can be divided into the stages listed below.


  1. In the first stage, the Reducer task actively copies the key-value pairs output by the Mapper tasks. Since there may be many Mapper tasks, a Reducer typically copies output from several of them.

  2. In the second stage, the data copied to the Reducer's local node is merged, that is, the scattered pieces are combined into one larger data set, which is then sorted.

  3. In the third stage, the reduce method is called on the sorted key-value pairs: one call per group of pairs with equal keys, and each call may emit zero or more key-value pairs. Finally, these output pairs are written to a file in HDFS.

Across the development of a MapReduce program, most of our effort goes into overriding the map function and the reduce function.

Numbering the Key-Value Pairs

In the analysis of the Mapper and Reducer tasks above, key-value pairs appear at many stages, which is easy to confuse. To make the transformations easier to follow, we number the pairs as follows.

The key-value pairs fed into a Mapper task are called key1 and value1. The pairs emitted by the map method are called key2 and value2. The reduce method receives key2 and value2 and, after processing, emits key3 and value3. In the discussion below these may be abbreviated as <k1,v1>, <k2,v2> and <k3,v3>, as illustrated next.
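Taking the 2014 records from the input file used later as a concrete illustration (the byte offsets and values can be checked against the run log further down):

<k1,v1>: <0, "2014010114">, <12, "2014010216">, <24, "2014010317">, <36, "2014010410">, <48, "2014010506">
<k2,v2>: <"2014", 14>, <"2014", 16>, <"2014", 17>, <"2014", 10>, <"2014", 6>
<k3,v3>: <"2014", 17>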

Example: Finding the Highest Temperature for Each Year

The HDFS root directory contains a file /input.txt with the following content:

2014010114
2014010216
2014010317
2014010410
2014010506
2012010609
2012010732
2012010812
2012010919
2012011023
2001010116
2001010212
2001010310
2001010411
2001010529
2013010619
2013010722
2013010812
2013010929
2013011023
2008010105
2008010216
2008010337
2008010414
2008010516
2007010619
2007010712
2007010812
2007010999
2007011023
2010010114
2010010216
2010010317
2010010410
2010010506
2015010649
2015010722
2015010812
2015010999
2015011023
Note: there must be no blank lines between records and no leading spaces on any line (the listing above may have been reformatted), otherwise parsing the data will fail.

For example, 2010012325 means that the temperature on 2010-01-23 was 25 degrees. The task is to use MapReduce to compute the highest temperature that occurred in each year. Before writing any code, make sure the required jars are on the classpath. I use Maven; you can search http://mvnrepository.com for the artifactIds below. The program reads its input from HDFS and writes its output back to HDFS, so it needs the file system classes and therefore the hadoop-hdfs artifact; it submits a job to the MapReduce cluster, so it needs the MapReduce client and therefore hadoop-mapreduce-client-jobclient; and it uses Hadoop data types such as IntWritable and Text, so it needs hadoop-common. The dependencies required to run this program are therefore the following:
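A minimal Maven sketch consistent with the three artifacts named above would be roughly the following (version 2.7.3 matches the Hadoop build used elsewhere in this article; adjust it to your cluster):

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.3</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.7.3</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
    <version>2.7.3</version>
</dependency>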



The Java code is as follows:

package com.sct.hadoop.mapreduce;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Created by leitao on 2017/3/9.
 */
public class Temperature {
    /**
     * The four generic type parameters are:
     * KeyIn        the key of the Mapper's input: the byte offset of each line (0, 12, ... in the run shown below)
     * ValueIn      the value of the Mapper's input: the text of the line
     * KeyOut       the key of the Mapper's output: the "year" parsed from the line
     * ValueOut     the value of the Mapper's output: the "temperature" parsed from the line
     */

    static class TempMapper extends
            Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Sample output: Before Mapper: 0, 2014010114
            System.out.print("Before Mapper: " + key + ", " + value);
            String line = value.toString();
            String year = line.substring(0, 4);
            int temperature = Integer.parseInt(line.substring(8));
            context.write(new Text(year), new IntWritable(temperature));
            // Sample output: After Mapper:2014, 14
            System.out.println("======" + "After Mapper:" + new Text(year) + ", " + new IntWritable(temperature));
        }

    }


    /**
     * The four generic type parameters are:
     * KeyIn        the key of the Reducer's input: the "year"
     * ValueIn      the value of the Reducer's input: a "temperature"
     * KeyOut       the key of the Reducer's output: each distinct "year"
     * ValueOut     the value of the Reducer's output: the "highest temperature" of that year
     */
    static class TempReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context) throws IOException, InterruptedException {
            int maxValue = Integer.MIN_VALUE;
            StringBuffer sb = new StringBuffer();
            // Find the maximum of the values for this key
            for (IntWritable value : values) {
                maxValue = Math.max(maxValue, value.get());
                sb.append(value).append(", ");
            }
            // Sample output: Before Reduce: 2007, 23, 19, 12, 12, 99,
            System.out.print("Before Reduce: " + key + ", " + sb.toString());
            context.write(key, new IntWritable(maxValue));
            // Sample output: After Reduce: 2007, 99
            System.out.println("======" + "After Reduce: " + key + ", " + maxValue);
        }
    }
    public static void main(String[] args) throws Exception {
        // Input path
        String dst = "hdfs://192.168.113.130:9000/input.txt";
        // Output path; it must not already exist (even an empty directory is not allowed).
        String dstOut = "hdfs://192.168.113.130:9000/output.txt";
        Configuration hadoopConfig = new Configuration();
        System.setProperty("hadoop.home.dir", "F:\\hadoop-2.7.3");
        hadoopConfig.set("fs.hdfs.impl",
                org.apache.hadoop.hdfs.DistributedFileSystem.class.getName()
        );
        hadoopConfig.set("fs.file.impl",
                org.apache.hadoop.fs.LocalFileSystem.class.getName()
        );
        Job job = new Job(hadoopConfig);
        // If the program is packaged as a jar and submitted to a cluster, the following line is needed
        //job.setJarByClass(NewMaxTemperature.class);
        // Input and output paths used when the job runs
        FileInputFormat.addInputPath(job, new Path(dst));
        FileOutputFormat.setOutputPath(job, new Path(dstOut));
        // Set our custom Mapper and Reducer as the processing classes for the two stages
        job.setMapperClass(TempMapper.class);
        job.setReducerClass(TempReducer.class);
        // Set the types of the final output key and value
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Run the job and wait until it finishes
        job.waitForCompletion(true);
        System.out.println("Finished");
    }

}

In the code above, note that the generic type parameters of the Mapper class are not Java primitive types but Hadoop data types such as Text and IntWritable. For our purposes they can be treated as roughly equivalent to the Java types String and int.
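As a quick illustration of that rough equivalence (plain wrapper usage, separate from the job code):

Text year = new Text("2014");            // wraps what is conceptually a String
IntWritable temp = new IntWritable(14);  // wraps what is conceptually an int
String s = year.toString();              // unwrap back to a java.lang.String
int t = temp.get();                      // unwrap back to a primitive int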

The generic parameters of the Mapper class are, in order, <k1, v1, k2, v2>. The second parameter of the map method is the text of the line, which is what we care about. The core code extracts the "year" (the first four characters) and the "temperature" (the characters from position 8 onward) from each line with substring, then writes the year as the new key and the temperature as the new value into the Context. Because each year has several lines of data, every line produces one <year, temperature> pair.



For uploading and downloading files to and from HDFS, see:

http://blog.csdn.net/admin1973/article/details/60876255

Several problems may come up here:

1. The first problem

The error looks like this:

org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security .AccessControlException: Permission denied: user=Administrator, access=WRITE, inode="hadoop": hadoop:supergroup:rwxr-xr-x

The cause of this error is easy to see: the user Administrator was denied by the permission system when attempting a write operation on HDFS.

How the problem was solved


The first step upon seeing this error was to paste it into Baidu and Google. That turned up plenty of articles, but the main approaches boil down to the two fixes described in this one: http://www.linuxidc.com/Linux/2014-08/105334.htm


1. In the HDFS configuration file, set dfs.permissions to false:

    <property>  
        <name>dfs.permissions</name>  
        <value>false</value>  
    </property> 

2. Run a command such as: hadoop fs -chmod 777 /user/hadoop


I tried the first method and it did not work for me; whether I misconfigured something or there was some other reason, I cannot say, but it was not viable in my case. The second method did work. It changes the permissions of the relevant directory in HDFS; the path /user/hadoop above is a path inside HDFS. After this change, Administrator (in fact, every user) has write permission in that directory.
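A third workaround, mentioned here as a common suggestion rather than something taken from the article above, is to tell the HDFS client which user it should act as when the program runs from a Windows IDE. The user name below is an assumption and must match an account that owns, or can write to, the target directory:

        // Hedged sketch: make the client act as the HDFS user "hadoop" instead of
        // the local Windows user "Administrator". Must be set before the first
        // FileSystem/Job interaction.
        System.setProperty("HADOOP_USER_NAME", "hadoop");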

2. The second problem

The error looks like this:

Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Wi


Solution:

Create a new package in your project:

package org.apache.hadoop.io.nativeio
and then copy NativeIO.java (from the Hadoop source of your version) into this package.



3. The third problem

Description:

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
The solution is:

download a Hadoop distribution to the local Windows machine,

and then point to it in the program (as in the code above):

System.setProperty("hadoop.home.dir", "F:\\hadoop-2.7.3");



With those fixes in place, we are done.


When the program runs correctly, the output log looks like this:

D:\Softwear\JDK\jdk1.8.0_91\bin\java -agentlib:jdwp=transport=dt_socket,address=127.0.0.1:51130,suspend=y,server=n -javaagent:D:\Softwear\ideaIU-14.1.4\plugins\Groovy\lib\agent\gragent.jar -Dfile.encoding=UTF-8 -classpath D:\Softwear\JDK\jdk1.8.0_91\jre\lib\charsets.jar;D:\Softwear\JDK\jdk1.8.0_91\jre\lib\deploy.jar;D:\Softwear\JDK\jdk1.8.0_91\jre\lib\javaws.jar;D:\Softwear\JDK\jdk1.8.0_91\jre\lib\jce.jar;D:\Softwear\JDK\jdk1.8.0_91\jre\lib\jfr.jar;D:\Softwear\JDK\jdk1.8.0_91\jre\lib\jfxswt.jar;D:\Softwear\JDK\jdk1.8.0_91\jre\lib\jsse.jar;D:\Softwear\JDK\jdk1.8.0_91\jre\lib\management-agent.jar;D:\Softwear\JDK\jdk1.8.0_91\jre\lib\plugin.jar;D:\Softwear\JDK\jdk1.8.0_91\jre\lib\resources.jar;D:\Softwear\JDK\jdk1.8.0_91\jre\lib\rt.jar;D:\Softwear\JDK\jdk1.8.0_91\jre\lib\ext\access-bridge-64.jar;D:\Softwear\JDK\jdk1.8.0_91\jre\lib\ext\cldrdata.jar;D:\Softwear\JDK\jdk1.8.0_91\jre\lib\ext\dnsns.jar;D:\Softwear\JDK\jdk1.8.0_91\jre\lib\ext\jaccess.jar;D:\Softwear\JDK\jdk1.8.0_91\jre\lib\ext\jfxrt.jar;D:\Softwear\JDK\jdk1.8.0_91\jre\lib\ext\localedata.jar;D:\Softwear\JDK\jdk1.8.0_91\jre\lib\ext\nashorn.jar;D:\Softwear\JDK\jdk1.8.0_91\jre\lib\ext\sunec.jar;D:\Softwear\JDK\jdk1.8.0_91\jre\lib\ext\sunjce_provider.jar;D:\Softwear\JDK\jdk1.8.0_91\jre\lib\ext\sunmscapi.jar;D:\Softwear\JDK\jdk1.8.0_91\jre\lib\ext\sunpkcs11.jar;D:\Softwear\JDK\jdk1.8.0_91\jre\lib\ext\zipfs.jar;D:\Workspaces\SpringSessionRedis\hadoop\build\classes\main;D:\Workspaces\SpringSessionRedis\hadoop\build\resources\main;D:\Workspaces\SpringSessionRedis\hadoop\lib\activation-1.1.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\apacheds-i18n-2.0.0-M15.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\apacheds-kerberos-codec-2.0.0-M15.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\api-asn1-api-1.0.0-M20.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\api-util-1.0.0-M20.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\asm-3.2.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\avro-1.7.4.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\commons-beanutils-1.7.0.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\commons-beanutils-core-1.8.0.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\commons-cli-1.2.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\commons-codec-1.4.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\commons-collections-3.2.2.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\commons-compress-1.4.1.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\commons-configuration-1.6.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\commons-digester-1.8.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\commons-httpclient-3.1.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\commons-io-2.4.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\commons-lang-2.6.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\commons-logging-1.1.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\commons-math3-3.1.1.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\commons-net-3.1.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\curator-client-2.7.1.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\curator-framework-2.7.1.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\curator-recipes-2.7.1.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\gson-2.2.4.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\guava-11.0.2.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-annotations-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-auth-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-common-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-common-2.7.3-tests.jar;D:
\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-nfs-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hamcrest-core-1.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\htrace-core-3.1.0-incubating.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\httpclient-4.2.5.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\httpcore-4.2.5.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\jackson-core-asl-1.9.13.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\jackson-jaxrs-1.9.13.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\jackson-mapper-asl-1.9.13.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\jackson-xc-1.9.13.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\java-xmlbuilder-0.4.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\jaxb-api-2.2.2.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\jaxb-impl-2.2.3-1.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\jersey-core-1.9.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\jersey-json-1.9.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\jersey-server-1.9.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\jets3t-0.9.0.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\jettison-1.1.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\jetty-6.1.26.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\jetty-util-6.1.26.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\jsch-0.1.42.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\jsp-api-2.1.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\jsr305-3.0.0.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\junit-4.11.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\log4j-1.2.17.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\mockito-all-1.8.5.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\netty-3.6.2.Final.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\paranamer-2.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\protobuf-java-2.5.0.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\servlet-api-2.5.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\slf4j-api-1.7.10.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\slf4j-log4j12-1.7.10.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\snappy-java-1.0.4.1.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\stax-api-1.0-2.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\xmlenc-0.52.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\xz-1.0.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\zookeeper-3.4.6.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-hdfs-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-hdfs-2.7.3-tests.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-hdfs-nfs-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\commons-daemon-1.0.13.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\leveldbjni-all-1.8.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\netty-all-4.0.23.Final.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\xercesImpl-2.9.1.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\xml-apis-1.3.04.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\aopalliance-1.0.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\guice-3.0.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\guice-servlet-3.0.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-mapreduce-client-app-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-mapreduce-client-common-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-mapreduce-client-core-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-mapreduce-client-hs-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-mapreduce-client-hs-plugins-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-mapreduce-client-jobclien
t-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-mapreduce-client-jobclient-2.7.3-tests.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-mapreduce-client-shuffle-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-mapreduce-examples-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\javax.inject-1.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\jersey-guice-1.9.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-yarn-api-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-yarn-applications-distributedshell-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-yarn-applications-unmanaged-am-launcher-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-yarn-common-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-yarn-client-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-yarn-registry-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-yarn-server-common-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-yarn-server-nodemanager-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-yarn-server-applicationhistoryservice-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-yarn-server-resourcemanager-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-yarn-server-sharedcachemanager-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-yarn-server-tests-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\hadoop-yarn-server-web-proxy-2.7.3.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\jersey-client-1.9.jar;D:\Workspaces\SpringSessionRedis\hadoop\lib\zookeeper-3.4.6-tests.jar;D:\Softwear\ideaIU-14.1.4\lib\idea_rt.jar com.sct.hadoop.mapreduce.Temperature
Connected to the target VM, address: '127.0.0.1:51130', transport: 'socket'
2017-03-10 15:47:08,607 WARN  [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-03-10 15:47:19,032 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1173)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2017-03-10 15:47:19,033 INFO  [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2017-03-10 15:47:19,341 WARN  [main] mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(64)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2017-03-10 15:47:19,370 WARN  [main] mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(171)) - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2017-03-10 15:47:19,398 INFO  [main] input.FileInputFormat (FileInputFormat.java:listStatus(283)) - Total input paths to process : 1
2017-03-10 15:47:19,714 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(198)) - number of splits:1
2017-03-10 15:47:19,885 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(287)) - Submitting tokens for job: job_local1918959209_0001
2017-03-10 15:47:20,058 INFO  [main] mapreduce.Job (Job.java:submit(1294)) - The url to track the job: http://localhost:8080/
2017-03-10 15:47:20,059 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1339)) - Running job: job_local1918959209_0001
2017-03-10 15:47:20,064 INFO  [Thread-18] mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(471)) - OutputCommitter set in config null
2017-03-10 15:47:20,070 INFO  [Thread-18] output.FileOutputCommitter (FileOutputCommitter.java:<init>(108)) - File Output Committer Algorithm version is 1
2017-03-10 15:47:20,072 INFO  [Thread-18] mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(489)) - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2017-03-10 15:47:20,118 INFO  [Thread-18] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(448)) - Waiting for map tasks
2017-03-10 15:47:20,119 INFO  [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner (LocalJobRunner.java:run(224)) - Starting task: attempt_local1918959209_0001_m_000000_0
2017-03-10 15:47:20,144 INFO  [LocalJobRunner Map Task Executor #0] output.FileOutputCommitter (FileOutputCommitter.java:<init>(108)) - File Output Committer Algorithm version is 1
2017-03-10 15:47:20,151 INFO  [LocalJobRunner Map Task Executor #0] util.ProcfsBasedProcessTree (ProcfsBasedProcessTree.java:isAvailable(192)) - ProcfsBasedProcessTree currently is supported only on Linux.
2017-03-10 15:47:20,190 INFO  [LocalJobRunner Map Task Executor #0] mapred.Task (Task.java:initialize(612)) -  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@505b9c31
2017-03-10 15:47:20,195 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:runNewMapper(756)) - Processing split: hdfs://192.168.113.130:9000/input.txt:0+478
2017-03-10 15:47:20,247 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:setEquator(1205)) - (EQUATOR) 0 kvi 26214396(104857584)
2017-03-10 15:47:20,247 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(998)) - mapreduce.task.io.sort.mb: 100
2017-03-10 15:47:20,247 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(999)) - soft limit at 83886080
2017-03-10 15:47:20,247 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(1000)) - bufstart = 0; bufvoid = 104857600
2017-03-10 15:47:20,247 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(1001)) - kvstart = 26214396; length = 6553600
2017-03-10 15:47:20,251 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:createSortingCollector(403)) - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
Before Mapper: 0, 2014010114======After Mapper:2014, 14
Before Mapper: 12, 2014010216======After Mapper:2014, 16
Before Mapper: 24, 2014010317======After Mapper:2014, 17
Before Mapper: 36, 2014010410======After Mapper:2014, 10
Before Mapper: 48, 2014010506======After Mapper:2014, 6
Before Mapper: 60, 2012010609======After Mapper:2012, 9
Before Mapper: 72, 2012010732======After Mapper:2012, 32
Before Mapper: 84, 2012010812======After Mapper:2012, 12
Before Mapper: 96, 2012010919======After Mapper:2012, 19
Before Mapper: 108, 2012011023======After Mapper:2012, 23
Before Mapper: 120, 2001010116======After Mapper:2001, 16
Before Mapper: 132, 2001010212======After Mapper:2001, 12
Before Mapper: 144, 2001010310======After Mapper:2001, 10
Before Mapper: 156, 2001010411======After Mapper:2001, 11
Before Mapper: 168, 2001010529======After Mapper:2001, 29
Before Mapper: 180, 2013010619======After Mapper:2013, 19
Before Mapper: 192, 2013010722======After Mapper:2013, 22
Before Mapper: 204, 2013010812======After Mapper:2013, 12
Before Mapper: 216, 2013010929======After Mapper:2013, 29
Before Mapper: 228, 2013011023======After Mapper:2013, 23
Before Mapper: 240, 2008010105======After Mapper:2008, 5
Before Mapper: 252, 2008010216======After Mapper:2008, 16
Before Mapper: 264, 2008010337======After Mapper:2008, 37
Before Mapper: 276, 2008010414======After Mapper:2008, 14
Before Mapper: 288, 2008010516======After Mapper:2008, 16
Before Mapper: 300, 2007010619======After Mapper:2007, 19
Before Mapper: 312, 2007010712======After Mapper:2007, 12
Before Mapper: 324, 2007010812======After Mapper:2007, 12
Before Mapper: 336, 2007010999======After Mapper:2007, 99
Before Mapper: 348, 2007011023======After Mapper:2007, 23
Before Mapper: 360, 2010010114======After Mapper:2010, 14
Before Mapper: 372, 2010010216======After Mapper:2010, 16
Before Mapper: 384, 2010010317======After Mapper:2010, 17
Before Mapper: 396, 2010010410======After Mapper:2010, 10
Before Mapper: 408, 2010010506======After Mapper:2010, 6
Before Mapper: 420, 2015010649======After Mapper:2015, 49
Before Mapper: 432, 2015010722======After Mapper:2015, 22
Before Mapper: 444, 2015010812======After Mapper:2015, 12
Before Mapper: 456, 2015010999======After Mapper:2015, 99
Before Mapper: 468, 2015011023======After Mapper:2015, 23
2017-03-10 15:47:20,395 INFO  [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 
2017-03-10 15:47:20,397 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:flush(1460)) - Starting flush of map output
2017-03-10 15:47:20,397 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:flush(1482)) - Spilling map output
2017-03-10 15:47:20,397 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:flush(1483)) - bufstart = 0; bufend = 360; bufvoid = 104857600
2017-03-10 15:47:20,397 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:flush(1485)) - kvstart = 26214396(104857584); kvend = 26214240(104856960); length = 157/6553600
2017-03-10 15:47:20,426 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:sortAndSpill(1667)) - Finished spill 0
2017-03-10 15:47:20,431 INFO  [LocalJobRunner Map Task Executor #0] mapred.Task (Task.java:done(1038)) - Task:attempt_local1918959209_0001_m_000000_0 is done. And is in the process of committing
2017-03-10 15:47:20,440 INFO  [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - map
2017-03-10 15:47:20,440 INFO  [LocalJobRunner Map Task Executor #0] mapred.Task (Task.java:sendDone(1158)) - Task 'attempt_local1918959209_0001_m_000000_0' done.
2017-03-10 15:47:20,440 INFO  [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner (LocalJobRunner.java:run(249)) - Finishing task: attempt_local1918959209_0001_m_000000_0
2017-03-10 15:47:20,440 INFO  [Thread-18] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - map task executor complete.
2017-03-10 15:47:20,442 INFO  [Thread-18] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(448)) - Waiting for reduce tasks
2017-03-10 15:47:20,443 INFO  [pool-6-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:run(302)) - Starting task: attempt_local1918959209_0001_r_000000_0
2017-03-10 15:47:20,448 INFO  [pool-6-thread-1] output.FileOutputCommitter (FileOutputCommitter.java:<init>(108)) - File Output Committer Algorithm version is 1
2017-03-10 15:47:20,450 INFO  [pool-6-thread-1] util.ProcfsBasedProcessTree (ProcfsBasedProcessTree.java:isAvailable(192)) - ProcfsBasedProcessTree currently is supported only on Linux.
2017-03-10 15:47:20,480 INFO  [pool-6-thread-1] mapred.Task (Task.java:initialize(612)) -  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@3375e471
2017-03-10 15:47:20,483 INFO  [pool-6-thread-1] mapred.ReduceTask (ReduceTask.java:run(362)) - Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@6984d036
2017-03-10 15:47:20,496 INFO  [pool-6-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:<init>(197)) - MergerManager: memoryLimit=1316801664, maxSingleShuffleLimit=329200416, mergeThreshold=869089152, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2017-03-10 15:47:20,498 INFO  [EventFetcher for fetching Map Completion Events] reduce.EventFetcher (EventFetcher.java:run(61)) - attempt_local1918959209_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2017-03-10 15:47:20,525 INFO  [localfetcher#1] reduce.LocalFetcher (LocalFetcher.java:copyMapOutput(144)) - localfetcher#1 about to shuffle output of map attempt_local1918959209_0001_m_000000_0 decomp: 442 len: 446 to MEMORY
2017-03-10 15:47:20,528 INFO  [localfetcher#1] reduce.InMemoryMapOutput (InMemoryMapOutput.java:shuffle(100)) - Read 442 bytes from map-output for attempt_local1918959209_0001_m_000000_0
2017-03-10 15:47:20,530 INFO  [localfetcher#1] reduce.MergeManagerImpl (MergeManagerImpl.java:closeInMemoryFile(315)) - closeInMemoryFile -> map-output of size: 442, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->442
2017-03-10 15:47:20,532 INFO  [EventFetcher for fetching Map Completion Events] reduce.EventFetcher (EventFetcher.java:run(76)) - EventFetcher is interrupted.. Returning
2017-03-10 15:47:20,533 INFO  [pool-6-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 1 / 1 copied.
2017-03-10 15:47:20,533 INFO  [pool-6-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(687)) - finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2017-03-10 15:47:20,541 INFO  [pool-6-thread-1] mapred.Merger (Merger.java:merge(606)) - Merging 1 sorted segments
2017-03-10 15:47:20,541 INFO  [pool-6-thread-1] mapred.Merger (Merger.java:merge(705)) - Down to the last merge-pass, with 1 segments left of total size: 435 bytes
2017-03-10 15:47:20,543 INFO  [pool-6-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(754)) - Merged 1 segments, 442 bytes to disk to satisfy reduce memory limit
2017-03-10 15:47:20,543 INFO  [pool-6-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(784)) - Merging 1 files, 446 bytes from disk
2017-03-10 15:47:20,544 INFO  [pool-6-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(799)) - Merging 0 segments, 0 bytes from memory into reduce
2017-03-10 15:47:20,544 INFO  [pool-6-thread-1] mapred.Merger (Merger.java:merge(606)) - Merging 1 sorted segments
2017-03-10 15:47:20,544 INFO  [pool-6-thread-1] mapred.Merger (Merger.java:merge(705)) - Down to the last merge-pass, with 1 segments left of total size: 435 bytes
2017-03-10 15:47:20,545 INFO  [pool-6-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 1 / 1 copied.
2017-03-10 15:47:20,573 INFO  [pool-6-thread-1] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1173)) - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
Before Reduce: 2001, 12, 10, 11, 29, 16, ======After Reduce: 2001, 29
Before Reduce: 2007, 23, 19, 12, 12, 99, ======After Reduce: 2007, 99
Before Reduce: 2008, 16, 14, 37, 16, 5, ======After Reduce: 2008, 37
Before Reduce: 2010, 10, 6, 14, 16, 17, ======After Reduce: 2010, 17
Before Reduce: 2012, 19, 12, 32, 9, 23, ======After Reduce: 2012, 32
Before Reduce: 2013, 23, 29, 12, 22, 19, ======After Reduce: 2013, 29
Before Reduce: 2014, 14, 6, 10, 17, 16, ======After Reduce: 2014, 17
Before Reduce: 2015, 23, 49, 22, 12, 99, ======After Reduce: 2015, 99
2017-03-10 15:47:20,724 INFO  [pool-6-thread-1] mapred.Task (Task.java:done(1038)) - Task:attempt_local1918959209_0001_r_000000_0 is done. And is in the process of committing
2017-03-10 15:47:20,727 INFO  [pool-6-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 1 / 1 copied.
2017-03-10 15:47:20,728 INFO  [pool-6-thread-1] mapred.Task (Task.java:commit(1199)) - Task attempt_local1918959209_0001_r_000000_0 is allowed to commit now
2017-03-10 15:47:20,738 INFO  [pool-6-thread-1] output.FileOutputCommitter (FileOutputCommitter.java:commitTask(535)) - Saved output of task 'attempt_local1918959209_0001_r_000000_0' to hdfs://192.168.113.130:9000/output.txt/_temporary/0/task_local1918959209_0001_r_000000
2017-03-10 15:47:20,739 INFO  [pool-6-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - reduce > reduce
2017-03-10 15:47:20,739 INFO  [pool-6-thread-1] mapred.Task (Task.java:sendDone(1158)) - Task 'attempt_local1918959209_0001_r_000000_0' done.
2017-03-10 15:47:20,739 INFO  [pool-6-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:run(325)) - Finishing task: attempt_local1918959209_0001_r_000000_0
2017-03-10 15:47:20,739 INFO  [Thread-18] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - reduce task executor complete.
2017-03-10 15:47:21,064 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1360)) - Job job_local1918959209_0001 running in uber mode : false
2017-03-10 15:47:21,066 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1367)) -  map 100% reduce 100%
2017-03-10 15:47:21,068 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1378)) - Job job_local1918959209_0001 completed successfully
2017-03-10 15:47:21,084 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1385)) - Counters: 35
	File System Counters
		FILE: Number of bytes read=1260
		FILE: Number of bytes written=593510
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=956
		HDFS: Number of bytes written=64
		HDFS: Number of read operations=13
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=4
	Map-Reduce Framework
		Map input records=40
		Map output records=40
		Map output bytes=360
		Map output materialized bytes=446
		Input split bytes=102
		Combine input records=0
		Combine output records=0
		Reduce input groups=8
		Reduce shuffle bytes=446
		Reduce input records=40
		Reduce output records=8
		Spilled Records=80
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=6
		Total committed heap usage (bytes)=567279616
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=478
	File Output Format Counters 
		Bytes Written=64
Finished
Disconnected from the target VM, address: '127.0.0.1:51130', transport: 'socket'

Process finished with exit code 0

We then download the result output file from HDFS:

/output.txt/part-r-00000

and its contents are as follows:

2001	29
2007	99
2008	37
2010	17
2012	32
2013	29
2014	17
2015	99
That is, the highest temperature recorded in each year.
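For reference, the result file can be inspected or pulled down directly with the standard HDFS shell (the paths match this example):

hadoop fs -cat /output.txt/part-r-00000
hadoop fs -get /output.txt/part-r-00000 ./part-r-00000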







Reference: http://blog.csdn.net/zhangt85/article/details/42077281
