Problem background
1. For my graduation project I need to filter and clean invalid data in a table. Because the later stages of the project use Spark SQL, I created the project inside my Scala workspace and wrote the code there. The first time I ran the MR (MapReduce) job in the Scala project, everything went smoothly and all the desired results were saved to the output file. But later, when I needed to build on the original MR program to refine the data further, I found that the generated file was empty when opened.
2. What was displayed after running the code:
3. The opened output folder was empty:
My code
package bike;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class CleanUserData {

    public static class DealDataMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        Text text = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // 1. Read one line and split it on commas
            String line = value.toString();
            String[] split = line.split(",");

            /**
             * Cleaning rules:
             * 1) Drop dirty rows: rows with fewer than 5 columns or any empty field
             * 2) Replace "Error" in the gender field with "null"
             * 3) Replace "-1" in the birth-year field with "0001"
             * 4) Re-join the cleaned fields with commas and write them out
             */

            // 1) Drop rows with fewer than 5 columns or any empty field
            if (split.length < 5) {
                return;
            }
            for (int i = 0; i < split.length; i++) {
                if (split[i].isEmpty()) {
                    return;
                }
            }

            // 2) Replace "Error" in the gender field with "null"
            split[3] = split[3].replaceAll("Error", "null");

            // 3) Replace "-1" in the birth-year (last) field with "0001"
            split[split.length - 1] = split[split.length - 1].replaceAll("-1", "0001");

            // 4) Keep the first 5 fields, re-joined with commas
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 5; i++) {
                if (i > 0) {
                    sb.append(",");
                }
                sb.append(split[i]);
            }
            text.set(sb.toString());

            // Emit the cleaned line
            context.write(text, NullWritable.get());
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // Set up the job
        Job job = Job.getInstance();
        job.setJarByClass(CleanUserData.class);

        // Wire up the mapper
        job.setMapperClass(DealDataMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(NullWritable.class);

        // Final output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);

        // Input and output paths
        FileInputFormat.setInputPaths(job, new Path("F:\\user.csv"));
        FileOutputFormat.setOutputPath(job, new Path("F:\\design\\userOfResult"));

        // Submit and wait for completion
        job.waitForCompletion(true);
    }
}
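To sanity-check the cleaning rules without running Hadoop at all, here is a minimal standalone sketch of the same field-level logic. The `CleanDemo` class and the sample rows are made up for illustration; this is not part of the original job, just the map-side rules extracted into a plain method:

```java
// Hypothetical standalone demo of the same cleaning rules (no MapReduce).
public class CleanDemo {

    // Returns the cleaned line, or null when the row should be dropped.
    static String clean(String line) {
        String[] split = line.split(",");
        // Rule 1: drop rows with fewer than 5 columns or any empty field
        if (split.length < 5) return null;
        for (String field : split) {
            if (field.isEmpty()) return null;
        }
        // Rule 2: gender field (index 3): "Error" -> "null"
        split[3] = split[3].replaceAll("Error", "null");
        // Rule 3: birth-year (last) field: "-1" -> "0001"
        split[split.length - 1] = split[split.length - 1].replaceAll("-1", "0001");
        // Rule 4: keep the first 5 fields, re-joined with commas
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            if (i > 0) sb.append(",");
            sb.append(split[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("1,Alice,M,Error,-1"));  // 1,Alice,M,null,0001
        System.out.println(clean("2,Bob,,F,1990"));       // null (empty field -> dropped)
    }
}
```

Feeding a few known-dirty rows through a helper like this makes it much faster to verify the rules than re-running the whole job against the CSV.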
Troubleshooting process
1. Since running the code produced no errors, I judged that the code itself was fine and suspected the input/output paths instead. I changed the input and output paths for the table being cleaned and tried this repeatedly. The result stayed exactly like the screenshot in the problem background. Failure!!!
2. So that wasn't it. I started to suspect the project the code lived in, so I closed IDEA and re-ran the MR program. Failure!!!
3. Next I created a brand-new project and ran some other code in it to check whether IDEA itself was broken. The other code ran fine, so, still doubtful, I copied the MR program into that project and ran it there. Failure!!!
4. I then moved the MR program into a Hadoop project I had created earlier, and this time it reported an array-index-out-of-bounds error in the code. The run failed, but it gave me hope: I figured that once I fixed that code error I would get the output file I wanted. Failure!!!
5. After fixing the code it did produce the results I wanted, but I still had not found the root cause of the problem. Failure!!!
6. The next day I turned on the computer determined to solve the problem at its root, and remembered the dependency-update prompt from an earlier run. I added part of the Hadoop project's dependencies to the Scala project's pom.xml file, re-ran the MR program, and the empty-file problem was solved.
As shown in the screenshot below.
The Scala project's full pom.xml configuration file (extraction code: uv8p).
The dependencies to add are the following (Hive and HBase dependencies are included as well; the extra dependencies caused no problems here, and having them saves configuring dependencies again for later work):
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.6.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>2.1.0</version>
</dependency>
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.2.5</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.2.5</version>
</dependency>
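One optional refinement (my suggestion, not part of the original setup): pull the repeated version numbers into Maven properties, so all Hadoop artifacts are guaranteed to stay on the same version when you later upgrade:

```xml
<properties>
    <hadoop.version>2.6.1</hadoop.version>
    <hbase.version>1.2.5</hbase.version>
</properties>

<!-- each dependency then references the shared property -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
</dependency>
```

Mixed Hadoop artifact versions on the classpath are a classic source of hard-to-diagnose runtime behavior, which is exactly the kind of problem this article ran into.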
The correct run result and the opened output file
1. Run log
D:\Java\jdk1.8.0_131\bin\java.exe "-javaagent:D:\Program Files\JetBrains\IntelliJ IDEA 2018.2.3\lib\idea_rt.jar=55901:D:\Program Files\JetBrains\IntelliJ IDEA 2018.2.3\bin" -Dfile.encoding=UTF-8 -classpath <JDK runtime jars; the project's target\classes; Hadoop 2.6.1, Hive 2.1.0 and HBase 1.2.5 jars plus their transitive dependencies from D:\oracleXLH\maven\respository — full jar list truncated for readability> day_3_18.CleanUserData
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/D:/oracleXLH/maven/respository/org/slf4j/slf4j-log4j12/1.7.5/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/D:/oracleXLH/maven/respository/org/apache/logging/log4j/log4j-slf4j-impl/2.4.1/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
INFO [main] - session.id is deprecated. Instead, use dfs.metrics.session-id
INFO [main] - Initializing JVM Metrics with processName=JobTracker, sessionId=
WARN [main] - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
WARN [main] - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
INFO [main] - Total input paths to process : 1
INFO [main] - number of splits:1
INFO [main] - Submitting tokens for job: job_local645345651_0001
INFO [main] - The url to track the job: http://localhost:8080/
INFO [main] - Running job: job_local645345651_0001
INFO [Thread-2] - OutputCommitter set in config null
INFO [Thread-2] - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
INFO [Thread-2] - Waiting for map tasks
INFO [LocalJobRunner Map Task Executor #0] - Starting task: attempt_local645345651_0001_m_000000_0
INFO [LocalJobRunner Map Task Executor #0] - ProcfsBasedProcessTree currently is supported only on Linux.
INFO [LocalJobRunner Map Task Executor #0] - Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@23c0bcae
INFO [LocalJobRunner Map Task Executor #0] - Processing split: file:/F:/user.csv:0+219377
INFO [LocalJobRunner Map Task Executor #0] - (EQUATOR) 0 kvi 26214396(104857584)
INFO [LocalJobRunner Map Task Executor #0] - mapreduce.task.io.sort.mb: 100
INFO [LocalJobRunner Map Task Executor #0] - soft limit at 83886080
INFO [LocalJobRunner Map Task Executor #0] - bufstart = 0; bufvoid = 104857600
INFO [LocalJobRunner Map Task Executor #0] - kvstart = 26214396; length = 6553600
INFO [LocalJobRunner Map Task Executor #0] - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
INFO [LocalJobRunner Map Task Executor #0] -
INFO [LocalJobRunner Map Task Executor #0] - Starting flush of map output
INFO [LocalJobRunner Map Task Executor #0] - Spilling map output
INFO [LocalJobRunner Map Task Executor #0] - bufstart = 0; bufend = 211375; bufvoid = 104857600
INFO [LocalJobRunner Map Task Executor #0] - kvstart = 26214396(104857584); kvend = 26187036(104748144); length = 27361/6553600
INFO [LocalJobRunner Map Task Executor #0] - Finished spill 0
INFO [LocalJobRunner Map Task Executor #0] - Task:attempt_local645345651_0001_m_000000_0 is done. And is in the process of committing
INFO [LocalJobRunner Map Task Executor #0] - map
INFO [LocalJobRunner Map Task Executor #0] - Task 'attempt_local645345651_0001_m_000000_0' done.
INFO [LocalJobRunner Map Task Executor #0] - Finishing task: attempt_local645345651_0001_m_000000_0
INFO [Thread-2] - map task executor complete.
INFO [Thread-2] - Waiting for reduce tasks
INFO [pool-3-thread-1] - Starting task: attempt_local645345651_0001_r_000000_0
INFO [pool-3-thread-1] - ProcfsBasedProcessTree currently is supported only on Linux.
INFO [pool-3-thread-1] - Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@269fe08e
INFO [pool-3-thread-1] - Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@3d933088
INFO [main] - Job job_local645345651_0001 running in uber mode : false
INFO [main] - map 100% reduce 0%
INFO [pool-3-thread-1] - MergerManager: memoryLimit=1321939712, maxSingleShuffleLimit=330484928, mergeThreshold=872480256, ioSortFactor=10, memToMemMergeOutputsThreshold=10
INFO [EventFetcher for fetching Map Completion Events] - attempt_local645345651_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
INFO [localfetcher#1] - localfetcher#1 about to shuffle output of map attempt_local645345651_0001_m_000000_0 decomp: 225059 len: 225063 to MEMORY
INFO [localfetcher#1] - Read 225059 bytes from map-output for attempt_local645345651_0001_m_000000_0
INFO [localfetcher#1] - closeInMemoryFile -> map-output of size: 225059, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->225059
INFO [EventFetcher for fetching Map Completion Events] - EventFetcher is interrupted.. Returning
INFO [pool-3-thread-1] - 1 / 1 copied.
INFO [pool-3-thread-1] - finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
INFO [pool-3-thread-1] - Merging 1 sorted segments
INFO [pool-3-thread-1] - Down to the last merge-pass, with 1 segments left of total size: 225028 bytes
INFO [pool-3-thread-1] - Merged 1 segments, 225059 bytes to disk to satisfy reduce memory limit
INFO [pool-3-thread-1] - Merging 1 files, 225063 bytes from disk
INFO [pool-3-thread-1] - Merging 0 segments, 0 bytes from memory into reduce
INFO [pool-3-thread-1] - Merging 1 sorted segments
INFO [pool-3-thread-1] - Down to the last merge-pass, with 1 segments left of total size: 225028 bytes
INFO [pool-3-thread-1] - 1 / 1 copied.
INFO [pool-3-thread-1] - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
INFO [pool-3-thread-1] - Task:attempt_local645345651_0001_r_000000_0 is done. And is in the process of committing
INFO [pool-3-thread-1] - 1 / 1 copied.
INFO [pool-3-thread-1] - Task attempt_local645345651_0001_r_000000_0 is allowed to commit now
INFO [pool-3-thread-1] - Saved output of task 'attempt_local645345651_0001_r_000000_0' to file:/F:/design/userOfResul/_temporary/0/task_local645345651_0001_r_000000
INFO [pool-3-thread-1] - reduce > reduce
INFO [pool-3-thread-1] - Task 'attempt_local645345651_0001_r_000000_0' done.
INFO [pool-3-thread-1] - Finishing task: attempt_local645345651_0001_r_000000_0
INFO [Thread-2] - reduce task executor complete.
INFO [main] - map 100% reduce 100%
INFO [main] - Job job_local645345651_0001 completed successfully
INFO [main] - Counters: 33
File System Counters
FILE: Number of bytes read=889190
FILE: Number of bytes written=1411402
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=6918
Map output records=6841
Map output bytes=211375
Map output materialized bytes=225063
Input split bytes=82
Combine input records=0
Combine output records=0
Reduce input groups=6841
Reduce shuffle bytes=225063
Reduce input records=6841
Reduce output records=6841
Spilled Records=13682
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=13
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=468713472
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=219377
File Output Format Counters
Bytes Written=213035
Process finished with exit code 0
2. The generated files
3. The data shown when the output file is opened
When I hit this problem I could not find a solution online and was close to tears. I struggled with it for a whole evening and finally got to the root of it the next day. I then set my graduation project aside and wrote this article to summarize my mistakes; I hope it can be of help to you.