A MapReduce Problem That Puzzles Me

How the Problem Arose

I am working on a project that analyzes user driving-behavior data. First, a look at the project's data.

Since my advisor says the data volume will grow much larger, I need to store the data in Hive and run the analysis with MapReduce. Our current dataset is about 200 million records, roughly 18 GB, so I started by writing some MR test code.

My Approach to the MR Job

Since I need to compute each vehicle's daily driving time, mileage, speed, and so on, I wrote a bean containing the vehicle ID and the date to serve as the key, with grouping done on that key.

```java
package com.yangyang.bean;

import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class GroupBean implements WritableComparable<GroupBean> {

    private int car_ID;
    private String date;
    private long time;

    public GroupBean(int car_ID, String date,long time) {
        super();
        this.car_ID = car_ID;
        this.date = date;
        this.time = time;
    }

    public GroupBean() {
        super();
    }

    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeLong(time);
        dataOutput.writeInt(car_ID);
        // writeChars() emits two bytes per char and appends no newline
        dataOutput.writeChars(date);
    }

    @Override
    public void readFields(DataInput dataInput) throws IOException {
        time = dataInput.readLong();
        car_ID = dataInput.readInt();
        // readLine() stops only at '\n' or EOF, so it must be the last read
        date = dataInput.readLine();
    }

    public int getCar_ID() {
        return car_ID;
    }

    public String getDate() {
        return date;
    }

    public void setCar_ID(int car_ID) {
        this.car_ID = car_ID;
    }

    public void setDate(String date) {
        this.date = date;
    }

    public long getTime() {
        return time;
    }

    public void setTime(long time) {
        this.time = time;
    }

    @Override
    public String toString() {
        return car_ID + "," + date;
    }

    @Override
    public int compareTo(GroupBean o) {
        // Sort by car_ID first, then by time. Equal keys must compare as 0,
        // otherwise grouping on this key breaks.
        if (car_ID != o.getCar_ID()) {
            return car_ID > o.getCar_ID() ? 1 : -1;
        }
        return Long.compare(time, o.getTime());
    }
}
```

This is the key bean as I wrote it. But here is where it gets strange: my earlier version of the serialization read() looked like this:

[screenshot of the earlier readFields() implementation]

and it threw this error:

2020-08-08 17:09:20,586 INFO [org.apache.hadoop.conf.Configuration.deprecation] - session.id is deprecated. Instead, use dfs.metrics.session-id
2020-08-08 17:09:20,588 INFO [org.apache.hadoop.metrics.jvm.JvmMetrics] - Initializing JVM Metrics with processName=JobTracker, sessionId=
2020-08-08 17:09:22,825 WARN [org.apache.hadoop.mapreduce.JobResourceUploader] - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2020-08-08 17:09:22,869 WARN [org.apache.hadoop.mapreduce.JobResourceUploader] - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2020-08-08 17:09:22,880 INFO [org.apache.hadoop.mapreduce.lib.input.FileInputFormat] - Total input paths to process : 1
2020-08-08 17:09:22,920 INFO [org.apache.hadoop.mapreduce.JobSubmitter] - number of splits:1
2020-08-08 17:09:23,010 INFO [org.apache.hadoop.mapreduce.JobSubmitter] - Submitting tokens for job: job_local1785045039_0001
2020-08-08 17:09:23,190 INFO [org.apache.hadoop.mapreduce.Job] - The url to track the job: http://localhost:8080/
2020-08-08 17:09:23,191 INFO [org.apache.hadoop.mapreduce.Job] - Running job: job_local1785045039_0001
2020-08-08 17:09:23,192 INFO [org.apache.hadoop.mapred.LocalJobRunner] - OutputCommitter set in config null
2020-08-08 17:09:23,200 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - File Output Committer Algorithm version is 1
2020-08-08 17:09:23,202 INFO [org.apache.hadoop.mapred.LocalJobRunner] - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2020-08-08 17:09:23,266 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Waiting for map tasks
2020-08-08 17:09:23,267 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Starting task: attempt_local1785045039_0001_m_000000_0
2020-08-08 17:09:23,300 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - File Output Committer Algorithm version is 1
2020-08-08 17:09:23,306 INFO [org.apache.hadoop.yarn.util.ProcfsBasedProcessTree] - ProcfsBasedProcessTree currently is supported only on Linux.
2020-08-08 17:09:23,372 INFO [org.apache.hadoop.mapred.Task] -  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@74df4484
2020-08-08 17:09:23,379 INFO [org.apache.hadoop.mapred.MapTask] - Processing split: file:/F:/java项目/20.txt:0+1793115
2020-08-08 17:09:23,430 INFO [org.apache.hadoop.mapred.MapTask] - (EQUATOR) 0 kvi 26214396(104857584)
2020-08-08 17:09:23,430 INFO [org.apache.hadoop.mapred.MapTask] - mapreduce.task.io.sort.mb: 100
2020-08-08 17:09:23,430 INFO [org.apache.hadoop.mapred.MapTask] - soft limit at 83886080
2020-08-08 17:09:23,430 INFO [org.apache.hadoop.mapred.MapTask] - bufstart = 0; bufvoid = 104857600
2020-08-08 17:09:23,430 INFO [org.apache.hadoop.mapred.MapTask] - kvstart = 26214396; length = 6553600
2020-08-08 17:09:23,433 INFO [org.apache.hadoop.mapred.MapTask] - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2020-08-08 17:09:23,741 INFO [org.apache.hadoop.mapred.LocalJobRunner] - 
2020-08-08 17:09:23,741 INFO [org.apache.hadoop.mapred.MapTask] - Starting flush of map output
2020-08-08 17:09:23,741 INFO [org.apache.hadoop.mapred.MapTask] - Spilling map output
2020-08-08 17:09:23,741 INFO [org.apache.hadoop.mapred.MapTask] - bufstart = 0; bufend = 951072; bufvoid = 104857600
2020-08-08 17:09:23,742 INFO [org.apache.hadoop.mapred.MapTask] - kvstart = 26214396(104857584); kvend = 26135144(104540576); length = 79253/6553600
2020-08-08 17:09:23,760 INFO [org.apache.hadoop.mapred.MapTask] - Starting flush of map output
2020-08-08 17:09:23,760 INFO [org.apache.hadoop.mapred.MapTask] - (RESET) equator 0 kv 26214396(104857584) kvi 26135140(104540560)
2020-08-08 17:09:23,760 INFO [org.apache.hadoop.mapred.MapTask] - Spilling map output
2020-08-08 17:09:23,760 INFO [org.apache.hadoop.mapred.MapTask] - bufstart = 0; bufend = 951072; bufvoid = 104857600
2020-08-08 17:09:23,760 INFO [org.apache.hadoop.mapred.MapTask] - kvstart = 26214396(104857584); kvend = 26135144(104540576); length = 79253/6553600
2020-08-08 17:09:23,770 INFO [org.apache.hadoop.mapred.MapTask] - Ignoring exception during close for org.apache.hadoop.mapred.MapTask$NewOutputCollector@7bf9f47a
java.lang.RuntimeException: java.io.EOFException
	at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:165)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1268)
	at org.apache.hadoop.util.QuickSort.fix(QuickSort.java:35)
	at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:87)
	at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:63)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1600)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1489)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
	at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:2019)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:797)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:197)
	at java.io.DataInputStream.readLong(DataInputStream.java:416)
	at com.yangyang.bean.GroupBean.readFields(GroupBean.java:35)
	at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:158)
	... 16 more
2020-08-08 17:09:23,774 INFO [org.apache.hadoop.mapred.LocalJobRunner] - map task executor complete.
2020-08-08 17:09:23,777 WARN [org.apache.hadoop.mapred.LocalJobRunner] - job_local1785045039_0001
java.lang.Exception: java.lang.RuntimeException: java.io.EOFException
	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: java.io.EOFException
	at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:165)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1268)
	at org.apache.hadoop.util.QuickSort.fix(QuickSort.java:35)
	at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:87)
	at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:63)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1600)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1489)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:197)
	at java.io.DataInputStream.readLong(DataInputStream.java:416)
	at com.yangyang.bean.GroupBean.readFields(GroupBean.java:35)
	at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:158)
	... 15 more
2020-08-08 17:09:24,193 INFO [org.apache.hadoop.mapreduce.Job] - Job job_local1785045039_0001 running in uber mode : false
2020-08-08 17:09:24,194 INFO [org.apache.hadoop.mapreduce.Job] -  map 0% reduce 0%
2020-08-08 17:09:24,195 INFO [org.apache.hadoop.mapreduce.Job] - Job job_local1785045039_0001 failed with state FAILED due to: NA
2020-08-08 17:09:24,201 INFO [org.apache.hadoop.mapreduce.Job] - Counters: 10
	Map-Reduce Framework
		Map input records=19814
		Map output records=19814
		Map output bytes=951072
		Map output materialized bytes=0
		Input split bytes=91
		Combine input records=0
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
	File Input Format Counters 
		Bytes Read=1793115
false

Process finished with exit code 1

This error had me hunting for the cause for half a day. It was bizarre: searching online turned up nothing, and reading through the source code didn't reveal anything either.
Eventually I suspected that reading other fields after readLine() was the problem, and sure enough that was it.
So I changed readFields() to the following form:

[screenshot of the revised readFields() implementation]

And now it no longer errors at all! I am going crazy here. Does anyone know why?
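For what it's worth, the behavior can be reproduced outside Hadoop with plain `DataOutputStream`/`DataInputStream`. This is my own minimal sketch (the `1596870000L` / `42` / `"2020-08-08"` values are made up for illustration): `DataInputStream.readLine()` scans for `'\n'`, and since `writeChars()` never writes one, `readLine()` consumes every remaining byte of the buffer, so any read placed after it hits EOF. That matches the stack trace: `WritableComparator.compare()` deserializes keys via `readFields()` during the sort, which is why the job dies in `sortAndSpill`. A length-prefixed `writeUTF()`/`readUTF()` pair avoids depending on field order at all.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

public class ReadLineDemo {
    public static void main(String[] args) throws IOException {
        // --- Part 1: the broken order (readLine() before other reads) ---
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeChars("2020-08-08"); // 2 bytes per char, NO newline terminator
        out.writeLong(1596870000L);
        out.writeInt(42);

        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        in.readLine(); // no '\n' anywhere -> consumes ALL remaining bytes
        System.out.println("bytes left after readLine(): " + in.available());
        try {
            in.readLong(); // nothing left to read
        } catch (EOFException e) {
            System.out.println("EOFException, same as in the stack trace");
        }

        // --- Part 2: length-prefixed strings don't depend on field order ---
        ByteArrayOutputStream buf2 = new ByteArrayOutputStream();
        DataOutputStream out2 = new DataOutputStream(buf2);
        out2.writeUTF("2020-08-08"); // writes a 2-byte length prefix first
        out2.writeLong(1596870000L);
        out2.writeInt(42);

        DataInputStream in2 = new DataInputStream(
                new ByteArrayInputStream(buf2.toByteArray()));
        System.out.println(in2.readUTF());  // "2020-08-08"
        System.out.println(in2.readLong()); // 1596870000
        System.out.println(in2.readInt());  // 42
    }
}
```

So the "working" version only works because readLine() happens to be the last read: it still swallows the date as raw bytes-as-chars, and it would break again the moment another field is added after it. Switching GroupBean to writeUTF/readUTF for the date field would remove the ordering trap entirely.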
