How the Problem Arose
I'm working on a project analyzing user behavior data. First, a quick look at the project's data!
My advisor says the data volume will only grow from here, so I need to store the data in Hive and do the computation and analysis with MapReduce. Right now we have about 200 million records, roughly 18 GB of data, so I started by writing some MapReduce test code.
My Approach to the MR Code
Since I need to compute each vehicle's daily driving time, mileage, speed, and so on, I wrote a bean containing the vehicle ID and the date, to serve as the map output key and to group records by.
```java
package com.yangyang.bean;

import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * Composite key: vehicle ID + date. Records with the same car_ID
 * are ordered by timestamp via compareTo().
 */
public class GroupBean implements WritableComparable<GroupBean> {

    private int car_ID;
    private String date;
    private long time;

    public GroupBean(int car_ID, String date, long time) {
        super();
        this.car_ID = car_ID;
        this.date = date;
        this.time = time;
    }

    // Writable requires a no-arg constructor for reflective instantiation
    public GroupBean() {
        super();
    }

    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeLong(time);
        dataOutput.writeInt(car_ID);
        // writeChars() emits 2 bytes per char, with no length prefix or terminator
        dataOutput.writeChars(date);
    }

    @Override
    public void readFields(DataInput dataInput) throws IOException {
        // fields must be read in exactly the order write() wrote them
        time = dataInput.readLong();
        car_ID = dataInput.readInt();
        // readLine() scans byte-by-byte until a newline or end of stream
        date = dataInput.readLine();
    }

    public int getCar_ID() {
        return car_ID;
    }

    public void setCar_ID(int car_ID) {
        this.car_ID = car_ID;
    }

    public String getDate() {
        return date;
    }

    public void setDate(String date) {
        this.date = date;
    }

    public long getTime() {
        return time;
    }

    public void setTime(long time) {
        this.time = time;
    }

    @Override
    public String toString() {
        return car_ID + "," + date;
    }

    @Override
    public int compareTo(GroupBean o) {
        // order by car_ID first, then by timestamp
        // (note: never returns 0, even when both fields are equal)
        int result;
        if (car_ID > o.getCar_ID()) {
            result = 1;
        } else if (car_ID < o.getCar_ID()) {
            result = -1;
        } else {
            result = time > o.getTime() ? 1 : -1;
        }
        return result;
    }
}
```
This is the key bean I wrote, and here is where things got really strange. With the way my serialization read() method was originally written, the job blew up with the following error:
```
2020-08-08 17:09:20,586 INFO [org.apache.hadoop.conf.Configuration.deprecation] - session.id is deprecated. Instead, use dfs.metrics.session-id
2020-08-08 17:09:20,588 INFO [org.apache.hadoop.metrics.jvm.JvmMetrics] - Initializing JVM Metrics with processName=JobTracker, sessionId=
2020-08-08 17:09:22,825 WARN [org.apache.hadoop.mapreduce.JobResourceUploader] - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2020-08-08 17:09:22,869 WARN [org.apache.hadoop.mapreduce.JobResourceUploader] - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2020-08-08 17:09:22,880 INFO [org.apache.hadoop.mapreduce.lib.input.FileInputFormat] - Total input paths to process : 1
2020-08-08 17:09:22,920 INFO [org.apache.hadoop.mapreduce.JobSubmitter] - number of splits:1
2020-08-08 17:09:23,010 INFO [org.apache.hadoop.mapreduce.JobSubmitter] - Submitting tokens for job: job_local1785045039_0001
2020-08-08 17:09:23,190 INFO [org.apache.hadoop.mapreduce.Job] - The url to track the job: http://localhost:8080/
2020-08-08 17:09:23,191 INFO [org.apache.hadoop.mapreduce.Job] - Running job: job_local1785045039_0001
2020-08-08 17:09:23,192 INFO [org.apache.hadoop.mapred.LocalJobRunner] - OutputCommitter set in config null
2020-08-08 17:09:23,200 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - File Output Committer Algorithm version is 1
2020-08-08 17:09:23,202 INFO [org.apache.hadoop.mapred.LocalJobRunner] - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2020-08-08 17:09:23,266 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Waiting for map tasks
2020-08-08 17:09:23,267 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Starting task: attempt_local1785045039_0001_m_000000_0
2020-08-08 17:09:23,300 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - File Output Committer Algorithm version is 1
2020-08-08 17:09:23,306 INFO [org.apache.hadoop.yarn.util.ProcfsBasedProcessTree] - ProcfsBasedProcessTree currently is supported only on Linux.
2020-08-08 17:09:23,372 INFO [org.apache.hadoop.mapred.Task] - Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@74df4484
2020-08-08 17:09:23,379 INFO [org.apache.hadoop.mapred.MapTask] - Processing split: file:/F:/java项目/20.txt:0+1793115
2020-08-08 17:09:23,430 INFO [org.apache.hadoop.mapred.MapTask] - (EQUATOR) 0 kvi 26214396(104857584)
2020-08-08 17:09:23,430 INFO [org.apache.hadoop.mapred.MapTask] - mapreduce.task.io.sort.mb: 100
2020-08-08 17:09:23,430 INFO [org.apache.hadoop.mapred.MapTask] - soft limit at 83886080
2020-08-08 17:09:23,430 INFO [org.apache.hadoop.mapred.MapTask] - bufstart = 0; bufvoid = 104857600
2020-08-08 17:09:23,430 INFO [org.apache.hadoop.mapred.MapTask] - kvstart = 26214396; length = 6553600
2020-08-08 17:09:23,433 INFO [org.apache.hadoop.mapred.MapTask] - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2020-08-08 17:09:23,741 INFO [org.apache.hadoop.mapred.LocalJobRunner] -
2020-08-08 17:09:23,741 INFO [org.apache.hadoop.mapred.MapTask] - Starting flush of map output
2020-08-08 17:09:23,741 INFO [org.apache.hadoop.mapred.MapTask] - Spilling map output
2020-08-08 17:09:23,741 INFO [org.apache.hadoop.mapred.MapTask] - bufstart = 0; bufend = 951072; bufvoid = 104857600
2020-08-08 17:09:23,742 INFO [org.apache.hadoop.mapred.MapTask] - kvstart = 26214396(104857584); kvend = 26135144(104540576); length = 79253/6553600
2020-08-08 17:09:23,760 INFO [org.apache.hadoop.mapred.MapTask] - Starting flush of map output
2020-08-08 17:09:23,760 INFO [org.apache.hadoop.mapred.MapTask] - (RESET) equator 0 kv 26214396(104857584) kvi 26135140(104540560)
2020-08-08 17:09:23,760 INFO [org.apache.hadoop.mapred.MapTask] - Spilling map output
2020-08-08 17:09:23,760 INFO [org.apache.hadoop.mapred.MapTask] - bufstart = 0; bufend = 951072; bufvoid = 104857600
2020-08-08 17:09:23,760 INFO [org.apache.hadoop.mapred.MapTask] - kvstart = 26214396(104857584); kvend = 26135144(104540576); length = 79253/6553600
2020-08-08 17:09:23,770 INFO [org.apache.hadoop.mapred.MapTask] - Ignoring exception during close for org.apache.hadoop.mapred.MapTask$NewOutputCollector@7bf9f47a
java.lang.RuntimeException: java.io.EOFException
at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:165)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1268)
at org.apache.hadoop.util.QuickSort.fix(QuickSort.java:35)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:87)
at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:63)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1600)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1489)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:2019)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:797)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readLong(DataInputStream.java:416)
at com.yangyang.bean.GroupBean.readFields(GroupBean.java:35)
at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:158)
... 16 more
2020-08-08 17:09:23,774 INFO [org.apache.hadoop.mapred.LocalJobRunner] - map task executor complete.
2020-08-08 17:09:23,777 WARN [org.apache.hadoop.mapred.LocalJobRunner] - job_local1785045039_0001
java.lang.Exception: java.lang.RuntimeException: java.io.EOFException
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: java.io.EOFException
at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:165)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1268)
at org.apache.hadoop.util.QuickSort.fix(QuickSort.java:35)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:87)
at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:63)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1600)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1489)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readLong(DataInputStream.java:416)
at com.yangyang.bean.GroupBean.readFields(GroupBean.java:35)
at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:158)
... 15 more
2020-08-08 17:09:24,193 INFO [org.apache.hadoop.mapreduce.Job] - Job job_local1785045039_0001 running in uber mode : false
2020-08-08 17:09:24,194 INFO [org.apache.hadoop.mapreduce.Job] - map 0% reduce 0%
2020-08-08 17:09:24,195 INFO [org.apache.hadoop.mapreduce.Job] - Job job_local1785045039_0001 failed with state FAILED due to: NA
2020-08-08 17:09:24,201 INFO [org.apache.hadoop.mapreduce.Job] - Counters: 10
Map-Reduce Framework
Map input records=19814
Map output records=19814
Map output bytes=951072
Map output materialized bytes=0
Input split bytes=91
Combine input records=0
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
File Input Format Counters
Bytes Read=1793115
false
Process finished with exit code 1
```
This bug had me hunting for half a day, and it felt bizarre. Searching online turned up nothing, and digging through the Hadoop source over and over didn't reveal anything either. The one solid clue is in the stack trace: readLong() throws an EOFException inside GroupBean.readFields(), called from WritableComparator.compare(), which means the keys can no longer be deserialized during the sort.
Eventually I began to suspect that reading any further data after readLine() was the problem, and sure enough, that was exactly it.
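A tiny plain-Java sketch, entirely outside Hadoop, makes the byte-level mismatch visible (this is an illustrative reconstruction I put together, not code from the project): writeChars() emits two bytes per character with no length prefix and no terminator, while readLine() converts raw bytes to chars one by one until it finds a newline or hits end of stream. The dates never contain a newline, so readLine() swallows everything that follows, and the next read lands on EOF.
```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

public class ReadLineMismatchDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);

        // Serialize two back-to-back records the same way GroupBean.write() does
        for (int i = 0; i < 2; i++) {
            out.writeLong(20200808L);      // time
            out.writeInt(100 + i);         // car_ID
            out.writeChars("2020-08-08");  // date: 2 bytes per char, no delimiter
        }

        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray()));

        System.out.println(in.readLong()); // record 1's time: fine
        System.out.println(in.readInt());  // record 1's car_ID: fine

        // readLine() (deprecated) turns each raw byte into a char and keeps
        // going until it sees '\n'/'\r' or hits EOF. There is no newline
        // anywhere, so it consumes record 1's date AND all of record 2.
        String date = in.readLine();
        System.out.println("readLine() consumed " + date.length() + " chars");

        try {
            in.readLong();                 // record 2's bytes are already gone
        } catch (EOFException e) {
            System.out.println("EOFException, just like in the MapReduce log");
        }
    }
}
```
The final readLong() fails in exactly the way the job log above does: the stream has been exhausted before the next field can be read.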
Then I changed the code to a different form, and this time it ran without any error!!!
This is driving me crazy. Does anyone know why this happens?!
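For anyone who lands here with the same symptom: as far as I can tell, the safer convention for String fields in a Writable is the writeUTF()/readUTF() pair, because writeUTF() records a two-byte length prefix before the string bytes, so readFields() knows exactly where the date ends and never has to hunt for a newline. A sketch of what GroupBean would look like in that style (an untested suggestion, not the exact change described above; constructors and getters/setters are omitted for brevity):
```java
package com.yangyang.bean;

import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Sketch: GroupBean with length-prefixed string serialization.
// Only write()/readFields() differ from the version above.
public class GroupBean implements WritableComparable<GroupBean> {
    private int car_ID;
    private String date;
    private long time;

    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeLong(time);
        dataOutput.writeInt(car_ID);
        // writeUTF() records a 2-byte length before the string bytes,
        // giving the date field an explicit boundary
        dataOutput.writeUTF(date);
    }

    @Override
    public void readFields(DataInput dataInput) throws IOException {
        // mirror write() exactly, field for field
        time = dataInput.readLong();
        car_ID = dataInput.readInt();
        date = dataInput.readUTF(); // consumes exactly what writeUTF() wrote
    }

    @Override
    public int compareTo(GroupBean o) {
        // this variant also returns 0 for fully equal keys
        int byCar = Integer.compare(car_ID, o.car_ID);
        return byCar != 0 ? byCar : Long.compare(time, o.time);
    }
}
```
With a length-prefixed field, the record boundary no longer depends on the content of the data, so the sort's deserialization can never run past the end of a key.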