MapReduce GC overhead limit exceeded

1. Background

The exception stack traces were as follows:

2015-12-23 10:44:45,289 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 71 segments left of total size: 43979223288 bytes

2015-12-23 10:44:45,372 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords

2015-12-23 11:08:39,995 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: java.lang.OutOfMemoryError: GC overhead limit exceeded

at java.util.Arrays.copyOf(Arrays.java:2271)

at java.lang.StringCoding.safeTrim(StringCoding.java:79)

at java.lang.StringCoding.access$300(StringCoding.java:50)

at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:305)

at java.lang.StringCoding.encode(StringCoding.java:344)

at java.lang.String.getBytes(String.java:916)

at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)

at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)

at java.io.File.isDirectory(File.java:843)

at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.getProcessList(ProcfsBasedProcessTree.java:495)

at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.updateProcessTree(ProcfsBasedProcessTree.java:210)

at org.apache.hadoop.mapred.Task.updateResourceCounters(Task.java:847)

at org.apache.hadoop.mapred.Task.updateCounters(Task.java:986)

at org.apache.hadoop.mapred.Task.access$500(Task.java:79)

at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:735)

at java.lang.Thread.run(Thread.java:724)

2015-12-23 11:09:39,065 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: java.lang.OutOfMemoryError: GC overhead limit exceeded

at java.io.UnixFileSystem.list(Native Method)

at java.io.File.list(File.java:1116)

at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.getProcessList(ProcfsBasedProcessTree.java:488)

at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.updateProcessTree(ProcfsBasedProcessTree.java:210)

at org.apache.hadoop.mapred.Task.updateResourceCounters(Task.java:847)

at org.apache.hadoop.mapred.Task.updateCounters(Task.java:986)

at org.apache.hadoop.mapred.Task.access$500(Task.java:79)

at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:735)

at java.lang.Thread.run(Thread.java:724)

2015-12-23 11:12:00,745 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: java.lang.OutOfMemoryError: GC overhead limit exceeded

at java.nio.HeapByteBuffer.&lt;init&gt;(HeapByteBuffer.java:57)

at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)

at sun.nio.cs.StreamDecoder.&lt;init&gt;(StreamDecoder.java:250)

at sun.nio.cs.StreamDecoder.&lt;init&gt;(StreamDecoder.java:230)

at sun.nio.cs.StreamDecoder.forInputStreamReader(StreamDecoder.java:69)

at java.io.InputStreamReader.&lt;init&gt;(InputStreamReader.java:74)

at java.io.FileReader.&lt;init&gt;(FileReader.java:72)

at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:524)

at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.updateProcessTree(ProcfsBasedProcessTree.java:223)

at org.apache.hadoop.mapred.Task.updateResourceCounters(Task.java:847)

at org.apache.hadoop.mapred.Task.updateCounters(Task.java:986)

at org.apache.hadoop.mapred.Task.access$500(Task.java:79)

at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:735)

at java.lang.Thread.run(Thread.java:724)

Dumping the heap with jmap -heap showed the following:

Attaching to process ID 92841, please wait...

Debugger attached successfully.

Server compiler detected.

JVM version is 24.0-b56

using thread-local object allocation.

Parallel GC with 23 thread(s)

Heap Configuration:

MinHeapFreeRatio = 40

MaxHeapFreeRatio = 70

MaxHeapSize = 4294967296 (4096.0MB)

NewSize = 268435456 (256.0MB)

MaxNewSize = 268435456 (256.0MB)

OldSize = 5439488 (5.1875MB)

NewRatio = 2

SurvivorRatio = 6

PermSize = 21757952 (20.75MB)

MaxPermSize = 134217728 (128.0MB)

G1HeapRegionSize = 0 (0.0MB)

Heap Usage:

PS Young Generation

Eden Space:

capacity = 201326592 (192.0MB)

used = 201326592 (192.0MB)

free = 0 (0.0MB)

100.0% used

From Space:

capacity = 33554432 (32.0MB)

used = 0 (0.0MB)

free = 33554432 (32.0MB)

0.0% used

To Space:

capacity = 33554432 (32.0MB)

used = 0 (0.0MB)

free = 33554432 (32.0MB)

0.0% used

PS Old Generation

capacity = 4026531840 (3840.0MB)

used = 4026067544 (3839.55721282959MB)

free = 464296 (0.44278717041015625MB)

99.9884690841039% used

PS Perm Generation

capacity = 32505856 (31.0MB)

used = 28384888 (27.06993865966797MB)

free = 4120968 (3.9300613403320312MB)

87.32238277312248% used

9035 interned Strings occupying 887704 bytes.

2. Root cause

Oracle's documentation explains "GC overhead limit exceeded" as follows:

The concurrent collector will throw an OutOfMemoryError if too much time is being spent in garbage collection: if more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, an OutOfMemoryError will be thrown. This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap is too small. If necessary, this feature can be disabled by adding the option -XX:-UseGCOverheadLimit to the command line.

In short: the JVM starts with -XX:+UseGCOverheadLimit enabled by default. This is a heuristic on the JVM's part: if garbage collection consumes more than 98% of total time yet recovers less than 2% of the heap, the JVM judges that an OOM is imminent and terminates the program early. The feature can be turned off with -XX:-UseGCOverheadLimit.
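These flags can be passed to the task JVMs through job configuration. A minimal sketch, assuming the standard org.apache.hadoop.mapreduce.Job API; the heap size and job name are illustrative, and note that disabling the overhead limit only delays the failure rather than fixing the underlying memory pressure:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // JVM options for each reduce task attempt; -XX:-UseGCOverheadLimit
        // turns off the heuristic described above.
        conf.set("mapreduce.reduce.java.opts", "-Xmx4096m -XX:-UseGCOverheadLimit");
        Job job = Job.getInstance(conf, "trace-aggregation"); // illustrative job name
    }
}
```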

The reduce code was as follows:

```java
public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    // Every deserialized value for this key is buffered here -- the root of the OOM.
    List<Span> dataList = new ArrayList<>();
    String name = "";
    String url = "";
    Long cost = 0L;
    String ip = "";
    long sum = 0L;
    for (Text value : values) {
        String data = value.toString();
        sum++;
        try {
            Span span = JSON.parseObject(data, Span.class);
            if (span.getSpanId().equalsIgnoreCase("0")) {
                name = span.getName();
                url = span.getFuncName();
                cost = span.getCost();
                ip = span.getLocal().getHost().replaceAll("/", "");
            }
            dataList.add(span);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    if (sum > 100) {
        System.out.println(key.toString() + ":" + sum);
    }
    if (name.length() != 0) {
        TraceData trace = new TraceData(name, url, cost, dataList);
        trace.setIp(ip);
        context.write(key, new Text(JSON.toJSONString(trace)));
        trace = null;
    }
    dataList.clear();
}
```

As the code shows, each value is converted with toString(), parsed, and buffered in dataList. When a single key carries a huge number of values, the reducer's memory comes under severe pressure: here one business key had more than 30 million values, which works out to roughly 16 GB of memory (around 570 bytes per buffered Span), far beyond the 4 GB heap shown in the jmap output. The result was constant garbage collection that could reclaim almost nothing.

3. Solution

Because the cluster marks the mapred.job.reduce.memory.mb parameter as final, users cannot override it, so the remaining options are to increase the number of reducers or to optimize the reduce-side code.

For this problem the first option clearly does not help, because all values for a single key still land on one reducer no matter how many reducers there are. After talking it through with the business team, we changed the reduce logic to cap the number of values processed per key; a sketch of the idea follows.
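A minimal sketch of such a cap, assuming a hypothetical threshold and counter name (neither appears in the original code):

```java
public static class CappedReducer extends Reducer<Text, Text, Text, Text> {
    // Hypothetical threshold -- tune it to what one reducer's heap can hold.
    private static final int MAX_VALUES_PER_KEY = 100_000;

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        List<Span> dataList = new ArrayList<>();
        for (Text value : values) {
            if (dataList.size() >= MAX_VALUES_PER_KEY) {
                // Count truncated keys instead of buffering everything.
                context.getCounter("trace", "truncated_keys").increment(1);
                break;
            }
            dataList.add(JSON.parseObject(value.toString(), Span.class));
        }
        // ... aggregate dataList and write the result as before ...
    }
}
```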

If a key really does have that many values to analyze, Hadoop's partitioning can be used to scatter the data before processing it. A quick primer: MapReduce's default strategy is HashPartitioner, which picks the destination reducer as reduce = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. To change the routing, extend the abstract base class Partitioner. The default implementation is shown below for contrast, followed by a custom partitioner that scatters records at random.
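For reference, the default HashPartitioner amounts to the following (this matches Hadoop's own implementation):

```java
import org.apache.hadoop.mapreduce.Partitioner;

// Default routing: the same key always lands on the same reducer,
// which is exactly why one hot key can overload a single reducer.
public class HashPartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```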

And a custom partitioner that scatters records at random:

```java
public static class RandomPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numReduceTasks) {
        // Scatter records that share a key across random reducers;
        // downstream processing must re-aggregate the partial results.
        return ((int) (key.hashCode() * Math.random()) & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```
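For the custom partitioner to take effect, it must be registered on the job. A minimal sketch (the job name and reducer count are illustrative):

```java
Job job = Job.getInstance(new Configuration(), "trace-aggregation"); // illustrative name
job.setPartitionerClass(RandomPartitioner.class);
job.setNumReduceTasks(200); // more reducers, so the hot key spreads thinner
```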

A closing note on Sqoop: the "GC overhead limit exceeded" error there usually has the same root cause. The collector spends nearly all its time collecting yet cannot free enough memory, typically because the job moves more data than the task heap can hold. Common remedies (Sqoop accepts Hadoop properties as -D generic options placed immediately after the tool name; the values below are illustrative):

1. Raise the container memory limits:

```shell
sqoop import -Dmapreduce.map.memory.mb=4096 -Dmapreduce.reduce.memory.mb=4096 --connect <connection-string> ...
```

2. Raise the task JVM heap:

```shell
sqoop import -Dmapreduce.map.java.opts=-Xmx4096m -Dmapreduce.reduce.java.opts=-Xmx4096m --connect <connection-string> ...
```

3. Move less data: narrow the import or export with a query or partitioning parameters so each task handles a smaller slice.

4. Tune the garbage collector through the same java.opts properties, for example:

```shell
sqoop import -Dmapreduce.map.java.opts=-XX:+UseG1GC -Dmapreduce.reduce.java.opts=-XX:+UseG1GC --connect <connection-string> ...
```

Choose whichever of these fits your situation.