问题发现:
这个问题最早是在系统日志输出里发现的,报“FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out”,服务并没有退出,仍然在响应请求,只是响应速度很慢。
<--- Last few GCs --->
[23246:0x3d1ca00] 269438356 ms: Mark-sweep 1407.7 (1460.2) -> 1407.7 (1460.2) MB, 1198.9 / 0.0 ms (+ 0.0 ms in 0 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 1339 ms) last resort GC in old space requested
[23246:0x3d1ca00] 269439537 ms: Mark-sweep 1407.7 (1460.2) -> 1407.7 (1460.2) MB, 1181.3 / 0.0 ms last resort GC in old space requested
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0x2bd3ea725891 <JSObject>
2: _addHeaderLine [_http_incoming.js:297] [bytecode=0x3a2d866bea81 offset=109](this=0x3097a24e1369 <IncomingMessage map = 0x6d4fcd9fec9>,field=0x3097a24e1731 <String[6]: accept>,value=0x3097a24a8849 <String[3]: */*>,dest=0x3097a24e1681 <Object map = 0x88d0ac823b9>)
3: _addHeaderLines [_http_incoming.js:133] [bytecode=0x3a2d866be791 offset=83](this=0x3097a24e1369 <IncomingMessage map = 0x...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
问题排查:
1) 导出堆的快照
“JavaScript heap out”这句错误信息说明系统已经内存溢出了。Nodejs 12.x开始已经内置有heapdump导出函数,不必像之前的版本那样还需要安装第三方包。然后加入如下代码:
process.on('SIGUSR2', () => {
const { writeHeapSnapshot } = require('v8');
console.log('Heap snapshot has written:', writeHeapSnapshot());
});
这段代码的作用是当进程收到SIGUSR2信号时则导出一份heapdump。导出文件示例如下,文件名里包括了年月日.时分秒.进程PID.以及该进程是第几次导出。001是服务启动后流量接入前导出的, 002-004是流量接入后导出的,005-007是流量再次断开无请求的状态下导出的。可以看出,heap在接入流量后从5M涨到了42M,断开流量一小时后仍然维持在42M左右,内存得不到回收,这就可以肯定有内存泄漏了.
-rw-r--r-- 1 root root 5130200 Jun 3 13:30 Heap.20200603.133025.5898.0.001.heapsnapshot
-rw-r--r-- 1 root root 19670958 Jun 3 14:00 Heap.20200603.140019.5898.0.002.heapsnapshot
-rw-r--r-- 1 root root 43561030 Jun 3 14:51 Heap.20200603.145104.5898.0.003.heapsnapshot
-rw-r--r-- 1 root root 44032344 Jun 3 15:06 Heap.20200603.150655.5898.0.004.heapsnapshot
-rw-r--r-- 1 root root 44001163 Jun 3 16:08 Heap.20200603.160831.5898.0.005.heapsnapshot
-rw-r--r-- 1 root root 43894074 Jun 3 16:19 Heap.20200603.161936.5898.0.006.heapsnapshot
-rw-r--r-- 1 root root 43899897 Jun 3 16:30 Heap.20200603.163002.5898.0.007.heapsnapshot
2) 对比堆的快照
把流量接入前后流量断开1小时后的两份快照导入Chrome DevTool,选择Summary,过滤条件是两个快照之间创建的对象. 按Retained Size(包括所引用的对象的总内存大小)倒排序, 可以看到排在第一位的是数组,占81%, 点看一个对象可以看这个数组的元素, 这正好是业务日志模块对应的数组, 对应的代码也附在下面.
- 4.9M Jun 3 13:30 Heap.20200603.133025.5898.0.001.heapsnapshot
- 42M Jun 3 16:19 Heap.20200603.161936.5898.0.006.heapsnapshot
版权所有,转载请注明出处 https://mp.csdn.net/console/editor/html/106571327
对应的代码:
function (param, reqContext) {
var str;
str = [
logUtil.getLogType(LOG_TYPE, LOG_VERSION),
logUtil.UTCTime(reqContext.req),
logUtil.formatValue(reqContext.reqId, '-'),
logUtil.formatValue(hostID, '-'),
process.pid,
reqContext.user.ip,
logUtil.formatValue(reqContext.user.mac, '-'),
logUtil.formatValue(param.partenerName, '-'),
logUtil.formatValue(param.cookie_value, '-'),
logUtil.formatValue(param.partener_cookie_value, '-'),
'"' + reqContext.req.headers['user-agent'] + '"'
];
return logger.log(str);
};
3)问题定位
把 logger.log(str)这句代码注释掉,重新再测试,发现内存溢出的问题没有了. 所以问题就定位到这儿了.继续往下排查,logger是我们自行开发的一个日志模块,一条日志可以同时打印到多个appender. 其中有一个memory_appender目的是方便自动化测试时回读和校验业务日志输出, 它把日志存在内存里, 在一组测试用例执行完毕时外部通过调用clear()方法来清除内存. 线上环境本来是不能有这个appender的,但是配置的时候人为配错了,把它配上线了。memory_appender的代码如下, 因为MemoryAppender是在服务启动时一次性初始化的全局变量,所以this.data也就是个全局数组。
MemoryAppender = (function(_super) {
__extends(MemoryAppender, _super);
function MemoryAppender(params) {
this.data = [];
return;
}
MemoryAppender.prototype.append = function(loggerName, level, msg) {
var key, newMsg, _i, _len;
newMsg = [];
if (util.isArray(msg)) {
for (_i = 0, _len = msg.length; _i < _len; _i++) {
key = msg[_i];
if (key != null ? typeof key.toString === "function" ? key.toString() : void 0 : void 0) {
newMsg.push(key.toString());
}
}
this.data.push(newMsg);
} else {
this.data.push([msg]);
}
};
return MemoryAppender;
})(AbstractAppender);
MemoryAppender.prototype.isEmpty = function() {
return this.data.length === 0;
};
MemoryAppender.prototype.get = function() {
return this.data;
};
MemoryAppender.prototype.clear = function() {
this.data = [];
};
exports.MemoryAppender = MemoryAppender;
}).call(this);
总结:
为了自动化测试写的功能模块也不能马虎,也得考虑到内存溢出等问题,因为完全有可能因为人为错误把测试环境下的模块配置到生产环境上.
相关文章:
《Nodejs: MemoryUsage()返回的rss,heapTotal,heapUsed,external的含义和区别》