最近在用NODE.JS做项目,发现每当一到将近零点,服务器总会崩溃并且生成core文件。
一开始我像调试C++代码一样,用gdb加载core文件,输入bt,发现堆栈显示的都是v8的底层C++函数,完全找不到js函数的影子,所以问题也难以定位。
于是我只好在有可能崩溃的地方都加了try catch,但终究无济于事,每天照旧崩溃,百度搜了很久也没发现有相关的资料,最后折腾了许久终于在Google上发现了llnode这个lldb调试器的插件,并依靠此插件轻松地找出并解决了问题。
但这个插件有个缺点就是对RedHat/Fedora/CentOS的支持很差,而我平时又一般使用CentOS 6.9来做开发,我尝试了几遍都报错并崩溃退出(不知道是不是因为我使用SecureCRT的原因,lldb显示了两个相同的提示符):
(lldb) (lldb) v8 bt
Stack dump:
0. HandleCommand(command = "v8 bt")
段错误 (core dumped)
最后我只好按官方的说明,另外装了个Ubuntu 16.04虚拟机来调试,实验证明llnode可以在Ubuntu上很好地运行,于是现在唯一的问题就是,如果要调试core文件,必须将文件复制到虚拟机里面,并且确定node的版本与CentOS里的一致才可以成功调试。
lldb可以直接使用apt-get install lldb安装,编译llnode的过程我在这里就不再赘述,大家可以参考官方网站的介绍:
https://github.com/nodejs/llnode
我在编译的过程中基本没遇上什么坑,比较顺利。编译完毕之后,把生成的llnode.so文件复制到/usr/lib/llnode.so,然后修改~/.lldbinit文件(文件不存在的需要新增一个),新增一行plugin load /usr/lib/llnode.so,这样每次启动lldb时都可以自动加载llnode插件。
下面写一个简单的测试例子,代码保存为test.js
'use strict';
function ff(fa) {
for (let i = 0; i < 999999; i++) {
fa.push(i);
}
setTimeout(() => { ff(fa); }, 1);
}
var fa = [ 1,2,3,4,5 ];
ff(fa);
修改/etc/sysctl.conf,增加两行:
kernel.core_uses_pid = 1
kernel.core_pattern = core.%e.%p
修改/etc/profile,增加一行:
ulimit -c unlimited
执行以下命令:
sysctl -p
source /etc/profile
node test.js
过几秒后,会报出类似于下面的错误:
<--- Last few GCs --->
[4314:0x3c1d690] 3957 ms: Mark-sweep 667.3 (674.8) -> 667.3 (674.8) MB, 119.3 / 0.0 ms allocation failure GC in old space requested
[4314:0x3c1d690] 4090 ms: Mark-sweep 667.3 (674.8) -> 667.3 (671.8) MB, 132.8 / 0.0 ms last resort GC in old space requested
[4314:0x3c1d690] 4211 ms: Mark-sweep 667.3 (671.8) -> 667.3 (671.8) MB, 120.4 / 0.0 ms last resort GC in old space requested
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0xcefec8a5879 <JSObject>
1: ff(aka ff) [/root/test.js:~3] [pc=0x2f52c9c8992c](this=0x3adfad5022d1 <undefined>,fa=0x39901e889939 <JSArray[86956787]>)
2: /* anonymous */ [/root/test.js:8] [bytecode=0x1fcc017eaf41 offset=11](this=0x215eecfbc7c1 <Timeout map = 0x199858944571>)
3: ontimeout(aka ontimeout) [timers.js:498] [bytecode=0x1fcc017eab21 offset=45](this=0x3adfad5022d1 <undefined>,timer=0x215eecfbc7c1 <Time...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
1: node::Abort() [node]
2: 0x8c21ec [node]
3: v8::Utils::ReportOOMFailure(char const*, bool) [node]
4: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [node]
5: v8::internal::Factory::NewUninitializedFixedArray(int) [node]
6: 0xd4aff3 [node]
7: v8::internal::Runtime_GrowArrayElements(int, v8::internal::Object**, v8::internal::Isolate*) [node]
8: 0x2f52c9b042fd
已放弃 (核心已转储)
并且在当前目录中会生成一个名为core.node.(数字)的文件,从以上js代码可以看出,这是由于不断往fa数组中push整数,导致内存不足node崩溃。
我们使用lldb加载core文件
# lldb `which node` -c core.node.12345
(lldb) target create --core "core.node.12345"
Core file '/root/core.node.12345' (x86_64) was loaded.
(lldb) bt
* thread #1, name = 'node', stop reason = signal SIGABRT
* frame #0: libc.so.6`__GI_raise(sig=6) at raise.c:54
frame #1: libc.so.6`__GI_abort at abort.c:89
frame #2: 0x00007fff39c701b1 nodejs`node::Abort() + 33
frame #3: 0x00007fff39c701ec nodejs`node::OnFatalError(char const*, char const*) + 44
frame #4: 0x00007fff39ddd342 nodejs`v8::Utils::ReportOOMFailure(char const*, bool) + 114
frame #5: 0x00007fff39ddd554 nodejs`v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) + 484
frame #6: 0x00007fff3a1288cc nodejs`v8::internal::Factory::NewUninitializedFixedArray(int) + 220
frame #7: 0x00007fff3a0f8ff3 nodejs`v8::internal::(anonymous namespace)::ElementsAccessorBase<v8::internal::(anonymous namespace)::FastPackedSmiElementsAccessor, v8::internal::(anonymous namespace)::ElementsKindTraits<(v8::internal::ElementsKind)0> >::GrowCapacity(v8::internal::Handle<v8::internal::JSObject>, unsigned int) + 163
frame #8: 0x00007fff3a37f15d nodejs`v8::internal::Runtime_GrowArrayElements(int, v8::internal::Object**, v8::internal::Isolate*) + 173
frame #9: 0x00002f52c9b042fd
frame #10: 0x00002f52c9c8992c
frame #11: 0x00002f52c9bbd1d6
frame #12: 0x00002f52c9bbd1d6
frame #13: 0x00002f52c9bbd1d6
frame #14: 0x00002f52c9bbd1d6
frame #15: 0x00002f52c9b04239
frame #16: 0x00002f52c9b04101
frame #17: 0x00007fff3a11d94a nodejs`v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) + 266
frame #18: 0x00007fff39df08f3 nodejs`v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) + 355
frame #19: 0x00007fff39c75011 nodejs`node::InternalMakeCallback(node::Environment*, v8::Local<v8::Object>, v8::Local<v8::Function>, int, v8::Local<v8::Value>*, node::async_context) + 177
frame #20: 0x00007fff39c51498 nodejs`node::AsyncWrap::MakeCallback(v8::Local<v8::Function>, int, v8::Local<v8::Value>*) + 120
frame #21: 0x00007fff39cfdf9f nodejs`node::(anonymous namespace)::TimerWrap::OnTimeout(uv_timer_s*) + 127
frame #22: nodejs`uv__run_timers(loop=<unavailable>) at timer.c:165
frame #23: nodejs`uv_run(loop=<unavailable>, mode=<unavailable>) at core.c:359
frame #24: 0x00007fff39c8243d nodejs`node::Start(uv_loop_s*, int, char const* const*, int, char const* const*) + 1197
frame #25: 0x00007fff39c7a8fd nodejs`node::Start(int, char**) + 365
frame #26: libc.so.6`__libc_start_main(main=(nodejs`main), argc=2, argv=0x00007fff39757b48, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fff39757b38) at libc-start.c:291
frame #27: 0x00007fff39c491b1 nodejs`_start + 41
可以看到,frame 9 ~ 16都是空的,lldb并不能解析出js堆栈,我们使用llnode插件试下:
(lldb) v8 bt
* thread #1: tid = 4314, 0x00007f5586980428 libc.so.6`__GI_raise(sig=6) at raise.c:54, name = 'node', stop reason = signal SIGABRT
* frame #0: libc.so.6`__GI_raise(sig=6) at raise.c:54
frame #1: libc.so.6`__GI_abort at abort.c:89
frame #2: 0x00007fff39c701b1 nodejs`node::Abort() + 33
frame #3: 0x00007fff39c701ec nodejs`node::OnFatalError(char const*, char const*) + 44
frame #4: 0x00007fff39ddd342 nodejs`v8::Utils::ReportOOMFailure(char const*, bool) + 114
frame #5: 0x00007fff39ddd554 nodejs`v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) + 484
frame #6: 0x00007fff3a1288cc nodejs`v8::internal::Factory::NewUninitializedFixedArray(int) + 220
frame #7: 0x00007fff3a0f8ff3 nodejs`v8::internal::(anonymous namespace)::ElementsAccessorBase<v8::internal::(anonymous namespace)::FastPackedSmiElementsAccessor, v8::internal::(anonymous namespace)::ElementsKindTraits<(v8::internal::ElementsKind)0> >::GrowCapacity(v8::internal::Handle<v8::internal::JSObject>, unsigned int) + 163
frame #8: 0x00007fff3a37f15d nodejs`v8::internal::Runtime_GrowArrayElements(int, v8::internal::Object**, v8::internal::Isolate*) + 173
frame #9: 0x00002f52c9b042fd <exit>
frame #10: 0x00002f52c9c8992c ff(this=0x00003adfad5022d1:<undefined>, 0x000039901e889939:<Array: length=86956787>) at /root/test.js:3:12 fn=0x000039901e889991
frame #11: 0x00002f52c9bbd1d6 setTimeout(this=0x0000215eecfbc7c1:<Object: Timeout>) at /root/test.js:8:13 fn=0x0000215eecfbc779
frame #12: 0x00002f52c9bbd1d6 ontimeout(this=0x00003adfad5022d1:<undefined>, 0x0000215eecfbc7c1:<Object: Timeout>) at timers.js:98:18 fn=0x00001fcc017982e9
frame #13: 0x00002f52c9bbd1d6 tryOnTimeout(this=0x00003adfad5022d1:<undefined>, 0x0000215eecfbc7c1:<Object: Timeout>, 0x0000215eecfbc8a1:<Object: TimersList>) at timers.js:98:18 fn=0x00001fcc017981c9
frame #14: 0x00002f52c9bbd1d6 listOnTimeout(this=0x0000215eecfbc949:<Object: Timer>) at timers.js:98:18 fn=0x00001fcc01798181
frame #15: 0x00002f52c9b04239 <internal>
frame #16: 0x00002f52c9b04101 <entry>
frame #17: 0x00007fff3a11d94a nodejs`v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) + 266
frame #18: 0x00007fff39df08f3 nodejs`v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) + 355
frame #19: 0x00007fff39c75011 nodejs`node::InternalMakeCallback(node::Environment*, v8::Local<v8::Object>, v8::Local<v8::Function>, int, v8::Local<v8::Value>*, node::async_context) + 177
frame #20: 0x00007fff39c51498 nodejs`node::AsyncWrap::MakeCallback(v8::Local<v8::Function>, int, v8::Local<v8::Value>*) + 120
frame #21: 0x00007fff39cfdf9f nodejs`node::(anonymous namespace)::TimerWrap::OnTimeout(uv_timer_s*) + 127
frame #22: nodejs`uv__run_timers(loop=<unavailable>) at timer.c:165
frame #23: nodejs`uv_run(loop=<unavailable>, mode=<unavailable>) at core.c:359
frame #24: 0x00007fff39c8243d nodejs`node::Start(uv_loop_s*, int, char const* const*, int, char const* const*) + 1197
frame #25: 0x00007fff39c7a8fd nodejs`node::Start(int, char**) + 365
frame #26: libc.so.6`__libc_start_main(main=(nodejs`main), argc=2, argv=0x00007fff39757b48, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fff39757b38) at libc-start.c:291
frame #27: 0x00007fff39c491b1 nodejs`_start + 41
使用llnode插件后,可以清楚看到js代码中崩溃的位置是
frame #10: 0x00002f52c9c8992c ff(this=0x00003adfad5022d1:<undefined>, 0x000039901e889939:<Array: length=86956787>) at /root/test.js:3:12 fn=0x000039901e889991
fa中已经push了86956787个元素!
llnode插件还有很多其他强大的功能,可以很快速方便地通过core文件定位到出错的位置和原因,更多的使用方法可以参考官方网站。