Server became hung when using Trend Micro real-time scanner with gluster storage.

环境

  • Red Hat Enterprise Linux (RHEL) 6.x
  • Red Hat Enterprise Linux (RHEL) 7.x
  • Using the native Gluster Clients

问题

  • Gluster clients are getting hang.
  • The following errors were reported:

    Raw

    [11430(bash)]: gsch_scan(1179828,1,0) - interrupted & wait(1000)
    [11430(bash)]: gsch_scan(1179828,1,0) - interrupted & wait(1000): timeout = -512
    
  • Trend Micro Deep Security Agent is tainting the kernel, dsa_filter P(U), redirfs (U), gsch (U)

决议

Please contact Trend Micro with this issue. Trend Micro will request both the vmcore and sosreport.

Workaround

One possible workaround from Trend Micro is to exclude gluster binaries from their Real-Time Scanner:

  • On the Deep Security Manager (DSM) console, go to the security profile or policy assigned to the machine.
  • Go to Anti-malware > Real-time Scan > Exclusions tab.
  • Under Directory List, click Create New or Edit the existing exclusions.
  • Add the following entries:
    • /usr/sbin/glusterfs
    • /usr/sbin/glusterfsd
  • Then save the changes.

根源

Currently, Trend Micro Deep Security Agent is not compatible with Gluster filesystem.

诊断步骤

The following vmcore was captured during slow system performance:

Raw

          KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/2.6.32-642.11.1.el6.x86_64/vmlinux
        DUMPFILE: /crash/vmcore  [PARTIAL DUMP]
            CPUS: 8
            DATE: Wed Jul 26 05:05:57 2017
          UPTIME: 03:13:51
    LOAD AVERAGE: 0.03, 0.02, 0.00
           TASKS: 2326
        NODENAME: gluster-client.example.com
         RELEASE: 2.6.32-642.11.1.el6.x86_64
         VERSION: #1 SMP Wed Oct 26 10:25:23 EDT 2016
         MACHINE: x86_64  (2197 Mhz)
          MEMORY: 128 GB
           PANIC: "SysRq : Trigger a crash"
             PID: 6503
         COMMAND: "bash"
            TASK: ffff881edbdb6ab0  [THREAD_INFO: ffff881edbddc000]
             CPU: 4
           STATE: TASK_RUNNING (SYSRQ)

The following Trend Micro kernel modules were loaded

Raw

    crash> mod -t
    NAME        TAINTS
    dsa_filter  P(U)
    redirfs     (U)
    gsch        (U)

Raw

    crash> sys -t
    TAINTED_MASK: 1  P

The following dmesg errors were found:

Raw

    [11430(bash)]: gsch_scan(1179828,1,0) - interrupted & wait(1000)
    [11430(bash)]: gsch_scan(1179828,1,0) - interrupted & wait(1000): timeout = -512
    [11452(bash)]: gsch_scan(1179671,1,0) - interrupted & wait(1000)
    [11452(bash)]: gsch_scan(1179671,1,0) - interrupted & wait(1000): timeout = -512
    [11455(bash)]: gsch_scan(535633,1,0) - interrupted & wait(1000)
    [11455(bash)]: gsch_scan(535633,1,0) - interrupted & wait(1000): timeout = -512
    [11485(bash)]: gsch_scan(1179671,1,0) - interrupted & wait others(1000)
    [11485(bash)]: gsch_scan(1179671,1,0) - interrupted & wait others(1000): timeout = -512
    [11678(bash)]: gsch_scan(1179671,1,0) - interrupted & wait others(1000)
    [11678(bash)]: gsch_scan(1179671,1,0) - interrupted & wait others(1000): timeout = -512
    [11794(bash)]: gsch_scan(1179671,1,0) - interrupted & wait others(1000)
    [11794(bash)]: gsch_scan(1179671,1,0) - interrupted & wait others(1000): timeout = -512
    [11806(bash)]: gsch_scan(1179828,1,0) - interrupted & wait others(1000)
    [11806(bash)]: gsch_scan(1179828,1,0) - interrupted & wait others(1000): timeout = -512
    SysRq : Trigger a crash
    BUG: unable to handle kernel NULL pointer dereference at (null)

The following shows a huge number of processes in the sleep state S (IN) currently waiting for 'gsch' module to return.

Raw

crash> foreach IN bt | grep ' #1' | grep gsch | wc -l
979

Some were even there for more than 2 hours.

Raw

crash> ps -m 9526
[0 02:12:40.030] [IN]  PID: 9526   TASK: ffff881d771a4ab0  CPU: 1   COMMAND: "siebmtshmw"
crash> bt 9526
PID: 9526   TASK: ffff881d771a4ab0  CPU: 1   COMMAND: "siebmtshmw"
 #0 [ffff881d2dba7a48] schedule at ffffffff81548980
 #1 [ffff881d2dba7b20] gsch_scan at ffffffffa04a9efc [gsch]
 #2 [ffff881d2dba7bf0] gsch_policy_handle_open at ffffffffa04a81b6 [gsch]
 #3 [ffff881d2dba7c00] gsch_flt_open at ffffffffa04a679c [gsch]
 #4 [ffff881d2dba7c50] rfs_precall_flts at ffffffffa0492aa0 [redirfs]
 #5 [ffff881d2dba7c80] rfs_open at ffffffffa048b323 [redirfs]
 #6 [ffff881d2dba7d30] __dentry_open at ffffffff81196c52
 #7 [ffff881d2dba7d90] nameidata_to_filp at ffffffff81196fc4
 #8 [ffff881d2dba7db0] do_filp_open at ffffffff811aceb0
 #9 [ffff881d2dba7f20] do_sys_open at ffffffff811969f7
#10 [ffff881d2dba7f70] compat_sys_open at ffffffff811eeb4a
#11 [ffff881d2dba7f80] sysenter_dispatch at ffffffff8105b080
    RIP: 0000000055555430  RSP: 0000000080dfd738  RFLAGS: 00000296
    RAX: 0000000000000005  RBX: ffffffff8105b080  RCX: 0000000000000000
    RDX: 00000000000001b6  RSI: 000000000060b40b  RDI: 0000000000000002
    RBP: 0000000080dfd798   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: ffffffff811eeb4a
    R13: ffff881d2dba7f78  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 0000000000000005  CS: 0023  SS: 002b

From the above 979 processes, randomly picked the following process.

Raw

crash> ps -m 9538
[0 02:12:25.296] [IN]  PID: 9538   TASK: ffff881d4605f520  CPU: 1   COMMAND: "siebmtshmw"
crash> bt 9538
PID: 9538   TASK: ffff881d4605f520  CPU: 1   COMMAND: "siebmtshmw"
 #0 [ffff881d2dbefa48] schedule at ffffffff81548980
 #1 [ffff881d2dbefb20] gsch_scan at ffffffffa04a9efc [gsch]
 #2 [ffff881d2dbefbf0] gsch_policy_handle_open at ffffffffa04a81b6 [gsch]
 #3 [ffff881d2dbefc00] gsch_flt_open at ffffffffa04a679c [gsch]
 #4 [ffff881d2dbefc50] rfs_precall_flts at ffffffffa0492aa0 [redirfs]
 #5 [ffff881d2dbefc80] rfs_open at ffffffffa048b323 [redirfs]
 #6 [ffff881d2dbefd30] __dentry_open at ffffffff81196c52
 #7 [ffff881d2dbefd90] nameidata_to_filp at ffffffff81196fc4
 #8 [ffff881d2dbefdb0] do_filp_open at ffffffff811aceb0
 #9 [ffff881d2dbeff20] do_sys_open at ffffffff811969f7
#10 [ffff881d2dbeff70] compat_sys_open at ffffffff811eeb4a
#11 [ffff881d2dbeff80] sysenter_dispatch at ffffffff8105b080
    RIP: 0000000055555430  RSP: 0000000082dfb738  RFLAGS: 00000296
    RAX: 0000000000000005  RBX: ffffffff8105b080  RCX: 0000000000000000
    RDX: 00000000000001b6  RSI: 000000000060b40b  RDI: 0000000000000002
    RBP: 0000000082dfb798   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: ffffffff811eeb4a
    R13: ffff881d2dbeff78  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 0000000000000005  CS: 0023  SS: 002b

The above process is trying to open '/etc/resolve.conf', but redirfs' intercepted the original 'open()' call and called 'gsch_scan()' function in 'gsch' module and this went to sleep with S state.

Raw

crash> do_filp_open
do_filp_open = $4 = 
 {struct file *(int, struct filename *, int, int, int)} 0xffffffff811ac7e0 <do_filp_open>

Raw

crash> dis -lr ffffffff811969f7 | tail
0xffffffff811969dd <do_sys_open+0x4d>:  js     0xffffffff81196a87 <do_sys_open+0xf7>
/usr/src/debug/kernel-2.6.32-642.11.1.el6/linux-2.6.32-642.11.1.el6.x86_64/fs/open.c: 907
0xffffffff811969e3 <do_sys_open+0x53>:  xor    %r8d,%r8d
0xffffffff811969e6 <do_sys_open+0x56>:  mov    %r15d,%ecx
0xffffffff811969e9 <do_sys_open+0x59>:  mov    %r14d,%edx
0xffffffff811969ec <do_sys_open+0x5c>:  mov    %r13,%rsi
0xffffffff811969ef <do_sys_open+0x5f>:  mov    %r12d,%edi
0xffffffff811969f2 <do_sys_open+0x62>:  callq  0xffffffff811ac7e0 <do_filp_open>
/usr/src/debug/kernel-2.6.32-642.11.1.el6/linux-2.6.32-642.11.1.el6.x86_64/fs/open.c: 908
0xffffffff811969f7 <do_sys_open+0x67>:  cmp    $0xfffffffffffff000,%rax

Raw

/usr/src/debug/kernel-2.6.32-642.11.1.el6/linux-2.6.32-642.11.1.el6.x86_64/fs/namei.c: 2168
0xffffffff811ac7e0 <do_filp_open>:      push   %rbp
0xffffffff811ac7e1 <do_filp_open+0x1>:  mov    %rsp,%rbp
0xffffffff811ac7e4 <do_filp_open+0x4>:  sub    $0x160,%rsp
0xffffffff811ac7eb <do_filp_open+0xb>:  mov    %rbx,-0x28(%rbp)
0xffffffff811ac7ef <do_filp_open+0xf>:  mov    %r12,-0x20(%rbp)
0xffffffff811ac7f3 <do_filp_open+0x13>: mov    %r13,-0x18(%rbp)

Raw

crash> bt -f | grep do_sys_open -B 1
    ffff881d2dba7f18: ffff881d2dba7f68 ffffffff811969f7 
 #9 [ffff881d2dba7f20] do_sys_open at ffffffff811969f7
crash> px 0xffff881d2dba7f18-0x18
$5 = 0xffff881d2dba7f00
crash> rd 0xffff881d2dba7f00
ffff881d2dba7f00:  ffff881d2d932000                    . .-....

Raw

crash> struct filename ffff881d2d932000
struct filename {
  name = 0xffff881d2d932020 "/etc/resolv.conf", 
  uptr = 0x60d18c <Address 0x60d18c out of bounds>, 
  aname = 0xffff881db791c460, 
  separate = 0x0
}

This needs to be further investigated by gsch module provider (Trend Micro).

 

转载至https://access.redhat.com/solutions/3227401

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值