resin 监听线程数配置不当导致loadavg升高

背景

resin 是一款java实现的web服务器,在使用resin提供服务的时候发现:在低流量的情况下偶尔会出现高负载的情况

挑战的点:没有流量但是有负载抖动

先说结论

  1. 用户resin 网络线程配置过大(配置了2000)
  2. resin网络线程的行为是 poll -> accept
  3. 单个请求到达时,所有网络线程由于poll返回被唤醒,但是只有一个线程能够调用accept成功,类似于惊群
  4. 由于3,导致突增很多R进程,导致负载升高

确认是R进程导致的过程

查看D进程数量

部署/tmp/cpu.sh

#!/bin/bash

date 2>&1 >> /tmp/pid_D.log
/usr/bin/ps -aux | awk '$8 == "D"'   >>  /tmp/pid_D.log

在crontab中每分钟执行一次
查看/tmp/pid_D.log,几乎没有D进程,得出结论:当前loadavg主要由R构成

查看R进程调度情况

r进程是被唤醒的进程,通过ps查看属于一个采样动作,不准确,于是打开内核trace点,通过trace点能够100%采集唤醒动作

#!/bin/bash

echo  'comm == "java"' > /sys/kernel/debug/tracing/events/sched/sched_wakeup_new/filter
echo  'comm == "java"' > /sys/kernel/debug/tracing/events/sched/sched_wakeup/filter
echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup_new/enable
echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable

nohup cat /sys/kernel/debug/tracing/trace_pipe > wakeup_trace.log &

打开sched_wakeup_new sched_wakeup两个trace点,输出到wakeup_trace.log中
使用work.py分析输出文件,按照固定时间段统计被唤醒的进程个数

from datetime import datetime

def calc_date(t):
    boot_time = 1666330944.762007849
    """
        date | cat > /dev/null # Make sure programs are in cache
        set -- $(date +%s.%N) $(cat /proc/uptime)
        echo "$1-$2" | bc
    """
    return datetime.fromtimestamp(boot_time + t)

def main():
    input = 'wakeup_trace.log'
    index = 0
    s = 0
    interval = 50 #ms
    dic = {}
    diff = 0
    for line in open(input):
        try:
            arr = line.split(':')
            #print line.split()
            pid = line.split()[6]
            t = int(float(arr[0].split(' ')[-1])*1000)
            cindex = t / interval
            t1 = cindex*interval/1000.
            if cindex != index:
                print calc_date(t1), ' ', t1, 'per', interval, 'ms:', s, ' diff cnt: ', diff
                s = 0
                diff = 0
                index = cindex
                dic = {}
            else:
                if not dic.get(pid, False):
                    dic[pid] = True
                    diff += 1
                s += 1
        except Exception, e:
            # print e
            pass

if '__main__' == __name__:
    main()

以50ms为周期统计, 输出到sched_a_out文件中

通过这样的手段分析出确实是进程过多唤醒导致高负载

现象复现

  1. 下载resin
wget https://caucho.com/download/resin-pro-4.0.66.tar.gz
  1. 在resin-pro-4.0.66/webapps/ROOT/下实现一个测试用的页面time.jsp
<html><body><center>
Now time is: <%=new java.util.Date()%>
</center></body></html>
  1. 在resin-pro-4.0.66目录执行下面的命令启/停resin
java -jar lib/resin.jar start
java -jar lib/resin.jar stop
  1. 通过修改conf/resin.properties中的配置项来修改监听线程数量,如下下是官方配置
# Throttle the number of active threads for a port
port_thread_max   : 256
accept_thread_max : 32
accept_thread_min : 4
# keepalive_max   : 512
  1. 修改配置将accept_thread_max accept_thread_min都调整到比较大的数值,比如200
# Throttle the number of active threads for a port
port_thread_max   : 256
accept_thread_max : 200
accept_thread_min : 200
# keepalive_max   : 512
  1. 使用ab进行压测, -c 参数使用1,表示同时只有一个请求发生,用户就是在这种低流量的场景发现负载升高,我们看看能否复现
ab -n 10000000 -c 1 http://127.0.0.1:8080/time.jsp
  1. uptime查看负载确认复现(当前使用的测试机为2c2g)
[root@VM-0-13-centos ~]# uptime
 11:01:41 up 30 days, 12:20,  2 users,  load average: 57.34, 15.76, 5.40

8.修改配置将accept_thread_max accept_thread_min都调整到1 负载降低下来

[root@VM-0-13-centos ~]# uptime
 11:06:27 up 30 days, 12:24,  2 users,  load average: 5.04, 17.69, 10.36

原因分析

strace角度

  1. 在使用200线程时
    1. 找到“-server com.caucho.server.resin.Resin” 以这个为参数的java进程进行strace,关注accept和poll
     strace -p 1016685 -f -yy -e accept,poll
    
    1. 可以看到在200线程时,有很多线程被poll唤醒, 有很多accpet调用,但是只用一个accept成功接受到socket
    [pid 1016758] <... poll resumed>)       = 1 ([{fd=45, revents=POLLIN}])
    [pid 1016757] <... poll resumed>)       = 1 ([{fd=45, revents=POLLIN}])
    [pid 1016756] <... poll resumed>)       = 1 ([{fd=45, revents=POLLIN}])
    [pid 1016755] <... poll resumed>)       = 1 ([{fd=45, revents=POLLIN}])
    [pid 1016754] <... poll resumed>)       = 1 ([{fd=45, revents=POLLIN}])
    ...
    
    有很多accept
    [pid 1016733] accept(45<TCPv6:[[::]:8080]>,  <unfinished ...>
    [pid 1016732] accept(45<TCPv6:[[::]:8080]>,  <unfinished ...>
    [pid 1016731] accept(45<TCPv6:[[::]:8080]>,  <unfinished ...>
    [pid 1016730] accept(45<TCPv6:[[::]:8080]>,  <unfinished ...>
    [pid 1016729] accept(45<TCPv6:[[::]:8080]>,  <unfinished ...>
    [pid 1016724] accept(45<TCPv6:[[::]:8080]>,  <unfinished ...>
    [pid 1016723] accept(45<TCPv6:[[::]:8080]>,  <unfinished ...>
    [pid 1016721] accept(45<TCPv6:[[::]:8080]>,  <unfinished ...>
    [pid 1016720] accept(45<TCPv6:[[::]:8080]>,  <unfinished ...>
    
    只有一个accept调用成功
    [pid 1017179] <... accept resumed>{sa_family=AF_INET6, sin6_port=htons(37432), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_scope_id=0}, [28]) = 60<TCPv6:[[::ffff:127.0.0.1]:8080->[::ffff:127.0.0.1]:37432]>
    
    其他都失败
    [pid 1016967] <... accept resumed>0x7fd965352100, [28]) = -1 EAGAIN (资源暂时不可用)
    [pid 1016966] <... accept resumed>0x7fd965453340, [28]) = -1 EAGAIN (资源暂时不可用)
    [pid 1016964] <... accept resumed>0x7fd965654f80, [28]) = -1 EAGAIN (资源暂时不可用)
    ...
    
  2. 在使用单个监听线程的时候, 可以看到每一个成功的poll都对应一个成功的accept
    [pid 1019707] poll([{fd=59<TCPv6:[[::ffff:127.0.0.1]:8080->[::ffff:127.0.0.1]:39736]>, events=POLLIN|POLLERR}], 1, 120000) = 1 ([{fd=59, revents=POLLIN}])
    [pid 1019697] poll([{fd=45<TCPv6:[[::]:8080]>, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=45, revents=POLLIN}])
    [pid 1019697] accept(45<TCPv6:[[::]:8080]>, {sa_family=AF_INET6, sin6_port=htons(39738), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_scope_id=0}, [28]) = 59<TCPv6:[[::ffff:127.0.0.1]:8080->[::ffff:127.0.0.1]:39738]>
    

从以上的信息来看,是因为resin的设计是“accept_thread_max accept_thread_min控制同时poll的线程数量,每一个poll成功之后会调用accept”,这就导致如果线程数量过多会造成“小流量唤醒很多线程的poll调用,但是真正只有很少的accept调用成功”,可以看成是一种“惊群”的现象。

用户配置2000线程的原因

用户误以为resin的网络线程的处理方式是串行的,也就是“在一个请求未处理完之前不会处理下一个请求“,这个认知已经通过如下的实验帮助用户纠正过来。

  1. 使用一个监听线程
  2. ab -c使用20进行压测
    [root@VM-0-13-centos ~]#  ab -n 1000 -c 20 http://127.0.0.1:8080/time.jsp
    ...
    Concurrency Level:      20
    Time taken for tests:   0.215 seconds
    Complete requests:      1000
    Failed requests:        0
    Total transferred:      348000 bytes
    HTML transferred:       91000 bytes
    Requests per second:    4649.93 [#/sec] (mean)
    Time per request:       4.301 [ms] (mean)
    Time per request:       0.215 [ms] (mean, across all concurrent requests)
    Transfer rate:          1580.25 [Kbytes/sec] received
    
    从输出可以看出, 单个监听线程每秒可以扛住4649个request,所以一定不是串行模式,而应该是多路io复用的模式
  3. 真正影响业务并发线程数的参数是
    # port_thread_max restricts the active threads available for a single port.
    port_thread_max   : 256
    

对于高负载定位的快捷办法

使用loadavg_tracer

insmod loadavg_tracer.ko load_threshold=20 dump_interval=5

上面的命令会在load高于20的时候打印R/D进程的栈

[2640203.171984] R [001] resin-port-8080 1025456
[2640203.172912]
                  poll_schedule_timeout.constprop.17+0x46/0x70
                  do_sys_poll+0x46b/0x500
                  __x64_sys_poll+0x3b/0x150
                  do_syscall_64+0x4d/0x150
                  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[2640203.177809] R [001] resin-port-8080 1025457
[2640203.178725]
                  poll_schedule_timeout.constprop.17+0x46/0x70
                  do_sys_poll+0x46b/0x500
                  __x64_sys_poll+0x3b/0x150
                  do_syscall_64+0x4d/0x150
                  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[2640203.183616] R [000] resin-port-8080 1025459
[2640203.184535]
                  poll_schedule_timeout.constprop.17+0x46/0x70
                  do_sys_poll+0x46b/0x500
                  __x64_sys_poll+0x3b/0x150
                  do_syscall_64+0x4d/0x150
                  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[2640203.189394] R [001] resin-port-8080 1025460
[2640203.190102]
                  poll_schedule_timeout.constprop.17+0x46/0x70
                  do_sys_poll+0x46b/0x500
                  __x64_sys_poll+0x3b/0x150
                  do_syscall_64+0x4d/0x150
                  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[2640203.193815] R [001] resin-port-8080 1025461
[2640203.194624]
                  poll_schedule_timeout.constprop.17+0x46/0x70
                  do_sys_poll+0x46b/0x500
                  __x64_sys_poll+0x3b/0x150
                  do_syscall_64+0x4d/0x150
                  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[2640203.200751] R [001] resin-port-8080 1025511
[2640203.201538]
                  timer_callback.cold.2+0x154/0x1c8 [loadavg_tracer]
                  __hrtimer_run_queues+0x117/0x2a0
                  hrtimer_interrupt+0x101/0x210
                  smp_apic_timer_interrupt+0x74/0x140
                  apic_timer_interrupt+0xf/0x20
                  finish_task_switch+0x8b/0x250
                  poll_schedule_timeout.constprop.17+0x46/0x70
                  do_sys_poll+0x46b/0x500
                  __x64_sys_poll+0x3b/0x150
                  do_syscall_64+0x4d/0x150
                  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值