Network write system call latency

转载 2016年01月17日 17:41:19

Oliver Yang / 2015-07-29 


Today, one guy told me he observed the write system calllatency for 80 bytes write is about 2ms on a 10G network. He suspected thatthis indicates some network latency problems.

Is he right?

The answer is no. At least, I am not sure. Depending on yourworkload.

I told him to capture the packets by the tools like tcpdumpto confirm his suspects. For network write system call, 2ms latency might notbe quite bad.

  • Why do I think 2ms write system call latency is not bad?

Because it is quite possible that write system call could bereturned at millisecond level, but packet got sent out at microsecond level.

In Linux network stack, under the write system call context,it could send the packets first, then got interrupted by NIC driver RXinterrupts, and run into packets receive loop. Especially, when write code putthe packets to NIC, it disabled interrupts in that period. That means, afterinterrupt got enabled, it would cause the interrupts got fired immediately ifany IRQs were bound to this CPU.

If the system is under heavy network traffic,the net_rx_action code actually allows polling loop for more than 2jiffies, which is actually 2ms.

  static voidnet_rx_action(struct softirq_action *h)

  {

      structsoftnet_data *sd = this_cpu_ptr(&softnet_data);

      unsigned longtime_limit = jiffies + 2;

      int budget =netdev_budget;

      LIST_HEAD(list);

     LIST_HEAD(repoll);

 

     local_irq_disable();

     list_splice_init(&sd->poll_list, &list);

      local_irq_enable();

 

      for (;;) {

          structnapi_struct *n;

 

          if(list_empty(&list)) {

              if(!sd_has_rps_ipi_waiting(sd) && list_empty(&repoll))

                 return;

              break;

          }

 

          n =list_first_entry(&list, struct napi_struct, poll_list);

          budget -=napi_poll(n, &repoll);

 

          /* Ifsoftirq window is exhausted then punt.

           * Allowthis to run for 2 jiffies since which will allow

           * anaverage latency of 1.5/HZ.

           */

          if(unlikely(budget <= 0 ||

                  time_after_eq(jiffies, time_limit))) { // time limitation check, breakif more than 2ms

             sd->time_squeeze++;

              break;

          }

      }

 

     [...........snipped..........]

 

  }

In fact, in latest kernel __do_softirq has thesimilar logic to limit the soft_irq loop by bothMAX_SOFTIRQ_RESTART and 2jiffies time limitations.

Anyway, today's Linux interrupt context, the latency couldbe many milliseconds.

For this reason it is quite possible that the packet is sentby write system call first at us speed. But after that, write system callcontinue to receive the packets, and run into longer loop for packets RXhandling.

Just few times of 2ms latency is not a problem. What if hesaw average write system call latency is 2ms?

It is hard to say. His testing is a system wide testing,which will not only stress network stack, bust also stress file system andstorage IO stack. That means, if write system call got pinned by the interruptsnot related to network workload, he can also see this issue. In this case, itis not network problem.

  • How does the code path look like?

Linux provides the rich set of dynamic tracing tools toallow us to learn kernel code path.

In this case, I used funcgraph,which is a shell scripts based on Linux Ftrace tool.

From the output, we can see in do_sync_write code path,after the packets got sent, it started to receive packet. At the end of rx codepath, it calls into ep_poll_callback to notify the data is ready.

This is a not a very busy system, but the do_sync_write tookmore than 82us to return.

# ./funcgraph -d 1 do_sync_write

  Tracing"do_sync_write" for 1 seconds...

   13)               |  do_sync_write() {

   13)               |    sock_aio_write() {

   13)               |      inet_sendmsg() {

   13)               |        tcp_sendmsg() {

   13)               |          lock_sock_nested() {

   13)               |            _raw_spin_lock_bh() {

   13)   0.071 us   |              local_bh_disable();

   13)   1.097 us   |            }

   13)   0.136 us   |            _raw_spin_unlock();

   13)   0.037 us   |            local_bh_enable();

   13)   2.558 us   |          }

   13)               |          tcp_send_mss() {

   13)               |            tcp_current_mss() {

   13)   0.202 us   |              ipv4_mtu();

   13)               |              tcp_established_options() {

   13)               |                tcp_v4_md5_lookup() {

   13)   0.039 us   |                 tcp_v4_md5_do_lookup();

   13)   0.357 us   |                }

   13)   0.752 us   |              }

   13)   1.962 us   |            }

   13)   2.325 us   |          }

   13)               |          sk_stream_alloc_skb() {

   13)               |            __alloc_skb() {

   13)   0.141 us   |             kmem_cache_alloc_node();

   13)               |              __kmalloc_node_track_caller() {

   13)   0.085 us   |               kmem_find_general_cachep();

   13)   0.148 us   |               kmem_cache_alloc_node_trace();

   13)   0.897 us   |              }

   13)               |              ksize() {

   13)   0.036 us   |                __phys_addr();

   13)   0.583 us   |              }

   13)   3.140 us   |            }

   13)   3.615 us   |          }

   13)               |          __tcp_push_pending_frames() {

   13)               |            tcp_write_xmit() {

   13)               |              tcp_init_tso_segs() {

   13)   0.058 us   |               tcp_set_skb_tso_segs();

   13)   0.451 us   |              }

   13)               |              tcp_transmit_skb() {

   13)               |                skb_clone() {

   13)   0.098 us   |                 __copy_skb_header();

   13)   0.609 us   |                }

   13)               |                tcp_established_options() {

   13)               |                  tcp_v4_md5_lookup() {

   13)   0.039 us   |                   tcp_v4_md5_do_lookup();

   13)   0.439 us   |                  }

   13)   0.812 us   |                }

   13)   0.043 us   |                skb_push();

   13)   0.168 us   |                __tcp_select_window();

   13)   0.186 us   |               tcp_options_write();

   13)               |                tcp_v4_send_check() {

   13)   0.207 us   |                 __tcp_v4_send_check();

   13)   0.773 us   |                }

   13)               |                ip_queue_xmit() {

   13)               |                  __sk_dst_check() {

   13)               |                    ipv4_dst_check() {

   13)   0.079 us   |                     ipv4_validate_peer();

   13)   0.461 us   |                    }

   13)   0.883 us   |                  }

   13)               |                  skb_dst_set_noref() {

   13)   0.035 us   |                   debug_lockdep_rcu_enabled();

   13)   0.462 us   |                  }

   13)   0.049 us   |                  skb_push();

   13)               |                  ip_local_out() {

   13)               |                    __ip_local_out() {

   13)   0.072 us   |                     debug_lockdep_rcu_enabled();

   13)   0.543us    |                    }

   13)   0.036 us   |                   debug_lockdep_rcu_enabled();

   13)               |                    ip_output() {

   13)   0.042 us   |                     debug_lockdep_rcu_enabled();

   13)               |                      ip_finish_output() {

   13)   0.049 us   |                       debug_lockdep_rcu_enabled();

   13)   0.049 us   |                       debug_lockdep_rcu_enabled();

   13)   0.060 us   |                       ipv4_mtu();

   13)   0.044 us   |                       debug_lockdep_rcu_enabled();

   13)   0.045 us   |                       skb_push();

   13)               |                        dev_queue_xmit() {

   13)   0.035 us   |                         local_bh_disable();

   13)               |                          dev_hard_start_xmit(){

   13)               |                           netif_skb_features() {

   13)   0.038 us   |                             harmonize_features();

   13)   0.706 us   |                            }

   13)               |                            loopback_xmit() {

   13)   0.065 us   |                             sock_wfree();

   13)   0.103 us   |                             eth_type_trans();

   13)               |                              netif_rx() {

   13)               |                               ktime_get_real() {

   13)   0.054 us   |                                 getnstimeofday();

   13)   0.419 us   |                                }

   13)   0.079 us   |                               get_rps_cpu();

   13)               |                               enqueue_to_backlog() {

   13)   0.385 us   |                                 _raw_spin_lock();

   13)   0.148 us   |                                  _raw_spin_unlock();

   13)   1.251 us   |                                }

   13)   3.116 us   |                              }

   13)   4.452 us   |                            }

   13)   6.336 us   |                          }

   13)               |                          local_bh_enable() {

   13)               |                            do_softirq() {

   13)               |                              __do_softirq() {

   13)   0.040 us   |                                msecs_to_jiffies();

   13)               |                                net_rx_action(){

   13)               |                                 process_backlog() {

   13)   0.288 us   |                                   _raw_spin_lock();

   13)   0.119 us   |                                   _raw_spin_unlock();

   13)               |                                   __netif_receive_skb() {

   13)               |                                      ip_rcv(){

   13)               |                                        ip_rcv_finish(){

   13)   0.035 us   |                                         debug_lockdep_rcu_enabled();

   13)   0.036 us   |                                         debug_lockdep_rcu_enabled();

   13)   0.035 us   |                                         debug_lockdep_rcu_enabled();

   13)   0.035 us   |                                         debug_lockdep_rcu_enabled();

   13)               |                                         ip_local_deliver() {

   13)               |                                           ip_local_deliver_finish() {

   13)   0.081 us   |                                             raw_local_deliver();

   13)               |                                             tcp_v4_rcv() {

   13)   0.037 us   |                                               debug_lockdep_rcu_enabled();

   13)   0.036 us   |                                               debug_lockdep_rcu_enabled();

   13)   0.385 us   |                                                __inet_lookup_established();

   13)   0.219 us   |                                               sk_filter();

   13)   0.425 us   |                                               _raw_spin_lock_nested();

   13)               |                                                tcp_v4_do_rcv(){

   13)   0.052 us   |                                                 tcp_v4_md5_do_lookup();

   13)   0.048 us   |                                                 tcp_parse_md5sig_option();

   13)               |                                                 tcp_rcv_established() {

   13)   0.046 us   |                                                   tcp_parse_aligned_timestamp();

   13)   0.049 us   |                                                    get_seconds();

   13)   0.075 us   |                                                   tcp_rcv_rtt_update();

   13)   0.459 us   |                                                   tcp_event_data_recv();

   13)               |                                                    __tcp_ack_snd_check(){

   13)               |                                                     tcp_send_delayed_ack() {

   13)               |                                                       sk_reset_timer() {

   13)               |                                                         mod_timer() {

   13)               |                                                           lock_timer_base.clone.22() {

   13)   0.275 us   |                                                              _raw_spin_lock_irqsave();

   13)   0.628 us   |                                                           }

   13)   0.060 us   |                                                           internal_add_timer();

   13)   0.143 us   |                                                           _raw_spin_unlock_irqrestore();

   13)   1.944 us   |                                                         }

   13)   2.392 us   |                                                       }

   13)   3.356 us   |                                                     }

   13)   3.914 us   |                                                   }

   13)               |                                                   sock_def_readable() {

   13)               |                                                     __wake_up_sync_key() {

   13)   0.242 us   |                                                       _raw_spin_lock_irqsave();

   13)               |                                                        __wake_up_common(){

   13)               |                                                         ep_poll_callback() {

   13)   0.281 us   |                                                           _raw_spin_lock_irqsave();

   13)               |                                                           __wake_up_locked() {

   13)               |                                                             __wake_up_common() {

   13)               |                                                                default_wake_function(){

   13)               |                                                                 try_to_wake_up() {

   13)   0.305 us   |                                                                    _raw_spin_lock_irqsave();

   13)   0.093 us   |                                                                   task_waking_fair();

   13)               |                                                                   select_task_rq_fair() {

   13)   0.086 us   |                                                                     source_load();

   13)   0.050 us   |                                                                     target_load();

   13)   0.069 us   |                                                                     cpu_avg_load_per_task();

   13)   0.070 us   |                                                                     idle_cpu();

   13)   2.109 us   |                                                                   }

   13)               |                                                                   native_smp_send_reschedule() {

   13)               |                                                                     physflat_send_IPI_mask() {

   13)   0.178 us   |                                                                       default_send_IPI_mask_sequence_phys();

   13)   0.558 us   |                                                                     }

   13)   0.966 us   |                                                                   }

   13)   0.245 us   |                                                                   ttwu_stat();

   13)   0.140 us   |                                                                   _raw_spin_unlock_irqrestore();

   13)   6.722 us   |                                                                 }

   13)   7.211 us   |                                                               }

   13)   7.815 us   |                                                              }

   13)   8.181 us   |                                                           }

   13)   0.151 us   |                                                           _raw_spin_unlock_irqrestore();

   13)   9.762 us   |                                                         }

   13) + 10.294us   |                                                       }

   13)   0.153 us   |                                                       _raw_spin_unlock_irqrestore();

   13) + 11.824us   |                                                     }

   13) + 12.431us   |                                                   }

   13) + 20.877us   |                                                 }

   13) + 22.064us   |                                               }

   13)   0.123 us   |                                               _raw_spin_unlock();

   13) + 25.964us   |                                              }

   13) + 26.968us   |                                            }

   13) + 27.379us   |                                          }

   13) + 29.442us   |                                        }

   13) + 29.913us   |                                      }

   13) + 30.701us   |                                    }

   13) + 32.114us   |                                  }

   13)   0.048 us   |                                 net_rps_action_and_irq_enable.clone.45();

   13) + 33.281us   |                                }

   13)   0.036 us   |                               rcu_bh_qs();

   13)   0.040 us   |                               __local_bh_enable();

   13) + 35.347us   |                              }

   13) + 35.732us   |                            }

   13) + 36.233us   |                         }

   13) + 44.380us   |                        }

   13) + 47.394us   |                      }

   13) + 48.493us   |                    }

   13) + 50.743us   |                  }

   13) + 54.129us   |                }

   13) + 60.178us   |              }

   13)               |              tcp_event_new_data_sent() {

   13)               |                sk_reset_timer() {

   13)               |                  mod_timer() {

   13)               |                    lock_timer_base.clone.22(){

   13)   0.213 us   |                     _raw_spin_lock_irqsave();

   13)   0.607 us   |                    }

   13)   0.057 us   |                   internal_add_timer();

   13)   0.138 us   |                    _raw_spin_unlock_irqrestore();

   13)   1.998 us   |                  }

   13)   2.408 us   |                }

   13)   2.884 us   |              }

   13) + 65.221us   |            }

   13) + 65.802us   |          }

   13)               |          release_sock() {

   13)               |            _raw_spin_lock_bh() {

   13)   0.037 us   |              local_bh_disable();

   13)   0.643 us   |            }

   13)               |            _raw_spin_unlock_bh() {

   13)   0.049 us   |              local_bh_enable_ip();

   13)   0.561 us   |            }

   13)   2.114 us   |          }

   13) + 79.729us   |        }

   13) + 80.595us   |      }

   13) + 81.506us   |   }

   13) + 82.396us   | }


信号中断 与 慢系统调用

1. 术语 1.1. 慢系统调用(Slow system call) 该术语适用于那些可能永远阻塞的系统调用。永远阻塞的系统调用是指调用永远无法返回,多数网络支持函数都属于这一类。如:若没有客户连接到...
  • benkaoya
  • benkaoya
  • 2013年12月11日 14:12
  • 8014

从 system_call走进linux系统调用

慕课18原创作品转载请注明出处 + 《Linux内核分析》MOOC课程http://mooc.study.163.com/course/USTC-1000029000 、课程实验:添加系统调用到...
  • CLP_CSDNID
  • CLP_CSDNID
  • 2016年03月27日 18:44
  • 1901

NSIS使用System::Call调用自定义dll中的导出函数

看NSIS的说明中关于System的部分 D.3 Calling an external DLL using the System.dll plug-in Some install...
  • patdz
  • patdz
  • 2013年01月24日 19:46
  • 7344

LINUX System Call Quick Reference

  • 2009年06月08日 21:44
  • 55KB
  • 下载

OS report, 英文配图, linux编译内核添加system call

  • 2014年10月25日 19:09
  • 3.27MB
  • 下载

Avaya™ Call Management System &amp;#40;CMS&amp;#41; Database Items and Calculations

  • 2012年11月21日 07:15
  • 2.87MB
  • 下载

How to Write a Killer Software Testing QA Resume That Will Turn Into an Interview Call

Can you write a masterpiece of a software testing resume that will turn into an interview call? If n...
  • ChinaHuanyang
  • ChinaHuanyang
  • 2012年08月16日 16:30
  • 1177

Call for Papers IEEE/ACM ASONAM (Social Network Analysis and Mining )

IEEE/ACM International Conference on Advances in Social Network Analysis and Mining (ASONAM) 2014 In...
  • u010600052
  • u010600052
  • 2014年04月26日 14:40
  • 845

遇到的问题-----mongodb.MongoException$Network: can't call something

java操作mongoDB的开发,实时插入mongoDB。 一开始是正常的,运行一段时间后开始报错。 长时间运行测试遇到问题,报错信息如下: at com.mongodb.DBTCP...
  • q383965374
  • q383965374
  • 2013年09月05日 11:53
  • 8090

怎样写一个简单的操作系统?(原文标题:How to write a simple operating system) 分类: 翻译 2011-01-26 01:10 3175人阅读 评论(3) 收藏

怎样写一个简单的操作系统?(原文标题:How to write a simple operating system) 分类: 翻译2011-01-26 01:10 3175人阅读 评论(3) 收...
  • HuaiZuoWoZuo
  • HuaiZuoWoZuo
  • 2013年11月19日 21:30
  • 2083
内容举报
返回顶部
收藏助手
不良信息举报
您举报文章:Network write system call latency
举报原因:
原因补充:

(最多只允许输入30个字)