Network write system call latency

Oliver Yang / 2015-07-29 


Today, one guy told me he observed the write system calllatency for 80 bytes write is about 2ms on a 10G network. He suspected thatthis indicates some network latency problems.

Is he right?

The answer is no. At least, I am not sure. Depending on yourworkload.

I told him to capture the packets by the tools like tcpdumpto confirm his suspects. For network write system call, 2ms latency might notbe quite bad.

  • Why do I think 2ms write system call latency is not bad?

Because it is quite possible that write system call could bereturned at millisecond level, but packet got sent out at microsecond level.

In Linux network stack, under the write system call context,it could send the packets first, then got interrupted by NIC driver RXinterrupts, and run into packets receive loop. Especially, when write code putthe packets to NIC, it disabled interrupts in that period. That means, afterinterrupt got enabled, it would cause the interrupts got fired immediately ifany IRQs were bound to this CPU.

If the system is under heavy network traffic,the net_rx_action code actually allows polling loop for more than 2jiffies, which is actually 2ms.

  static voidnet_rx_action(struct softirq_action *h)

  {

      structsoftnet_data *sd = this_cpu_ptr(&softnet_data);

      unsigned longtime_limit = jiffies + 2;

      int budget =netdev_budget;

      LIST_HEAD(list);

     LIST_HEAD(repoll);

 

     local_irq_disable();

     list_splice_init(&sd->poll_list, &list);

      local_irq_enable();

 

      for (;;) {

          structnapi_struct *n;

 

          if(list_empty(&list)) {

              if(!sd_has_rps_ipi_waiting(sd) && list_empty(&repoll))

                 return;

              break;

          }

 

          n =list_first_entry(&list, struct napi_struct, poll_list);

          budget -=napi_poll(n, &repoll);

 

          /* Ifsoftirq window is exhausted then punt.

           * Allowthis to run for 2 jiffies since which will allow

           * anaverage latency of 1.5/HZ.

           */

          if(unlikely(budget <= 0 ||

                  time_after_eq(jiffies, time_limit))) { // time limitation check, breakif more than 2ms

             sd->time_squeeze++;

              break;

          }

      }

 

     [...........snipped..........]

 

  }

In fact, in latest kernel __do_softirq has thesimilar logic to limit the soft_irq loop by bothMAX_SOFTIRQ_RESTART and 2jiffies time limitations.

Anyway, today's Linux interrupt context, the latency couldbe many milliseconds.

For this reason it is quite possible that the packet is sentby write system call first at us speed. But after that, write system callcontinue to receive the packets, and run into longer loop for packets RXhandling.

Just few times of 2ms latency is not a problem. What if hesaw average write system call latency is 2ms?

It is hard to say. His testing is a system wide testing,which will not only stress network stack, bust also stress file system andstorage IO stack. That means, if write system call got pinned by the interruptsnot related to network workload, he can also see this issue. In this case, itis not network problem.

  • How does the code path look like?

Linux provides the rich set of dynamic tracing tools toallow us to learn kernel code path.

In this case, I used funcgraph,which is a shell scripts based on Linux Ftrace tool.

From the output, we can see in do_sync_write code path,after the packets got sent, it started to receive packet. At the end of rx codepath, it calls into ep_poll_callback to notify the data is ready.

This is a not a very busy system, but the do_sync_write tookmore than 82us to return.

# ./funcgraph -d 1 do_sync_write

  Tracing"do_sync_write" for 1 seconds...

   13)               |  do_sync_write() {

   13)               |    sock_aio_write() {

   13)               |      inet_sendmsg() {

   13)               |        tcp_sendmsg() {

   13)               |          lock_sock_nested() {

   13)               |            _raw_spin_lock_bh() {

   13)   0.071 us   |              local_bh_disable();

   13)   1.097 us   |            }

   13)   0.136 us   |            _raw_spin_unlock();

   13)   0.037 us   |            local_bh_enable();

   13)   2.558 us   |          }

   13)               |          tcp_send_mss() {

   13)               |            tcp_current_mss() {

   13)   0.202 us   |              ipv4_mtu();

   13)               |              tcp_established_options() {

   13)               |                tcp_v4_md5_lookup() {

   13)   0.039 us   |                 tcp_v4_md5_do_lookup();

   13)   0.357 us   |                }

   13)   0.752 us   |              }

   13)   1.962 us   |            }

   13)   2.325 us   |          }

   13)               |          sk_stream_alloc_skb() {

   13)               |            __alloc_skb() {

   13)   0.141 us   |             kmem_cache_alloc_node();

   13)               |              __kmalloc_node_track_caller() {

   13)   0.085 us   |               kmem_find_general_cachep();

   13)   0.148 us   |               kmem_cache_alloc_node_trace();

   13)   0.897 us   |              }

   13)               |              ksize() {

   13)   0.036 us   |                __phys_addr();

   13)   0.583 us   |              }

   13)   3.140 us   |            }

   13)   3.615 us   |          }

   13)               |          __tcp_push_pending_frames() {

   13)               |            tcp_write_xmit() {

   13)               |              tcp_init_tso_segs() {

   13)   0.058 us   |               tcp_set_skb_tso_segs();

   13)   0.451 us   |              }

   13)               |              tcp_transmit_skb() {

   13)               |                skb_clone() {

   13)   0.098 us   |                 __copy_skb_header();

   13)   0.609 us   |                }

   13)               |                tcp_established_options() {

   13)               |                  tcp_v4_md5_lookup() {

   13)   0.039 us   |                   tcp_v4_md5_do_lookup();

   13)   0.439 us   |                  }

   13)   0.812 us   |                }

   13)   0.043 us   |                skb_push();

   13)   0.168 us   |                __tcp_select_window();

   13)   0.186 us   |               tcp_options_write();

   13)               |                tcp_v4_send_check() {

   13)   0.207 us   |                 __tcp_v4_send_check();

   13)   0.773 us   |                }

   13)               |                ip_queue_xmit() {

   13)               |                  __sk_dst_check() {

   13)               |                    ipv4_dst_check() {

   13)   0.079 us   |                     ipv4_validate_peer();

   13)   0.461 us   |                    }

   13)   0.883 us   |                  }

   13)               |                  skb_dst_set_noref() {

   13)   0.035 us   |                   debug_lockdep_rcu_enabled();

   13)   0.462 us   |                  }

   13)   0.049 us   |                  skb_push();

   13)               |                  ip_local_out() {

   13)               |                    __ip_local_out() {

   13)   0.072 us   |                     debug_lockdep_rcu_enabled();

   13)   0.543us    |                    }

   13)   0.036 us   |                   debug_lockdep_rcu_enabled();

   13)               |                    ip_output() {

   13)   0.042 us   |                     debug_lockdep_rcu_enabled();

   13)               |                      ip_finish_output() {

   13)   0.049 us   |                       debug_lockdep_rcu_enabled();

   13)   0.049 us   |                       debug_lockdep_rcu_enabled();

   13)   0.060 us   |                       ipv4_mtu();

   13)   0.044 us   |                       debug_lockdep_rcu_enabled();

   13)   0.045 us   |                       skb_push();

   13)               |                        dev_queue_xmit() {

   13)   0.035 us   |                         local_bh_disable();

   13)               |                          dev_hard_start_xmit(){

   13)               |                           netif_skb_features() {

   13)   0.038 us   |                             harmonize_features();

   13)   0.706 us   |                            }

   13)               |                            loopback_xmit() {

   13)   0.065 us   |                             sock_wfree();

   13)   0.103 us   |                             eth_type_trans();

   13)               |                              netif_rx() {

   13)               |                               ktime_get_real() {

   13)   0.054 us   |                                 getnstimeofday();

   13)   0.419 us   |                                }

   13)   0.079 us   |                               get_rps_cpu();

   13)               |                               enqueue_to_backlog() {

   13)   0.385 us   |                                 _raw_spin_lock();

   13)   0.148 us   |                                  _raw_spin_unlock();

   13)   1.251 us   |                                }

   13)   3.116 us   |                              }

   13)   4.452 us   |                            }

   13)   6.336 us   |                          }

   13)               |                          local_bh_enable() {

   13)               |                            do_softirq() {

   13)               |                              __do_softirq() {

   13)   0.040 us   |                                msecs_to_jiffies();

   13)               |                                net_rx_action(){

   13)               |                                 process_backlog() {

   13)   0.288 us   |                                   _raw_spin_lock();

   13)   0.119 us   |                                   _raw_spin_unlock();

   13)               |                                   __netif_receive_skb() {

   13)               |                                      ip_rcv(){

   13)               |                                        ip_rcv_finish(){

   13)   0.035 us   |                                         debug_lockdep_rcu_enabled();

   13)   0.036 us   |                                         debug_lockdep_rcu_enabled();

   13)   0.035 us   |                                         debug_lockdep_rcu_enabled();

   13)   0.035 us   |                                         debug_lockdep_rcu_enabled();

   13)               |                                         ip_local_deliver() {

   13)               |                                           ip_local_deliver_finish() {

   13)   0.081 us   |                                             raw_local_deliver();

   13)               |                                             tcp_v4_rcv() {

   13)   0.037 us   |                                               debug_lockdep_rcu_enabled();

   13)   0.036 us   |                                               debug_lockdep_rcu_enabled();

   13)   0.385 us   |                                                __inet_lookup_established();

   13)   0.219 us   |                                               sk_filter();

   13)   0.425 us   |                                               _raw_spin_lock_nested();

   13)               |                                                tcp_v4_do_rcv(){

   13)   0.052 us   |                                                 tcp_v4_md5_do_lookup();

   13)   0.048 us   |                                                 tcp_parse_md5sig_option();

   13)               |                                                 tcp_rcv_established() {

   13)   0.046 us   |                                                   tcp_parse_aligned_timestamp();

   13)   0.049 us   |                                                    get_seconds();

   13)   0.075 us   |                                                   tcp_rcv_rtt_update();

   13)   0.459 us   |                                                   tcp_event_data_recv();

   13)               |                                                    __tcp_ack_snd_check(){

   13)               |                                                     tcp_send_delayed_ack() {

   13)               |                                                       sk_reset_timer() {

   13)               |                                                         mod_timer() {

   13)               |                                                           lock_timer_base.clone.22() {

   13)   0.275 us   |                                                              _raw_spin_lock_irqsave();

   13)   0.628 us   |                                                           }

   13)   0.060 us   |                                                           internal_add_timer();

   13)   0.143 us   |                                                           _raw_spin_unlock_irqrestore();

   13)   1.944 us   |                                                         }

   13)   2.392 us   |                                                       }

   13)   3.356 us   |                                                     }

   13)   3.914 us   |                                                   }

   13)               |                                                   sock_def_readable() {

   13)               |                                                     __wake_up_sync_key() {

   13)   0.242 us   |                                                       _raw_spin_lock_irqsave();

   13)               |                                                        __wake_up_common(){

   13)               |                                                         ep_poll_callback() {

   13)   0.281 us   |                                                           _raw_spin_lock_irqsave();

   13)               |                                                           __wake_up_locked() {

   13)               |                                                             __wake_up_common() {

   13)               |                                                                default_wake_function(){

   13)               |                                                                 try_to_wake_up() {

   13)   0.305 us   |                                                                    _raw_spin_lock_irqsave();

   13)   0.093 us   |                                                                   task_waking_fair();

   13)               |                                                                   select_task_rq_fair() {

   13)   0.086 us   |                                                                     source_load();

   13)   0.050 us   |                                                                     target_load();

   13)   0.069 us   |                                                                     cpu_avg_load_per_task();

   13)   0.070 us   |                                                                     idle_cpu();

   13)   2.109 us   |                                                                   }

   13)               |                                                                   native_smp_send_reschedule() {

   13)               |                                                                     physflat_send_IPI_mask() {

   13)   0.178 us   |                                                                       default_send_IPI_mask_sequence_phys();

   13)   0.558 us   |                                                                     }

   13)   0.966 us   |                                                                   }

   13)   0.245 us   |                                                                   ttwu_stat();

   13)   0.140 us   |                                                                   _raw_spin_unlock_irqrestore();

   13)   6.722 us   |                                                                 }

   13)   7.211 us   |                                                               }

   13)   7.815 us   |                                                              }

   13)   8.181 us   |                                                           }

   13)   0.151 us   |                                                           _raw_spin_unlock_irqrestore();

   13)   9.762 us   |                                                         }

   13) + 10.294us   |                                                       }

   13)   0.153 us   |                                                       _raw_spin_unlock_irqrestore();

   13) + 11.824us   |                                                     }

   13) + 12.431us   |                                                   }

   13) + 20.877us   |                                                 }

   13) + 22.064us   |                                               }

   13)   0.123 us   |                                               _raw_spin_unlock();

   13) + 25.964us   |                                              }

   13) + 26.968us   |                                            }

   13) + 27.379us   |                                          }

   13) + 29.442us   |                                        }

   13) + 29.913us   |                                      }

   13) + 30.701us   |                                    }

   13) + 32.114us   |                                  }

   13)   0.048 us   |                                 net_rps_action_and_irq_enable.clone.45();

   13) + 33.281us   |                                }

   13)   0.036 us   |                               rcu_bh_qs();

   13)   0.040 us   |                               __local_bh_enable();

   13) + 35.347us   |                              }

   13) + 35.732us   |                            }

   13) + 36.233us   |                         }

   13) + 44.380us   |                        }

   13) + 47.394us   |                      }

   13) + 48.493us   |                    }

   13) + 50.743us   |                  }

   13) + 54.129us   |                }

   13) + 60.178us   |              }

   13)               |              tcp_event_new_data_sent() {

   13)               |                sk_reset_timer() {

   13)               |                  mod_timer() {

   13)               |                    lock_timer_base.clone.22(){

   13)   0.213 us   |                     _raw_spin_lock_irqsave();

   13)   0.607 us   |                    }

   13)   0.057 us   |                   internal_add_timer();

   13)   0.138 us   |                    _raw_spin_unlock_irqrestore();

   13)   1.998 us   |                  }

   13)   2.408 us   |                }

   13)   2.884 us   |              }

   13) + 65.221us   |            }

   13) + 65.802us   |          }

   13)               |          release_sock() {

   13)               |            _raw_spin_lock_bh() {

   13)   0.037 us   |              local_bh_disable();

   13)   0.643 us   |            }

   13)               |            _raw_spin_unlock_bh() {

   13)   0.049 us   |              local_bh_enable_ip();

   13)   0.561 us   |            }

   13)   2.114 us   |          }

   13) + 79.729us   |        }

   13) + 80.595us   |      }

   13) + 81.506us   |   }

   13) + 82.396us   | }


  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值