Network write system call latency

转载 2016年01月17日 17:41:19

Oliver Yang / 2015-07-29 


Today, one guy told me he observed the write system calllatency for 80 bytes write is about 2ms on a 10G network. He suspected thatthis indicates some network latency problems.

Is he right?

The answer is no. At least, I am not sure. Depending on yourworkload.

I told him to capture the packets by the tools like tcpdumpto confirm his suspects. For network write system call, 2ms latency might notbe quite bad.

  • Why do I think 2ms write system call latency is not bad?

Because it is quite possible that write system call could bereturned at millisecond level, but packet got sent out at microsecond level.

In Linux network stack, under the write system call context,it could send the packets first, then got interrupted by NIC driver RXinterrupts, and run into packets receive loop. Especially, when write code putthe packets to NIC, it disabled interrupts in that period. That means, afterinterrupt got enabled, it would cause the interrupts got fired immediately ifany IRQs were bound to this CPU.

If the system is under heavy network traffic,the net_rx_action code actually allows polling loop for more than 2jiffies, which is actually 2ms.

  static voidnet_rx_action(struct softirq_action *h)

  {

      structsoftnet_data *sd = this_cpu_ptr(&softnet_data);

      unsigned longtime_limit = jiffies + 2;

      int budget =netdev_budget;

      LIST_HEAD(list);

     LIST_HEAD(repoll);

 

     local_irq_disable();

     list_splice_init(&sd->poll_list, &list);

      local_irq_enable();

 

      for (;;) {

          structnapi_struct *n;

 

          if(list_empty(&list)) {

              if(!sd_has_rps_ipi_waiting(sd) && list_empty(&repoll))

                 return;

              break;

          }

 

          n =list_first_entry(&list, struct napi_struct, poll_list);

          budget -=napi_poll(n, &repoll);

 

          /* Ifsoftirq window is exhausted then punt.

           * Allowthis to run for 2 jiffies since which will allow

           * anaverage latency of 1.5/HZ.

           */

          if(unlikely(budget <= 0 ||

                  time_after_eq(jiffies, time_limit))) { // time limitation check, breakif more than 2ms

             sd->time_squeeze++;

              break;

          }

      }

 

     [...........snipped..........]

 

  }

In fact, in latest kernel __do_softirq has thesimilar logic to limit the soft_irq loop by bothMAX_SOFTIRQ_RESTART and 2jiffies time limitations.

Anyway, today's Linux interrupt context, the latency couldbe many milliseconds.

For this reason it is quite possible that the packet is sentby write system call first at us speed. But after that, write system callcontinue to receive the packets, and run into longer loop for packets RXhandling.

Just few times of 2ms latency is not a problem. What if hesaw average write system call latency is 2ms?

It is hard to say. His testing is a system wide testing,which will not only stress network stack, bust also stress file system andstorage IO stack. That means, if write system call got pinned by the interruptsnot related to network workload, he can also see this issue. In this case, itis not network problem.

  • How does the code path look like?

Linux provides the rich set of dynamic tracing tools toallow us to learn kernel code path.

In this case, I used funcgraph,which is a shell scripts based on Linux Ftrace tool.

From the output, we can see in do_sync_write code path,after the packets got sent, it started to receive packet. At the end of rx codepath, it calls into ep_poll_callback to notify the data is ready.

This is a not a very busy system, but the do_sync_write tookmore than 82us to return.

# ./funcgraph -d 1 do_sync_write

  Tracing"do_sync_write" for 1 seconds...

   13)               |  do_sync_write() {

   13)               |    sock_aio_write() {

   13)               |      inet_sendmsg() {

   13)               |        tcp_sendmsg() {

   13)               |          lock_sock_nested() {

   13)               |            _raw_spin_lock_bh() {

   13)   0.071 us   |              local_bh_disable();

   13)   1.097 us   |            }

   13)   0.136 us   |            _raw_spin_unlock();

   13)   0.037 us   |            local_bh_enable();

   13)   2.558 us   |          }

   13)               |          tcp_send_mss() {

   13)               |            tcp_current_mss() {

   13)   0.202 us   |              ipv4_mtu();

   13)               |              tcp_established_options() {

   13)               |                tcp_v4_md5_lookup() {

   13)   0.039 us   |                 tcp_v4_md5_do_lookup();

   13)   0.357 us   |                }

   13)   0.752 us   |              }

   13)   1.962 us   |            }

   13)   2.325 us   |          }

   13)               |          sk_stream_alloc_skb() {

   13)               |            __alloc_skb() {

   13)   0.141 us   |             kmem_cache_alloc_node();

   13)               |              __kmalloc_node_track_caller() {

   13)   0.085 us   |               kmem_find_general_cachep();

   13)   0.148 us   |               kmem_cache_alloc_node_trace();

   13)   0.897 us   |              }

   13)               |              ksize() {

   13)   0.036 us   |                __phys_addr();

   13)   0.583 us   |              }

   13)   3.140 us   |            }

   13)   3.615 us   |          }

   13)               |          __tcp_push_pending_frames() {

   13)               |            tcp_write_xmit() {

   13)               |              tcp_init_tso_segs() {

   13)   0.058 us   |               tcp_set_skb_tso_segs();

   13)   0.451 us   |              }

   13)               |              tcp_transmit_skb() {

   13)               |                skb_clone() {

   13)   0.098 us   |                 __copy_skb_header();

   13)   0.609 us   |                }

   13)               |                tcp_established_options() {

   13)               |                  tcp_v4_md5_lookup() {

   13)   0.039 us   |                   tcp_v4_md5_do_lookup();

   13)   0.439 us   |                  }

   13)   0.812 us   |                }

   13)   0.043 us   |                skb_push();

   13)   0.168 us   |                __tcp_select_window();

   13)   0.186 us   |               tcp_options_write();

   13)               |                tcp_v4_send_check() {

   13)   0.207 us   |                 __tcp_v4_send_check();

   13)   0.773 us   |                }

   13)               |                ip_queue_xmit() {

   13)               |                  __sk_dst_check() {

   13)               |                    ipv4_dst_check() {

   13)   0.079 us   |                     ipv4_validate_peer();

   13)   0.461 us   |                    }

   13)   0.883 us   |                  }

   13)               |                  skb_dst_set_noref() {

   13)   0.035 us   |                   debug_lockdep_rcu_enabled();

   13)   0.462 us   |                  }

   13)   0.049 us   |                  skb_push();

   13)               |                  ip_local_out() {

   13)               |                    __ip_local_out() {

   13)   0.072 us   |                     debug_lockdep_rcu_enabled();

   13)   0.543us    |                    }

   13)   0.036 us   |                   debug_lockdep_rcu_enabled();

   13)               |                    ip_output() {

   13)   0.042 us   |                     debug_lockdep_rcu_enabled();

   13)               |                      ip_finish_output() {

   13)   0.049 us   |                       debug_lockdep_rcu_enabled();

   13)   0.049 us   |                       debug_lockdep_rcu_enabled();

   13)   0.060 us   |                       ipv4_mtu();

   13)   0.044 us   |                       debug_lockdep_rcu_enabled();

   13)   0.045 us   |                       skb_push();

   13)               |                        dev_queue_xmit() {

   13)   0.035 us   |                         local_bh_disable();

   13)               |                          dev_hard_start_xmit(){

   13)               |                           netif_skb_features() {

   13)   0.038 us   |                             harmonize_features();

   13)   0.706 us   |                            }

   13)               |                            loopback_xmit() {

   13)   0.065 us   |                             sock_wfree();

   13)   0.103 us   |                             eth_type_trans();

   13)               |                              netif_rx() {

   13)               |                               ktime_get_real() {

   13)   0.054 us   |                                 getnstimeofday();

   13)   0.419 us   |                                }

   13)   0.079 us   |                               get_rps_cpu();

   13)               |                               enqueue_to_backlog() {

   13)   0.385 us   |                                 _raw_spin_lock();

   13)   0.148 us   |                                  _raw_spin_unlock();

   13)   1.251 us   |                                }

   13)   3.116 us   |                              }

   13)   4.452 us   |                            }

   13)   6.336 us   |                          }

   13)               |                          local_bh_enable() {

   13)               |                            do_softirq() {

   13)               |                              __do_softirq() {

   13)   0.040 us   |                                msecs_to_jiffies();

   13)               |                                net_rx_action(){

   13)               |                                 process_backlog() {

   13)   0.288 us   |                                   _raw_spin_lock();

   13)   0.119 us   |                                   _raw_spin_unlock();

   13)               |                                   __netif_receive_skb() {

   13)               |                                      ip_rcv(){

   13)               |                                        ip_rcv_finish(){

   13)   0.035 us   |                                         debug_lockdep_rcu_enabled();

   13)   0.036 us   |                                         debug_lockdep_rcu_enabled();

   13)   0.035 us   |                                         debug_lockdep_rcu_enabled();

   13)   0.035 us   |                                         debug_lockdep_rcu_enabled();

   13)               |                                         ip_local_deliver() {

   13)               |                                           ip_local_deliver_finish() {

   13)   0.081 us   |                                             raw_local_deliver();

   13)               |                                             tcp_v4_rcv() {

   13)   0.037 us   |                                               debug_lockdep_rcu_enabled();

   13)   0.036 us   |                                               debug_lockdep_rcu_enabled();

   13)   0.385 us   |                                                __inet_lookup_established();

   13)   0.219 us   |                                               sk_filter();

   13)   0.425 us   |                                               _raw_spin_lock_nested();

   13)               |                                                tcp_v4_do_rcv(){

   13)   0.052 us   |                                                 tcp_v4_md5_do_lookup();

   13)   0.048 us   |                                                 tcp_parse_md5sig_option();

   13)               |                                                 tcp_rcv_established() {

   13)   0.046 us   |                                                   tcp_parse_aligned_timestamp();

   13)   0.049 us   |                                                    get_seconds();

   13)   0.075 us   |                                                   tcp_rcv_rtt_update();

   13)   0.459 us   |                                                   tcp_event_data_recv();

   13)               |                                                    __tcp_ack_snd_check(){

   13)               |                                                     tcp_send_delayed_ack() {

   13)               |                                                       sk_reset_timer() {

   13)               |                                                         mod_timer() {

   13)               |                                                           lock_timer_base.clone.22() {

   13)   0.275 us   |                                                              _raw_spin_lock_irqsave();

   13)   0.628 us   |                                                           }

   13)   0.060 us   |                                                           internal_add_timer();

   13)   0.143 us   |                                                           _raw_spin_unlock_irqrestore();

   13)   1.944 us   |                                                         }

   13)   2.392 us   |                                                       }

   13)   3.356 us   |                                                     }

   13)   3.914 us   |                                                   }

   13)               |                                                   sock_def_readable() {

   13)               |                                                     __wake_up_sync_key() {

   13)   0.242 us   |                                                       _raw_spin_lock_irqsave();

   13)               |                                                        __wake_up_common(){

   13)               |                                                         ep_poll_callback() {

   13)   0.281 us   |                                                           _raw_spin_lock_irqsave();

   13)               |                                                           __wake_up_locked() {

   13)               |                                                             __wake_up_common() {

   13)               |                                                                default_wake_function(){

   13)               |                                                                 try_to_wake_up() {

   13)   0.305 us   |                                                                    _raw_spin_lock_irqsave();

   13)   0.093 us   |                                                                   task_waking_fair();

   13)               |                                                                   select_task_rq_fair() {

   13)   0.086 us   |                                                                     source_load();

   13)   0.050 us   |                                                                     target_load();

   13)   0.069 us   |                                                                     cpu_avg_load_per_task();

   13)   0.070 us   |                                                                     idle_cpu();

   13)   2.109 us   |                                                                   }

   13)               |                                                                   native_smp_send_reschedule() {

   13)               |                                                                     physflat_send_IPI_mask() {

   13)   0.178 us   |                                                                       default_send_IPI_mask_sequence_phys();

   13)   0.558 us   |                                                                     }

   13)   0.966 us   |                                                                   }

   13)   0.245 us   |                                                                   ttwu_stat();

   13)   0.140 us   |                                                                   _raw_spin_unlock_irqrestore();

   13)   6.722 us   |                                                                 }

   13)   7.211 us   |                                                               }

   13)   7.815 us   |                                                              }

   13)   8.181 us   |                                                           }

   13)   0.151 us   |                                                           _raw_spin_unlock_irqrestore();

   13)   9.762 us   |                                                         }

   13) + 10.294us   |                                                       }

   13)   0.153 us   |                                                       _raw_spin_unlock_irqrestore();

   13) + 11.824us   |                                                     }

   13) + 12.431us   |                                                   }

   13) + 20.877us   |                                                 }

   13) + 22.064us   |                                               }

   13)   0.123 us   |                                               _raw_spin_unlock();

   13) + 25.964us   |                                              }

   13) + 26.968us   |                                            }

   13) + 27.379us   |                                          }

   13) + 29.442us   |                                        }

   13) + 29.913us   |                                      }

   13) + 30.701us   |                                    }

   13) + 32.114us   |                                  }

   13)   0.048 us   |                                 net_rps_action_and_irq_enable.clone.45();

   13) + 33.281us   |                                }

   13)   0.036 us   |                               rcu_bh_qs();

   13)   0.040 us   |                               __local_bh_enable();

   13) + 35.347us   |                              }

   13) + 35.732us   |                            }

   13) + 36.233us   |                         }

   13) + 44.380us   |                        }

   13) + 47.394us   |                      }

   13) + 48.493us   |                    }

   13) + 50.743us   |                  }

   13) + 54.129us   |                }

   13) + 60.178us   |              }

   13)               |              tcp_event_new_data_sent() {

   13)               |                sk_reset_timer() {

   13)               |                  mod_timer() {

   13)               |                    lock_timer_base.clone.22(){

   13)   0.213 us   |                     _raw_spin_lock_irqsave();

   13)   0.607 us   |                    }

   13)   0.057 us   |                   internal_add_timer();

   13)   0.138 us   |                    _raw_spin_unlock_irqrestore();

   13)   1.998 us   |                  }

   13)   2.408 us   |                }

   13)   2.884 us   |              }

   13) + 65.221us   |            }

   13) + 65.802us   |          }

   13)               |          release_sock() {

   13)               |            _raw_spin_lock_bh() {

   13)   0.037 us   |              local_bh_disable();

   13)   0.643 us   |            }

   13)               |            _raw_spin_unlock_bh() {

   13)   0.049 us   |              local_bh_enable_ip();

   13)   0.561 us   |            }

   13)   2.114 us   |          }

   13) + 79.729us   |        }

   13) + 80.595us   |      }

   13) + 81.506us   |   }

   13) + 82.396us   | }


相关文章推荐

How to Write a Killer Software Testing QA Resume That Will Turn Into an Interview Call

Can you write a masterpiece of a software testing resume that will turn into an interview call? If n...

LINUX System Call Quick Reference

  • 2009年06月08日 21:44
  • 55KB
  • 下载

Call for Papers IEEE/ACM ASONAM (Social Network Analysis and Mining )

IEEE/ACM International Conference on Advances in Social Network Analysis and Mining (ASONAM) 2014 In...

遇到的问题-----mongodb.MongoException$Network: can't call something

java操作mongoDB的开发,实时插入mongoDB。 一开始是正常的,运行一段时间后开始报错。 长时间运行测试遇到问题,报错信息如下: at com.mongodb.DBTCP...

对 SYSTEM_ALERT_WINDOW 和 WRITE_SETTINGS 的处理

最近开始开始研究 Android 6.0 的适配,关于两个特殊权限,查阅了一下资料,对于处理方法,在这里做个记录。SYSTEM_ALERT_WINDOW@Override protected ...
  • Hultron
  • Hultron
  • 2017年03月28日 17:27
  • 293

怎样写一个简单的操作系统?(原文标题:How to write a simple operating system)

本文主要介绍怎样编写和构建您的第一个,基于x86汇编语言的操作系统。它解释了计算机开机的基本过程,一些基本的汇编语言,以及怎样进一步提升自己这方面的技术。最终编写的操作系统将是非常小的一个程序(仅仅是...

vi修改编辑文件提示Write Error!(File System full?)错误的研究

vi修改编辑文件提示Write Error!(File System full?)错误的研究 作者:罪惡 发布于:2011-1-7 17:55 Friday 疑难问题分析 今天群里有朋友遇...

How to mount partition with ntfs file system and read write access

原文地址:http://linuxconfig.org/How_to_mount_partition_with_ntfs_file_system_and_read_write_access 1. I...
内容举报
返回顶部
收藏助手
不良信息举报
您举报文章:Network write system call latency
举报原因:
原因补充:

(最多只允许输入30个字)