The gethostbyname Blocking Problem in Multithreaded Environments

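    On Unix/Linux, the gethostbyname function is commonly used to ask DNS for the IP address of a domain name. Because DNS resolution is recursive, a gethostbyname call can take a very long time to return for some names. Unlike connect or read, it offers no way to set a timeout through setsockopt or select, so it often becomes the bottleneck of a program. One workaround that has been suggested is to arm a timer signal with alarm and, if the timer fires, use setjmp and longjmp to jump out of gethostbyname (I have not tried this approach and do not know how well it works).
    In a multithreaded program gethostbyname has an even more serious problem: if one thread blocks inside gethostbyname, every other thread that calls gethostbyname blocks there too. I ran into this while writing a crawler, and it puzzled me for a long time: all the crawler threads were stuck in gethostbyname, which made the crawler very slow. I searched Google for a long time without finding an answer. Today I happened to find an e-book in our lab's Google Group, "Mining the Web - Discovering Knowledge from Hypertext Data", whose discussion of crawlers contains the following paragraphs: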
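    The alarm/setjmp idea above is specific to C signal handling. A different way to bound the wait, sketched here in Python rather than C, is to run the blocking lookup on a worker thread and stop waiting after a deadline. This is not the alarm technique, and it does not cancel the lookup: the worker thread stays occupied until the resolver returns. The name `resolve_with_timeout`, the pool size, and the `resolver` parameter are assumptions made for illustration.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Small shared pool (size chosen arbitrarily for this sketch).
# A lookup that times out still occupies a worker until it returns.
_pool = ThreadPoolExecutor(max_workers=4)


def resolve_with_timeout(hostname, timeout_s, resolver=socket.gethostbyname):
    """Return the resolved address, or None on failure or timeout.

    `resolver` is any blocking callable taking a hostname; it is a
    parameter so the blocking call can be swapped out.
    """
    future = _pool.submit(resolver, hostname)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # We stop waiting; the lookup itself is not interrupted.
        return None
    except OSError:
        # socket.gaierror and socket.herror are subclasses of OSError.
        return None
```

    With this sketch, `resolve_with_timeout("www.example.com", 2.0)` gives either an address string or `None`, so the caller can skip the host or retry it later instead of stalling.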

    Many clients for DNS resolution are coded poorly. Most UNIX systems provide an implementation of gethostbyname (the DNS client API - application program interface), which cannot concurrently handle multiple outstanding requests. Therefore, the crawler cannot issue many resolution requests together and poll at a later time for completion of individual requests, which is critical for acceptable performance. Furthermore, if the system-provided client is used, there is no way to distribute load among a number of DNS servers. For all these reasons, many crawlers choose to include their own custom client for DNS name resolution. The Mercator crawler from Compaq System Research Center reduced the time spent in DNS from as high as 87% to a modest 25% by implementing a custom client. The ADNS asynchronous DNS client library is ideal for use in crawlers.
    In spite of these optimizations, a large-scale crawler will spend a substantial fraction of its network time not waiting for HTTP data transfer, but for address resolution. For every hostname that has not been resolved before (which happens frequently with crawlers), the local DNS may have to go across many network hops to fill its cache for the first time. To overlap this unavoidable delay with useful work, prefetching can be used. When a page that has just been fetched is parsed, a stream of HREFs is extracted. Right at this time, that is, even before any of the corresponding URLs are fetched, hostnames are extracted from the HREF targets, and DNS resolution requests are made to the caching server. The prefetching client is usually implemented using UDP instead of TCP, and it does not wait for resolution to be completed. The request serves only to fill the DNS cache so that resolution will be fast when the page is actually needed later on.

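    The gist is that the gethostbyname implementation shipped with most Unix systems cannot handle several outstanding requests at once; this is an inherent limitation of the interface and cannot be worked around from the caller's side. Large crawlers therefore usually do not use gethostbyname and implement their own custom DNS client instead. A custom client can spread load across several DNS servers, and asynchronous resolution greatly speeds up DNS lookups. Such clients are usually built on UDP and can resolve the host of a URL before the crawler actually fetches the page. The text also mentions an open-source asynchronous DNS library, adns, whose home page is http://www.chiark.greenend.org.uk/~ian/adns/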
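    The prefetching step can be sketched roughly as follows: take the HREFs of a freshly parsed page, extract the hostnames, and send one DNS query per hostname over UDP to the caching server without reading the replies, so that the server's cache is already warm when the page is fetched later. This is a minimal Python illustration and not the code of any particular crawler; the function names, the sequential query IDs, and the `dns_server` argument are assumptions, and only plain ASCII hostnames are handled.

```python
import socket
import struct
from urllib.parse import urlsplit


def hostnames_from_hrefs(hrefs):
    """Return the distinct hostnames found in hrefs, in first-seen order.

    Relative links and other targets without a host are skipped.
    """
    hosts = []
    for href in hrefs:
        host = urlsplit(href).hostname
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def build_a_query(hostname, query_id):
    """Build a DNS query message asking for the A record of hostname."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question,
    # 0 answer / authority / additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    labels = hostname.rstrip(".").encode("ascii").split(b".")
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    # QTYPE = 1 (A), QCLASS = 1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def prefetch(hrefs, dns_server, port=53):
    """Send one query per hostname and return without waiting for replies."""
    hosts = hostnames_from_hrefs(hrefs)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for query_id, host in enumerate(hosts, start=1):
            sock.sendto(build_a_query(host, query_id), (dns_server, port))
    return hosts
```

    Because the replies are never read, a lost packet only means the cache stays cold for that hostname; the normal lookup made when the page is actually fetched still works, just more slowly.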
    From the above it is clear that gethostbyname is not suitable for multithreaded programs, or for any other program that needs fast DNS resolution.
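    The pattern the book describes, issuing many resolution requests together and polling later for the ones that have completed, can be approximated without writing a DNS client by handing blocking lookups to a thread pool. The sketch below is in Python and uses `socket.getaddrinfo`; whether the lookups really overlap depends on the platform's resolver, and the helper names are assumptions made for illustration.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def resolve_ipv4(hostname):
    """Blocking lookup: return the first IPv4 address for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return infos[0][4][0]


def start_lookups(hostnames, pool, resolver=resolve_ipv4):
    """Submit every lookup at once; return {hostname: Future}."""
    return {host: pool.submit(resolver, host) for host in hostnames}


def collect_finished(pending):
    """Remove completed lookups from pending without blocking.

    Returns {hostname: address}, with None for lookups that failed.
    """
    finished = {}
    for host in [h for h, f in pending.items() if f.done()]:
        future = pending.pop(host)
        try:
            finished[host] = future.result()
        except OSError:  # socket.gaierror is a subclass of OSError
            finished[host] = None
    return finished


# Example (performs real DNS queries, so it is left commented out):
# with ThreadPoolExecutor(max_workers=16) as pool:
#     pending = start_lookups(["example.com", "example.org"], pool)
#     ...fetch pages, calling collect_finished(pending) between fetches...
```

    A crawler loop would call `collect_finished` between page fetches and schedule the hosts whose addresses have arrived, so one slow name delays only its own pages.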