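These are some thoughts that came out of using libcurl in my recent work.
The official documentation for curl_easy_perform: see here.
Background: I had just finished a feature that issues HTTP requests through libcurl. I then tested it myself to see how certain failures would affect the system. Things went wrong when I tested the following conditions: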
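- In the 1st call to curl_easy_perform, CURLOPT_URL is set to an invalid domain name (note: a domain name, i.e. the host part of a web address, not an IP address)
- Within 20 seconds of issuing the 1st curl_easy_perform, try to issue a 2nd curl_easy_perform request (handled in a different thread)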
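Then something unexpected happened. According to the logs, the 1st call returned CURLE_COULDNT_RESOLVE_HOST (6) after 20 seconds, meaning the host could not be resolved. Only after that result came back did the 2nd call start executing. So the later curl_easy_perform sat in an unnecessary wait for the whole period from when the 1st call was issued until it returned CURLE_COULDNT_RESOLVE_HOST.
The documentation explains this: curl_easy_perform is a synchronous interface that blocks until the transfer either succeeds or fails. It then adds: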
You must never call this function simultaneously from two places using the same easy_handle. Let the function return first before invoking it another time. If you want parallel transfers, you must use several curl easy_handles.
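In plain words: never call this function on the same easy_handle from two places at the same time; let one call return before making the next. If you need transfers to run in parallel, use multiple curl easy_handles.
So the behavior above is exactly what this run should have produced. Putting N functions that call curl_easy_perform into a thread pool does not by itself give you asynchronous HTTP requests. If they call a single global easy_handle that others may also be using, which the documentation forbids, later callers end up blocked for no good reason. My current guess is that curl_easy_perform takes an exclusive lock internally, and takes it before DNS resolution, but I have no evidence for this.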
curl_easy_perform is declared in easy.h as:
```c
CURL_EXTERN CURLcode curl_easy_perform(CURL *curl);
```
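The definition of CURL depends not on the compiler but on preprocessor macros in curl.h. When BUILDING_LIBCURL (defined while libcurl itself is being compiled) or CURL_STRICTER (an opt-in for applications) is defined, CURL is an opaque struct type; otherwise it is plain void:
```c
#if defined(BUILDING_LIBCURL) || defined(CURL_STRICTER)
typedef struct Curl_easy CURL;
typedef struct Curl_share CURLSH;
#else
typedef void CURL;
typedef void CURLSH;
#endif
```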
The members of struct Curl_easy (defined in libcurl's internal header lib/urldata.h) are:
```c
struct Curl_easy {
  /* First a simple identifier to easier detect if a user mix up this easy
     handle with a multi handle. Set this to CURLEASY_MAGIC_NUMBER */
  unsigned int magic;
  /* first, two fields for the linked list of these */
  struct Curl_easy *next;
  struct Curl_easy *prev;
  struct connectdata *conn;
  struct Curl_llist_element connect_queue;
  struct Curl_llist_element conn_queue; /* list per connectdata */
  CURLMstate mstate;  /* the handle's state */
  CURLcode result;    /* previous result */
  struct Curl_message msg; /* A single posted message. */
  /* Array with the plain socket numbers this handle takes care of, in no
     particular order. Note that all sockets are added to the sockhash, where
     the state etc are also kept. This array is mostly used to detect when a
     socket is to be removed from the hash. See singlesocket(). */
  curl_socket_t sockets[MAX_SOCKSPEREASYHANDLE];
  unsigned char actions[MAX_SOCKSPEREASYHANDLE]; /* action for each socket in
                                                    sockets[] */
  int numsocks;
  struct Names dns;
  struct Curl_multi *multi;      /* if non-NULL, points to the multi handle
                                    struct to which this "belongs" when used by
                                    the multi interface */
  struct Curl_multi *multi_easy; /* if non-NULL, points to the multi handle
                                    struct to which this "belongs" when used
                                    by the easy interface */
  struct Curl_share *share;      /* Share, handles global variable mutexing */
#ifdef USE_LIBPSL
  struct PslCache *psl;          /* The associated PSL cache. */
#endif
  struct SingleRequest req;      /* Request-specific data */
  struct UserDefined set;        /* values set by the libcurl user */
  struct CookieInfo *cookies;    /* the cookies, read from files and servers.
                                    NOTE that the 'cookie' field in the
                                    UserDefined struct defines if the "engine"
                                    is to be used or not. */
#ifndef CURL_DISABLE_HSTS
  struct hsts *hsts;
#endif
#ifndef CURL_DISABLE_ALTSVC
  struct altsvcinfo *asi;        /* the alt-svc cache */
#endif
  struct Progress progress;      /* for all the progress meter data */
  struct UrlState state;         /* struct for fields used for state info and
                                    other dynamic purposes */
#ifndef CURL_DISABLE_FTP
  struct WildcardData wildcard;  /* wildcard download state info */
#endif
  struct PureInfo info;          /* stats, reports and info data */
  struct curl_tlssessioninfo tsi; /* Information about the TLS session, only
                                     valid after a client has asked for it */
#if defined(CURL_DOES_CONVERSIONS) && defined(HAVE_ICONV)
  iconv_t outbound_cd;           /* for translating to the network encoding */
  iconv_t inbound_cd;            /* for translating from the network encoding */
  iconv_t utf8_cd;               /* for translating to UTF8 */
#endif /* CURL_DOES_CONVERSIONS && HAVE_ICONV */
#ifdef USE_HYPER
  struct hyptransfer hyp;
#endif
};
```
I haven't spotted any field here that is a lock yet; I'll dig into it another day.
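CURLcode is an enum holding the return values of libcurl calls; at the time of writing it has 99 values. The most common is CURLE_OK (0), which means success; every other value indicates an error.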