Link: http://www.cnblogs.com/wbinblog/archive/2012/03/13/2392710.html
Official site: http://www.w3.org/Library/
More information: http://www.w3.org/Library/User/
Platforms: Unix/Linux, Windows
Overview:
Libwww is a highly modular client-side web access API, written in C, that runs on Unix and Windows. It can be used for both large and small applications, including browsers/editors, robots, and batch tools. Pluggable modules shipped with libwww provide complete HTTP/1.1 support with caching, pipelining, POST, Digest Authentication, deflate, and more. The purpose of libwww is to serve as a testbed for protocol experiments. Tim Berners-Lee created libwww in November 1992 to demonstrate the potential of the Web. Applications that use libwww, such as the widely used command-line text browser Lynx and the Mosaic web browser, were written with it. Libwww is now open source and has since moved under W3C management. Because it is open source, anyone can contribute to libwww, which helps ensure that it keeps improving and becomes ever more useful.
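To give a feel for the API, here is a minimal fetch sketch loosely modelled on the "load to chunk" example in the libwww documentation. It assumes the blocking "preemptive" client profile; the header and function names are written from memory and should be checked against your libwww installation, so treat this as an illustration rather than tested code.

#include <stdio.h>
#include "WWWLib.h"
#include "WWWInit.h"

int main(void)
{
    const char *url = "http://www.w3.org/";   /* any URL will do */
    HTRequest *request;
    HTChunk *chunk;

    /* blocking client profile, so the load call returns when the transfer is done */
    HTProfile_newPreemptiveClient("TestApp", "1.0");
    request = HTRequest_new();

    /* load the document into a memory chunk and print it */
    chunk = HTLoadToChunk(url, request);
    if (chunk) {
        char *text = HTChunk_toCString(chunk);
        if (text) {
            printf("%s", text);
            HT_FREE(text);
        }
    }

    HTRequest_delete(request);
    HTProfile_delete();
    return 0;
}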
Recently I needed to write some page-analysis code, which to some extent resembles a search engine's "crawler -> parser -> storage" pipeline.
More features (libcurl): http://curl.haxx.se/docs/features.html
Platforms: Unix/Linux, Windows (there appears to be a Windows implementation as well)
#include "../curl-7.14.0/include/curl/curl.h"
#pragma comment(lib, "../curl-7.14.0/lib/libcurl_imp.lib")
int main(void)
{
curl = curl_easy_init();
if(curl) {
CURLcode res;
res = curl_easy_setopt(curl, CURLOPT_PROXY, "Test-pxy08:8080");
res = curl_easy_setopt(curl, CURLOPT_PROXYTYPE, CURLPROXY_HTTP);
res = curl_easy_setopt(curl, CURLOPT_URL, "http://www.vckbase.com");
res = curl_easy_perform(curl);
if(CURLE_OK == res) {
char *ct;
/**//* ask for the content-type */
/**//* http://curl.haxx.se/libcurl/c/curl_easy_getinfo.html */
res = curl_easy_getinfo(curl, CURLINFO_CONTENT_TYPE, &ct);
if((CURLE_OK == res) && ct)
printf("We received Content-Type: %s ", ct);
}
/**//* always cleanup */
curl_easy_cleanup(curl);
}
return 0;
}
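The snippet above only queries the Content-Type; to actually capture the page body (the usual first step for a crawler), libcurl lets you register a write callback via CURLOPT_WRITEFUNCTION. Below is a minimal sketch: the output file name is just for illustration, the proxy settings from above are left out, and the standard <curl/curl.h> include may need to be replaced with the relative path used earlier.

#include <stdio.h>
#include <curl/curl.h>

/* libcurl calls this for each chunk of body data it receives */
static size_t write_cb(void *data, size_t size, size_t nmemb, void *userp)
{
    size_t bytes = size * nmemb;
    FILE *out = (FILE *)userp;
    /* returning fewer than 'bytes' bytes tells libcurl to abort the transfer */
    return fwrite(data, 1, bytes, out);
}

int main(void)
{
    CURL *curl = curl_easy_init();
    if (curl) {
        FILE *out = fopen("page.html", "wb");   /* illustrative file name */
        if (out) {
            curl_easy_setopt(curl, CURLOPT_URL, "http://www.vckbase.com");
            curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
            curl_easy_setopt(curl, CURLOPT_WRITEDATA, out);
            if (curl_easy_perform(curl) != CURLE_OK)
                fprintf(stderr, "transfer failed\n");
            fclose(out);
        }
        curl_easy_cleanup(curl);
    }
    return 0;
}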
Official site: http://libfetch.darwinports.com/
More information: http://www.freebsd.org/cgi/man.cgi?query=fetch&sektion=3
Platforms: BSD
This is a library I found in FreeBSD: libfetch. Its source code lives in /usr/src/lib/libfetch. It wraps the HTTP and FTP protocols and provides some functions that are very easy to use. I only came across it yesterday and haven't studied it in depth yet, but I tried one of its functions for fetching a page over HTTP; an example follows:
#include <stdio.h>
#include <string.h>
#include "fetch.h"

const char *myurl = "http://qjlemon:aaa@192.169.0.1:8080/test.html";

int main(void)
{
    FILE *fp;
    char buf[1024];

    fp = fetchGetURL(myurl, "");
    if (!fp) {
        printf("error: %s\n", fetchLastErrString);
        return 1;
    }
    while (!feof(fp)) {
        memset(buf, 0, sizeof(buf));
        fgets(buf, sizeof(buf), fp);
        if (ferror(fp))
            break;
        if (buf[0])
            printf("%s", buf);
        else
            break;
    }
    fclose(fp);
    fp = NULL;
    return 0;
}
The most important function here is fetchGetURL, which fetches a file according to the given URL: if the URL starts with http, the function knows to fetch it over HTTP; if it starts with ftp://, it fetches over FTP. A user name and password can also be embedded in the URL.
If the fetch succeeds, it returns a FILE pointer, and the page contents can be read out just like an ordinary file.
The library also provides further functions for finer-grained control over the network operations.
Of course, the most useful ones are the PUT functions; if you want to flood a forum with posts, that's what you need! Hahaha!
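The PUT direction mirrors the GET one: as I understand the fetch(3) interface, fetchPutURL returns a stream that you write into, and libfetch uploads whatever you write to the given URL. Here is a rough, untested sketch; the FTP server address, user name, password and file name are made up for illustration.

#include <stdio.h>
#include <string.h>
#include "fetch.h"

int main(void)
{
    /* hypothetical upload target; replace with a real server, user and password */
    const char *puturl = "ftp://user:secret@192.169.0.1/upload/test.txt";
    const char *payload = "hello from libfetch\n";
    FILE *fp;

    /* fetchPutURL opens a stream for writing; data written to it is uploaded */
    fp = fetchPutURL(puturl, "");
    if (!fp) {
        printf("error: %s\n", fetchLastErrString);
        return 1;
    }
    fwrite(payload, 1, strlen(payload), fp);
    fclose(fp);   /* closing the stream completes the upload */
    return 0;
}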
Source: http://curl.haxx.se/libcurl/competitors.html
- a highly portable and easy-to-use client-side URL transfer library, supporting FTP, FTPS, HTTP, HTTPS, SCP, SFTP, TELNET, DICT, FILE, TFTP and LDAP. libcurl also supports HTTPS certificates, HTTP POST, HTTP PUT, FTP uploading, kerberos, HTTP form based upload, proxies, cookies, user+password authentication, file transfer resume, http proxy tunnelling and more!
- Having a glance at libghttp (a GNOME http library), it looks as if it works rather similarly to libcurl (for HTTP). There's no web page for this, and the person whose email is mentioned in the README of the latest release I found claims he has passed the leadership of the project to "eazel". Popular choice among GNOME projects.
- More complex, and harder to use, than libcurl. Includes everything from multi-threading to HTML parsing. The most notable transfer-related feature that libcurl does not offer but libwww does is caching.
- C++ library "for transferring files via http, ftp, gopher, proxy server". Based on 'snarf' 2.0.9-code (formerly known as libsnarf). Quote from freshmeat:
"As the author of snarf, I have to say this frightens me. Snarf's networking system is far from robust and complete. It's probably full of bugs, and although it works for maybe 85% of all current situations, I wouldn't base a library on it."
- An HTTP and WebDAV client library, with a C interface. I've mainly heard and seen people use this with WebDAV as their main interest.
- Part of glib (GNOME). Supports: HTTP 1.1, Persistent connections, Asynchronous DNS and transfers, Connection cache, Redirects, Basic, Digest, NTLM authentication, SSL with OpenSSL or Mozilla NSS, Proxy support including SSL, SOCKS support, POST data. Probably not very portable. Lacks: cookie support, NTLM for proxies, GSS, gzip encoding, trailers in chunked responses and more.
- Handles URLs, protocols, transports for the Mozilla browser.
- Minimal download library targeted to be much smaller than the above mentioned netlib. HTTP and FTP support.
- While not a library at all, I've been told that people sometimes extract the network code from it and base their own hacks from there.
- Does HTTP and FTP transfers (both ways), supports file: URLs, and an API for URL parsing. The utility fetch that is built on libfetch is an integral part of the FreeBSD operating system.
- "
a small, robust, flexible library for downloading files via HTTP using the GET method.
- "
- "
a very small C library to make http queries (GET, HEAD, PUT, DELETE, etc.) easily portable and embeddable
- "
- (Windows) Provides client-side protocol support for communication with HTTP servers. A client computer can use the XMLHTTP object to send an arbitrary HTTP request, receive the response, and have the Microsoft® XML Document Object Model (DOM) parse that response.
- QHttp is a class in the Qt library from Troll Tech. Seems to be restricted to plain HTTP. Supports GET, POST and proxy. Asynchronous.
- "
a set of routines that implement the FTP protocol. They allow applications to create and access remote files through function calls instead of needing to fork and exec an interactive ftp client program."
- A C++ library for "easy FTP client functionality. It features resuming of up- and downloads, FXP support, SSL/TLS encryption, and logging functionality."
- Has a URLStream class. This C++ class allows you to download a file using HTTP. See demo/urlfetch.cpp in commoncpp2-1.3.19.tar.gz
- Java HTTP client library.
- A Java HTTP client library written by the Jakarta project.