A simple web crawler implemented in C++, Java, and Python.
C++ code:
#include <iostream>
#include <string>
#include <curl/curl.h>
using namespace std;

// libcurl calls this for each chunk of the response body.
// Append the chunk to the string passed in through CURLOPT_WRITEDATA.
size_t writeCallback(char* buf, size_t size, size_t nmemb, void* up) {
    string* responseBuf = static_cast<string*>(up);
    responseBuf->append(buf, size * nmemb);
    return size * nmemb;  // Returning a different count would abort the transfer.
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl = curl_easy_init();
    string responseBuf;
    int exitCode = 1;  // Stays non-zero unless the request succeeds.
    if (curl) {
        curl_easy_setopt(curl, CURLOPT_URL, "https://example.com");
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);  // Follow HTTP redirects.
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, writeCallback);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &responseBuf);
        CURLcode res = curl_easy_perform(curl);
        if (res != CURLE_OK) {
            cerr << "curl_easy_perform() failed: " << curl_easy_strerror(res) << endl;
        } else {
            cout << "Response body: " << responseBuf << endl;
            exitCode = 0;
        }
        curl_easy_cleanup(curl);
    }
    curl_global_cleanup();
    return exitCode;
}
Java code:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class SimpleSpider {
    public static void main(String[] args) throws Exception {
        URL url = new URL("https://example.com");
        // try-with-resources closes the reader even if reading throws.
        // UTF-8 is assumed here; without it the default charset would be used.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
Python code:
import requests

url = 'https://example.com'
# Set a timeout so the call cannot hang indefinitely (10 seconds is an arbitrary choice).
response = requests.get(url, timeout=10)
if response.status_code == requests.codes.ok:
    print(response.text)
else:
    print(f'Request failed with code {response.status_code}')
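All three snippets implement the simplest possible crawler. They rely on different libraries (libcurl in C++, java.net.URL in Java, requests in Python), but they follow the same steps: send a request to a given URL, then print the response body to the console. This is a minimal example; a more capable crawler needs considerably more code. Of the three, the Python version is the shortest and most direct, the C++ version is the most verbose, and the Java version falls in between.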
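As one illustration of what "more capable" might involve, a crawler usually has to pull the links out of each page it downloads so it knows where to go next. The following is only a rough sketch using Python's standard library; the names `LinkExtractor` and `extract_links` are hypothetical and not part of the snippets above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collects absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the links found in an HTML string, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Passing `response.text` and `url` from the Python example to `extract_links` would give a list of URLs to visit next. A real crawler would presumably also need to track visited pages, respect robots.txt, and limit its request rate, none of which is shown here.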