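HttpClient Usage Explained
Most crawlers today are written in Python, but any language will do. Before Python became popular, many companies I knew of wrote their crawlers in Java, and other languages were surely in use too. But back to the point: I have been learning .NET Core recently, so I tried writing a crawler in C#. .NET Core is cross-platform, so a single command runs the program on Linux, much like a Python script. Writing a crawler means sending HTTP requests. In Python the usual choice is the requests library, with aiohttp for asynchronous work. The C# counterpart is the HttpClient class, which has very good support for asynchronous, highly concurrent requests.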
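1. Setting up a test service
Before sending any HTTP requests we need a service or website to send them to. We could pick any website, but an arbitrary site is not much help for learning. A ready-made service that returns the parameters and headers of every request in a neatly formatted response is far better for testing and study, and that service is the well-known httpbin.org. The official instance at http://httpbin.org is rather slow, so you can host your own, which is very simple. You can follow my setup notes, "Setting up httpbin with Docker", or start with the instance I already host at http://zhousonglin.cn:8080/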
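2. Sending GET requests
GET is the request we send most often, since most data is fetched with it. In a crawler, POST mostly comes up when logging in. The code below is my asynchronous wrapper for GET requests. Microsoft's official guidance also recommends asynchronous code wherever possible; for network calls the main benefit is that no thread sits blocked while waiting for the response.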
// Requires: using System.Net.Http; using System.Threading.Tasks;
/// <summary>
/// Sends a GET request.
/// </summary>
/// <param name="requestUrl">The URL to request.</param>
/// <returns>The response body as a string.</returns>
public static async Task<string> HtmlGet(string requestUrl)
{
    using (HttpClient httpClient = new HttpClient())
    {
        // GetAsync already selects the HTTP method, so no "Method" header is needed.
        // ConnectionClose sends "Connection: close", the header form of KeepAlive = false.
        httpClient.DefaultRequestHeaders.ConnectionClose = true;
        // The real header name is "User-Agent"; the typed property sets it correctly.
        httpClient.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
        HttpResponseMessage response = await httpClient.GetAsync(requestUrl);
        // Shortcut alternative: string body = await httpClient.GetStringAsync(requestUrl);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
Usage:
string urlRequestGet = "http://zhousonglin.cn:8080/get";
string responseStr = string.Empty;
responseStr = HtmlGet(urlRequestGet).Result;
Console.WriteLine(responseStr);
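Result: the service replies with a JSON document that echoes the request's query arguments, headers, origin IP, and URL.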
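Since the point of going asynchronous is to have many requests in flight at once, here is a minimal sketch of fetching several pages concurrently through one shared client. `Crawler`, `FetchAllAsync`, and the `page` query parameter are hypothetical names chosen for illustration. Sharing a single HttpClient is an assumption of this sketch, made because creating a client per request can exhaust sockets under load.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class Crawler
{
    // One client shared by the whole process (an assumption of this sketch).
    private static readonly HttpClient SharedClient = new HttpClient();

    // Starts all requests at once and returns the bodies in the same order as the input URLs.
    public static Task<string[]> FetchAllAsync(IEnumerable<string> urls, HttpClient client = null)
    {
        client ??= SharedClient;
        return Task.WhenAll(urls.Select(url => client.GetStringAsync(url)));
    }
}

public static class Program
{
    public static async Task Main()
    {
        var urls = new[]
        {
            "http://zhousonglin.cn:8080/get?page=1",
            "http://zhousonglin.cn:8080/get?page=2"
        };
        string[] bodies = await Crawler.FetchAllAsync(urls);
        foreach (string body in bodies)
        {
            Console.WriteLine(body);
        }
    }
}
```

Task.WhenAll fails as a whole if any single request throws, so a real crawler would probably want per-URL error handling and a cap on concurrency.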
3. Sending POST requests
A POST request can carry its parameters in two ways: as form fields or as a JSON string.
3.1 Passing form parameters
// Requires: using System.Collections.Generic; using System.Net.Http; using System.Threading.Tasks;
/// <summary>
/// Sends a POST request with form parameters.
/// </summary>
/// <param name="requestUrl">The URL to request.</param>
/// <param name="postParams">The form fields to send.</param>
/// <returns>The response body as a string.</returns>
public static async Task<string> HtmlPost(string requestUrl, Dictionary<string, string> postParams)
{
    using (HttpClient httpClient = new HttpClient())
    {
        // PostAsync already selects the HTTP method, so no "Method" header is needed.
        // ConnectionClose sends "Connection: close", the header form of KeepAlive = false.
        httpClient.DefaultRequestHeaders.ConnectionClose = true;
        // The real header name is "User-Agent"; the typed property sets it correctly.
        httpClient.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
        // Encodes the fields as application/x-www-form-urlencoded.
        HttpContent postContent = new FormUrlEncodedContent(postParams);
        HttpResponseMessage response = await httpClient.PostAsync(requestUrl, postContent);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
Usage:
string urlRequestPost = "http://zhousonglin.cn:8080/post";
string responseStr = string.Empty;
Dictionary<string, string> postParams = new Dictionary<string, string>()
{
{"say","Hello" },
{"ask","question" }
};
responseStr = HtmlPost(urlRequestPost, postParams).Result;
Console.WriteLine(responseStr);
Result: the posted fields come back under the "form" key of the JSON response.
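To see what a form POST actually puts on the wire, the body produced by FormUrlEncodedContent can be read back locally without sending anything. This is a small sketch: `FormDemo` and `EncodeAsync` are hypothetical names, and the fields are passed as an array so that their order is deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class FormDemo
{
    // Returns the request body that FormUrlEncodedContent would send for these fields.
    public static async Task<string> EncodeAsync(IEnumerable<KeyValuePair<string, string>> fields)
    {
        using var content = new FormUrlEncodedContent(fields);
        return await content.ReadAsStringAsync();
    }
}

public static class Program
{
    public static async Task Main()
    {
        var fields = new[]
        {
            new KeyValuePair<string, string>("say", "Hello"),
            new KeyValuePair<string, string>("ask", "question")
        };
        // Prints: say=Hello&ask=question
        Console.WriteLine(await FormDemo.EncodeAsync(fields));
    }
}
```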
3.2 Passing JSON parameters
// Requires: using System.Net.Http; using System.Threading.Tasks;
/// <summary>
/// Sends a POST request with a JSON body.
/// </summary>
/// <param name="requestUrl">The URL to request.</param>
/// <param name="jsonParams">The JSON string to send as the request body.</param>
/// <returns>The response body as a string.</returns>
public static async Task<string> HtmlPostJson(string requestUrl, string jsonParams)
{
    using (HttpClient httpClient = new HttpClient())
    {
        // PostAsync already selects the HTTP method, so no "Method" header is needed.
        // ConnectionClose sends "Connection: close", the header form of KeepAlive = false.
        httpClient.DefaultRequestHeaders.ConnectionClose = true;
        // The real header name is "User-Agent"; the typed property sets it correctly.
        httpClient.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
        HttpContent content = new StringContent(jsonParams);
        // StringContent defaults to text/plain, so mark the body as JSON explicitly.
        content.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("application/json");
        HttpResponseMessage response = await httpClient.PostAsync(requestUrl, content);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
Usage:
public class User
{
public User()
{ }
public string Name {get;set;}
public string Sex {get; set;}
}
string urlRequestPost = "http://zhousonglin.cn:8080/post";
User user = new User()
{
Name = "Dahlin",
Sex = "male"
};
// JsonConvert comes from the Newtonsoft.Json package (using Newtonsoft.Json;).
string jsonParam = JsonConvert.SerializeObject(user);
string responseStr = HtmlPostJson(urlRequestPost, jsonParam).Result;
Console.WriteLine(responseStr);
Result: the posted object comes back under the "json" key of the JSON response.
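The JSON body can also be built without a third-party package. The sketch below swaps in System.Text.Json, which ships with .NET Core 3.0 and later, in place of JsonConvert, and uses the StringContent constructor that takes the media type directly. `JsonContentDemo` and `ToJsonContent` are hypothetical names. Nothing is sent over the network; the program only prints the header and body that would be sent.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public class User
{
    public string Name { get; set; }
    public string Sex { get; set; }
}

public static class JsonContentDemo
{
    // Serializes the object and wraps it in a request body marked as JSON.
    public static StringContent ToJsonContent(User user)
    {
        string json = JsonSerializer.Serialize(user);
        return new StringContent(json, Encoding.UTF8, "application/json");
    }
}

public static class Program
{
    public static async Task Main()
    {
        var user = new User { Name = "Dahlin", Sex = "male" };
        using StringContent content = JsonContentDemo.ToJsonContent(user);
        // Prints: application/json; charset=utf-8
        Console.WriteLine(content.Headers.ContentType);
        // Prints: {"Name":"Dahlin","Sex":"male"}
        Console.WriteLine(await content.ReadAsStringAsync());
    }
}
```

The resulting content object can be passed to PostAsync in place of the manually configured StringContent in HtmlPostJson.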
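4. File download requests
Crawlers mostly fetch text, but sometimes we also want images or video files. HttpClient supports file downloads as well; the wrapper method looks like this: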
// Requires: using System.IO; using System.Net.Http; using System.Threading.Tasks;
/// <summary>
/// Downloads a file.
/// </summary>
/// <param name="requestUrl">The URL of the file.</param>
/// <param name="fileName">The local path to save the file to.</param>
/// <returns>A task that completes when the file has been written.</returns>
public static async Task HtmlDownloadFile(string requestUrl, string fileName)
{
    using HttpClient httpClient = new HttpClient();
    // GetAsync already selects the HTTP method, so no "Method" header is needed.
    // ConnectionClose sends "Connection: close", the header form of KeepAlive = false.
    httpClient.DefaultRequestHeaders.ConnectionClose = true;
    // The real header name is "User-Agent"; the typed property sets it correctly.
    httpClient.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
    HttpResponseMessage response = await httpClient.GetAsync(requestUrl);
    response.EnsureSuccessStatusCode();
    // Await the bytes directly instead of chaining ContinueWith and reading .Result.
    byte[] data = await response.Content.ReadAsByteArrayAsync();
    // Creates the file, or overwrites it if it already exists.
    await File.WriteAllBytesAsync(fileName, data);
}
Usage:
string urlPicture = "http://qn.zhousonglin.cn/DaGuanYuan34.jpg?imageslim";
HtmlDownloadFile(urlPicture, "1.jpg").Wait();
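HtmlDownloadFile reads the whole response into a byte array before writing it, which is fine for pictures but may use a lot of memory for video files. One possible alternative, sketched here under the assumption that the caller supplies the HttpClient, streams the body straight to disk. `Downloader` and `DownloadToFileAsync` are hypothetical names.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class Downloader
{
    // Copies the response body to a file in chunks instead of buffering it all in memory.
    public static async Task DownloadToFileAsync(HttpClient client, string requestUrl, string fileName)
    {
        // ResponseHeadersRead completes once the headers arrive; the body is then read on demand.
        using HttpResponseMessage response =
            await client.GetAsync(requestUrl, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();

        using Stream body = await response.Content.ReadAsStreamAsync();
        using FileStream file = new FileStream(fileName, FileMode.Create, FileAccess.Write);
        await body.CopyToAsync(file);
    }
}

public static class Program
{
    public static async Task Main()
    {
        using HttpClient client = new HttpClient();
        await Downloader.DownloadToFileAsync(
            client, "http://qn.zhousonglin.cn/DaGuanYuan34.jpg?imageslim", "1.jpg");
    }
}
```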
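The methods above cover most everyday uses of HttpClient. There are deeper techniques too, such as customizing the message handler, HttpClientHandler, or sending cookies with a request, as follows: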
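// Requires: using System; using System.Net; using System.Net.Http;
CookieContainer cookieContainer = new CookieContainer();
// CookieContainer.Add(Cookie) throws when the cookie has no Domain, so bind the cookie to the target site's URI.
cookieContainer.Add(new Uri("http://zhousonglin.cn:8080/"), new Cookie("XXXXXX", "XXXXXXX"));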
HttpClientHandler httpClientHandler = new HttpClientHandler()
{
CookieContainer = cookieContainer,
AllowAutoRedirect = true,
UseCookies = true
};
HttpClient httpClient = new HttpClient(httpClientHandler);
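Adding a proxy and similar options work much the same way. Pressing F12 (Go to Definition) on HttpClientHandler shows what is available, so I won't go into detail here. I may write a follow-up on advanced usage once I need it; it mostly comes down to extending and overriding the base classes that the framework exposes.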
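As a rough illustration of both ideas, the sketch below routes requests through a proxy and adds a custom handler by subclassing DelegatingHandler. `UserAgentHandler` and `ClientFactory` are hypothetical names, and the proxy address `http://127.0.0.1:8888` and the `MyCrawler/1.0` agent string are assumptions for the example. The program only works if a proxy is actually listening at that address.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical custom handler: stamps a User-Agent on requests that lack one.
public class UserAgentHandler : DelegatingHandler
{
    private readonly string _userAgent;

    public UserAgentHandler(string userAgent, HttpMessageHandler innerHandler)
        : base(innerHandler)
    {
        _userAgent = userAgent;
    }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Leave the header alone when the caller already supplied one.
        if (request.Headers.UserAgent.Count == 0)
        {
            request.Headers.TryAddWithoutValidation("User-Agent", _userAgent);
        }
        return base.SendAsync(request, cancellationToken);
    }
}

public static class ClientFactory
{
    // Builds a client whose requests pass through the custom handler, then the proxy.
    public static HttpClient Create(string proxyUrl, string userAgent)
    {
        var handler = new HttpClientHandler
        {
            Proxy = new WebProxy(proxyUrl),
            UseProxy = true
        };
        return new HttpClient(new UserAgentHandler(userAgent, handler));
    }
}

public static class Program
{
    public static async Task Main()
    {
        // Assumed proxy address; replace it with a proxy that is actually running.
        using HttpClient client = ClientFactory.Create("http://127.0.0.1:8888", "MyCrawler/1.0");
        // httpbin's /headers endpoint echoes the request headers back.
        Console.WriteLine(await client.GetStringAsync("http://zhousonglin.cn:8080/headers"));
    }
}
```

Chaining handlers this way keeps cross-cutting concerns such as headers, logging, or retries out of the individual request methods.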