Writing a Crawler with .NET Core: HttpClient Usage in Detail

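Most crawlers today are written in Python, but any language can do the job. Before Python became popular, many companies I know of wrote their crawlers in Java, and other languages were surely in use as well. Back to the point: I have been learning .NET Core recently, so I tried writing a crawler in C#. .NET Core is cross-platform, so a single command runs the program on Linux, much like running a Python script. Writing a crawler inevitably involves a library for sending HTTP requests. In Python the usual choice is requests, with aiohttp for asynchronous work; the C# counterpart is the HttpClient class, which also supports asynchronous, highly concurrent requests, and supports them very well.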

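1. Setting up a test service

Before sending any HTTP requests, we need a service or website to send them to. Any website would do, but an arbitrary site is not much help for learning. A ready-made service that echoes back the parameters and headers of every request in a neatly formatted response is far better for testing and study, and that service is the well-known httpbin.org. The official instance at http://httpbin.org is rather slow, so you can host your own, which is very simple (see my notes on setting up httpbin with Docker), or start with the instance I have already set up at http://zhousonglin.cn:8080/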
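To make the "echo" idea concrete, here is a minimal sketch that pulls the echoed request headers out of an httpbin-style response body. It assumes the response is a JSON object with a "headers" member, which is what httpbin's /get endpoint returns; the class and method names (HttpbinEcho, ReadEchoedHeaders) are made up for this illustration, and System.Text.Json requires .NET Core 3.0 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class HttpbinEcho
{
    // Extracts the "headers" object from an httpbin-style JSON response body.
    // Header names are compared case-insensitively, as HTTP header names are.
    public static Dictionary<string, string> ReadEchoedHeaders(string responseJson)
    {
        var headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        using JsonDocument doc = JsonDocument.Parse(responseJson);
        if (doc.RootElement.TryGetProperty("headers", out JsonElement headersElement))
        {
            foreach (JsonProperty header in headersElement.EnumerateObject())
            {
                // httpbin reports every header value as a JSON string.
                headers[header.Name] = header.Value.GetString() ?? string.Empty;
            }
        }
        return headers;
    }
}
```

Passing the body returned by a GET to the /get endpoint into ReadEchoedHeaders lets you check exactly which headers the server received, for example whether your User-Agent actually arrived under that name.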

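2. Sending GET requests

GET is the request we send most often: most of the time a crawler uses GET to fetch data, while POST is generally only needed for things like logging in. The code below is my asynchronous wrapper for GET requests. Microsoft also recommends implementing this kind of I/O asynchronously wherever possible; the benefits are many, so I won't go into them here.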

// Requires: using System.Net.Http; using System.Threading.Tasks;
/// <summary>
/// Sends a GET request.
/// </summary>
/// <param name="requestUrl">URL to request</param>
/// <returns>The response body as a string</returns>
public static async Task<string> HtmlGet(string requestUrl)
{
	string responseBody = string.Empty;
	using (HttpClient httpClient = new HttpClient())
	{
		// GetAsync already sets the HTTP method, so no "Method" header is needed.
		// ConnectionClose sends "Connection: close", the header form of KeepAlive = false.
		httpClient.DefaultRequestHeaders.ConnectionClose = true;
		// The real header name is "User-Agent"; skip validation so any browser string is accepted.
		httpClient.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
		
		HttpResponseMessage response = await httpClient.GetAsync(requestUrl);
		// Shortcut: string body = await httpClient.GetStringAsync(requestUrl);
		response.EnsureSuccessStatusCode();
		responseBody = await response.Content.ReadAsStringAsync();
	}
	return responseBody;
}

Usage:

string urlRequestGet = "http://zhousonglin.cn:8080/get";
// Await rather than block on .Result; the caller must be async (e.g. static async Task Main).
string responseStr = await HtmlGet(urlRequestGet);
Console.WriteLine(responseStr);

Result:
[Screenshot: GET request output]
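A wrapper that creates a new HttpClient on every call is fine for a demo, but a crawler that issues many requests is usually better served by one shared instance, because each new client brings its own connection pool. The following is a minimal sketch of that idea together with concurrent fetching via Task.WhenAll. The names CrawlerClient, GetStringAsync and FetchAllAsync, the 30-second timeout and the "DemoCrawler/1.0" user agent are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class CrawlerClient
{
    // One HttpClient for the whole process; it is safe to use from concurrent requests.
    private static readonly HttpClient SharedClient = CreateClient();

    private static HttpClient CreateClient()
    {
        var client = new HttpClient { Timeout = TimeSpan.FromSeconds(30) };
        // Hypothetical user agent string, chosen only for this sketch.
        client.DefaultRequestHeaders.TryAddWithoutValidation(
            "User-Agent", "Mozilla/5.0 (compatible; DemoCrawler/1.0)");
        return client;
    }

    // Same behavior as a per-call wrapper, but reusing the shared client.
    public static async Task<string> GetStringAsync(string url)
    {
        using HttpResponseMessage response = await SharedClient.GetAsync(url);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }

    // Starts the fetch delegate for every URL at once and awaits them all;
    // the results come back in the same order as the input URLs.
    public static Task<string[]> FetchAllAsync(
        IEnumerable<string> urls, Func<string, Task<string>> fetch)
    {
        return Task.WhenAll(urls.Select(fetch));
    }
}
```

From an async method, something like `string[] pages = await CrawlerClient.FetchAllAsync(urls, CrawlerClient.GetStringAsync);` would request all URLs concurrently. Taking the fetch function as a delegate also makes it easy to swap in a fake for offline testing.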

3. Sending POST requests

A POST request can carry its parameters in two ways: as form fields, or as a JSON string.
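Before looking at the two wrappers, it can help to see what each kind of body looks like on the wire. The sketch below builds both content types and renders them as text without sending anything; the class and method names (PostBodies, BuildForm, BuildJson, DescribeAsync) are invented for this example.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class PostBodies
{
    // Form body: key=value pairs, URL-encoded and joined with '&'.
    public static HttpContent BuildForm(IEnumerable<KeyValuePair<string, string>> fields)
        => new FormUrlEncodedContent(fields);

    // JSON body: the string is sent as-is with an application/json content type.
    public static HttpContent BuildJson(string json)
        => new StringContent(json, Encoding.UTF8, "application/json");

    // Returns "<content type>: <body>" so the two encodings can be compared side by side.
    public static async Task<string> DescribeAsync(HttpContent content)
    {
        string body = await content.ReadAsStringAsync();
        return $"{content.Headers.ContentType}: {body}";
    }
}
```

For the fields say=Hello and ask=question, the form body is `say=Hello&ask=question` with media type application/x-www-form-urlencoded, while the JSON variant sends whatever JSON text you pass in under application/json.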

3.1 Passing form parameters

/// <summary>
/// Sends a POST request with form parameters.
/// </summary>
/// <param name="requestUrl">URL to request</param>
/// <param name="postParams">Form fields to send</param>
/// <returns>The response body as a string</returns>
public static async Task<string> HtmlPost(string requestUrl, Dictionary<string, string> postParams)
{
	string responseBody = string.Empty;
	using (HttpClient httpClient = new HttpClient())
	{
		// PostAsync sets the HTTP method; "Method" is not a real header.
		httpClient.DefaultRequestHeaders.ConnectionClose = true;
		httpClient.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
		// Encodes the dictionary as application/x-www-form-urlencoded.
		HttpContent postContent = new FormUrlEncodedContent(postParams);
		HttpResponseMessage response = await httpClient.PostAsync(requestUrl, postContent);
		response.EnsureSuccessStatusCode();
		responseBody = await response.Content.ReadAsStringAsync();
	}
	return responseBody;
}

Usage:

string urlRequestPost = "http://zhousonglin.cn:8080/post";
Dictionary<string, string> postParams = new Dictionary<string, string>()
{
	{"say","Hello" },
	{"ask","question" }
};
// Await rather than block on .Result; the caller must be async.
string responseStr = await HtmlPost(urlRequestPost, postParams);
Console.WriteLine(responseStr);

Result:
[Screenshot: form POST request output]

3.2 Passing JSON parameters

/// <summary>
/// Sends a POST request with a JSON body.
/// </summary>
/// <param name="requestUrl">URL to request</param>
/// <param name="jsonParams">JSON string to send as the body</param>
/// <returns>The response body as a string</returns>
public static async Task<string> HtmlPostJson(string requestUrl, string jsonParams)
{
	string responseBody = string.Empty;
	using (HttpClient httpClient = new HttpClient())
	{
		// PostAsync sets the HTTP method; "Method" is not a real header.
		httpClient.DefaultRequestHeaders.ConnectionClose = true;
		httpClient.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");

		// Set the encoding and the application/json content type in one step.
		HttpContent content = new StringContent(jsonParams, System.Text.Encoding.UTF8, "application/json");
		HttpResponseMessage response = await httpClient.PostAsync(requestUrl, content);

		response.EnsureSuccessStatusCode();
		responseBody = await response.Content.ReadAsStringAsync();
	}
	return responseBody;
}

Usage:

// JsonConvert comes from the Newtonsoft.Json package: using Newtonsoft.Json;
// Declare the class at type level; the statements below belong in an async method.
public class User
{
	public string Name { get; set; }
	public string Sex { get; set; }
}

string urlRequestPost = "http://zhousonglin.cn:8080/post";
User user = new User()
{
	Name = "Dahlin",
	Sex = "male"
};
string jsonParam = JsonConvert.SerializeObject(user);
string responseStr = await HtmlPostJson(urlRequestPost, jsonParam);
Console.WriteLine(responseStr);

Result:
[Screenshot: JSON POST request output]
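If you would rather not add the Newtonsoft.Json package, the built-in System.Text.Json serializer (available from .NET Core 3.0) can produce the JSON string instead. This is a swapped-in alternative, not what the code above uses; the CrawlerUser and JsonBodies names are hypothetical.

```csharp
using System.Text.Json;

public class CrawlerUser
{
    public string Name { get; set; }
    public string Sex { get; set; }
}

public static class JsonBodies
{
    // Default options keep the C# property names as they are.
    public static string ToJson(CrawlerUser user)
        => JsonSerializer.Serialize(user);

    // Many web APIs expect camelCase keys; a naming policy handles that.
    public static string ToCamelCaseJson(CrawlerUser user)
        => JsonSerializer.Serialize(user, new JsonSerializerOptions
        {
            PropertyNamingPolicy = JsonNamingPolicy.CamelCase
        });
}
```

For a user with Name "Dahlin" and Sex "male", ToJson returns `{"Name":"Dahlin","Sex":"male"}` and ToCamelCaseJson returns `{"name":"Dahlin","sex":"male"}`; either string can be passed to a JSON POST wrapper as the body.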

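4. File download requests

Crawlers mostly fetch text, but sometimes we also want images or video files. HttpClient supports file downloads as well; the wrapper looks like this: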

// Requires: using System.IO; using System.Net.Http; using System.Threading.Tasks;
/// <summary>
/// Downloads a file.
/// </summary>
/// <param name="requestUrl">URL of the file</param>
/// <param name="fileName">Path to save the file to</param>
/// <returns>A task that completes when the file has been written</returns>
public static async Task HtmlDownloadFile(string requestUrl, string fileName)
{
	using HttpClient httpClient = new HttpClient();

	// GetAsync sets the HTTP method; "Method" is not a real header.
	httpClient.DefaultRequestHeaders.ConnectionClose = true;
	httpClient.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");

	HttpResponseMessage response = await httpClient.GetAsync(requestUrl);
	response.EnsureSuccessStatusCode();

	// Await the bytes directly instead of chaining ContinueWith, then write them asynchronously.
	byte[] data = await response.Content.ReadAsByteArrayAsync();
	await File.WriteAllBytesAsync(fileName, data);
}

Usage:

string urlPicture = "http://qn.zhousonglin.cn/DaGuanYuan34.jpg?imageslim";
// Await rather than block on .Wait(); the caller must be async.
await HtmlDownloadFile(urlPicture, "1.jpg");
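Reading the whole response into a byte array works for small images, but for large files such as videos it may be preferable to stream the body straight to disk. Here is a rough sketch of that variation, assuming the caller supplies the HttpClient; the StreamingDownload class and its method names are made up for this example, and the 81920-byte buffer size is an arbitrary choice.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class StreamingDownload
{
    // Copies any readable stream to a file without holding the whole payload in memory.
    public static async Task SaveStreamAsync(Stream source, string fileName)
    {
        using FileStream target = new FileStream(
            fileName, FileMode.Create, FileAccess.Write, FileShare.None,
            bufferSize: 81920, useAsync: true);
        await source.CopyToAsync(target);
    }

    // ResponseHeadersRead returns as soon as the headers arrive,
    // so the body is read from the network while it is being written to disk.
    public static async Task DownloadToFileAsync(
        HttpClient httpClient, string requestUrl, string fileName)
    {
        using HttpResponseMessage response = await httpClient.GetAsync(
            requestUrl, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();
        using Stream body = await response.Content.ReadAsStreamAsync();
        await SaveStreamAsync(body, fileName);
    }
}
```

Splitting the file-writing step into SaveStreamAsync keeps it usable with any stream, which also makes it easy to exercise without a network connection.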

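For HttpClient, the methods above cover most needs. There are of course more advanced uses, such as extending the message handler HttpClientHandler yourself, or sending cookies with a request, as follows: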

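// Requires: using System; using System.Net; using System.Net.Http;
CookieContainer cookieContainer = new CookieContainer();
// A cookie must be tied to a site: Add(Cookie) throws when the cookie has no Domain,
// so use the Add(Uri, Cookie) overload. The test service URL here is an assumption;
// replace it with the site the cookie belongs to.
cookieContainer.Add(new Uri("http://zhousonglin.cn:8080/"), new Cookie("XXXXXX", "XXXXXXX"));
HttpClientHandler httpClientHandler = new HttpClientHandler()
{
   CookieContainer = cookieContainer,
   AllowAutoRedirect = true,
   UseCookies = true
};
HttpClient httpClient = new HttpClient(httpClientHandler);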
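As for extending the message handler, one common approach is to derive from DelegatingHandler and wrap an HttpClientHandler with it. The sketch below adds a User-Agent header to any request that lacks one. It is only an illustration of the extension point: the DefaultUserAgentHandler class and its EnsureUserAgent helper are hypothetical names.

```csharp
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public class DefaultUserAgentHandler : DelegatingHandler
{
    private readonly string _userAgent;

    public DefaultUserAgentHandler(string userAgent, HttpMessageHandler innerHandler)
        : base(innerHandler)
    {
        _userAgent = userAgent;
    }

    // Adds the header only when the request does not already carry one.
    public static void EnsureUserAgent(HttpRequestMessage request, string userAgent)
    {
        if (!request.Headers.Contains("User-Agent"))
        {
            request.Headers.TryAddWithoutValidation("User-Agent", userAgent);
        }
    }

    // Called for every request sent through the owning HttpClient.
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        EnsureUserAgent(request, _userAgent);
        return base.SendAsync(request, cancellationToken);
    }
}
```

It would be wired up as `new HttpClient(new DefaultUserAgentHandler("DemoCrawler/1.0", new HttpClientHandler()))`, where the user agent string is again just a placeholder. Handlers chained this way run in order before the request reaches the network.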

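Other uses, such as adding a proxy, follow much the same pattern. Press F12 (Go to Definition) on HttpClientHandler and it becomes clear, so I won't go into detail here; if I end up needing them, I may write a follow-up on the advanced uses. In essence they are custom extensions and overrides of the base classes that the framework exposes.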
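For reference, a proxy can be configured through the same HttpClientHandler. The following is a small sketch under stated assumptions: the ProxyClientFactory name is invented, and the proxy address is whatever your own proxy listens on (for example http://127.0.0.1:8888 for a local debugging proxy).

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class ProxyClientFactory
{
    // Builds a handler that routes requests through the given proxy
    // and transparently decompresses gzip/deflate responses.
    public static HttpClientHandler CreateHandler(string proxyAddress)
    {
        return new HttpClientHandler
        {
            Proxy = new WebProxy(proxyAddress),
            UseProxy = true,
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };
    }

    public static HttpClient CreateClient(string proxyAddress)
        => new HttpClient(CreateHandler(proxyAddress));
}
```

A client created with `ProxyClientFactory.CreateClient("http://127.0.0.1:8888")` can then be used with GetAsync and PostAsync exactly like a client without a proxy.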
