Scraping data over HTTP with .NET is fairly simple. This post shows how to make an HTTP POST request.
I. Code
using System;
using System.IO;
using System.Net;
using System.Text;

public static string HttpPost(string formUrl, string formData)
{
    try
    {
        // Encode the body explicitly. Encoding.Default is the system's current
        // code page, so pick the encoding the target server expects (UTF-8 here).
        byte[] postData = Encoding.UTF8.GetBytes(formData);

        // Configure the request.
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(formUrl);
        request.Method = "POST";
        request.KeepAlive = false;
        request.AllowAutoRedirect = true;
        request.ContentType = "application/x-www-form-urlencoded";
        request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)";
        request.ContentLength = postData.Length;

        // Send the request body; the using block closes the stream even on failure.
        using (Stream outputStream = request.GetRequestStream())
        {
            outputStream.Write(postData, 0, postData.Length);
        }

        // Read the response with the encoding the page is actually served in.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream responseStream = response.GetResponseStream())
        // using (StreamReader reader = new StreamReader(responseStream, Encoding.GetEncoding("UTF-8")))
        using (StreamReader reader = new StreamReader(responseStream, Encoding.GetEncoding("GB2312")))
        {
            return reader.ReadToEnd();
        }
    }
    catch
    {
        // Any failure is reported as the literal string "error", so callers
        // cannot tell it apart from a response whose body is "error".
        return "error";
    }
}
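HttpPost sends formData with the content type application/x-www-form-urlencoded, so the string is expected to be key=value pairs joined by &, with each key and value percent-encoded. The helper below is a minimal sketch of one way to build such a string. FormBody and Build are hypothetical names, not part of the original code, and the sketch assumes the server expects UTF-8, which is what Uri.EscapeDataString produces.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the original post): builds the body
// string that is passed to HttpPost as formData.
public static class FormBody
{
    // Joins the fields as key=value&key=value, percent-encoding each key
    // and value as UTF-8 so characters such as '&', '=' and spaces are safe.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var parts = new List<string>();
        foreach (var field in fields)
        {
            parts.Add(Uri.EscapeDataString(field.Key) + "=" + Uri.EscapeDataString(field.Value));
        }
        return string.Join("&", parts.ToArray());
    }
}
```

For example, passing the pairs ("q", "a b&c") and ("page", "1") to FormBody.Build gives a body that can be handed to HttpPost directly. If the target site expects GB2312 form data instead, HttpUtility.UrlEncode(string, Encoding) from System.Web is probably the better fit.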
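II. Notes
1) Call HttpPost(string formUrl, string formData): formUrl is the URL to request and formData is the body of the POST request.
2) The encoding passed to the StreamReader constructor, Encoding.GetEncoding("GB2312"), must match the encoding of the returned data. Check what the response is encoded in and use GB2312 or UTF-8 accordingly.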
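Rather than hard-coding the encoding, one option is to take the charset name the server reports and fall back to a default when that name is missing or unknown. The sketch below assumes the name comes from HttpWebResponse.CharacterSet. ResponseEncoding and Resolve are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of the original post): maps a charset name
// such as "utf-8" or "GB2312" to an Encoding, with a caller-supplied fallback.
public static class ResponseEncoding
{
    public static Encoding Resolve(string charsetName, Encoding fallback)
    {
        // No charset reported: use the fallback.
        if (string.IsNullOrEmpty(charsetName))
        {
            return fallback;
        }
        try
        {
            // Some servers quote the value, e.g. charset="utf-8".
            return Encoding.GetEncoding(charsetName.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            // Name not recognized on this runtime: use the fallback.
            return fallback;
        }
    }
}
```

Inside HttpPost, the reader could then be created as new StreamReader(responseStream, ResponseEncoding.Resolve(response.CharacterSet, Encoding.UTF8)). The reported charset is not always reliable, since some servers omit it or send the wrong value, so checking the page's own meta tag is still worthwhile. On .NET Core and later, GB2312 is only available after registering the code pages encoding provider; without it, this sketch falls back to the default.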
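III. Parsing the returned data
1) Usually you locate the data you want by the id of a div or table.
2) If the div you extract contains many nested div or table elements, you can append the missing closing tag, </div> or </table>, yourself.
For example, suppose the data you get back looks like this: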
<div id="test">
test
<div id="test2">
test2
</div>
<div id="test3">
</div>
</div>
but the only part you want is:
<div id="test">
test
<div id="test2">
test2
</div>
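In that case you need to append the closing </div> for <div id="test"> yourself; otherwise the unclosed tag will break the layout of the page where you display the data.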
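The cut-and-close step can be sketched with plain string operations. This is a simplified illustration that assumes the markup is regular enough for substring and regex matching. HtmlFragment, Slice and CloseOpenDivs are hypothetical names, and a real HTML parser such as HtmlAgilityPack is likely to be more robust on messy pages.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the original post).
public static class HtmlFragment
{
    // Returns the part of html that starts at <div id="..."> for the given id
    // and ends just before endMarker (or at the end of html if the marker is
    // absent). Returns null when the div is not found.
    public static string Slice(string html, string id, string endMarker)
    {
        int start = html.IndexOf("<div id=\"" + id + "\"", StringComparison.OrdinalIgnoreCase);
        if (start < 0)
        {
            return null;
        }
        int end = html.IndexOf(endMarker, start, StringComparison.OrdinalIgnoreCase);
        if (end < 0)
        {
            end = html.Length;
        }
        return html.Substring(start, end - start);
    }

    // Appends one </div> for every <div ...> in fragment that has no matching close.
    public static string CloseOpenDivs(string fragment)
    {
        int opened = Regex.Matches(fragment, @"<div\b", RegexOptions.IgnoreCase).Count;
        int closed = Regex.Matches(fragment, @"</div\s*>", RegexOptions.IgnoreCase).Count;
        string result = fragment;
        for (int i = closed; i < opened; i++)
        {
            result += "</div>";
        }
        return result;
    }
}
```

With the example above, calling Slice with the id "test" and the end marker <div id="test3" keeps everything before the test3 block. CloseOpenDivs then finds two opening div tags and one closing tag, so it appends a single </div>.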