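Earlier posts covered some networking basics, as well as how to fetch simple web page content with C#.
This post continues from there and uses simulated login to the Baidu home page as an example of how to simulate logging in to a website with C#.
First, the prerequisites. This post assumes that you have read those earlier posts, understand the basic networking concepts, and know how to use tools such as IE9's F12 to analyze what a web page does as it runs.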
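1. Before simulating a login, work out the site's internal login logic
Before you can simulate logging in to the Baidu home page from a program, here C# code, you need to understand for yourself what actually happens internally when you log in to the site.
How to use tools to work out the internal logic of the Baidu home page login is covered in a separate post.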
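2. Then implement the simulated login logic in the language of your choice (C#)
Once you understand the internal login logic of the Baidu home page as analyzed with F12, implementing it in C# is comparatively straightforward.
Notes:
(1) If you are not familiar with using cookies in C#, read up on that first.
(2) If you are not familiar with regular expressions, consult a reference on them.
(3) If you are not familiar with Regex, the regular expression class in C#, consult a reference on it.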
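Here is the analyzed flow again, so that it can be compared against the code:

| Step | URL | Request type | Data sent | Values to obtain/extract from the response |
|------|-----|--------------|-----------|--------------------------------------------|
| 1 | | GET | None | BAIDUID in the returned cookies |
| 3 | | POST | A set of POST data, in which the value of token is the one extracted earlier | Verify that the returned cookies include BDUSS, PTOKEN, STOKEN, SAVEUSERID |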
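With that, we can finally write the C# code that demonstrates simulating a login to the Baidu home page.
[Version 1: Complete C# code for simulating login to the Baidu home page, minimal version]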
In the UI, clicking "Get cookie BAIDUID" calls the following code:
private void btnGetBaiduid_Click(object sender, EventArgs e)
{
    //http://www.baidu.com/
    string baiduMainUrl = txbBaiduMainUrl.Text;
    //generate http request
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainUrl);
    //attach a cookie container so that cookies are sent and received
    req.CookieContainer = new CookieContainer();
    req.CookieContainer.Add(curCookies);
    req.Method = "GET";
    //use request to get response
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
    txbGotBaiduid.Text = "";
    foreach (Cookie ck in resp.Cookies)
    {
        txbGotBaiduid.Text += "[" + ck.Name + "]=" + ck.Value;
        if (ck.Name == "BAIDUID")
        {
            gotCookieBaiduid = true;
        }
    }
    if (gotCookieBaiduid)
    {
        //store cookies for the following requests
        curCookies = resp.Cookies;
    }
    else
    {
        MessageBox.Show("Error: cookie BAIDUID not found!");
    }
}
This obtains the value of the BAIDUID cookie.
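To make the cookie handling a little more concrete, here is a minimal offline sketch of how a CookieContainer turns a stored cookie into the Cookie header of a later request. The class name CookieDemo, the method BuildCookieHeader and the value "PLACEHOLDER" are made up for illustration. The handler above adds a whole CookieCollection taken from the response instead, so treat this only as an approximation of the idea.

```csharp
using System;
using System.Net;

public static class CookieDemo
{
    // Returns the Cookie header that a request to `url` would carry
    // after the cookie `name=value` has been stored for that URL.
    public static string BuildCookieHeader(string url, string name, string value)
    {
        Uri uri = new Uri(url);
        CookieContainer container = new CookieContainer();
        // Add(Uri, Cookie) associates the cookie with the URI's host
        container.Add(uri, new Cookie(name, value));
        return container.GetCookieHeader(uri);
    }

    public static void Main()
    {
        // "PLACEHOLDER" stands in for a real BAIDUID value (assumption)
        Console.WriteLine(BuildCookieHeader("http://www.baidu.com/", "BAIDUID", "PLACEHOLDER"));
    }
}
```

Because the container is keyed by URL, a cookie stored for one host is only sent back to requests that match that host.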
Next, clicking "Get token value" calls the following code:
private void btnGetToken_Click(object sender, EventArgs e)
{
    if (gotCookieBaiduid)
    {
        string getapiUrl = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(getapiUrl);
        //add previously got cookies
        req.CookieContainer = new CookieContainer();
        req.CookieContainer.Add(curCookies);
        req.Method = "GET";
        HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
        StreamReader sr = new StreamReader(resp.GetResponseStream());
        string respHtml = sr.ReadToEnd();
        //bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
        //the named group tokenVal captures the token between the quotes
        string tokenValP = @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';";
        Match foundTokenVal = (new Regex(tokenValP)).Match(respHtml);
        if (foundTokenVal.Success)
        {
            //extracted the token value
            txbExtractedTokenVal.Text = foundTokenVal.Groups["tokenVal"].Value;
            extractTokenValueOK = true;
        }
        else
        {
            txbExtractedTokenVal.Text = "Error: token value not found!";
        }
    }
    else
    {
        MessageBox.Show("Error: cookie BAIDUID was not obtained correctly beforehand!");
    }
}
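The token extraction relies on a named capture group, which is what Groups["tokenVal"] looks up. The following standalone sketch runs the same kind of pattern against the sample line quoted in the code comment, without any network access. The names TokenDemo and ExtractToken are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenDemo
{
    // Returns the login_token value found in `respHtml`, or null if it is absent.
    public static string ExtractToken(string respHtml)
    {
        // (?<tokenVal>\w+) is a named group; it is read back via Groups["tokenVal"]
        string tokenValP = @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';";
        Match found = Regex.Match(respHtml, tokenValP);
        return found.Success ? found.Groups["tokenVal"].Value : null;
    }

    public static void Main()
    {
        // Sample line taken from the comment in the handler above
        string sample = "bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';";
        Console.WriteLine(ExtractToken(sample));
    }
}
```

Testing a pattern like this against a saved copy of the response is a quick way to check it before wiring it into the UI handler.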
This yields the corresponding token value.
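The login handler that follows passes its form fields through a helper named quoteParas, whose body does not appear in this post. As a rough guess at what such a helper needs to do, the sketch below URL-encodes each key and value and joins the pairs with "&" to form an application/x-www-form-urlencoded body. QuoteParasSketch and PostDataDemo are hypothetical names, and the real helper may differ (for example, it might encode spaces as "+" rather than "%20").

```csharp
using System;
using System.Collections.Generic;

public static class PostDataDemo
{
    // Hypothetical stand-in for quoteParas: URL-encodes every key and value
    // and joins the pairs as key=value&key=value.
    public static string QuoteParasSketch(Dictionary<string, string> paras)
    {
        List<string> pairs = new List<string>();
        foreach (KeyValuePair<string, string> kv in paras)
        {
            pairs.Add(Uri.EscapeDataString(kv.Key) + "=" + Uri.EscapeDataString(kv.Value));
        }
        return string.Join("&", pairs);
    }

    public static void Main()
    {
        Dictionary<string, string> postDict = new Dictionary<string, string>();
        postDict.Add("charset", "utf-8");
        postDict.Add("tpl", "mn");
        Console.WriteLine(QuoteParasSketch(postDict));
    }
}
```

Encoding the values matters mostly for fields such as the callback name, the static page URL or a password, which can contain characters that are not safe in a form body.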
Then fill in your Baidu username and password and click "Simulate login to Baidu home page", which calls the following code:
private void btnEmulateLoginBaidu_Click(object sender, EventArgs e)
{
    if (gotCookieBaiduid && extractTokenValueOK)
    {
        string staticpage = "http://www.baidu.com/cache/user/html/jump.html";
        //init post dict info
        Dictionary<string, string> postDict = new Dictionary<string, string>();
        //postDict.Add("ppui_logintime", "");
        postDict.Add("charset", "utf-8");
        //postDict.Add("codestring", "");
        postDict.Add("token", txbExtractedTokenVal.Text);
        postDict.Add("isPhone", "false");
        postDict.Add("index", "0");
        //postDict.Add("u", "");
        //postDict.Add("safeflg", "0");
        postDict.Add("staticpage", staticpage);
        postDict.Add("loginType", "1");
        postDict.Add("tpl", "mn");
        postDict.Add("callback", "parent.bdPass.api.login._postCallback");
        postDict.Add("username", txbBaiduUsername.Text);
        postDict.Add("password", txbBaiduPassword.Text);
        //postDict.Add("verifycode", "");
        postDict.Add("mem_pass", "on");
        string baiduMainLoginUrl = "https://passport.baidu.com/v2/api/?login";
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainLoginUrl);
        //add cookie
        req.CookieContainer = new CookieContainer();
        req.CookieContainer.Add(curCookies);
        //set to POST
        req.Method = "POST";
        req.ContentType = "application/x-www-form-urlencoded";
        //prepare post data
        string postDataStr = quoteParas(postDict);
        byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr);
        req.ContentLength = postBytes.Length;
        //send post data
        Stream postDataStream = req.GetRequestStream();
        postDataStream.Write(postBytes, 0, postBytes.Length);
        postDataStream.Close();
        //got response
        HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
        //got returned html
        StreamReader sr = new StreamReader(resp.GetResponseStream());
        string loginBaiduRespHtml = sr.ReadToEnd();
        //check whether got all expected cookies
        Dictionary<string, bool> cookieCheckDict = new Dictionary<string, bool>();
        string[] cookiesNameList = {"BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID"};
        foreach (String cookieToCheck in cookiesNameList)
        {
            cookieCheckDict.Add(cookieToCheck, false);
        }
        //mark each expected cookie that is present in the response
        foreach (Cookie singleCookie in resp.Cookies)
        {
            if (cookieCheckDict.ContainsKey(singleCookie.Name))
            {
                cookieCheckDict[singleCookie.Name] = true;
            }
        }
        //login counts as successful only if every expected cookie was found
        bool allCookiesFound = true;
        foreach (bool foundCurCookie in cookieCheckDict.Values)
        {
            allCookiesFound = allCookiesFound && foundCurCookie;
        }
        loginBaiduOk = allCookiesFound;
        if (loginBaiduOk)
        {
            txbEmulateLoginResult.Text = "Successfully simulated login to the Baidu home page!";
        }
        else
        {
            txbEmulateLoginResult.Text = "Failed to simulate login to the Baidu home page!";
            txbEmulateLoginResult.Text += Environment.NewLine + "Returned headers:";
            txbEmulateLoginResult.Text += Environment.NewLine + resp.Headers.ToString();
            txbEmulateLoginResult.Text += Environment.NewLine + Environment.NewLine;
            txbEmulateLoginResult.Text += Environment.NewLine + "Returned HTML source:";
            txbEmulateLoginResult.Text +=