【Tutorial】Emulating Website Login, C# Edition (includes complete, runnable code in two versions)
http://www.crifan.com/emulate_login_website_using_csharp/
This site is very good; the author has studied a lot of the basics of POST.
He has probably also worked on drivers.
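Earlier posts already covered some networking basics:
【Summary】Logic/flow and caveats for fetching web pages, analyzing page content, and emulating website login
and showed how simple page fetching is implemented in C#.
This post continues from there and uses emulated login to the Baidu homepage as the example to show how to emulate logging in to a website with C#.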
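First, the prerequisites for this post.
It assumes that you have already read:
【Summary】Logic/flow and caveats for fetching web pages, analyzing page content, and emulating website login
and understand the basic networking concepts;
and that you have read:
【Summary】Developer tools in the browser (F12 in IE9, Ctrl+Shift+I in Chrome): powerful tools for analyzing web pages
and know how to use tools such as IE9's F12 to analyze what happens while a page executes.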
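1. Before emulating a login, work out the site's internal login logic
Before you can emulate logging in to the Baidu homepage from a program, here C# code, you must first understand what actually happens internally when you log in to the site yourself.
For how to use tools to work out the internal logic of the Baidu homepage login, see:
【Tutorial】Step by step: using a tool (IE9's F12) to analyze the internal logic of emulating a website login (Baidu homepage)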
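2. Then implement the login logic in the target language (C#)
Once you understand the internal logic of the Baidu homepage login as worked out with F12, implementing it in C# is comparatively straightforward.
Notes:
(1) If you are not familiar with using cookies in C#, first read:
【Experience summary】Http, web page access, HttpRequest, HttpResponse and related knowledge
(2) If you are not familiar with regular expressions, read up on them first.
(3) If you are not familiar with Regex, the regular expression class in C#, read up on it as well.
The flow worked out earlier is reproduced here so that it can be compared with the code: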
Step | URL | Request type | Data sent | Value to obtain/extract from the response
1 | http://www.baidu.com/ | GET | None | BAIDUID in the returned cookies
2 | https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true | GET | Includes the BAIDUID cookie | The token value, extracted from the returned HTML
3 | https://passport.baidu.com/v2/api/?login | POST | A set of POST fields, with token set to the value extracted in step 2 | Verify that the returned cookies include BDUSS, PTOKEN, STOKEN and SAVEUSERID
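Step 2 in the table only works if the BAIDUID cookie from step 1 is actually sent along. The following is a minimal sketch of that idea, not code from the original project: it assumes the cookie is scoped to the `.baidu.com` domain, and the class name `CookieFlowDemo` and the cookie value are made up for illustration. A single `CookieContainer` holds the cookie, and `GetCookieHeader` shows what would be sent to a given URL.

```csharp
using System;
using System.Net;

public static class CookieFlowDemo
{
    // Builds a container holding a BAIDUID cookie.
    // Assumption: step 1 returned BAIDUID scoped to the whole baidu.com domain.
    public static CookieContainer ContainerWithBaiduid(string value)
    {
        CookieContainer container = new CookieContainer();
        container.Add(new Cookie("BAIDUID", value, "/", ".baidu.com"));
        return container;
    }

    // Returns the Cookie header the container would send to targetUrl.
    public static string CookieHeaderFor(CookieContainer container, string targetUrl)
    {
        return container.GetCookieHeader(new Uri(targetUrl));
    }
}
```

Calling `CookieHeaderFor` with the step 2 or step 3 URL should show the BAIDUID cookie, while a URL on an unrelated domain should yield an empty header. Reusing one container for all three requests is an alternative to copying a `CookieCollection` by hand.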
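With that, the C# code that demonstrates emulated login to the Baidu homepage can be written.
【Version 1: complete C# code for emulating login to the Baidu homepage, compact version】
In the UI, clicking "Get cookie BAIDUID":
calls the following code: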
private void btnGetBaiduid_Click(object sender, EventArgs e)
{
    string baiduMainUrl = txbBaiduMainUrl.Text;
    //generate http request
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainUrl);
    //add the following code to handle cookies
    req.CookieContainer = new CookieContainer();
    req.CookieContainer.Add(curCookies);
    req.Method = "GET";
    //use request to get response
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
    txbGotBaiduid.Text = "";
    foreach (Cookie ck in resp.Cookies)
    {
        txbGotBaiduid.Text += "[" + ck.Name + "]=" + ck.Value;
        if (ck.Name == "BAIDUID")
        {
            gotCookieBaiduid = true;
        }
    }
    if (gotCookieBaiduid)
    {
        //store cookies
        curCookies = resp.Cookies;
    }
    else
    {
        MessageBox.Show("Error: cookie BAIDUID not found!");
    }
}
This obtains the value of the BAIDUID cookie.
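As a side note, a named cookie can also be looked up directly instead of looping over the whole collection. This is a small sketch using the string indexer of `CookieCollection`; the helper name `FindCookieValue` is made up for illustration and is not part of the original project.

```csharp
using System.Net;

public static class CookieLookup
{
    // Returns the value of the named cookie, or null when it is absent.
    public static string FindCookieValue(CookieCollection cookies, string name)
    {
        Cookie found = cookies[name]; // the indexer yields null if no such cookie exists
        return found == null ? null : found.Value;
    }
}
```

With this helper, the check in the handler above could be reduced to testing whether `FindCookieValue(resp.Cookies, "BAIDUID")` is non-null.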
Next, clicking "Get token value" calls the following code:
private void btnGetToken_Click(object sender, EventArgs e)
{
    if (gotCookieBaiduid)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(getapiUrl);
        //add previously got cookies
        req.CookieContainer = new CookieContainer();
        req.CookieContainer.Add(curCookies);
        req.Method = "GET";
        HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
        StreamReader sr = new StreamReader(resp.GetResponseStream());
        string respHtml = sr.ReadToEnd();
        //bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
        string tokenValP = @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';";
        Match foundTokenVal = (new Regex(tokenValP)).Match(respHtml);
        if (foundTokenVal.Success)
        {
            //extracted the token value
            txbExtractedTokenVal.Text = foundTokenVal.Groups["tokenVal"].Value;
            extractTokenValueOK = true;
        }
        else
        {
            txbExtractedTokenVal.Text = "Error: token value not found!";
        }
    }
    else
    {
        MessageBox.Show("Error: the cookie BAIDUID has not been obtained yet!");
    }
}
This extracts the corresponding token value.
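The token extraction can be tried in isolation, without any network access. The sketch below reuses the regular expression from the handler and the sample line from its comment; the class and method names (`TokenExtractor`, `ExtractToken`) are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class TokenExtractor
{
    // Same pattern as in btnGetToken_Click: a named group captures the token.
    private static readonly Regex TokenPattern =
        new Regex(@"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';");

    // Returns the token, or null when the pattern is not found in the HTML.
    public static string ExtractToken(string html)
    {
        Match m = TokenPattern.Match(html);
        return m.Success ? m.Groups["tokenVal"].Value : null;
    }
}
```

Feeding it the line `bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';` returns the 32-character token, and any HTML without that assignment returns null.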
Then fill in your Baidu username and password and click "Emulate login to Baidu homepage", which calls the following code:
private void btnEmulateLoginBaidu_Click(object sender, EventArgs e)
{
    if (gotCookieBaiduid && extractTokenValueOK)
    {
        //init post dict info
        Dictionary<string, string> postDict = new Dictionary<string, string>();
        //postDict.Add("ppui_logintime", "");
        postDict.Add("charset", "utf-8");
        //postDict.Add("codestring", "");
        postDict.Add("token", txbExtractedTokenVal.Text);
        postDict.Add("isPhone", "false");
        postDict.Add("index", "0");
        //postDict.Add("u", "");
        //postDict.Add("safeflg", "0");
        postDict.Add("staticpage", staticpage);
        postDict.Add("loginType", "1");
        postDict.Add("tpl", "mn");
        postDict.Add("callback", "parent.bdPass.api.login._postCallback");
        postDict.Add("username", txbBaiduUsername.Text);
        postDict.Add("password", txbBaiduPassword.Text);
        //postDict.Add("verifycode", "");
        postDict.Add("mem_pass", "on");

        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainLoginUrl);
        //add cookie
        req.CookieContainer = new CookieContainer();
        req.CookieContainer.Add(curCookies);
        //set to POST
        req.Method = "POST";
        req.ContentType = "application/x-www-form-urlencoded";
        //prepare post data
        string postDataStr = quoteParas(postDict);
        byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr);
        req.ContentLength = postBytes.Length;
        //send post data
        Stream postDataStream = req.GetRequestStream();
        postDataStream.Write(postBytes, 0, postBytes.Length);
        postDataStream.Close();
        //got response
        HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
        //got returned html
        StreamReader sr = new StreamReader(resp.GetResponseStream());
        string loginBaiduRespHtml = sr.ReadToEnd();

        //check whether got all expected cookies
        Dictionary<string, bool> cookieCheckDict = new Dictionary<string, bool>();
        string[] cookiesNameList = { "BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID" };
        foreach (String cookieToCheck in cookiesNameList)
        {
            cookieCheckDict.Add(cookieToCheck, false);
        }
        foreach (Cookie singleCookie in resp.Cookies)
        {
            if (cookieCheckDict.ContainsKey(singleCookie.Name))
            {
                cookieCheckDict[singleCookie.Name] = true;
            }
        }
        bool allCookiesFound = true;
        foreach (bool foundCurCookie in cookieCheckDict.Values)
        {
            allCookiesFound = allCookiesFound && foundCurCookie;
        }
        loginBaiduOk = allCookiesFound;
        if (loginBaiduOk)
        {
            txbEmulateLoginResult.Text = "Emulated login to the Baidu homepage succeeded!";
        }
        else
        {
            txbEmulateLoginResult.Text = "Emulated login to the Baidu homepage failed!";
            txbEmulateLoginResult.Text += Environment.NewLine + "Returned headers:";
            txbEmulateLoginResult.Text += Environment.NewLine + resp.Headers.ToString();
            txbEmulateLoginResult.Text += Environment.NewLine + Environment.NewLine;
            txbEmulateLoginResult.Text += Environment.NewLine + "Returned HTML source:";
            txbEmulateLoginResult.Text += Environment.NewLine + loginBaiduRespHtml;
        }
    }
    else
    {
        MessageBox.Show("Error: the cookie BAIDUID was not obtained and/or the token value was not extracted!");
    }
}
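If the username and password are both correct, the login succeeds.
If you deliberately enter a wrong username or password, a login failure is shown, and the returned headers and HTML are printed.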
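The success test used by the handler (all four expected cookies are present) can be written as a small standalone helper. This is a sketch rather than the original code; `LoginCheck` and `MissingCookies` are hypothetical names. Returning the list of missing names, instead of a single bool, also tells you which cookie was absent when a login fails.

```csharp
using System.Collections.Generic;
using System.Net;

public static class LoginCheck
{
    // Returns the required cookie names that are NOT present in the collection.
    // An empty result corresponds to loginBaiduOk == true in the handler.
    public static List<string> MissingCookies(CookieCollection cookies, string[] requiredNames)
    {
        List<string> missing = new List<string>();
        foreach (string name in requiredNames)
        {
            if (cookies[name] == null)
            {
                missing.Add(name);
            }
        }
        return missing;
    }
}
```

For the Baidu flow, `requiredNames` would be `{ "BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID" }`, as in the table of steps.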
The complete C# code for emulating login to the Baidu homepage is as follows:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.Net;
using System.IO;
using System.Text.RegularExpressions;
using System.Web;

namespace emulateLoginBaidu
{
    public partial class frmEmulateLoginBaidu : Form
    {
        // NOTE: getapiUrl, staticpage, baiduMainLoginUrl and emulateLoginTutorialUrl
        // are used below, but their declarations are not part of this listing.
        CookieCollection curCookies = null;
        bool gotCookieBaiduid, extractTokenValueOK, loginBaiduOk;

        public frmEmulateLoginBaidu()
        {
            InitializeComponent();
        }

        private void frmEmulateLoginBaidu_Load(object sender, EventArgs e)
        {
            //init
            curCookies = new CookieCollection();
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;
        }

        /******************************************************************************
        functions in crifanLib.cs
        *******************************************************************************/
        //quote the input dict values
        //note: the return result for first para no '&'
        public string quoteParas(Dictionary<string, string> paras)
        {
            string quotedParas = "";
            bool isFirst = true;
            string val = "";
            foreach (string para in paras.Keys)
            {
                if (paras.TryGetValue(para, out val))
                {
                    if (isFirst)
                    {
                        isFirst = false;
                        quotedParas += para + "=" + HttpUtility.UrlPathEncode(val);
                    }
                    else
                    {
                        quotedParas += "&" + para + "=" + HttpUtility.UrlPathEncode(val);
                    }
                }
                else
                {
                    break;
                }
            }
            return quotedParas;
        }

        /******************************************************************************
        Demo emulate login baidu related functions
        *******************************************************************************/
        private void btnGetBaiduid_Click(object sender, EventArgs e)
        {
            string baiduMainUrl = txbBaiduMainUrl.Text;
            //generate http request
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainUrl);
            //add the following code to handle cookies
            req.CookieContainer = new CookieContainer();
            req.CookieContainer.Add(curCookies);
            req.Method = "GET";
            //use request to get response
            HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
            txbGotBaiduid.Text = "";
            foreach (Cookie ck in resp.Cookies)
            {
                txbGotBaiduid.Text += "[" + ck.Name + "]=" + ck.Value;
                if (ck.Name == "BAIDUID")
                {
                    gotCookieBaiduid = true;
                }
            }
            if (gotCookieBaiduid)
            {
                //store cookies
                curCookies = resp.Cookies;
            }
            else
            {
                MessageBox.Show("Error: cookie BAIDUID not found!");
            }
        }

        private void btnGetToken_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid)
            {
                HttpWebRequest req = (HttpWebRequest)WebRequest.Create(getapiUrl);
                //add previously got cookies
                req.CookieContainer = new CookieContainer();
                req.CookieContainer.Add(curCookies);
                req.Method = "GET";
                HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
                StreamReader sr = new StreamReader(resp.GetResponseStream());
                string respHtml = sr.ReadToEnd();
                //bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
                string tokenValP = @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';";
                Match foundTokenVal = (new Regex(tokenValP)).Match(respHtml);
                if (foundTokenVal.Success)
                {
                    //extracted the token value
                    txbExtractedTokenVal.Text = foundTokenVal.Groups["tokenVal"].Value;
                    extractTokenValueOK = true;
                }
                else
                {
                    txbExtractedTokenVal.Text = "Error: token value not found!";
                }
            }
            else
            {
                MessageBox.Show("Error: the cookie BAIDUID has not been obtained yet!");
            }
        }

        private void btnEmulateLoginBaidu_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid && extractTokenValueOK)
            {
                //init post dict info
                Dictionary<string, string> postDict = new Dictionary<string, string>();
                //postDict.Add("ppui_logintime", "");
                postDict.Add("charset", "utf-8");
                //postDict.Add("codestring", "");
                postDict.Add("token", txbExtractedTokenVal.Text);
                postDict.Add("isPhone", "false");
                postDict.Add("index", "0");
                //postDict.Add("u", "");
                //postDict.Add("safeflg", "0");
                postDict.Add("staticpage", staticpage);
                postDict.Add("loginType", "1");
                postDict.Add("tpl", "mn");
                postDict.Add("callback", "parent.bdPass.api.login._postCallback");
                postDict.Add("username", txbBaiduUsername.Text);
                postDict.Add("password", txbBaiduPassword.Text);
                //postDict.Add("verifycode", "");
                postDict.Add("mem_pass", "on");

                HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainLoginUrl);
                //add cookie
                req.CookieContainer = new CookieContainer();
                req.CookieContainer.Add(curCookies);
                //set to POST
                req.Method = "POST";
                req.ContentType = "application/x-www-form-urlencoded";
                //prepare post data
                string postDataStr = quoteParas(postDict);
                byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr);
                req.ContentLength = postBytes.Length;
                //send post data
                Stream postDataStream = req.GetRequestStream();
                postDataStream.Write(postBytes, 0, postBytes.Length);
                postDataStream.Close();
                //got response
                HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
                //got returned html
                StreamReader sr = new StreamReader(resp.GetResponseStream());
                string loginBaiduRespHtml = sr.ReadToEnd();

                //check whether got all expected cookies
                Dictionary<string, bool> cookieCheckDict = new Dictionary<string, bool>();
                string[] cookiesNameList = { "BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID" };
                foreach (String cookieToCheck in cookiesNameList)
                {
                    cookieCheckDict.Add(cookieToCheck, false);
                }
                foreach (Cookie singleCookie in resp.Cookies)
                {
                    if (cookieCheckDict.ContainsKey(singleCookie.Name))
                    {
                        cookieCheckDict[singleCookie.Name] = true;
                    }
                }
                bool allCookiesFound = true;
                foreach (bool foundCurCookie in cookieCheckDict.Values)
                {
                    allCookiesFound = allCookiesFound && foundCurCookie;
                }
                loginBaiduOk = allCookiesFound;
                if (loginBaiduOk)
                {
                    txbEmulateLoginResult.Text = "Emulated login to the Baidu homepage succeeded!";
                }
                else
                {
                    txbEmulateLoginResult.Text = "Emulated login to the Baidu homepage failed!";
                    txbEmulateLoginResult.Text += Environment.NewLine + "Returned headers:";
                    txbEmulateLoginResult.Text += Environment.NewLine + resp.Headers.ToString();
                    txbEmulateLoginResult.Text += Environment.NewLine + Environment.NewLine;
                    txbEmulateLoginResult.Text += Environment.NewLine + "Returned HTML source:";
                    txbEmulateLoginResult.Text += Environment.NewLine + loginBaiduRespHtml;
                }
            }
            else
            {
                MessageBox.Show("Error: the cookie BAIDUID was not obtained and/or the token value was not extracted!");
            }
        }

        private void lklEmulateLoginTutorialUrl_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            System.Diagnostics.Process.Start(emulateLoginTutorialUrl);
        }

        private void btnClearAll_Click(object sender, EventArgs e)
        {
            curCookies = new CookieCollection();
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;
            txbGotBaiduid.Text = "";
            txbExtractedTokenVal.Text = "";
            txbBaiduUsername.Text = "";
            txbBaiduPassword.Text = "";
            txbEmulateLoginResult.Text = "";
        }
    }
}
The corresponding complete VS2010 C# project can be downloaded here:
emulateLoginBaidu_csharp_2012-11-07.7z
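One detail of the POST step is worth a closer look. The quoteParas helper builds the form body with HttpUtility.UrlPathEncode, which is meant for URL paths; as far as I know it leaves characters such as `&` and `=` inside a value untouched, so a password containing them could corrupt the body. The sketch below is an alternative under that assumption, not the author's code: `FormEncoder` is a made-up name, and it uses `Uri.EscapeDataString`, which percent-encodes reserved characters (a space becomes `%20` rather than `+`).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class FormEncoder
{
    // Joins name/value pairs as name1=value1&name2=value2, escaping both sides.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        StringBuilder sb = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            if (sb.Length > 0)
            {
                sb.Append('&');
            }
            sb.Append(Uri.EscapeDataString(field.Key));
            sb.Append('=');
            sb.Append(Uri.EscapeDataString(field.Value));
        }
        return sb.ToString();
    }
}
```

A `Dictionary<string, string>` such as postDict can be passed in directly, since it implements `IEnumerable<KeyValuePair<string, string>>`; a list of pairs is used when a fixed field order matters.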
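【Version 2: complete C# code for emulating login to the Baidu homepage, crifanLib.cs version】
Later, the code above was changed to use my C# library crifanLib.cs, so that the networking helper functions can be reused in the future.
Below is the complete C# code for the version that uses crifanLib.cs: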
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.Net;
using System.IO;
using System.Text.RegularExpressions;
using System.Web;

namespace emulateLoginBaidu
{
    public partial class frmEmulateLoginBaidu : Form
    {
        // NOTE: getapiUrl, staticpage, baiduMainLoginUrl and emulateLoginTutorialUrl
        // are used below, but their declarations are not part of this listing.
        CookieCollection curCookies = null;
        bool gotCookieBaiduid, extractTokenValueOK, loginBaiduOk;

        public frmEmulateLoginBaidu()
        {
            InitializeComponent();
        }

        private void frmEmulateLoginBaidu_Load(object sender, EventArgs e)
        {
            this.AcceptButton = this.btnEmulateLoginBaidu;
            //init for crifanLib.cs
            curCookies = new CookieCollection();
            //init for demo login
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;
        }

        /******************************************************************************
        functions in crifanLib.cs
        Download: http://code.google.com/p/crifanlib/
        *******************************************************************************/
        //quote the input dict values
        //note: the return result for first para no '&'
        public string quoteParas(Dictionary<string, string> paras)
        {
            string quotedParas = "";
            bool isFirst = true;
            string val = "";
            foreach (string para in paras.Keys)
            {
                if (paras.TryGetValue(para, out val))
                {
                    if (isFirst)
                    {
                        isFirst = false;
                        quotedParas += para + "=" + HttpUtility.UrlPathEncode(val);
                    }
                    else
                    {
                        quotedParas += "&" + para + "=" + HttpUtility.UrlPathEncode(val);
                    }
                }
                else
                {
                    break;
                }
            }
            return quotedParas;
        }

        /*********************************************************************/
        /* cookie */
        /*********************************************************************/
        //add a single cookie to cookies, if already exist, update its value
        public void addCookieToCookies(Cookie toAdd, ref CookieCollection cookies, bool overwriteDomain)
        {
            bool found = false;
            if (cookies.Count > 0)
            {
                foreach (Cookie originalCookie in cookies)
                {
                    if (originalCookie.Name == toAdd.Name)
                    {
                        // !!! cookies from different domains are not the same cookie,
                        // so do not overwrite the value here when the domains differ,
                        // unless overwriting the domain is explicitly requested
                        if ((originalCookie.Domain == toAdd.Domain) ||
                            ((originalCookie.Domain != toAdd.Domain) && overwriteDomain))
                        {
                            //here can not force convert CookieCollection to HttpCookieCollection,
                            //then use .remove to remove this cookie then add
                            // so no good way to copy all field value
                            originalCookie.Value = toAdd.Value;
                            originalCookie.Domain = toAdd.Domain;
                            originalCookie.Expires = toAdd.Expires;
                            originalCookie.Version = toAdd.Version;
                            originalCookie.Path = toAdd.Path;
                            //following fields seems should not change
                            //originalCookie.HttpOnly = toAdd.HttpOnly;
                            //originalCookie.Secure = toAdd.Secure;
                            found = true;
                            break;
                        }
                    }
                }
            }
            if (!found)
            {
                if (toAdd.Domain != "")
                {
                    // adding a cookie with an empty domain makes the later req.CookieContainer.Add(cookies) fail !!!
                    cookies.Add(toAdd);
                }
            }
        }//addCookieToCookies

        //add single cookie to cookies, default no overwrite domain
        public void addCookieToCookies(Cookie toAdd, ref CookieCollection cookies)
        {
            addCookieToCookies(toAdd, ref cookies, false);
        }

        //check whether the cookies contains the ckToCheck cookie
        //support:
        //ckTocheck is Cookie/string
        //cookies is Cookie/string/CookieCollection/string[]
        public bool isContainCookie(object ckToCheck, object cookies)
        {
            bool isContain = false;
            if ((ckToCheck != null) && (cookies != null))
            {
                string ckName = "";
                Type type = ckToCheck.GetType();
                //string typeStr = ckType.ToString();
                //if (ckType.FullName == "System.string")
                if (type.Name.ToLower() == "string")
                {
                    ckName = (string)ckToCheck;
                }
                else if (type.Name == "Cookie")
                {
                    ckName = ((Cookie)ckToCheck).Name;
                }
                if (ckName != "")
                {
                    type = cookies.GetType();
                    // is single Cookie
                    if (type.Name == "Cookie")
                    {
                        if (ckName == ((Cookie)cookies).Name)
                        {
                            isContain = true;
                        }
                    }
                    // is CookieCollection
                    else if (type.Name == "CookieCollection")
                    {
                        foreach (Cookie ck in (CookieCollection)cookies)
                        {
                            if (ckName == ck.Name)
                            {
                                isContain = true;
                                break;
                            }
                        }
                    }
                    // is single cookie name string
                    else if (type.Name.ToLower() == "string")
                    {
                        if (ckName == (string)cookies)
                        {
                            isContain = true;
                        }
                    }
                    // is cookie name string[]
                    else if (type.Name.ToLower() == "string[]")
                    {
                        foreach (string name in ((string[])cookies))
                        {
                            if (ckName == name)
                            {
                                isContain = true;
                                break;
                            }
                        }
                    }
                }
            }
            return isContain;
        }//isContainCookie

        // update cookiesToUpdate to localCookies
        // if omitUpdateCookies designated, then omit cookies of omitUpdateCookies in cookiesToUpdate
        public void updateLocalCookies(CookieCollection cookiesToUpdate, ref CookieCollection localCookies, object omitUpdateCookies)
        {
            if (cookiesToUpdate.Count > 0)
            {
                if (localCookies == null)
                {
                    localCookies = cookiesToUpdate;
                }
                else
                {
                    foreach (Cookie newCookie in cookiesToUpdate)
                    {
                        if (isContainCookie(newCookie, omitUpdateCookies))
                        {
                            // need omit process this
                        }
                        else
                        {
                            addCookieToCookies(newCookie, ref localCookies);
                        }
                    }
                }
            }
        }//updateLocalCookies

        //update cookiesToUpdate to localCookies
        public void updateLocalCookies(CookieCollection cookiesToUpdate, ref CookieCollection localCookies)
        {
            updateLocalCookies(cookiesToUpdate, ref localCookies, null);
        }

        /*********************************************************************/
        /* HTTP */
        /*********************************************************************/
        /* get url's response */
        public HttpWebResponse getUrlResponse(string url,
                                              Dictionary<string, string> headerDict,
                                              Dictionary<string, string> postDict,
                                              int timeout,
                                              string postDataStr)
        {
            //CookieCollection parsedCookies;
            HttpWebResponse resp = null;
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
            req.AllowAutoRedirect = true;
            req.Accept = "*/*";
            //const string gAcceptLanguage = "en-US"; // zh-CN/en-US
            //req.Headers["Accept-Language"] = gAcceptLanguage;
            req.KeepAlive = true;
            //IE8
            //const string gUserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E";
            //IE9
            //const string gUserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"; // x64
            const string gUserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"; // x86
            //Chrome
            //const string gUserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4";
            //Mozilla Firefox
            //const string gUserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6";
            req.UserAgent = gUserAgent;
            req.Headers["Accept-Encoding"] = "gzip, deflate";
            // decode both encodings advertised in Accept-Encoding (the original enabled GZip only)
            req.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
            req.Proxy = null;
            if (timeout > 0)
            {
                req.Timeout = timeout;
            }
            if (curCookies != null)
            {
                req.CookieContainer = new CookieContainer();
                // the cookies added below may exceed the default maximum of 20 cookies per domain
                req.CookieContainer.PerDomainCapacity = 40;
                req.CookieContainer.Add(curCookies);
            }
            if (headerDict != null)
            {
                foreach (string header in headerDict.Keys)
                {
                    string headerValue = "";
                    if (headerDict.TryGetValue(header, out headerValue))
                    {
                        // the following allow the caller to overwrite the default header settings
                        if (header.ToLower() == "referer")
                        {
                            req.Referer = headerValue;
                        }
                        else if (header.ToLower() == "allowautoredirect")
                        {
                            bool isAllow = false;
                            if (bool.TryParse(headerValue, out isAllow))
                            {
                                req.AllowAutoRedirect = isAllow;
                            }
                        }
                        else if (header.ToLower() == "accept")
                        {
                            req.Accept = headerValue;
                        }
                        else if (header.ToLower() == "keepalive")
                        {
                            bool isKeepAlive = false;
                            if (bool.TryParse(headerValue, out isKeepAlive))
                            {
                                req.KeepAlive = isKeepAlive;
                            }
                        }
                        else if (header.ToLower() == "accept-language")
                        {
                            req.Headers["Accept-Language"] = headerValue;
                        }
                        else if (header.ToLower() == "useragent")
                        {
                            req.UserAgent = headerValue;
                        }
                        else
                        {
                            req.Headers[header] = headerValue;
                        }
                    }
                    else
                    {
                        break;
                    }
                }
            }
            if (postDict != null || postDataStr != "")
            {
                req.Method = "POST";
                req.ContentType = "application/x-www-form-urlencoded";
                if (postDict != null)
                {
                    postDataStr = quoteParas(postDict);
                }
                //byte[] postBytes = Encoding.GetEncoding("utf-8").GetBytes(postData);
                byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr);
                req.ContentLength = postBytes.Length;
                Stream postDataStream = req.GetRequestStream();
                postDataStream.Write(postBytes, 0, postBytes.Length);
                postDataStream.Close();
            }
            else
            {
                req.Method = "GET";
            }
            //may timeout, has fixed in:
            resp = (HttpWebResponse)req.GetResponse();
            updateLocalCookies(resp.Cookies, ref curCookies);
            return resp;
        }

        public HttpWebResponse getUrlResponse(string url,
                                              Dictionary<string, string> headerDict,
                                              Dictionary<string, string> postDict)
        {
            return getUrlResponse(url, headerDict, postDict, 0, "");
        }

        public HttpWebResponse getUrlResponse(string url)
        {
            return getUrlResponse(url, null, null, 0, "");
        }

        // valid charset:"GB18030"/"UTF-8", invalid:"UTF8"
        public string getUrlRespHtml(string url,
                                     Dictionary<string, string> headerDict,
                                     string charset,
                                     Dictionary<string, string> postDict,
                                     int timeout,
                                     string postDataStr)
        {
            string respHtml = "";
            //HttpWebResponse resp = getUrlResponse(url, headerDict, postDict, timeout);
            HttpWebResponse resp = getUrlResponse(url, headerDict, postDict, timeout, postDataStr);
            //long realRespLen = resp.ContentLength;
            StreamReader sr;
            if ((charset != null) && (charset != ""))
            {
                Encoding htmlEncoding = Encoding.GetEncoding(charset);
                sr = new StreamReader(resp.GetResponseStream(), htmlEncoding);
            }
            else
            {
                sr = new StreamReader(resp.GetResponseStream());
            }
            respHtml = sr.ReadToEnd();
            return respHtml;
        }

        public string getUrlRespHtml(string url, Dictionary<string, string> headerDict, string charset, Dictionary<string, string> postDict, string postDataStr)
        {
            return getUrlRespHtml(url, headerDict, charset, postDict, 0, postDataStr);
        }

        public string getUrlRespHtml(string url, Dictionary<string, string> headerDict, Dictionary<string, string> postDict)
        {
            return getUrlRespHtml(url, headerDict, "", postDict, "");
        }

        public string getUrlRespHtml(string url, Dictionary<string, string> headerDict)
        {
            return getUrlRespHtml(url, headerDict, null);
        }

        public string getUrlRespHtml(string url, string charset, int timeout)
        {
            return getUrlRespHtml(url, null, charset, null, timeout, "");
        }

        public string getUrlRespHtml(string url, string charset)
        {
            return getUrlRespHtml(url, charset, 0);
        }

        public string getUrlRespHtml(string url)
        {
            return getUrlRespHtml(url, "");
        }

        /******************************************************************************
        Demo emulate login baidu related functions
        *******************************************************************************/
        private void btnGetBaiduid_Click(object sender, EventArgs e)
        {
            string baiduMainUrl = txbBaiduMainUrl.Text;
            HttpWebResponse resp = getUrlResponse(baiduMainUrl);
            txbGotBaiduid.Text = "";
            foreach (Cookie ck in resp.Cookies)
            {
                txbGotBaiduid.Text += "[" + ck.Name + "]=" + ck.Value;
                if (ck.Name == "BAIDUID")
                {
                    gotCookieBaiduid = true;
                }
            }
            if (gotCookieBaiduid)
            {
                //store cookies
                curCookies = resp.Cookies;
            }
            else
            {
                MessageBox.Show("Error: cookie BAIDUID not found!");
            }
        }

        private void btnGetToken_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid)
            {
                string respHtml = getUrlRespHtml(getapiUrl);
                //bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
                string tokenValP = @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';";
                Match foundTokenVal = (new Regex(tokenValP)).Match(respHtml);
                if (foundTokenVal.Success)
                {
                    //extracted the token value
                    txbExtractedTokenVal.Text = foundTokenVal.Groups["tokenVal"].Value;
                    extractTokenValueOK = true;
                }
                else
                {
                    txbExtractedTokenVal.Text = "Error: token value not found!";
                }
            }
            else
            {
                MessageBox.Show("Error: the cookie BAIDUID has not been obtained yet!");
            }
        }

        private void btnEmulateLoginBaidu_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid && extractTokenValueOK)
            {
                //init post dict info
                Dictionary<string, string> postDict = new Dictionary<string, string>();
                //postDict.Add("ppui_logintime", "");
                postDict.Add("charset", "utf-8");
                //postDict.Add("codestring", "");
                postDict.Add("token", txbExtractedTokenVal.Text);
                postDict.Add("isPhone", "false");
                postDict.Add("index", "0");
                //postDict.Add("u", "");
                //postDict.Add("safeflg", "0");
                postDict.Add("staticpage", staticpage);
                postDict.Add("loginType", "1");
                postDict.Add("tpl", "mn");
                postDict.Add("callback", "parent.bdPass.api.login._postCallback");
                postDict.Add("username", txbBaiduUsername.Text);
                postDict.Add("password", txbBaiduPassword.Text);
                //postDict.Add("verifycode", "");
                postDict.Add("mem_pass", "on");

                string loginBaiduRespHtml = getUrlRespHtml(baiduMainLoginUrl, null, postDict);

                //check whether got all expected cookies
                Dictionary<string, bool> cookieCheckDict = new Dictionary<string, bool>();
                string[] cookiesNameList = { "BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID" };
                foreach (String cookieToCheck in cookiesNameList)
                {
                    cookieCheckDict.Add(cookieToCheck, false);
                }
                foreach (Cookie singleCookie in curCookies)
                {
                    if (cookieCheckDict.ContainsKey(singleCookie.Name))
                    {
                        cookieCheckDict[singleCookie.Name] = true;
                    }
                }
                bool allCookiesFound = true;
                foreach (bool foundCurCookie in cookieCheckDict.Values)
                {
                    allCookiesFound = allCookiesFound && foundCurCookie;
                }
                loginBaiduOk = allCookiesFound;
                if (loginBaiduOk)
                {
                    txbEmulateLoginResult.Text = "Emulated login to the Baidu homepage succeeded!";
                }
                else
                {
                    txbEmulateLoginResult.Text = "Emulated login to the Baidu homepage failed!";
                    txbEmulateLoginResult.Text += Environment.NewLine + "Returned HTML source:";
                    txbEmulateLoginResult.Text += Environment.NewLine + loginBaiduRespHtml;
                }
            }
            else
            {
                MessageBox.Show("Error: the cookie BAIDUID was not obtained and/or the token value was not extracted!");
            }
        }

        private void lklEmulateLoginTutorialUrl_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            System.Diagnostics.Process.Start(emulateLoginTutorialUrl);
        }

        private void btnClearAll_Click(object sender, EventArgs e)
        {
            curCookies = new CookieCollection();
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;
            txbGotBaiduid.Text = "";
            txbExtractedTokenVal.Text = "";
            txbBaiduUsername.Text = "";
            txbBaiduPassword.Text = "";
            txbEmulateLoginResult.Text = "";
        }
    }
}
The complete VS2010 project can be downloaded here:
emulateLoginBaidu_csharp_crifanLibVersion_2012-11-07.7z
About crifanLib.cs:
Browse online: crifanLib.cs
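The core idea behind updateLocalCookies in this version is merging newly returned cookies into the locally stored ones. The following is a simplified sketch of that idea only, not the crifanLib.cs implementation: `CookieMerge` is a made-up name, it matches cookies on name and domain, and it builds a new collection instead of updating the existing one in place.

```csharp
using System.Net;

public static class CookieMerge
{
    // Returns a new collection in which cookies from 'incoming' replace cookies
    // in 'local' that have the same name and domain; all others are kept.
    public static CookieCollection Merge(CookieCollection local, CookieCollection incoming)
    {
        CookieCollection merged = new CookieCollection();
        foreach (Cookie old in local)
        {
            if (!Contains(incoming, old.Name, old.Domain))
            {
                merged.Add(old);
            }
        }
        foreach (Cookie fresh in incoming)
        {
            merged.Add(fresh);
        }
        return merged;
    }

    private static bool Contains(CookieCollection cookies, string name, string domain)
    {
        foreach (Cookie c in cookies)
        {
            if (c.Name == name && c.Domain == domain)
            {
                return true;
            }
        }
        return false;
    }
}
```

After each response, `curCookies = CookieMerge.Merge(curCookies, resp.Cookies)` would keep older cookies such as BAIDUID while picking up updated values.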
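【Summary】
As this shows, although the login flow worked out earlier for the Baidu homepage is not that complicated, implementing it in C# is considerably more involved than implementing it in Python.
The main reason is that Python wraps many common, convenient library functions, whereas in C# you have to handle many details yourself, including the various parameters for GET and POST, and cookie handling in particular is tedious.
So for fetching pages, analyzing their content, and emulating website logins, Python is the more convenient choice.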
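【Postscript 2013-09-11】
1. After investigation:
【Record】Investigating why the C# code for emulating Baidu login does not work under .NET 4.0
it is confirmed that:
the earlier code works normally on .NET 3.5 and earlier, but does not work on .NET 4.0;
2. The cause has now been found and fixed.
The cause is:
For a cookie that does not specify an expires field, .NET 4.0 sets the cookie's expires value to the default of year 0001, 0 minutes 0 seconds (that is, DateTime.MinValue), which makes the cookie count as expired, so Baidu's cookie:
H_PS_PSSID
becomes invalid and every subsequent operation fails.
On .NET 3.5 and earlier, the cookie's expires value is also the default year 0001, 0 minutes 0 seconds, but the cookie is still usable in practice, so the later steps work and the problem does not occur;
3. The fixed code:
Available for download:
(1) Emulated Baidu login, standalone complete code version, .NET 4.0
emulateLoginBaidu_csharp_independentCodeVersion_2013-09-11.7z
(2) Emulated Baidu login, (using my own) crifanLib version, .NET 4.0
emulateLoginBaidu_csharp_crifanLibVersion_2013-09-11.7z
(These two files will be uploaded later, because the upload failed here with:
xxx.7z: unknown Bytes complete FAILED! :Upload canceled : VIRUS DETECTED! (Heuristics.Broken.Executable FOUND)
I will try uploading again another time, and look into it if the same error persists.)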
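One possible workaround for the expires problem described above, shown purely as an illustration: this is an assumption about how such a fix could look, not the code from the fixed downloads, and `CookieExpiryFix` is a made-up name. It gives every cookie whose Expires is still DateTime.MinValue an explicit expiry before the collection is added to a CookieContainer.

```csharp
using System;
using System.Net;

public static class CookieExpiryFix
{
    // Assigns fallbackExpiry to every cookie that has no explicit expiry,
    // i.e. whose Expires is still the default DateTime.MinValue.
    public static void EnsureExpiry(CookieCollection cookies, DateTime fallbackExpiry)
    {
        foreach (Cookie ck in cookies)
        {
            if (ck.Expires == DateTime.MinValue)
            {
                ck.Expires = fallbackExpiry;
            }
        }
    }
}
```

It would be called as, for example, `CookieExpiryFix.EnsureExpiry(curCookies, DateTime.Now.AddDays(1))` right before `req.CookieContainer.Add(curCookies)`. Cookies that already carry an expiry are left unchanged.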
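【Summary】
Whether in 3.5 and earlier or in the latest 4.0, .NET's parsing of the Set-Cookie header of an HTTP response into a CookieCollection
has always been crap, riddled with bugs.
From now on, use resp.Cookies as little as possible,
otherwise C# will trip you up without you even knowing how.
It is better to parse Set-Cookie with your own parsing function to get a correct CookieCollection.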
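To give an idea of what parsing Set-Cookie by hand involves, here is a deliberately minimal sketch. It is not the author's parsing function: `SetCookieParser` is a made-up name, it handles one cookie definition at a time, and it assumes simple values without quotes, commas or semicolons. Splitting a combined header that carries several cookies is the harder part, because expires dates themselves contain commas, and it is not attempted here.

```csharp
using System;
using System.Net;

public static class SetCookieParser
{
    // Parses ONE cookie definition, e.g. "BAIDUID=abc; path=/; domain=.baidu.com".
    // defaultDomain is used when the definition has no domain attribute.
    public static Cookie ParseSingle(string setCookieValue, string defaultDomain)
    {
        string[] parts = setCookieValue.Split(';');
        string[] nameValue = SplitPair(parts[0]);
        Cookie cookie = new Cookie(nameValue[0], nameValue[1]);
        cookie.Domain = defaultDomain;
        for (int i = 1; i < parts.Length; i++)
        {
            string[] attr = SplitPair(parts[i]);
            string key = attr[0].ToLowerInvariant();
            if (key == "domain" && attr[1] != "") cookie.Domain = attr[1];
            else if (key == "path" && attr[1] != "") cookie.Path = attr[1];
            else if (key == "secure") cookie.Secure = true;
            else if (key == "httponly") cookie.HttpOnly = true;
            else if (key == "expires")
            {
                DateTime expires;
                // An unparsable date is ignored, leaving a session cookie.
                if (DateTime.TryParse(attr[1], out expires)) cookie.Expires = expires;
            }
        }
        return cookie;
    }

    // Splits "name=value" at the first '='; a missing '=' yields an empty value.
    private static string[] SplitPair(string text)
    {
        int eq = text.IndexOf('=');
        if (eq < 0) return new string[] { text.Trim(), "" };
        return new string[] { text.Substring(0, eq).Trim(), text.Substring(eq + 1).Trim() };
    }
}
```

The raw header text is available through `resp.Headers["Set-Cookie"]`; each parsed cookie can then be added to your own CookieCollection instead of relying on resp.Cookies.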