Too many automatic redirections were attempted

Reposted from: http://blog.sina.com.cn/s/blog_414cc36d0100lkf1.html

 

 

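I have often used .NET and Java to fetch web pages, but never looked closely at how it works underneath. Today someone sent me a URL that caused trouble when fetched from .NET: the program seemed stuck in an endless loop, requesting the page over and over without returning any content. The exception was "Too many automatic redirections were attempted". A Google search suggested assigning a real CookieContainer object to the HttpWebRequest so that it can hold cookies. I tried that, but it made no difference. The original code is below.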

 

 

 // Requires: using System.IO; using System.Net; using System.Text;
 // cc is a class-level CookieContainer shared by all requests, e.g.
 // private CookieContainer cc = new CookieContainer();
 public string login()
        {
            WebResponse myResponse = null;
            byte[] data = Encoding.UTF8.GetBytes("uid=&langx=zh-cn&mac=&ver=&JE=true&username=acbtrnn9&passwd=qqq111");
            string url = "http://asd10000.com/app/member/login.php";

            HttpWebRequest myRequest = WebRequest.Create(url) as HttpWebRequest;

            myRequest.Method = "POST";
            myRequest.ContentType = "application/x-www-form-urlencoded";
            myRequest.AllowAutoRedirect = true;   // redirects are followed automatically
            myRequest.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; zh-CN; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 ( .NET CLR 3.5.30729; .NET4.0E)";
            myRequest.KeepAlive = true;
            myRequest.CookieContainer = cc;       // created, but still empty at this point

            //myRequest.Headers.Add("Cookie", "nn1aab3b=5xPgv3TKq3uURmNsaeuycQ==");

            myRequest.Referer = "http://asd10000.com/app/member/";

            // Write the form data into the request body
            using (Stream newStream = myRequest.GetRequestStream())
            {
                newStream.Write(data, 0, data.Length);
            }

            // This is the call that throws once the redirect limit is reached
            myResponse = myRequest.GetResponse();

            string str;

            // Read the response as a stream, with an explicit encoding
            using (Stream s = myResponse.GetResponseStream())
            using (StreamReader objReader = new StreamReader(s, Encoding.UTF8))
            {
                str = objReader.ReadToEnd();
            }

            return str;
        }

 

 

 

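Debugging in Firefox showed that Firefox sends a cookie when it requests the page. Capturing the request made by the program above with Http Analyzer showed that, although a CookieContainer had been created, the instance held no initial cookie data. Because the server could not read the cookie, it kept redirecting to the login page, so the program looped until the redirect limit (HttpWebRequest.MaximumAutomaticRedirections, 50 by default) was reached and the exception was thrown.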

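    Once the cause is understood, the problem is easy to solve. There are two approaches.

    The first is to write the cookie directly into the request headers: myRequest.Headers.Add("Cookie", "nn1aab3b=5xPgv3TKq3uURmNsaeuycQ==");

    The second still uses the CookieContainer, but initializes it with some cookies beforehand. This can likewise be done in two ways: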

            cc.Add(new Cookie("nn1aab3b", "5xPgv3TKq3uURmNsaeuycQ==", "/", "asd10000.com"));
            cc.SetCookies(new Uri("http://asd10000.com"), "nn1aab3b=5xPgv3TKq3uURmNsaeuycQ==");
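
The two seeding calls can be tried without touching the network. The following is a minimal sketch, not part of the original post: the class and method names (CookieSeeding, SeedWithAdd, SeedWithSetCookies) are made up for illustration, and each helper uses its own fresh container so the two approaches do not interfere with each other.

```csharp
using System;
using System.Net;

public static class CookieSeeding
{
    // Hypothetical helper: seeds a fresh container through CookieContainer.Add.
    public static CookieContainer SeedWithAdd(string name, string value, string domain)
    {
        var container = new CookieContainer();
        // Cookie(name, value, path, domain): path "/" applies the cookie to the whole site.
        container.Add(new Cookie(name, value, "/", domain));
        return container;
    }

    // Hypothetical helper: seeds a fresh container through CookieContainer.SetCookies,
    // which parses a cookie header string on behalf of the given URI.
    public static CookieContainer SeedWithSetCookies(Uri site, string cookieHeader)
    {
        var container = new CookieContainer();
        container.SetCookies(site, cookieHeader);
        return container;
    }
}
```

Calling GetCookieHeader(new Uri("http://asd10000.com/app/member/login.php")) on either container shows the Cookie header that a request to that URL would carry, which is a quick way to confirm the container is no longer empty before sending anything.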

 

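    Both approaches share one problem: the cookie content has to be known in advance. The value used here was read from Firefox after logging in. What if it is not known beforehand?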

 

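    Later I read the SDK documentation, which says that when a POST request receives a 302 response, the request is by default issued again using GET. On the first POST the local cookie is empty. After that request the server writes a cookie to the client, and IE then requests again with GET. By then the local cookie has a value, so the whole exchange completes.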
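
The rule described above can be written down as two small pure functions. This is only a sketch of the behavior documented for the HttpStatusCode values, not code from the original post; the names RedirectRules, NextMethod and NextUri are hypothetical, and the relative path "index.php" in the usage note is an invented example.

```csharp
using System;
using System.Net;

public static class RedirectRules
{
    // Method for the follow-up request: 301, 302 and 303 turn a POST into a GET.
    // Every other status, including 307 (RedirectKeepVerb), keeps the original method.
    public static string NextMethod(HttpStatusCode status, string method)
    {
        switch (status)
        {
            case HttpStatusCode.Moved:          // 301
            case HttpStatusCode.Redirect:       // 302
            case HttpStatusCode.RedirectMethod: // 303
                return method == "POST" ? "GET" : method;
            default:
                return method;
        }
    }

    // The Location header may be relative, so resolve it against the requested URL.
    public static Uri NextUri(Uri requested, string location)
    {
        return new Uri(requested, location);
    }
}
```

For example, NextMethod(HttpStatusCode.Redirect, "POST") yields "GET", and NextUri(new Uri("http://asd10000.com/app/member/login.php"), "index.php") resolves to http://asd10000.com/app/member/index.php.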

 

So the program can imitate IE and make two requests. This requires setting myRequest.AllowAutoRedirect = false;

The code is as follows:

 // Requires: using System.IO; using System.Net; using System.Text;
 // cc is the same class-level CookieContainer used by login()
 public string mylogin()
        {
            HttpWebResponse myResponse = null;
            byte[] data = Encoding.UTF8.GetBytes("uid=&langx=zh-cn&mac=&ver=&JE=true&username=acbtrnn9&passwd=qqq111");
            string url = "http://asd10000.com/app/member/login.php";

            HttpWebRequest myRequest = WebRequest.Create(url) as HttpWebRequest;

            myRequest.Method = "POST";
            myRequest.ContentType = "application/x-www-form-urlencoded";
            myRequest.AllowAutoRedirect = false;  // handle the 302 ourselves
            myRequest.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; zh-CN; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 ( .NET CLR 3.5.30729; .NET4.0E)";
            myRequest.KeepAlive = true;
            myRequest.CookieContainer = cc;

            //myRequest.Headers.Add("Cookie", "nn1aab3b=5xPgv3TKq3uURmNsaeuycQ==");

            myRequest.Referer = "http://asd10000.com/app/member/";
            using (Stream newStream = myRequest.GetRequestStream())
            {
                newStream.Write(data, 0, data.Length);
            }

            myResponse = myRequest.GetResponse() as HttpWebResponse;
            if (myResponse.StatusCode == HttpStatusCode.Redirect)
            {
                // Keep the cookies the server set on the 302 response
                cc.Add(myResponse.Cookies);
                // Release the first response before sending the second request
                myResponse.Close();

                // Second request to the same URL, this time with the cookie in cc
                myRequest = WebRequest.Create(url) as HttpWebRequest;

                myRequest.Method = "POST";
                myRequest.ContentType = "application/x-www-form-urlencoded";
                myRequest.AllowAutoRedirect = false;
                myRequest.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; zh-CN; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 ( .NET CLR 3.5.30729; .NET4.0E)";
                myRequest.KeepAlive = true;
                myRequest.CookieContainer = cc;

                //myRequest.Headers.Add("Cookie", "nn1aab3b=5xPgv3TKq3uURmNsaeuycQ==");

                myRequest.Referer = "http://asd10000.com/app/member/";
                using (Stream retryStream = myRequest.GetRequestStream())
                {
                    retryStream.Write(data, 0, data.Length);
                }

                myResponse = myRequest.GetResponse() as HttpWebResponse;
            }

            string str;

            // Read the response as a stream, with an explicit encoding
            using (Stream responseStream = myResponse.GetResponseStream())
            using (StreamReader objReader = new StreamReader(responseStream, Encoding.UTF8))
            {
                str = objReader.ReadToEnd();
            }

            return str;
        }

 

 

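The code above is highly redundant. It is only meant as an example of the requests involved, but the basic idea is as shown. Really understanding everything that happens between typing a domain name into IE and the page finally being rendered is complicated. At the very least it requires knowing the HTTP protocol and what each response status code means.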
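
One way to cut the repetition is to build every request through a single helper and loop over the redirects. The sketch below is an assumption-laden variant, not the author's method: it follows the Location header with GET, as the SDK note describes, rather than posting to the login URL a second time, and the names RedirectingClient, CreateRequest and Fetch as well as the maxHops parameter are invented for illustration. It relies on HttpWebRequest storing response cookies in the assigned CookieContainer on its own.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class RedirectingClient
{
    // Builds a request with the settings shared by every hop.
    // Automatic redirects are off so the caller can see each 302.
    public static HttpWebRequest CreateRequest(Uri url, string method, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = method;
        request.AllowAutoRedirect = false;
        request.KeepAlive = true;
        request.CookieContainer = cookies;
        return request;
    }

    // Posts formData to url, then follows 302 responses with GET, at most maxHops times.
    public static string Fetch(Uri url, byte[] formData, CookieContainer cookies, int maxHops)
    {
        string method = "POST";
        for (int hop = 0; hop <= maxHops; hop++)
        {
            HttpWebRequest request = CreateRequest(url, method, cookies);
            if (method == "POST")
            {
                request.ContentType = "application/x-www-form-urlencoded";
                using (Stream body = request.GetRequestStream())
                {
                    body.Write(formData, 0, formData.Length);
                }
            }

            using (var response = (HttpWebResponse)request.GetResponse())
            {
                string location = response.Headers["Location"];
                if (response.StatusCode != HttpStatusCode.Redirect || location == null)
                {
                    // Final page: read it with an explicit encoding.
                    using (var reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
                    {
                        return reader.ReadToEnd();
                    }
                }

                // Cookies set by this response are in the shared container by now.
                url = new Uri(url, location);
                method = "GET";
            }
        }
        throw new WebException("Gave up after too many redirects");
    }
}
```

A call would look like RedirectingClient.Fetch(new Uri("http://asd10000.com/app/member/login.php"), data, cc, 10), where data and cc are the form bytes and container from the code above and 10 is an arbitrary limit. The User-Agent and Referer headers from the original are left out here for brevity and can be set inside CreateRequest if the server requires them.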
