Simulating a Weibotong (wbto.cn) login to scrape Sina Weibo in C#

banny

Category: .NET/C#. Tags: Sina Weibo, c#, string, null, stream, login

Article link: https://blog.csdn.net/malimalihun/article/details/8085070

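The day before yesterday, as soon as I got to the office, I heard some alarming news: Sina had shut down its 1.0 search API. That would hit our business hard, so it was urgent, and we all got together to work out what to do. Buying access to Sina's 2.0 API would certainly take a long time, so we decided to scrape Sina's web pages instead. However, when you are not logged in, Sina's pages allow fewer than 200 fetches per 10 minutes before forcing you to type in a CAPTCHA by hand, so that plan fell through as well. We then looked for other options and checked Kongming and Weibotong. To my delight, both were still running normally. A little investigation showed that scraping them was quite easy, and we decided to go with Weibotong. Why not Kongming? Because Kongming's posts come without images by default. You would have to fetch everything a second time, once with images and once without, and merge the two before the data is usable.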

Enough preamble. Everything below was done in about two hours.

I. First, the simulated Weibotong login. Inspecting the traffic in Fiddler shows that logging in goes through the following three requests:

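1. First request: http://www.wbto.cn/?c=login&m=index_login. It is submitted as a POST and produces a cookie that uniquely identifies the user. This cookie is incomplete, though, and the next request is needed to complete it. Where does the next request's address come from? It is returned in the response to this first POST.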

2. Second request: http://www.wbto.cn/bbs/api/uc.php?time=1350530131&code=9888GfRzqS0FVoZFT1EoH6UsF0JklnvPdaiKrhN%2FD5BPYdzdwzyBTIbk5roC8YVn9AeTjsNFDqD1LONBwdGCQqjL%2BqX6uqSXUT6w9Si6JOlNA5BFLYEHIfl1A96Cf5bACA4O447xUah%2FM9JxMc47qOFtlnjjqWSRb%2FNPNMrJsxjCuex8V6Wri%2Ft8ZP7Q87s8. As described above, this address is returned by the first POST request. The code below shows how to extract it.

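3. Third request: http://www.wbto.cn, the page you land on after logging in. In fact, once the second request has completed, the user-identity cookie is fully formed. With it you can access any page, including the home page and the Weibo search page we mainly care about.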
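Steps 1 to 3 work because a single CookieContainer accumulates every cookie the server sets and then sends all of them to any page on the same host. The offline sketch below demonstrates this with System.Net.CookieContainer. The cookie names and values (sid, auth) and the "?c=search" URL are hypothetical placeholders, not wbto.cn's real ones. The sketch also assumes that the server scopes its cookies to path "/".

```csharp
using System;
using System.Net;

public static class CookieFlowDemo
{
    // Offline illustration only: cookie names, values and the "?c=search" URL are hypothetical.
    public static string HeaderForAnyPage()
    {
        var cc = new CookieContainer();

        // Step 1: the login POST leaves a first (incomplete) identity cookie.
        cc.Add(new Uri("http://www.wbto.cn/?c=login&m=index_login"),
               new Cookie("sid", "abc", "/"));

        // Step 2: the uc.php request adds the missing piece (assumed to be scoped to path "/").
        cc.Add(new Uri("http://www.wbto.cn/bbs/api/uc.php"),
               new Cookie("auth", "xyz", "/"));

        // Step 3 and beyond: any page on the same host now receives both cookies.
        return cc.GetCookieHeader(new Uri("http://www.wbto.cn/?c=search"));
    }
}
```

The returned header contains both sid=abc and auth=xyz. That is why one shared container, like the static cc in the code below, is enough for every later GET request.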

Here is the code that implements the steps above. It is quite simple:

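(1) First, define a global (static) variable that holds the cookies for the whole session. The code in this post needs the System, System.IO, System.Net and System.Text namespaces.

 private static CookieContainer cc = new CookieContainer();  // shared by all requests, so the cookies from each step accumulate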

(2) Simulate the first POST request:

        /// <summary>
        /// Simulates the login POST request.
        /// </summary>
        /// <param name="nick">User nickname or email</param>
        /// <param name="password">User login password</param>
        /// <param name="url">Address of the next GET request, taken from the POST response</param>
        /// <param name="cookie">The returned cookie, as a header string</param>
        public void SimulatePost(string nick, string password, out string url, out string cookie)
        {
            url = null;
            cookie = null;
            try
            {
                // Assemble the form data. URL-encode user input so characters such as & or = cannot break it.
                // cookietime=2592000 seconds is 30 days (presumably the "remember me" lifetime).
                string postdata = string.Format(
                    "action=submit&to=/&username={0}&password={1}&cookietime=2592000",
                    Uri.EscapeDataString(nick), Uri.EscapeDataString(password));
                string loginUrl = "http://www.wbto.cn/?c=login&m=index_login";  // login address
                byte[] postdatabytes = Encoding.UTF8.GetBytes(postdata);
                #region Request headers
                HttpWebRequest req = (HttpWebRequest)WebRequest.Create(loginUrl);
                req.Method = "POST";
                req.ContentType = "application/x-www-form-urlencoded";
                req.ContentLength = postdatabytes.Length;
                req.Referer = "http://www.wbto.cn/";
                req.AllowAutoRedirect = false;   // keep this response so its body can be parsed
                req.CookieContainer = cc;        // the shared container collects the cookies the server sets
                req.KeepAlive = true;
                #endregion
                #region Submit the data and get the cookie back
                using (Stream stream = req.GetRequestStream())
                {
                    stream.Write(postdatabytes, 0, postdatabytes.Length);
                }   // close the request stream before asking for the response
                using (HttpWebResponse rep = (HttpWebResponse)req.GetResponse())
                using (StreamReader reader = new StreamReader(rep.GetResponseStream(), Encoding.UTF8))
                {
                    // The cookie string looks the same as what you see in the browser
                    cookie = cc.GetCookieHeader(req.RequestUri);
                    string html = reader.ReadToEnd();
                    // The next request's address is the value of the first src="..." attribute
                    int start = html.IndexOf("src=\"");
                    if (start >= 0)
                    {
                        start += 5;
                        int end = html.IndexOf('"', start);
                        if (end > start)
                            url = html.Substring(start, end - start);
                    }
                }
                #endregion
            }
            catch
            {
                // On any failure, report null for both outputs
                url = null;
                cookie = null;
            }
        }
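SimulatePost takes whatever follows the first src=" in the response. If the page ever includes another src attribute earlier, or writes & as &amp; inside the attribute, the extracted address will be wrong. The sketch below is a slightly more defensive alternative. It assumes the wanted address is the one pointing at uc.php, as in the sample second-request URL. It uses a regular expression and then decodes HTML entities. UcUrlExtractor and ExtractUcUrl are hypothetical names, not part of the original code.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class UcUrlExtractor
{
    // Matches src="..." whose value contains "uc.php"; group 1 holds the value without quotes.
    // Assumption: the second-step address is the only src pointing at uc.php.
    private static readonly Regex SrcPattern =
        new Regex("src\\s*=\\s*\"([^\"]*uc\\.php[^\"]*)\"", RegexOptions.IgnoreCase);

    // Returns the second-step address, or null when the response does not contain one.
    public static string ExtractUcUrl(string html)
    {
        if (string.IsNullOrEmpty(html))
            return null;

        Match m = SrcPattern.Match(html);
        if (!m.Success)
            return null;

        // Attribute values may encode '&' as "&amp;"; turn entities back into plain characters.
        return WebUtility.HtmlDecode(m.Groups[1].Value);
    }
}
```

A null result can be handled the same way as the null url that SimulatePost reports on failure.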

(3) Simulate the second request:

        /// <summary>
        /// Simulates a GET request.
        /// </summary>
        /// <param name="url">Request address</param>
        /// <param name="cookie">Cookie container shared across requests</param>
        /// <param name="cookieStr">Cookie as a header string</param>
        /// <param name="host">Host header, optional</param>
        /// <param name="newCookie">The updated cookie string after the request</param>
        /// <returns>The response body, or an empty string on failure</returns>
        public string SimulateGet(string url, CookieContainer cookie, string cookieStr, string host, out string newCookie)
        {
            string json = string.Empty;
            newCookie = null;
            try
            {
                HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
                req.Method = "GET";
                req.Timeout = 1000 * 30;    // 30-second timeout
                if (!string.IsNullOrEmpty(host))
                    req.Host = host;
                req.Referer = "http://www.wbto.cn/?c=login&m=index_login";
                req.CookieContainer = cookie;
                // With a container set, HttpWebRequest builds the Cookie header from it,
                // so the raw string is only needed when no container is supplied
                if (cookie == null && !string.IsNullOrEmpty(cookieStr))
                    req.Headers.Add(HttpRequestHeader.Cookie, cookieStr);
                using (HttpWebResponse rep = (HttpWebResponse)req.GetResponse())
                using (StreamReader reader = new StreamReader(rep.GetResponseStream(), Encoding.UTF8))
                {
                    // The container now also holds any cookies set by this response
                    if (cookie != null)
                        newCookie = cookie.GetCookieHeader(req.RequestUri);
                    json = reader.ReadToEnd();
                }
            }
            catch
            {
                // On any failure, return an empty body and a null cookie
                json = string.Empty;
                newCookie = null;
            }
            return json;
        }

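A note here: as I said earlier, once the second request has completed, the cookie is fully formed, and with it you can access any page. As you may have noticed, this makes the second method general-purpose. You can keep using it to request other addresses, as long as you pass in the correct request address and cookie.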

Finally, here is the last piece of code, which I used for testing: