Simulating a Baidu login in Java, with a bonus "professional second-floor grabber" feature.

Latest recommended article published 2021-03-11 09:51:29

esuvf


Article link: https://blog.csdn.net/dwheger/article/details/17960351


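Recently I saw that the moderator of the Java tieba had written a script that automatically grabs the "second floor" (the first reply under a new thread). I found it fascinating, but he wouldn't open-source it, so I had no choice but to work it out myself (there are similar tutorials online, of course, but many of them are badly outdated).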

Because Baidu's login box is generated dynamically by JavaScript, fetching the page source directly won't give you the form you're after.

This time I used IE9. Open IE9, go to "www.baidu.com", and press F12 to bring up the developer tools. First, let's clear the cache:

Uncheck the two boxes shown in the figure below, so that the captured requests are not cleared when the page navigates:

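When that's done, click the "Start capturing" button, then refresh the current page first (this avoids having to enter a CAPTCHA), click Log in as usual, and enter your username and password in the pop-up dialog.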

Click Log in, and once you have been redirected to the logged-in page, click the "Stop capturing" button.

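You'll see that a lot of requests were captured. If you don't know where to start, type the password you just entered into the search box shown below (my username and password are blurred out):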

This shows the parameters of the POST request we sent when logging in. We can copy them into Notepad to analyze them:

After excluding the parameters whose meaning we already know, the following unknown parameters remain:

&token=84a7862f4bf8c2aa8a1d22cf9fbade51
&tpl=mn
&tt=1389080269653
&codestring=
&u=http%3A%2F%2Fwww.baidu.com%2Findex.php%3Ftn%3D10018802_hao
&quick_user=0
&loginmerge=true
&splogin=rate
&ppui_logintime=8801
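Copying the body into Notepad works fine; to do the same split in code (for example, to diff two captures), a small stdlib-only helper can URL-decode each pair. `CapturedParams` is a hypothetical name, and the body in `main` is a trimmed excerpt of the captured parameters above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits a captured form body such as "&a=1&b=2" into
// URL-decoded key/value pairs, keeping the order in which they were captured.
final class CapturedParams {

    static Map<String, String> parse(String body) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue; // the leading "&" produces an empty first piece
            }
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(decode(key), decode(value));
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // every JVM supports UTF-8
        }
    }

    public static void main(String[] args) {
        String body = "&tpl=mn&codestring="
                + "&u=http%3A%2F%2Fwww.baidu.com%2Findex.php%3Ftn%3D10018802_hao";
        // Prints {tpl=mn, codestring=, u=http://www.baidu.com/index.php?tn=10018802_hao}
        System.out.println(parse(body));
    }
}
```

Decoding `u` this way makes it obvious that it is just the page to return to after login.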

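When logging in, we either omit these parameters or send them back with their original values. I had worked with the Sina Weibo SDK before, which uses OAuth 2.0 for authorization and returns a token once authorization succeeds, so I decided to focus first on where token comes from.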

First, paste the value of token into the search box, and you can see:

From this we can tell that token is first obtained from the GET request to the URL marked by the arrow.

We can try visiting this URL ourselves to see whether we get a token:

As you can see, it doesn't work.

Both are plain GET requests, so why could we get a token during the login just now, but not any more?

The only thing I could think of was cookies, so I checked them:

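In the same 17th request as before, we can see that the browser actually sent cookies such as BAIDUID. So our GET must carry the cookies obtained earlier. The first step is therefore to visit "www.baidu.com" to get the cookies, and then visit "https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3&tt=1389080260852&class=login&logintype=dialogLogin&callback=bd__cbs__jndgh2" while sending those cookies along, to get a token for the login request.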
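The tt and callback values in that URL probably don't have to be copied literally. Judging from the captures, tt looks like the client's current time in milliseconds (1389080260852 falls on 7 January 2014), and callback looks like a randomly named JSONP function that the server simply echoes back. Both are assumptions, not documented behavior. Under those assumptions the URL could be built like this (`GetApiUrl` is a hypothetical helper):

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper. ASSUMPTIONS: "tt" is the client's current time in milliseconds,
// and "callback" is only a JSONP function name that the server echoes back.
final class GetApiUrl {
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

    // "bd__cbs__" followed by six random lowercase letters or digits, like "bd__cbs__jndgh2"
    static String randomCallback() {
        StringBuilder sb = new StringBuilder("bd__cbs__");
        ThreadLocalRandom random = ThreadLocalRandom.current();
        for (int i = 0; i < 6; i++) {
            sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    static String build(long tt, String callback) {
        return "https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3"
                + "&tt=" + tt
                + "&class=login&logintype=dialogLogin"
                + "&callback=" + callback;
    }

    public static void main(String[] args) {
        System.out.println(build(System.currentTimeMillis(), randomCallback()));
    }
}
```

If the server turns out to validate either value, falling back to the captured literals is the safe option.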

The core code is as follows:

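import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.ClientContext;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.protocol.BasicHttpContext;
import org.apache.http.protocol.HttpContext;
import org.apache.http.util.EntityUtils;

// One cookie store shared by every request below, so cookies such as BAIDUID are sent back
BasicCookieStore cookieStore = new BasicCookieStore();

// Step 1: visit the Baidu home page to obtain the cookies we need
HttpContext httpContext = new BasicHttpContext();
httpContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);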

HttpClient client = new DefaultHttpClient();
HttpGet httpGet1 = new HttpGet("http://www.baidu.com/");
client.execute(httpGet1, httpContext);

// Step 2: use the cookies to fetch the token required as a POST parameter for the simulated login
HttpClient client2 = new DefaultHttpClient();
HttpGet httpGet2 = new HttpGet(
"https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3"
+ "&tt=1388488343671&class=login&logintype=dialogLogin&callback=bd__cbs__4aeorp");
HttpContext context2 = new BasicHttpContext();
context2.setAttribute(ClientContext.COOKIE_STORE, cookieStore);
HttpResponse response2 = client2.execute(httpGet2, context2);
String temp = EntityUtils.toString(response2.getEntity(), "UTF-8");
System.out.println(temp);
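// Skip past `token" : "` (10 characters), then cut at the closing quote
String temp1 = temp.substring(temp.indexOf("token") + 10);
String token = temp1.substring(0, temp1.indexOf("\""));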
You can see the output:

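The output is a JSONP call (the callback name wrapping a JSON object), and a fixed `+ 10` offset only works if the text after `token` is exactly `" : "`. A regular expression that tolerates other spacing is a little sturdier. The sample string in `main` is a hypothetical, shortened response inferred from that offset, not a verbatim Baidu reply, and `TokenParser` is an illustrative name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that pulls the token value out of the getapi JSONP response.
final class TokenParser {
    // Matches "token" : "<value>" with any amount of whitespace around the colon.
    private static final Pattern TOKEN = Pattern.compile("\"token\"\\s*:\\s*\"([^\"]*)\"");

    /** Returns the token value, or null if the response has no token field. */
    static String extractToken(String jsonp) {
        Matcher m = TOKEN.matcher(jsonp);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical response shape, inferred from the "+ 10" offset in the code above
        String sample = "bd__cbs__4aeorp({\"errInfo\":{\"no\": \"0\"},"
                + "\"data\": {\"codeString\" : \"\",\"token\" : \"84a7862f4bf8c2aa8a1d22cf9fbade51\"}})";
        System.out.println(extractToken(sample)); // 84a7862f4bf8c2aa8a1d22cf9fbade51
    }
}
```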
By parsing this string we can get the token. We then POST the parameters to "https://passport.baidu.com/v2/api/?login". The core code is as follows:

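import java.util.ArrayList;
import java.util.List;
import org.apache.http.HttpEntity;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.message.BasicNameValuePair;

HttpClient client3 = new DefaultHttpClient();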
HttpPost post = new HttpPost("https://passport.baidu.com/v2/api/?login");
List<NameValuePair> parameters = new ArrayList<NameValuePair>();
// Values are given unencoded; UrlEncodedFormEntity encodes them once
parameters.add(new BasicNameValuePair("staticpage",
"http://www.baidu.com/cache/user/html/v3Jump.html"));
parameters.add(new BasicNameValuePair("charset", "utf-8"));
parameters.add(new BasicNameValuePair("token", token));
parameters.add(new BasicNameValuePair("tpl", "mn"));
parameters.add(new BasicNameValuePair("apiver", "v3"));
parameters.add(new BasicNameValuePair("tt", "1388552675432"));
parameters.add(new BasicNameValuePair("safeflg", "0"));
parameters.add(new BasicNameValuePair("u", "http://www.baidu.com/"));
parameters.add(new BasicNameValuePair("isPhone", "false"));
parameters.add(new BasicNameValuePair("quick_user", "0"));
parameters.add(new BasicNameValuePair("loginmerge", "true"));
parameters.add(new BasicNameValuePair("logintype", "dialogLogin"));
parameters.add(new BasicNameValuePair("splogin", "rate"));
parameters.add(new BasicNameValuePair("username", "your username"));
parameters.add(new BasicNameValuePair("password", "your password"));
parameters.add(new BasicNameValuePair("mem_pass", "on"));
parameters.add(new BasicNameValuePair("callback",
"parent.bd__pcbs__5i3pfd"));
HttpEntity postBodyEnt = new UrlEncodedFormEntity(parameters,"utf-8");
post.setEntity(postBodyEnt);

HttpContext context3 = new BasicHttpContext();
context3.setAttribute(ClientContext.COOKIE_STORE, cookieStore);
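HttpResponse re = client3.execute(post, context3);
// Print the returned page so we can check whether the login succeeded
String loginHtml = EntityUtils.toString(re.getEntity(), "UTF-8");
System.out.println(loginHtml);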
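One detail worth checking in the POST parameters: `UrlEncodedFormEntity` percent-encodes every value, so values should be passed in decoded form. A value that is already partly encoded, such as `cache%2Fuser`, gets encoded a second time and `%2F` turns into `%252F`. The sketch below uses the JDK's `URLEncoder` as a stand-in to show the effect, assuming it treats `:`, `/` and `%` the same way the form entity does; `FormEncodingDemo` is a hypothetical name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Demonstrates single vs. double percent-encoding of a form value.
final class FormEncodingDemo {
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // every JVM supports UTF-8
        }
    }

    public static void main(String[] args) {
        // Decoded value: encoded exactly once
        System.out.println(encode("http://www.baidu.com/cache/user/html/v3Jump.html"));
        // Partly encoded value: the existing "%2F" becomes "%252F"
        System.out.println(encode("http://www.baidu.com/cache%2Fuser/html/v3Jump.html"));
    }
}
```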

So how do we know whether the login succeeded?

If the login succeeded, the returned HTML should contain err_no = 0, as shown in IE9 below:
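Checking for that marker can be automated. The regex below assumes the value appears as `err_no=<digits>` (with or without spaces around `=`) somewhere in the returned page; the page fragment in `main` is made up for illustration, and `LoginResult` is a hypothetical helper:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper. ASSUMPTION: the page contains "err_no=<digits>", possibly with spaces around "=".
final class LoginResult {
    private static final Pattern ERR_NO = Pattern.compile("err_no\\s*=\\s*(\\d+)");

    /** Returns the first err_no value in the page, or null if there is none. */
    static Integer errNo(String html) {
        Matcher m = ERR_NO.matcher(html);
        return m.find() ? Integer.valueOf(m.group(1)) : null;
    }

    /** True only when an err_no is present and equal to 0. */
    static boolean loggedIn(String html) {
        Integer code = errNo(html);
        return code != null && code == 0;
    }

    public static void main(String[] args) {
        // Made-up fragment for illustration, not a verbatim Baidu response
        String page = "<script>var href = 'http://www.baidu.com/cache/user/html/v3Jump.html"
                + "?err_no=0&callback=parent.bd__pcbs__5i3pfd';</script>";
        System.out.println(loggedIn(page)); // true
    }
}
```

Treating a missing err_no as a failure errs on the safe side: a page without the marker is most likely an error page.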

That's all for the simulated login.

Now let's look at grabbing the second floor. For this we need the jsoup library to parse HTML.

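First, open any tieba, for example the Linux tieba, go into a thread and post a reply. To keep the explanation simple, I replied with plain letters (pinyin) here.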

Then search the capture view for the text of that reply:

You can see the parameters the browser submitted with the reply:

Copy them into Notepad for analysis:

ie=utf-8 (encoding)
&kw=linux (tieba name)
&fid=3171
&tid=2801628284
&vcode_md5= (CAPTCHA)
&floor_num=22
&rich_text=1
&tbs=2caa0071fe0f49621389084151
&content=lu+guo+hun+yan+shu (reply content)
&files=%5B%5D (attachments; this is just "[]", which becomes %5B%5D after URL-encoding)
&mouse_pwd=23%2C18%2C19%2C9%2C16%2C28%2C18%2C21%2C44%2C20%2C9%2C21%2C9%2C20%2C9%2C21%2C9%2C20%2C9%2C21%2C9%2C20%2C9%2C21%2C9%2C20%2C9%2C21%2C44%2C20%2C23%2C16%2C23%2C28%2C44%2C20%2C22%2C19%2C19%2C9%2C18%2C19%2C29%2C13890841559370 (after URL-decoding, this appears to record mouse positions; ignore it for now)
&mouse_pwd_t=1389084155937
&mouse_pwd_isclick=0
&__type__=reply (type)
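URL-decoding a couple of these values confirms the notes above: `%5B%5D` is `[]`, and `mouse_pwd` decodes to a comma-separated list of numbers. In this capture the last number (13890841559370) also happens to be `mouse_pwd_t` (1389084155937) with a `0` appended, an observation from this one capture rather than a known rule. A stdlib-only sketch (the class name `MousePwd` is hypothetical):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Arrays;

// Hypothetical helper for inspecting captured, URL-encoded reply parameters.
final class MousePwd {
    static String decode(String raw) {
        try {
            return URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // every JVM supports UTF-8
        }
    }

    // Decodes "a%2Cb%2C..." and parses each comma-separated piece as a long
    // (the last piece is too large for an int).
    static long[] values(String raw) {
        String[] parts = decode(raw).split(",");
        long[] out = new long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Long.parseLong(parts[i].trim());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(decode("%5B%5D")); // []
        System.out.println(Arrays.toString(values("23%2C18%2C19%2C9%2C13890841559370")));
    }
}
```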

So what remains is to find where fid, tid, floor_num, rich_text, tbs and the like come from.

We can go back to the tieba page we just visited and use Inspect Element (almost every browser supports it; Firefox is used here):

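As you can see, every thread is an li tag. A thread with zero replies contains reply_num = 0, and the id inside it looks like the tid parameter we're looking for. Comparing the two: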

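It matches the tid of the thread we just replied to (compare with the tid parameter listed above). Next, open that thread, right-click, choose View Page Source, and search the source for the parameters we need.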

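Using this method, we find that the page itself contains fid, floor_num, rich_text and tbs. Apart from the mouse-related parameters we're ignoring, that's everything.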

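So the process breaks down into these steps: on the tieba's front page, find the li tags marked with reply_num = 0 and take their tid. Then visit the thread's detail page "http://tieba.baidu.com/p/(replace with tid)".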

In the returned HTML source, find the values of fid, tbs, floor_num, rich_text and the other parameters we need, then submit the POST.
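The selection step can be tried without any network access by working on the `data-field` attribute strings that jsoup's `e.attr("data-field")` returns. This sketch assumes, like the substring code below, that the attribute is JSON containing `"id":<tid>` and `"reply_num":<count>`; the sample strings are hypothetical, and `ZeroReplyThreads` is an illustrative name:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: picks the tids of threads that have no replies yet.
final class ZeroReplyThreads {
    // The quote before "id" keeps keys such as "author_id" from matching.
    private static final Pattern ID = Pattern.compile("\"id\":(\\d+)");
    // Require a delimiter after the 0 so that only a reply count of exactly zero matches.
    private static final Pattern NO_REPLIES = Pattern.compile("\"reply_num\":0[,}]");

    static List<String> tids(List<String> dataFields) {
        List<String> result = new ArrayList<>();
        for (String field : dataFields) {
            if (!NO_REPLIES.matcher(field).find()) {
                continue;
            }
            Matcher id = ID.matcher(field);
            if (id.find()) {
                result.add(id.group(1));
            }
        }
        return result;
    }

    static String threadUrl(String tid) {
        return "http://tieba.baidu.com/p/" + tid;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList(
                "{\"id\":2801628284,\"reply_num\":0}",
                "{\"id\":2801600000,\"reply_num\":10}",
                "");
        for (String tid : tids(fields)) {
            System.out.println(threadUrl(tid)); // http://tieba.baidu.com/p/2801628284
        }
    }
}
```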

First, use jsoup to parse the front page of a tieba, for example the 贝爷 (Bear Grylls) tieba. The code is as follows:

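import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

//----------- Start posting -----------------------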
HttpClient client4 = new DefaultHttpClient();
HttpGet get = new HttpGet("http://tieba.baidu.com/f?kw=贝爷&fr=index");
HttpContext context4 = new BasicHttpContext();
context4.setAttribute(ClientContext.COOKIE_STORE, cookieStore);
HttpResponse finalResponse = client4.execute(get, context4);

String result = EntityUtils
.toString(finalResponse.getEntity(), "utf-8");
Document doc = Jsoup.parse(result);
Elements elements = doc.getElementsByTag("li"); // find every li tag
for (Element e : elements) {
String dataField = e.attr("data-field"); // JSON describing the thread
if (dataField.contains("\"reply_num\":0")) { // no replies yet, so take its tid

final String tid = dataField.substring(
dataField.indexOf("\"id\":") + 5,
dataField.indexOf(",\"reply_num"));
// GetHtmlThread and MyCallBack are my own helper classes (not shown in this post)
new GetHtmlThread(new MyCallBack() { // start a thread to grab the second floor

@Override
public void nextStep(String html) {
System.out.println(html);
}

}, "http://tieba.baidu.com/p/" + tid, cookieStore, tid).start();

}
}

Then visit the thread's detail page. The code below runs in the thread started above:

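// Runs inside GetHtmlThread: URL, store and tid come from the constructor arguments above; html is a field
DefaultHttpClient client = new DefaultHttpClient();
HttpGet get = new HttpGet(URL); // the thread URL passed in when the thread was started
HttpContext context = new BasicHttpContext();
context.setAttribute(ClientContext.COOKIE_STORE, store); // keep carrying the same cookies through every request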
try {
HttpResponse response = client.execute(get, context);
html = EntityUtils.toString(response.getEntity(), "utf-8");

// Pull out the parameters we need (I don't know regular expressions, so bear with me)
String fid = html.substring(html.indexOf("fid:'"),
html.indexOf("fid:'") + 20);
fid = fid.substring(fid.indexOf("'") + 1, fid.lastIndexOf("'"));
String floor_num = html.substring(html.indexOf("floor_num:\""),
html.indexOf("floor_num:\"") + 20);
floor_num = floor_num.substring(floor_num.indexOf("\"") + 1,
floor_num.lastIndexOf("\""));
String rich_text = html.substring(html.indexOf("rich_text:'"),
html.indexOf("rich_text:'") + 20);
rich_text = rich_text.substring(rich_text.indexOf("'") + 1,
rich_text.lastIndexOf("'"));
String tbs = html.substring(html.indexOf("'tbs' : \""),
html.indexOf("'tbs' : \"") + 40);
tbs = tbs.substring(tbs.indexOf("\"") + 1, tbs.lastIndexOf("\""));
ArrayList<NameValuePair> para = new ArrayList<>();
para.add(new BasicNameValuePair("ie", "utf-8"));
para.add(new BasicNameValuePair("kw", "贝爷"));
para.add(new BasicNameValuePair("fid", fid));
para.add(new BasicNameValuePair("tid", tid));
para.add(new BasicNameValuePair("floor_num", floor_num));
para.add(new BasicNameValuePair("rich_text", rich_text));
para.add(new BasicNameValuePair("tbs", tbs));
// Reply text: "Just passing by to grab the second floor and get my face known."
para.add(new BasicNameValuePair("content", "我就路过抢个二楼，混个脸熟。"));

// The mouse parameters can be URL-decoded from the captured values and sent back unchanged
para.add(new BasicNameValuePair(
"mouse_pwd",
"125,118,125,99,125,120,125,122,70,126,99,127,99,126,99,127,99,126,99,127,99,126,99,127,99,126,99,127,70,124,126,123,120,124,70,126,124,121,121,99,120,121,119,13885747519790"));
para.add(new BasicNameValuePair("mouse_pwd_t", "1388574751979"));
para.add(new BasicNameValuePair("vcode_md5", ""));
para.add(new BasicNameValuePair("files", "[]"));
para.add(new BasicNameValuePair("mouse_pwd_isclick", "0"));
para.add(new BasicNameValuePair("__type__", "reply"));

HttpPost post = new HttpPost(
"http://tieba.baidu.com/f/commit/post/add");
DefaultHttpClient client1 = new DefaultHttpClient();
HttpEntity postBodyEnt;

postBodyEnt = new UrlEncodedFormEntity(para, "utf-8");
post.setEntity(postBodyEnt);
System.out.println(EntityUtils.toString(postBodyEnt));
HttpContext context1 = new BasicHttpContext();
context1.setAttribute(ClientContext.COOKIE_STORE, store);
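HttpResponse response1 = client1.execute(post, context1);
System.out.println(EntityUtils.toString(response1.getEntity(), "utf-8"));
} catch (java.io.IOException e) {
e.printStackTrace();
}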
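Since the comment above mentions regular expressions, here is one way the four substring extractions could be written with `java.util.regex` instead. The patterns simply mirror the markers the substring code searches for (`fid:'`, `floor_num:"`, `rich_text:'`, `'tbs' : "`); they are assumptions about the page markup of the time, the sample HTML is made up, and `ThreadPageParams` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex version of the substring-based parameter extraction.
final class ThreadPageParams {
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();

    static {
        PATTERNS.put("fid", Pattern.compile("\\bfid:'([^']*)'"));
        PATTERNS.put("floor_num", Pattern.compile("\\bfloor_num:\"([^\"]*)\""));
        PATTERNS.put("rich_text", Pattern.compile("\\brich_text:'([^']*)'"));
        PATTERNS.put("tbs", Pattern.compile("'tbs'\\s*:\\s*\"([^\"]*)\""));
    }

    /** Returns all four values, failing loudly instead of silently cutting the wrong text. */
    static Map<String, String> extract(String html) {
        Map<String, String> values = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> e : PATTERNS.entrySet()) {
            Matcher m = e.getValue().matcher(html);
            if (!m.find()) {
                throw new IllegalArgumentException("missing " + e.getKey());
            }
            values.put(e.getKey(), m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String html = "<script>var PageData = {fid:'3171', floor_num:\"22\", rich_text:'1'};"
                + " var info = {'tbs' : \"2caa0071fe0f49621389084151\"};</script>";
        // Prints {fid=3171, floor_num=22, rich_text=1, tbs=2caa0071fe0f49621389084151}
        System.out.println(extract(html));
    }
}
```

Unlike the fixed-width substrings, the patterns don't care how long each value is, and a missing marker raises an exception instead of producing a garbled parameter.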