Scraping data from a login-protected system with Python: logging in to a web page with Python to scrape data


你想要的yao都有


Tags: scraping data from a login-protected system with Python

Link to this article: https://blog.csdn.net/weixin_33737110/article/details/113026698


I am trying to build a web scraper to extract my stats from MWO Mercs. To do so I need to log in to the site and then go through the 6 different stats pages to get the data (this will go into a database later, but that is not my question).

The login form is shown below (from https://mwomercs.com/login?return=/profile/stats?type=mech). From what I can see, two fields, EMAIL and PASSWORD, need to be filled in and posted. After logging in, the site should open http://mwomercs.com/profile/stats?type=mech. After that I need to keep the session so I can cycle through the various stats pages.

I have tried urllib, mechanize and requests, but I have not been able to get it working; I would prefer to use requests.

I realise similar questions have been asked on Stack Overflow, but I have searched for a long time without success.

Thanks for any help you can provide.

[Login form: a LOGIN heading, a "MechWarrior Online REGISTER" link, an "Email Address:" field, a "Password:" field and a "[ Forgot Your Password? ]" link]
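The form's HTML did not survive in this copy, so the exact field names are not visible. The keys you POST have to match each input's name attribute, which can differ from the visible labels such as "Email Address". A minimal stdlib sketch for listing them, assuming you pass in the login page's HTML (for example requests.get(login_url).text); FormFieldParser and list_form_fields are made-up names:

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Record each <form>'s action URL and the names of the fields inside it."""

    def __init__(self):
        super().__init__()
        self.forms = []  # one {"action": ..., "fields": [...]} dict per <form>
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._in_form = True
            self.forms.append({"action": attrs.get("action"), "fields": []})
        elif self._in_form and tag in ("input", "select", "textarea"):
            name = attrs.get("name")
            if name:  # controls without a name are not submitted with the form
                self.forms[-1]["fields"].append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def list_form_fields(html):
    """Return the forms in an HTML page together with the field names they post."""
    parser = FormFieldParser()
    parser.feed(html)
    parser.close()
    return parser.forms
```

The action value it reports is also worth checking, since that is the URL the browser actually POSTs to.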

Solution

The Requests documentation is easy to follow when it comes to submitting form data; please read its "More Complicated POST requests" section.
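As a quick illustration of what that section describes: when data is a dict, Requests form-encodes it and sets the Content-Type header itself. Preparing a request without sending it shows the exact body; the URL, field names and values below are made-up placeholders.

```python
import requests

# Placeholder URL and field names; use the names from the real login form.
payload = {"email": "me@example.com", "password": "s3cret&more"}
prepared = requests.Request("POST", "https://example.com/login", data=payload).prepare()

print(prepared.headers["Content-Type"])  # application/x-www-form-urlencoded
print(prepared.body)                     # email=me%40example.com&password=s3cret%26more
```

Nothing is sent over the network here; prepare() only builds the request.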

Logins usually come down to saving the cookie and sending it with future requests.
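At the HTTP level, the login response carries a Set-Cookie header and each later request repeats the cookie in a Cookie header. Requests does this parsing for you; this stdlib-only sketch just makes the round trip visible, using a hypothetical cookie name and value.

```python
from http.cookies import SimpleCookie

# What a login response might send back (hypothetical cookie name and value).
set_cookie_header = "session_id=abc123; Path=/; HttpOnly"

jar = SimpleCookie()
jar.load(set_cookie_header)

# Attributes such as Path and HttpOnly stay with the client; only name=value goes back.
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
print(cookie_header)  # session_id=abc123
```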

After you POST to the login page with requests.post(), use the returned response object to retrieve the cookies. This is one way to do it:

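```python
import requests

# login_url, stats_url, username and password are placeholders to fill in.
# Optional: Requests sets this Content-Type itself when data is a dict.
post_headers = {'content-type': 'application/x-www-form-urlencoded'}
# The keys must match the name attributes of the login form's inputs.
payload = {'username': username, 'password': password}

login_request = requests.post(login_url, data=payload, headers=post_headers)
cookie_dict = login_request.cookies.get_dict()
stats_request = requests.get(stats_url, cookies=cookie_dict)
```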
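Requests can also keep the cookies for you. A requests.Session stores the cookies from every response it receives, including ones set on intermediate redirect responses (login_request.cookies only holds cookies set by the final response), and sends them on later requests. A minimal sketch of the whole flow; login_and_fetch is a made-up helper name, and the field names in payload are assumptions to check against the real form:

```python
import requests


def login_and_fetch(login_url, payload, page_urls, timeout=10):
    """Log in once, then download each page with the same cookie-carrying session.

    Returns {url: page_text}. The keys of payload must match the login form's field names.
    """
    pages = {}
    with requests.Session() as session:
        # The session stores every cookie the site sets, including on redirect hops.
        login_request = session.post(login_url, data=payload, timeout=timeout)
        login_request.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
        for url in page_urls:
            response = session.get(url, timeout=timeout)  # cookies are sent automatically
            response.raise_for_status()
            pages[url] = response.text
    return pages
```

For the question's case, page_urls would hold http://mwomercs.com/profile/stats?type=mech plus the other stats pages, and login_url would be the URL the form posts to (its action attribute), which may differ from the page the form is shown on.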

If you still have problems, check the status code of the response with login_request.status_code, or look for an error message in the page content with login_request.text.
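Sites often answer a wrong password with status 200 and the login form again, so the status code alone may not reveal a failed login. A small helper combining both checks; check_login is a hypothetical name, and failure_marker is assumed to be some text that only the login page shows (for example "Forgot Your Password?"):

```python
import requests


def check_login(login_request, failure_marker):
    """Raise if a login POST visibly failed; otherwise return the response unchanged.

    failure_marker is text assumed to appear only on the login page.
    """
    # 4xx/5xx status codes raise requests.HTTPError.
    login_request.raise_for_status()
    # A 200 that still shows the login form usually means the credentials were rejected.
    if failure_marker in login_request.text:
        raise RuntimeError(
            f"login seems to have failed: ended at {login_request.url} "
            f"with status {login_request.status_code}"
        )
    return login_request
```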

Edit:

Some sites redirect you several times when you make a request. Check the response's history attribute to see what happened and why you got bounced out. For example, I get redirects like this all the time:

```python
>>> some_request.history
(<Response [...]>, <Response [...]>)
```

Each item in history is the Response object from an earlier redirect hop. You can inspect them like any other response, for example some_request.history[0].url, and you can stop Requests from following redirects by passing allow_redirects=False:

```python
login_request = requests.post(login_url, data=payload, headers=post_headers, allow_redirects=False)
```
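To see every hop at once, a short helper can walk history using only the status_code and url attributes mentioned above; redirect_chain is a made-up name:

```python
def redirect_chain(response):
    """Return (status_code, url) for each redirect hop, followed by the final response."""
    hops = [(hop.status_code, hop.url) for hop in response.history]
    hops.append((response.status_code, response.url))
    return hops
```

Calling redirect_chain(login_request) after a normal login POST shows where you were sent and in what order. With allow_redirects=False, history stays empty; the response's is_redirect property and its Location header then tell you where the site wanted to send you next.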

In some cases, I've had to disable redirects and add the new cookies before moving on to the right page. Something like this keeps your existing cookies and adds the new ones:

```python
# Keep the existing cookies and add (or overwrite) the ones new_request set.
cookie_dict.update(new_request.cookies.get_dict())
```

Doing this after each request keeps your cookies up to date for the next request, much as a browser does.
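Spelled out as a loop, that advice looks roughly like this: redirects are disabled, every response's cookies are merged into cookie_dict, and the Location header is followed by hand. It is a simplified sketch: post_and_follow is a hypothetical name, it always continues with GET (typical for 301/302/303, but not what 307/308 require), and it ignores cookie domains and paths. In most cases a requests.Session, which tracks cookies across redirects and requests automatically, is the less error-prone choice.

```python
from urllib.parse import urljoin

import requests


def post_and_follow(url, payload, cookie_dict=None, max_hops=10, timeout=10):
    """POST a form, then follow redirects by hand, merging cookies after every response.

    Returns (final_response, cookie_dict).
    """
    cookie_dict = dict(cookie_dict or {})
    response = requests.post(url, data=payload, cookies=cookie_dict,
                             allow_redirects=False, timeout=timeout)
    hops = 0
    while True:
        # Keep the existing cookies; add or overwrite the ones this response set.
        cookie_dict.update(response.cookies.get_dict())
        if not response.is_redirect:
            return response, cookie_dict
        if hops == max_hops:
            raise RuntimeError(f"gave up after {max_hops} redirects")
        hops += 1
        # Location may be relative, so resolve it against the current URL.
        next_url = urljoin(response.url, response.headers["Location"])
        # Simplification: continue with GET (typical for 301/302/303, not for 307/308).
        response = requests.get(next_url, cookies=cookie_dict,
                                allow_redirects=False, timeout=timeout)
```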
