Parsing HTML with the Python requests module: HTTP Digest/Basic authentication using the Python Requests module

Latest recommended article published on 2022-10-24 18:03:47

weixin_39629969


My goal here is to parse HTML/XML data from a password-protected page and then, based on that data (a timestamp), send XML commands to another device. The page I am trying to access is served by a web server built into an IP device.

Also, if this would be easier to accomplish in another language, please let me know.

I have very little programming experience (one C programming class).

I have tried using Requests for Basic and Digest Auth. I still can't get authenticated, which is stopping me from getting any further.

Here are my attempts:

import requests
from requests.auth import HTTPDigestAuth

url = 'http://myUsername:myPassword@example.com/cgi/metadata.cgi?template=html'
r = requests.get(url, auth=HTTPDigestAuth('myUsername', 'myPassword'))
r.status_code
print(r.headers)
print(r.status_code)

Output:

401

CaseInsensitiveDict({'Content-Length': '0', 'WWW-Authenticate': 'Digest realm="the realm of device", nonce="23cde09025c589f05f153b81306928c8", qop="auth"', 'Server': 'Device server name'})

I have also tried Basic auth with Requests and get the same output. I have tried both with and without user:pass@ in the URL. When I enter that URL in my browser, however, it works.

I thought that Requests handled the header data for Digest/Basic auth, but maybe I need to include headers as well?

I used Live HTTP Headers (Firefox) and got this:

GET /cgi/metadata.cgi?template=html HTTP/1.1
Host: [Device IP]
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Connection: keep-alive

HTTP/1.1 401 Unauthorized
WWW-Authenticate: Digest realm="Device Realm", nonce="a2333eec4cce86f78016343c48382d21", qop="auth"
Server: Device Server
Content-Length: 0

Solution

The two requests are independent:

r = requests.get(url, auth=HTTPDigestAuth('user', 'pass'))
response = requests.get(url)  # XXX

The second request does not send any credentials, so it is not surprising that it receives a 401 Unauthorized HTTP response status.
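If several requests need the same credentials, one option is to attach them to a requests.Session once instead of passing auth= on every call. The following is only a minimal sketch: the helper name build_session is hypothetical, and the URL in the commented usage is the placeholder from the question.

```python
import requests
from requests.auth import HTTPDigestAuth


def build_session(username, password):
    """Return a Session that attaches Digest credentials to every request.

    build_session is a hypothetical helper name used for illustration.
    """
    session = requests.Session()
    # Session.auth is applied to each request made through this session,
    # unlike a bare requests.get(url), which sends no credentials at all.
    session.auth = HTTPDigestAuth(username, password)
    return session


# Usage (performs network I/O, so it is left commented out here):
# session = build_session('myUsername', 'myPassword')
# r = session.get('http://example.com/cgi/metadata.cgi?template=html', timeout=10)
# print(r.status_code)
```

Every request made through the returned session carries the credentials, so a second call such as session.get(url) would no longer go out unauthenticated.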

To fix it:

Use the same URL as you use in your browser. Drop digest-auth/auth/user/pass at the end; it is just an example from the Requests docs.

Print r.status_code instead of response.status_code to see whether it succeeded.

Why would you put the username/password both in the URL and in the auth parameter? Drop the username/password from the URL. To see the request that is sent and the response headers, you could enable logging/debugging:

import logging

import requests
from requests.auth import HTTPDigestAuth

# these two lines enable debugging at httplib level (requests->urllib3->httplib)
# you will see the REQUEST, including HEADERS and DATA,
# and RESPONSE with HEADERS but without DATA.
# the only thing missing will be the response.body which is not logged.
try:
    import httplib  # Python 2
except ImportError:
    import http.client as httplib  # Python 3
httplib.HTTPConnection.debuglevel = 1

logging.basicConfig(level=logging.DEBUG)  # you need to initialize logging,
                                          # otherwise you will not see anything from requests

# make request
url = 'https://example.com/cgi/metadata.cgi?template=html'
r = requests.get(url, auth=HTTPDigestAuth('myUsername', 'myPassword'),
                 timeout=10)
print(r.status_code)
print(r.headers)
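With debugging enabled, a Digest exchange normally shows up as two requests: an initial one answered with 401 and a WWW-Authenticate challenge, then a retry carrying an Authorization: Digest header. To make that second header easier to read in the log, here is a rough sketch of how the challenge can be parsed and how the response value is derived for qop="auth" with MD5 (RFC 2617). It uses only the standard library; the function names parse_challenge and digest_response are hypothetical, and it assumes the server offers exactly qop="auth" with the default MD5 algorithm. It illustrates the scheme and is not the code Requests itself runs.

```python
import hashlib
from urllib.request import parse_http_list, parse_keqv_list


def parse_challenge(header_value):
    """Split a WWW-Authenticate value into (scheme, dict of parameters)."""
    scheme, _, params = header_value.partition(' ')
    return scheme, parse_keqv_list(parse_http_list(params))


def md5_hex(text):
    """Hex MD5 digest of a text string."""
    return hashlib.md5(text.encode('utf-8')).hexdigest()


def digest_response(username, password, method, uri, challenge, nc, cnonce):
    """Compute the Digest 'response' field, assuming qop="auth" and MD5."""
    ha1 = md5_hex(f"{username}:{challenge['realm']}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(
        f"{ha1}:{challenge['nonce']}:{nc}:{cnonce}:{challenge['qop']}:{ha2}"
    )


# The challenge header from the question's output:
scheme, challenge = parse_challenge(
    'Digest realm="the realm of device", '
    'nonce="23cde09025c589f05f153b81306928c8", qop="auth"'
)
print(scheme)     # the scheme the device asks for
print(challenge)  # realm, nonce and qop as a dict
```

Because the device's challenge names the Digest scheme, a Basic auth attempt would be expected to be rejected, while HTTPDigestAuth answers this kind of challenge automatically on the retry.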
