Which Python library should I use to scrape a page that returns 401 - 401 Unauthorized error when fetching a website with curl or Python...

I am still new to Linux, curl and Python. I tried to fetch content from a website, but for some reason I keep getting a "401 Unauthorized" error. I can open the site in a browser, but I cannot open or scrape its content from Python or curl. What am I doing wrong?

I checked my URL, username and password, and they are all correct, but I can't figure out what the problem is.

Please help.

Python 2.7.8

curl 7.37.1

ubuntu@ubuntu:~/pythonSample$ curl -v -u kumar 'https://abc.def.co.in'

Enter host password for user 'kumar':

* Rebuilt URL to: https://abc.def.co.in/

* Hostname was NOT found in DNS cache

* Trying xxx.xxx.xxx.xx...

* Connected to abc.def.co.in (xxx.xxx.xxx.xx) port 443 (#0)

* successfully set certificate verify locations:

* CAfile: none

CApath: /etc/ssl/certs

* SSLv3, TLS handshake, Client hello (1):

* SSLv3, TLS handshake, Server hello (2):

* SSLv3, TLS handshake, CERT (11):

* SSLv3, TLS handshake, Server key exchange (12):

* SSLv3, TLS handshake, Server finished (14):

* SSLv3, TLS handshake, Client key exchange (16):

* SSLv3, TLS change cipher, Client hello (1):

* SSLv3, TLS handshake, Finished (20):

* SSLv3, TLS change cipher, Client hello (1):

* SSLv3, TLS handshake, Finished (20):

* SSL connection using TLSv1.2 /

* Server certificate:

* subject: OU=Domain Control Validated; CN=shareapp.def.co.in

* start date: 2016-01-15 14:07:38 GMT

* expire date: 2017-01-15 13:52:39 GMT

* subjectAltName: abc.def.co.in matched

* issuer: C=; ST=; L=; O=; OU=; CN=Go Daddy Secure Certificate Authority - G2

* SSL certificate verify ok.

* Server auth using Basic with user 'kumar'

> GET / HTTP/1.1

> Authorization: Basic a3VtYXI6UmFodWwxMjM=

> User-Agent: curl/7.37.1

> Host: abc.def.co.in

> Accept: */*

>

< HTTP/1.1 401 Unauthorized

< Cache-Control: private

< Content-Length: 16

< Content-Type: text/plain; charset=utf-8

< SPRequestGuid:

< request-id:

< X-FRAME-OPTIONS: SAMEORIGIN

< SPRequestDuration: 5

< SPIisLatency: 9

< X-AspNet-Version: 4.0.30319

< X-Powered-By: ASP.NET

< WWW-Authenticate: NTLM

< WWW-Authenticate: Negotiate

< X-Content-Type-Options: nosniff

< X-MS-InvokeApp: 1; RequireReadOnly

< MicrosoftSharePointTeamServices: xx.x.x.xxx

< Date: Mon, 25 Apr 2016 19:10:11 GMT

<

* Connection #0 to host abc.def.co.in left intact

401 UNAUTHORIZED

ubuntu@ubuntu:~/pythonSample$
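
The response headers above are the important clue: the server (a SharePoint site, per the MicrosoftSharePointTeamServices header) answers with WWW-Authenticate: NTLM and WWW-Authenticate: Negotiate, not Basic. curl's -u option sends Basic credentials by default, which this server rejects, so the 401 is expected. Assuming the site really does accept NTLM as advertised, retrying with curl's --ntlm option (curl --ntlm -u kumar 'https://abc.def.co.in') or with --negotiate may get further; this is a guess based on the headers, not something confirmed against this particular server.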

docScraper.py

import requests
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 import; not actually used below

username = 'kumar@def.co.in'
password = 'Rahul123'
url = 'https://abc.def.co.in/'

# Passing a plain (username, password) tuple makes requests send HTTP Basic authentication
r = requests.get(url, auth=(username, password))
page = r.content
print page

ubuntu@ubuntu:~/pythonSample$ python docScraper.py

401 UNAUTHORIZED

ubuntu@ubuntu:~/pythonSample$
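
The Python script fails for the same reason as curl: the auth=(username, password) tuple in requests only performs Basic authentication, while the server asks for NTLM or Negotiate. Below is a minimal sketch of an NTLM attempt, assuming the server really accepts NTLM as advertised and that the third-party requests_ntlm package is installed (pip install requests_ntlm); the username format (plain account, user@domain, or DOMAIN\user) depends on how the SharePoint farm is configured, so treat it as something to experiment with.

import requests
from requests_ntlm import HttpNtlmAuth  # third-party package: pip install requests_ntlm

username = 'kumar@def.co.in'  # may need to be 'DOMAIN\\kumar' instead, depending on the server
password = 'Rahul123'
url = 'https://abc.def.co.in/'

# HttpNtlmAuth performs the NTLM challenge/response handshake requested by the
# WWW-Authenticate: NTLM header, instead of sending a single Basic Authorization header
r = requests.get(url, auth=HttpNtlmAuth(username, password))
print r.status_code
print r.content

If the server only accepts Negotiate (Kerberos), a package such as requests_kerberos would be the equivalent experiment; again, this is a suggestion based on the WWW-Authenticate headers above, not a verified fix for this particular site.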
