Python server/client access denied: a scraper in Python returns "Access Denied"

I'm trying to write a scraper in Python to get some information from a page, such as the titles of the offers that appear on this page:

https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585

So far I am using this code:

import bs4
import requests

def extract_source(url):
    # Download the page and return its HTML as text.
    source = requests.get(url).text
    return source

def extract_data(source):
    # Parse the HTML and print every <title> element.
    soup = bs4.BeautifulSoup(source, 'html.parser')
    names = soup.find_all('title')
    for i in names:
        print(i)

extract_data(extract_source('https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585'))

But when I execute this code, it gives me an error:

Access Denied

What can I do to solve this?

Solution

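As mentioned in the comments, you need to specify an accepted User-Agent and pass it in the request headers. By default, requests identifies itself as python-requests/<version>, which this site apparently rejects: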

def extract_source(url):
    # Send a browser-like User-Agent instead of the default one used by requests.
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'}
    source = requests.get(url, headers=headers).text
    return source
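To see what the header change actually does, here is a minimal sketch that builds the request without sending it and compares the User-Agent that requests would use by default with the browser-like one from the answer. The helper name prepare_get is hypothetical and only for illustration. Nothing here touches the network, so it does not show how the site itself responds.

```python
import requests

# The browser User-Agent string from the answer above. Assumption: any
# mainstream browser string would serve the same purpose.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) "
              "Gecko/20100101 Firefox/50.0")


def prepare_get(url, user_agent=None):
    """Build (but do not send) a GET request so its headers can be inspected.

    prepare_get is a hypothetical helper, not part of requests.
    """
    session = requests.Session()
    headers = {"User-Agent": user_agent} if user_agent else {}
    request = requests.Request("GET", url, headers=headers)
    # prepare_request merges the session's default headers with ours;
    # headers passed explicitly take precedence over the defaults.
    return session.prepare_request(request)


url = "https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585"
default_request = prepare_get(url)
browser_request = prepare_get(url, BROWSER_UA)

# The default value starts with "python-requests/" followed by the version.
print(default_request.headers["User-Agent"])
print(browser_request.headers["User-Agent"])
```

When the request is actually sent, checking response.status_code (or calling response.raise_for_status()) before parsing makes a refusal visible as an HTTP error instead of an "Access Denied" page being fed to BeautifulSoup. Whether a given site accepts a particular User-Agent is up to that site, so treat the string above as an example and not a guarantee.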
