Python anti-anti-crawler: how do I avoid getting blocked while scraping?

I'm scraping data and keep running into anti-crawler defenses. What are good ways to deal with this? Taking Taobao as the example:
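A common first line of defense is to not look like a script at all: send a realistic, rotating User-Agent with every request instead of one fixed string. A minimal sketch (the agent strings below are illustrative examples, not a canonical list):

```python
import random

# A small pool of realistic desktop User-Agent strings (illustrative examples).
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 '
    '(KHTML, like Gecko) Version/16.1 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0',
]

def random_headers():
    """Return request headers with a randomly chosen User-Agent."""
    return {'User-Agent': random.choice(USER_AGENTS)}

# Each call may pick a different agent, so consecutive requests look less uniform.
print(random_headers()['User-Agent'] in USER_AGENTS)  # → True
```

Pass the result to `requests.get(url, headers=random_headers())` so the fingerprint varies across requests.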

# -*- coding: utf-8 -*-

import re
import time

import requests
from xlwt import Workbook

# (The Python 2-only reload(sys).setdefaultencoding('utf-8') hack was removed;
# Python 3 handles UTF-8 output natively.)

'''
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}  # build the request headers

cook = {"Cookie": ''}  # cookies change over time -- use your own

url = "https://www.taobao.com/market/nvzhuang/dress.php?spm=a21bo.7723600.8224.2.nPpHHT"

# html = requests.get(url).content
html = requests.get(url, cookies=cook, headers=headers).content  # pass the url, cookie and headers to get()
print(html)
'''

def getHtml(url):
    proxylist = (
        '123.134.185.11',
        '115.228.107.142',
        '180.76.135.145',
        '58.218.198.61',
        '110.72.43.148',
    )
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}  # build the request headers
    cook = {}  # cookies change over time -- fill in your own
    for proxy in proxylist:
        # The original {'': proxy} has an empty scheme key, so requests
        # silently ignores the proxy; map the 'http' scheme explicitly.
        proxies = {'http': 'http://' + proxy}
        try:
            html = requests.get(url, cookies=cook, headers=headers,
                                proxies=proxies, timeout=10).text
            return html
        except requests.RequestException:
            continue  # this proxy failed; try the next one
    return ''

def changeurl(start_url, page):  # arguments: (starting url, number of pages)
    urls = []
    for i in range(1, page + 1):
        urls.append(start_url + str(i))
    return urls

start_url = "https://list.tmall.com/search_product.htm?type=pc&totalPage=100&cat=50025135&sort=d&style=g&from=sn_1_cat-qp&active=1&jumpto="
urls = changeurl(start_url, 2)

wb = Workbook()
ws = wb.add_sheet('sheet1')
ws.write(0, 0, 'pid')
ws.write(0, 1, 'price')
ws.write(0, 2, 'title')
ws.write(0, 3, 'url')
index = 1

for url in urls:
    html = getHtml(url)
    time.sleep(1)
    # NOTE: the original regex was mangled when the post was published (its
    # HTML tags were stripped out). It must capture four groups per product:
    # (pid, price, url, title). The pattern below is a placeholder guess at
    # the Tmall list markup -- restore your own pattern here.
    reForProduct = re.compile(
        r'data-id="(\d+)"[\s\S]+?price">([\d.]+)[\s\S]+?'
        r'href="([^"]+)"[\s\S]+?title="([^"]+)"')
    products = reForProduct.findall(html)
    # (The original code had a redundant outer "for pro in products" loop
    # that wrote every row len(products) times; it is removed here.)
    for (pid, price, url, title) in products:
        text = "%s\t%s\t%s\t%s" % (pid, price, title, url)
        print(text)
        ws.write(index, 0, pid)
        ws.write(index, 1, price)
        ws.write(index, 2, title)
        ws.write(index, 3, url)
        index += 1

wb.save('result.xls')

I honestly wasn't even scraping that aggressively, but the crawl still dies halfway through the data. How should I change the code so it's less likely to get blocked?
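Two likely reasons the crawl dies halfway: the fixed `time.sleep(1)` is a perfectly regular beat that rate limiters spot easily, and a single failed request kills the run because there is no retry. A sketch of a politer fetch loop with jittered delays and exponential backoff (`fetch_politely` and `flaky_get` are hypothetical names for illustration; `get` would wrap `requests.get` in real use):

```python
import random
import time

def fetch_politely(get, url, retries=3, base_delay=1.0):
    """Call get(url); on failure, wait with exponential backoff and retry.

    `get` is any callable that returns the page text or raises on a
    blocked/failed request (e.g. a thin wrapper around requests.get).
    """
    for attempt in range(retries):
        # Jittered pause so requests do not arrive on a fixed 1-second beat;
        # the delay doubles after each failure (exponential backoff).
        time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
        try:
            return get(url)
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries; surface the error to the caller

# Demo with a fake fetcher that fails twice, then succeeds.
calls = {'n': 0}
def flaky_get(url):
    calls['n'] += 1
    if calls['n'] < 3:
        raise IOError('blocked')
    return '<html>ok</html>'

print(fetch_politely(flaky_get, 'https://example.com', base_delay=0.01))
# → <html>ok</html>
```

Combined with rotating proxies and a real cookie, this kind of loop lets a long crawl survive the occasional block instead of stopping at the first failure.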
