1. Create the Scrapy project
After the project is created, the first run comes back with a 403. Rewrite the start URL and add the request headers and cookie values, and it works.
The cookie values in the browser include both Baidu's and Dianping's, so pick out the Dianping ones and send those along.
Decoding the custom font requires installing the fontTools package:
font = TTFont(r"C:\Users\liangxue\Downloads/"+fonts)
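A minimal sketch of inspecting the downloaded font with fontTools, assuming the .woff file has already been saved locally and `fonts` holds its file name (the path is the author's own; the output file name is just an example):

from fontTools.ttLib import TTFont

font = TTFont(r"C:\Users\liangxue\Downloads/" + fonts)
font.saveXML("dianping_font.xml")            # dump the font tables to XML for manual inspection
glyph_names = font.getGlyphOrder()           # list of glyph names in the font's glyph order
code_to_glyph = font["cmap"].getBestCmap()   # {unicode code point: glyph name}, used to map obfuscated characters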
Error: Redirecting (302)
The first page's data comes back fine, but the second page is blocked, and opening it in the browser is blocked as well.
So send the requests out through proxy servers instead; where you get the proxy list is up to you.
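By default Scrapy's RedirectMiddleware follows that 302 silently, which is why the log only shows "Redirecting (302)". If you want the blocked response to reach the spider so you can see or handle it, one option (not from the original post) is:

# comments.py -- let 302 responses reach the callback instead of being followed
class CommentsSpider(scrapy.Spider):
    handle_httpstatus_list = [302]

# or, globally in settings.py, stop following redirects altogether
REDIRECT_ENABLED = False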
proxy_list = [
    {"https": "IP:port"},
    {"https": "IP:port"},
    {"https": "IP:port"},
    {"https": "IP:port"},
    {"https": "IP:port"},
    {"https": "IP:port"},
]
PROXYES = settings.proxy_list
random_proxy = random.choice(PROXYES)  # pick one proxy at random from the list in settings
"Proxy-Authorization": random_proxy,
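Passing the proxy as a Proxy-Authorization header only works with some proxy providers; the more common Scrapy route is a downloader middleware that puts the proxy into request.meta. A minimal sketch of that alternative (the module, class name, and priority are my assumptions, not the original post's code):

# middlewares.py -- random-proxy downloader middleware (sketch)
import random
from comment import settings

class RandomProxyMiddleware:
    def process_request(self, request, spider):
        proxy = random.choice(settings.proxy_list)           # e.g. {"https": "IP:port"}
        request.meta["proxy"] = "https://" + proxy["https"]  # Scrapy expects a proxy URL string

# settings.py -- enable the middleware
DOWNLOADER_MIDDLEWARES = {
    "comment.middlewares.RandomProxyMiddleware": 543,
}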
Shutting down and restarting once also gets things working again.
Without further ado, here's the code:
Add the cookies to settings.py:
cookies = {
    "s_ViewType": "10",
    "_lxsdk_cuid": "use your own cookie value",
    "_lxsdk": "use your own cookie value",
    "_hc.v": "use your own cookie value",
}
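The spider excerpt below stops before the request is actually built, so here is a hedged sketch of how these cookies (together with the headers defined on the spider) would typically be attached when rewriting the start URL; the method body is my assumption, not the author's exact code:

# inside CommentsSpider (sketch)
def start_requests(self):
    url = 'http://www.dianping.com/sanhe/ch10'
    yield Request(url,
                  headers=self.headers,
                  cookies=settings.cookies,
                  callback=self.parse)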
comments.py
# -*- coding: utf-8 -*-
import random
import scrapy
from fontTools.ttLib import TTFont
from lxml import html
from scrapy import Request
from comment import settings
from fake_useragent import UserAgent
from comment.items import CommentItem
user_agent = UserAgent().random
# get the etree module from lxml.html
etree = html.etree
class CommentsSpider(scrapy.Spider):
    name = 'comments'
    allowed_domains = ['www.dianping.com']
    # start_urls = ['http://www.dianping.com/sanhe/ch10']
    PROXYES = settings.proxy_list
    random_proxy = random.choice(PROXYES)
    # print("random_proxy=============", random_proxy)
    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Cache-Control": "max-age=0",
        "Connection": "keep-alive",