爬虫案例:类和对象思维获取孔夫子旧书网信息

本文详细介绍了如何运用类和对象的概念,结合Python编程技能,实现对孔夫子旧书网站的数据抓取,旨在提供一种学习爬虫技术的方法。
摘要由CSDN通过智能技术生成

本案例只做学习,不做他用!

import os
import re
import requests
import random
from lxml import etree
from bs4 import BeautifulSoup
import threading


# html提取并存储类
class Get_html:

    def __init__(self):
        self.headers=self.get_headers()
        self.lst=[]

    #headers获取
    def get_headers(self):
        ua = [
            "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
            "Mozilla/5.0 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)",
            "Baiduspider-image+(+http://www.baidu.com/search/spider.htm)",
            "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 YisouSpider/5.0 Safari/537.36",
            "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
            "Mozilla/5.0 (compatible; Googlebot-Image/1.0; +http://www.google.com/bot.html)",
            "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)",
            "Sogou News Spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)",
            "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0);",
            "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
            "Sosospider+(+http://help.soso.com/webspider.htm)",
            "Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)"
        ]
        headers = {
   'User-Agent': random.choice(ua),'Cookie':'kfz_uuid=4f06a81d8a81f5b256c7a2e6a89bdab8; PHPSESSID=51qaqh9hvd8e7ni2opq9sebop3; shoppingCartSessionId=856640faec4cf06b5ee437ad09fac1a3; kfz-tid=fe09c868b427ce1d411020c95c147719; TINGYUN_DATA=%7B%22id%22%3A%22XMf0fX2k_0w%23nUhCMQN2SSk%22%2C%22n%22%3A%22WebAction%2FURI%2Fproduct%252Fbrowse%252Fpc%22%2C%22tid%22%3A%222d44bda41e0e86a%22%2C%22q%22%3A0%2C%22a%22%3A1732%7D; Hm_lvt_bca7840de7b518b3c5e6c6d73ca2662c=1638946047; Hm_lpvt_bca7840de7b518b3c5e6c6d73ca2662c=1638946047; Hm_lvt_33be6c04e0febc7531a1315c9594b136=1638946048; Hm_lpvt_33be6c04e0febc7531a1315c9594b136=1638946048; kfz_trace=4f06a81d8a81f5b256c7a2e6a89bdab8|0|16abc3df330022
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

徐浪老师

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值