An Indirect Route Around Tianyancha's Anti-Scraping: Scraping Company Information from 红盾网 (Python Crawler in Practice)

Latest recommended article published on 2024-06-07 09:46:02

Pinned · Author: 非Fan的维森


Category: Python爬虫 (Python crawlers) · Tags: python, crawler

Article link: https://blog.csdn.net/shaomingmin/article/details/106029119


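Here is the code for scraping company information from 红盾网 (Hongdun). When I have time I will look into how to scrape company information from Tianyancha (天眼查) itself; updates to follow.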

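import time

import requests
from lxml import etree

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3730.400 QQBrowser/10.5.3805.400'}

# Opened in append mode so rows from earlier runs are kept.
# Fields are tab-separated even though the file is named .csv.
f = open("company_info.csv", "a", encoding="utf-8")


def first_text(node, path):
    """Return the first text node matched by path, or '' if there is none."""
    found = node.xpath(path)
    return found[0] if found else ""


def paser_detail(url):
    response = requests.get(url=url, headers=headers)
    time.sleep(1)  # pause between requests
    items = etree.HTML(response.text)
    # Region: as in the original loop, the last matching link wins
    title = ""
    for titl in items.xpath('//a[@class="name"]'):
        title = first_text(titl, './text()')
    lis = items.xpath('//*[@id="list-container"]/ul/li')
    for li in lis:
        mingcheng = first_text(li, './div/a/text()')  # company name
        daima = first_text(li, './div/p[1]/a/span[1]/text()')  # code
        person = first_text(li, './div/p[1]/a/span[2]/text()')  # legal representative
        address = first_text(li, './div/p[2]/a/span/text()')  # address
        f.write(title + "\t" + mingcheng + "\t" + daima + "\t" + person + "\t" + address + "\n")
        # The same record as a dict (currently unused)
        collection = {
            'region': title,
            'code': daima,
            'legal_representative': person,
            'address': address
        }
        print(mingcheng, title, daima, person, address)


if __name__ == '__main__':
    try:
        for i in range(51, 1000):  # change the start page here
            print("Page " + str(i))
            paser_detail("https://www.ubaike.cn/class_204/" + str(i) + ".html")
    finally:
        f.close()  # flush and close the output file even if a page fails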
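The script above joins fields with tabs by hand, so a tab or line break inside a company name or address would break the row layout. One possible alternative, sketched here with only the standard library, is to let `csv.writer` handle the quoting. The `FIELDS` header, the `write_rows` helper, and the sample rows are all hypothetical and not part of the original script; the sketch writes to an in-memory buffer instead of `company_info.csv`.

```python
import csv
import io

# Hypothetical header row; the original script writes no header.
FIELDS = ["region", "name", "code", "legal_representative", "address"]


def write_rows(stream, rows, write_header=True):
    """Write records as tab-separated rows, quoting fields where needed."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    if write_header:
        writer.writerow(FIELDS)
    for row in rows:
        writer.writerow(row)
    return stream


# Made-up sample records; the second one contains a tab and a line break.
rows = [
    ["Region A", "Example Co.", "000000000000000000", "Zhang San", "No. 1 Example Road"],
    ["Region A", "Tab\tCo.", "111111111111111111", "Li Si", "Line one\nline two"],
]

buffer = write_rows(io.StringIO(), rows)

# Reading the text back with the same delimiter recovers the original fields.
parsed = list(csv.reader(io.StringIO(buffer.getvalue()), delimiter="\t"))
```

To use it with a real file, the buffer could be replaced by `open("company_info.csv", "a", encoding="utf-8", newline="")`, passing `write_header=False` when appending to an existing file.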
