java企查查爬: Scraping the Qichacha Hot-Search List

认真的赵先森

Published 2021-02-26 16:18:00


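Copyright notice: this is an original article by the author, licensed under CC 4.0 BY-SA. Reposts must include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_35612315/article/details/114741418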

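This is my first write-up, so the code style may be rough and the explanations clumsy; please bear with me.

On to the topic: I was handed a last-minute task to scrape Qichacha's trending search terms and keep them updated on a schedule. The page to scrape is https://www.qichacha.com/cms_topsearch.

[Image: screenshot of the page to scrape]

I had written parsing code for this page before, but so much time has passed that I can no longer find it, which is a bit painful. Fortunately, the page has no anti-scraping measures, so without further ado, here is the code.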

import time

import requests
from lxml import etree

# NOTE: `headers` (a dict of request headers) and `conn` (an open database
# connection) are not shown in the original post and are assumed to be
# defined elsewhere. The %s placeholders assume a MySQL driver such as pymysql.


def qcc_reci():
    # Wrapped in a function named to match the job scheduled later in the post.
    url = 'https://www.qichacha.com/cms_topsearch'
    ht = requests.get(url=url, headers=headers)
    et = etree.HTML(ht.text)
    uls = et.xpath('//ul[@class="list-group topsearch-list"][1]/a')

    # One list holds all three rankings back to back; the slice boundaries
    # are the ones used in the original code.
    sections = [
        ('今日热搜', uls[:51]),     # today's hot searches
        ('一周热搜', uls[51:101]),  # this week's hot searches
        ('一月热搜', uls[101:]),    # this month's hot searches
    ]

    sql = ('insert into top_search(type_,company,search_num,company_url,sj_time) '
           'values(%s,%s,%s,%s,%s)')
    cursor = conn.cursor()
    for type_, links in sections:
        for ul in links:
            search_num = ul.xpath('./span[last()]/text()')[0]
            company = ul.xpath('./span[last()-1]/text()')[0]
            company_url = 'https://www.qichacha.com' + ul.xpath('./@href')[0]
            date = time.strftime('%Y-%m-%d %H:%M:%S')
            print(company, search_num, company_url, date)
            # Pass the values as parameters rather than formatting them into the SQL text.
            cursor.execute(sql, (type_, company, search_num, company_url, date))
        conn.commit()
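The insert step can be tried without a MySQL server. The following is a minimal sketch that uses the standard library's sqlite3 as a stand-in for the author's database, so it is not the author's setup. The helper name `save_rows`, the column types, and the sample row are all assumptions made for illustration. Note that sqlite3 uses `?` placeholders, whereas MySQL drivers such as pymysql use `%s`.

```python
import sqlite3


def save_rows(conn, type_, rows, timestamp):
    """Insert (company, search_num, company_url) tuples with bound parameters."""
    sql = ("insert into top_search(type_, company, search_num, company_url, sj_time) "
           "values (?, ?, ?, ?, ?)")
    params = [(type_, company, search_num, company_url, timestamp)
              for company, search_num, company_url in rows]
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(sql, params)
    return len(params)


# Hypothetical table layout; the real column types are not given in the post.
conn = sqlite3.connect(":memory:")
conn.execute("create table top_search("
             "type_ text, company text, search_num text, company_url text, sj_time text)")

# A made-up company name containing quotes, which string-formatted SQL could mishandle.
sample = [("O'Example \"Quoted\" Co.", "12345", "https://www.qichacha.com/firm_a.html")]
saved = save_rows(conn, "今日热搜", sample, "2021-02-26 16:18:00")
print(saved, conn.execute("select company from top_search").fetchall())
```

Binding the values as parameters leaves the quoting to the driver, so a company name that contains quote characters is stored unchanged.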

The parsing is fairly simple; as a beginner, I mainly wanted to get familiar with the workflow.
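To see what `span[last()]` and `span[last()-1]` select, here is a small offline sketch. It deliberately swaps lxml for the standard library's `xml.etree.ElementTree` so that it runs without extra packages; ElementTree supports only a subset of XPath, so the code reads `.text` and `.get("href")` instead of using `text()` and `@href`. The markup in `SAMPLE` is a hypothetical fragment, and the real page's structure is only assumed from the XPath expressions in the post.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: assumed from the XPath in the post, not copied from the site.
SAMPLE = """
<div>
  <ul class="list-group topsearch-list">
    <a href="/firm_a.html"><span>1</span><span>Example Co. A</span><span>12345</span></a>
    <a href="/firm_b.html"><span>2</span><span>Example Co. B</span><span>678</span></a>
  </ul>
</div>
"""

BASE_URL = "https://www.qichacha.com"


def parse_rows(markup):
    """Return (company, search_num, company_url) for each link in the list."""
    root = ET.fromstring(markup)
    rows = []
    for a in root.findall('.//ul[@class="list-group topsearch-list"]/a'):
        search_num = a.find("./span[last()]").text   # last span: search count
        company = a.find("./span[last()-1]").text    # second-to-last span: company name
        rows.append((company, search_num, BASE_URL + a.get("href")))
    return rows


for row in parse_rows(SAMPLE):
    print(row)
```

Counting from the end with `last()` keeps the extraction working even if extra leading spans (such as a rank badge) appear in some rows.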

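The next step is to turn it into a scheduled job. I used the Python library schedule, which is a third-party package (installed with pip install schedule), not part of the standard library. Its basic usage looks like this, where job is the function to run: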

schedule.every(1).minutes.do(job)               # run the job every minute
schedule.every().hour.do(job)                   # run the job every hour
schedule.every().day.at("10:30").do(job)        # run the job every day at 10:30
schedule.every(5).to(10).days.do(job)           # run the job every 5 to 10 days
schedule.every().monday.do(job)                 # run the job every Monday at this time of day
schedule.every().wednesday.at("13:15").do(job)  # run the job every Wednesday at 13:15

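import time

import schedule


def seach():
    # Register the scraper to run every 20 seconds, then poll once per second.
    schedule.every(20).seconds.do(qcc_reci)
    while True:
        schedule.run_pending()
        time.sleep(1)


seach()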
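One thing to watch with a loop like this: if the scheduled function raises (for example, on a network error), the exception propagates out of `run_pending()` and ends the loop. A possible guard, sketched here with only the standard library, is a decorator that logs the failure and lets the loop continue. The names `catch_exceptions` and `flaky_job` are hypothetical and not from the original post.

```python
import functools
import logging


def catch_exceptions(func):
    """Wrap a job so that an exception is logged instead of propagating."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Log the full traceback; the polling loop can then carry on.
            logging.exception("job %s failed", func.__name__)
            return None
    return wrapper


@catch_exceptions
def flaky_job():
    # Hypothetical stand-in for a scraping job that hits an error.
    raise RuntimeError("simulated network error")


result = flaky_job()  # logs the traceback and returns None instead of raising
```

Under this assumption, the scraper would be decorated the same way before being passed to `do()`.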

run_pending: runs every job that is currently due.
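To make the polling pattern concrete, here is a simplified stand-in written with only the standard library. It is not the schedule library's implementation: `TinyScheduler`, `every_seconds`, and the injectable `clock` are hypothetical names, and only fixed-interval jobs are modelled. The idea it illustrates is that each call to `run_pending` compares the current time with each job's next due time and runs only the jobs that are due.

```python
import time


class TinyScheduler:
    """Toy fixed-interval scheduler illustrating the run_pending polling idea."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the behaviour can be checked without waiting
        self._jobs = []      # each entry: [interval_seconds, next_run_time, func]

    def every_seconds(self, interval, func):
        # The first run is due one full interval after registration.
        self._jobs.append([interval, self._clock() + interval, func])

    def run_pending(self):
        """Run every job whose due time has passed; return how many ran."""
        now = self._clock()
        ran = 0
        for job in self._jobs:
            interval, next_run, func = job
            if now >= next_run:
                func()
                job[1] = now + interval  # schedule the next run
                ran += 1
        return ran


# Usage with a fake clock, so no real waiting is needed.
fake_now = [0.0]
calls = []
scheduler = TinyScheduler(clock=lambda: fake_now[0])
scheduler.every_seconds(20, lambda: calls.append(fake_now[0]))

for moment in (0.0, 19.0, 20.0, 30.0, 40.0):
    fake_now[0] = moment
    scheduler.run_pending()
```

Calling `run_pending` more often than the job interval, as the one-second sleep in the loop above does, costs little and keeps the delay between a job becoming due and actually running small.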

This is my first post on Jianshu, so there is a lot of formatting I have not figured out yet.
