Batch-collecting search result URLs with Python

lebron2016

Published 2021-04-20 21:17:59

Category: Python crawlers. Tags: python

Article link: https://blog.csdn.net/weixin_44137529/article/details/115919321

This is my first post on CSDN. It records what I am learning as a beginner.

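Collecting the URLs takes three steps:

1. Analyze the network request and reproduce it in Python
2. Analyze the page source and locate the information to collect
3. Save the collected information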

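import requests
from bs4 import BeautifulSoup

def search(keyword):
    # Send a browser-like User-Agent so the request looks like it came from Chrome.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36'
    }
    # Pass the query through params so requests URL-encodes the keyword.
    params = {'q': keyword, 'num': 50, 'sourceid': 'chrome', 'ie': 'UTF-8'}
    html = requests.get('https://www.google.co.kr/search', params=params, headers=headers, timeout=10).text
    soup = BeautifulSoup(html, 'html.parser')
    # The with block closes news.txt even if an error occurs while parsing.
    with open('news.txt', 'w', encoding='utf-8') as f:
        # Each result was wrapped in a div with class "yuRUbf" when this was written.
        for i in soup.find_all('div', attrs={'class': 'yuRUbf'}):
            a = i.find('a', href=True)
            if a is None:
                continue
            # Write the link as found. Slicing off 7 characters and prepending
            # "http://" only works for http:// links and mangles https:// ones.
            f.write(a['href'] + '\n')
    print("Collection finished")

search('inurl:/index/login/login')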
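
Depending on how Google serves the page, a result link may be a direct URL or a redirect of the form /url?q=<target>&..., so it can help to normalize each href before saving it. The following is only a minimal sketch using the standard library. `clean_href` is a hypothetical helper that is not part of the original script, and the redirect format is an assumption about the page markup.

```python
from urllib.parse import urlparse, parse_qs

def clean_href(href):
    """Return an absolute http(s) URL for a result link, or None if there is none."""
    if href.startswith('/url?'):
        # Redirect-style link: the real target is carried in the "q" parameter.
        target = parse_qs(urlparse(href).query).get('q', [''])[0]
    else:
        target = href
    # Keep only absolute http/https links; drop relative or empty ones.
    if urlparse(target).scheme in ('http', 'https'):
        return target
    return None
```

Inside the loop, the value returned by `clean_href(a['href'])` could be written to the file whenever it is not None.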

Run result

When the script finishes it prints the completion message. Then open news.txt to check the saved results.
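
The saved file can also be checked from Python. Below is a small sketch, assuming one URL per line as the script writes them. `load_urls` is a hypothetical helper, not something from the original post. It skips blank lines and drops repeated URLs while keeping the original order.

```python
def load_urls(path='news.txt'):
    """Read one URL per line, skipping blanks and keeping the first occurrence of each."""
    seen = set()
    urls = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            url = line.strip()
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls
```

Calling `load_urls()` after a run returns the collected URLs as a list, so `len(load_urls())` gives a quick count of how many distinct links were saved.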
