How do you find child elements with BeautifulSoup? Scraping event data from Huodongxing (活动行) pages

伟大的python程序员

Category: python   Tags: python, beautifulsoup

Article link: https://blog.csdn.net/zhuchuana/article/details/84564794

Huodongxing event listing page: http://www.huodongxing.com/events?orderby=o&city=%E5%85%A8%E9%83%A8&page=1
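The `city=%E5%85%A8%E9%83%A8` part of the listing URL above is the UTF-8 percent-encoding of 全部 ("all"), and `page` selects the page of results. Rather than pasting encoded strings together, the query can be built from plain values. The sketch below uses only the standard library; the helper name `listing_url` is hypothetical, and the meaning of `orderby=o` is not documented in the source, so it is simply passed through unchanged.

```python
from urllib.parse import urlencode

BASE = "http://www.huodongxing.com/events"

def listing_url(page, city="全部", orderby="o"):
    # urlencode percent-encodes the Chinese city name as UTF-8,
    # reproducing the city=%E5%85%A8%E9%83%A8 part of the original URL
    query = urlencode({"orderby": orderby, "city": city, "page": page})
    return BASE + "?" + query

print(listing_url(1))
# → http://www.huodongxing.com/events?orderby=o&city=%E5%85%A8%E9%83%A8&page=1
```

With `requests`, passing the same dict as `params=` achieves the same encoding without building the string by hand.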

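import requests
import time
from bs4 import BeautifulSoup
page = 1
def getid():
    global page
    pages = requests.get('http://www.huodongxing.com/events?orderby=o&city=%E5%85%A8%E9%83%A8&page=' + str(page), timeout=10)
    page += 1  # the next call fetches the following page
    soup = BeautifulSoup(pages.text, 'html.parser')
    res = soup.find_all("div", class_="search-tab-content-item-mesh")  # every div carrying this class (one per event card)
    for item in res:  # loop over the result set
        link = item.find('a')  # first <a> anywhere inside this div, not only among its direct children
        if link is not None and link.has_attr('href'):
            print(link['href'])  # print the <a> tag's href value
    time.sleep(5)  # wait 5 seconds before any further request
getid()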
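The title question, how to find child elements, comes down to scope: `find()` and `find_all()` search all descendants of a tag, while `recursive=False` (or the CSS `>` combinator in `select()`) restricts the search to direct children. The sketch below runs offline against hypothetical markup loosely modelled on the event cards; it assumes BeautifulSoup 4 with its default CSS selector support.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the listing page's event cards
HTML = """
<div class="search-tab-content-item-mesh">
  <a href="/event/1">Event one</a>
  <p><a href="/org/9">Organizer</a></p>
</div>
<div class="search-tab-content-item-mesh extra">
  <p><a href="/org/7">Organizer only</a></p>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
# class_ matches any one of a tag's classes, so the second div matches too
cards = soup.find_all("div", class_="search-tab-content-item-mesh")

for card in cards:
    # find() searches all descendants and returns the first match (or None)
    first_any = card.find("a")
    # recursive=False restricts the search to direct children only
    first_child = card.find("a", recursive=False)
    # find_all() returns every matching descendant as a list
    all_links = [a["href"] for a in card.find_all("a")]
    print(first_any["href"], first_child["href"] if first_child else None, all_links)

# CSS selector alternative: ">" also means "direct child"
direct = [a["href"] for a in soup.select("div.search-tab-content-item-mesh > a")]
print(direct)
```

For the second card, `find("a")` still returns the nested organizer link, while `find("a", recursive=False)` returns `None`. That difference matters when the first `<a>` in a card is not guaranteed to be the detail-page link.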
############################### Version 2: loop over all listing pages and scrape each event's detail page ###############################
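import time
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup
BASE = 'http://www.huodongxing.com'
LIST_URL = BASE + '/events?orderby=o&city=%E5%85%A8%E9%83%A8&page='
def getid():
    page = 1
    while True:
        pages = requests.get(LIST_URL + str(page), timeout=10)
        soup = BeautifulSoup(pages.text, 'html.parser')
        res = soup.find_all("div", class_="search-tab-content-item-mesh")
        if not res:  # no event cards on this page: stop instead of looping forever
            break
        for i, item in enumerate(res):
            link = item.find('a')  # the first <a> inside the card links to the detail page
            if link is None or not link.has_attr('href'):
                continue
            print('Scraping page ' + str(page) + ', item ' + str(i))
            page2 = requests.get(urljoin(BASE, link['href']), timeout=10)
            soup2 = BeautifulSoup(page2.text, 'html.parser')
            # Title
            title_tag = soup2.find('title')
            title = title_tag.get_text(strip=True) if title_tag else None
            # Promotional image: the <img> inside the div with classes "jumbotron media"
            img = soup2.select_one('div.jumbotron.media img')
            images = img.get('src') if img else None
            # Summary: the original re-read <title> here; using <meta name="description"> is an assumption about the page
            meta = soup2.find('meta', attrs={'name': 'description'})
            summary = meta.get('content') if meta else None
            # Content: the event description block
            content = soup2.find('div', id="event_desc_page")
            # Store the data in the database here (left unimplemented in the original, which imported pymysql for it)
            time.sleep(1)  # throttle detail-page requests
        page += 1
        time.sleep(5)  # pause between listing pages
getid()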
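Each detail page may lack one of the expected nodes, and chaining `.find(...).find(...)` then raises `AttributeError` on `None`. The sketch below pulls the same fields as the script (title, image, description block) but returns `None` for anything missing, and resolves relative `href` values with `urljoin`. The markup, the `parse_detail` name and the image URL are hypothetical; only the selectors come from the original code.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE = "http://www.huodongxing.com"

# Hypothetical detail-page markup using the selectors from the script above
DETAIL_HTML = """
<html><head><title>Sample event</title></head>
<body>
  <div class="jumbotron media"><img src="https://img.example/poster.jpg"></div>
  <div id="event_desc_page"><p>Line one</p><p>Line two</p></div>
</body></html>
"""

def parse_detail(html):
    """Return the scraped fields, using None when a node is missing."""
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.find("title")
    img = soup.select_one("div.jumbotron.media img")
    desc = soup.find("div", id="event_desc_page")
    return {
        "title": title_tag.get_text(strip=True) if title_tag else None,
        "image": img.get("src") if img else None,
        # join the description's text nodes with newlines, trimming whitespace
        "content": desc.get_text("\n", strip=True) if desc else None,
    }

print(urljoin(BASE, "/event/1234"))   # absolute detail-page URL
print(parse_detail(DETAIL_HTML))
print(parse_detail("<html></html>"))  # every field is None
```

Keeping parsing in a pure function that takes HTML text also makes it testable without hitting the live site.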

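The "store the data in the database" step is left empty in the original, which imported `pymysql` for it. As a stand-in that runs without a MySQL server, the sketch below uses the standard-library `sqlite3` module with an in-memory database; the `events` table, its columns and `save_event` are hypothetical. The key point carries over to pymysql: use a parameterized query rather than string concatenation.

```python
import sqlite3

# sqlite3 stands in for the MySQL database the original intended to use;
# table name and columns are hypothetical
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (url TEXT PRIMARY KEY, title TEXT, image TEXT, content TEXT)"
)

def save_event(conn, url, fields):
    # Parameterized query: the driver escapes values, so scraped text cannot
    # break the SQL. With pymysql the placeholder is %s instead of ?.
    # INSERT OR REPLACE (SQLite syntax) keeps re-runs from duplicating rows.
    conn.execute(
        "INSERT OR REPLACE INTO events (url, title, image, content) VALUES (?, ?, ?, ?)",
        (url, fields.get("title"), fields.get("image"), fields.get("content")),
    )
    conn.commit()

save_event(conn, "http://www.huodongxing.com/event/1234",
           {"title": "Sample event", "image": None, "content": "Line one"})
rows = conn.execute("SELECT url, title FROM events").fetchall()
print(rows)  # → [('http://www.huodongxing.com/event/1234', 'Sample event')]
```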