[Crawler Practice] Xiachufang (下厨房)

Latest recommended article published 2024-05-03 22:55:13

於陵樺暉

Category: Crawlers

Article link: https://blog.csdn.net/wyh33200/article/details/104737833

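import requests
from bs4 import BeautifulSoup
from openpyxl import Workbook

# Create the workbook and write the header row:
# dish name, ingredients, step number, step details.
wb = Workbook()
ws = wb.active
list_g = ['菜名', '食材', '步骤', '详细步骤']
ws.append(list_g)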


# Send a browser-like User-Agent with every request.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'
}

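for page in range(1, 41):
    # Fetch one page of the explore listing.
    url = 'http://www.xiachufang.com/explore/?page={}'.format(page)
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    # Named recipe_list to avoid shadowing the built-in list.
    recipe_list = soup.find(class_='normal-recipe-list')
    if recipe_list is None:
        # Skip pages without the expected listing (layout change or blocked request).
        continue
    dishes = recipe_list.find_all('li')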
    for dish in dishes:
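        title = dish.find('p', class_='name')        # dish name
        ingredients = dish.find('p', class_='ing ellipsis')      # ingredients
        link = dish.find('a')
        if title is None or ingredients is None or link is None:
            # Skip list items that are not complete recipe entries.
            continue
        # Strip newlines, carriage returns and spaces once and reuse the results.
        title_text = title.text.replace('\n', '').replace('\r', '').replace(' ', '')
        ingredients_text = ingredients.text.replace('\n', '').replace('\r', '').replace(' ', '')
        href = link['href']       # relative link to the recipe's step page
        url_new = 'http://www.xiachufang.com{}'.format(href)
        response_step = requests.get(url_new, headers=headers)
        soup_step = BeautifulSoup(response_step.text, 'html.parser')
        steps = soup_step.find_all(itemprop='recipeInstructions')
        # enumerate numbers the steps from 1, so the logged number matches the row just written.
        for p, step in enumerate(steps, start=1):
            step_text = step.text.replace('\n', '').replace('\r', '').replace(' ', '')
            ws.append([title_text, ingredients_text, p, step_text])
            # Progress message: "Writing {title}, step {p}".
            print('写入{title}中，步骤{p}'.format(title=title_text, p=p))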
# Save everything to "热门菜谱.xlsx" ("Popular Recipes").
wb.save('热门菜谱.xlsx')
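
The script repeats the same chain of `replace` calls for the dish name, the ingredients and each step. As a small step toward more modular code, that chain could be pulled into one helper. This is only a sketch: the name `clean_text` is hypothetical and does not appear in the original script.

```python
def clean_text(text):
    """Remove newlines, carriage returns and spaces from scraped text.

    Hypothetical helper: it mirrors the chained replace calls in the script.
    """
    for ch in ("\n", "\r", " "):
        text = text.replace(ch, "")
    return text
```

With a helper like this, a row could be built as `[clean_text(title.text), clean_text(ingredients.text), p, clean_text(step.text)]`.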

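This is a crawler practice exercise, and also a chance to get familiar with openpyxl.
Next time I hope to bring in the os module and scrape each recipe in full: one folder per recipe containing two files, an image file named after the steps and a txt document holding the ingredients.
Ideally the whole script would also be split into modules.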
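
The planned folder layout could look roughly like the sketch below. Everything in it is an assumption made for illustration: the function name `save_recipe`, the file names `ingredients.txt` and `step_N.jpg`, and the idea that step images arrive as raw bytes keyed by step number. Downloading the images is not shown.

```python
import os


def save_recipe(base_dir, title, ingredients, step_images):
    """Write one recipe into its own folder under base_dir.

    step_images maps a step number to raw image bytes (hypothetical input).
    Returns the path of the recipe folder.
    """
    # Replace path separators so the title is safe to use as a folder name.
    folder_name = title.replace("/", "_").replace("\\", "_")
    recipe_dir = os.path.join(base_dir, folder_name)
    os.makedirs(recipe_dir, exist_ok=True)

    # One text file holding the ingredients.
    txt_path = os.path.join(recipe_dir, "ingredients.txt")
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write(ingredients)

    # One image file per step, named after the step number.
    for step_no, image_bytes in step_images.items():
        image_path = os.path.join(recipe_dir, "step_{}.jpg".format(step_no))
        with open(image_path, "wb") as f:
            f.write(image_bytes)

    return recipe_dir
```

Keeping the file writing in its own function would also fit the goal of splitting the script into modules, since the crawling loop would only need to call it once per dish.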
