Scraping multiple web pages in Python: using BeautifulSoup to scrape multiple pages

This post shows how to scrape data from a specific website with Python's BeautifulSoup library and collect the full result set by looping over multiple pages. The author first managed to fetch the data from the first page, then added a loop that requests the subsequent pages by passing a page parameter in the URL. The data is finally written to a CSV file with the fields species name, species author, status, and family.

I have managed to write code that scrapes the data from the first page, and now I am stuck on writing a loop in this code to scrape the next 'n' pages. Below is the code.

I would appreciate it if someone could guide/help me write the code that would scrape the data from the remaining pages.

Thanks!

from bs4 import BeautifulSoup
import requests
import csv

url = requests.get('https://wsc.nmbe.ch/search?sFamily=Salticidae&fMt=begin&sGenus=&gMt=begin&sSpecies=&sMt=begin&multiPurpose=slsid&sMulti=&mMt=contain&searchSpec=s').text
soup = BeautifulSoup(url, 'lxml')
elements = soup.find_all('div', style="border-bottom: 1px solid #C0C0C0; padding: 10px 0;")
#print(elements)

csv_file = open('wsc_scrape.csv', 'w')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['sp_name', 'species_author', 'status', 'family'])

for element in elements:
    sp_name = element.i.text.strip()
    print(sp_name)
    status = element.find('span', class_=['success label', 'error label']).text.strip()
    print(status)
    author_family = element.i.next_sibling.strip().split('|')
    species_author = author_family[0].strip()
    family = author_family[1].strip()
    print(species_author)
    print(family)
    print()
    csv_writer.writerow([sp_name, species_author, status, family])

csv_file.close()

Solution

You have to pass the page= parameter in the URL and iterate over all pages:

from bs4 import BeautifulSoup
import requests
import csv

# newline='' avoids blank rows on Windows; encoding='utf-8' handles accented author names
csv_file = open('wsc_scrape.csv', 'w', encoding='utf-8', newline='')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['sp_name', 'species_author', 'status', 'family'])

# the site paginates the results, so request pages 1..151 via the page= parameter
for i in range(151):
    url = requests.get('https://wsc.nmbe.ch/search?page={}&sFamily=Salticidae&fMt=begin&sGenus=&gMt=begin&sSpecies=&sMt=begin&multiPurpose=slsid&sMulti=&mMt=contain&searchSpec=s'.format(i+1)).text
    soup = BeautifulSoup(url, 'lxml')
    elements = soup.find_all('div', style="border-bottom: 1px solid #C0C0C0; padding: 10px 0;")

    for element in elements:
        # the species name sits in the <i> tag
        sp_name = element.i.text.strip()
        print(sp_name)
        status = element.find('span', class_=['success label', 'error label']).text.strip()
        print(status)
        # author and family follow the <i> tag, separated by '|'
        author_family = element.i.next_sibling.strip().split('|')
        species_author = author_family[0].strip()
        family = author_family[1].strip()
        print(species_author)
        print(family)
        print()
        csv_writer.writerow([sp_name, species_author, status, family])

csv_file.close()
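
Hard-coding 151 pages works for the result set at the time of the answer, but the count will drift as the catalogue grows. A minimal sketch of the same loop, assuming the page markup stays as above, keeps requesting pages until one comes back with no matching <div> elements and stops there; passing the query string through the params= argument of requests.get also keeps the long URL readable. This variant is not part of the original answer:

from bs4 import BeautifulSoup
import requests
import csv
from itertools import count

base_url = 'https://wsc.nmbe.ch/search'
params = {
    'sFamily': 'Salticidae', 'fMt': 'begin', 'sGenus': '', 'gMt': 'begin',
    'sSpecies': '', 'sMt': 'begin', 'multiPurpose': 'slsid', 'sMulti': '',
    'mMt': 'contain', 'searchSpec': 's',
}

with open('wsc_scrape.csv', 'w', encoding='utf-8', newline='') as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(['sp_name', 'species_author', 'status', 'family'])

    for page in count(1):                      # 1, 2, 3, ... until the results run out
        params['page'] = page
        html = requests.get(base_url, params=params).text
        soup = BeautifulSoup(html, 'lxml')
        elements = soup.find_all('div', style="border-bottom: 1px solid #C0C0C0; padding: 10px 0;")
        if not elements:                       # an empty page means we are past the last one
            break
        for element in elements:
            sp_name = element.i.text.strip()
            status = element.find('span', class_=['success label', 'error label']).text.strip()
            author_family = element.i.next_sibling.strip().split('|')
            csv_writer.writerow([sp_name, author_family[0].strip(), status, author_family[1].strip()])

The with block also closes the file even if a request fails partway through, which the explicit csv_file.close() in the answer does not guarantee.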
