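Shandong University Summer Project Training | Intelligent Agricultural Product Recommendation Platform | Scraping Images for Agricultural Products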

Q人生


Tags: web crawler

Article link: https://blog.csdn.net/weixin_43845888/article/details/118415799


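This article describes how to scrape images of agricultural products with Python and selenium. Images were first scraped from 摄图网 (699pic.com); after the IP was banned, the source was switched to Baidu Images. The script iterates over the agricultural product names, searches for each one, saves the image src, and stores the image links in a database.

(Summary generated automatically by CSDN.)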

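The goal of this step is to scrape the image src for each agricultural product in the dataset and store it for later use. The dataset contains many agricultural product records, so the distinct product types must be counted first.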
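Since the scraped srcs are meant to be stored for later use, one possible layout is a table keyed by product name. The following is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for the MySQL database the post connects to with pymysql; the table `product_image`, its columns and the helper `save_image_srcs` are hypothetical. With pymysql the placeholders would be `%s` instead of `?`, and MySQL spells the upsert `REPLACE INTO` rather than SQLite's `INSERT OR REPLACE`.

```python
import sqlite3


def save_image_srcs(conn, rows):
    """Store (product name, image src) pairs; rows is an iterable of 2-tuples."""
    # Hypothetical table: one row per product, keyed by its name
    conn.execute(
        "CREATE TABLE IF NOT EXISTS product_image ("
        " name TEXT PRIMARY KEY,"
        " src TEXT NOT NULL)"
    )
    # Parameterized insert; re-scraping a product overwrites its previous src
    conn.executemany(
        "INSERT OR REPLACE INTO product_image (name, src) VALUES (?, ?)", rows
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    save_image_srcs(conn, [("apple", "https://example.com/apple.jpg")])
    print(conn.execute("SELECT name, src FROM product_image").fetchall())
```

Keying the table by name means a product that is scraped twice keeps only its latest image link.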

Counting code:

import csv

# Read the product-name column ('名称') from the CSV file
with open('payapa.csv', 'rt', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    all_vegetables = [row['名称'] for row in reader]

print(all_vegetables)

# Deduplicate while keeping the order of first occurrence
vegetables = []
for vegetable in all_vegetables:
    if vegetable not in vegetables:
        vegetables.append(vegetable)

print(vegetables)

There are 85 agricultural product types in total. Next, we scrape the image src for each of them.
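The list-based loop in the counting code works for 85 names, but each `not in` check scans the whole list. An equivalent order-preserving version relies on dict keys, as sketched below; the helper name `unique_in_order` and the sample names are hypothetical. With pandas, `Series.unique()` (for example on the `'名称'` column) likewise returns distinct values in order of appearance.

```python
def unique_in_order(names):
    """Return names without duplicates, keeping first-occurrence order."""
    # Dict keys preserve insertion order (guaranteed since Python 3.7),
    # and each membership check is O(1) instead of a list scan.
    return list(dict.fromkeys(names))


if __name__ == "__main__":
    sample = ["cabbage", "radish", "cabbage", "potato", "radish"]
    print(unique_in_order(sample))  # ['cabbage', 'radish', 'potato']
```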

Scraping the image src

The scraping uses Python + selenium, which drives a real browser interactively.
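Interactive scraping has to wait for the page to reach a given state, and the scraping code below uses fixed `sleep()` pauses for this. Selenium also offers explicit waits (`WebDriverWait(driver, timeout).until(...)`), which return as soon as a condition holds. Below is a library-free sketch of that polling idea so it can run without a browser; `wait_until` and its default timings are hypothetical, not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() until it returns a truthy value, then return that value.

    Raises TimeoutError if no truthy value appears within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)  # pause briefly before polling again
```

With a live driver it could be called as `wait_until(lambda: driver.find_elements("xpath", '//*[@id="act95Close"]'))`, which returns the non-empty list of matching elements instead of sleeping for a fixed time.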

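The team first chose 摄图网 (699pic.com), a professional stock-image search site with a very large library. While scraping, any pop-up dialogs and login pages that appear must be closed.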
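Closing pop-ups with `find_element(...).click()` fails with an exception whenever the dialog does not appear. A minimal sketch of a more tolerant approach, assuming Selenium's `find_elements` (plural), which returns an empty list rather than raising when nothing matches; `close_popups` is a hypothetical helper, and only the `act95Close` id comes from the original code. Note that with `implicitly_wait(3)` set, each XPath that matches nothing can still take up to 3 s.

```python
# Selenium's By.XPATH constant is the plain string "xpath"; using the literal
# keeps this sketch importable even where Selenium is not installed.
XPATH = "xpath"


def close_popups(driver, xpaths):
    """Click every visible element matching the given close-button XPaths.

    Returns how many elements were clicked. Pop-ups that are absent are
    skipped, because find_elements returns [] instead of raising.
    """
    closed = 0
    for xpath in xpaths:
        for element in driver.find_elements(XPATH, xpath):
            if element.is_displayed():
                element.click()
                closed += 1
    return closed
```

Usage with a live driver: `close_popups(driver, ['//*[@id="act95Close"]'])`.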


Scraping code:

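import pymysql
import pandas
from time import sleep
from selenium import webdriver
from selenium.webdriver.common.by import By

if __name__ == '__main__':
    # Assumption: the post does not show how `driver` is created; a local Chrome driver is used here
    driver = webdriver.Chrome()
    # Connect to the local MySQL database 'nongchanpin' (credentials as in the original)
    connect = pymysql.connect(host='localhost', user='root', password='112121', db='nongchanpin', port=3306)
    cursor = connect.cursor()
    print("Connected to the database")
    # Load the price data and collect the distinct product names ('名称' column)
    jiage = pandas.read_csv('2019.csv', encoding='gbk')
    name = jiage['名称']
    namelist = []
    for i in range(len(name)):
        if name[i] not in namelist:
            namelist.append(name[i])
    url = 'https://699pic.com/'  # homepage URL
    srclist = []
    driver.get(url)  # request the homepage
    driver.maximize_window()
    driver.implicitly_wait(3)
    # Close the homepage pop-up; find_element(By.XPATH, ...) replaces
    # find_element_by_xpath, which was removed in Selenium 4.3
    driver.find_element(By.XPATH, '//*[@id="act95Close"]').click()
    sleep(1)
    # Enter a search term ('1') and submit it to reach the search results page
    driver.find_element(By.XPATH, '/html/body/div[1]/div[2]/div/div[2]/div/div[1]/div[2]/form[1]/input').send_keys('1')
    driver.find_element(By.XPATH, '/html/body/div[1]/div[2]/div/div[2]/div/div[2]').click()
    sleep(2)
    # The same pop-up appears again on the results page
    driver.find_element(By.XPATH, '//*[@id="act95Close"]').click()
    # The results open in a new window; switch to the most recent handle
    windows = driver.window_handles
    print(windows)
    driver.switch_to.window(windows[-1])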
    for i in rang