Outputting web page data to CSV in Python

品尚公益团队

Article link: https://blog.csdn.net/u010719791/article/details/120158390

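This post shows how to pull data out of a web page with Python and save it: BeautifulSoup parses a saved HTML page (web.html), the code extracts each job posting's title, salary, and requirements, and the records are stored in CSV and JSON files. The loop walks over the posting elements, finds the spans with specific class names, reads their text, and skips any posting that is missing one of the three fields, so every saved record is complete. The CSV files are opened in append mode, so repeated runs add rows; the JSON file is rewritten on each run.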
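
The storage step described above turns the extracted postings into a list of dicts and saves them as JSON and CSV. A minimal standard-library sketch of that step, assuming records shaped like the dicts the code appends to `job` (the helper `records_to_csv` and the sample values are hypothetical), uses `csv.DictWriter` so each dict key maps to a column of the same name:

```python
import csv
import io
import json

FIELDS = ['jobname', 'jobincome', 'jobrequire']


def records_to_csv(records):
    """Hypothetical helper: serialize job dicts to CSV text with a header row."""
    buf = io.StringIO()
    # DictWriter maps each dict key to the column of the same name
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Hypothetical sample records, shaped like the dicts the code appends to `job`
job = [
    {'jobname': 'Python开发工程师', 'jobincome': '1-1.5万/月', 'jobrequire': '上海 | 3-4年经验'},
    {'jobname': '数据分析师', 'jobincome': '8千-1万/月', 'jobrequire': '北京 | 1年经验'},
]

csv_text = records_to_csv(job)
# ensure_ascii=False keeps Chinese characters readable instead of \uXXXX escapes
json_text = json.dumps(job, indent=1, ensure_ascii=False)

print(csv_text)
print(json_text)
```

Passing `fieldnames` explicitly also fixes the column order, independent of how each dict was built.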

Full code:

import csv
import json

import pandas as pd
from bs4 import BeautifulSoup

# Parse the locally saved page (web.html is GBK-encoded)
with open('web.html', 'r', encoding='gbk') as f:
    soup = BeautifulSoup(f, 'html.parser')

# Each job posting is a <div class="e">
items = soup.find_all('div', attrs={'class': 'e'})

job = []
for item in items:
    try:
        # [0] raises IndexError when a posting lacks one of the spans
        jobname = item.find_all('span', attrs={'class': 'jname at'})[0].text
        jobincome = item.find_all('span', attrs={'class': 'sal'})[0].text
        jobrequire = item.find_all('span', attrs={'class': 'd at'})[0].text
    except IndexError:
        continue  # skip incomplete postings
    print(jobname, jobincome, jobrequire)
    job.append({
        'jobname': jobname,
        'jobincome': jobincome,
        'jobrequire': jobrequire,
    })

# Write to CSV with pandas: build one DataFrame from all records;
# mode='a' appends to webT.csv, header=False omits the header row
df = pd.DataFrame(job, columns=['jobname', 'jobincome', 'jobrequire'])
df.to_csv('webT.csv', mode='a', header=False, index=False, encoding='utf-8-sig')

# Save the same records as JSON (mode 'w' overwrites web.json)
with open('web.json', 'w', encoding='utf-8') as f:
    json.dump(job, f, indent=1, ensure_ascii=False)

# Write to CSV with the csv module: read the JSON back and append each record
with open('web.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

with open('web.csv', mode='a', encoding='utf-8-sig', newline='') as f:
    writer = csv.writer(f)
    # header = ['jobname', 'jobincome', 'jobrequire']
    # writer.writerow(header)
    for item in data:
        writer.writerow([item['jobname'], item['jobincome'], item['jobrequire']])
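
Both CSV outputs in the code open their files in append mode and leave the header out, because writing it on every run would repeat the header row. One common pattern, sketched here as an assumption rather than the author's approach, writes the header only when the file is new or empty; the function name `append_rows` and the temporary path are hypothetical:

```python
import csv
import os
import tempfile

FIELDS = ['jobname', 'jobincome', 'jobrequire']


def append_rows(path, rows):
    """Hypothetical helper: append rows to a CSV, writing the header only once."""
    # The header is needed only if the file does not exist yet or is empty
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # utf-8-sig adds a BOM so Excel detects UTF-8; in append mode Python
    # writes the BOM only when the file starts out empty
    with open(path, mode='a', encoding='utf-8-sig', newline='') as f:
        writer = csv.writer(f)
        if need_header:
            writer.writerow(FIELDS)
        writer.writerows(rows)


# Usage: two runs against the same file give one header and two data rows
path = os.path.join(tempfile.mkdtemp(), 'web.csv')
append_rows(path, [['Python开发工程师', '1-1.5万/月', '上海']])
append_rows(path, [['数据分析师', '8千-1万/月', '北京']])

with open(path, encoding='utf-8-sig', newline='') as f:
    lines = list(csv.reader(f))
print(lines)
```

The same existence check could drive the pandas call as well, by passing `header=need_header` to `to_csv`.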