Python: dictionaries appended to a list overwrite one another


Author: 沉默隐者


Original article: https://blog.csdn.net/qq_29648129/article/details/74989998


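The core issue is that appending a dictionary to a list stores a reference to the dictionary, not a copy of it. If every dictionary you append is actually the same object, all the list entries point to the same memory. Each new record overwrites the contents of that one object. When you later iterate over the list and look up the values, every entry therefore shows the last record.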
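Here is a minimal sketch of the effect, independent of the crawler. The first loop reuses one dictionary object, so the list ends up holding three references to the same dict. The second loop creates a new dict on every iteration. The names `shared`, `buggy`, `fixed` and `record` are illustrative only.

```python
# Reusing one dict object: the list stores references, not copies
shared = {}
buggy = []
for i in range(3):
    shared["n"] = i          # overwrite the value in the single shared dict
    buggy.append(shared)     # append a reference to that same dict

print(buggy)                              # [{'n': 2}, {'n': 2}, {'n': 2}]
print(all(d is shared for d in buggy))    # True: every entry is the same object

# Creating a new dict on each iteration gives every entry its own object
fixed = []
for i in range(3):
    record = {}              # fresh dict for this iteration
    record["n"] = i
    fixed.append(record)

print(fixed)                              # [{'n': 0}, {'n': 1}, {'n': 2}]
print(len({id(d) for d in fixed}))        # 3 distinct objects
```

Checking `is` or `id()` on the list entries is a quick way to confirm that several entries are really one object.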

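I ran into this problem while writing a crawler. The method new_data.getNewBody(new['url']) returns a dictionary of scraped data. When I iterated over the collected results afterwards, I got the output shown in the screenshot below.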

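newsdata = [{}, {}]    # stand-in list of dicts; in the real crawler each dict holds a 'url' key
new_data = new_Data()  # object of the scraper class (shown below) that fetches each article
new_detil = []         # collects every scraped record
for new in newsdata:
    # Call the object's method with this article's URL
    new_body = new_data.getNewBody(new['url'])
    print(new_body)    # print each record as it is scraped
    new_detil.append(new_body)
    print('ok')
for aa in new_detil:   # iterate over all stored records
    print(aa)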

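This is part of the output. The records printed inside the loop are all different, but the records printed afterwards from the list are all identical.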

(Screenshot not preserved: every record printed from the list is identical, which is the bug.)
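The following sketch reproduces the symptom without network access. A hypothetical `FakeNewData` class stands in for `new_Data`. Its `getNewBody()` fakes a scraped title from the URL instead of calling `requests`. Like the original, it stores the result in `self.new_body`, which is created once in `__init__`. The URLs are placeholders.

```python
class FakeNewData(object):
    """Hypothetical stand-in for new_Data: same bug, no network."""

    def __init__(self):
        # Created once per instance, like new_Data.__init__
        self.new_body = {}

    def getNewBody(self, url):
        # Pretend we scraped a title from the page at `url`
        self.new_body["artibodyTitle"] = "title of " + url
        return self.new_body  # always the same dict object


newsdata = [{"url": "http://example.com/a"}, {"url": "http://example.com/b"}]
new_data = FakeNewData()
new_detil = []
for new in newsdata:
    new_body = new_data.getNewBody(new["url"])
    print(new_body)           # looks correct at this moment
    new_detil.append(new_body)

for aa in new_detil:
    print(aa)                 # every entry shows the last article

print(new_detil[0] is new_detil[1])  # True
```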

Here is part of the code of the new_Data class:

#-*- coding:utf-8 -*-
import requests
from bs4 import BeautifulSoup
from datetime import datetime
import pandas
import json

class new_Data(object):
    # Sina comment API URL template; {} is filled in with the news id
    commentUrl = ("http://comment5.news.sina.com.cn/page/info?version=1&format=js&"
                  "channel=gn&newsid=comos-{}&group=&compress=0&ie=utf-8&oe=utf-8&page=1&"
                  "page_size=20")
    def __init__(self):
        # Data dictionary for one article (created only once, here in __init__)
        self.new_body = {}
    def getNewBody(self,url):
        '''
        Return the main content of the given article as a data dictionary.
        '''
        #new_body = {}
        res = requests.get(url)
        res.encoding = 'utf-8'
        soup = BeautifulSoup(res.text,'html.parser')

        artibodyTitle = soup.select('#artibodyTitle')[0].text
        timesouce = soup.select(".time-source")[0].contents[0].strip()
        dt = datetime.strptime(timesouce,"%Y年%m月%d日%H:%M")

        #soup.select('.time-source')[0].contents[1].text.strip()
        media_name = soup.select('.time-source span')[0].text.strip()

        #print soup.select("#artibody")[0].text
        article = []
        # [:-1] stops before the last <p> of the article body, which we don't want
        for p in soup.select('#artibody p')[:-1]:
            article.append(p.text.strip())
        # Join the paragraphs with newline characters
        art_txt = '\n'.join(article)
        #print text

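        # str.lstrip() removes a *set of characters*, not a prefix, so it could also
        # eat the editor's name; str.removeprefix() (Python 3.9+) removes the label only
        editor = soup.select('.article-editor')[0].text.strip().removeprefix("责任编辑：")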

        # Get the comment count and the comment texts (getCommentsandCounts is not shown here)
        commentCount,comments = self.getCommentsandCounts(url)
        # Join the comment texts into one string, one comment per line
        comment = '\n'.join(comments)


        self.new_body["artibodyTitle"] = artibodyTitle
        self.new_body["datetime"] = dt
        self.new_body["media_name"] = media_name
        self.new_body["article"] = str(art_txt)
        self.new_body["editor"] = editor
        self.new_body["commentCount"] = commentCount
        self.new_body["comments"] = comment

        return self.new_body

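The problem is in the code below. `self.new_body` is created only once, in `__init__`. Every call to `getNewBody()` therefore overwrites that same dictionary and returns a reference to it. All the entries appended to `new_detil` are one object, holding whatever the last call wrote.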

    def __init__(self):
        # Created only once per new_Data instance
        self.new_body = {}

    def getNewBody(self, url):
        # ... scraping code omitted ...
        # Every call writes into, and returns, that same dictionary object
        self.new_body["artibodyTitle"] = artibodyTitle
        self.new_body["datetime"] = dt
        self.new_body["media_name"] = media_name
        self.new_body["article"] = str(art_txt)
        self.new_body["editor"] = editor
        self.new_body["commentCount"] = commentCount
        self.new_body["comments"] = comment
        return self.new_body
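There are two possible fixes, sketched below with hypothetical stub classes and placeholder URLs (no network). The first fix builds a new dictionary inside `getNewBody()` on every call. The commented-out `#new_body = {}` line in the original method points toward this approach. If the class must keep `self.new_body`, the second fix appends a copy at the call site instead. A shallow `dict(...)` copy should be enough here because `getNewBody()` reassigns each key rather than mutating the stored values in place.

```python
class FixedNewData(object):
    """Hypothetical stub: getNewBody() builds a fresh dict on every call."""

    def getNewBody(self, url):
        new_body = {}  # local dict, a new object per call
        new_body["artibodyTitle"] = "title of " + url
        return new_body


class SharedNewData(object):
    """Hypothetical stub that keeps the original shared self.new_body."""

    def __init__(self):
        self.new_body = {}

    def getNewBody(self, url):
        self.new_body["artibodyTitle"] = "title of " + url
        return self.new_body


urls = ["http://example.com/a", "http://example.com/b"]

# Fix 1: one instance, but a new dict per call
fetcher = FixedNewData()
fixed = [fetcher.getNewBody(u) for u in urls]

# Fix 2: keep the shared dict, but append a shallow copy of each result
shared_fetcher = SharedNewData()
copied = []
for u in urls:
    copied.append(dict(shared_fetcher.getNewBody(u)))  # snapshot of the current contents

print([d["artibodyTitle"] for d in fixed])
print([d["artibodyTitle"] for d in copied])
```

Applied to the real class, the first fix would mean uncommenting `new_body = {}` in `getNewBody()` and writing to `new_body[...]` instead of `self.new_body[...]`.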