Programming Collective Intelligence - Discovering Groups - Code Changes

Latest recommended article published on 2019-07-02 21:03:04

wuchenhaaa

Article link: https://blog.csdn.net/wuchenhaaa/article/details/80547806

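The first code block on p. 31 fails with an error saying there is no title. I don't know the exact cause yet (possibly feeds whose URL is dead and therefore return no title), but here is an attempted change: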

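import feedparser

def getwordcounts(url):
    # Parse the feed; getwords() is the tokenizer defined earlier in the book
    d = feedparser.parse(url)
    wc = {}
    # An empty dict marks "no title", e.g. when the URL is dead
    title = d.feed.get('title', {})
    for e in d.entries:
        # Some entries carry a summary, others only a description
        if 'summary' in e:
            summary = e.summary
        else:
            summary = e.description

        # Join with a space so the last word of the title and the
        # first word of the summary are not glued together
        words = getwords(e.title + ' ' + summary)
        for word in words:
            wc.setdefault(word, 0)
            wc[word] += 1
    return title, wc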
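
The function above relies on getwords, which is not shown in this post. As a minimal sketch, assuming it strips HTML tags and splits on non-letter characters as the book describes (the regular expressions here are an approximation, not a verbatim copy), it could look like this, together with the same setdefault counting pattern:

```python
import re

def getwords(html):
    # Remove anything that looks like an HTML tag
    txt = re.sub(r'<[^>]+>', '', html)
    # Split on runs of characters that are not ASCII letters
    words = re.split(r'[^A-Za-z]+', txt)
    # Lowercase and drop the empty strings produced by the split
    return [w.lower() for w in words if w]

# Count words the same way getwordcounts does
wc = {}
for word in getwords('<p>Feeds, feeds and more FEEDS</p>'):
    wc.setdefault(word, 0)
    wc[word] += 1
print(wc)
```

Lowercasing before counting is what lets "Feeds" and "FEEDS" land in the same bucket.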

The first code block on p. 32 uses file, a Python 2 built-in that no longer exists in Python 3. An attempted change using open instead:

apcount = {}
wordcounts = {}
# Read one feed URL per line, dropping blank lines and trailing newlines
with open('feedlist.txt') as f:
    feedlist = [line.strip() for line in f if line.strip()]
for feedurl in feedlist:
    title, wc = getwordcounts(feedurl)
    if not title:  # some URLs are dead; skip them, otherwise an error is raised
        continue
    print(title)  # show progress
    wordcounts[title] = wc
    for word, count in wc.items():
        apcount.setdefault(word, 0)
        if count > 1:
            apcount[word] += 1
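
Once apcount and wordcounts are filled, the remaining step is to pick a word list and write the feed-by-word matrix. The following is only a sketch on toy data: the helper names build_wordlist and write_matrix are hypothetical, and the 0.1 and 0.5 cut-offs are assumed from the book's example and may need tuning for other feeds.

```python
import io

def build_wordlist(apcount, nfeeds, lo=0.1, hi=0.5):
    # Keep words that appear in more than `lo` and fewer than `hi` of the feeds
    return [w for w, bc in apcount.items() if lo < bc / nfeeds < hi]

def write_matrix(out, wordlist, wordcounts):
    # Header row: the label 'Blog' followed by one column per word
    out.write('Blog')
    for word in wordlist:
        out.write('\t%s' % word)
    out.write('\n')
    # One row per feed; a word missing from a feed counts as 0
    for blog, wc in wordcounts.items():
        out.write(blog)
        for word in wordlist:
            out.write('\t%d' % wc.get(word, 0))
        out.write('\n')

# Toy data standing in for the real dictionaries
wordcounts = {
    'feed A': {'python': 3, 'the': 9},
    'feed B': {'the': 7, 'rss': 2},
    'feed C': {'the': 5, 'python': 1},
    'feed D': {'the': 4},
}
apcount = {'python': 1, 'the': 4, 'rss': 1}

wordlist = build_wordlist(apcount, len(wordcounts))
buf = io.StringIO()  # swap for open('blogdata.txt', 'w') to write a real file
write_matrix(buf, wordlist, wordcounts)
print(buf.getvalue())
```

The word "the" appears in every toy feed, so the upper cut-off removes it, which is the point of the filter: very common words do not help separate groups.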

If you would rather skip this step, you can use the blogdata.txt file provided with the book directly.

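The book's examples all use English-language sites. Some of those sites are no longer reachable, and for the ones that are, it is hard to judge the clustering results at a glance. This post therefore uses RSS feeds from some Chinese sites as the initial URLs.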
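
One thing to watch for when switching to Chinese feeds: a word splitter that only keeps ASCII letters produces no tokens for Chinese text. The sketch below illustrates this under that assumption. Both function names are hypothetical, and treating each CJK ideograph as its own token is a crude stand-in for a real word segmenter, not the method used in this post.

```python
import re

def getwords_ascii(text):
    # Assumed behaviour of an English-only splitter: keep runs of ASCII letters
    return [w.lower() for w in re.split(r'[^A-Za-z]+', text) if w]

def getwords_cjk(text):
    # Hypothetical variant: ASCII words, plus each CJK ideograph as one token
    return [t.lower() for t in re.findall(r'[A-Za-z]+|[\u4e00-\u9fff]', text)]

sample = 'Python 聚类'
print(getwords_ascii(sample))  # ['python']
print(getwords_cjk(sample))    # ['python', '聚', '类']
```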