feedparser study notes

Latest recommended article published 2021-05-02 14:45:49

iteye_16821

Category: python. Tags: Access, XML, HTML, Blog

feedparser bills itself as the Universal Feed Parser and aims to handle any RSS feed, well-formed or not. A quick note on basic usage first:
[code]
>>> import feedparser
>>> d = feedparser.parse('http://willzh.iteye.com/rss')
>>> d['feed']['title']
u"Will's Blog"
[/code]
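The same call can be tried without network access, since `feedparser.parse` also accepts a file path or a string of XML. Below is a minimal sketch under a few assumptions: `SAMPLE_RSS` is a made-up feed, `summarize` is a hypothetical helper name, and the guarded import is only there so the snippet does not crash when the third-party package is missing. The `bozo` flag is what feedparser sets when a feed is not well-formed, which is how it "handles" broken feeds rather than refusing them.

```python
try:
    import feedparser  # third-party package: pip install feedparser
except ImportError:  # keep the sketch importable without the package
    feedparser = None

# Hypothetical feed content, used only for this sketch.
SAMPLE_RSS = """<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Sample Blog</title>
    <link>http://example.org/</link>
    <description>Hypothetical feed used only for this sketch</description>
    <item>
      <title>First post</title>
      <link>http://example.org/posts/1</link>
    </item>
  </channel>
</rss>
"""


def summarize(source):
    """Return (feed title, entry titles, bozo flag) for a URL, path, or XML string."""
    d = feedparser.parse(source)
    # .get() avoids a KeyError when a feed omits the element.
    titles = [entry.get("title", "") for entry in d.entries]
    # bozo is truthy when the feed was not well-formed XML.
    return d.feed.get("title", ""), titles, bool(d.bozo)


if feedparser is not None:
    print(summarize(SAMPLE_RSS))
```

Using `.get()` instead of direct attribute access is a deliberate choice here: real-world feeds often leave out optional elements, and a missing key would otherwise raise an exception.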
There is also a way to add RSS support to MoinMoin. I will look at it when I have time; for now, here is the link as a reminder.
[url]http://moinmoin.wikiwikiweb.de/macro/FeedParser[/url]

The feedparser home page shows a basic usage example:
[code]
>>> import feedparser
>>> d = feedparser.parse("http://feedparser.org/docs/examples/atom10.xml")
>>> d['feed']['title'] # feed data is a dictionary
u'Sample Feed'
>>> d.feed.title # get values attr-style or dict-style
u'Sample Feed'
>>> d.channel.title # use RSS or Atom terminology anywhere
u'Sample Feed'
>>> d.feed.link # resolves relative links
u'http://example.org/'
>>> d.feed.subtitle # parses escaped HTML
u'For documentation <em>only</em>'
>>> d.channel.description # RSS terminology works here too
u'For documentation <em>only</em>'
>>> len(d['entries']) # entries are a list
1
>>> d['entries'][0]['title'] # each entry is a dictionary
u'First entry title'
>>> d.entries[0].title # attr-style works here too
u'First entry title'
>>> d['items'][0].title # RSS terminology works here too
u'First entry title'
>>> e = d.entries[0]
>>> e.link # easy access to alternate link
u'http://example.org/entry/3'
>>> e.links[1].rel # full access to all Atom links
u'related'
>>> e.links[0].href # resolves relative links here too
u'http://example.org/entry/3'
>>> e.author_detail.name # author data is a dictionary
u'Mark Pilgrim'
>>> e.updated_parsed # parses all date formats
(2005, 11, 9, 11, 56, 34, 2, 313, 0)
>>> e.content[0].value # sanitizes dangerous HTML
u'<div>Watch out for <em>nasty tricks</em></div>'
>>> d.version # reports feed type and version
u'atom10'
>>> d.encoding # auto-detects character encoding
u'utf-8'
>>> d.headers.get('Content-type') # full access to all HTTP headers
u'application/xml'
[/code]
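The session above comes from a Python 2 interpreter (hence the `u'...'` strings). As a rough Python 3 sketch of the same ideas, the snippet below parses an inline Atom document and exercises dict-style access, attribute-style access, and the RSS aliases (`channel` for `feed`, `items` for `entries`). `SAMPLE_ATOM`, `describe`, and `entry_updated_utc` are hypothetical names introduced for illustration. As far as I know, recent feedparser releases return `updated_parsed` as a `time.struct_time` in UTC rather than a bare tuple, so the helper converts it with `calendar.timegm`, which works for both.

```python
import calendar
from datetime import datetime, timezone

try:
    import feedparser  # third-party package: pip install feedparser
except ImportError:  # keep the sketch importable without the package
    feedparser = None

# Hypothetical Atom document modeled on the sample feed above.
SAMPLE_ATOM = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample Feed</title>
  <link href="http://example.org/"/>
  <updated>2005-11-09T11:56:34Z</updated>
  <id>tag:example.org,2005:feed</id>
  <entry>
    <title>First entry title</title>
    <link rel="alternate" type="text/html" href="http://example.org/entry/3"/>
    <id>tag:example.org,2005:entry-3</id>
    <updated>2005-11-09T11:56:34Z</updated>
  </entry>
</feed>
"""


def entry_updated_utc(entry):
    """Convert an entry's updated_parsed (UTC time tuple) to an aware datetime."""
    parsed = entry.get("updated_parsed")
    if parsed is None:
        return None
    # timegm interprets the tuple as UTC, unlike time.mktime (local time).
    return datetime.fromtimestamp(calendar.timegm(parsed), tz=timezone.utc)


def describe(source):
    """Collect a few fields using the different access styles shown above."""
    d = feedparser.parse(source)
    first = d.entries[0]
    return {
        "dict_style": d["feed"]["title"],
        "attr_style": d.feed.title,
        "rss_alias": d["channel"]["title"],  # 'channel' maps to 'feed'
        "entry_count": len(d["items"]),      # 'items' maps to 'entries'
        "first_link": first.link,
        "first_updated": entry_updated_utc(first),
        "version": d.version,
    }


if feedparser is not None:
    for key, value in describe(SAMPLE_ATOM).items():
        print(key, "=", value)
```

Note that `d["items"]` is used rather than `d.items`, since the latter would resolve to the ordinary dictionary method instead of the entry list.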