How do you use newspaper to parse web pages intelligently?
Installation
pip3 install newspaper3k
newspaper can act as the page downloader itself; the example from the official documentation works as-is:
from newspaper import Article
url = 'http://fox13now.com/2013/12/30/new-year-new-laws-obamacare-pot-guns-and-drones/'
article = Article(url)
article.download()
article.parse()
article.title
article.publish_date
article.text
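In an interactive session the three attribute lines above echo their values, but in a script they print nothing. A minimal sketch of collecting the fields into a dict follows. The helper names `article_to_dict` and `fetch_article` are hypothetical, not part of newspaper, and the sketch assumes `publish_date` is either a `datetime` or `None`.

```python
from typing import Any, Dict


def article_to_dict(article: Any) -> Dict[str, Any]:
    """Collect the fields of an already parsed Article into a plain dict."""
    # Assumption: publish_date is a datetime, or None when no date was found.
    publish_date = article.publish_date
    return {
        "title": article.title,
        "publish_date": publish_date.isoformat() if publish_date else None,
        "text": article.text,
    }


def fetch_article(url: str) -> Dict[str, Any]:
    """Download and parse one URL (needs network access and newspaper3k)."""
    # Imported here so that article_to_dict itself has no third-party dependency.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()
    return article_to_dict(article)


# Example call (performs a real HTTP request):
#     fields = fetch_article(url)
#     print(fields["title"])
```

Keeping the conversion separate from the download makes it easy to reuse the same dict layout when the HTML comes from somewhere else.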
scrapy + newspaper: pass the HTML that scrapy has already fetched straight to newspaper for parsing
from newspaper import Article
from newspaper import Config
config = Config()
config.follow_meta_refresh = True
config.language = 'zh'
article = Article('', config=config)
# html is the page source that scrapy already fetched (for example response.text)
article.download(input_html=html)
article.parse()
article.title
article.publish_date
article.text
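To show where this fits in a crawler, here is a rough sketch of a helper that turns a Scrapy response into an item dict. The names `parse_with_newspaper` and `response_to_item` and the `parser` argument are hypothetical and only there for illustration. Passing `response.url` instead of an empty string is also an assumption of this sketch rather than something the snippet above does.

```python
from typing import Any, Callable, Dict


def parse_with_newspaper(html: str, url: str = "", language: str = "zh") -> Any:
    """Parse HTML that was fetched elsewhere and return the Article object."""
    # Imported here so the module loads even where newspaper3k is missing.
    from newspaper import Article, Config

    config = Config()
    config.follow_meta_refresh = True
    config.language = language
    article = Article(url, config=config)
    article.download(input_html=html)  # reuse the HTML instead of downloading again
    article.parse()
    return article


def response_to_item(
    response: Any,
    parser: Callable[..., Any] = parse_with_newspaper,
) -> Dict[str, Any]:
    """Build an item dict from any object exposing .text and .url."""
    article = parser(response.text, url=response.url)
    return {
        "url": response.url,
        "title": article.title,
        "publish_date": article.publish_date,
        "text": article.text,
    }


# Inside a Scrapy spider the callback could then look like:
#     def parse(self, response):
#         yield response_to_item(response)
```

The `parser` argument exists only so the function can be exercised with a stub, without newspaper or a network connection.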