Scraping news with Python: how do I fetch daily news once a day using Python?

Latest recommended article published on 2024-04-28 11:35:41

茜茜丁


Article tags: scraping news with Python

I am trying to build an application for which I need a daily news feed from several websites. One way to do this is with Python's BeautifulSoup library. However, that only works well for sites that have all their news on one static page.

Let's consider a site like http://www.techcrunch.com. It shows only headlines, and for more news you need to click on "Read more". Several other news websites work in a similar way. How do I extract such information and dump it in a file (txt/.dmp or any other kind of file)? What tool should I use? What approach should I take to implement this in Python?

I need this script to automatically download news from several websites ONCE EVERY SINGLE DAY and store it in a file with fields such as heading, date, content, etc. I will be deploying this script on an apache2 server. Any suggestions?

Solution

How do I extract such information and dump it in a file (txt/.dmp or any other kind of file)? What tool should I use?

for more news you need to click on "Read more".

The tools you might leverage are Selenium, which is pure browser automation, or iMacros.

Here is an example of leveraging Selenium in Python, server side.
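A minimal sketch of that approach might look like the following. It assumes Selenium 4 with a headless Chrome available on the server, that the listing page exposes its articles through links whose text contains "Read more", and that joining the text of all <p> elements is an acceptable approximation of the article body. The function names, the record fields and the JSON Lines output format are hypothetical choices made for illustration, and the stored date is the day the page was fetched, not the publication date.

```python
import json
from datetime import date


def scrape_site(listing_url, link_text="Read more"):
    """Follow every "Read more" link on a listing page and return article records."""
    # Imported here so that save_articles() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # no display is available on a server
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(listing_url)
        # Collect the URLs first: the link elements become stale after navigating away.
        hrefs = [a.get_attribute("href")
                 for a in driver.find_elements(By.PARTIAL_LINK_TEXT, link_text)]
        urls = [u for u in dict.fromkeys(hrefs) if u]  # drop duplicates and empty values

        articles = []
        for url in urls:
            driver.get(url)
            paragraphs = driver.find_elements(By.TAG_NAME, "p")
            articles.append({
                "heading": driver.title,            # assumption: page title as heading
                "date": date.today().isoformat(),   # fetch date, not publication date
                "content": "\n".join(p.text for p in paragraphs if p.text),
                "url": url,
            })
        return articles
    finally:
        driver.quit()


def save_articles(articles, path):
    """Append one JSON object per line so each daily run adds to the same file."""
    with open(path, "a", encoding="utf-8") as fh:
        for article in articles:
            fh.write(json.dumps(article, ensure_ascii=False) + "\n")
```

Calling save_articles(scrape_site("http://www.techcrunch.com"), "news.jsonl") would then append that day's articles to news.jsonl. Real sites usually need site-specific selectors for the heading and body, so the <p>-based extraction is only a starting point.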

Here is a post (and video) on data extraction using iMacros. Since you need it only once a day, you can schedule it to run regularly on Windows or macOS.
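The operating system's own scheduler (cron on Linux and macOS, Task Scheduler on Windows) is the usual way to run a script once a day. Where that is not an option, a long-running Python process can do the waiting itself. The sketch below is one way to do that with only the standard library; the function names and the 06:00 default run time are hypothetical, and it uses naive local time, so a daylight-saving change could shift one run by an hour.

```python
import time
from datetime import datetime, timedelta


def seconds_until_next_run(now, hour=6, minute=0):
    """Return the number of seconds from `now` until the next hour:minute."""
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # Today's slot has already passed (or is right now): wait for tomorrow's.
        target += timedelta(days=1)
    return (target - now).total_seconds()


def run_daily(job, hour=6, minute=0):
    """Call `job()` once a day at hour:minute, forever."""
    while True:
        time.sleep(seconds_until_next_run(datetime.now(), hour, minute))
        job()
```

Here run_daily would be given a function that performs one full scrape. If the job raises an exception the loop stops, so a real deployment would likely wrap the call in error handling and logging.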
