Pushing English Articles to Your Phone with a Python Crawler and the twilio Module (Study Notes)

我啊zbs

Article link: https://blog.csdn.net/qq_45688322/article/details/104522175

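Use case: improving my English reading ability…

Analysis: some English-learning websites publish new articles or news every day. A crawler fetches an article's URL and content, and the Twilio platform then sends that content to a phone as an SMS.

First, register an account at www.twilio.com and create a project, then obtain the account_sid and auth_token and choose the number that will send the SMS.
Note: the number that receives the SMS must first be verified by SMS.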
```
 twilio module documentation: https://www.twilio.com/docs/libraries/python
```

Required modules

import requests
from twilio.rest import Client
import time
from bs4 import BeautifulSoup
import re

Create a class and fill in the account_sid, auth_token, and other information

class Send() :
    def __init__(self) :
        # credentials from the Twilio console (masked here)
        self.account_sid = "AC5a8484XXXXXXXXXXXXXXXX772910a50740"
        self.auth_token = "cccfXXXXXXXXXXXXXXXXa81"
        # browser-like request header for the crawler
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"
        }
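
Hard-coding the account_sid and auth_token means they end up in every copy of the script. One alternative is to read them from environment variables. The following is a minimal sketch, not part of the original code; the function name `load_twilio_credentials` and the variable names `TWILIO_ACCOUNT_SID` and `TWILIO_AUTH_TOKEN` are assumptions chosen for illustration.

```python
import os

def load_twilio_credentials(env=None):
    """Return (account_sid, auth_token) read from environment variables.

    The variable names below are an assumption for this sketch.
    """
    env = os.environ if env is None else env
    try:
        return env["TWILIO_ACCOUNT_SID"], env["TWILIO_AUTH_TOKEN"]
    except KeyError as missing:
        # report which variable is absent instead of a bare KeyError
        raise RuntimeError("missing environment variable: %s" % missing.args[0])
```

Inside `__init__`, the two assignments could then become `self.account_sid, self.auth_token = load_twilio_credentials()`.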

Fetch the home page of the English article site; here I chose https://www.newsinlevels.com/

    def get_wurl(self) :
        wurl = "https://www.newsinlevels.com/"
        while True :
            try :
                # the site is hosted overseas, so set a timeout to avoid waiting forever
                wres = requests.get(wurl,headers=self.headers,timeout = 10)
                return wres.text
            except requests.RequestException :
                print('Connection failed or timed out, retrying')
                time.sleep(5)    # pause briefly before the next attempt
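
The loop above retries forever, so a site that stays down would keep the script running indefinitely. A bounded retry is one way around that. This is a rough sketch under my own assumptions: the helper name `retry` and its parameters are hypothetical and not from the original code, and it takes any callable so it can wrap the `requests.get` call.

```python
import time

def retry(func, attempts=3, delay=2.0, sleep=time.sleep):
    """Call func() until it succeeds or the attempts run out.

    Re-raises the last error when every attempt has failed.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as error:
            last_error = error
            print("attempt %d failed: %s" % (attempt, error))
            if attempt < attempts:
                sleep(delay * attempt)  # wait a little longer each time
    raise last_error
```

With this helper, the body of `get_wurl` could be reduced to `return retry(lambda: requests.get(wurl, headers=self.headers, timeout=10).text)`.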

First, get the article URL from the home page

    def get_title(self) :
        soup = BeautifulSoup(self.get_wurl(),'html.parser')
        # I only read one article per day, so take the first title block
        title = soup.find("div",class_="title")
        aherf = title.find('a').attrs['href']
        # home-page URLs point to level 1; I chose level 2
        # (note: this replaces every '-1' in the URL, not only the level suffix)
        aherf = re.sub(r'-1','-2',aherf)
        return aherf
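
The substitution `re.sub(r'-1','-2', ...)` rewrites every `-1` in the URL, so a slug containing something like `-10` would be changed as well. Below is a sketch of a narrower replacement. It assumes the article URL ends in `level-<number>` with an optional trailing slash; that URL shape and the function name `switch_level` are my assumptions, not something the original confirms.

```python
import re

# matches "level-<digits>" only when it is the last path segment
_LEVEL_RE = re.compile(r'level-\d+(?=/?$)')

def switch_level(url, level):
    """Return url with its trailing level number replaced by `level`."""
    return _LEVEL_RE.sub('level-%d' % level, url)

# hypothetical URL, used only to show the difference
sample = "https://example.com/products/top-10-cities-level-1/"
naive = re.sub(r'-1', '-2', sample)      # also turns "-10" into "-20"
careful = switch_level(sample, 2)        # only touches the level suffix
```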

Fetch the article page:

    def get_article(self) :
        aurl = self.get_title()
        # print(aurl)
        while True :
            try :
                article_res = requests.get(aurl,headers=self.headers,timeout=10)
                return article_res.text
            except requests.RequestException :
                print('Connection failed or timed out (article page), retrying')
                time.sleep(5)    # pause briefly before the next attempt

Get the article's title and content (BeautifulSoup is used here for parsing and data extraction):

    def get_articleinfo(self) :
        sp = BeautifulSoup(self.get_article(),'html.parser')
        article_title = sp.select('.article-title')[0].get_text()
        article_body = sp.select('#nContent')[0].get_text()
        article = article_title + '\n' + article_body
        # drop the site notice; the string replace method would work as well
        article = re.sub(r'You can watch the original video in the Level 3 section\.','',article)
        return article
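
Text pulled out with `get_text()` often carries stray blank lines and padding, which cost characters in an SMS. The sketch below tidies the title and body before sending. The function name `clean_article` is hypothetical, and it uses the plain string replace that the comment above mentions as an alternative to `re.sub`.

```python
import re

NOTICE = "You can watch the original video in the Level 3 section."

def clean_article(title, body):
    """Join title and body, drop the site notice, and squeeze blank lines."""
    text = title.strip() + "\n" + body.replace(NOTICE, "")
    # trim padding on every line
    lines = [line.strip() for line in text.splitlines()]
    text = "\n".join(lines)
    # collapse runs of empty lines into a single line break
    text = re.sub(r"\n{2,}", "\n", text)
    return text.strip()
```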

Finally, send the SMS through the twilio module:

    def sendmsg(self) :
        client = Client(self.account_sid,self.auth_token)
        message = client.messages.create(
            to = "+8613XXXXXXXXX",    # the number that receives the SMS (my own)
            from_="+172035XXXXXX",    # the sending number obtained from the Twilio platform
            body = self.get_articleinfo() # the content of the fetched article
        )
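
A whole article can be longer than one message allows. As far as I know, Twilio caps a single message body at 1600 characters, so treat that number as an assumption to check against the Twilio documentation. The sketch below splits the text at whitespace into pieces under a limit; the function name `split_message` is hypothetical.

```python
def split_message(text, limit=1600):
    """Split text into chunks of at most `limit` characters.

    Breaks at whitespace; line breaks are flattened to single spaces.
    """
    chunks = []
    current = ""
    for word in text.split():
        # a single word longer than the limit is cut into hard slices
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed as `body` to its own `client.messages.create(...)` call, which also means one billed message per chunk.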

Finally, call everything from the entry point

if __name__ == "__main__":
    try :
        ms = Send()
        ms.sendmsg()
        print(time.strftime('%Y%m%d%H%M%S'))    # timestamp for the log
        print("Sent successfully")
    except Exception as e :
        print(time.strftime('%Y%m%d%H%M%S'))
        print("Sending failed:", e)             # show why it failed
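
When the script runs unattended, the log is the only place to see what went wrong, so it helps to record the error itself. A small sketch of that idea follows; the helper name `run_once` and its `stamp` parameter are hypothetical and exist only for this illustration.

```python
import time
import traceback

def run_once(task, stamp=None):
    """Run task() and return a one-line status string with a timestamp."""
    stamp = stamp or time.strftime('%Y%m%d%H%M%S')
    try:
        task()
    except Exception as error:
        traceback.print_exc()          # full stack trace goes to stderr
        return "%s send failed: %r" % (stamp, error)
    return "%s send succeeded" % stamp
```

In the entry point this would be used as `print(run_once(Send().sendmsg))`.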

Full code:

import requests
from twilio.rest import Client
import time
from bs4 import BeautifulSoup
import re

class Send() :
    def __init__(self) :
        # credentials from the Twilio console (masked here)
        self.account_sid = "AC5a8484XXXXXXXXXXXXXXXX772910a50740"
        self.auth_token = "cccfXXXXXXXXXXXXXXXXa81"
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"
        }

    def get_wurl(self) :
        wurl = "https://www.newsinlevels.com/"
        while True :
            try :
                # the site is hosted overseas, so set a timeout to avoid waiting forever
                wres = requests.get(wurl,headers=self.headers,timeout = 10)
                return wres.text
            except requests.RequestException :
                print('Connection failed or timed out, retrying')
                time.sleep(5)    # pause briefly before the next attempt

    def get_title(self) :
        soup = BeautifulSoup(self.get_wurl(),'html.parser')
        # I only read one article per day, so take the first title block
        title = soup.find("div",class_="title")
        aherf = title.find('a').attrs['href']
        # home-page URLs point to level 1; I chose level 2
        # (note: this replaces every '-1' in the URL, not only the level suffix)
        aherf = re.sub(r'-1','-2',aherf)
        return aherf

    def get_article(self) :
        aurl = self.get_title()
        # print(aurl)
        while True :
            try :
                article_res = requests.get(aurl,headers=self.headers,timeout=10)
                return article_res.text
            except requests.RequestException :
                print('Connection failed or timed out (article page), retrying')
                time.sleep(5)    # pause briefly before the next attempt

    def get_articleinfo(self) :
        sp = BeautifulSoup(self.get_article(),'html.parser')
        article_title = sp.select('.article-title')[0].get_text()
        article_body = sp.select('#nContent')[0].get_text()
        article = article_title + '\n' + article_body
        # drop the site notice; the string replace method would work as well
        article = re.sub(r'You can watch the original video in the Level 3 section\.','',article)
        return article

    def sendmsg(self) :
        client = Client(self.account_sid,self.auth_token)
        message = client.messages.create(
            to = "+8613########",          # the number that receives the SMS
            from_="+172#######",           # the sending number from the Twilio platform
            body = self.get_articleinfo()  # the content of the fetched article
        )

if __name__ == "__main__":
    try :
        ms = Send()
        ms.sendmsg()
        print(time.strftime('%Y%m%d%H%M%S'))    # timestamp for the log
        print("Sent successfully")
    except Exception as e :
        print(time.strftime('%Y%m%d%H%M%S'))
        print("Sending failed:", e)             # show why it failed

Then deploy the code to a server; uploading it with MobaXterm is enough.

To run the program on a schedule, crontab needs to be installed…

    yum install vixie-cron
    yum install crontabs
    service crond start     # start the cron service
    service crond stop      # stop the cron service
    service crond status    # check whether the service is running

Add the scheduled task: vim /etc/crontab

0 6 * * * root /usr/bin/python3.5 /py/Letters.py > /py/run.log
# I set it to run at six o'clock every morning
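
An entry in /etc/crontab has five time fields (minute, hour, day of month, month, day of week), then the user to run as, then the command. The sketch below splits the entry above into those named fields to make the layout explicit. It is only an illustration; the function name `parse_system_crontab_line` is hypothetical and no cron library is involved.

```python
def parse_system_crontab_line(line):
    """Split one /etc/crontab entry into its named fields."""
    # 5 time fields, the user, and everything else is the command
    parts = line.split(None, 6)
    if len(parts) < 7:
        raise ValueError("expected 5 time fields, a user and a command")
    names = ("minute", "hour", "day_of_month", "month", "day_of_week", "user")
    entry = dict(zip(names, parts[:6]))
    entry["command"] = parts[6]
    return entry

entry = parse_system_crontab_line(
    "0 6 * * * root /usr/bin/python3.5 /py/Letters.py > /py/run.log"
)
```

Here `minute` is `0` and `hour` is `6`, which is what makes the job run at 06:00 every day.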

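Note: the required modules must also be installed on the server (requests, twilio, and beautifulsoup4 can be installed with pip; time and re are part of the standard library)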

import requests
from twilio.rest import Client
import time
from bs4 import BeautifulSoup
import re

Example screenshot: [image]

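You can also use this platform to push other interesting things!

I have only just started learning Python, so much of this may not follow best practice; please go easy on me…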
