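Automatically crawl the daily weather, the daily Weibo hot-search list, and daily posts from a foreign-language site (with automatic translation), then send the results by email as plain text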


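Project features:

  1. Automatically crawls the one-week forecast for specified cities from China Weather (weather.com.cn);
  2. Automatically crawls the titles and links of the daily Weibo hot searches;
  3. Automatically crawls the daily recommended titles and links from the foreign-language site http://conflictoflaws.net/ and renders each title as an English-Chinese pair, using Youdao online translation as the translation engine;
  4. Combines all crawled data into plain text, saves each part locally, and emails it to the target mailboxes.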
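As a rough illustration of step 4, the sketch below merges three already-scraped sections into a list of text lines. The function name `build_report`, the input shapes, and the example.com links are assumptions made for this sketch; the post's own merging function is not shown, so this is not its implementation.

```python
def build_report(weather, hot_searches, articles):
    """Merge three scraped sections into a list of plain-text lines.

    The input shapes are hypothetical, chosen only for this sketch:
    weather      -> list of str, one line per forecast day
    hot_searches -> list of (title, link) tuples
    articles     -> list of (english_title, chinese_title, link) tuples
    """
    lines = ['[Weather]']
    lines.extend(weather)

    lines.append('[Weibo hot searches]')
    for rank, (title, link) in enumerate(hot_searches, start=1):
        lines.append(f'{rank}. {title} {link}')

    lines.append('[conflictoflaws.net]')
    for english, chinese, link in articles:
        # Keep the original title and its translation side by side.
        lines.append(f'{english} / {chinese} {link}')
    return lines


# Placeholder data; none of these values come from a real crawl.
report = build_report(
    ['Mon: sunny, 12/3'],
    [('Example topic', 'https://example.com/topic')],
    [('Example title', '示例标题', 'https://example.com/post')],
)
print('\n'.join(report))
```

Returning a list of lines rather than one big string keeps the result easy to write to a file line by line and to join into an email body.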

The implementation is as follows:

1. Main logic:

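import os
import smtplib
import time
from email.header import Header
from email.mime.text import MIMEText

import requests
from bs4 import BeautifulSoup


def get_text():
    # Browser-like User-Agent sent with every request.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3722.400 QQBrowser/10.5.3738.400'
    }
    # Entry URLs for the three data sources.
    url = {
        'weibo': 'https://s.weibo.com/top/summary?cate=realtimehot',
        'tianqi': 'http://www.weather.com.cn',
        'law': 'http://conflictoflaws.net',
    }

    weibo  = get_weibo(url, headers)
    tianqi = get_tianqi(url, headers)
    laws   = get_law(url, headers)
    # Keep the last item produced by get_law.
    for item in laws:
        law = item

    # Merge the three parts into the lines of the report.
    text = oprate(weibo, tianqi, law)
    return text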

def main():
    print('|=========== Collecting data ===========|')
    text = get_text()
    print('|== Collection done, refreshing old data ==|')
    # Delete the previous report only if it exists, so the first run does not crash.
    if os.path.exists('text.txt'):
        os.remove('text.txt')
    time.sleep(3)
    # Open the file once and write one entry per line.
    with open('text.txt', 'a+', encoding='utf-8') as f:
        for each in text:
            f.write(each + '\n')

    print('|=========== Preparing to send ==========|')
    with open('text.txt', 'r', encoding='utf-8') as f:
        string = f.read()
        time.sleep(5)

    # Try to send up to five times, retrying only on SMTPDataError.
    try_max = 1
    while try_max < 6:
        try:
            from_addr = 'xxxx@126.com'
            password = 'xxxx'
            to_addr = ['xxxx@qq.com', 'xxxx@126.com', 'xxxx@qq.com']
            smtp_server = 'smtp.126.com'

            message = MIMEText(string, 'plain', 'utf-8')
            message['From'] = 'xxxx <xxxx@126.com>'
            message['To'] = 'Little Pig <SuperUser@qq.com>'
            # Subject line of the daily bulletin, encoded for non-ASCII text.
            message['Subject'] = Header(u'阿光每日小报', 'utf-8').encode()

            server = smtplib.SMTP(smtp_server, 25)
            server.set_debuglevel(1)
            server.login(from_addr, password)
            server.sendmail(from_addr, to_addr, message.as_string())
            server.quit()
        except smtplib.SMTPDataError:
            print('|== Send failed, retrying (attempt %d) ==|' % try_max)
            try_max += 1
            time.sleep(3)
        else:
            print('|============= Email sent ==============|')
            time.sleep(5)
            break

if __name__ == '__main__':
    main()
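The retry loop in main() could also be factored into a small helper. The following is a minimal sketch under a few assumptions: the name `send_with_retry` and its parameters are hypothetical, it retries on the broader `smtplib.SMTPException` rather than only `SMTPDataError`, and it re-raises the last error when every attempt fails (the loop above simply stops).

```python
import smtplib
import time


def send_with_retry(send, max_attempts=5, delay=3):
    """Call send() until it succeeds or max_attempts is reached.

    `send` is any zero-argument callable that raises smtplib.SMTPException
    on failure. Returns the number of attempts used, or re-raises the last
    error if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send()
        except smtplib.SMTPException:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            time.sleep(delay)
        else:
            return attempt


# Stand-in sender that fails twice before succeeding (no network involved).
calls = []

def flaky_send():
    calls.append(1)
    if len(calls) < 3:
        raise smtplib.SMTPDataError(554, b'rejected')

attempts = send_with_retry(flaky_send, delay=0)
print(attempts)
```

Passing the sending step in as a callable keeps the retry policy separate from the SMTP details, so the same helper can be exercised without a mail server.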

2. Automatically crawl the one-week forecast for the specified cities (Lanzhou, Changsha, Nanjing, Hainan) from China Weather

def get_tianqi(url, headers):
    # Forecast pages are addressed by a numeric city code.
    lanzhou_url = url.get('tianqi') + '/weather/101160101.shtml'
    changsha_url = url.get('tianqi') + '/weather/101250101.shtml'
    nanjing_url = url.get('tianqi') + '/weather/101190101.shtml'
    hainan_url = url.get('tianqi') + '/weather/101310101.shtml'
    url_pool = [lanzhou_url, changsha_url, nanjing_url, hainan_url]
    weathers = []
    for item in url_pool:
        weather = []
        # headers must be passed by keyword; the second positional argument
        # of requests.get is params, not headers.
        html = requests.get(item, headers=headers).content.decode('utf-8')
        soup = BeautifulSoup(html, 'html.parser')
        # Each <li> in this list is one forecast day.
        day_list = soup.find('ul', 't clearfix').find_all('li')
        for day in day_list:
            date = day.find('h1').get_text()
            wea = day.find('p', 'wea').get_text()
            # Fall back to an empty string when no high-temperature <span> is present.
            if day.find('p', 'tem').find('span'):
                hightem = day.find('p', 'tem').find('span').get_text()
            else:
                hightem = ''
            lowtem = day.find('p', 'tem').find('
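The per-city URL construction at the top of get_tianqi could also be written as a table-driven helper, which makes adding a city a one-line change. This is only a sketch: the names `CITY_CODES` and `build_weather_urls` are hypothetical, while the base URL and the four city codes are taken from the code above.

```python
# City codes copied from get_tianqi; the dictionary keys are labels for this sketch.
CITY_CODES = {
    'lanzhou': '101160101',
    'changsha': '101250101',
    'nanjing': '101190101',
    'hainan': '101310101',
}


def build_weather_urls(base_url, city_codes):
    """Return {city: forecast page URL} for every city code given."""
    base = base_url.rstrip('/')  # tolerate a trailing slash in the base URL
    return {
        city: f'{base}/weather/{code}.shtml'
        for city, code in city_codes.items()
    }


urls = build_weather_urls('http://www.weather.com.cn', CITY_CODES)
for city, link in urls.items():
    print(city, link)
```

Keeping the city name next to each URL also makes it possible to label each forecast in the final report.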