Crawler Notes: Scraping Finance News

Latest recommended article published on 2021-11-17 16:23:46

诺依曼.gu

Tags: python, crawler, html

Article link: https://blog.csdn.net/qq_52696994/article/details/117467967

A beginner learning web scraping: scraping finance news
Here comes the code:

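import requests
from bs4 import BeautifulSoup

# Fetch the Sina Finance home page and parse it.
url = 'https://finance.sina.com.cn/'
home = requests.get(url, timeout=10)
home.encoding = 'utf8'
soup = BeautifulSoup(home.text, 'lxml')

# Select the headline links in the main news list.
links = soup.select('.m-p1-m-blk2 .m-p1-mb2-list.m-list-container ul li a')
for link in links:
    title = link.text.strip()
    inner_url = link.get('href', '')  # some anchors may have no href
    # Keep only article pages (.shtml) whose title is longer than 3 characters.
    if inner_url.endswith('shtml') and len(title) > 3:
        print(title, inner_url)
        # Fetch the article page and join the text of its paragraphs.
        page = requests.get(inner_url, timeout=10)
        page.encoding = 'utf8'
        article = BeautifulSoup(page.text, 'lxml')
        paragraphs = article.select('.article p')
        res = ''.join(p.text for p in paragraphs)
        print('News content:', res)
        # Append the article body as one line of the output file.
        with open('caijing.txt', 'a', encoding='utf8') as f:
            f.write(res + '\n')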
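
The link filter in the loop above (ends with shtml, title longer than 3 characters) can also be pulled out into a small helper that is testable without any network access. The following is only a sketch using the standard library. The names normalize_article_url and pick_articles are hypothetical and not part of the original script. It also assumes that some links may be protocol-relative or repeated, which the original code does not handle, and it checks the URL path rather than the whole URL string, so a link with a query string is treated slightly differently than in the original.

```python
from urllib.parse import urljoin, urlparse

BASE_URL = 'https://finance.sina.com.cn/'


def normalize_article_url(href, base=BASE_URL):
    """Return an absolute http(s) URL for href, or None if it is not an article page."""
    if not href:
        return None
    # urljoin resolves protocol-relative ('//host/path') and relative links.
    absolute = urljoin(base, href.strip())
    parsed = urlparse(absolute)
    if parsed.scheme not in ('http', 'https'):
        return None  # e.g. 'javascript:void(0)'
    if not parsed.path.endswith('shtml'):
        return None  # same suffix rule as the script, applied to the path only
    return absolute


def pick_articles(pairs, min_title_len=4):
    """Filter (title, href) pairs, dropping short titles and repeated URLs."""
    seen = set()
    picked = []
    for title, href in pairs:
        title = title.strip()
        url = normalize_article_url(href)
        if url is None or len(title) < min_title_len or url in seen:
            continue
        seen.add(url)
        picked.append((title, url))
    return picked
```

In the scraper, the pairs could be built from the selected anchors, for example [(a.text, a.get('href')) for a in links], and the article requests would then run only over the filtered result.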
