Python Daily Practice 07: Given an HTML File, Find Its Body Text and Links

Latest recommended article published on 2023-05-30 19:32:56

fat_summer

Article link: https://blog.csdn.net/fat_summer/article/details/79132631

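# Given an HTML page, find its body text and links
import requests
from bs4 import BeautifulSoup


def search_body_urls(path):
    # path = 'http://mil.news.sina.com.cn/china/2017-04-05/doc-ifycwymx3854291.shtml'
    page = requests.get(path, timeout=10)
    page.raise_for_status()  # stop early on an HTTP error status
    page.encoding = 'utf-8'  # decode the response body as UTF-8
    soup = BeautifulSoup(page.text, 'html.parser')
    # Body text: the first element with class "content" (empty string if none)
    content = soup.select_one('.content')
    article = content.text if content else ''
    # Links: only <a> tags that actually carry an href attribute
    urls = soup.find_all('a', href=True)
    for u in urls:
        print(u['href'])
    print(article)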

if __name__ == '__main__':
    search_body_urls(path='http://mil.news.sina.com.cn/china/2017-04-05/doc-ifycwymx3854291.shtml')
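
The function above fetches the page over the network and prints its results, which makes it awkward to run against a local HTML file or to check without a connection. Below is a minimal sketch of the same idea using only the standard library's `html.parser` in place of BeautifulSoup. The names `BodyAndLinkParser` and `extract_body_and_links` and the `base_url` parameter are hypothetical and not part of the original exercise. The sketch assumes the body element's tags are properly closed, so unbalanced markup may give wrong text boundaries.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'}


class BodyAndLinkParser(HTMLParser):
    """Collect href values of <a> tags and the text inside the first
    element whose class list contains `body_class` (hypothetical helper)."""

    def __init__(self, body_class='content', base_url=''):
        super().__init__()
        self.body_class = body_class
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self._depth = 0     # nesting depth inside the body element, 0 = outside
        self._done = False  # True once the first body element has closed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a' and attrs.get('href'):
            # Resolve relative links against base_url (unchanged if base_url is '')
            self.links.append(urljoin(self.base_url, attrs['href']))
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif not self._done and self.body_class in (attrs.get('class') or '').split():
            self._depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                self._done = True

    def handle_data(self, data):
        if self._depth:
            self.text_parts.append(data)


def extract_body_and_links(html, body_class='content', base_url=''):
    """Return (body_text, links) for an HTML string."""
    parser = BodyAndLinkParser(body_class, base_url)
    parser.feed(html)
    parser.close()
    return ''.join(parser.text_parts).strip(), parser.links
```

For a file saved on disk, the text can be read with `open(..., encoding='utf-8')` and passed straight to `extract_body_and_links`. Returning the values instead of printing them leaves it to the caller to decide what to do with the body text and the link list.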