A Small Use of BeautifulSoup

一刀不二

Category: [Python]

Article link: https://blog.csdn.net/pandora_madara/article/details/50644635

I used BeautifulSoup to scrape the lecture slides for UC Berkeley's Programming Languages and Compilers course.

import re
import requests
from bs4 import BeautifulSoup

BASE_URL = "http://inst.eecs.berkeley.edu/~cs164/fa11/lectures/"

r = requests.get(BASE_URL + "index.html")
r.raise_for_status()  # stop early if the index page could not be fetched
soup = BeautifulSoup(r.text, "html.parser")

# Match links such as "lecture12.pdf"; the dot is escaped so it only matches a literal "."
for elem in soup.find_all("a", href=re.compile(r"lecture[0-9]*\.pdf")):
    # Use the text of the previous sibling of the link's parent as the title,
    # replacing each ":" with a space (str.join replaces reduce, which is not a builtin in Python 3)
    title = " ".join(elem.find_parent().find_previous_sibling().get_text().split(":"))
    file_name = elem["href"][:-4] + "-" + title + ".pdf"
    file_url = BASE_URL + elem["href"]
    # Stream the download so the whole PDF is never held in memory at once
    file_get = requests.get(file_url, stream=True)
    with open(file_name, "wb") as f:
        for chunk in file_get.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
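
The script above builds each file name directly from text on the page, so a title containing a character such as "/" could produce a name the file system rejects. A minimal sketch of one way to guard against that, assuming a hypothetical helper `build_file_name` (not part of the original script) and a made-up example title:

```python
import re


def build_file_name(href, title):
    """Combine a link target such as "lecture1.pdf" with a title into a file name.

    Hypothetical helper for illustration; not part of the original script.
    """
    # Drop the ".pdf" extension from the link target if it is present
    stem = href[:-4] if href.lower().endswith(".pdf") else href
    # Replace characters that are not allowed in file names on common systems
    cleaned = re.sub(r'[\\/:*?"<>|]', " ", title)
    # Collapse runs of whitespace and trim both ends
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return stem + "-" + cleaned + ".pdf"


# The title below is an invented example, not taken from the course page
print(build_file_name("lecture1.pdf", "Lecture 1: Introduction"))
```

In the loop, `file_name` could then be computed as `build_file_name(elem["href"], title_text)`, where `title_text` is the raw text taken from the page.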
