# 一、获得银行官网网址信息 (Part 1: obtain the bank official-website URL information)
from urllib import request
from urllib.request import urlopen
import re
# CBRC (China Banking Regulatory Commission) page listing regulated banking
# institutions — the page this script scrapes and caches locally.
url = 'http://www.cbrc.gov.cn/chinese/jrjg/index.html'
def get_content(url, fileName):
    """Fetch *url* and cache the page HTML to the local file *fileName*.

    The CBRC server may temporarily block our IP after repeated crawling,
    so a single successful fetch is saved to disk and reused afterwards.

    :param url: page URL to download
    :param fileName: path of the local file the HTML is written to
    :return: None; on fetch failure only an error message is printed
    """
    try:
        # A browser-like User-Agent reduces the chance of being rejected.
        headers = {'User-agent': 'Chrome/23.0'}
        req = request.Request(url, headers=headers)
        with urlopen(req) as urlObj:
            content = urlObj.read().decode('utf-8')
    except Exception as Error:
        print('爬取网页信息失败', Error)
    else:
        # Write with explicit UTF-8: the content was decoded as UTF-8 above,
        # and the platform default encoding (e.g. GBK on Windows) would raise
        # UnicodeEncodeError on the Chinese page text.
        with open(fileName, 'w', encoding='utf-8') as f:
            f.write(content)
            print('write success')
def get_file_content(fileName, url):
    """Return the cached page text with all tab characters removed.

    Fetches *url* into *fileName* first (via get_content), then reads the
    cached file back.

    :param fileName: path of the local cache file
    :param url: page URL passed through to get_content
    :return: file contents as str, with every '\t' stripped out
    """
    get_content(url, fileName)
    # Read with explicit UTF-8 to match the encoding of the fetched page;
    # relying on the platform default can mis-decode the Chinese text.
    with open(fileName, encoding='utf-8') as f:
        return f.read().replace('\t', '')  # strip the many tabs in the markup
def get_bank_info(filName,url,New_filename):
# <a href="http://www.jcfc.cn/" target="_blank" style="color:#08619D">
# 晋商消费金融股份有限公司
content = get_file_content(filName,url)
bank_infor = re.findall(r'<a