Python | A Simple Web Scraper

Latest recommended article published on 2023-07-24 11:47:44

YuSaMa_

Category: Python. Tags: python, web scraping

Article link: https://blog.csdn.net/qq_41009581/article/details/116807868

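Preface

The data scraped here is the Douban Movie Top 250 list.
The Python libraries used are requests, BeautifulSoup (from bs4), and pandas.
requests fetches the page, BeautifulSoup parses the returned HTML, and pandas saves the extracted data in Excel and CSV formats.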
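
The parse and save steps described above can be sketched on a small inline HTML sample. This is only an illustration under stated assumptions: the sample markup, the class names `item`, `title`, and `rating_num`, and the helper names `parse_movies` and `to_csv_text` are hypothetical and are not taken from the original code, and the real page structure may differ.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample markup used only for this sketch;
# the real Douban page structure may differ.
SAMPLE_HTML = """
<ol>
  <li><div class="item">
    <span class="title">Movie A</span>
    <span class="rating_num">9.7</span>
  </div></li>
  <li><div class="item">
    <span class="title">Movie B</span>
    <span class="rating_num">9.6</span>
  </div></li>
</ol>
"""


def parse_movies(html):
    """Return one dict per movie entry found in the HTML (assumed layout)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("div", class_="item"):
        title = item.find("span", class_="title")
        rating = item.find("span", class_="rating_num")
        if title is None or rating is None:
            continue  # skip entries that do not match the assumed layout
        rows.append({
            "title": title.get_text(strip=True),
            "rating": float(rating.get_text(strip=True)),
        })
    return rows


def to_csv_text(rows):
    """Build a DataFrame from the parsed rows and return it as CSV text."""
    df = pd.DataFrame(rows, columns=["title", "rating"])
    return df.to_csv(index=False)  # with no path given, to_csv returns a string


movies = parse_movies(SAMPLE_HTML)
csv_text = to_csv_text(movies)
```

To write files instead of returning text, the same DataFrame could be saved with `df.to_csv("movies.csv", index=False, encoding="utf-8-sig")` or `df.to_excel("movies.xlsx", index=False)`. The file names are placeholders, and writing `.xlsx` files requires an Excel engine such as openpyxl to be installed.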

Scraper code

import requests  # fetch the page
from bs4 import BeautifulSoup  # parse the HTML
import pandas as pd  # save the data

# Main function; ties the other functions together
def main():
    url = "https://movie.douban.com/top250?start="
    html = getUrlData(url)
    getHtmlData(html, url)

# Fetch the page content
def getUrlData(url):
    try:
        # Some sites reject crawler traffic, so send a browser-like User-Agent header
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36"
        }
        r = requests.get(url, headers=headers, timeout=30)
        r.raise_for_status()  # must be called; the bare attribute does nothing
        r.encoding = r.apparent_encoding
        html =