day1: HTML and requests

Latest recommended article published on 2024-08-11 02:49:42

南风๑

Tags: html python json

Article link: https://blog.csdn.net/qq_57803101/article/details/128538317

1. Basic usage of requests

Use 1: scraping web page data

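import requests

# 1) Send a request to the web page and get the data
response = requests.get('https://movie.douban.com/top250')

# 2) Set the text encoding (if the printed result is garbled, set this to the page's own encoding)
response.encoding = 'utf-8'

# 3) Get the page content (the page source)
result = response.text
print(result)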
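
Why step 2 matters: `response.text` is the response bytes decoded with `response.encoding`. When a text response carries no charset, requests falls back to ISO-8859-1, which garbles Chinese pages. The sketch below reproduces that effect offline with plain bytes; the sample string is made up for illustration.

```python
# Bytes as a server might send them for a UTF-8 page (sample text is illustrative).
raw = '豆瓣电影 Top 250'.encode('utf-8')

# Decoding with the wrong codec does not raise an error, it silently produces garbled text.
garbled = raw.decode('iso-8859-1')

# Decoding with the page's real encoding recovers the text.
correct = raw.decode('utf-8')

print(garbled)
print(correct)
```

Setting `response.encoding = 'utf-8'` before reading `response.text` is the requests equivalent of choosing the right codec here.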

Use 2: requesting API data

# 1) Send a request to the API
response = requests.get('https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js')

# 2) Parse the response as JSON (convert the JSON data to the corresponding Python data)
result = response.json()
print(result)
for i in result['hero']:
    print(i['name'], i['goldPrice'])
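
`response.json()` does essentially the same job as parsing the response body with `json.loads`. Here is a minimal offline sketch, assuming a payload shaped like the hero list above. The two entries and their prices are made-up sample data, not real API output, and `hero_prices` is a hypothetical helper name.

```python
import json

# Made-up sample imitating the structure used above:
# {"hero": [{"name": ..., "goldPrice": ...}, ...]}
sample_body = (
    '{"hero": [{"name": "Hero A", "goldPrice": "4800"}, '
    '{"name": "Hero B", "goldPrice": "450"}]}'
)


def hero_prices(body):
    """Parse a JSON body and return a list of (name, goldPrice) pairs."""
    data = json.loads(body)
    # .get() with a default avoids a KeyError when the 'hero' key is missing.
    return [(h['name'], h['goldPrice']) for h in data.get('hero', [])]


print(hero_prices(sample_body))
```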

Use 3: downloading images, video, audio, and other files

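# 1) Send a request to the image, video, or audio URL
response = requests.get('https://iknow-pic.cdn.bcebos.com/4bed2e738bd4b31c6c83b42895d6277f9f2ff8e2')

# 2) Get the result of the request (the raw bytes of the video, audio, or image)
result = response.content

# 3) Save the bytes to a file (the files folder must already exist)
with open('files/a.jpg', 'wb') as f:
    f.write(result)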
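
Opening a file in `'wb'` mode fails if its folder does not exist yet. A small sketch of a helper that creates the folder first; `save_bytes` is a hypothetical name introduced here, not part of requests.

```python
from pathlib import Path


def save_bytes(data, path):
    """Write bytes to path, creating any missing parent folders first."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

With the download above it would be called as `save_bytes(response.content, 'files/a.jpg')`.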

Supplement: JSON data

JSON is a general-purpose text data format (XML is another general-purpose text data format).

1) Getting to know JSON

Requirement: a JSON document contains exactly one value, and that value must be of a type JSON supports.

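Data types supported by JSON
Type	Format
Number	written as in mathematics
String	double quotes only
Boolean	true, false
Null	null
Array (list)	[value1, value2, value3, …]
Object (dictionary)	{key1: value1, key2: value2, …}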

Note: the keys of a JSON object can only be strings; the values can be data of any type.
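
These rules can be checked with Python's `json` module, which raises `json.JSONDecodeError` on invalid input. A minimal sketch; `is_valid_json` is a hypothetical helper written for this note.

```python
import json


def is_valid_json(text):
    """Return True if text is exactly one valid JSON value, else False."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json('"abc"'))        # True: double-quoted string
print(is_valid_json("'abc'"))        # False: single quotes are not allowed
print(is_valid_json('{"a": 1}'))     # True: the key is a string
print(is_valid_json('{1: 2}'))       # False: keys must be strings
print(is_valid_json('[1, 2] [3]'))   # False: more than one top-level value
print(is_valid_json('True'))         # False: JSON booleans are lowercase
```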

2) Converting between Python and JSON

Python's json module provides functions for working with JSON data.

import json

# a. Convert JSON data to the corresponding Python data
# JSON string      ->      Python str
# JSON number      ->      Python int or float
# null             ->      None
# true, false      ->      True, False
# JSON array       ->      Python list
# JSON object      ->      Python dict
# json.loads(JSON-format string)    -    converts the given JSON data to the corresponding Python data
# JSON-format string: a string whose content is a valid JSON value
result = json.loads('"hello"')
print(result, type(result))     # hello <class 'str'>

result = json.loads('200')
print(result, type(result))     # 200 <class 'int'>

result = json.loads('[10, "asd", null, false]')
print(result, type(result))     # [10, 'asd', None, False] <class 'list'>

# b. Convert Python data to the corresponding JSON-format data
# json.dumps(Python data)   -   converts Python data to the corresponding JSON-format string
json.dumps(100)         # '100'
json.dumps('asd')       # '"asd"'
json.dumps(True)        # 'true'
print(json.dumps({'name': '张三', 'age': 19, '已婚': False, 12: 21}))
# {"name": "\u5f20\u4e09", "age": 19, "\u5df2\u5a5a": false, "12": 21}
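
Two details of the `json.dumps` output above are worth a closer look: non-ASCII characters are escaped by default, and the integer key `12` is written as the string `"12"`. A short sketch using the standard `ensure_ascii` parameter; the `person` dictionary is a trimmed-down copy of the example above.

```python
import json

person = {'name': '张三', 12: 21}

escaped = json.dumps(person)                        # non-ASCII characters become \uXXXX escapes
readable = json.dumps(person, ensure_ascii=False)   # keep the characters as they are

print(escaped)
print(readable)

# The integer key 12 was written as the string "12", so it comes back as a string.
restored = json.loads(readable)
print(restored)
```

Because of the key conversion, a dictionary with non-string keys does not survive a `dumps`/`loads` round trip unchanged.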

2. Exercise

Exercise: download every hero's selection audio, naming each audio file after the hero.

import requests


def get_all_audio():
    response = requests.get('https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js')
    result = response.json()
    urls = []
    for i in result['hero']:
        url = i['selectAudio']
        name = i['name']
        urls.append((url, name))
    return urls


def download_audio(hero):
    url = hero[0]
    name = hero[1]
    response = requests.get(url)
    result = response.content
    with open(f'files/{name}.ogg', 'wb') as f:
        f.write(result)
        print(f'{name} downloaded successfully')


if __name__ == '__main__':
    for x in get_all_audio():
        download_audio(x)
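
The script above assumes that the files folder exists and that every hero has an audio URL. Below is a hedged variant that creates the folder and skips empty URLs. Whether the real data contains empty URLs is an assumption. `download_all` and its `fetch` parameter are hypothetical names; passing the download function in as an argument lets the logic be tried without a network connection.

```python
from pathlib import Path


def download_all(heroes, fetch, out_dir='files'):
    """Save each hero's audio as <out_dir>/<name>.ogg and return the saved paths.

    heroes: iterable of (url, name) pairs, as returned by get_all_audio()
    fetch:  callable that takes a URL and returns the file's bytes
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)   # make sure the folder exists
    saved = []
    for url, name in heroes:
        if not url:                          # skip entries without an audio URL
            continue
        path = out / f'{name}.ogg'
        path.write_bytes(fetch(url))
        saved.append(path)
    return saved
```

With requests it could be called as `download_all(get_all_audio(), lambda url: requests.get(url).content)`.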

3. Anti-scraping: posing as a browser

import requests
from re import findall

# 1) Send a request to the page (disguised as a browser)
headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36'
}
response = requests.get('https://movie.douban.com/top250', headers=headers)
# print(response)     # <Response [418]> means the request was rejected, as happens without the user-agent header

# 2) Get the page content
result = response.text
# print(result)

# Chinese titles of all the movies
names = findall(r'<img width="100" alt="(\w+)"', response.text)

# Ratings of all the movies
scores = findall(r'property="v:average">(\d+\.?\d*)</span>', response.text)

result = map(lambda i1, i2: {'title': i1, 'score': i2}, names, scores)
print(list(result))
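
One caveat with the title pattern: `\w+` does not match punctuation such as `·`, so a title containing it is skipped, and `map` then pairs the remaining titles with the wrong scores without raising an error. The offline sketch below compares it with `[^"]+`. The HTML snippet and its scores are made-up sample data that only imitate the attributes the patterns rely on.

```python
from re import findall

# Hypothetical snippet imitating the attributes matched above (sample data only).
html = '''
<img width="100" alt="肖申克的救赎" src="a.jpg">
<span class="rating_num" property="v:average">9.7</span>
<img width="100" alt="哈利·波特与魔法石" src="b.jpg">
<span class="rating_num" property="v:average">9.2</span>
'''

# \w+ misses the second title because '·' is not a word character.
strict = findall(r'<img width="100" alt="(\w+)"', html)

# [^"]+ accepts any characters up to the closing quote.
loose = findall(r'<img width="100" alt="([^"]+)"', html)

scores = findall(r'property="v:average">(\d+\.?\d*)</span>', html)

# zip pairs titles with scores, the same job as the map/lambda above.
movies = [{'title': t, 'score': s} for t, s in zip(loose, scores)]
print(strict)
print(movies)
```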