[Fun Python Check-in] Python: Calling the Baidu Face Recognition API to Score Looks

xxjcyh

Article link: https://blog.csdn.net/weixin_37888958/article/details/88142324


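Today I'm following 罗罗攀 (WeChat official account: luoluopan1) through his tutorial "Python有趣|寻找知乎最美小姐姐" (Fun Python | Finding the prettiest girls on Zhihu).

I joined 罗罗攀's Python check-in challenge and it's great fun, so I'm recommending it to everyone. Original article: https://mp.weixin.qq.com/s/M64NBbAFglxscPOvuz0r-w
This post is for learning purposes only~~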

Target page: https://www.zhihu.com/question/295119062
Goal: scrape the girls' photos from the answers on this page, then use the Baidu face recognition API to give each one a looks score.

Analyzing the page

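The page uses asynchronous loading: keep scrolling and more and more answers appear. Find the request that loads the new answers as you scroll (for example in the Network panel of the browser's developer tools).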
Take a look at that request's URL.

I usually copy several of these URLs into a txt file and compare them.

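Put side by side, it's easy to see that apart from offset, which increases by 5 each time, everything else stays the same. So the URLs are easy to construct: just keep looping over offset. Super easy, right~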
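The offset loop can be sketched with the standard library's urlencode. This is a minimal sketch, assuming a hypothetical helper name build_answer_urls and a shortened include value (the real request carries a much longer include string, as in the crawler below); only limit and offset matter for paging.

```python
from urllib.parse import urlencode


def build_answer_urls(question_id, pages, limit=5, include='data[*].content'):
    """Build one answers-API URL per page; only offset changes, stepping by limit.

    The helper name and the short include value are illustrative assumptions.
    """
    base = 'https://www.zhihu.com/api/v4/questions/{}/answers'.format(question_id)
    urls = []
    for page in range(pages):
        query = {
            'include': include,
            'limit': limit,
            'offset': page * limit,  # 0, 5, 10, ...
            'platform': 'desktop',
            'sort_by': 'default',
        }
        urls.append(base + '?' + urlencode(query))
    return urls


for u in build_answer_urls(295119062, 3):
    print(u)
```

urlencode percent-escapes [ ] * and commas the same way the browser URL does, so pasting in the full include string yields URLs of the same shape as the copied ones.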
Pick any one of these URLs and open it~
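The response is JSON. Here's a handy little tool: "https://www.json.cn" (thanks to CYH for recommending it, hahaha). Just paste the JSON into that page and it becomes much easier to read.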
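If you'd rather not paste the data into a website, Python's built-in json module can do the same formatting locally. A small sketch; the helper name pretty and the sample string are made up for illustration.

```python
import json


def pretty(raw_text):
    """Return the JSON text re-indented, keeping Chinese characters readable."""
    # ensure_ascii=False keeps 小明 as-is instead of \u escapes
    return json.dumps(json.loads(raw_text), indent=2, ensure_ascii=False)


sample = '{"data": [{"author": {"name": "小明"}, "content": "<p>hi</p>"}]}'
print(pretty(sample))
```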

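Looking at the file, each answer's body is in the content field of the entries under data. We also want some other information, such as the Zhihu username (under data → author → name). And the most important part, of course, is the pictures: all the image links are inside content.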
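To check those field paths before writing the full crawler, here is a minimal sketch that pulls the username and image links out of one page of the response, using the same img src regular expression as the crawler. The helper extract_answers and the sample data are hypothetical, shaped only like the fields described above.

```python
import json
import re


def extract_answers(json_text):
    """Return (username, [image URLs]) for each answer in one page of the response."""
    results = []
    for data in json.loads(json_text)['data']:
        name = data['author']['name']                                # data -> author -> name
        imgs = re.findall('img src="(.*?)"', data['content'], re.S)  # image links inside content
        results.append((name, imgs))
    return results


# made-up response with the same shape as the real one
sample = json.dumps({'data': [
    {'author': {'name': 'alice'},
     'content': '<p>hi</p><img src="https://example.com/a.jpg"><img src="https://example.com/b.png">'},
    {'author': {'name': 'bob'}, 'content': '<p>no photos</p>'},
]})

for name, imgs in extract_answers(sample):
    print(name, imgs)
```

An answer without photos simply yields an empty list, which is why the crawler needs no special case for it.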
Now we know what these URLs are and where the data we want sits in the response, so let's start scraping!

Scraping the page

import os
import re
import json
import time
import requests

headers={'user-agent':'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Mobile Safari/537.36',
        'cookie':'fill in your own cookie'}

save_dir = 'folder to save the photos in'   # change to your own folder

def get_img(url):
    res = requests.get(url,headers = headers)
    i = 1                                   # numbers the photos saved from this page
    json_data = json.loads(res.text)
    datas = json_data['data']               # one entry per answer
    for data in datas:
        Id = data['author']['name']         # Zhihu username
        content = data['content']           # HTML body of the answer
        imgs = re.findall('img src="(.*?)"',content,re.S)
        # some answers post no photos: imgs is then empty and the loop below is skipped
        for img in imgs:
            if 'jpg' in img:
                res_img = requests.get(img,headers=headers)
                # the with-block closes the file once it has been written
                with open(os.path.join(save_dir, Id + '+' + str(i) + '.jpg'),'wb') as fp:
                    fp.write(res_img.content)
                i = i+1    # some girls posted lots of photos, emmm, and memes
                print(Id,img)   # the username (Id), not the built-in id

if __name__ =="__main__":
    # note: the question ID in this URL (29024583) differs from the page linked above (295119062);
    # use the ID of the question you actually want to crawl
    urls =['https://www.zhihu.com/api/v4/questions/29024583/answers?include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Crelevant_info%2Cquestion%2Cexcerpt%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cis_labeled%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cbadge%5B%2A%5D.topics&limit=5&offset={}&platform=desktop&sort_by=default'.format(
        str(i)) for i in range(0,25000,5)]
    for url in urls:
        get_img(url)
        time.sleep(2)   # pause between pages so we don't hammer the server

The face recognition API

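We now have a pile of pretty photos of the girls (and I secretly slipped in a few photos of my best buddies, hahahaha, to score their looks too). But there are also some guys in the pictures (don't want them), plus all kinds of memes! With this many photos there's no way to sort them by hand, so we can use Baidu's face recognition API to filter and score them (the scoring is the part I've been looking forward to most)~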
Now it's time to call the API.
Baidu Face Recognition: http://ai.baidu.com/tech/face
Face recognition docs: https://ai.baidu.com/docs#/Face-Detect-V3/top
Just follow the documentation step by step.
Step 1: create an application
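Once the application is created, you get an API Key and a Secret Key. They work like your pass: you can't call the API without them.

Step 2: according to the face recognition docs, the first thing to do is use the API Key and Secret Key to obtain an access token.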

import requests

ak = 'your own API Key'
sk = 'your own Secret Key'
host = 'https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id={}&client_secret={}'.format(ak,sk)
res = requests.post(host)
print(res.text)   # the returned JSON holds the token in its access_token field
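The token request returns JSON, and the token itself sits in its access_token field. A minimal sketch, assuming a hypothetical helper name and a made-up response string, that extracts the token and attaches it to the v3 detect endpoint as the access_token query parameter:

```python
import json
from urllib.parse import urlencode

DETECT_URL = 'https://aip.baidubce.com/rest/2.0/face/v3/detect'


def detect_url_from_token_response(token_json):
    """Pull access_token out of the OAuth response and append it to the detect endpoint."""
    token = json.loads(token_json)['access_token']
    return DETECT_URL + '?' + urlencode({'access_token': token})


# made-up response text; a real one is what print(res.text) shows
fake = '{"access_token": "24.abc123"}'
print(detect_url_from_token_response(fake))
# → https://aip.baidubce.com/rest/2.0/face/v3/detect?access_token=24.abc123
```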

With this token you can send requests to the API.
Here's the code (I recommend writing it with the developer docs open next to you):

import base64
import json
import requests

token = 'the token you just got'

def get_img_base(file):
    # read the image file and return its Base64 encoding
    with open(file,'rb') as fp:
        content = base64.b64encode(fp.read())
        return content

requests_url = "https://aip.baidubce.com/rest/2.0/face/v3/detect"
requests_url = requests_url+'?access_token=' +token

params ={
        'image':get_img_base(r'C:\Users\xxj\Desktop\test.jpg'),
        'image_type':'BASE64',
        'face_field':'age,beauty,gender'    # ask for age, beauty score and gender
        }

res = requests.post(requests_url,data = params)
result = res.text
json_result = json.loads(result)

code = json_result['error_code']            # 0 means success
gender = json_result['result']['face_list'][0]['gender']['type']
beauty = json_result['result']['face_list'][0]['beauty']
print(code,gender,beauty)
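The code above assumes a face was found; for a meme or a landscape photo, result may be null or lack a face_list, and the indexing fails. A defensive sketch, assuming success is signalled by error_code 0 and the response layout used above; parse_face and the sample dicts are hypothetical:

```python
def parse_face(json_result):
    """Return (gender, beauty) of the first face, or None if detection failed.

    Assumes the v3 detect layout used above:
    error_code == 0 on success, and result -> face_list -> [{gender: {type}, beauty}].
    """
    if json_result.get('error_code') != 0:
        return None                      # e.g. no face in the picture
    result = json_result.get('result')
    if not result or not result.get('face_list'):
        return None                      # success code but nothing usable
    face = result['face_list'][0]
    return face['gender']['type'], face['beauty']


ok = {'error_code': 0,
      'result': {'face_list': [{'gender': {'type': 'female'}, 'beauty': 80.5}]}}
no_face = {'error_code': 222202, 'result': None}
print(parse_face(ok), parse_face(no_face))  # → ('female', 80.5) None
```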

Let's use my 居居老师 as the example, hahahahaha, and see how many points my divinely handsome 居老师 gets.

Rubbish! How can it be only 73.41? I'm skeptical, it should be 100 points!!!!!
Let's try my 丽颖 too.
Truly divine looks!!! Hahahahaha

Putting it all together

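Finally, we call the API to filter out photos that aren't of girls or don't contain a person at all, score the girls' photos, and sort them into different folders by score band.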
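The grading rule boils down to: divide Baidu's 0 to 100 beauty score by 10, round to one decimal, and pick the first threshold (8, 7, 6, 5) it reaches. Here is a small sketch of just that rule, with a hypothetical function name; the folder names are the ones used in the code below (8分 means "8 points", and 哎 is a sigh for everything under 5). With 居老师's 73.41 from above, it gives 7.3 and the 7分 folder.

```python
def score_folder(beauty):
    """Map a 0-100 beauty score to (score out of 10, folder name)."""
    new_beauty = round(beauty / 10, 1)
    # check the bands from highest to lowest; the first one reached wins
    for threshold in (8, 7, 6, 5):
        if new_beauty >= threshold:
            return new_beauty, '{}分'.format(threshold)
    return new_beauty, '哎'


print(score_folder(73.41))  # → (7.3, '7分')
```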

import requests
import os
import base64
import json
import time

def get_img_base(file):
    # read the image file and return its Base64 encoding
    with open(file,'rb') as fp:
        content = base64.b64encode(fp.read())
        return content

token = ''   # fill in the access token obtained earlier
requests_url = "https://aip.baidubce.com/rest/2.0/face/v3/detect"
requests_url = requests_url+'?access_token=' +token

# change both paths to your own; the score folders (8分, 7分, 6分, 5分, 哎) must already exist under out_path
file_path = 'C:/Users/Desktop/test'   # folder holding the downloaded photos
out_path = 'C:/Users/Desktop'

list_paths = os.listdir(file_path)
for list_path in list_paths:
    img_path = os.path.join(file_path, list_path)

    params ={
            'image':get_img_base(img_path),
            'image_type':'BASE64',
            'face_field':'age,beauty,gender'
            }

    res = requests.post(requests_url,data = params)
    json_result = json.loads(res.text)
    time.sleep(1)   # pause after every request, not only successful ones, to stay under the rate limit

    code = json_result['error_code']
    if code == 222202:   # no face in the picture (memes, scenery, ...)
        continue
    try:
        face = json_result['result']['face_list'][0]
        if face['gender']['type'] == 'male':   # skip the guys
            continue
        new_beauty = round(face['beauty']/10,1)   # 0-100 score -> 0-10 scale
        print(img_path,new_beauty)
        if new_beauty >= 8:
            folder = '8分'
        elif new_beauty >= 7:
            folder = '7分'
        elif new_beauty >= 6:
            folder = '6分'
        elif new_beauty >= 5:
            folder = '5分'
        else:
            folder = '哎'
        # move the photo into its score folder, prefixing the file name with the score
        os.rename(img_path, os.path.join(out_path, folder, str(new_beauty) +'+'+ list_path))
    except (KeyError, TypeError, IndexError):
        # result is None or fields are missing when detection fails for another reason
        pass