1. Preface
We recently needed to build a demo with image recognition and image annotation capabilities. The team had no prior experience with image recognition, so building it from scratch on our own was not realistic. After surveying the image recognition products and third-party capabilities on the market, we settled on Cloud Vision, Amazon Rekognition, and Gemini 1.5 to put together a quick image annotation demo.
1.1 Third-party image annotation capabilities
- Cloud Vision AI
  Image annotation: detects faces, physical objects, and classification labels in an image, and returns the coordinates of each labeled object.
- Amazon Rekognition
  Celebrity recognition: identifies celebrities in an image and returns their coordinates. A knowledge-base lookup then supplies each celebrity's bio and an external link to it.
- Gemini 1.5
  Image analysis: identifies image content, analyzes the labeled objects, celebrities, and scene it contains, and produces an overall description. It also accepts questions, e.g. "How many men and how many women are in the image?" or "Has a fire broken out in the image, and how many ignition points are there?", and tailors the analysis to them. Without a question, it analyzes the image freely and summarizes what it depicts.
1.2 The finished demo looks like this
Face detection and celebrity recognition:
Object detection and annotation:
Image analysis result:
2 Cloud Vision AI Image Annotation
2.1 Cloud Vision AI product documentation
https://cloud.google.com/vision/docs?hl=zh-cn
2.2 Cloud Vision AI try-it-out page
https://cloud.google.com/vision/docs/drag-and-drop?hl=zh-cn
2.3 Key features
Includes face detection, object detection, OCR text detection, document text detection, landmark detection, logo detection, explicit-content (SafeSearch) detection, and more.
- Face detection
  Locates faces with bounding polygons and identifies specific facial "landmarks" (eyes, ears, nose, mouth, etc.) with confidence values.
  Returns emotions (joy, sorrow, anger, surprise) with a likelihood score for each.
- Object detection
  Detects multiple objects in an image and assigns each a general label (e.g. car, person, building), a bounding-box annotation, and a confidence value.
  For each detected object it returns a text description, a confidence score, and the normalized vertices [0,1] of the object's bounding polygon.
  The polygon vertices can be used to draw the boundary onto the image, which is how the annotation effect is achieved.
- Text detection (TEXT_DETECTION)
  Image-based optical character recognition (OCR) and text conversion; detects and extracts UTF-8 text from images.
- Document text detection (DOCUMENT_TEXT_DETECTION)
  OCR for files (PDF/TIFF) or images with dense text; dense-text recognition and conversion to machine-encoded text.
  DOCUMENT_TEXT_DETECTION is recommended over TEXT_DETECTION.
- Landmark detection
  Returns the landmark's name, a confidence score, and the normalized vertices of its bounding polygon.
  Examples: the Eiffel Tower, CITIC Tower, Canton Tower, the Oriental Pearl Tower.
- Logo detection
  Returns the logo's name, a confidence score, and the normalized vertices of its bounding polygon.
  Examples: Baidu, Google, BYD.
- Explicit content detection
  Returns likelihood scores for the categories adult, spoof, medical, violence, and racy.
- Label detection
  Returns generalized labels for the image.
  For each label, the system returns a text description, a confidence score, and a topicality score.
- Image properties
  Returns the dominant colors of the image.
  Each color is expressed in the RGBA color space with a confidence score and the fraction [0,1] of pixels it covers.
- Crop hints detection
  Returns the bounding polygon of the cropped image, a confidence score, and the importance fraction of this salient region relative to the original image.
- Web entities and pages
  Returns web content related to the image.
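Object detection returns bounding polygons as normalized vertices in [0,1], so drawing them onto the image means scaling each vertex by the image's pixel dimensions. A minimal sketch of that conversion (the class and method names here are my own, not from the Vision SDK):

```java
public class VertexScaler {
    /**
     * Scale one normalized vertex (x, y each in [0,1]) to pixel coordinates
     * for an image of the given size. Returns {pixelX, pixelY}.
     */
    public static int[] toPixel(float normX, float normY, int imageWidth, int imageHeight) {
        return new int[]{
            Math.round(normX * imageWidth),
            Math.round(normY * imageHeight)
        };
    }
}
```

Applying this to each entry of a polygon's `normalizedVertices` yields the pixel-space polygon to draw.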
2.4 Create a Google Cloud API key
In the Google Cloud console, open the project selector drop-down and create a new project. The project name is up to you; here it is ImageRecognition.
Switch to the new ImageRecognition project and, under Quick access, click APIs & Services. On the APIs & Services page, click Enable APIs and Services, search for Cloud Vision, open the service, and click Enable.
Open the OAuth consent page and configure the OAuth consent screen as guided.
Open the Credentials page and create an API key. Edit the key to set API restrictions; for risk control, it is worth limiting which APIs the key can call.
This API key is then used to call the Cloud Vision API.
2.5 Image annotation API reference
2.6 Maven dependency
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-vision</artifactId>
<version>3.44.0</version>
</dependency>
2.7 Java implementation
Image annotation service interface
import com.niaonao.response.BatchAnnotateImagesResponse;
public interface IGoogleVisionService {
/**
 * BatchAnnotateImagesResponse: the Google SDK is not used here;
 * this response type is a custom entity of our own.
 */
BatchAnnotateImagesResponse imageAnnotate(String imageUrl);
}
Image annotation service implementation
import com.alibaba.fastjson.TypeReference;
import com.niaonao.response.BatchAnnotateImagesResponse;
import com.niaonao.service.IGoogleVisionService;
import com.niaonao.util.HttpClient;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
import java.util.*;
/**
 *
 * TYPE_UNSPECIFIED        Unspecified feature type.
 * FACE_DETECTION          Run face detection.
 * LANDMARK_DETECTION      Run landmark detection.
 * LOGO_DETECTION          Run logo detection.
 * LABEL_DETECTION         Run label detection.
 * TEXT_DETECTION          Run text detection / optical character recognition (OCR).
 *                         Optimized for areas of text within a larger image;
 *                         if the image is a document, use DOCUMENT_TEXT_DETECTION instead.
 * DOCUMENT_TEXT_DETECTION Run dense-text document OCR. Takes precedence when both
 *                         DOCUMENT_TEXT_DETECTION and TEXT_DETECTION are present.
 * SAFE_SEARCH_DETECTION   Run SafeSearch to detect potentially unsafe or undesirable content.
 * IMAGE_PROPERTIES        Compute a set of image properties, such as the image's dominant colors.
 * CROP_HINTS              Run crop hints.
 * WEB_DETECTION           Run web detection.
 * PRODUCT_SEARCH          Run product search.
 * OBJECT_LOCALIZATION     Run localizer for object detection.
 */
@Service
@Slf4j
public class GoogleVisionServiceImpl implements IGoogleVisionService {
private final static String URL = "https://vision.googleapis.com/v1p4beta1/images:annotate?key=";
private final static String KEY = "replace with your API key";
@Override
public BatchAnnotateImagesResponse imageAnnotate(String imageUrl) {
try {
Map<String, String> headers = new HashMap<>();
headers.put("Content-Type", "application/json");
Map<String, Object> params = new HashMap<>();
// Create the features list
List<Map<String, Object>> featuresList = new ArrayList<>();
featuresList.add(createFeature(50, "LANDMARK_DETECTION"));
featuresList.add(createFeature(50, "FACE_DETECTION"));
featuresList.add(createFeatureWithModel(100, "builtin/latest", "OBJECT_LOCALIZATION"));
featuresList.add(createFeatureWithModel(50, "builtin/latest", "LOGO_DETECTION"));
featuresList.add(createFeature(50, "LABEL_DETECTION"));
featuresList.add(createFeature(50, "PRODUCT_SEARCH"));
featuresList.add(createFeatureWithModel(50, "builtin/latest", "DOCUMENT_TEXT_DETECTION"));
featuresList.add(createFeature(50, "SAFE_SEARCH_DETECTION"));
featuresList.add(createFeature(50, "IMAGE_PROPERTIES"));
featuresList.add(createFeature(50, "CROP_HINTS"));
// Create the image source map
Map<String, String> imageSourceMap = new HashMap<>();
imageSourceMap.put("imageUri", imageUrl);
// Create the image map
Map<String, Object> imageMap = new HashMap<>();
imageMap.put("source", imageSourceMap);
// Create the request map
Map<String, Object> requestMap = new HashMap<>();
requestMap.put("features", featuresList);
requestMap.put("image", imageMap);
// Create the requests list
List<Map<String, Object>> requestsList = new ArrayList<>();
requestsList.add(requestMap);
// Put the requests list into the main map
params.put("requests", requestsList);
BatchAnnotateImagesResponse response = HttpClient.post(URL + KEY, params, headers, new TypeReference<BatchAnnotateImagesResponse>() {});
return response;
} catch (Exception e) {
log.error("annotateImages error", e);
}
return null;
}
private static Map<String, Object> createFeature(int maxResults, String type) {
Map<String, Object> featureMap = new HashMap<>();
featureMap.put("maxResults", maxResults);
featureMap.put("type", type);
return featureMap;
}
private static Map<String, Object> createFeatureWithModel(int maxResults, String model, String type) {
Map<String, Object> featureMap = new HashMap<>();
featureMap.put("maxResults", maxResults);
featureMap.put("model", model);
featureMap.put("type", type);
return featureMap;
}
}
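For reference, the nested maps assembled above serialize to a JSON body of roughly this shape (one request carrying an image URI and a features list; the URL shown is a placeholder):

```json
{
  "requests": [
    {
      "image": { "source": { "imageUri": "https://example.com/photo.jpg" } },
      "features": [
        { "type": "FACE_DETECTION", "maxResults": 50 },
        { "type": "OBJECT_LOCALIZATION", "maxResults": 100, "model": "builtin/latest" }
      ]
    }
  ]
}
```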
OkHttp-based HTTP client utility class
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.alibaba.fastjson.TypeReference;
import lombok.Getter;
import lombok.Setter;
import lombok.extern.slf4j.Slf4j;
import okhttp3.*;
import org.apache.http.client.utils.URIBuilder;
import org.springframework.stereotype.Service;
import java.io.IOException;
import java.net.URL;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.TimeUnit;
/**
 * @Author guan
 * @Description HTTP utility methods built on OkHttp
 * @Date 2024/07/08 14:50
 * @Version 1.0
 */
@Slf4j
@Service
public class HttpClient {
/**
 * Shared OkHttp client, initialized once when Spring constructs this bean
 */
@Getter
@Setter
protected static OkHttpClient httpClient;
public HttpClient() {
httpClient = new OkHttpClient().newBuilder()
.connectTimeout(60 * 1000, TimeUnit.MILLISECONDS)
.readTimeout(5 * 60 * 1000, TimeUnit.MILLISECONDS)
.writeTimeout(5 * 60 * 1000, TimeUnit.MILLISECONDS)
.build();
}
public static <T> T get(String url, Object param, Headers headers, TypeReference<T> responseType) throws Exception {
URIBuilder ub = new URIBuilder(url);
if (Objects.nonNull(param)) {
JSONObject paramData = (JSONObject) JSON.toJSON(param);
paramData.forEach((k, v) -> {
if (Objects.nonNull(v)) {
ub.addParameter(k, v instanceof String ? (String) v : JSON.toJSONString(v));
}
});
}
URL wholeUrl = ub.build().toURL();
Request request = new Request.Builder()
.url(wholeUrl)
.method("GET", null)
.headers(headers)
.build();
try (Response response = httpClient.newCall(request).execute()) {
String responseStr = Objects.requireNonNull(response.body()).string();
log.info("execute get request url is:{}, execute for response:{}", wholeUrl, responseStr);
return JSON.parseObject(responseStr, responseType);
}
}
/**
 * POST with a JSON body
 */
public static <T> T post(String url, Object param, Map<String,String> headerMap, TypeReference<T> responseType) throws Exception {
String paramJson = JSON.toJSONString(param);
RequestBody requestBody = RequestBody.create(MediaType.parse("application/json"), paramJson);
Headers headers = Headers.of(headerMap);
Request request = new Request.Builder()
.url(url)
.method("POST", requestBody)
.headers(headers)
.build();
try (Response response = httpClient.newCall(request).execute()) {
String responseStr = Objects.requireNonNull(response.body()).string();
log.info("execute post request url is:{} , param is {} ,execute for response:{}", url, paramJson, responseStr);
return JSON.parseObject(responseStr, responseType);
}
}
/**
 * POST with a form-encoded body
 */
public static <T> T postForm(String url, Map<String, String> param, Map<String,String> headerMap, TypeReference<T> responseType) throws Exception {
// build the form data
FormBody.Builder formBuilder = new FormBody.Builder();
for (Map.Entry<String, String> entry : param.entrySet()) {
formBuilder.add(entry.getKey(), entry.getValue());
}
RequestBody requestBody = formBuilder.build();
Headers headers = Headers.of(headerMap);
Request request = new Request.Builder()
.url(url)
.post(requestBody)
.headers(headers)
.build();
try (Response response = httpClient.newCall(request).execute()) {
if (!response.isSuccessful()) {
throw new IOException("Unexpected code " + response);
}
String responseStr = Objects.requireNonNull(response.body()).string();
log.info("execute post request url is:{} , param is {} ,execute for response:{}", url, JSON.toJSONString(param), responseStr);
return JSON.parseObject(responseStr, responseType);
}
}
}
Image annotation response entity class
import lombok.Data;
import java.util.List;
@Data
public class BatchAnnotateImagesResponse {
private List<AnnotateImageResponse> responses;
@Data
public class AnnotateImageResponse {
private List<FaceAnnotation> faceAnnotations;
private List<LocalizedObjectAnnotation> localizedObjectAnnotations;
private List<EntityAnnotation> labelAnnotations;
@Data
public class FaceAnnotation {
private BoundingPoly boundingPoly;
private BoundingPoly fdBoundingPoly;
private List<Landmark> landmarks;
private Float rollAngle;
private Float panAngle;
private Float tiltAngle;
private Float detectionConfidence;
private Float landmarkingConfidence;
private String joyLikelihood;
private String sorrowLikelihood;
private String angerLikelihood;
private String surpriseLikelihood;
private String underExposedLikelihood;
private String blurredLikelihood;
private String headwearLikelihood;
// Vision AI celebrity recognition (not used)
// private List<FaceRecognitionResult> recognitionResult_;
/**
 * BatchAnnotateImagesResponse.FaceAnnotation.BoundingPoly.Vertex
 * RecognizeCelebritiesResult.Celebrity.ComparedFace.BoundingBox
 * Converts the VisionBoundingPoly returned by Vision AI into a BoundingBox
 * and matches it against the BoundingBox returned by AWS.
 * The matching is based on the distance between bounding-box centers,
 * so the most similar pair is found even when the boxes differ in size and position.
 */
private Object celebrityFace;
@Data
public class BoundingPoly {
private List<Vertex> vertices;
private List<NormalizedVertex> normalizedVertices;
public BoundingPoly() {
}
public BoundingPoly(List<Vertex> vertices) {
this.vertices = vertices;
}
@Data
public class Vertex {
private Integer x;
private Integer y;
public Vertex() {
}
public Vertex(Integer x, Integer y) {
this.x = x;
this.y = y;
}
}
@Data
public class NormalizedVertex {
private Float x;
private Float y;
}
}
@Data
public class Landmark {
private String type;
private Position position;
@Data
public class Position {
private Float x;
private Float y;
private Float z;
}
}
}
@Data
public class LocalizedObjectAnnotation {
private String mid;
private String languageCode;
private String name;
private Float score;
private BoundingPoly boundingPoly;
@Data
public class BoundingPoly {
private List<Vertex> vertices;
private List<NormalizedVertex> normalizedVertices;
@Data
public class Vertex {
private Integer x;
private Integer y;
}
@Data
public class NormalizedVertex {
private Float x;
private Float y;
public NormalizedVertex() {
}
public NormalizedVertex(Float x, Float y) {
this.x = x;
this.y = y;
}
}
}
}
@Data
public class EntityAnnotation {
private String mid;
private String locale;
private String description;
private Float score;
private Float confidence;
private Float topicality;
private BoundingPoly boundingPoly;
// private List<LocationInfo> locations;
// private List<Property> properties;
@Data
public class BoundingPoly {
private List<Vertex> vertices;
private List<NormalizedVertex> normalizedVertices;
@Data
public class Vertex {
private Integer x;
private Integer y;
}
@Data
public class NormalizedVertex {
private Float x;
private Float y;
}
}
}
}
}
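The comment inside FaceAnnotation above describes matching Vision face polygons against AWS celebrity boxes by the distance between box centers. A minimal, self-contained sketch of that idea, once both results have been reduced to a common pixel-space box (the Box class and method names are illustrative, not from either SDK):

```java
import java.util.List;

public class BoxMatcher {
    /** A unified pixel-space box; both a Vision polygon's envelope and a scaled AWS box reduce to this. */
    public static class Box {
        final double left, top, width, height;
        public Box(double left, double top, double width, double height) {
            this.left = left; this.top = top; this.width = width; this.height = height;
        }
        double centerX() { return left + width / 2; }
        double centerY() { return top + height / 2; }
    }

    /** Return the index of the candidate whose center is closest to the target's center, or -1 if none. */
    public static int closest(Box target, List<Box> candidates) {
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < candidates.size(); i++) {
            Box c = candidates.get(i);
            double dx = c.centerX() - target.centerX();
            double dy = c.centerY() - target.centerY();
            double dist = dx * dx + dy * dy; // squared distance is enough for comparison
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }
}
```

In the demo, each Vision face would be paired with the closest AWS celebrity face so the celebrity's name can be drawn at the right position.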
3 Amazon Rekognition Celebrity Recognition
3.1 Celebrity recognition product documentation
https://docs.aws.amazon.com/zh_cn/rekognition/latest/dg/celebrities-procedure-image.html
3.2 Create an AWS account and get the SDK
Create or update a user with the AmazonRekognitionFullAccess and AmazonS3ReadOnlyAccess permissions (see "Setting up an AWS account and creating a user").
Install and configure the AWS CLI and the AWS SDKs (see "Setting up the AWS CLI and AWS SDKs").
3.3 Create an access key for the user created above
Sign in to the AWS Management Console and open the IAM console. In the navigation pane, choose Users, then select the name of the user created when setting up the AWS account. Open the Security credentials tab and choose Create access key. Then choose Download .csv file to save the access key ID and secret access key to a CSV file on your computer.
3.4 Maven dependencies
<dependency>
<groupId>software.amazon.awssdk</groupId>
<artifactId>secretsmanager</artifactId>
</dependency>
<dependency>
<groupId>software.amazon.awssdk</groupId>
<artifactId>rekognition</artifactId>
</dependency>
<dependency>
<groupId>software.amazon.awssdk</groupId>
<artifactId>s3</artifactId>
</dependency>
<dependency>
<groupId>software.amazon.awssdk</groupId>
<artifactId>sqs</artifactId>
</dependency>
3.5 Java implementation
AWS client class
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.rekognition.RekognitionClient;
@Slf4j
@Service
public class AwsClient {
private final static String accessKeyId = "your AWS access key ID";
private final static String secretAccessKey = "your AWS secret access key";
public static RekognitionClient rekClient() {
Region region = Region.US_EAST_1;
AwsBasicCredentials awsCreds = AwsBasicCredentials.create(accessKeyId, secretAccessKey);
return RekognitionClient.builder()
.region(region)
.credentialsProvider(StaticCredentialsProvider.create(awsCreds))
.build();
}
}
Celebrity recognition service interface
import com.niaonao.response.RecognizeCelebritiesResult;
public interface IAwsService {
RecognizeCelebritiesResult recognizeAllCelebrities(String imageUrl);
}
Celebrity recognition service implementation
import com.niaonao.client.AwsClient;
import com.niaonao.response.RecognizeCelebritiesResult;
import com.niaonao.service.IAwsService;
import com.google.api.client.util.IOUtils;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.rekognition.model.*;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
@Service
@Slf4j
public class AwsServiceImpl implements IAwsService {
@Override
public RecognizeCelebritiesResult recognizeAllCelebrities(String imageUrl) {
RecognizeCelebritiesResult recognizeCelebritiesResult = new RecognizeCelebritiesResult();
try {
InputStream inputStream;
if (imageUrl.startsWith("http")) {
HttpURLConnection uc = (HttpURLConnection) new URL(imageUrl).openConnection();
uc.connect();
inputStream = uc.getInputStream();
} else { // local file
Path path = Paths.get(imageUrl);
inputStream = Files.newInputStream(path);
}
// read the image into a byte array
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
IOUtils.copy(inputStream, byteArrayOutputStream);
byte[] bytes = byteArrayOutputStream.toByteArray();
SdkBytes sourceBytes = SdkBytes.fromInputStream(new ByteArrayInputStream(bytes));
Image souImage = Image.builder()
.bytes(sourceBytes)
.build();
RecognizeCelebritiesRequest request = RecognizeCelebritiesRequest.builder()
.image(souImage)
.build();
RecognizeCelebritiesResponse result = AwsClient.rekClient().recognizeCelebrities(request);
List<RecognizeCelebritiesResult.Celebrity> celebrityFaces = new ArrayList<>();
List<RecognizeCelebritiesResult.ComparedFace> unrecognizedFaces = new ArrayList<>();
List<ComparedFace> unrecognizedFacesResult = result.unrecognizedFaces();
List<Celebrity> celebrityFacesResult = result.celebrityFaces();
for (ComparedFace comparedFaceResult : unrecognizedFacesResult) {
RecognizeCelebritiesResult.ComparedFace comparedFace = recognizeCelebritiesResult.new ComparedFace();
handleMapping(comparedFaceResult, comparedFace);
unrecognizedFaces.add(comparedFace);
}
for (Celebrity celebrityResult : celebrityFacesResult) {
RecognizeCelebritiesResult.Celebrity celebrity = recognizeCelebritiesResult.new Celebrity();
ComparedFace comparedFaceResult = celebrityResult.face();
String idResult = celebrityResult.id();
KnownGender knownGenderResult = celebrityResult.knownGender();
Float matchConfidenceResult = celebrityResult.matchConfidence();
String nameResult = celebrityResult.name();
List<String> urlsResult = celebrityResult.urls();
RecognizeCelebritiesResult.Celebrity.ComparedFace comparedFace = celebrity.new ComparedFace();
handleMappingCelebrity(comparedFaceResult, comparedFace);
String typeResult = knownGenderResult.typeAsString();
RecognizeCelebritiesResult.Celebrity.KnownGender knownGender = celebrity.new KnownGender();
knownGender.setType(typeResult);
celebrity.setFace(comparedFace);
celebrity.setId(idResult);
celebrity.setKnownGender(knownGender);
celebrity.setMatchConfidence(matchConfidenceResult);
celebrity.setName(nameResult);
celebrity.setUrls(urlsResult);
celebrityFaces.add(celebrity);
}
recognizeCelebritiesResult.setCelebrityFaces(celebrityFaces);
recognizeCelebritiesResult.setUnrecognizedFaces(unrecognizedFaces);
} catch (Exception e) {
log.error("recognizeAllCelebrities error url {}", imageUrl, e);
}
return recognizeCelebritiesResult;
}
private void handleMapping(ComparedFace comparedFaceResult, RecognizeCelebritiesResult.ComparedFace comparedFace) {
RecognizeCelebritiesResult.ComparedFace.BoundingBox boundingBox = comparedFace.new BoundingBox();
Float confidence = comparedFaceResult.confidence();
List<RecognizeCelebritiesResult.ComparedFace.Landmark> landmarks = new ArrayList<>();
RecognizeCelebritiesResult.ComparedFace.Pose pose = comparedFace.new Pose();
RecognizeCelebritiesResult.ComparedFace.ImageQuality quality = comparedFace.new ImageQuality();
List<RecognizeCelebritiesResult.ComparedFace.Emotion> emotions = new ArrayList<>();
RecognizeCelebritiesResult.ComparedFace.Smile smile = comparedFace.new Smile();
List<Landmark> landmarksResult = comparedFaceResult.landmarks();
List<Emotion> emotionsResult = comparedFaceResult.emotions();
for (Landmark landmarkResult : landmarksResult) {
RecognizeCelebritiesResult.ComparedFace.Landmark landmark = comparedFace.new Landmark();
landmark.setType(landmarkResult.typeAsString());
landmark.setX(landmarkResult.x());
landmark.setY(landmarkResult.y());
landmarks.add(landmark);
}
for (Emotion emotionResult : emotionsResult) {
RecognizeCelebritiesResult.ComparedFace.Emotion emotion = comparedFace.new Emotion();
emotion.setType(emotionResult.typeAsString());
emotion.setConfidence(emotionResult.confidence());
emotions.add(emotion);
}
BoundingBox boundingBoxResult = comparedFaceResult.boundingBox();
boundingBox.setHeight(boundingBoxResult.height());
boundingBox.setLeft(boundingBoxResult.left());
boundingBox.setTop(boundingBoxResult.top());
boundingBox.setWidth(boundingBoxResult.width());
Pose poseResult = comparedFaceResult.pose();
pose.setPitch(poseResult.pitch());
pose.setRoll(poseResult.roll());
pose.setYaw(poseResult.yaw());
ImageQuality imageQualityResult = comparedFaceResult.quality();
quality.setBrightness(imageQualityResult.brightness());
quality.setSharpness(imageQualityResult.sharpness());
Smile smileResult = comparedFaceResult.smile();
smile.setConfidence(smileResult.confidence());
smile.setValue(smileResult.value());
comparedFace.setBoundingBox(boundingBox);
comparedFace.setConfidence(confidence);
comparedFace.setLandmarks(landmarks);
comparedFace.setPose(pose);
comparedFace.setQuality(quality);
comparedFace.setEmotions(emotions);
comparedFace.setSmile(smile);
}
private void handleMappingCelebrity(ComparedFace comparedFaceResult, RecognizeCelebritiesResult.Celebrity.ComparedFace comparedFace) {
RecognizeCelebritiesResult.Celebrity.ComparedFace.BoundingBox boundingBox = comparedFace.new BoundingBox();
Float confidence = comparedFaceResult.confidence();
List<RecognizeCelebritiesResult.Celebrity.ComparedFace.Landmark> landmarks = new ArrayList<>();
RecognizeCelebritiesResult.Celebrity.ComparedFace.Pose pose = comparedFace.new Pose();
RecognizeCelebritiesResult.Celebrity.ComparedFace.ImageQuality quality = comparedFace.new ImageQuality();
List<RecognizeCelebritiesResult.Celebrity.ComparedFace.Emotion> emotions = new ArrayList<>();
RecognizeCelebritiesResult.Celebrity.ComparedFace.Smile smile = comparedFace.new Smile();
List<Landmark> landmarksResult = comparedFaceResult.landmarks();
List<Emotion> emotionsResult = comparedFaceResult.emotions();
for (Landmark landmarkResult : landmarksResult) {
RecognizeCelebritiesResult.Celebrity.ComparedFace.Landmark landmark = comparedFace.new Landmark();
landmark.setType(landmarkResult.typeAsString());
landmark.setX(landmarkResult.x());
landmark.setY(landmarkResult.y());
landmarks.add(landmark);
}
for (Emotion emotionResult : emotionsResult) {
RecognizeCelebritiesResult.Celebrity.ComparedFace.Emotion emotion = comparedFace.new Emotion();
emotion.setType(emotionResult.typeAsString());
emotion.setConfidence(emotionResult.confidence());
emotions.add(emotion);
}
BoundingBox boundingBoxResult = comparedFaceResult.boundingBox();
boundingBox.setHeight(boundingBoxResult.height());
boundingBox.setLeft(boundingBoxResult.left());
boundingBox.setTop(boundingBoxResult.top());
boundingBox.setWidth(boundingBoxResult.width());
Pose poseResult = comparedFaceResult.pose();
pose.setPitch(poseResult.pitch());
pose.setRoll(poseResult.roll());
pose.setYaw(poseResult.yaw());
ImageQuality imageQualityResult = comparedFaceResult.quality();
quality.setBrightness(imageQualityResult.brightness());
quality.setSharpness(imageQualityResult.sharpness());
Smile smileResult = comparedFaceResult.smile();
smile.setConfidence(smileResult.confidence());
smile.setValue(smileResult.value());
comparedFace.setBoundingBox(boundingBox);
comparedFace.setConfidence(confidence);
comparedFace.setLandmarks(landmarks);
comparedFace.setPose(pose);
comparedFace.setQuality(quality);
comparedFace.setEmotions(emotions);
comparedFace.setSmile(smile);
}
}
Celebrity recognition response entity class
import lombok.Data;
import java.util.List;
@Data
public class RecognizeCelebritiesResult {
private List<Celebrity> celebrityFaces;
private List<ComparedFace> unrecognizedFaces;
@Data
public class Celebrity {
private ComparedFace face;
private String id;
private KnownGender knownGender;
private Float matchConfidence;
private String name;
private List<String> urls;
@Data
public class KnownGender {
private String type;
}
@Data
public class ComparedFace {
private BoundingBox boundingBox;
private Float confidence;
private List<Landmark> landmarks;
private Pose pose;
private ImageQuality quality;
private List<Emotion> emotions;
private Smile smile;
@Data
public class BoundingBox {
private Float width;
private Float height;
private Float left;
private Float top;
public BoundingBox() {
}
public BoundingBox(Float width, Float height, Float left, Float top) {
this.width = width;
this.height = height;
this.left = left;
this.top = top;
}
}
@Data
public class Landmark {
private String type;
private Float x;
private Float y;
}
@Data
public class Pose {
private Float roll;
private Float yaw;
private Float pitch;
}
@Data
public class ImageQuality {
private Float brightness;
private Float sharpness;
}
@Data
public class Emotion {
private String type;
private Float confidence;
}
@Data
public class Smile {
private Boolean value;
private Float confidence;
}
}
}
@Data
public class ComparedFace {
private BoundingBox boundingBox;
private Float confidence;
private List<Landmark> landmarks;
private Pose pose;
private ImageQuality quality;
private List<Emotion> emotions;
private Smile smile;
@Data
public class BoundingBox {
private Float width;
private Float height;
private Float left;
private Float top;
}
@Data
public class Landmark {
private String type;
private Float x;
private Float y;
}
@Data
public class Pose {
private Float roll;
private Float yaw;
private Float pitch;
}
@Data
public class ImageQuality {
private Float brightness;
private Float sharpness;
}
@Data
public class Emotion {
private String type;
private Float confidence;
}
@Data
public class Smile {
private Boolean value;
private Float confidence;
}
}
}
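Note that Rekognition's BoundingBox fields (left, top, width, height) are ratios of the overall image dimensions, so annotating the image requires multiplying them by the pixel width and height. A small sketch of that conversion (the class name is my own):

```java
public class AwsBoxConverter {
    /**
     * Convert a ratio-based Rekognition bounding box to pixel values.
     * Returns {left, top, width, height} in pixels.
     */
    public static int[] toPixels(float left, float top, float width, float height,
                                 int imageWidth, int imageHeight) {
        return new int[]{
            Math.round(left * imageWidth),
            Math.round(top * imageHeight),
            Math.round(width * imageWidth),
            Math.round(height * imageHeight)
        };
    }
}
```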
4 Gemini 1.5 Image Analysis
4.1 Image analysis product documentation
https://ai.google.dev/gemini-api/docs/get-started/tutorial?lang=rest
4.2 Create an API key
Create an API key in Google AI Studio for calling the Gemini API.
4.3 Java implementation
Image analysis service interface
import com.niaonao.response.GenerateContentResponse;
public interface IGeminiService {
/**
 * Analyze an image with Gemini
 * @param text state clearly what analysis result you want
 * @param mimeType resource MIME type
 * @param url resource URL
 * @return GenerateContentResponse — a custom entity, not from the SDK
 */
GenerateContentResponse imageAnalyze(String text, String mimeType, String url);
}
Image analysis service implementation
import com.alibaba.fastjson.JSON;
import com.niaonao.response.GenerateContentResponse;
import com.niaonao.service.IGeminiService;
import com.niaonao.util.FileUtils;
import com.ctrip.framework.apollo.spring.annotation.ApolloJsonValue;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import java.io.IOException;
import java.util.*;
@Service("GeminiVisionService")
@Slf4j
public class GeminiServiceImpl implements IGeminiService {
/**
 * Create a project at https://console.cloud.google.com/
 * and an API key in Google AI Studio: https://aistudio.google.com/app/apikey
 */
private String key = "your Gemini API key";
/** Model versions */
private static final String MODELS = "gemini-1.5-pro";
private static final String MODELS_1_5_FLASH = "gemini-1.5-flash";
/** {model}:generateContent endpoint (gemini-1.5-flash is used here) */
private static final String GENERATE_CONTENT = "https://generativelanguage.googleapis.com/v1beta/models/" + MODELS_1_5_FLASH + ":generateContent?key=";
/** Default image analysis prompt */
private String textPrompt = "Please output the image recognition result in Chinese. Pay attention to analyze the elements of the image such as people, animals, objects, scenes, etc. The elements not contained in the image do not need to be analyzed. Then summarize the content of the image.";
/**
 * Analyze an image with Gemini
 * https://ai.google.dev/api/rest/v1beta/models/generateContent
 * @param text state clearly what analysis result you want
 * @param mimeType resource MIME type
 * @param url resource URL
 * @return GenerateContentResponse
 */
@Override
public GenerateContentResponse imageAnalyze(String text, String mimeType, String url) {
try {
String base64 = FileUtils.readFileFromUrl(url);
CloseableHttpClient httpClient = HttpClients.createDefault();
// build the HttpPost request
HttpPost httpPost = new HttpPost(GENERATE_CONTENT + key);
// set request headers
httpPost.addHeader("Content-Type", "application/json");
String replaceText = StringUtils.isEmpty(text) ? textPrompt : textPrompt + text;
String request = "{\"contents\":[{\"parts\":[{\"text\":\"#text\"},{\"inline_data\":{\"mime_type\":\"#mime_type\",\"data\":\"#data\"}}]}]}";
request = request.replace("#text", replaceText);
request = request.replace("#mime_type", mimeType);
request = request.replace("#data", base64);
httpPost.setEntity(new StringEntity(request, ContentType.APPLICATION_JSON));
// send the request and read the response
CloseableHttpResponse response = httpClient.execute(httpPost);
int statusCode = response.getStatusLine().getStatusCode();
if (statusCode == 200) {
// Parse the JSON response
HttpEntity entity = response.getEntity();
String jsonResponse = EntityUtils.toString(entity);
GenerateContentResponse generateContentResponse = JSON.parseObject(jsonResponse, GenerateContentResponse.class);
return generateContentResponse;
}
} catch (IOException e) {
log.error("Gemini API call failed:", e);
throw new RuntimeException(e);
}
}
return null;
}
}
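One caveat about the implementation above: splicing the prompt into the JSON template with String.replace breaks as soon as the text contains quotes, backslashes, or newlines. A JSON library is the robust fix; failing that, a minimal JDK-only sketch of escaping the text before splicing (helper names are my own):

```java
public class JsonText {
    /** Minimal JSON string escaping for characters that would break the request template. */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // other control characters must be \-u escaped in JSON
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}
```

With this, `request.replace("#text", JsonText.escape(replaceText))` stays valid JSON for arbitrary prompts.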
File utility class
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;
@Slf4j
public class FileUtils {
public static String readFileFromUrl(String url) throws IOException {
if (StringUtils.isEmpty(url)) {
return null;
}
URL urlObj = new URL(url);
HttpURLConnection connection = (HttpURLConnection) urlObj.openConnection();
connection.setRequestMethod("GET");
int responseCode = connection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
InputStream inputStream = connection.getInputStream();
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
byte[] buffer = new byte[1024];
int bytesRead;
while ((bytesRead = inputStream.read(buffer)) != -1) {
outputStream.write(buffer, 0, bytesRead);
}
// encode the file content as Base64
return encodeToBase64(outputStream.toByteArray());
} else {
throw new RuntimeException("Error fetching file: " + responseCode);
}
}
// encode a byte array as a Base64 string
private static String encodeToBase64(byte[] data) {
return Base64.getEncoder().encodeToString(data);
}
}
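The mimeType passed to imageAnalyze has to match the file actually fetched. A simple sketch of guessing it from the URL's extension (the mapping and default are illustrative assumptions, not from any library):

```java
public class MimeTypes {
    /** Guess an image MIME type from a URL's file extension; defaults to image/jpeg. */
    public static String fromUrl(String url) {
        String lower = url.toLowerCase();
        if (lower.endsWith(".png"))  return "image/png";
        if (lower.endsWith(".gif"))  return "image/gif";
        if (lower.endsWith(".webp")) return "image/webp";
        if (lower.endsWith(".bmp"))  return "image/bmp";
        return "image/jpeg"; // .jpg/.jpeg and anything unrecognized
    }
}
```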
GenerateContentResponse entity class
import lombok.Data;
import java.util.List;
@Data
public class GenerateContentResponse {
private List<Candidate> candidates;
private UsageMetadata usageMetadata;
@Data
public class Candidate {
private Content content;
private String finishReason;
private Integer index;
private List<SafetyRating> safetyRatings;
@Data
public class Content {
private List<Part> parts;
private String role;
@Data
public class Part {
private String text;
}
}
@Data
public class SafetyRating {
private String category;
private String probability;
}
}
@Data
public class UsageMetadata {
private Integer promptTokenCount;
private Integer candidatesTokenCount;
private Integer totalTokenCount;
}
}
5 Vue Implementation (not posted yet)
I still owe you this part; here's a nice picture by way of apology.
6 Combined Use Cases
6.1 Celebrity recognition for news-media headline coverage
Using celebrity recognition and face analysis, we identify celebrities and obtain their information and bio links. Combined with image analysis to recognize the scene, we can generate an eye-catching headline, as in the left image: "The Beckhams appear together, with Mrs. Beckham's cool poise stealing the show!"
When an image contains multiple people, face coordinates are matched computationally so that each celebrity's name is annotated at the right position.
6.2 Forest-fire safety monitoring
With image annotation we can mark specified physical objects, such as ignition points and smoke, and the annotated objects are highlighted in real time. This capability can be applied to live forest-fire surveillance.
By sampling frames from video at a fixed interval, we can analyze ignition points, fire intensity, and location. Image analysis can determine whether smoke is present and whether there are ignition points, and how many, then raise timely alerts so that personnel can step in before the fire spreads.
6.3 Urban road-traffic monitoring
With image annotation we can mark common everyday objects. As in the left image, annotating an ordinary photo of a city road shows that traffic on the road is flowing well.
Road-vehicle detection can automatically analyze lane conditions, vehicle counts, congestion, and accidents, while also monitoring pedestrian safety.
Powered by niaonao