Gemini 2.0: Captioning Images, Video, and Audio, Plus E-commerce Image Editing
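Introduction to Gemini 2.0
An earlier article covered the image editing and captioning (reverse-prompting) capabilities of Gemini 2.0, Google's latest multimodal large language model. This article continues with more of its useful features. Image-to-video LoRA training for Wan2.1 has recently taken off, but few models can caption video accurately. Google's Gemini Flash 2.0 Experimental model handles image generation and editing, and it can also caption image, audio, and video files. The community has already integrated it into ComfyUI as a full-featured plugin that supports multimodal analysis of text, images, video frames, and audio directly inside a ComfyUI workflow, as well as image generation and local edits.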
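Gemini 2.0 audio and video captioning in ComfyUI
This article walks through a ComfyUI workflow built on the ComfyUI-Gemini_Flash_2.0_Exp plugin. Cloud-drive download links for the models are at the end of the article.
• ComfyUI-Gemini_Flash_2.0_Exp: https://github.com/ShmuelRonen/ComfyUI-Gemini_Flash_2.0_Exp
• API key: request one from Google AI Studio, and note that availability is restricted by region. https://aistudio.google.com/apikey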
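To make the role of the API key concrete, here is a minimal sketch of the kind of request body a multimodal captioning call sends. It is not the plugin's own code. The endpoint pattern, the model id `gemini-2.0-flash-exp`, and the helper names `endpoint_for` and `build_caption_request` are assumptions for illustration; check the current Gemini API documentation before relying on them.

```python
import base64

# Assumed endpoint pattern of the public Gemini REST API (verify against the docs).
API_URL = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"


def endpoint_for(model: str) -> str:
    """Return the generateContent URL for a model id (the API key is sent separately)."""
    return API_URL.format(model=model)


def build_caption_request(prompt: str, media_bytes: bytes, mime_type: str) -> dict:
    """Build a request body with one text part and one inline media part."""
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            # Inline media must be base64-encoded text.
                            "data": base64.b64encode(media_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }
```

The body would be POSTed as JSON to `endpoint_for("gemini-2.0-flash-exp")` together with the key from Google AI Studio. Inline data suits small files; larger videos usually need a separate upload step.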
Workflow download for the Gemini 2.0 audio and video captioning demo in ComfyUI: https://www.liblib.art/modelinfo/7bb3101ef23f430a8fecda33452f0003?versionUuid=1383f6d8af6b467e8d01bd9df5f9307f
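Notes:
- Images, video, and audio are analyzed by a remote LLM API, so all you need is an API key and working network access to caption multiple file types accurately.
- Results for video files are accurate, which makes this especially suitable for captioning Wan2.1 video LoRA training data. You still need to supply a good prompt template to get captions in a fixed format.
- Multi-image analysis and image generation are also supported.
- Occasionally the output is a solid blue image. This is caused by the concurrency limit on free API keys, so throttle how often you call the API.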
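One way to keep a batch job under a free-tier rate limit is to enforce a minimum gap between calls. The sketch below is a generic throttle, not something the plugin provides. The class name `MinIntervalThrottle` and the 6-second interval in the usage note are placeholders, and the real limit for your key should come from the API documentation.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Ensure at least `min_interval` seconds pass between consecutive calls."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Block until the next call is allowed and return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            # Time still owed before the next call may start.
            delay = max(0.0, self.min_interval - (now - self._last_call))
            if delay > 0:
                self._sleep(delay)
        self._last_call = now + delay
        return delay
```

In a captioning loop, create `throttle = MinIntervalThrottle(6.0)` once and call `throttle.wait()` before each API request. The clock and sleep functions are injectable so the behavior can be checked without real waiting.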
### 01. Video captioning: Fire Phoenix
The model clearly picks out the details of each frame in the video.
Here's a breakdown of what I see in the video frames:
* **Frames 1-4:** A woman in a long, red traditional Chinese dress is standing outdoors. She has her hair up in a bun and is wearing a red veil that covers the lower half of her face. She is standing with her hands pressed together in front of her chest in a gesture that could be a greeting or a sign of respect. The background shows a misty, mountainous landscape.
* **Frame 5:** The woman is blurred, suggesting movement. Her arms are out to her sides.
* **Frame 6:** A figure of the woman is blurred and superimposed with an image of a phoenix. The phoenix is in shades of orange and yellow, and there is a bright light surrounding the figures.
* **Frame 7:** A phoenix is flying in the sky, with a beam of light connecting it to a point on the ground. There is a reflection of the phoenix beneath the point on the ground.

Overall, the video seems to be showcasing a transformation or a magical event, possibly involving the woman and a phoenix. The setting and clothing suggest a Chinese cultural context.
### 02. Video captioning: She-Venom
Here's a breakdown of what I see in the video frames:
**Frames 1-3:**
* **Subject:** A young woman with long, dark wavy hair.
* **Attire:** She is wearing a white corset, white thigh-high stockings, and a white choker. She has a white fur-like shawl draped over her shoulder.
* **Setting:** She is sitting on a white, ornate pedestal against a gray background.
* **Overall Impression:** The image has a soft, elegant, and slightly sensual aesthetic.
**Frames 4-6:**
* **Transformation:** The woman is now covered in a black, glossy substance, resembling the Marvel character Venom.
* **Venom Mask:** A mask with Venom's characteristic white eyes and sharp teeth covers her face.
* **Overall Impression:** The image has transformed from elegant to monstrous, with the Venom-like appearance dominating the scene. The black substance appears to be engulfing her original attire.
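For LoRA training, output like the breakdowns above usually has to be flattened into a fixed-format caption per clip. The sketch below shows one possible approach: an instruction template plus a helper that writes a same-named `.txt` file next to each video. The template wording, the function names, and the sidecar `.txt` convention are assumptions; confirm the caption format your trainer expects.

```python
from pathlib import Path

# Hypothetical instruction template; adapt the fields to your dataset.
CAPTION_PROMPT = (
    "Describe this video clip in one paragraph for training captions. "
    "Cover, in order: subject, clothing, action, camera movement, setting. "
    "Do not use bullet points or markdown."
)


def normalize_caption(raw: str, trigger: str = "") -> str:
    """Strip markdown asterisks, collapse whitespace, and optionally prefix a trigger word."""
    text = " ".join(raw.replace("*", "").split())
    return f"{trigger}, {text}" if trigger else text


def write_sidecar(video_path: Path, caption: str) -> Path:
    """Write the caption to a .txt file that shares the video's file stem."""
    out = video_path.with_suffix(".txt")
    out.write_text(caption + "\n", encoding="utf-8")
    return out
```

For example, `write_sidecar(Path("clips/phoenix.mp4"), normalize_caption(model_output, "ph0enix"))` would create `clips/phoenix.txt`, where `ph0enix` is a made-up trigger word.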
### 03. Audio captioning
### 04. Audio captioning: the song "Chengdu"

Here is the transcription of the audio and who is speaking:
Transcription:
讓我掉下眼淚的 不知昨夜的酒
讓我依依不捨的 不知你的溫柔
雨路還要走多久 你攥著我的手
讓我感到為難的是 掙扎的自由
分別總是在九月 回憶是思念的愁
深秋暖綠的垂柳 親吻著我額頭
在那座陰雨的小城裡 我從未忘記你
成都 帶不走的 只有你
和我在成都的街頭走一走 喔喔喔
直到所有的燈都熄滅了 也不停留
你會挽著我的衣袖 我會把手揣進褲兜
走到玉林路的盡頭 坐在小酒館的門口
分別總是在九月 回憶是思念的愁
深秋暖綠的垂柳 親吻著我額頭
在那座陰雨的小城裡 我從未忘記你
成都 帶不走的 只有你
和我在成都的街頭走一走 喔喔喔
直到所有的燈都熄滅了 也不停留
你會挽著我的衣袖 我會把手揣進褲兜
走到玉林路的盡頭 坐在小酒館的門口
和我在成都的街頭走一走 喔喔喔
直到所有的燈都熄滅了 也不停留
和我在成都的街頭走一走 喔喔喔
直到所有的燈都熄滅了 也不停留
你會挽著我的衣袖 我會把手揣進褲兜
走到玉林路的盡頭 走過小酒館的門口
和我在成都的街頭走一走 喔喔喔
直到所有的燈都熄滅了 也不停留
Who is speaking?
The song is sung by a male vocalist, and at the end, a child sings the chorus.
### 05. Multi-image captioning: Heroines

Here’s a description of each image:
Image 1:
- Subject: A young woman dressed in ornate, dark-colored armor with gold detailing. She has long, dark hair pulled back into a high ponytail with a red ribbon. She has fair skin, red lipstick, and what appears to be a small, stylized red mark on her cheek.
- Attire: The armor is elaborate, covering her chest and shoulders. A red garment is visible beneath the armor. She holds a sword with a bloodied blade.
- Setting: The background appears to be an out-of-focus cityscape or town, possibly with traditional Asian architecture. The sky is visible above.
- Overall Impression: The image conveys a sense of strength, beauty, and perhaps a hint of danger. The woman looks like a warrior or noblewoman, possibly after a battle.
Image 2:
- Subject: A man dressed in similar ornate armor, but with a more red and gold color scheme. He has dark hair pulled back into a topknot.
- Attire: He wears a full suit of armor with intricate gold detailing. A red robe or cloak is also part of his attire. He holds a sword and stands next to a white horse.
- Setting: The background is a blurred, reddish landscape that could be interpreted as a battlefield or a dramatic natural setting.
- Overall Impression: The image depicts a regal and powerful figure, likely a warrior or nobleman. The horse adds to the impression of status and strength. The red background creates a sense of drama and intensity.
In summary, both images depict figures in elaborate, historical-style armor, suggesting a theme of warriors or nobility in a possibly Asian-inspired setting. The first image is a close-up of a female warrior, while the second image shows a male warrior with a horse in a more dramatic setting.
### 06. Image generation: model try-on
Prompt: the woman in image 1 wears the clothes from image 2.


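### 07. Image merging: embrace
Prompt: the woman in image 1 and the man in image 2 embrace; keep the people and clothing consistent.
Across several generations the style was not stable, so the prompt still needs tuning.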


### 07. Image editing: local edits
Prompt: change the woman's clothes to a long red dress; keep the person consistent in the output.


### 08. Image editing: text
Prompt: put the text logo "我" on the clothes; keep the person consistent.
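Image generation and editing prompts like the ones in sections 06 to 08 combine a text instruction with one or more reference images and ask the model to return an image. The sketch below shows what such a request and response might look like at the REST level. The `responseModalities` setting and the `inlineData` response field are assumptions based on the public Gemini API documentation, the helper names are hypothetical, and the plugin may work differently.

```python
import base64
from typing import List, Tuple


def build_edit_request(instruction: str, images: List[Tuple[bytes, str]]) -> dict:
    """Build a request with a text instruction followed by (bytes, mime_type) images."""
    parts = [{"text": instruction}]
    for data, mime_type in images:
        parts.append(
            {
                "inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(data).decode("ascii"),
                }
            }
        )
    return {
        "contents": [{"parts": parts}],
        # Assumed setting that asks for image output in addition to text.
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }


def extract_images(response: dict) -> List[bytes]:
    """Collect decoded image bytes from every inline-data part of a response."""
    images = []
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            # Accept both camelCase and snake_case spellings of the field.
            blob = part.get("inlineData") or part.get("inline_data")
            if blob and "data" in blob:
                images.append(base64.b64decode(blob["data"]))
    return images
```

An empty list from `extract_images` means the response carried no image, which is worth checking before saving any output.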


• For readers who would rather not set everything up locally, the RunningHub platform lets you try AI apps and workflows online (1,000 credits on sign-up). More workflows to try online are on the homepage: https://www.runninghub.cn/user-center/1890418187312222210?utm_source=kol01-RH059 . Alibaba Wan open-source image-to-video AI app: https://www.runninghub.cn/ai-detail/1894632237306937345?utm_source=kol01-RH059 . AI workflow: https://www.runninghub.cn/post/1894584540348743681/aiDetail?utm_source=kol01-RH059


