[ComfyUI] Florence2PromptGen: image-to-prompt captioning that beats JoyCaption on speed, accuracy, and VRAM use, with Flux-ready tagging

Latest recommended article published on 2025-04-28 17:42:05

AIGC小王


Article tags: artificial intelligence, stable diffusion, AI art, AI painting, midjourney, AIGC

Article link: https://blog.csdn.net/text2201/article/details/142417522


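In digital art and creative work, advances in tooling often change how people work. Florence2PromptGen is a new image-captioning tool for [ComfyUI]. It outperforms JoyCaption in speed and accuracy, uses less VRAM, and supports Flux tagging, which gives artists more freedom and efficiency in their work.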

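Florence2PromptGen's core strengths are speed and accuracy. Whether it is reverse-engineering a prompt from an image or generating tags, it finishes quickly while staying accurate. This comes from its fine-tuned model and efficient inference, and it lets artists realize ideas faster.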

Florence2PromptGen is also light on VRAM. It uses far less than comparable tools, which is a major advantage for users with limited VRAM: it runs smoothly on lower-end hardware without running out of memory.

Florence2PromptGen also works well for Flux tagging. Because its output fits Flux workflows directly, artists can control the detail and style of an image more precisely and produce more distinctive work.

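The [ComfyUI] ecosystem keeps gaining high-quality tools, and Florence2PromptGen is another example. We look forward to seeing more artists and developers use it to create impressive work and push the boundaries of digital art and creative technology.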

Thanks go to the users and community members of [ComfyUI] for their support and contributions, which keep driving technical innovation and a better user experience for artists.

Florence2PromptGen is only one step in that journey. More tools for digital art and creative work are sure to follow in the [ComfyUI] ecosystem.

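Joy_caption, the strongest image-captioning tool introduced previously, uses roughly 7.7 GB of VRAM. The latest Florence2-PromptGen needs as little as 0.7 GB, with accuracy on par with Joy_caption, so it wins clearly on value. Main features: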

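Reverse-engineering prompts from images
tags mode, suited to SD1.5 and SDXL
mixed mode, suited to Flux, with output for T5XXL and CLIP_L respectively
Tagging every image in a directory, writing a .txt file with the same name as each image that contains its prompt; this is typically used for LoRA training.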
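The batch-tagging feature in the list above comes down to a simple file convention: one caption file per image, with the same file name and a .txt extension. The following is a minimal sketch of that convention only, not the plugin's actual code. The function name `write_caption_files`, the `caption_fn` callback, and the set of image extensions are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable, List

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def write_caption_files(image_dir: str, caption_fn: Callable[[Path], str]) -> List[Path]:
    """Write one same-named .txt caption file next to every image in image_dir.

    caption_fn is a hypothetical callback that returns the prompt for an image;
    in practice it would wrap whatever captioning model you use.
    """
    written = []
    for image_path in sorted(Path(image_dir).iterdir()):
        if image_path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # skip non-image files, including existing .txt captions
        caption_path = image_path.with_suffix(".txt")
        caption_path.write_text(caption_fn(image_path), encoding="utf-8")
        written.append(caption_path)
    return written
```

For example, `write_caption_files("dataset/", my_caption_fn)` would leave `cat.png` and `cat.txt` side by side, which is the layout most LoRA trainers expect.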

Introduction

Florence-2-X-PromptGen-v1.5 is an advanced image-captioning tool fine-tuned from Microsoft's Florence-2 and trained specifically to generate and tag prompts.

Compared with Florence-2, it improves caption quality and accuracy and, most importantly, keeps Florence-2's low VRAM use and very fast inference.

Inference needs as little as about 1 GB of VRAM, so the value for money is very high.

Like Florence-2, Florence-2-X-PromptGen-v1.5 comes in Base and Large versions. They are about 1 GB and 3 GB in size and use roughly 0.7 GB and 1.8 GB of VRAM respectively. By comparison, Joy_caption uses around 7.7 GB.

On value for money, Large clearly beats Joy_caption: it uses less than 2 GB of VRAM, runs inference faster, and is no less accurate.
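To put the VRAM figures quoted above side by side, here is a small worked calculation. The numbers are the approximate ones from this article (0.7 GB, 1.8 GB, 7.7 GB), not fresh measurements; the dictionary and function names are only illustrative.

```python
# Approximate VRAM use in GB, as reported in this article.
VRAM_GB = {
    "PromptGen base": 0.7,
    "PromptGen large": 1.8,
    "Joy_caption": 7.7,
}


def vram_ratio(baseline: str, model: str) -> float:
    """Return how many times more VRAM `baseline` uses than `model`."""
    return VRAM_GB[baseline] / VRAM_GB[model]


for name in ("PromptGen base", "PromptGen large"):
    ratio = vram_ratio("Joy_caption", name)
    print(f"Joy_caption uses about {ratio:.1f}x the VRAM of {name}")
```

With these figures, Joy_caption needs roughly 11 times the VRAM of the Base version and a little over 4 times that of the Large version.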

These are only my own tests, so try a range of scenes yourself and see which tool suits you better.

MiaoshouAI:

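MiaoshouAI Tagger is an advanced image-captioning tool built on a fine-tuned version of Microsoft's Florence-2 model. It provides high-accuracy, context-aware image captions for your projects.
MiaoshouAI/Florence-2-base-PromptGen is based on Microsoft's latest Florence2 model and trained on carefully selected Civitai images and tags, specifically to generate and tag prompts. Its captions are therefore closer to the prompts we normally write for image generation, which improves accuracy and relevance.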

MiaoshouAI main features

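High accuracy: fine-tuned on a curated dataset of Civitai images and cleaned tags, it produces precise, context-aware tags.
Node-based system: it builds on ComfyUI's node system, so you can chain captioning nodes and combine descriptive captions with keyword tags for the best results.
Versatile integration: it can be combined with other nodes (such as Prompt Text Encoder) for effective automated image processing.
Better training data: its captioning and description methods give strong results when tagging images for training.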

Installing the MiaoshouAI plugin in ComfyUI

Plugin address (scan the QR code to get the model and plugin if you need them)

You can also search for miaoshou in the Manager and install the plugin from there.


Restart ComfyUI and the plugin is ready.

Note: transformers version 4.38.0 or later is required.
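One way to check this requirement is to run a short script in the same Python environment that ComfyUI uses. The sketch below relies only on the standard library; the helper names `parse_version` and `transformers_is_new_enough` are illustrative, and the parser is deliberately simple (it treats a pre-release such as 4.38.0rc1 as 4.38.0).

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum version stated in this article.
MIN_TRANSFORMERS = (4, 38, 0)


def parse_version(text: str) -> tuple:
    """Turn '4.38.0' or '4.40.0.dev0' into a 3-tuple of integers.

    Only the leading digits of each dot-separated piece are used, and
    missing components are padded with zeros.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric piece, e.g. 'dev0'
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def transformers_is_new_enough() -> bool:
    """Return True if transformers is installed and meets the minimum version."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return False
    return parse_version(installed) >= MIN_TRANSFORMERS


if __name__ == "__main__":
    print("transformers OK" if transformers_is_new_enough() else "transformers missing or too old")
```

If the check fails, upgrading transformers in ComfyUI's environment should resolve it.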

Downloading the Florence-2-X-PromptGen model

Model address (scan the QR code to get the model and plugin if you need them)

The Large version is recommended.

After downloading, place the model in the ComfyUI\models\LLM directory.

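Alternatively, if you have a proxy that can reach Hugging Face, just run the workflow and the required model is downloaded automatically.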
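If you would rather fetch the model yourself, the manual step can be sketched with the `huggingface_hub` package. This is an assumption about one workable approach, not what the plugin does internally: the repository IDs are the ones this article lists on Hugging Face, the target folder follows the ComfyUI\models\LLM layout described here, and the helper names `local_model_dir` and `download_model` are hypothetical.

```python
from pathlib import Path

# Repository IDs as listed in this article.
REPO_IDS = {
    "large": "MiaoshouAI/Florence-2-large-PromptGen-v1.5",
    "base": "MiaoshouAI/Florence-2-base-PromptGen-v1.5",
}


def local_model_dir(comfyui_root: str, variant: str = "large") -> Path:
    """Return <comfyui_root>/models/LLM/<repo name> for the chosen variant."""
    repo_name = REPO_IDS[variant].split("/")[-1]
    return Path(comfyui_root) / "models" / "LLM" / repo_name


def download_model(comfyui_root: str, variant: str = "large") -> Path:
    """Download the chosen variant into the ComfyUI LLM model folder.

    Needs network access to Hugging Face and the third-party
    huggingface_hub package (pip install huggingface_hub).
    """
    from huggingface_hub import snapshot_download  # imported lazily: third-party

    target = local_model_dir(comfyui_root, variant)
    snapshot_download(repo_id=REPO_IDS[variant], local_dir=str(target))
    return target
```

Calling `download_model("ComfyUI", "large")` would place the files under `ComfyUI/models/LLM/Florence-2-large-PromptGen-v1.5`, matching the paths shown in the node's model list.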

Node details


model supports:
https://huggingface.co/MiaoshouAI/Florence-2-large-PromptGen-v1.5
https://huggingface.co/MiaoshouAI/Florence-2-base-PromptGen-v1.5
The models are downloaded automatically to:
ComfyUI\models\LLM\Florence-2-large-PromptGen-v1.5
ComfyUI\models\LLM\Florence-2-base-PromptGen-v1.5
caption_method:

With mixed selected, the node produces a combined CLIP data stream for T5XXL and CLIP_L, which fits FLUX workflows. It must be paired with the MiaoshouAI Flux CLIP Text Encode node.


For SD1.5 and SDXL, use tags.
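The choice of caption_method described above can be sketched in code for readers who want to call the model outside ComfyUI. Several things here are assumptions rather than facts from this article: the task-prompt tokens in `TASK_PROMPTS` are what I recall from the PromptGen v1.5 model card and should be checked against it, the mapping from the node's caption_method names to those tokens is my guess, and `pick_caption_method` and `caption_image` are hypothetical helpers. The inference call follows the usual Florence-2 usage pattern in transformers and may need adjusting for your installed version.

```python
# Assumed mapping from the node's caption_method names to Florence-2 task prompts.
TASK_PROMPTS = {
    "tags": "<GENERATE_TAGS>",
    "simple": "<CAPTION>",
    "structured": "<DETAILED_CAPTION>",
    "detail": "<MORE_DETAILED_CAPTION>",
    "mixed": "<MIXED_CAPTION>",
}


def pick_caption_method(base_model: str) -> str:
    """Follow the article's advice: mixed for Flux, tags for SD1.5 and SDXL."""
    name = base_model.strip().lower()
    if name == "flux":
        return "mixed"
    if name in ("sd1.5", "sdxl"):
        return "tags"
    raise ValueError(f"no recommendation for base model: {base_model!r}")


def caption_image(image_path: str, caption_method: str,
                  model_id: str = "MiaoshouAI/Florence-2-base-PromptGen-v1.5") -> str:
    """Caption one image; needs torch, Pillow, and transformers installed."""
    # Heavy third-party imports stay local so the helpers above work without them.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).to(device).eval()
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    prompt = TASK_PROMPTS[caption_method]
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)
    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=1024,
            do_sample=False,
            num_beams=3,
        )
    raw_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    parsed = processor.post_process_generation(
        raw_text, task=prompt, image_size=(image.width, image.height)
    )
    return parsed[prompt]
```

For example, `caption_image("donkey.png", pick_caption_method("Flux"))` would request a mixed caption. Inside ComfyUI none of this is needed, since the node handles model loading and prompting itself.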

The comparison follows; every output below was generated from the same original image.

tags:

solo, open mouth, simple background, sitting, closed eyes, full body, tongue, signature, no humans, chair, animal, animal focus, desk, computer, monitor, pencil, scribbled


simple:

a cartoon donkey sitting at a desk with a thought bubble above its head


structured:

a whimsical scene featuring a cartoon donkey sitting in the center of the frame, the donkey is positioned directly in front of the camera, with its upper body facing the viewer, its eyes are closed, and its mouth is slightly open, conveying a happy expression, to the left of the donkey, a wooden chair is visible, set against a plain white background, in the foreground, a computer screen and scattered papers are visible, with a thought bubble above the donkey’s head, the overall atmosphere is one of quiet contemplation, with the donkey appearing to be deep in thought


detail:

a digital drawing in a whimsical, cartoonish style, it depicts a small, anthropomorphic donkey sitting on a wooden chair in a simple, indoor setting, the donkey has a light brown fur coat with darker brown patches on its face and ears, its eyes are closed, and its mouth is open, revealing a tongue that is slightly open, giving it a mischievous expression, the background is a plain, off-white color, and the chair is positioned to the left of the donkey, with a wooden seat and backrest visible, the chair appears to be made of light-colored wood, and it is placed on a white surface, possibly a desk or table, in front of the chair, there is a computer monitor displaying a blank screen, and a few crumpled papers and a pink eraser are scattered around the desk, suggesting a humorous or playful mood, the overall color palette is warm, with soft, muted tones that enhance the whimsical and endearing nature of the scene, the texture of the drawing is smooth and detailed, with subtle shading and highlights that give the donkey a lifelike quality


mixed:

a digital drawing in a whimsical, cartoonish style, it depicts a small, anthropomorphic donkey sitting on a wooden chair in a classroom setting, the donkey has a light brown fur coat with darker brown patches on its face and ears, and its eyes are closed, giving it a relaxed or content expression, its mouth is open, showing a tongue that is slightly open, and it appears to be smiling, there are two small, gray circles floating above its head, suggesting a thought bubble, the background is a plain, off-white color, and the desk in front of the donkey is cluttered with a computer monitor, a keyboard, and a few crumpled papers, there is also a small wooden chair to the left of the desk, partially visible, with a pink eraser and a piece of paper on it, the overall mood of the drawing is one of introspection or frustration, with the donkey’s expression and the classroom setting conveying a sense of calm and introspection, the texture of the illustration is smooth and detailed, with soft shading that enhances the realism of the character’s fur and the softness of the chair and paper, the drawing style is characterized by clean lines and soft, pastel colors, typical of modern digital art (cartoon), 1girl, solo, smile, open mouth, simple background, sitting, closed eyes, full body, closed mouth, chair, no humans, animal, desk, computer, scribbled
