Omost - AI painting from a single sentence, automatic keyword generation, local one-click bundle

Article link: https://blog.csdn.net/yihuaixu/article/details/139466298

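Recently, the author of ControlNet released a brand-new project: Omost. It greatly helps existing image models understand prompts. With a very short prompt, it can generate images that are highly detailed and spatially accurate.
Important: this project needs a GPU with 8 GB of VRAM to run. Older cards, such as the 10 series, reportedly may not work; test on your own hardware.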

Project page: https://github.com/lllyasviel/Omost

Local one-click bundle download: Omost - AI painting from a single sentence, automatic keyword generation, local one-click bundle

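From now on, no image prompt is hard to write

Of course, Omost is still about AI painting: it addresses the common pain point of not knowing how to write prompts.

Besides writing prompts automatically, editing images in conversation mode is another highlight. If the first result is not satisfactory, you can send a follow-up message, chat style, to modify what was generated last time.

Now a single, very simple prompt stating your intent is enough, and the agent starts "composing" the image on its own.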

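For example, write a short prompt:
"generate an image of the fierce battle of warriors and the dragon"

The software then starts composing automatically.
Some of the strings in the composition output below are translated rather than verbatim, so that the whole flow is easier to follow.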

# Initialize the canvas
canvas = Canvas()

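# Set a global description for the canvas
canvas.set_global_description(
    description='A fierce battle scene featuring warriors and a dragon.',
    detailed_descriptions=[
        'This image captures an epic battle between brave warriors and a fearsome dragon.',
        'The warriors, clad in medieval armor, wield swords and shields, showing their fierce determination.',
        'The dragon spreads its massive wings and breathes fire that lights up the dark sky.',
        'The background is a rugged mountain landscape, adding to the intensity and drama of the scene.',
        'The warriors are strategically positioned around the dragon, some trying to dodge its flames while others prepare to attack.',
        'The atmosphere is charged with tension and a sense of impending doom as the dragon looms menacingly over the battlefield.',
        'The lighting is dramatic, with the flames casting fiery glows and shadows across the scene.',
    ],
    tags='battle, warriors, dragon, medieval armor, swords, shields, fire, wings, dark sky, rugged landscape, mountains, tension, drama, battlefield, flames, shadows, epic, fierce, intense, determination, brave, looming, menacing, charged, impending doom',
    HTML_web_color_name='firebrick',
)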

# Add the fierce dragon.
canvas.add_local_description(
    location='in the center',
    offset='no offset',
    area='a large square area',
    distance_to_viewer=5.0,
    description='The fierce dragon.',
    detailed_descriptions=[
        'The dragon is the centerpiece of the battle scene.',
        'Its massive wings are spread wide, revealing intricate scales and powerful muscles.',
        'The dragon’s eyes are fierce and glowing, reflecting its anger and determination.',
        'It breathes fire, which illuminates the darkening sky with bright, fiery hues.',
        'The dragon’s body is muscular and covered in protective scales, and its tail is poised to strike.',
        'The overall appearance of the dragon is one of pure power and menace, making it a formidable opponent in this epic battle.',
    ],
    tags='dragon, wings, scales, muscles, eyes, fire, glowing, anger, powerful, menace, formidable, opponent, battle, centerpiece, bright, hues, darkening sky, muscular, protective, poised, strike',
    atmosphere='The atmosphere is charged with tension and the sense of impending doom.',
    style='The style is dramatic and intense, with a focus on the dragon’s menacing presence.',
    quality_meta='The dragon is depicted with intricate details and a powerful presence.',
    HTML_web_color_name='darkred',
)

# Add the warriors preparing to strike.
canvas.add_local_description(
    location='on the left',
    offset='slightly to the lower-right',
    area='a medium-sized vertical area',
    distance_to_viewer=4.5,
    description='Warriors preparing to strike.',
    detailed_descriptions=[
        'A group of warriors, clad in medieval armor, are positioned on the left side of the scene.',
        'They are armed with swords and shields, ready to face the dragon.',
        'Their expressions are determined and focused, showing their bravery and readiness for battle.',
        'The warriors are strategically placed to prepare for an attack, with some members slightly to the lower-right, positioned to dodge the dragon’s flames.',
        'The armor they wear is detailed, with engravings and protective elements, showcasing their skill and craftsmanship.',
        'The lighting casts dramatic shadows on their figures, emphasizing the tension and intensity of the battle.',
    ],
    tags='warriors, medieval armor, swords, shields, determined, focused, bravery, readiness, battle, strategic, dodge, flames, detailed, engravings, protective, skill, craftsmanship, dramatic, shadows, tension, intensity',
    atmosphere='The atmosphere is intense and charged with the energy of the impending battle.',
    style='The style is focused and detailed, capturing the bravery of the warriors.',
    quality_meta='The warriors are depicted with detailed armor and expressions of determination.',
    HTML_web_color_name='saddlebrown',
)

# Add the warriors attempting to dodge the flames.
canvas.add_local_description(
    location='on the right',
    offset='slightly to the upper-left',
    area='a medium-sized vertical area',
    distance_to_viewer=4.5,
    description='Warriors attempting to dodge flames.',
    detailed_descriptions=[
        'On the right side of the scene, a group of warriors is seen attempting to dodge the dragon’s flames.',
        'They are positioned slightly to the upper-left, with their movements swift and precise.',
        'The warriors are clad in medieval armor, and their expressions are a mix of urgency and focus.',
        'The flames from the dragon’s mouth create a dramatic contrast, with bright hues of fire illuminating the darkening sky.',
        'The warriors’ armor is detailed, with protective elements and engravings, showcasing their readiness for battle.',
        'The lighting emphasizes the urgency and intensity of the moment, casting dynamic shadows and highlights.',
    ],
    tags='warriors, medieval armor, dodging, flames, swift, precise, urgency, focus, dramatic, contrast, bright, hues, fire, illuminating, darkening

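Read closely, the output shows how the user's short prompt is decomposed and expanded: everything from the global description of the image down to each local element is described in detail, and the position and size of every element are specified explicitly.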
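To make the location, offset, and area arguments in the listing more concrete, here is a minimal sketch of how such phrases could be turned into pixel-style boxes. It is not Omost's own code. Only a few of the phrases ('in the center', 'on the left', 'on the right', 'no offset', 'slightly to the lower-right', 'slightly to the upper-left', 'a large square area', 'a medium-sized vertical area') appear in the listing; the remaining phrases, the 90x90 grid, the step sizes, and the box sizes are all assumptions made for illustration. With nine choices per argument, the sketch yields 9 x 9 x 9 = 729 combinations.

```python
from itertools import product

GRID = 90  # hypothetical canvas size in cells, chosen for easy arithmetic

# Assumed vocabularies: a few values appear in the listing above, the rest
# are filled in by analogy and are not taken from the Omost source code.
LOCATIONS = {  # phrase -> (column, row) in a 3x3 layout
    'on the top-left': (0, 0), 'on the top': (1, 0), 'on the top-right': (2, 0),
    'on the left': (0, 1), 'in the center': (1, 1), 'on the right': (2, 1),
    'on the bottom-left': (0, 2), 'on the bottom': (1, 2), 'on the bottom-right': (2, 2),
}
OFFSETS = {  # phrase -> (dx, dy) nudge of the box center
    'slightly to the upper-left': (-1, -1), 'slightly to the upper': (0, -1),
    'slightly to the upper-right': (1, -1), 'slightly to the left': (-1, 0),
    'no offset': (0, 0), 'slightly to the right': (1, 0),
    'slightly to the lower-left': (-1, 1), 'slightly to the lower': (0, 1),
    'slightly to the lower-right': (1, 1),
}
AREAS = {  # phrase -> (width, height); sizes are illustrative guesses
    'a small square area': (20, 20), 'a small vertical area': (20, 30),
    'a small horizontal area': (30, 20), 'a medium-sized square area': (40, 40),
    'a medium-sized vertical area': (30, 50), 'a medium-sized horizontal area': (50, 30),
    'a large square area': (60, 60), 'a large vertical area': (50, 80),
    'a large horizontal area': (80, 50),
}


def bounding_box(location, offset, area):
    """Return (left, top, right, bottom), clamped to the hypothetical grid."""
    col, row = LOCATIONS[location]
    dx, dy = OFFSETS[offset]
    width, height = AREAS[area]
    # Center of the 3x3 cell, nudged by the offset (10 cells per step).
    cx = col * 30 + 15 + dx * 10
    cy = row * 30 + 15 + dy * 10
    left = max(0, cx - width // 2)
    top = max(0, cy - height // 2)
    right = min(GRID, cx + width // 2)
    bottom = min(GRID, cy + height // 2)
    return left, top, right, bottom


ALL_BOXES = list(product(LOCATIONS, OFFSETS, AREAS))
print(len(ALL_BOXES))  # 729
print(bounding_box('in the center', 'no offset', 'a large square area'))  # (15, 15, 75, 75)
```

Restricting the model to a small vocabulary of phrases means it never has to emit raw coordinates, which is presumably easier for an LLM to get right.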

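After that, a dedicated image generator creates the final image from the "blueprint" the LLM has drawn up, for example the warriors-versus-dragon battle we just generated.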
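One way to picture the blueprint before any diffusion model is involved is to paint each element's color into a tiny grid. The sketch below does that with a painter's algorithm. It is an illustration only, not Omost's renderer: it assumes that a larger distance_to_viewer means farther away (so those elements are painted first), and the 9x9 grid and box coordinates are made up. The colors and distances are the ones from the listing.

```python
def rough_composition(background, elements, size=9):
    """Paint elements onto a size x size grid of color names, farthest first."""
    grid = [[background] * size for _ in range(size)]
    # Assumption: larger distance_to_viewer = farther away = painted earlier,
    # so nearer elements overwrite farther ones where they overlap.
    ordered = sorted(elements, key=lambda e: e['distance_to_viewer'], reverse=True)
    for element in ordered:
        left, top, right, bottom = element['box']
        for y in range(top, bottom):
            for x in range(left, right):
                grid[y][x] = element['color']
    return grid


# Colors and distances come from the listing; the boxes are hypothetical.
elements = [
    {'color': 'darkred', 'distance_to_viewer': 5.0, 'box': (2, 1, 7, 8)},      # dragon
    {'color': 'saddlebrown', 'distance_to_viewer': 4.5, 'box': (0, 3, 3, 9)},  # warriors
]
grid = rough_composition('firebrick', elements)
for row in grid:
    # Print the first letter of each color: f = firebrick, d = darkred, s = saddlebrown.
    print(' '.join(name[0] for name in row))
```

Where the two boxes overlap, the warriors win because they are nearer to the viewer under the stated assumption.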

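Moreover, the overall layout of a finished image can be kept; to change a single element in the picture, one more prompt is enough.
For example, to turn the dragon in the current image into a dinosaur, just send "change the dragon to a dinosaur". The result is shown below.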

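We can also copy the keywords generated by Omost into other AI painting software, such as SD (Stable Diffusion), and get equally striking results.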
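If you copy keywords from several blocks at once, duplicates such as "dragon" or "flames" pile up. The small helper below is a hypothetical convenience, not part of Omost: it merges several comma-separated keyword strings into one prompt and drops repeats while keeping the original order.

```python
def merge_tags(*tag_strings):
    """Merge comma-separated keyword strings, dropping case-insensitive repeats."""
    seen = set()
    merged = []
    for tag_string in tag_strings:
        for tag in tag_string.split(','):
            tag = tag.strip()
            if tag and tag.lower() not in seen:
                seen.add(tag.lower())
                merged.append(tag)
    return ', '.join(merged)


prompt = merge_tags('battle, warriors, dragon', 'dragon, wings, fire')
print(prompt)  # battle, warriors, dragon, wings, fire
```

The resulting string can be pasted into the prompt box of whichever tool you use.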

Currently, Omost provides three LLMs for generating this code, based on variations of Llama3 and Phi3.

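Project highlights:

Automatic prompt expansion: Omost breaks a simple prompt down into detailed descriptions, covering everything from the overall image to the position and size of local elements. For example, given "a funny cartoon batman fights joker", the system generates a complete image of Batman fighting the Joker.

High flexibility: the layout of a generated image can be kept, and the user can modify a single element of the image with a simple prompt. For example, when asked to turn the dragon into a dinosaur, the system generates the modified image from the new prompt.

Image position encoding: Omost simplifies the description of image elements by restricting them to 729 predefined bounding boxes. Each box is specified through predefined parameters (location, offset, and area), which keeps placement accurate and fine-grained.

Sub-prompt system: all Omost LLMs are trained to provide strictly defined "sub-prompts". Each sub-prompt can describe a thing on its own, and sub-prompts can be combined arbitrarily into a complete prompt. This design improves the flexibility and accuracy of prompts.

Attention manipulation: Omost adjusts attention scores to control how much attention each region receives during image generation, which allows finer-grained control. By adjusting attention scores, Omost can generate image elements that match the prompt description.

Prompt prefix tree: Omost introduces a prompt prefix tree, which improves prompt understanding and description by merging sub-prompts. For example, the path "a cat and a dog. the cat on the sofa" can be used as a prompt to generate the corresponding image.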
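The prefix-tree idea can be sketched with a nested dictionary: the global prompt is the root, each sub-prompt is a child, and every root-to-leaf path is joined into one prompt. This is a minimal illustration under those assumptions, not Omost's actual data structure; the second branch ("the dog on the rug") is invented for the example.

```python
def prompt_paths(tree, prefix=()):
    """Yield one joined prompt per root-to-leaf path of a nested dict."""
    for sub_prompt, children in tree.items():
        path = prefix + (sub_prompt,)
        if children:
            # Keep descending: this sub-prompt is a shared prefix.
            yield from prompt_paths(children, path)
        else:
            yield '. '.join(path)


# The first path matches the example in the text; the second is hypothetical.
tree = {
    'a cat and a dog': {
        'the cat on the sofa': {},
        'the dog on the rug': {},
    },
}
prompts = list(prompt_paths(tree))
print(prompts)
# ['a cat and a dog. the cat on the sofa', 'a cat and a dog. the dog on the rug']
```

Because both paths share the prefix "a cat and a dog", each local description is always read together with the global context.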

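Implementation and usage of Omost:

The Omost project is based on variations of the Llama3 and Phi3 models; users can generate complex images from simple prompts. The key components of the project are:

Location and offset: the image is divided into 9 locations, each location is further divided into 9 offsets (81 positions in total), and each position can take one of 9 area types, giving 9 × 9 × 9 = 729 bounding boxes for describing where image elements are placed.

distance_to_viewer and HTML_web_color_name: these adjust the visual appearance of image elements; combining them yields a rough composition of the image.

Attention manipulation: a baseline renderer based on attention score manipulation, which adjusts attention scores to control how much the model attends to different regions.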
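The general idea of attention score manipulation can be shown on a single image location. In the sketch below, the tokens of a region's sub-prompt get their raw scores raised when the location lies inside that region and lowered when it lies outside, before the softmax is applied. This is a generic illustration under assumed names and an assumed bias value, not the formula used by Omost's renderer.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(scores, token_in_region, pixel_in_region, bias=2.0):
    """Attention weights for one image location after a regional bias.

    scores          -- raw attention scores, one per prompt token
    token_in_region -- True for tokens that belong to the region's sub-prompt
    pixel_in_region -- whether this image location lies inside the region
    bias            -- hypothetical strength of the adjustment
    """
    adjusted = []
    for score, owned in zip(scores, token_in_region):
        if owned:
            # Boost the region's tokens inside the region, suppress them outside.
            score += bias if pixel_in_region else -bias
        adjusted.append(score)
    return softmax(adjusted)


scores = [1.0, 1.0, 1.0]
owned = [False, True, False]  # only the middle token describes the region
inside = biased_attention(scores, owned, pixel_in_region=True)
outside = biased_attention(scores, owned, pixel_in_region=False)
print(inside[1] > outside[1])  # True
```

With equal raw scores, the middle token receives more than a third of the attention inside the region and less than a third outside it, which is the effect a region-aware renderer is after.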

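Applications and outlook

Omost not only simplifies prompt writing but also improves the precision and flexibility of image generation. Its application scenarios include, but are not limited to, AI painting, image design, advertising creative work, and education. Users can generate complex images from simple prompts, which makes it a powerful tool for creative design.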