Contents
1. What Is New in IC-LoRA
2. Downloading the Models and Using Them in ComfyUI
3. IC-LoRA Results on Different Tasks
4. Training
5. References
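1. What Is New in IC-LoRA
In-Context LoRA (IC-LoRA) is a family of LoRA models trained on top of Flux. The main difference from a conventional LoRA lies in the training set: related images are concatenated into a single image, their text descriptions are concatenated into a single caption, and the LoRA is then trained with ai-toolkit. The resulting model generates, in one pass, a set of related images with consistent style and identity (ID). The authors released 10 pretrained IC-LoRA models, covering film storyboards, identity-consistent portraits, font design, visual identity design, and more.
The core idea is to bring different generation tasks (character consistency, style consistency, font design, visual identity design, and so on) under one simple, unified paradigm: stitch the related images into one image, merge their descriptions into one caption, and train a LoRA on the result.
A sample caption from the training data looks like this: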
[MOVIE-SHOTS] In a series of dynamic and vivid shots, the image captures the journey of <Alex>, a stylish traveler navigating a bustling train setting; [SCENE-1] beginning with <Alex> sprinting towards a departing train, clutching a large suitcase, against a backdrop of vibrant sunset hues and distant figures wrapped in daily routine, [SCENE-2] followed by a close-up as <Alex> pauses at the train door, exuding confidence with sunglasses momentarily raised and eyes set firmly on the horizon, evoking a sense of anticipation and introspection, [SCENE-3] and concluding with <Alex> stepping into a warmly-lit cabin interior, the intricate patterns on the walls and his nonchalant demeanor suggesting both the allure of exploration and the quiet introspection of new beginnings, effectively encapsulating the essence of adventure and self-discovery.
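To make the data format concrete, here is a minimal sketch of how one training sample could be assembled. It only mirrors the tag layout of the caption above; the function names, the side-by-side arrangement, and the 512x768 panel size are my own assumptions, not the authors' preprocessing code.

```python
def side_by_side_offsets(sizes):
    """Return (canvas_size, x_offsets) for pasting images left to right.

    sizes is a list of (width, height) tuples, one per image.
    The left-to-right arrangement is an assumption made for illustration.
    """
    if not sizes:
        raise ValueError("need at least one image size")
    offsets, x = [], 0
    for width, _height in sizes:
        offsets.append(x)
        x += width
    canvas = (x, max(height for _width, height in sizes))
    return canvas, offsets


def build_joint_caption(task_tag, overview, scenes):
    """Merge per-image descriptions into one caption with [SCENE-i] markers.

    Hypothetical helper: only the tag layout follows the sample caption.
    """
    if not scenes:
        raise ValueError("need at least one scene description")
    parts = [f"[SCENE-{i}] {text.strip()}" for i, text in enumerate(scenes, start=1)]
    return f"[{task_tag}] {overview.strip()}; " + ", ".join(parts) + "."


# Three hypothetical 512x768 shots become one 1536x768 training image.
canvas, offsets = side_by_side_offsets([(512, 768)] * 3)
caption = build_joint_caption(
    "MOVIE-SHOTS",
    "In a series of vivid shots, the image follows <Alex> through a busy train station",
    [
        "beginning with <Alex> sprinting towards a departing train",
        "followed by a close-up as <Alex> pauses at the train door",
        "and concluding with <Alex> stepping into a warmly-lit cabin",
    ],
)
print(canvas, offsets)
print(caption)
```

The offsets are the x positions at which each image would be pasted onto the shared canvas; the merged caption is what gets paired with that stitched image during training.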
2. Downloading the Models and Using Them in ComfyUI
ModelScope page: https://www.modelscope.cn/models/iic/In-Context-LoRA/files
from modelscope import snapshot_download
model_dir = snapshot_download('iic/In-Context-LoRA')
print(model_dir)
Download the models and put them in ComfyUI's loras folder.
Then load the matching model with one of the workflows provided by IC-LoRA, for example: https://github.com/ali-vilab/In-Context-LoRA/tree/main/workflow/film-storyboard.json
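The "put them in the loras folder" step can also be scripted. The sketch below assumes the usual ComfyUI layout, where LoRA weights live under models/loras inside the ComfyUI root; copy_loras is a hypothetical helper name and the ComfyUI path in the usage comment is a placeholder.

```python
import shutil
from pathlib import Path


def copy_loras(model_dir, comfyui_root):
    """Copy every .safetensors file under model_dir into ComfyUI's loras folder.

    Assumes LoRA files belong in <comfyui_root>/models/loras.
    Returns the destination paths, including files that were already there.
    """
    target = Path(comfyui_root) / "models" / "loras"
    target.mkdir(parents=True, exist_ok=True)
    placed = []
    for src in sorted(Path(model_dir).rglob("*.safetensors")):
        dst = target / src.name
        if not dst.exists():  # leave existing files untouched
            shutil.copy2(src, dst)
        placed.append(dst)
    return placed


# Usage, with model_dir taken from snapshot_download above:
# copy_loras(model_dir, "/path/to/ComfyUI")
```

After copying, the files should appear in the LoRA loader node's dropdown once ComfyUI is restarted or refreshed.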
3. IC-LoRA Results on Different Tasks
3.1 Film Storyboard Images
portrait-photography.safetensors
width: 1344, height: 1728
In a tender exploration of first love, [IMAGE1] we see <Jamie> nervously arranging flowers in a park, glancing around as if waiting for someone special, [IMAGE2] transitioning to the moment <Sam> arrives, their eyes locking in a shy smile that speaks volumes, [IMAGE3] finally showing them seated on a bench, sharing stories and laughter, surrounded by blooming blossoms, embodying the magic of young romance
portrait-photography.safetensors
width: 1344, height: 1728
This [FOUR-PANEL] image illustrates a young artist's creative process in a bright and inspiring studio; [TOP-LEFT] she stands before a large canvas, brush in hand, adding vibrant colors to a partially completed painting, [TOP-RIGHT] she sits at a cluttered wooden table, sketching ideas in a notebook with various art supplies scattered around, [BOTTOM-LEFT] she takes a moment to step back and observe her work, adjusting her glasses thoughtfully, and [BOTTOM-RIGHT] she experiments with different textures by mixing paints directly on the palette, her focused expression showcasing her dedication to her craft.
portrait-photography.safetensors
width: 1344, height: 1728
The set of four images highlights the playful energy of a chinese young boy in a city playground. [IMAGE1] He climbs up a jungle gym with a look of determination, his hands gripping the bars as he pulls himself up; [IMAGE2] he swings high on a set of swings, his head thrown back in laughter as his feet touch the sky; [IMAGE3] a close-up captures him mid-slide, his eyes wide with excitement as he descends down a bright yellow slide; [IMAGE4] he races down a pathway lined with trees, his arms pumping with energy as he chases after a soccer ball, his face alight with joy.
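All of the prompts in this section mark their panels with bracketed tags such as [IMAGE1] or [TOP-LEFT]. When writing your own prompts, a quick check like the one below can confirm that the tags match the number of panels you expect. This is my own helper, not part of IC-LoRA, and the regular expression is an assumption based only on the tags seen in these prompts.

```python
import re

# Upper-case bracketed tags such as [IMAGE1], [IMAGE 1], [TOP-LEFT], [FOUR-PANEL].
PANEL_TAG = re.compile(r"\[([A-Z][A-Z0-9 -]*)\]")


def panel_tags(prompt):
    """Return the bracketed tags of a prompt, in order of appearance."""
    return PANEL_TAG.findall(prompt)


prompt = (
    "In a tender exploration of first love, [IMAGE1] we see <Jamie> nervously "
    "arranging flowers in a park, [IMAGE2] transitioning to the moment <Sam> "
    "arrives, [IMAGE3] finally showing them seated on a bench"
)
print(panel_tags(prompt))  # ['IMAGE1', 'IMAGE2', 'IMAGE3']
```

Names in angle brackets such as <Jamie> are not matched, and layout tags such as [FOUR-PANEL] are returned alongside the per-panel tags, so filter them out before counting panels.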
3.2 Couple Profile Design
couple-profile.safetensors
width: 2048, height: 1024
This pair of images features a couple as cartoon characters in medieval attire; [IMAGE1] shows a knight with a plumed helmet and a determined look, holding a small shield, while [IMAGE2] displays a character dressed as a princess with a crown, smiling as they hold a flower, both against a castle background
The pair of images depicts a couple in a cartoon-style grocery shopping scene; [IMAGE1] one character reaches for a snack on a high shelf with a playful grin, while [IMAGE2] the other character with wide eyes and a towering cart of food holds a grocery list, all set in a colorful grocery aisle.
3.3 Portrait Illustration
portrait-illustration.safetensors
width: 1152, height: 1088
This two-panel image presents a transformation from a realistic portrait to a playful illustration, capturing both detail and artistic flair; [LEFT] the photograph shows a woman standing in a bustling marketplace, wearing a wide-brimmed hat, a flowing bohemian dress, and a leather crossbody bag; [RIGHT] the illustration panel exaggerates her accessories and features, with the bohemian dress depicted in vibrant patterns and bold colors, while the background is simplified into abstract market stalls, giving the scene an animated and lively feel.
3.4 Font Design
font-design.safetensors
width: 1792, height: 1216
The four-panel image showcases a playful bubble font in a vibrant pop-art style. [TOP-LEFT] displays "Python" in bright pink with a polka dot background; [TOP-RIGHT] shows "Java" in purple, surrounded by candy illustrations; [BOTTOM-LEFT] has "C++" in a mix of bright colors; [BOTTOM-RIGHT] shows "JavaScript" against a striped background, perfect for fun, kid-friendly products.
font-design.safetensors
width: 1792, height: 1216
The set of four images features a minimalist handwriting font for casual use. [IMAGE1] shows "Everyday" on a coffee cup; [IMAGE2] displays "Notes" on a small journal; [IMAGE3] has "Live Simply" on a white pillow; [IMAGE4] shows "Good Vibes" on a cozy blanket, perfect for lifestyle and home decor branding.
3.5 Visual Identity Design
visual-identity-design.safetensors
width: 1472, height: 1024
In this set of two images, an eye-catching animal-themed logo is introduced and applied to a lifestyle product; [IMAGE 1] There is a simplistic black logo featuring an Android robot and the brand name "Android" on a sky blue background; [IMAGE 2] The design is printed on a gray gym bag and water bottle, and both items are placed on a wooden gym bench.
visual-identity-design.safetensors
width: 1792, height: 1216
The pair of images highlights a logo and its real-world use for a rustic coffee brand; [IMAGE1] a striking teal background showcases a logo with a stylized, perched bird in black and white, titled “mediajourney” in an elegant serif font, with a leafy branch detail underneath; [IMAGE2] this logo is applied to a coffee mug sitting atop a woven coaster on a dark mahogany table, with a blurred background that emphasizes the warm tones and classic aesthetic of the branding in a cozy setting
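4. Training
Put the dataset and the config file into the data and config directories of https://github.com/ostris/ai-toolkit, respectively.
The model is trained on top of Flux, so set the Flux path in the yml file under config.
Then run python run.py config/movie-shots.yml to start training. Training needs a lot of GPU memory; a Q4-quantized version of Flux can be used to reduce it.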
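Before starting a long training run it is worth checking the dataset folder. As far as I know, ai-toolkit pairs each image with a caption file of the same name and a .txt extension; treat that convention, the helper name, and the example folder path below as assumptions, and check them against the config you actually use.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def images_missing_captions(folder, caption_ext=".txt"):
    """List image files in folder that have no same-named caption file.

    Assumes captions sit next to the images, e.g. 0001.jpg with 0001.txt.
    """
    missing = []
    for path in sorted(Path(folder).iterdir()):
        is_image = path.suffix.lower() in IMAGE_EXTS
        if is_image and not path.with_suffix(caption_ext).exists():
            missing.append(path.name)
    return missing


# Usage (the folder name is a placeholder):
# print(images_missing_captions("data/movie-shots"))
```

An empty list means every stitched image has its merged caption and the folder is ready for python run.py.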
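Summary
IC-LoRA cleverly concatenates multiple images, and their captions, into single training samples. After LoRA training, the model generates a set of images with consistent style and ID in one pass, which offers a good approach to the character-consistency problem. It applies to many scenarios; personally I find film storyboards, stylized fonts, and visual identity design the most interesting. The style- and ID-consistent images can also feed downstream tasks: for example, the generated film storyboard frames could serve as keyframes for AI image-to-video generation.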
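To reuse the output as separate keyframes, the generated composite first has to be cut back into its panels. Below is a minimal sketch that computes the crop boxes, assuming the panels form an evenly divided grid; the 3x1 vertical layout used for the 1344x1728 output is my assumption and should be checked against the actual image.

```python
def panel_boxes(width, height, rows, cols):
    """Return (left, upper, right, lower) crop boxes in row-major order.

    Assumes equally sized panels with no gaps between them.
    """
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be positive")
    if width % cols or height % rows:
        raise ValueError("image size is not evenly divisible by the grid")
    pw, ph = width // cols, height // rows
    return [
        (c * pw, r * ph, (c + 1) * pw, (r + 1) * ph)
        for r in range(rows)
        for c in range(cols)
    ]


# A 1344x1728 image, assumed to hold three panels stacked vertically.
for box in panel_boxes(1344, 1728, rows=3, cols=1):
    print(box)
```

The boxes use the (left, upper, right, lower) order that Pillow's Image.crop expects, so each one can be passed to it directly to save one frame per panel.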
5. References
1. Paper: https://arxiv.org/pdf/2410.23775
2. Project page: https://ali-vilab.github.io/In-Context-LoRA-Page/
3. Bilibili video: https://www.bilibili.com/video/BV1LqmYYCEVb
4. Bilibili video: https://www.bilibili.com/video/BV12Um6YEEHB
5. Alibaba open-sources FLUX-based In-Context LoRA, which generates a set of style- and ID-consistent images in one pass: https://mp.weixin.qq.com/s/wl52jpWe3rdVHzASo-S-Kg
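Thanks for reading.
We will keep learning and writing about AI. Follow the WeChat official account "音视频开发之旅" to learn and grow together.
Feel free to get in touch.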