Audio/Video Development Journey (101): Generating Film Storyboard Sets with Consistent Characters and Style Using In-Context LoRA

音视频开发之旅

Last modified: 2024-11-14 10:17:06

First published: 2024-11-13 21:55:14

Article link: https://blog.csdn.net/u011570979/article/details/143753488

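1. What Is Novel About IC-LoRA

In-Context LoRA (IC-LoRA) is a family of LoRA models trained on top of Flux. The main difference from a conventional LoRA lies in the training set: related images are concatenated into a single image, their text descriptions are concatenated into a single caption, and a LoRA is then trained on these pairs with ai-toolkit. The trained model generates, in one pass, a set of images that are consistent in style and identity and related in content. The authors have released 10 pretrained IC-LoRA models, covering film storyboards, identity-consistent portraits, font design, visual identity design, and more.

The core idea is that this single, simple recipe (merge the related images into one image, merge their captions into one caption, train a LoRA) handles different kinds of generation tasks under one unified paradigm, for example character consistency, style consistency, font design, and visual identity design.

A sample caption from the training dataset looks like this: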

[MOVIE-SHOTS] In a series of dynamic and vivid shots, the image captures the journey of <Alex>, a stylish traveler navigating a bustling train setting; [SCENE-1] beginning with <Alex> sprinting towards a departing train, clutching a large suitcase, against a backdrop of vibrant sunset hues and distant figures wrapped in daily routine, [SCENE-2] followed by a close-up as <Alex> pauses at the train door, exuding confidence with sunglasses momentarily raised and eyes set firmly on the horizon, evoking a sense of anticipation and introspection, [SCENE-3] and concluding with <Alex> stepping into a warmly-lit cabin interior, the intricate patterns on the walls and his nonchalant demeanor suggesting both the allure of exploration and the quiet introspection of new beginnings, effectively encapsulating the essence of adventure and self-discovery.
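To make the caption format concrete, here is a minimal sketch of how per-image descriptions could be merged into one caption in the style of the sample above. The function build_caption and its parameters are hypothetical and are not part of the IC-LoRA repository or ai-toolkit; the tag scheme ([MOVIE-SHOTS], [SCENE-n]) simply mirrors the sample. Only the text side is shown; pasting the frames into one canvas is not covered.

```python
def build_caption(set_tag, intro, scene_texts, scene_tag="SCENE"):
    """Merge per-image descriptions into a single IC-LoRA style caption.

    Hypothetical helper: it only mirrors the layout of the sample caption,
    i.e. "[SET-TAG] intro; [SCENE-1] ..., [SCENE-2] ...".
    """
    if not scene_texts:
        raise ValueError("at least one scene description is required")
    parts = [
        f"[{scene_tag}-{i}] {text.strip()}"
        for i, text in enumerate(scene_texts, start=1)
    ]
    return f"[{set_tag}] {intro.strip()}; " + ", ".join(parts) + "."


caption = build_caption(
    "MOVIE-SHOTS",
    "the image captures the journey of <Alex>",
    [
        "beginning with <Alex> sprinting towards a departing train",
        "followed by a close-up as <Alex> pauses at the train door",
        "and concluding with <Alex> stepping into a warmly-lit cabin",
    ],
)
print(caption)
```

Keeping the same character placeholder (<Alex> here) in every scene is what ties the panels to one identity in the merged caption.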

2. Downloading the Model and Using It in ComfyUI

ModelScope page: https://www.modelscope.cn/models/iic/In-Context-LoRA/files

from modelscope import snapshot_download

# Download the IC-LoRA repository from ModelScope and print the local directory
model_dir = snapshot_download('iic/In-Context-LoRA')
print(model_dir)

After downloading, place the model files in ComfyUI's loras folder (models/loras).
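The copy step can also be scripted. The sketch below is an illustration only: copy_loras is a hypothetical helper, and because the layout of the downloaded directory is not described here, it simply searches the directory recursively for .safetensors files.

```python
import shutil
from pathlib import Path


def copy_loras(model_dir, comfyui_dir):
    """Copy every .safetensors file under model_dir into ComfyUI/models/loras.

    Hypothetical helper; returns the names of the copied files.
    """
    target = Path(comfyui_dir) / "models" / "loras"
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    # Search recursively, since the download layout is an unknown here.
    for src in sorted(Path(model_dir).rglob("*.safetensors")):
        shutil.copy2(src, target / src.name)
        copied.append(src.name)
    return copied


# Example call (the ComfyUI path is a placeholder):
# copy_loras(model_dir, "/path/to/ComfyUI")
```

Restart ComfyUI or refresh its model list afterwards so the new files show up in the LoRA loader node.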

Then load the corresponding model with the workflow provided by IC-LoRA: https://github.com/ali-vilab/In-Context-LoRA/tree/main/workflow/film-storyboard.json

3. Testing IC-LoRA on Different Tasks

3.1 Film Storyboard Images

portrait-photography.safetensors    width: 1344, height: 1728
In a tender exploration of first love, [IMAGE1] we see <Jamie> nervously arranging flowers in a park, glancing around as if waiting for someone special, [IMAGE2] transitioning to the moment <Sam> arrives, their eyes locking in a shy smile that speaks volumes, [IMAGE3] finally showing them seated on a bench, sharing stories and laughter, surrounded by blooming blossoms, embodying the magic of young romance

portrait-photography.safetensors    width: 1344, height: 1728
This [FOUR-PANEL] image illustrates a young artist's creative process in a bright and inspiring studio; [TOP-LEFT] she stands before a large canvas, brush in hand, adding vibrant colors to a partially completed painting, [TOP-RIGHT] she sits at a cluttered wooden table, sketching ideas in a notebook with various art supplies scattered around, [BOTTOM-LEFT] she takes a moment to step back and observe her work, adjusting her glasses thoughtfully, and [BOTTOM-RIGHT] she experiments with different textures by mixing paints directly on the palette, her focused expression showcasing her dedication to her craft.

portrait-photography.safetensors    width: 1344, height: 1728
The set of four images highlights the playful energy of a Chinese young boy in a city playground. [IMAGE1] He climbs up a jungle gym with a look of determination, his hands gripping the bars as he pulls himself up; [IMAGE2] he swings high on a set of swings, his head thrown back in laughter as his feet touch the sky; [IMAGE3] a close-up captures him mid-slide, his eyes wide with excitement as he descends down a bright yellow slide; [IMAGE4] he races down a pathway lined with trees, his arms pumping with energy as he chases after a soccer ball, his face alight with joy.

3.2 Couple Profile Design

couple-profile.safetensors    width: 2048, height: 1024
This pair of images features a couple as cartoon characters in medieval attire; [IMAGE1] shows a knight with a plumed helmet and a determined look, holding a small shield, while [IMAGE2] displays a character dressed as a princess with a crown, smiling as they hold a flower, both against a castle background

The pair of images depicts a couple in a cartoon-style grocery shopping scene; [IMAGE1] one character reaches for a snack on a high shelf with a playful grin, while [IMAGE2] the other character with wide eyes and a towering cart of food holds a grocery list, all set in a colorful grocery aisle.

3.3 Portrait Illustration

portrait-illustration.safetensors    width: 1152, height: 1088    
This two-panel image presents a transformation from a realistic portrait to a playful illustration, capturing both detail and artistic flair; [LEFT] the photograph shows a woman standing in a bustling marketplace, wearing a wide-brimmed hat, a flowing bohemian dress, and a leather crossbody bag; [RIGHT] the illustration panel exaggerates her accessories and features, with the bohemian dress depicted in vibrant patterns and bold colors, while the background is simplified into abstract market stalls, giving the scene an animated and lively feel.

3.4 Font Design

font-design.safetensors    width: 1792, height: 1216
The four-panel image showcases a playful bubble font in a vibrant pop-art style. [TOP-LEFT] displays "Python" in bright pink with a polka dot background; [TOP-RIGHT] shows "Java" in purple, surrounded by candy illustrations; [BOTTOM-LEFT] has "C++" in a mix of bright colors; [BOTTOM-RIGHT] shows "JavaScript" against a striped background, perfect for fun, kid-friendly products.

font-design.safetensors    width: 1792, height: 1216
The set of four images features a minimalist handwriting font for casual use. [IMAGE1] shows "Everyday" on a coffee cup; [IMAGE2] displays "Notes" on a small journal; [IMAGE3] has "Live Simply" on a white pillow; [IMAGE4] shows "Good Vibes" on a cozy blanket, perfect for lifestyle and home decor branding.

3.5 Visual Identity Design

visual-identity-design.safetensors    width: 1472, height: 1024
In this set of two images, an eye-catching animal-themed logo is introduced and applied to a lifestyle product; [IMAGE 1] There is a simplistic black logo featuring an Android robot and the brand name "Android" on a sky blue background; [IMAGE 2] The design is printed on a gray gym bag and water bottle, and both items are placed on a wooden gym bench.

visual-identity-design.safetensors    width: 1792, height: 1216
The pair of images highlights a logo and its real-world use for a rustic coffee brand; [IMAGE1] a striking teal background showcases a logo with a stylized, perched bird in black and white, titled “mediajourney” in an elegant serif font, with a leafy branch detail underneath; [IMAGE2] this logo is applied to a coffee mug sitting atop a woven coaster on a dark mahogany table, with a blurred background that emphasizes the warm tones and classic aesthetic of the branding in a cozy setting

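4. Training

Place the dataset and the config file in the data and config directories of https://github.com/ostris/ai-toolkit, respectively.

The model is trained on top of Flux, so the path to Flux must be set in the yml file under config.

Then run python run.py config/movie-shots.yml to start training. Training needs a lot of GPU memory; a Q4-quantized version of Flux can be used to reduce the requirement.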
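The file placement and the launch command can be sketched as a small script. This is an illustration under assumptions: stage_training_files is a hypothetical helper, the dataset is assumed to be a single folder, and the Flux path inside the yml still has to be edited by hand because the config schema is not covered here.

```python
import shutil
from pathlib import Path


def stage_training_files(toolkit_dir, dataset_dir, config_file):
    """Copy a dataset folder and a config file into an ai-toolkit checkout.

    Hypothetical helper; returns the training command as an argument list,
    meant to be run from inside toolkit_dir.
    """
    toolkit = Path(toolkit_dir)
    dataset = Path(dataset_dir)
    config = Path(config_file)

    # data/<dataset name>/ receives the concatenated images and captions.
    shutil.copytree(dataset, toolkit / "data" / dataset.name, dirs_exist_ok=True)

    # config/<file>.yml is the file later passed to run.py.
    (toolkit / "config").mkdir(parents=True, exist_ok=True)
    shutil.copy2(config, toolkit / "config" / config.name)

    return ["python", "run.py", f"config/{config.name}"]
```

The returned list can be handed to subprocess.run(cmd, cwd=toolkit_dir, check=True) once the yml points at the local Flux weights.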

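Summary

IC-LoRA cleverly concatenates multiple images, and likewise their captions, before training a LoRA. The trained model then generates a set of images with consistent style and identity in a single pass, which offers a good approach to the character-consistency problem. It applies to many scenarios; personally, I find film storyboards, stylized fonts, and visual identity design the most interesting. The generated images can also feed downstream tasks: for example, the film storyboard frames could be tried as keyframes for AI image-to-video generation.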
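To use individual frames as keyframes, the composite image first has to be cut back into panels. The sketch below computes the crop boxes, assuming the panels tile the image in an even grid with no borders or gaps; that layout is an assumption and should be checked against the actual output. panel_boxes is a hypothetical helper.

```python
def panel_boxes(width, height, rows, cols):
    """Return (left, upper, right, lower) boxes for an evenly split grid.

    Assumes the panels tile the composite in a rows x cols grid without
    borders or gaps, which the generated images do not guarantee.
    """
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be positive")
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = c * width // cols
            upper = r * height // rows
            right = (c + 1) * width // cols
            lower = (r + 1) * height // rows
            boxes.append((left, upper, right, lower))
    return boxes


# Three vertically stacked frames in a 1344 x 1728 composite (layout assumed):
print(panel_boxes(1344, 1728, rows=3, cols=1))
# → [(0, 0, 1344, 576), (0, 576, 1344, 1152), (0, 1152, 1344, 1728)]
```

The boxes use the (left, upper, right, lower) convention, so each one can be passed directly to Pillow's Image.crop.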