Sora Training and AI Short-Film Generation: Datawhale AI Video Generation Study Notes, Part 3

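This article covers Datawhale's hands-on work in AI video generation: the Sora training pipeline, the choice and use of VQ-VAE and Diffusion Transformer models, and the role of video-caption models in video generation. Through video preprocessing and a generation exercise, it shows how these techniques can be used to produce a script for a Japan travel video.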


Sora Training Workflow

[Figure]

Sora Training Pipeline

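Understanding how the VAE compresses video is a key point. The VQ-VAE in VideoGPT uses 3D convolutions to obtain a latent representation that captures the temporal relationships in the video.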
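To make the compression concrete, here is a minimal sketch of how strided 3D convolutions shrink a clip along time, height, and width. It only computes shapes with the standard convolution output-size formula. The kernel, stride, padding, and layer count are assumptions chosen for illustration, not VideoGPT's actual configuration.

```python
def conv3d_output_shape(in_shape, kernel, stride, padding):
    """Return the (T, H, W) output size of one 3D convolution.

    Uses the standard per-axis formula: floor((n + 2*p - k) / s) + 1.
    """
    return tuple(
        (n + 2 * p - k) // s + 1
        for n, k, s, p in zip(in_shape, kernel, stride, padding)
    )


def compress(in_shape, num_layers,
             kernel=(4, 4, 4), stride=(2, 2, 2), padding=(1, 1, 1)):
    """Apply `num_layers` strided 3D convolutions and return the latent shape.

    The default kernel/stride/padding values are illustrative assumptions.
    """
    shape = tuple(in_shape)
    for _ in range(num_layers):
        shape = conv3d_output_shape(shape, kernel, stride, padding)
    return shape


# Hypothetical clip: 16 frames of 64x64 pixels, two downsampling layers.
latent_shape = compress((16, 64, 64), num_layers=2)
print(latent_shape)  # (4, 16, 16)
```

Because the stride also applies to the time axis, the latent grid has fewer frames than the input, which is how the temporal relationships end up in the compressed representation.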


Video Data Preprocessing

Data-Juicer has extended its support to multimodal data.

Choosing a Video Generation Model

Video VQ-VAE

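VideoGPT uses a VQ-VAE that learns downsampled discrete latent representations of raw video by employing 3D convolutions and axial self-attention. A simple GPT-like architecture then models the discrete latents autoregressively, using spatio-temporal position encodings. The model is used for video generation on the BAIR Robot dataset, and generates high-fidelity natural videos from UCF-101 and the Tumblr GIF dataset (TGIF).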
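The "discrete latent" step can be illustrated with a small vector-quantization sketch: each continuous latent vector is replaced by the index of its nearest codebook entry, and those indices are the tokens a GPT-like model would then predict. The codebook and latent values below are made up for illustration. A real VQ-VAE learns the codebook during training and works on much larger tensors.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vectors, codebook):
    """Map each latent vector to the index of its nearest codebook entry."""
    indices = []
    for v in vectors:
        nearest = min(
            range(len(codebook)),
            key=lambda i: squared_distance(v, codebook[i]),
        )
        indices.append(nearest)
    return indices


def dequantize(indices, codebook):
    """Look the indices back up to recover the quantized vectors."""
    return [codebook[i] for i in indices]


# Toy codebook with three 2-D entries (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
# Toy encoder outputs (hypothetical values).
latents = [[0.9, 1.2], [-0.8, 0.7], [0.1, -0.2]]

tokens = quantize(latents, codebook)
print(tokens)                        # [1, 2, 0]
print(dequantize(tokens, codebook))  # [[1.0, 1.0], [-1.0, 1.0], [0.0, 0.0]]
```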


Diffusion Transformer
  • U-ViT
  • DiT
  • SiT

Comparison of the three models (see https://zhuanlan.zhihu.com/p/619033826?utm_psn=1743677564626051072):
[Figure]
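All three models run a transformer over a sequence of patch tokens, so the sequence length depends directly on the grid size and the patch size. The sketch below counts those tokens for an image-like latent grid and for a video-like one. The grid and patch sizes are hypothetical and are not taken from any of the three papers' configurations.

```python
def num_patch_tokens(latent_shape, patch_size):
    """Count tokens produced by cutting a grid into non-overlapping patches."""
    count = 1
    for n, p in zip(latent_shape, patch_size):
        if n % p != 0:
            raise ValueError(f"dimension {n} is not divisible by patch size {p}")
        count *= n // p
    return count


# Hypothetical 32x32 image latent with 2x2 patches.
image_tokens = num_patch_tokens((32, 32), (2, 2))
# Hypothetical 4x16x16 video latent with 1x2x2 patches (time, height, width).
video_tokens = num_patch_tokens((4, 16, 16), (1, 2, 2))

print(image_tokens)  # 256
print(video_tokens)  # 256
```

Halving the patch size along each spatial axis quadruples the token count, which is one reason patch size is a main lever on transformer compute.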

Video-caption

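OpenAI uses a video captioning model (GPT-4V) to produce high-quality text annotations automatically, then trains on the resulting video-text pairs, which ensures good alignment.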

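Developers can also obtain reasonably good video descriptions by sampling frames from the video, generating a description of each frame with an open-source VLM, and summarizing those descriptions with an LLM.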
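The frame-sampling route can be sketched as a three-step function. Everything model-related here is a placeholder: `describe_frame` and `summarize` are hypothetical callables standing in for whichever VLM and LLM you choose, and the evenly spaced sampling strategy is an assumption rather than a prescribed method.

```python
from typing import Callable, List, Sequence


def sample_frame_indices(total_frames: int, num_samples: int) -> List[int]:
    """Pick frame indices spread evenly across the video."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    # Take the middle frame of each equal-length segment.
    return [int(step * i + step / 2) for i in range(num_samples)]


def caption_video(
    frames: Sequence,
    describe_frame: Callable[[object], str],   # hypothetical VLM wrapper
    summarize: Callable[[List[str]], str],     # hypothetical LLM wrapper
    num_samples: int = 4,
) -> str:
    """Sample frames, describe each one, then summarize the descriptions."""
    indices = sample_frame_indices(len(frames), num_samples)
    descriptions = [describe_frame(frames[i]) for i in indices]
    return summarize(descriptions)


# Stand-ins so the sketch runs without any model installed.
frames = [f"frame-{i}" for i in range(100)]
fake_vlm = lambda frame: f"a scene in {frame}"
fake_llm = lambda descriptions: " | ".join(descriptions)

caption = caption_video(frames, fake_vlm, fake_llm)
print(caption)
```

Passing the models in as callables keeps the pipeline independent of any particular VLM or LLM API, so either one can be swapped without touching the sampling logic.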

Compute Estimate

[Figure]

Video Generation Workflow

[Figure]

Video Generation in Practice

prompt:

Write a video script for a japanese traveling vlog marketed towards traveling lovers. It should focus on the scenery and famous buiding in Japan. Write in a happy mood.
The video is free for anyone to access. The video will be on-screen narrato, so factor this into your script.

generated text:

Video Script: Exploring the Landscapes and Architecture of Japan

[Opening Scene]

  • Visual: Aerial view of Mount Fuji at sunrise, with the sky painted in soft hues of pink and orange.
  • Music: Upbeat, instrumental background music begins.
  • Narration: “Konnichiwa, travelers! Welcome to our journey through Japan, a land where ancient traditions meet modern wonders. Today, we’re exploring breathtaking landscapes and iconic buildings that make Japan a must-visit destination for every travel enthusiast.”

[Scene 1: The Majesty of Mount Fuji]

  • Visual: Transition to closer shots of Mount Fuji, showcasing its snow-capped peak against clear blue skies. Cut to scenes of the surrounding lakes reflecting the mountain.
  • Narration: “Our adventure begins with the majestic Mount Fuji, Japan’s tallest peak and a symbol of beauty and tranquility. Whether you’re admiring its reflection in the serene Lake Kawaguchiko or standing in awe at its base, Fuji-san is a sight to behold.”

[Scene 2: The Ancient Capital of Kyoto]

  • Visual: Gently transition to the charming streets of Kyoto. Show iconic temples like Kinkaku-ji (The Golden Pavilion) and Fushimi Inari Shrine with its thousand torii gates.
  • Narration: “Next, we wander through Kyoto, the heart of Japan’s ancient heritage. From the glittering Golden Pavilion to the endless vermilion gates of Fushimi Inari Shrine, Kyoto captivates with its timeless beauty.”

[Scene 3: The Bustling Metropolis of Tokyo]

  • Visual: Fast-paced montage of Tokyo’s skyscrapers, the busy Shibuya Crossing, and tranquil scenes from Shinjuku Gyoen National Garden.
  • Narration: “From the tranquility of Kyoto, we dive into the vibrant energy of Tokyo. Experience the bustling Shibuya Crossing, explore the futuristic skyscrapers, and find peace within the city’s gorgeous green spaces like Shinjuku Gyoen.”

[Scene 4: The Traditional Charm of Takayama]

  • Visual: Show the preserved streets of Takayama’s old town, traditional wooden houses, and morning markets.
  • Narration: “Let’s take a step back in time in Takayama, where traditional wooden houses line the streets, offering a glimpse into Japan’s rich history and culture. The morning markets here are a treasure trove of local crafts and fresh produce.”

[Scene 5: The Scenic Beauty of Hokkaido]

  • Visual: Shots of Hokkaido’s vast landscapes, including flower fields in Furano and the serene Lake Toya.
  • Narration: “Our journey takes us north to Hokkaido, known for its breathtaking natural beauty. From the vibrant flower fields of Furano to the tranquil waters of Lake Toya, Hokkaido is a paradise for nature lovers.”

[Scene 6: The Architectural Wonder of Hiroshima]

  • Visual: Pan over the Peace Memorial Park, the Atomic Bomb Dome, and Hiroshima Castle.
  • Narration: “In Hiroshima, we reflect at the Peace Memorial Park and marvel at the resilience of the Atomic Bomb Dome. Hiroshima Castle, a testament to Japan’s feudal past, stands proudly amidst the city’s modern landscape.”

[Closing Scene]

  • Visual: Montage of highlights from the trip, ending with a beautiful sunset over a traditional Japanese garden.
  • Narration: “From awe-inspiring natural wonders to architectural marvels, Japan offers an endless journey of discovery. Thank you for joining us on this adventure through the Land of the Rising Sun. Where will your travels take you next?”

[End Screen]

  • Visual: Text overlay - “Subscribe for more travel adventures | Share the journey | Sayonara!”
  • Music: Fade out the upbeat music.
  • Narration: “Sayonara, friends! Don’t forget to subscribe and share your own travel stories with us. Until next time, keep exploring and embracing the beauty of our world.”

Trying Video GPT by VEED

https://www.veed.io/view/9a527f49-c4bb-4679-8ae3-409bd2c8859d?panel=share
