Sora Training and AI Short-Film Generation: Datawhale AI Video Generation Study Notes 3
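Sora training process
Sora training pipeline
Understanding how the VAE compresses video is a key point. The VQ-VAE in VideoGPT uses 3D convolutions to obtain a latent representation that captures the temporal relationships in the video.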
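To make the compression concrete, the following is a minimal sketch of how strided 3D downsampling shrinks a clip into a smaller latent grid. The clip size and stride factors are illustrative assumptions, not values taken from Sora or VideoGPT, and `latent_grid` and `compression_ratio` are hypothetical helper names.

```python
from math import prod

def latent_grid(video_shape, downsample):
    """Return the (T, H, W) latent grid after strided 3D downsampling.

    video_shape: (frames, height, width) of the input clip.
    downsample:  (t, h, w) total stride factors of the encoder (assumed values).
    """
    if any(size % factor for size, factor in zip(video_shape, downsample)):
        raise ValueError("each dimension must be divisible by its stride factor")
    return tuple(size // factor for size, factor in zip(video_shape, downsample))

def compression_ratio(video_shape, downsample):
    """Ratio of input positions to latent positions (channels are ignored)."""
    return prod(video_shape) // prod(latent_grid(video_shape, downsample))

clip = (16, 64, 64)   # hypothetical clip: 16 frames of 64x64 pixels
strides = (4, 4, 4)   # hypothetical 4x downsampling along time, height, width
print(latent_grid(clip, strides))        # (4, 16, 16)
print(compression_ratio(clip, strides))  # 64
```

Because the time axis is downsampled together with height and width, each latent position summarizes several consecutive frames, which is where the temporal information comes from.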
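Video data preprocessing
Data-Juicer has extended its support to multimodal data.
Choosing a video generation model
Video VQ-VAE
VideoGPT uses a VQ-VAE that learns downsampled discrete latent representations of raw video by employing 3D convolutions and axial self-attention. A simple GPT-like architecture then models the discrete latents autoregressively, using spatio-temporal position encodings. It is used for video generation on the BAIR Robot dataset and generates high-fidelity natural videos from UCF-101 and the Tumblr GIF dataset (TGIF).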
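The quantization step described above can be sketched in plain Python: each latent vector is replaced by the index of its nearest codebook entry, and the indices are flattened into a sequence for the GPT-like prior. This is only an illustration under assumptions. The tiny 2-D codebook, the toy latent grid, the raster flattening order, and the names `quantize` and `encode_latents` are all hypothetical and are not taken from the VideoGPT code.

```python
def quantize(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))

def encode_latents(latents, codebook):
    """Map a T x H x W grid of latent vectors to a flat list of discrete codes.

    Codes are emitted in raster order (time, then row, then column); this
    ordering is an assumption made for the sketch.
    """
    return [quantize(vec, codebook)
            for frame in latents
            for row in frame
            for vec in row]

# Hypothetical codebook with three 2-D entries.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]
# Hypothetical latent grid: 2 time steps, 1 row, 2 columns.
latents = [
    [[(0.1, -0.1), (0.9, 1.2)]],
    [[(-0.8, 0.7), (0.4, 0.4)]],
]
tokens = encode_latents(latents, codebook)
print(tokens)  # [0, 1, 2, 0]
```

An autoregressive model would then be trained to predict each token in `tokens` from the ones before it.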
Diffusion Transformer
- U-ViT
- DiT
- SiT
Comparison of the three models (see https://zhuanlan.zhihu.com/p/619033826?utm_psn=1743677564626051072):
Video-caption
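OpenAI's video captioning model (GPT-4V) automatically produces high-quality text annotations; the resulting video-text pairs are then used for training, which ensures good alignment.
Developers can also obtain reasonably good video descriptions by sampling frames from a video, generating a description of each frame with an open-source VLM, and then summarizing those descriptions with an LLM.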
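The frame-sampling, VLM-description, and LLM-summary steps can be wired together roughly as follows. This is a sketch under assumptions: `describe_frame` and `summarize` are hypothetical callables standing in for whichever open-source VLM and LLM you choose, and the even-spacing sampling rule is just one simple option.

```python
from typing import Callable, List, Sequence

def sample_frame_indices(num_frames: int, num_samples: int) -> List[int]:
    """Pick up to `num_samples` frame indices spread evenly across the video."""
    if num_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, num_frames)
    step = num_frames / num_samples
    # Take the middle of each equal-width segment of the video.
    return [int(step * i + step / 2) for i in range(num_samples)]

def caption_video(
    frames: Sequence,
    describe_frame: Callable[[object], str],   # hypothetical VLM wrapper
    summarize: Callable[[List[str]], str],     # hypothetical LLM wrapper
    num_samples: int = 4,
) -> str:
    """Describe a few sampled frames, then merge the descriptions into one caption."""
    indices = sample_frame_indices(len(frames), num_samples)
    descriptions = [describe_frame(frames[i]) for i in indices]
    return summarize(descriptions)

# Stub models so the sketch runs without any real VLM or LLM.
frames = [f"frame{i}" for i in range(8)]
caption = caption_video(
    frames,
    describe_frame=lambda frame: f"a view of {frame}",
    summarize=lambda texts: "; ".join(texts),
)
print(caption)
```

In a real pipeline the two lambdas would be replaced by calls into the chosen models, and `frames` would hold decoded images rather than strings.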
Compute estimation
Video generation workflow
Video generation in practice
prompt:
Write a video script for a japanese traveling vlog marketed towards traveling lovers. It should focus on the scenery and famous buiding in Japan. Write in a happy mood.
The video is free for anyone to access. The video will be on-screen narrato, so factor this into your script.
generated text:
Video Script: Exploring the Landscapes and Architecture of Japan
[Opening Scene]
- Visual: Aerial view of Mount Fuji at sunrise, with the sky painted in soft hues of pink and orange.
- Music: Upbeat, instrumental background music begins.
- Narration: “Konnichiwa, travelers! Welcome to our journey through Japan, a land where ancient traditions meet modern wonders. Today, we’re exploring breathtaking landscapes and iconic buildings that make Japan a must-visit destination for every travel enthusiast.”
[Scene 1: The Majesty of Mount Fuji]
- Visual: Transition to closer shots of Mount Fuji, showcasing its snow-capped peak against clear blue skies. Cut to scenes of the surrounding lakes reflecting the mountain.
- Narration: “Our adventure begins with the majestic Mount Fuji, Japan’s tallest peak and a symbol of beauty and tranquility. Whether you’re admiring its reflection in the serene Lake Kawaguchiko or standing in awe at its base, Fuji-san is a sight to behold.”
[Scene 2: The Ancient Capital of Kyoto]
- Visual: Gently transition to the charming streets of Kyoto. Show iconic temples like Kinkaku-ji (The Golden Pavilion) and Fushimi Inari Shrine with its thousand torii gates.
- Narration: “Next, we wander through Kyoto, the heart of Japan’s ancient heritage. From the glittering Golden Pavilion to the endless vermilion gates of Fushimi Inari Shrine, Kyoto captivates with its timeless beauty.”
[Scene 3: The Bustling Metropolis of Tokyo]
- Visual: Fast-paced montage of Tokyo’s skyscrapers, the busy Shibuya Crossing, and tranquil scenes from Shinjuku Gyoen National Garden.
- Narration: “From the tranquility of Kyoto, we dive into the vibrant energy of Tokyo. Experience the bustling Shibuya Crossing, explore the futuristic skyscrapers, and find peace within the city’s gorgeous green spaces like Shinjuku Gyoen.”
[Scene 4: The Traditional Charm of Takayama]
- Visual: Show the preserved streets of Takayama’s old town, traditional wooden houses, and morning markets.
- Narration: “Let’s take a step back in time in Takayama, where traditional wooden houses line the streets, offering a glimpse into Japan’s rich history and culture. The morning markets here are a treasure trove of local crafts and fresh produce.”
[Scene 5: The Scenic Beauty of Hokkaido]
- Visual: Shots of Hokkaido’s vast landscapes, including flower fields in Furano and the serene Lake Toya.
- Narration: “Our journey takes us north to Hokkaido, known for its breathtaking natural beauty. From the vibrant flower fields of Furano to the tranquil waters of Lake Toya, Hokkaido is a paradise for nature lovers.”
[Scene 6: The Architectural Wonder of Hiroshima]
- Visual: Pan over the Peace Memorial Park, the Atomic Bomb Dome, and Hiroshima Castle.
- Narration: “In Hiroshima, we reflect at the Peace Memorial Park and marvel at the resilience of the Atomic Bomb Dome. Hiroshima Castle, a testament to Japan’s feudal past, stands proudly amidst the city’s modern landscape.”
[Closing Scene]
- Visual: Montage of highlights from the trip, ending with a beautiful sunset over a traditional Japanese garden.
- Narration: “From awe-inspiring natural wonders to architectural marvels, Japan offers an endless journey of discovery. Thank you for joining us on this adventure through the Land of the Rising Sun. Where will your travels take you next?”
[End Screen]
- Visual: Text overlay - “Subscribe for more travel adventures | Share the journey | Sayonara!”
- Music: Fade out the upbeat music.
- Narration: “Sayonara, friends! Don’t forget to subscribe and share your own travel stories with us. Until next time, keep exploring and embracing the beauty of our world.”
Trying Video GPT by VEED
https://www.veed.io/view/9a527f49-c4bb-4679-8ae3-409bd2c8859d?panel=share