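Beyond Sora: a free AI video model that supports extra-long 120-second videos. Free, unrestricted AI video generation: is this the AI you need? Code and detailed setup steps included, with a full toolset for iPhone, Mac, and Android available for direct download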

Article link: https://blog.csdn.net/u014374009/article/details/142260249



A 120-second extra-long AI video model is here! It not only goes beyond Sora's limit, it is also free and open source!

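Recently, teams including Picsart AI Research jointly released StreamingT2V, which can generate videos of up to 1200 frames and 2 minutes in length, with video quality that holds up well. In addition, as a strong component of the open-source world, StreamingT2V is compatible with models such as SVD and AnimateDiff.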


Taking aim at Sora! Two minutes is not the limit

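Rome was not built in a day! In fact, before Sora, video generation models such as Pika, Runway, and Stable Video Diffusion (SVD) could generally produce only a few seconds of video, extendable to a dozen or so seconds at most. When Sora arrived, its 60-second duration outclassed the field, and it has stayed a hot topic ever since.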

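Just as Sora was pulling far ahead in video generation, a challenger, StreamingT2V, suddenly launched and immediately became a focus of the tech world. Extra-long 120-second AI videos arrived just like that. This cannot immediately shake Sora's dominance, but it at least wins back a round on duration.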

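The authors of StreamingT2V also say that two minutes is not the model's limit: just as Runway videos could previously be extended, StreamingT2V can in theory generate videos of unlimited length.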

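It is worth noting that StreamingT2V, as a strong component of the open-source world, is also compatible with projects such as SVD and AnimateDiff, which helps the open-source ecosystem grow. The compatibility is not yet mature at this stage, but technical progress is only a matter of time, and we can look forward to better and better results!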

Free to try! A great experience

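StreamingT2V is now open source on GitHub, and a free demo is also available on Hugging Face. As soon as the news broke, countless AI enthusiasts and video creators started trying it out. The demo interface accepts two kinds of prompts, text and images; the latter must be enabled in the advanced options below.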

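StreamingT2V can create long videos with rich motion dynamics, ensuring temporal consistency throughout the video and maintaining high frame-level image quality, without any stagnation.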

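Existing text-to-video diffusion models mainly focus on high-quality short video generation (typically 16 or 24 frames). When extended to long videos, they show obvious quality degradation, stiff motion, or stagnation. StreamingT2V, by contrast, can extend videos to 80, 240, 600, 1200 frames, or even longer, with smooth transitions, and outperforms other models in consistency and motion.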


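The outlook for long AI videos is exciting. As advanced techniques such as StreamingT2V are open-sourced and become widespread, we have reason to believe that video creation will become more efficient and more diverse, and will bring us more surprises!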

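Installation steps:

1. Download the code:

Link: https://pan.baidu.com/s/1OBI6zDsePpy_8dX2OfFvUg Extraction code: rc7j (copy this text and open the Baidu Netdisk mobile app for easier access)

2. Install the dependencies: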

cd StreamingT2V-main
virtualenv -p python3.9 venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

3. Install FFmpeg
Download and install it directly: https://www.ffmpeg.org/download.html
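
Before moving on, it can help to confirm that the interpreter and FFmpeg are actually visible from the activated environment. The following is a minimal sketch and not part of the StreamingT2V code: the helper name check_environment is hypothetical, and the minimum Python version of 3.9 is only taken from the virtualenv command above.

```python
import shutil
import sys


def check_environment(min_python=(3, 9)):
    """Return a list of problems found; an empty list means everything looks fine."""
    problems = []
    # Compare only (major, minor) against the assumed minimum version.
    if sys.version_info[:2] < min_python:
        found = ".".join(str(part) for part in sys.version_info[:2])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or newer expected, found {found}")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment looks OK" if not issues else "\n".join(issues))
```

An empty result only means the two tools were found; it does not verify that the packages in requirements.txt installed correctly.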

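4. Start using it

Image-to-video
From the StreamingT2V folder, run the entire pipeline, which consists of image-to-video, video enhancement (including our randomized blending), and video frame interpolation: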

cd code
python inference_i2v.py --input $INPUT --output $OUTPUT

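$INPUT must be the path to an image file or to a folder containing images. Each image should have an aspect ratio of 16:9.

$OUTPUT must be the path to the folder where the results will be stored.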
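
Because the script expects 16:9 inputs, it may be worth checking image dimensions before launching a long run. The sketch below is illustrative only: is_16_9 and build_i2v_command are hypothetical helper names, the width and height are assumed to come from whatever image library you already use, and only the --input and --output flags are taken from the command above.

```python
from fractions import Fraction


def is_16_9(width, height):
    """Return True when width:height reduces exactly to 16:9."""
    # Fraction avoids floating-point rounding; the height check avoids division by zero.
    return height > 0 and Fraction(width, height) == Fraction(16, 9)


def build_i2v_command(input_path, output_dir):
    """Build the argument list for the image-to-video script shown above."""
    return [
        "python", "inference_i2v.py",
        "--input", str(input_path),
        "--output", str(output_dir),
    ]


if __name__ == "__main__":
    print(is_16_9(1280, 720))
    print(build_i2v_command("images/", "results/"))
```

The returned list can be passed to subprocess.run from inside the code directory, which avoids shell quoting problems with paths that contain spaces.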

Adjusting hyperparameters
Number of generated frames
Add --num_frames $FRAMES to the call to define the number of frames to generate. Default: $FRAMES=200

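Using randomized blending
Add --use_randomized_blending $RB to the call to define whether randomized blending is used. Default: $RB=False. When randomized blending is used, the recommended values for the chunk_size and overlap_size parameters are --chunk_size 38 and --overlap_size 12, respectively. Note that randomized blending slows down generation, so try to avoid it if you have enough GPU memory.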
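
To get a feel for how chunk_size and overlap_size relate, the sketch below lays out fixed-size windows in which each one shares overlap_size frames with the previous one. This is only an assumption about the general shape of chunked processing; it is not the randomized blending algorithm itself, which this article does not describe, and chunk_windows is a hypothetical helper name.

```python
def chunk_windows(num_frames, chunk_size, overlap_size):
    """Return (start, end) frame ranges, end exclusive, covering num_frames.

    Consecutive windows share overlap_size frames. Illustrative only:
    this is not the StreamingT2V implementation.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if not 0 <= overlap_size < chunk_size:
        raise ValueError("overlap_size must be in [0, chunk_size)")
    stride = chunk_size - overlap_size  # new frames contributed by each window
    windows = []
    start = 0
    while True:
        end = min(start + chunk_size, num_frames)
        windows.append((start, end))
        if end >= num_frames:
            break
        start += stride
    return windows


if __name__ == "__main__":
    # Default frame count with the recommended chunk and overlap sizes.
    for window in chunk_windows(200, 38, 12):
        print(window)
```

With the recommended values, each window adds 26 new frames, so a larger overlap means more windows and more work for the same video length.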

Output FPS
Add --out_fps $FPS to the call to define the FPS of the output video. Default: $FPS=24
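
The frame count and the output FPS together determine how long the clip plays. As a small worked example, assuming playback length is simply frames divided by FPS (duration_seconds is a hypothetical helper name):

```python
def duration_seconds(num_frames, fps):
    """Playback length in seconds for a clip of num_frames shown at fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


if __name__ == "__main__":
    # Defaults listed above: 200 frames at 24 FPS.
    print(f"{duration_seconds(200, 24):.2f} s")
    # The 1200-frame, 2-minute figure quoted in this article.
    print(f"{duration_seconds(1200, 10):.0f} s")
```

Under that assumption the defaults give a clip of a little over 8 seconds, and 1200 frames fill 2 minutes only at about 10 frames per second.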

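StreamingT2V is an advanced autoregressive technique that produces long videos with rich motion dynamics and without any stagnation. It ensures temporal consistency throughout the video, aligns closely with the descriptive text, and maintains high frame-level image quality. Our demonstrations include successful video examples of up to 1200 frames spanning 2 minutes, and can be extended to longer durations. Importantly, the effectiveness of StreamingT2V is not limited by the specific Text2Video model used, which indicates that improvements in base models could yield even higher-quality videos.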

Running steps

Link: https://pan.baidu.com/s/1OBI6zDsePpy_8dX2OfFvUg Extraction code: rc7j (copy this text and open the Baidu Netdisk mobile app for easier access)

cd StreamingT2V-StreamingModelscope

Install requirements using Python 3.10 and CUDA >= 11.6

conda create -n st2v python=3.10
conda activate st2v
pip install -r requirements.txt

(Optional) Install FFmpeg if it’s missing on your system

conda install conda-forge::ffmpeg

Download the weights from HF and put them into the t2v_enhanced/checkpoints directory.

mkdir t2v_enhanced/checkpoints
cd t2v_enhanced/checkpoints
wget https://huggingface.co/PAIR/StreamingT2V/resolve/main/streaming_t2v.ckpt
cd -
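
A failed or interrupted download is easy to miss, so a quick check that the checkpoint is where the scripts expect it can save a confusing error later. This is a minimal sketch: checkpoint_path and has_checkpoint are hypothetical helper names, and the directory layout is taken from the commands above.

```python
from pathlib import Path

CHECKPOINT_NAME = "streaming_t2v.ckpt"


def checkpoint_path(repo_root):
    """Location where the steps above place the downloaded weights."""
    return Path(repo_root) / "t2v_enhanced" / "checkpoints" / CHECKPOINT_NAME


def has_checkpoint(repo_root):
    """Return True if the checkpoint file exists and is not empty."""
    path = checkpoint_path(repo_root)
    return path.is_file() and path.stat().st_size > 0


if __name__ == "__main__":
    root = "."  # assumed to be the repository root
    print(checkpoint_path(root), "found" if has_checkpoint(root) else "missing")
```

This only checks presence and a non-zero size; it does not verify that the file is complete or uncorrupted.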

Inference

For Text-to-Video

cd t2v_enhanced
python inference.py --prompt="A cat running on the street"

To use other base models add the --base_model=AnimateDiff argument. Use python inference.py --help for more options.

For Image-to-Video

cd t2v_enhanced
python inference.py --image=../__assets__/demo/fish.jpg --base_model=SVD
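
When scripting several runs, the two invocations above can be assembled programmatically. The sketch below uses only the --prompt, --image, and --base_model flags shown in this section; build_inference_command is a hypothetical helper name, and requiring at least a prompt or an image is an assumption rather than documented behavior.

```python
def build_inference_command(prompt=None, image=None, base_model=None):
    """Build the argument list for inference.py from the flags shown above."""
    if prompt is None and image is None:
        # Assumption: a run needs a text prompt, an input image, or both.
        raise ValueError("provide a prompt, an image, or both")
    command = ["python", "inference.py"]
    if prompt is not None:
        command.append(f"--prompt={prompt}")
    if image is not None:
        command.append(f"--image={image}")
    if base_model is not None:
        command.append(f"--base_model={base_model}")
    return command


if __name__ == "__main__":
    print(build_inference_command(prompt="A cat running on the street"))
    print(build_inference_command(image="../__assets__/demo/fish.jpg", base_model="SVD"))
```

Passing the resulting list to subprocess.run from inside t2v_enhanced keeps prompts with spaces intact without any shell quoting.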

Inference Time

ModelscopeT2V as a Base Model

Number of Frames	Inference Time for Faster Preview (256x256)	Inference Time for Final Result (720x720)
24 frames	40 seconds	165 seconds
56 frames	75 seconds	360 seconds
80 frames	110 seconds	525 seconds
240 frames	340 seconds	1610 seconds (~27 min)
600 frames	860 seconds	5128 seconds (~85 min)
1200 frames	1710 seconds (~28 min)	10225 seconds (~170 min)

AnimateDiff as a Base Model

Number of Frames	Inference Time for Faster Preview (256x256)	Inference Time for Final Result (720x720)
24 frames	50 seconds	180 seconds
56 frames	85 seconds	370 seconds
80 frames	120 seconds	535 seconds
240 frames	350 seconds	1620 seconds (~27 min)
600 frames	870 seconds	5138 seconds (~85 min)
1200 frames	1720 seconds (~28 min)	10235 seconds (~170 min)

SVD as a Base Model

Number of Frames	Inference Time for Faster Preview (256x256)	Inference Time for Final Result (720x720)
24 frames	80 seconds	210 seconds
56 frames	115 seconds	400 seconds
80 frames	150 seconds	565 seconds
240 frames	380 seconds	1650 seconds (~27 min)
600 frames	900 seconds	5168 seconds (~86 min)
1200 frames	1750 seconds (~29 min)	10265 seconds (~171 min)

All measurements were conducted using the NVIDIA A100 (80 GB) GPU. Randomized blending is employed when the frame count surpasses 80. For Randomized blending, the values for chunk_size and overlap_size are set to 112 and 32, respectively.
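
The totals in these tables are easier to compare as a per-frame cost. As a worked example, the sketch below uses the 720x720 column of the ModelscopeT2V table above; the helper names are hypothetical. For that column the cost works out to between roughly 6.4 and 8.6 seconds per frame.

```python
# Final-result (720x720) times in seconds, copied from the ModelscopeT2V table above.
FINAL_RESULT_SECONDS = {24: 165, 56: 360, 80: 525, 240: 1610, 600: 5128, 1200: 10225}


def seconds_per_frame(num_frames, total_seconds):
    """Average generation cost of a single frame."""
    return total_seconds / num_frames


def to_minutes(total_seconds):
    """Convert a duration in seconds to minutes."""
    return total_seconds / 60


if __name__ == "__main__":
    for frames, seconds in FINAL_RESULT_SECONDS.items():
        print(f"{frames:>5} frames: {seconds_per_frame(frames, seconds):.2f} s/frame, "
              f"{to_minutes(seconds):.1f} min total")
```

These figures apply to the A100 (80 GB) setup stated above and should not be expected to carry over to other hardware.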

Gradio

The same functionality is also available as a gradio demo

cd t2v_enhanced
python gradio_demo.py

Results

Detailed results can be found in the Project page.

MAWE (Motion Aware Warp Error)

To compute the MAWE metric for a given video (see our paper for its definition), use the get_mawe function from mawe.py, which you can find in the project root.

You can run it using CLI via:

python mawe.py --video_path PATH_TO_VIDEO

Or from inside your python script as:

from mawe import get_mawe

video_path = "PATH_TO_VIDEO"  # replace with the path to your video
mawe = get_mawe(video_path)
print(f"MAWE for {video_path} is {mawe:0.2f}")
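
To score several videos in one go, the same call can be wrapped in a small loop. In this sketch score_videos is a hypothetical helper, and the metric is passed in as an argument so the snippet also runs outside the repository; inside the project you would pass get_mawe.

```python
def score_videos(video_paths, metric):
    """Apply metric to each path and return a {path: score} dictionary."""
    return {str(path): float(metric(path)) for path in video_paths}


if __name__ == "__main__":
    # Stand-in metric for demonstration only; in the project root use:
    #   from mawe import get_mawe
    #   scores = score_videos(paths, get_mawe)
    def fake_metric(path):
        return len(str(path))

    for path, score in score_videos(["a.mp4", "clip_b.mp4"], fake_metric).items():
        print(f"{path}: {score:0.2f}")
```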

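Another good tool:

The core appeal of Viggle AI is that it can blend static images into videos almost seamlessly, changing how we think about traditional video production. Imagine that with just one photo, whether a warm moment from a family gathering or an imaginative creative design, Viggle can make it "come alive" as the lead character in a video. This is more than a simple insertion: using AI algorithms, it identifies and replaces the person in the video, so that the person in the picture appears to be placed in that dynamic scene, telling their own story.

I have the PC client downloaded, so the steps below follow the PC version.

1. Open Discord and select any channel in it.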
