Diffusion 加速系列之二|Consistency Models

1.Ref

  • https://arxiv.org/pdf/2303.01469

  • https://zeqiang-lai.github.io/blog/posts/ai/consistency_model/

  • https://kevinng77.github.io/posts/notes/articles/%E7%AC%94%E8%AE%B0speed_sd.html#cm-lcm-lcm-lora

  • https://wrong.wang/blog/20231111-consistency-is-all-you-need/

  • https://blog.csdn.net/weixin_54338498/article/details/130174582

2. Consistency Models(CM)

  A. DM multi-step denoising

        特别消耗计算资源,所以模型的加速需求是比较强烈的。

  B. DM 优化方案

  • 剪枝策略:通用剪枝策略 diff-pruning,在多步 model 采用统一的剪枝策略。

  • Handcraft Arch

  • Block-wise pruning:DeepCach

### Stable Video Diffusion Implementation and Techniques In the context of video processing within computer vision, stable video diffusion refers to methods that ensure consistency across frames while performing tasks such as denoising or pose estimation. For instance, VIBE (Video Inference for Human Body Pose and Shape Estimation) employs a robust approach combining temporal smoothing with spatial refinement to achieve stability over time[^1]. #### Temporal Smoothing Temporal smoothing is crucial for maintaining consistent estimations throughout consecutive frames. This technique leverages information from neighboring frames to refine current frame predictions. Specifically, algorithms like Kalman filters can be utilized where prediction updates are made based on previous states. ```python import numpy as np def kalman_filter(x, P, measurement): """ Simple example of applying Kalman filter. :param x: State estimate vector :param P: Estimate covariance matrix :param measurement: Current measurement value """ H = np.eye(len(x)) # Measurement function R = np.eye(len(measurement)) * 0.1 # Measurement noise y = measurement - np.dot(H, x) S = np.dot(np.dot(H, P), H.T) + R K = np.dot(np.dot(P, H.T), np.linalg.inv(S)) x = x + np.dot(K, y) I = np.eye(len(x)) P = np.dot((I - np.dot(K, H)), P) return x, P ``` #### Spatial Refinement Spatial refinement focuses on enhancing local details by considering pixel-level relationships within individual frames. Convolutional neural networks (CNNs) play an essential role here due to their ability to capture hierarchical patterns effectively. By integrating CNN-based models into pipelines, more accurate feature extraction becomes possible leading to improved overall performance. For both aspects mentioned above, it's important to note how they contribute towards achieving stable results when dealing with dynamic scenes involving human body movements captured through videos.
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值