Reproducing the Paper "Phase-Based Video Motion Processing"

Bawinn

Last modified 2024-01-09 21:41:03

Tags: paper reading, python, experience sharing

First published 2023-12-30 19:22:32

Article link: https://blog.csdn.net/qq_47415763/article/details/135280308

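I. Paper Reading

Neal Wadhwa, Michael Rubinstein, Frédo Durand, and William T. Freeman. Phase-based video motion processing. ACM Transactions on Graphics (TOG), 2013.
I read the paper closely once and now have a general understanding of the problem it solves, the inspiration and origin of the idea, the experimental setup, the results, and the conclusions.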

II. System and Software Setup

This was covered in my previous post: Ubuntu + PyCharm + GitHub
Link: https://blog.csdn.net/qq_47415763/article/details/135263742?spm=1001.2014.3001.5502

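III. Method and Principles

Reference GitHub implementation used for the reproduction: https://rxian.github.io/phase-video/#complex-steerable-pyramid
Reference: Image pyramids: principles, implementation and applications http://wed.xjx100.cn/news/18980.html?action=onClick
Reference: 10. Image pyramid principles explained https://www.bilibili.com/video/BV1VT4y1Z71u/
Open question: how is the complex steerable pyramid actually implemented?
I could not follow the pseudocode given by the GitHub author.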

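Phase-based Eulerian video motion magnification

# 1. Decompose each input video frame with a complex steerable pyramid into spatial-frequency sub-bands (amplitude and phase)
# 2. Band-pass filter the phase of every pyramid image over time to isolate motion at specific temporal frequencies
# 3. Smooth the filtered phase images with a spatial filter
# 4. Multiply the result by the magnification factor and add it back to the phase of the corresponding frame in the corresponding pyramid (a positive factor amplifies the motion, a factor between -1 and 0 attenuates it)
# 5. Reconstruct the video by collapsing the pyramid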
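
Steps 2 to 4 rest on the Fourier shift property: translating a sinusoid changes only its phase. The following is a minimal 1D sketch of that idea, not the reference implementation. The signal, the frequency `omega`, the displacement `delta` and the factor `alpha` are all made-up values chosen for illustration, and the temporal band-pass filter is replaced by a plain difference between two frames.

```python
import numpy as np

# Hypothetical 1D example: one spatial frequency, one small displacement.
x = np.arange(256)
omega = 2 * np.pi * 4 / 256   # spatial frequency in radians per sample
delta = 0.3                   # true sub-pixel displacement between two frames
alpha = 5                     # magnification factor

frame0 = np.exp(1j * omega * x)            # complex sub-band of the reference frame
frame1 = np.exp(1j * omega * (x + delta))  # same sub-band after the displacement

# Phase difference between the two frames (stands in for the band-passed phase).
delta_phi = np.angle(frame1 * np.conj(frame0))

# Add alpha times the phase difference back onto the phase of frame1.
magnified = frame1 * np.exp(1j * alpha * delta_phi)

# The result is the reference translated by (1 + alpha) * delta.
expected = np.exp(1j * omega * (x + (1 + alpha) * delta))
print(np.allclose(magnified, expected))
```

Because the phase difference is added to the existing phase, the total displacement becomes (1 + alpha) * delta, which is why alpha = -1 cancels the motion entirely.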

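① Complex steerable pyramid

# I. Complex steerable pyramid
# Principle
# 1. The filters are applied to the image after a fast Fourier transform, i.e. in the frequency domain
# 2. The same filters are used both to build and to collapse the pyramid; they also select the orientation
# 3. Multiplying a high-pass filter by a low-pass filter selects a band of the spectrum; multiplying that band by an orientation mask filters the image in one specific direction
# Steps
# 1. Take a real-valued square image as input
# 2. Choose the depth of the pyramid
# 3. Choose the number of filter orientations
# 4. Choose the number of sub-octaves per octave
# 5. Choose the filters to use
# 6. depth × sub-octaves per octave × orientations = number of band-pass images in the pyramid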
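
To make principle 3 concrete, here is a small sketch of frequency-domain masks defined in polar coordinates. The raised-cosine shapes, the cut-off frequencies and all function names are my own assumptions for illustration; they are not the filters used in the referenced repository.

```python
import numpy as np

def polar_grid(h, w):
    """Radius r and angle th (radians) of every DFT bin of an h x w image."""
    fy = 2 * np.pi * np.fft.fftfreq(h)
    fx = 2 * np.pi * np.fft.fftfreq(w)
    FX, FY = np.meshgrid(fx, fy)
    return np.hypot(FX, FY), np.arctan2(FY, FX)

def _transition(r):
    """0 below pi/4, 1 above pi/2, linear in log2(r) in between."""
    r = np.maximum(r, 1e-12)  # avoid log2(0) at the DC bin
    return np.clip(np.log2(r / (np.pi / 4)), 0.0, 1.0)

def lowpass(r):
    return np.cos(np.pi / 2 * _transition(r))

def highpass(r):
    return np.sin(np.pi / 2 * _transition(r))

def orientation_mask(th, k, K):
    """Lobe around angle k*pi/K on one half-plane only, so the sub-band is complex."""
    d = np.angle(np.exp(1j * (th - np.pi * k / K)))  # wrapped angular distance
    return np.where(np.abs(d) < np.pi / 2, np.cos(d) ** (K - 1), 0.0)

r, th = polar_grid(64, 64)
band = highpass(r) * lowpass(r / 2)           # high-pass times low-pass = band-pass
oriented = band * orientation_mask(th, 1, 4)  # restrict the band to one direction
```

The low-pass and high-pass masks are built so that their squares sum to one, which is the property that lets the same filters be reused when collapsing the pyramid.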

Reference pseudocode (image not available)

// Code study
import numpy as np
import scipy.fftpack
from tqdm import tqdm

# Annotated code for converting an image to a pyramid
def im2pyr(im,D,N,K,verbose=False):
    '''
    Transform an image to complex steerable pyramid representation.

    @type  im: real-valued numpy.ndarray of shape (B,H,W)
    @param im: Batched images to be transformed.
    @type  D: integer
    @param D: Depth of pyramid (number of octaves).
    @type  N: integer
    @param N: Number of suboctaves per octave.
    @type  K: integer
    @param K: Number of pyramid orientations.
    @rtype:   (P, Rh, Rl) 3-tuple; P is a nested list of shape 
              (D,N,K), Rh and Rl are numpy.ndarrays of 2D
    @return:  P stores the images in the pyramid; Rh and Rl are
              highpass and lowpass residuals.
    '''
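    dft = scipy.fftpack.fft2 # 2D forward Fourier transform
    idft = scipy.fftpack.ifft2 # 2D inverse Fourier transform

    if verbose: pbar = tqdm(total=D*N*K) # progress bar with one step per (d, n, k) sub-band

    I = dft(im) # 2D Fourier transform: spatial domain -> frequency domain
    Rh = idft(apply_filter(I,lambda r, th: highpass_filter(r/2.,th))) # high-pass filter to obtain the high-pass residual
    P = [] # the pyramid being built
    for d in range(D): # loop over pyramid levels, from finest to coarsest
        this_D = []
        for n in range(N): # loop over sub-octaves
            this_n = []
            for k in range(K): # loop over orientations
                this_n.append(idft(apply_filter(I,lambda r, th: pyramid_filter(r,th,n,N,k,K)))) # pyramid filter, then inverse Fourier transform
                if verbose: pbar.update(1) # advance the progress bar by one step
            this_D.append(this_n)
        P.append(this_D)
        I = downsample2(apply_filter(I,lowpass_filter)) # low-pass filter, then downsample by 2
    Rl = idft(I) # inverse transform of what remains: the low-pass residual

    if verbose: pbar.close()
    return P, Rh, Rl # pyramid, high-pass residual, low-pass residual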
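
The helper functions called above (`apply_filter`, `pyramid_filter` and so on) are not shown in this post. As a rough guide to what such a helper has to do, here is a sketch of applying a mask given as a function of polar frequency coordinates. `apply_polar_filter` is a hypothetical stand-in written for this example and may differ from the repository's `apply_filter`; it uses `numpy.fft` instead of `scipy.fftpack` only to keep the example dependency-free.

```python
import numpy as np

def apply_polar_filter(I, filt):
    """Multiply a batch of 2D spectra I of shape (B, H, W) by filt(r, th).

    Hypothetical stand-in for illustration only.
    """
    H, W = I.shape[-2:]
    fy = 2 * np.pi * np.fft.fftfreq(H)
    fx = 2 * np.pi * np.fft.fftfreq(W)
    FX, FY = np.meshgrid(fx, fy)
    r, th = np.hypot(FX, FY), np.arctan2(FY, FX)
    return I * filt(r, th)  # the (H, W) mask broadcasts over the batch axis

rng = np.random.default_rng(0)
im = rng.random((2, 32, 32))  # two random test images
I = np.fft.fft2(im)           # transforms the last two axes

# An all-pass filter leaves the images unchanged.
same = np.fft.ifft2(apply_polar_filter(I, lambda r, th: np.ones_like(r)))

# Keeping one half-plane of the spectrum yields a complex sub-band; its
# angle is the local phase that the motion-editing step works on.
half = np.fft.ifft2(
    apply_polar_filter(I, lambda r, th: (np.abs(th) < np.pi / 2).astype(float))
)
amplitude, phase = np.abs(half), np.angle(half)
```

The one-sided mask is what makes the pyramid "complex": a real image filtered with a symmetric mask stays real, while a half-plane mask produces both amplitude and phase.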

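IV. Experiment

The GitHub source lists the versions of the libraries it uses; I installed those libraries into the PyCharm environment set up earlier.

① Changes to result.py

// Changes
a. Running the main script result.py raised an error (I forget the exact message, but an error this early is almost always a wrong input path).
b. The original author's input video crane_crop.mp4 could not be found, so I used color-Crane003_crop-quarterOctave.avi from the classic 2013 code release instead.
c. The run then failed with an out-of-memory error. Since the pyramid is built by three nested for loops over D, N and K, I reduced the number of orientations K from 8 to 2.
d. The output video looked distorted, so I lowered the magnification factor alpha from 75 to 5.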

    '''
    #input_path = 'crane_crop.mp4'
    input_path = '/home/xiao/PycharmProjects/pythonProject/color-Crane003_crop-quarterOctave.avi'
    #output_path = 'crane_crop_magnified.mpeg'
    output_path = '/home/xiao/PycharmProjects/pythonProject/crane_crop_magnified1.avi'
    
    #alpha = 75
    alpha = 5
    #D,N,K = 3,2,8
    D, N, K = 3, 2, 2
    fl,fh = 0.2, 0.25
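
A back-of-the-envelope estimate shows why reducing K helps with memory. The sketch below assumes every sub-band is stored as complex128 (16 bytes per value) and that level d is stored at 1/2**d of the original resolution; the clip size of 200 frames at 256 x 256 pixels is a made-up example, not the size of the crane video.

```python
def pyramid_bytes(T, H, W, D, N, K, bytes_per_value=16):
    """Rough size in bytes of all band-pass sub-bands for T frames of H x W pixels.

    Assumes level d has (H // 2**d) x (W // 2**d) values per sub-band and
    that every level holds N * K sub-bands. Residuals are ignored.
    """
    total = 0
    for d in range(D):
        total += T * (H // 2**d) * (W // 2**d) * N * K * bytes_per_value
    return total

# Hypothetical clip: 200 frames, 256 x 256 pixels, one colour channel.
before = pyramid_bytes(200, 256, 256, D=3, N=2, K=8)
after = pyramid_bytes(200, 256, 256, D=3, N=2, K=2)
print(before / 2**30, after / 2**30)  # sizes in GiB
```

Under these assumptions the pyramid shrinks from roughly 4.1 GiB to roughly 1.0 GiB, since the size is proportional to K. Processing the video in shorter temporal chunks would be another way to stay within memory while keeping all 8 orientations.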

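② Changes to phasebased.py

// Changes
np.complex(0,1) fails on recent NumPy versions (the np.complex alias was deprecated in NumPy 1.20 and removed in 1.24), so replace it with the built-in complex(0,1), i.e. 1j.
The two arguments 0 and 1 are the real part and the imaginary part. Note that np.complex128(1) is 1+0j, not the imaginary unit, so it is not an equivalent replacement: it would turn the phase shift into a real gain.

    ## Motion editing
    print("Modifying motion")
    if verbose: pbar = tqdm(total=D*N*K)
    for d in range(D):
        for n in range(N):
            for k in range(K):
                P_frames = np.pad(Ps[d][n][k],((0,pad),(0,0),(0,0)),mode='edge')
                P_frames = np.moveaxis(P_frames,0,-1)
                delta_phi_dft = scipy.fftpack.fft(np.angle(P_frames),axis=-1) * np.broadcast_to(F,P_frames.shape)
                delta_phi = np.real(scipy.fftpack.ifft(delta_phi_dft,axis=-1))[:,:,:T]
                P_frames = P_frames[:,:,:T]
                # P_frames *= np.exp(alpha*np.complex(0,1)*delta_phi)
                P_frames *= np.exp(alpha*1j*delta_phi)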
                Ps[d][n][k] = np.moveaxis(P_frames,-1,0)
                if verbose: pbar.update(1)
    if verbose: pbar.close()

    ## Inverse transform the edited frames
    print("Converting modified frames from pyramid")
    mframes = np.real(pyr2im(Ps,Rhs,Rls,verbose=verbose))

    return mframes
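
The factor multiplying `alpha*delta_phi` inside `np.exp` must be the imaginary unit for the update to be a phase shift. A quick check with made-up values of `alpha` and `delta_phi` shows how the imaginary unit `1j` and the real one `np.complex128(1)` behave differently:

```python
import numpy as np

alpha, delta_phi = 5.0, 0.1  # arbitrary illustration values

phase_shift = np.exp(alpha * 1j * delta_phi)               # exp(0.5j)
real_scale = np.exp(alpha * np.complex128(1) * delta_phi)  # exp(0.5 + 0j)

print(abs(phase_shift))  # modulus stays 1: only the phase is rotated
print(real_scale)        # purely real gain: the phase is left untouched
print(complex(0, 1) == 1j)
```

With the imaginary unit the sub-band amplitude is preserved and only its phase moves; with a real factor the amplitude is scaled exponentially and no motion is added.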

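③ Changes to complexsteerablepyramid.py

The same fix as in phasebased.py: replace np.complex(0,1) with the built-in complex(0,1), i.e. 1j (np.complex128(1) is 1+0j, not the imaginary unit).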

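④ Changes to utils.py

// Changes
The output video could not be played properly on my Ubuntu system, so I changed the output codec (the commented-out line after # is the original code)
and also the output path in result.py.
Reference: https://blog.csdn.net/weixin_40671425/article/details/109035231

import cv2  # OpenCV, needed for VideoWriter

def numpy2video(path, frames, fs=30.0):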
    '''
    Inverse operation of video2numpy.
    '''
    #codec = cv2.VideoWriter_fourcc(*'XVID')
    codec = cv2.VideoWriter_fourcc('M', 'J', 'P', 'G')  # MJPG
    writer = cv2.VideoWriter(path, codec, fs, (frames.shape[2], frames.shape[1]), isColor=True)

    T = len(frames)
    for t in range(T):
        writer.write(frames[t])

    writer.release()
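
cv2.VideoWriter writes 8-bit frames, while magnified frames are usually floating point and can overshoot the valid range. The sketch below shows one way to convert them before writing. `to_uint8_frames` is a hypothetical helper of my own, and it assumes the frame values are on a 0 to 255 scale, which I have not verified against the repository.

```python
import numpy as np

def to_uint8_frames(frames):
    """Round, clip to [0, 255] and convert to uint8 (hypothetical helper).

    Assumes frames is a float array of shape (T, H, W, C) on a 0-255 scale.
    """
    return np.clip(np.rint(frames), 0, 255).astype(np.uint8)

# One frame with one pixel whose channels undershoot, fit, and overshoot.
frames = np.array([[[[-3.2, 100.4, 300.0]]]])
out = to_uint8_frames(frames)
print(out.dtype, out.tolist())
```

Clipping before the cast matters: casting out-of-range floats directly to uint8 does not saturate and can produce wrapped or otherwise arbitrary values, which show up as speckle in the output video.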