CodeTalker: Notes on Pitfalls Encountered

Latest recommended article published 2024-06-22 09:30:50

AI算法网奇

Article link: https://blog.csdn.net/jacke121/article/details/138222186

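This article describes problems encountered when using the Doubiiu/CodeTalker project from GitHub for speech-driven 3D facial animation, including a transpose error in the Wav2Vec2 module, handling of the feature projection output, and rendering and file-permission problems, along with the workarounds and their limitations.

Summary generated by CSDN using AI technology.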

Open-source repository

GitHub - Doubiiu/CodeTalker: [CVPR 2023] CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior

Pretrained models are provided.

Error 1: runtime error in Wav2Vec2

  File "D:\Program Files\miniconda3\lib\site-packages\transformers\models\wav2vec2\modeling_wav2vec2.py", line 397, in forward
    hidden_states = hidden_states.transpose(1, 2)
AttributeError: 'tuple' object has no attribute 'transpose'

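Cause: Wav2Vec2FeatureProjection returns a tuple of two tensors (the projected hidden states and the layer-normalized, non-projected hidden states), so the encoder receives a tuple where it expects a single tensor.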

class Wav2Vec2FeatureProjection(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.layer_norm = nn.LayerNorm(config.conv_dim[-1], eps=config.layer_norm_eps)
        self.projection = nn.Linear(config.conv_dim[-1], config.hidden_size)
        self.dropout = nn.Dropout(config.feat_proj_dropout)

    def forward(self, hidden_states):
        # non-projected hidden states are needed for quantization
        norm_hidden_states = self.layer_norm(hidden_states)
        hidden_states = self.projection(norm_hidden_states)
        hidden_states = self.dropout(hidden_states)
        return hidden_states, norm_hidden_states

Temporary workaround:

Pass only the first element of the tuple (the projected hidden states) to the encoder:

        encoder_outputs = self.encoder(
            hidden_states[0],
            attention_mask=attention_mask,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
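
The same unwrapping can also be written defensively, so that it works whether the feature projection returns a tuple or a single tensor. The following is a minimal sketch, not the project's or the library's own fix; `first_if_tuple` is a hypothetical helper name, and plain strings stand in for tensors so the example runs without torch installed.

```python
def first_if_tuple(value):
    """Return value[0] when value is a tuple, otherwise return value unchanged."""
    if isinstance(value, tuple):
        return value[0]
    return value


# Stand-ins for the two tensors returned by the feature projection.
projected, normalized = "projected", "normalized"

print(first_if_tuple((projected, normalized)))  # prints "projected"
print(first_if_tuple(projected))                # prints "projected"
```

Under this assumption, the call site would pass `first_if_tuple(hidden_states)` to the encoder instead of `hidden_states[0]`, which avoids indexing into a tensor by mistake if the installed version already returns a single tensor.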

Error 2: permission denied when rendering the mp4

E:\project\audio\audio2face\CodeTalker-main\demo\output2\tmpdxzolz5y.mp4: Permission denied

The cause here was that the output path was somewhat complicated; switching to a simpler path name resolved it.
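
Before starting a long render, it may help to check that the chosen output directory actually accepts new files. The sketch below is one way to do that with the standard library only; `is_writable_dir` is a hypothetical helper, not part of CodeTalker, and it only probes the directory rather than diagnosing why a particular path fails.

```python
import os
import tempfile


def is_writable_dir(path):
    """Return True if a file can be created and removed inside path."""
    if not os.path.isdir(path):
        return False
    try:
        # Create a probe file with the same suffix the renderer would use.
        fd, probe = tempfile.mkstemp(suffix=".mp4", dir=path)
    except OSError:
        return False
    os.close(fd)       # close the handle before deleting (required on Windows)
    os.remove(probe)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as simple_dir:
        print(is_writable_dir(simple_dir))
```

If the check fails for the intended output folder, trying a short path such as a folder directly under the drive root is consistent with the fix described above.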

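Limitation: only a limited length can be generated in one pass (20094, about 12 seconds). Longer input fails during computation with the error below, where a 600×600 attention mask is applied to a sequence of 601 steps: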

File "D:\Program Files\miniconda3\lib\site-packages\torch\nn\functional.py", line 5359, in multi_head_attention_forward
raise RuntimeError(f"The shape of the 3D attn_mask is {attn_mask.shape}, but should be {correct_3d_size}.")
RuntimeError: The shape of the 3D attn_mask is torch.Size([4, 600, 600]), but should be (4, 601, 601).
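
One way to stay under this limit is to split long input into pieces that each fit, and run them one at a time. The sketch below only computes the index ranges; `split_ranges` is a hypothetical helper, and the limit of 600 is an assumption read off the attention mask shape in the traceback. The unit (frames or audio samples) has to match whatever the pipeline measures at the point where you slice.

```python
def split_ranges(total_len, max_len):
    """Return (start, end) index pairs covering total_len in chunks of at most max_len."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [
        (start, min(start + max_len, total_len))
        for start in range(0, total_len, max_len)
    ]


# Example: 1300 steps with an assumed per-run limit of 600.
print(split_ranges(1300, 600))  # prints [(0, 600), (600, 1200), (1200, 1300)]
```

Chunks processed independently may not join smoothly at the boundaries, so the resulting animation could show visible jumps between segments.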