Deep Learning Notes (Part 2): Speech Segmentation and Speaker Identification from Audio Files

我还要去追逐我的梦

Category: Deep Learning. Tags: deep learning, speech recognition

Article link: https://blog.csdn.net/penker_zhao/article/details/107757419

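1. Speech segmentation

For tool installation details, see Deep Learning Notes (Part 1).

We use py_speech_seg to split a two-speaker (A/B) dialogue into per-speaker segments: https://github.com/wblgers/py_speech_seg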

A toolkit to implement segmentation on speech based on BIC and neural networks, such as BiLSTM
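To make the BIC idea concrete, here is a minimal sketch of the delta-BIC test for a single candidate change point. It does not use the py_speech_seg API; the function name `delta_bic`, the `penalty_weight` parameter, and the synthetic 2-D "features" are assumptions for illustration (real systems would typically use MFCC frames).

```python
import numpy as np

def delta_bic(features, split, penalty_weight=1.0):
    """Delta-BIC for splitting `features` (frames x dims) at row `split`.

    Positive values favor modeling the two sides with separate Gaussians,
    i.e. a speaker change at `split`.
    """
    n, d = features.shape
    left, right = features[:split], features[split:]

    def logdet_cov(x):
        # Log-determinant of the maximum-likelihood covariance estimate.
        cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
        return np.linalg.slogdet(cov)[1]

    gain = 0.5 * (n * logdet_cov(features)
                  - len(left) * logdet_cov(left)
                  - len(right) * logdet_cov(right))
    # Penalty for the extra mean and covariance parameters of the second Gaussian.
    penalty = 0.5 * penalty_weight * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - penalty

# Synthetic stand-ins for acoustic features (hypothetical data, not real speech).
rng = np.random.default_rng(0)
speaker_a = rng.normal(0.0, 1.0, size=(200, 2))
speaker_b = rng.normal(5.0, 1.0, size=(200, 2))
dialogue = np.vstack([speaker_a, speaker_b])     # change at frame 200
monologue = rng.normal(0.0, 1.0, size=(400, 2))  # no change

change_score = delta_bic(dialogue, 200)
no_change_score = delta_bic(monologue, 200)
```

A full segmenter would slide the candidate split across a window and keep the positions where the score peaks above zero; this sketch only evaluates one split.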

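After segmentation, each segment is transcribed with speech-to-text. (The original post showed a screenshot of a correct transcription here.)

2. Speaker identification (determining who spoke a given segment)

2.1 Install Kaldi 5.3

Kaldi 5.3 specifically is required; download the 5.3 source code.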

#>cd kaldi

#>cd tools

#>cat INSTALL

#>extras/check_dependencies.sh

#>extras/install_mkl.sh

#>apt-get install sox gfortran subversion

#>make -j 4

#>cd ../src/

#>./configure --shared

#>make depend -j 8

#>make -j 8

2.2 Install libfvad, a voice activity detection (VAD) library

https://github.com/dpirch/libfvad

#>cd libfvad
#>autoreconf -i
#>./configure
#>make
#>make install

2.3 Download KaldiBasedSpeakerVerification and build it from source

https://github.com/qianhwan/KaldiBasedSpeakerVerification

#>cd KaldiBasedSpeakerVerification/mat

#>cat iepart* > final.ie

#>cd ../src

#>vim makefile

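# Set the kaldi, libfvad, and KaldiBasedSpeakerVerification paths correctly

Running make at this point fails with a compile error (shown as a screenshot in the original post). Patch the source as described below, then run make again.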

The error on line 504 is straightforward to fix. It's a casting error in the return from checking to see if a file exists. By adding "(bool)" to that line 504, you will cast the return as a boolean and that solves it. Here is what the fexists bool should look like when you're done:

bool fexists(const char *filename) {
    ifstream ifile(filename);
    return (bool)ifile;
}

Save that change and the code will compile after that.

#>make

2.4 Modify test1Test.sh

#!/bin/bash
# KaldiBasedSpeakerVerification
# test1Test.sh
# ========================================
# Author: Qianhui Wan
# Version: 1.0.0
# Date : 2018-01-23
# ========================================
# The following lines will setup the path to each lib
# path to kaldi/src/lib
export LD_LIBRARY_PATH=/home/qianhuiwan/sourcecodes/kaldi/src/lib:$LD_LIBRARY_PATH
# path to atlas
export LD_LIBRARY_PATH=/usr/lib64/atlas:$LD_LIBRARY_PATH
# path to openfst
export LD_LIBRARY_PATH=/home/qianhuiwan/sourcecodes/kaldi/tools/openfst/lib:$LD_LIBRARY_PATH
# path to usr/local/lib
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

../src/identifySpeaker ./example_data/test/174/174-168635-0000.wav
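The same library-path setup can also be done from Python when the binary needs to be called from a larger pipeline. The following is only a sketch: the directory list in `LIB_DIRS` and the helper names `build_env` and `identify` are hypothetical, and the paths must be replaced with the ones on your machine.

```python
import os
import subprocess

# Hypothetical install locations; replace with the paths on your machine.
LIB_DIRS = [
    "/usr/local/lib",
    "/opt/kaldi/tools/openfst/lib",
    "/usr/lib64/atlas",
    "/opt/kaldi/src/lib",
]

def build_env(lib_dirs, base_env=None):
    """Return a copy of the environment with lib_dirs prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    parts = list(lib_dirs) + ([existing] if existing else [])
    env["LD_LIBRARY_PATH"] = os.pathsep.join(parts)
    return env

def identify(wav_path, binary="../src/identifySpeaker"):
    """Run the identifySpeaker binary on one wav file and return its stdout."""
    result = subprocess.run(
        [binary, wav_path],
        env=build_env(LIB_DIRS),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `identify("./example_data/test/174/174-168635-0000.wav")` from the examples directory would then mirror the last line of the shell script, assuming the binary has been built.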

2.5 Run the example

In the examples directory, run the following scripts in order:

#>test1Enroll.sh

#>test1Test.sh

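3. Speaker identification (second approach)

3.1 Installing on CentOS 7.8

This approach uses the SpeakerRecognition_tutorial source code, written by a Korean author: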

https://github.com/jymsuper/SpeakerRecognition_tutorial

Install the required packages with pip3:

pytorch 1.0.0
pandas 0.23.4
numpy 1.13.3
pickle 4.0
matplotlib 2.1.0

pip3 install wheel
pip3 install torch

pip3 install torchvision
pip3 install pandas -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
pip3 install librosa -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com

pickle does not need to be installed with pip; it is part of the Python standard library and already ships with Python 3.7.
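A quick way to confirm that pickle is available without any installation is a round trip through the standard-library module. This is a minimal sketch; the dictionary contents are made-up values, and reading the "pickle 4.0" requirement as pickle protocol 4 is an assumption.

```python
import pickle

# Made-up data standing in for whatever the project serializes.
embeddings = {"speaker_174": [0.12, -0.53, 0.98]}

# Serialize with protocol 4 (assumed to be what "pickle 4.0" refers to).
blob = pickle.dumps(embeddings, protocol=4)
restored = pickle.loads(blob)
```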

yum install libsndfile

Note: on CentOS 7.8, after running yum update, the scripts fail at runtime. The following changes are needed:

a. Modify DB_wav_reader.py
import sys
from glob import glob

import librosa
import numpy as np
import pandas as pd

from configure import SAMPLE_RATE

np.set_printoptions(threshold=sys.maxsize)  # key change
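The effect of this setting can be checked in isolation. The sketch below assumes the change replaces a `threshold` value that newer NumPy releases no longer accept (such as NaN); it only demonstrates that `sys.maxsize` disables summarization of large arrays.

```python
import sys
import numpy as np

arr = np.arange(2000)

# With the default threshold (1000 elements), large arrays are summarized with "...".
summarized = np.array2string(arr)

# With threshold=sys.maxsize, arrays are printed in full.
np.set_printoptions(threshold=sys.maxsize)
full = np.array2string(arr)
```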