1. Speech segmentation
For tool installation details, see Deep Learning Notes (Part 1).
We use py_speech_seg to split two-speaker (A/B) dialogue: https://github.com/wblgers/py_speech_seg
A toolkit to implement segmentation on speech based on BIC and neural networks, such as BiLSTM
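To make the BIC idea concrete, here is a minimal sketch of change-point detection on a one-dimensional feature sequence. It is not the py_speech_seg implementation: the function names, the `margin` and `lam` parameters, and the synthetic data are assumptions made for illustration, and a real system would work on multi-dimensional features such as MFCCs.

```python
import math
import random


def variance(xs):
    """Maximum-likelihood variance, floored so that log() stays finite."""
    mean = sum(xs) / len(xs)
    return max(sum((x - mean) ** 2 for x in xs) / len(xs), 1e-10)


def delta_bic(xs, t, lam=1.0):
    """Delta-BIC for splitting the 1-D sequence xs at index t.

    Positive values favour modelling xs[:t] and xs[t:] as two Gaussians,
    i.e. a speaker change at t.
    """
    n, n1, n2 = len(xs), t, len(xs) - t
    gain = 0.5 * (n * math.log(variance(xs))
                  - n1 * math.log(variance(xs[:t]))
                  - n2 * math.log(variance(xs[t:])))
    # Penalty for the extra mean and variance parameters (dimension d = 1).
    penalty = lam * math.log(n)
    return gain - penalty


def find_change_point(xs, margin=10, lam=1.0):
    """Return (index, score); index is None when no split has a positive score."""
    best_t, best_score = None, float("-inf")
    for t in range(margin, len(xs) - margin):
        score = delta_bic(xs, t, lam)
        if score > best_score:
            best_t, best_score = t, score
    return (best_t if best_score > 0 else None), best_score


# Synthetic "two speakers": the feature mean jumps at sample 200.
random.seed(0)
signal = ([random.gauss(0.0, 1.0) for _ in range(200)]
          + [random.gauss(5.0, 1.0) for _ in range(200)])
change_point, score = find_change_point(signal)
print(change_point, round(score, 1))
```

The penalty weight `lam` trades missed changes against false alarms; larger values make the detector more conservative.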
After segmentation, each segment is transcribed to text; a correct transcription looks like the following screenshot:
2. Speaker identification (determining who spoke a given segment)
- Install Kaldi 5.3
Kaldi 5.3 is required; download the 5.3 source code.
#>cd kaldi
#>cd tools
#>cat INSTALL
#>extras/check_dependencies.sh
#>extras/install_mkl.sh
#>apt-get install sox gfortran subversion
#>make -j 4
#>cd ../src/
#>./configure --shared
#>make depend -j 8
#>make -j 8
- Install libfvad, a voice activity detection (VAD) library
https://github.com/dpirch/libfvad
[libfvad]#cd libfvad
[libfvad]# autoreconf -i
[libfvad]#./configure
[libfvad]#make
[libfvad]#make install
- Download KaldiBasedSpeakerVerification and compile the source
https://github.com/qianhwan/KaldiBasedSpeakerVerification
#>cd KaldiBasedSpeakerVerification/mat
#>cat iepart* > final.ie
#>cd ../src
#>vim makefile
# Set the correct paths for kaldi, libfvad, and KaldiBasedSpeakerVerification
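The concatenation step in the mat directory joins the split model file back into final.ie. For environments without a shell, the same step can be sketched in Python. The function name and its defaults are assumptions; only the `iepart*` pattern and the `final.ie` name come from the steps above.

```python
import glob
import os
import shutil


def concatenate_parts(directory, pattern="iepart*", output_name="final.ie"):
    """Join all files matching pattern, in sorted order, into output_name."""
    parts = sorted(glob.glob(os.path.join(directory, pattern)))
    if not parts:
        raise FileNotFoundError(f"no files matching {pattern} in {directory}")
    output_path = os.path.join(directory, output_name)
    with open(output_path, "wb") as out:
        for part in parts:
            # Copy in binary mode so the model bytes are preserved exactly.
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return output_path, parts
```

Sorting the matches mirrors the order in which the shell expands the glob, which matters because the parts must be joined in sequence.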
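Patch the source code, then run make.
The build fails with the error shown in the screenshot; the source has to be modified as explained below.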
The error on line 504 is straightforward to fix. It's a casting error in the return from checking to see if a file exists. By adding "(bool)" to that line 504, you will cast the return as a boolean and that solves it. Here is what the fexists bool should look like when you're done:
bool fexists(const char *filename){
ifstream ifile(filename);
return (bool)ifile;
}
Save that change and the code will compile after that.
#>make
- Edit test1Test.sh
#!/bin/bash
# KaldiBasedSpeakerVerification
# test1Test.sh
# ========================================
# Author: Qianhui Wan
# Version: 1.0.0
# Date : 2018-01-23
# ========================================
# The following lines will setup the path to each lib
# path to kaldi/src/lib
export LD_LIBRARY_PATH=/home/qianhuiwan/sourcecodes/kaldi/src/lib:$LD_LIBRARY_PATH
# path to atlas
export LD_LIBRARY_PATH=/usr/lib64/atlas:$LD_LIBRARY_PATH
# path to openfst
export LD_LIBRARY_PATH=/home/qianhuiwan/sourcecodes/kaldi/tools/openfst/lib:$LD_LIBRARY_PATH
# path to usr/local/lib
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
../src/identifySpeaker ./example_data/test/174/174-168635-0000.wav
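The script above only prepends library directories to LD_LIBRARY_PATH and then calls identifySpeaker. A minimal Python sketch of the same idea follows; `build_env`, `missing_dirs`, and `run_identify` are hypothetical helper names, and the directories passed in must be the ones on your own machine.

```python
import os
import subprocess


def build_env(lib_dirs, base_env=None):
    """Return a copy of the environment with lib_dirs prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    parts = list(lib_dirs) + ([existing] if existing else [])
    env["LD_LIBRARY_PATH"] = os.pathsep.join(parts)
    return env


def missing_dirs(lib_dirs):
    """List the directories that do not exist, to catch path typos early."""
    return [d for d in lib_dirs if not os.path.isdir(d)]


def run_identify(binary, wav_path, lib_dirs):
    """Run the identifySpeaker binary on one wav file with the adjusted library path."""
    absent = missing_dirs(lib_dirs)
    if absent:
        raise FileNotFoundError("library directories not found: " + ", ".join(absent))
    return subprocess.run([binary, wav_path], env=build_env(lib_dirs), check=True)


# Example call (paths are placeholders; replace them with your own):
# run_identify("../src/identifySpeaker",
#              "./example_data/test/174/174-168635-0000.wav",
#              ["/path/to/kaldi/src/lib", "/path/to/kaldi/tools/openfst/lib", "/usr/local/lib"])
```

Checking the directories first gives a clearer message than the loader error that appears when a shared library cannot be found.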
- Run the example
Run the following scripts from the examples directory, in this order:
#>./test1Enroll.sh
#>./test1Test.sh
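3. Speaker identification (second approach)
3.1 Installation on CentOS 7.8
This approach uses an open-source speaker recognition project by a Korean author:
https://github.com/jymsuper/SpeakerRecognition_tutorial
Required packages, installed with pip3: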
pytorch 1.0.0
pandas 0.23.4
numpy 1.13.3
pickle 4.0
matplotlib 2.1.0
pip3 install wheel
pip3 install torch
pip3 install torchvision
pip3 install pandas -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
pip3 install librosa -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
pickle does not need to be installed; it is already included with Python 3.7.
yum install libsndfile
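After installation, a quick check that each package can be imported saves time before running the tutorial scripts. This is a minimal sketch: the module list is taken from the packages above, and `check_modules` is a hypothetical helper name.

```python
import importlib.util

# Import names of the packages listed above; pickle ships with Python itself.
REQUIRED_MODULES = ["torch", "torchvision", "pandas", "numpy",
                    "librosa", "matplotlib", "pickle"]


def check_modules(names):
    """Map each top-level module name to True if it can be found, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_modules(REQUIRED_MODULES).items():
        print(f"{name:12s} {'ok' if ok else 'MISSING'}")
```

The check only confirms that a module is discoverable; it does not verify the version or that native libraries such as libsndfile load correctly.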
Note: on CentOS 7.8, after running yum update, the scripts fail at runtime.
The following changes are needed:
a. Edit DB_wav_reader.py
import sys
from glob import glob
import librosa
import numpy as np
import pandas as pd
from configure import SAMPLE_RATE
np.set_printoptions(threshold=sys.maxsize)  # key change
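The set_printoptions line is the one that matters here. A likely reason, stated as an assumption because the original line is not shown: newer NumPy releases reject a NaN `threshold`, and `sys.maxsize` is the usual replacement when arrays should be printed in full. A minimal sketch of what the setting does:

```python
import sys

import numpy as np

values = np.arange(2000)

# With NumPy's default threshold (1000), large arrays are summarised with "...".
np.set_printoptions(threshold=1000)
truncated = str(values)

# With threshold=sys.maxsize, every element is printed.
np.set_printoptions(threshold=sys.maxsize)
full = str(values)

print("..." in truncated, "..." in full)
```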