1. Installing the Kaldi Toolkit on Ubuntu
Install git
i) git --version
ii) sudo apt install git
iii) git --version (after installation this reports version 2.7.4)
ix) git config --global user.name "git"
x) git config --global user.email zhaodpx@163.com
xi) git config --list
xii) git init
xiii) git init newrepo
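The git steps above can be collected into a small script. This is only a sketch: the function names (have_cmd, ensure_git, set_git_identity) are made up for illustration, and the name and e-mail in the example call are placeholders, not values from these notes.

```shell
#!/bin/sh
# True when the named command is available on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Install git via apt only when it is missing, then report the version.
ensure_git() {
    have_cmd git || sudo apt install -y git
    git --version
}

# Set the global identity and list the resulting configuration.
# Both arguments are supplied by the caller.
set_git_identity() {
    git config --global user.name "$1"
    git config --global user.email "$2"
    git config --list
}

# Example call (placeholder values), left commented out so that sourcing
# this file changes nothing:
# ensure_git && set_git_identity "Your Name" "you@example.com"
```

Checking for git first means the script can be rerun without reinstalling anything.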
Install the Kaldi Toolkit
Main reference: http://kaldi-asr.org/doc/install.html
Git page: https://github.com/tzyll/kaldi
git clone https://github.com/kaldi-asr/kaldi.git kaldi --origin upstream
After cloning, follow the steps described at: http://kaldi-asr.org/doc/tutorial_setup.html
cd kaldi/tools/; make; cd ../src; ./configure; make
Step 1: cd kaldi/tools/
Step 2: run make, which prints:
zhaodeng@ubuntu:~/kaldi/tools$ make
extras/check_dependencies.sh
extras/check_dependencies.sh: zlib is not installed.
extras/check_dependencies.sh: automake is not installed.
extras/check_dependencies.sh: autoconf is not installed.
extras/check_dependencies.sh: sox is not installed.
extras/check_dependencies.sh: gfortran is not installed.
extras/check_dependencies.sh: neither libtoolize nor glibtoolize is installed
extras/check_dependencies.sh: subversion is not installed
extras/check_dependencies.sh: Intel MKL is not installed. Run extras/install_mkl.sh to install it.
... You can also use other matrix algebra libraries. For information, see:
... http://kaldi-asr.org/doc/matrixwrap.html
extras/check_dependencies.sh: Some prerequisites are missing; install them using the command:
sudo apt-get install zlib1g-dev automake autoconf sox gfortran libtool subversion
Makefile:38: recipe for target 'check_required_programs' failed
make: *** [check_required_programs] Error 1
Following the command suggested in the output, run:
sudo apt-get install zlib1g-dev automake autoconf sox gfortran libtool subversion
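Once the prerequisites are installed, the build sequence from above can be rerun. The sketch below wraps it in two functions; the function names and the default checkout path ~/kaldi are assumptions for illustration. It relies on extras/check_dependencies.sh exiting with a non-zero status when something is missing, as the make error above indicates. Note that the log above also reports Intel MKL as missing, which the suggested apt-get line does not cover, so the dependency check may still fail until extras/install_mkl.sh has been run or another matrix library is set up.

```shell
#!/bin/sh
# Print the apt-get line that check_dependencies.sh suggests, if any.
# Reads the script's output on stdin.
suggested_install_cmd() {
    grep -E '^[[:space:]]*sudo apt-get install ' | head -n 1 | sed 's/^[[:space:]]*//'
}

# Build Kaldi from an existing checkout (default path is an assumption).
# The body runs in a subshell so the caller's working directory is unchanged.
build_kaldi() (
    kaldi_root="${1:-$HOME/kaldi}"
    cd "$kaldi_root/tools" || exit 1
    # Re-check prerequisites; on failure show the suggested command and stop.
    if ! out=$(extras/check_dependencies.sh 2>&1); then
        printf '%s\n' "$out" | suggested_install_cmd
        exit 1
    fi
    make || exit 1
    cd ../src || exit 1
    ./configure || exit 1
    make
)

# Example call, left commented out because the build takes a long time:
# build_kaldi "$HOME/kaldi"
```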
2. Downloading VoiceBox
HomePage: VoiceBox
Git: source code
After downloading, simply add it to the MATLAB path.
Installation
1)Pull the GitHub repository or unzip the zip archive into any suitable folder (assumed below to be C:\sap-voicebox)
2)Start MATLAB, click “Set Path”, click “Add Folder …”, navigate to C:\sap-voicebox\voicebox, click “Select Folder” then click “Save”.
3)[Optional] The routine v_voicebox.m contains various installation-dependent parameters which may need to be altered before using the toolbox. In particular it contains a number of default directory paths indicating where temporary files should be created, where speech data normally resides, etc. You can override the defaults by editing v_voicebox.m directly or, more conveniently, by setting an environment variable VOICEBOX to the path of an initializing m-file. See the comments in v_voicebox.m for a fuller description.
4)[Optional] You may find it convenient to install the non-unicode IPA phonetic symbol fonts developed by SIL which are in the C:\sap-voicebox\external\silipa93 folder.
5)[Optional] The folder C:\sap-voicebox\external\shorten contains the source code and compiled executable for the SHORTEN program written by Tony Robinson and SoftSound Limited www.softsound.com. This is needed for reading compressed SPHERE format files. You may wish to move it elsewhere but, if so, you will need to edit v_voicebox.m to give its location.
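On Ubuntu the "Set Path" step can also be done without the GUI. The sketch below only builds the MATLAB statement; the helper name voicebox_addpath_cmd and the folder ~/sap-voicebox are assumptions for illustration, while addpath and savepath are standard MATLAB functions. The commented example call assumes a MATLAB release that supports the -batch startup option.

```shell
#!/bin/sh
# Print the MATLAB statements that add <dir>/voicebox to the search path
# and save the path for future sessions.
voicebox_addpath_cmd() {
    printf "addpath('%s/voicebox'); savepath\n" "$1"
}

# Example, assuming the repository was unpacked to ~/sap-voicebox:
# matlab -batch "$(voicebox_addpath_cmd "$HOME/sap-voicebox")"
```

The same two statements can instead be typed at the MATLAB prompt.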
In MATLAB, enter:
what voicebox
to list all the files in the voicebox folder.
Enter:
help voicebox
to display a description of every file:
>> help voicebox
Voicebox: Speech Processing Toolbox for MATLAB
Function names have been prefixed "v_" to avoid name conflicts; the
unprefixed aliases will be removed in a future version. Use the function
v_voicebox_update to update old code which, by default, updates all .m files
in the current folder.
Audio File Input/Output
v_readwav - Read a WAV file
v_writewav - Write a WAV file
v_readhtk - Read HTK waveform files
v_writehtk - Write HTK waveform files
v_readsfs - Read SFS files
v_readsph - Read SPHERE/TIMIT waveform files
v_readaif - Read AIFF Audio Interchange file format file
v_readcnx - Raed BT Connex database files
v_readau - Read AU files (from SUN)
v_readflac - Read FLAC files
wavread - Emulation of legacy MATLAB function to read a WAV file
wavwrite - Emulation of legacy MATLAB function to write a WAV file
Frequency Scales
v_frq2bark - Convert Hz to the Bark frequency scale
v_frq2cent - Convert Hertz to cents scale
v_frq2erb - Convert Hertz to erb rate scale
v_frq2mel - Convert Hertz to mel scale
v_frq2midi - Convert Hertz to midi scale of semitones
v_bark2frq - Convert the Bark frequency scale to Hz
v_cent2frq - Convert cents scale to Hertz
v_erb2frq - Convert erb rate scale to Hertz
v_mel2frq - Convert mel scale to Hertz
v_midi2frq - Convert midi scale of semitones to Hertz
Fourier/DCT/Hartley Transforms
v_rfft - FFT of real data
v_irfft - Inverse of FFT of real data
v_rsfft - FFT of real symmetric data
v_rdct - DCT of real data
v_irdct - Inverse of DCT of real data
v_rhartley - Hartley transform of real data
v_zoomfft - calculate the fft over a portion of the spectrum with any resolution
v_sphrharm - calculate forward and inverse shperical harmonic transformations
Probability Distributions
v_berk2prob - Convert Berksons to probability
v_gaussmix - Fit a gaussian mixture model to data values
v_gaussmixd - Calculate marginal and conditional density distributions and perform inference
v_gaussmixk - Estimate Kuleck-Leibler divergence between two GMMs
v_gaussmixg - Calculate global mean, covariance and mode of a Gaussian mixture
v_gaussmixm - Estimate mean and variance of GMM vector magnitude
v_gaussmixp - Calculates and plots full and marginal probability density from a GMM
v_gaussmixt - multiplies two GMMs together
v_gausprod - Calculate the product of multiple gaussians
v_gmmlpdf - OBSOLETE - use v_gaussmixp instead
v_histndim - N-dimensional histogram (+ plot 2-D histogram)
v_lognmpdf - Prob density function of a lognormal distribution
v_maxgauss - Calculate the mean and variance of max(x) where x is a gaussian vector
v_normcdflog - Calculate the log of the Normal cdf without underflow
v_pdfmoments - Convert between central moments, raw moments and cumulants
v_prob2berk - Convert probability to Berksons
v_randvec - Generate random vectors
v_randiscr - Generate discrete random values with prescribed probabilities
v_rnsubset - Select a random subset
v_randfilt - Generate filtered random noise without transients
v_stdspectrum - Generate standard audio and speech spectra
v_usasi - Generate USASI noise (obsolete: use v_stdspectrum instead)
v_chimv - Approximate mean and variance of non-central chi distribution
v_vonmisespdf - Calculate the pdf of the Von Mises (circular normal) distribution
Vector Distances
v_disteusq - Calculate euclidean/mahanalobis distances between two sets of vectors
v_distchar - COSH spectral distance between AR coefficient sets
v_distitar - Itakura spectral distance between AR coefficient sets
v_distisar - Itakura-Saito spectral distance between AR coefficient sets
v_distchpf - COSH spectral distance between power spectra
v_distitpf - Itakura spectral distance between power spectra
v_distispf - Itakura-Saito spectral distance between power spectra
Speech Analysis
v_activlev - Calculate the active level of speech (ITU-T P.56)
v_activlevg - Calculate the active level of speech robustly to added noise
v_dypsa - Estimate glottal closure instants from a speech waveform
v_enframe - Divide a speech signal into frames for frame-based processing
v_correlogram - calculate a 3-D v_correlogram
v_ewgrpdel - Energy-weighted group delay waveform
v_fram2wav - Interpolate frame-based values to a waveform
v_filtbankm - Transformation matrix for a linear/mel/erb/bark-spaced v_filterbank from dft output
v_fxpefac - PEFAC pitch tracker
v_fxrapt - RAPT pitch tracker
v_gammabank - Calculate a bank of IIR gammatone filters
v_importsii - Calculate the SII importance function (ANSI S3.5-1997)
v_modspect - Caluclate the modulation specrogram
v_mos2pesq - Convert MOS values to equivalent PESQ scores
v_overlapadd - Reconstitute an output waveform after frame-based processing
v_pesq2mos - Convert PESQ scores to equivalent MOS values
v_phon2sone - Convert signal levels from phons to sones
v_psycdigit - Experimental estimation of monotonic/unimodal psychometric function using TIDIGITS
v_psycest - Experimental estimation of monotonic psychometric function
v_psycestu - Experimental estimation of unimodal psychometric function
v_psychofunc - Psychometric functions
v_sigma - Identify glottal closure and opening intstants from Lx or EGG waveform
v_snrseg - Segmental SNR and Global SNR calculation
v_sone2phon - Convert signal levels from sones to phons
v_soundspeed - Returns the speed of sound in air as a function of temperature
v_spgrambw - Spectrogram with many options
v_stoi2prob - Convert STOI intelligibility measure to probability of correct recognition
v_txalign - Align two sets of time markers
v_vadsohn - Voice activity detector
v_ppmvu - Calculate the PPM, VU or EBU levels of a signal
LPC Analysis of Speech
v_ccwarpf - warp complex cepstrum coefficients
v_lpcauto - LPC analysis: autocorrelation method
v_lpcbwexp - Bandwidth expansion of LPC filter
v_lpccovar - LPC analysis: covariance method
v_lpcconv - Arbitrary conversion between LPC representations
v_lpcifilt - inverse filter a speech signal
v_lpcrand - create random stable filters
v_lpcrr2am - Matrix with all LPC filters up to order p
v_lpcstable - check for stability and force stable filters
v_lpc--2-- - Convert between alternative LPC representation
Speech Synthesis
v_sapisynth - Text-to-speech synthesis of a string or matrix
v_glotros - Rosenberg model of glottal waveform
v_glotlf - Liljencrants-Fant model of glottal waveform
Speech Enhancement
v_estnoiseg - Estimate the noise spectrum from noisy speech using MMSE method
v_estnoisem - Estimate the noise spectrum from noisy speech using minimum statistics
v_specsub - Speech enhancement using spectral subtraction
v_ssubmmse - Speech enhancement using MMSE estimate of spectral amplitude or log amplitude
v_ssubmmsev - Speech enhancement using MMSE estimate and VAD-based noise estimation
v_specsubm - (obsolete algorithm) Spectral subtraction
v_spendred - Speech Enhancement and Dereverberation (Doire's algorithm)
Speech Coding
v_lin2pcmu - Convert linear PCM to mu-law PCM
v_pcma2lin - Convert A-law PCM to linear PCM
v_pcmu2lin - Convert mu-law PCM to linear PCM
v_lin2pcma - Convert linear PCM to A-law PCM
v_kmeanlbg - Vector quantisation: LBG algorithm
v_kmeanhar - Vector quantization: K-harmonic means
v_potsband - Create telephone bandwidth filter
v_kmeans - Vector quantisation: k-means algorithm
Speech Recognition
v_ldatrace - constrained Linear Discriminant Analysis to maximize trace(W\B)
v_melbankm - Mel v_filterbank transformation matrix
v_melcepst - Mel cepstrum frontend for recogniser
v_cep2pow - Convert mel cepstram means & variances to power domain
v_pow2cep - Convert power domain means & variances to mel cepstrum
Signal Processing
v_addnoise - Add noise to a signal at a chosen SNR
v_convfft - 1-dimensional convolution/corrolation using FFT
v_ditherq - Add dither and quantize a signal
v_filterbank - Apply a bank of IIR filters to a signal
v_findpeaks - Find peaks in a signal or spectrum
v_maxfilt - Running maximum filter
v_meansqtf - Output power of a filter with white noise input
v_momfilt - Generate running moments
v_resample - Resamples a signal: identical to MATLAB resample but removes filter transients
v_schmitt - Pass a signal through a v_schmitt trigger
v_sigalign - Align a clean refeence with a noisy signal
v_teager - Calculate the Teager energy waveform
v_windinfo - Calculate window properties and figures of merit
v_windows - Window function generation
v_zerocros - Find interpolated zero crossings
Information Theory
v_huffman - Generate Huffman code
v_entropy - Calculate v_entropy and conditional v_entropy
Computer Vision
v_imagehomog - Apply a homography transformation to an image with bilinear interpolation
v_polygonarea - Calculate the area of a polygon
v_polygonwind - Test if points are inside or outside a polygon
v_polygonxline - Find where a line crosses a polygon
v_qrabs - Absolute value of a real quaternion
v_qrdivide - divide two real quaternions (or invert one)
v_qrdotdiv - elmentwise division of two real quaternion arrays
v_qrdotmult - elmentwise multiplication of two real quaternion arrays
v_qrmult - multiply two real quaternion arrays
v_qrpermute - permute the indices of a quaternion array
v_rectifyhomog - Apply rectifing homographies to a set of cameras to make their optical axes parallel
v_rot--2-- - Convert between different representations of rotations
v_rotqrmean - Find the average of several v_rotation quaternions
v_rotqrvec - Apply a quaternion rotation to an array of 3D vectors
v_sphrharm - forward and inverse spherical harmonic transform using uniform, Gaussian
or arbitrary inclination (elevation) grids and a uniform azimuth grid.
v_upolyhedron - Calculate the vertex coordinates and other characteristics of a uniform polyhedron
Printing and Display functions
v_axisenlarge - Selectively enlarge figure axis for clarity
v_cblabel - Add a label onto the colorbar
v_figbolden - Make a figure bold and adjust colours for printing clearly
v_fig2emf - Make a figure bold and save as a windows metafile
v_fig2pdf - Make a figure bold and save as pdf, eps or ps
v_frac2bin - Convert numbers to fixed-point binary strings
v_lambda2rgb - convert wavelength to XYZ or RGB colour triplets
v_sprintsi - Print a value with an SI multiplier
v_sprintcpx - Print a complex number with real and imaginary parts
v_texthvc - write text on a plot with specified alignment and colour
v_tilefigs - Arrange all figures on the screen
v_colormap - Set and plot colormap information
v_xticksi - Label x-axis tick marks using SI multipliers
v_yticksi - Label y-axis tick marks using SI multipliers
v_xyzticksi - Helper function for v_xticksi and v_yticksi
Voicebox Parameters and System Interface
v_hostipinfo - Get information about the computer name and internet connections
v_regexfiles - Recursively find files that match a regular expression pattern
v_unixwhich - Search the WINDOWS system path for an executable program (like UNIX which)
v_voicebox - Global installation-dependent parameters
v_winenvar - Obtain WINDOWS environment variables
v_voicebox_update - Update matlab files in the current folder to include the v_ prefix where needed
Utility Functions
v_atan2sc - arctangent function that returns the sin and cos of the angle
v_besselratio - calculate the Bessel function ratio: besseli(v+1,x)./besseli(v,x)
v_besselratioi - calculate the inverse of v_besselratio [only for v=0]
v_bitsprec - Rounds values to a precision of n bits
v_choosenk - All choices of k elements out of 1:n without replacement
v_choosrnk - All choices of k elements out of 1:n with replacement
v_dlyapsq - Solve the discrete lyapunov equation
v_dualdiag - Simultaneously diagonalise two hermitian matrices
v_finishat - Estimate the finishing time of a long loop
v_fopenmkd - Like FOPEN() but creates any missing directories/folders
v_gammalns - Calculates log(gamma(x)) for signed real-valued x
v_horizdiff - Estimate the horizontal difference between two functions of x
v_hypergeom1f1 - Confluent Hypergeometric function or Kummer's M function
v_logsum - Calculates log(sum(exp(x))) without overflow/underflow
v_minspane - calculate the minimum (or shortest) spanning tree
v_mintrace - find a row permutation to minimize the trace of a matrix
v_m2htmlpwd - Create HTML documentation of matlab routines in the current directory
v_nearnonz - Replace each zero element with the nearest non-zero element
v_paramsetch - Set a parameter structure and do valididty checks
v_permutes - All n! permutations of 1:n
v_quadpeak - Find quadratically-interpolated peak in a 2D array
v_rotation - Generate v_rotation matrices
v_skew3d - Generate 3x3 skew symmetric matrices
v_zerotrim - Remove empty trailing rows and columns
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
voicebox is both a directory and a function.
For calling details please see v_voicebox.m
This dummy routine is included for backward compatibility only
and will be removed in a future release of voicebox. Please use
v_voicebox.m in future and/or update with v_voicebox_update.m
Copyright (C) Mike Brookes 2018
Version: $Id: voicebox.m 10863 2018-09-21 15:39:23Z dmb $
3. Connecting to a shared folder
On Ubuntu, enter:
sudo apt install nfs-kernel-server
sudo mount -t nfs -o nolock 192.168.1.152:/home/zhuguili/mnt ~/tmp
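This mounts the remote directory so its files are shared locally. The local mount point is ~/tmp, which must exist before mount is run.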
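The mount step can be made repeatable with a short script. This is a sketch, not a tested setup: the function names are made up for illustration, and the server address and export path are the ones from the command above. It creates the mount point if needed and skips the mount when something is already mounted there.

```shell
#!/bin/sh
# Values taken from the notes above; adjust for your own network.
SERVER_EXPORT="192.168.1.152:/home/zhuguili/mnt"
MOUNT_POINT="${MOUNT_POINT:-$HOME/tmp}"

# Create the mount point if needed; mount fails when the directory is missing.
prepare_mount_point() {
    mkdir -p "$1"
}

# Mount the NFS export ($1) on the local directory ($2) unless that
# directory is already a mount point.
mount_share() {
    prepare_mount_point "$2" || return 1
    if mountpoint -q "$2"; then
        echo "$2 is already mounted"
    else
        sudo mount -t nfs -o nolock "$1" "$2"
    fi
}

# Example call, left commented out because it needs sudo and a reachable server:
# mount_share "$SERVER_EXPORT" "$MOUNT_POINT"
```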