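Today I tried the iFlytek (科大讯飞) speech recognition library. It seems to cover only the back half of the pipeline, the recognition itself. For the front half, capturing the audio, I could not find anything usable, so I built it myself.
Any open-source recorder can be adapted for this; we will use arecord.
This is only a simple endpoint detector: it decides from the sound energy, that is, from how loud the input is (its level in decibels).
Download the source: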
http://www.alsa-project.org/main/index.php/Main_Page
arecord is part of alsa-utils in the ALSA project. It depends on the library (alsa-lib), so download both.
The download page is here:
http://www.alsa-project.org/main/index.php/Download
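Both can then be built locally. The build will complain about missing tools; just install whatever it asks for. I am on Ubuntu, so apt-get took care of it.
After building, go into the aplay directory. You will see that arecord is really just a link to aplay. aplay.c is the source, and that is the file we modify.
1. Define the variables:
Around line 95: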
static char *command;
static snd_pcm_t *handle;
static struct {
    snd_pcm_format_t format;
    unsigned int channels;
    unsigned int rate;
} hwparams, rhwparams;
static int timelimit = 0;
//add by ljh
static int silflag = 0;      // 1 while we are inside a silent stretch
static off64_t silcount = 0; // bytes still to read once silence has started
static int enablesil = 0;    // 1 once silence detection is enabled (speech has been heard)
2. Around line 1648, modify the function as follows:
static void print_vu_meter_mono(int perc, int maxperc) // the input is mono, so only this function is changed; run arecord with -V mono so that it is called
{
    const int bar_length = 50;
    char line[80];
    int val;
    //add by ljh
    if (perc >= 2) // speech
    {
        enablesil = 1; // enable silence detection
        silflag = 0;
    } else if (silflag == 0) // entering silence
    {
        // start the countdown: 2 x one second of data, i.e. about 2 s
        silcount = 2 * snd_pcm_format_size(hwparams.format, hwparams.rate * hwparams.channels);
        silflag = 1;
    } else // already in silence, the countdown keeps running in capture()
    {
    }
    for (val = 0; val <= perc * bar_length / 100 && val < bar_length; val++)
        line[val] = '#';
    for (; val <= maxperc * bar_length / 100 && val < bar_length; val++)
        line[val] = ' ';
    line[val] = '+';
    for (++val; val <= bar_length; val++)
        line[val] = ' ';
    if (maxperc > 99)
        sprintf(line + val, "| MAX");
    else
        sprintf(line + val, "| %02i%%", maxperc);
    fputs(line, stderr);
    if (perc > 100)
        fprintf(stderr, _(" !clip "));
}
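The perc argument tested above is the peak level of the chunk just read, as a percentage of full scale. As a rough illustration of where such a number comes from, here is a minimal sketch for signed 16-bit samples. It is a simplification written for this post, not aplay's own peak code, and peak_percent_s16 and is_speech are hypothetical names. The threshold of 2 means 2% of full scale, which is roughly -34 dBFS (20*log10(0.02)).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Peak level of a chunk of signed 16-bit samples, as a percentage of
 * full scale (0..100). Simplified illustration, not aplay's exact code. */
int peak_percent_s16(const int16_t *samples, size_t n)
{
    assert(samples != NULL || n == 0);
    int max_peak = 0;
    for (size_t i = 0; i < n; i++) {
        int v = abs((int)samples[i]);   /* widen first so -32768 is safe */
        if (v > max_peak)
            max_peak = v;
    }
    int perc = max_peak * 100 / 32768;  /* 32768 = 16-bit full scale */
    return perc > 100 ? 100 : perc;
}

/* Hypothetical helper: the same test the modified VU meter applies. */
int is_speech(int perc)
{
    return perc >= 2;   /* 2% of full scale, roughly -34 dBFS */
}
```

If the room is noisy, the threshold is the value to raise; a chunk whose peak stays below it counts as silence.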
3. Finally, the capture function, modified at around lines 3050 and 3103:
static void capture(char *orig_name)
{
    int tostdout=0;         /* boolean which describes output stream */
    int filecount=0;        /* number of files written */
    char *name = orig_name; /* current filename */
    char namebuf[PATH_MAX+1];
    off64_t count, rest;    /* number of bytes to capture */
    struct stat statbuf;
    /* get number of bytes to capture */
    count = calc_count();
    if (count == 0)
        count = LLONG_MAX;
    /* compute the number of bytes per file */
    max_file_size = max_file_time *
        snd_pcm_format_size(hwparams.format,
                            hwparams.rate * hwparams.channels);
    /* WAVE-file should be even (I'm not sure), but wasting one byte
       isn't a problem (this can only be in 8 bit mono) */
    if (count < LLONG_MAX)
        count += count % 2;
    else
        count -= count % 2;
    /* display verbose output to console */
    header(file_type, name);
    /* setup sound hardware */
    set_params();
    /* write to stdout? */
    if (!name || !strcmp(name, "-")) {
        fd = fileno(stdout);
        name = "stdout";
        tostdout=1;
        if (count > fmt_rec_table[file_type].max_filesize)
            count = fmt_rec_table[file_type].max_filesize;
    }
    init_stdin();
    //add by ljh: initial silence budget, 2 x one second of data
    silcount = 2 * snd_pcm_format_size(hwparams.format, hwparams.rate * hwparams.channels);
    do {
        /* open a file to write */
        if(!tostdout) {
            /* upon the second file we start the numbering scheme */
            if (filecount || use_strftime) {
                filecount = new_capture_file(orig_name, namebuf,
                                             sizeof(namebuf),
                                             filecount);
                name = namebuf;
            }
            /* open a new file */
            if (!lstat(name, &statbuf)) {
                if (S_ISREG(statbuf.st_mode))
                    remove(name);
            }
            fd = safe_open(name);
            if (fd < 0) {
                perror(name);
                prg_exit(EXIT_FAILURE);
            }
            filecount++;
        }
        rest = count;
        if (rest > fmt_rec_table[file_type].max_filesize)
            rest = fmt_rec_table[file_type].max_filesize;
        if (max_file_size && (rest > max_file_size))
            rest = max_file_size;
        /* setup sample header */
        if (fmt_rec_table[file_type].start)
            fmt_rec_table[file_type].start(fd, rest);
        /* capture */
        fdcount = 0;
        while (rest > 0 && recycle_capture_file == 0 && !in_aborting) {
            size_t c = (rest <= (off64_t)chunk_bytes) ?
                (size_t)rest : chunk_bytes;
            size_t f = c * 8 / bits_per_frame;
            if (pcm_read(audiobuf, f) != f)
                break;
            if (write(fd, audiobuf, c) != c) {
                perror(name);
                prg_exit(EXIT_FAILURE);
            }
            count -= c;
            rest -= c;
            fdcount += c;
            //add by ljh
            if (silflag == 1 && enablesil == 1) // in silence, and detection is enabled
            {
                silcount -= c;
                if (silcount <= 0) // the silence budget (about 2 s) is used up
                {
                    // stop recording
                    count = 0;
                    rest = 0;
                }
            }
        }
        /* re-enable SIGUSR1 signal */
        if (recycle_capture_file) {
            recycle_capture_file = 0;
            signal(SIGUSR1, signal_handler_recycle);
        }
        /* finish sample container */
        if (fmt_rec_table[file_type].end && !tostdout) {
            fmt_rec_table[file_type].end(fd);
            fd = -1;
        }
        if (in_aborting)
            break;
        /* repeat the loop when format is raw without timelimit or
         * requested counts of data are recorded
         */
    } while ((file_type == FORMAT_RAW && !timelimit) || count > 0);
}
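The silence budget in silcount is just a number of bytes: snd_pcm_format_size(format, rate * channels) is the size of one second of data, and the code doubles it. The arithmetic can be checked without ALSA. The sketch below assumes a fixed number of bytes per sample; silence_budget_bytes and chunks_until_stop are hypothetical helper names, and the chunk sizes in the usage note are made-up values.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of interleaved PCM produced by `seconds` seconds of capture. */
int64_t silence_budget_bytes(unsigned rate, unsigned channels,
                             unsigned bytes_per_sample, unsigned seconds)
{
    assert(rate > 0 && channels > 0 && bytes_per_sample > 0);
    return (int64_t)rate * channels * bytes_per_sample * seconds;
}

/* Chunks read before the countdown reaches zero. The budget is decremented
 * one chunk at a time, so the stop point rounds up to a chunk boundary. */
int64_t chunks_until_stop(int64_t budget_bytes, int64_t chunk_bytes)
{
    assert(chunk_bytes > 0);
    return (budget_bytes + chunk_bytes - 1) / chunk_bytes;
}
```

With the options used in this post (S16_LE, 8000 Hz, 2 channels), silence_budget_bytes(8000, 2, 2, 2) is 64000 bytes. With an assumed chunk of 4000 bytes that is 16 chunks of silence before the recording stops; with 6000-byte chunks it is 11, slightly more than 2 s.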
Then rebuild, and that is it. To use it from your own code, you can do something like this:
system("./arecord -D hw:0,0 -f S16_LE -V mono -r8000 -c 2 -t wav wav/mytest.wav");
Copy aplay next to your program and create a link to it: ln ./aplay arecord
Remember to copy the library it depends on, libasound.so.2, as well.
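If the output file changes from call to call, the system() line can be wrapped in a small helper. This is only a sketch under the assumptions of this post: ./arecord is the modified binary in the current directory, and the options are the ones shown above. build_record_cmd and record_utterance are hypothetical names. Because the path is pasted into a shell command line, the sketch simply refuses paths containing characters the shell would interpret.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build the arecord command line from this post for a given output file.
 * Returns 0 on success, -1 on an unsafe path or a buffer that is too small. */
int build_record_cmd(char *buf, size_t size, const char *wav_path)
{
    assert(buf != NULL && wav_path != NULL);
    /* Refuse empty paths and characters the shell would interpret. */
    if (wav_path[0] == '\0' ||
        strpbrk(wav_path, " \t\n'\"\\$`;&|<>*?()") != NULL)
        return -1;
    int n = snprintf(buf, size,
                     "./arecord -D hw:0,0 -f S16_LE -V mono -r8000 -c 2 -t wav %s",
                     wav_path);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Record one utterance. Returns the value of system(), or -1 if the
 * command could not be built. */
int record_utterance(const char *wav_path)
{
    char cmd[512];
    if (build_record_cmd(cmd, sizeof cmd, wav_path) != 0)
        return -1;
    return system(cmd);
}
```

record_utterance("wav/mytest.wav") blocks until arecord exits, which with the modification above means until the speaker has gone quiet for the silence budget.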
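How it works: in capture, every chunk that is read has its energy computed and printed (the VU meter). We hook into that computation to mark the start of silence and, at the same moment, to set how many more bytes should be read; that number is computed the same way the original code computes count. Once silence starts, the program reads the configured number of bytes and then exits on its own. If the user speaks again in the meantime, the silence state is cleared and silcount has no effect until silence starts again.
One more point: when recording has just started and the user has not spoken yet, we do not want it to exit; it should keep waiting. That needs one more variable, enablesil, which enables silence detection. It is off by default and is switched on the first time the user speaks; from then on silflag takes effect.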
If you just want to use it as is, a prebuilt binary and the source can be downloaded here:
http://download.csdn.net/detail/lijin6249/9580171