The Structure of Network in HTK

hjx5200

Published 2021-06-15 16:37:34


Category: Speech Recognition. Tags: HTK, Network, speech recognition, decoding process, recognition network construction

Article link: https://blog.csdn.net/hjx5200/article/details/117356960


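A Network is built by expanding a lattice. Besides the lattice, it also depends on the HMMSet, the pronunciations, and so on.

In a lattice, the most important piece of information in each node (LNode) is its Word, which points to a DictEntry. The DictEntry holds the overall pronunciation information for the word, such as how many pronunciations it has. It also holds a Pron pointer to a WordPron structure, which is the first pronunciation of the word.

The WordPron structure stores the more detailed pronunciation information for the word (Word/DictEntry).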

typedef struct _WordPron{   /* storage for each pronunciation */
   short pnum;     /* Pronunciation number 1..nprons */
   short nphones;  /* Number of phones in pronuciation */
   LabId *phones;  /* Array[0..nphones-1] of phones */
   LogFloat prob;  /* Log probability of pronunciation */
   LabId outSym;   /* Output symbol generated when pronunciation recognised */
   Word word;      /* Word this is a pronuciation of */
   Pron next;      /* Next pronunciation of word */
   void *aux;      /* hook for temp info */
} WordPron;

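First, it records which pronunciation of the word this WordPron is (pnum). It then gives the number of phones, the phone sequence, the output symbol, and so on.

Network expansion means making the sublat field of each LNode in the lattice point to a PronHolder structure.

First, recall what an LNode looks like.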
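
To make the linked-list layout concrete, here is a minimal sketch of walking all pronunciations of one word. DemoWord and DemoPron are simplified stand-ins written for this illustration (plain strings instead of LabId, only the fields needed); they are assumptions, not the real HTK definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for DictEntry/WordPron (illustration only). */
typedef struct DemoPron DemoPron;
typedef struct DemoWord DemoWord;

struct DemoPron {
    short pnum;            /* pronunciation number, 1..nprons */
    short nphones;         /* number of phones in this pronunciation */
    const char **phones;   /* phone names (HTK stores LabId here) */
    DemoWord *word;        /* word this is a pronunciation of */
    DemoPron *next;        /* next pronunciation of the same word */
};

struct DemoWord {
    const char *name;      /* word identifier */
    int nprons;            /* number of pronunciations */
    DemoPron *pron;        /* first pronunciation */
};

/* Count pronunciations by following the next pointers. */
int count_prons(const DemoWord *w)
{
    int n = 0;
    assert(w != NULL);
    for (const DemoPron *p = w->pron; p != NULL; p = p->next)
        n++;
    return n;
}

/* Sum the phone counts over every pronunciation of the word. */
int total_phones(const DemoWord *w)
{
    int n = 0;
    assert(w != NULL);
    for (const DemoPron *p = w->pron; p != NULL; p = p->next)
        n += p->nphones;
    return n;
}

/* Print one line per pronunciation: word, variant number, phones. */
void print_prons(const DemoWord *w)
{
    assert(w != NULL);
    for (const DemoPron *p = w->pron; p != NULL; p = p->next) {
        printf("%s (pron %d):", w->name, p->pnum);
        for (int i = 0; i < p->nphones; i++)
            printf(" %s", p->phones[i]);
        printf("\n");
    }
}
```

The word only stores the head of the list, so every operation over its pronunciations is a walk from pron through next until NULL.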

typedef struct lnode
{

   Word word;          /* Word represented by arc (labels may be on nodes) */

   short v;            /* Pronunciation variant number */
   SubLatDef *sublat;  /* SubLat for node (if word==lat->voc->subLatWord) */

   ArcId foll;         /* Linked list of arcs following node */
   ArcId pred;         /* Linked list of arcs preceding node */

}
LNode;

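I have removed the fields that are not relevant for now. The struct above shows roughly what information an LNode in the Lattice carries.

The sublat field is what has to be filled in during expansion. This is done in the InitPronHolder function. Its result still lives in the Lattice; the difference is that the sublat of each LNode now carries a sub-lattice built from the pronunciation information of that word. A word with n pronunciations gets n sub-lattices, chained together as a singly linked list.

Next, the transitions between words have to be handled. This is done by the function void ProcessCrossWordLinks(MemHeap *heap,Lattice *lat,int xc).

Clearly, the key information it relies on is the LArc objects in the lattice: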
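
The attachment step described above can be sketched as follows. This is a simplified illustration, not the HTK code: DemoHolder is a hypothetical stand-in for PronHolder, DemoNode keeps only a sublat slot and a pointer to the first pronunciation, and the holders are appended in pronunciation order (the order used inside HTK is not stated here, so that is an assumption).

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal pronunciation record: variant number plus list link. */
typedef struct DemoPron {
    short pnum;
    struct DemoPron *next;
} DemoPron;

/* Hypothetical stand-in for PronHolder: one per pronunciation. */
typedef struct DemoHolder {
    const DemoPron *pron;        /* pronunciation this holder expands */
    struct DemoHolder *next;     /* next holder for the same node */
} DemoHolder;

/* Stand-in for LNode: sublat is reused to hold the holder chain. */
typedef struct DemoNode {
    const DemoPron *first_pron;  /* plays the role of word->pron */
    void *sublat;
} DemoNode;

/* Release every holder hanging off the node. */
void free_holders(DemoNode *node)
{
    DemoHolder *h = node->sublat;
    while (h != NULL) {
        DemoHolder *next = h->next;
        free(h);
        h = next;
    }
    node->sublat = NULL;
}

/* Build one holder per pronunciation and hang the chain on sublat.
   Returns the number of holders, or -1 if allocation fails. */
int attach_holders(DemoNode *node)
{
    DemoHolder *head = NULL;
    DemoHolder **tail = &head;
    int n = 0;

    for (const DemoPron *p = node->first_pron; p != NULL; p = p->next) {
        DemoHolder *h = malloc(sizeof *h);
        if (h == NULL) {
            node->sublat = head;   /* hand over the partial chain ... */
            free_holders(node);    /* ... so it can be released */
            return -1;
        }
        h->pron = p;
        h->next = NULL;
        *tail = h;                 /* append to keep pronunciation order */
        tail = &h->next;
        n++;
    }
    node->sublat = head;
    return n;
}
```

After the call, a node whose word has two pronunciations carries a chain of two holders, which matches the "one sub-lattice per pronunciation, singly linked" description.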

typedef struct larc
{
   NodeId start;       /* Node at start of word */
   NodeId end;         /* Node at end of word */
   LogFloat lmlike;    /* Language model likelihood of word */

   ArcId farc;         /* Next arc following start node */
   ArcId parc;         /* Next arc preceding end node */

   LogFloat aclike;    /* Acoustic likelihood of word */

   short nAlign;       /* Number of alignment records in word */
   LAlign *lAlign;     /* Array[0..nAlign-1] of alignment records */

   float score;        /* Field used for pruning/sorting */
   LogFloat prlike;    /* Pronunciation likelihood of arc */
}
LArc;

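Here the start and end nodes are the two ends of the LArc, and the sublat of each node has already been expanded in the previous step. What remains is to handle the connections between them.

The result is that one NetNode is built for every arc in the Lattice. These nodes are of the word-end type.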
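
As a rough sketch of the "one word-end node per arc" rule, the loop below allocates one node for each arc and records which lattice nodes the arc connects. All names (DemoArc, DemoNetNode, DEMO_WORD_END, make_word_end_nodes) are hypothetical and written for this illustration; the real ProcessCrossWordLinks does considerably more work.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type tag; the real values live in NetNodeType. */
typedef enum { DEMO_UNKNOWN = 0, DEMO_WORD_END = 1 } DemoNodeType;

/* Stand-in for LArc: indices of the lattice nodes at both ends. */
typedef struct {
    int start;
    int end;
} DemoArc;

/* Stand-in for NetNode: type plus the arc endpoints it came from. */
typedef struct {
    DemoNodeType type;
    int start_node;
    int end_node;
} DemoNetNode;

/* Create one word-end node per arc.
   Returns a malloc'ed array of narcs nodes (caller frees it),
   or NULL if narcs <= 0 or allocation fails. */
DemoNetNode *make_word_end_nodes(const DemoArc *arcs, int narcs)
{
    if (arcs == NULL || narcs <= 0)
        return NULL;

    DemoNetNode *nodes = calloc((size_t)narcs, sizeof *nodes);
    if (nodes == NULL)
        return NULL;

    for (int i = 0; i < narcs; i++) {
        nodes[i].type = DEMO_WORD_END;
        nodes[i].start_node = arcs[i].start;
        nodes[i].end_node = arcs[i].end;
    }
    return nodes;
}
```

With three arcs in the lattice, this yields exactly three word-end nodes, one per arc.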

/* The network nodes themselves just store connectivity info */
struct _NetNode {
   NetNodeType type;    /* Type of this node (includes context) */
   union {
      HLink  hmm;       /* HMM (physical) definition */
      Pron   pron;      /* Word represented (may == null) */
   }
   info;                /* Extra information specific to type of node */
   char    *tag;        /* Semantic tagging information */
   int nlinks;          /* Number of nodes connected to this one */
   NetLink *links;      /* Array[0..nlinks-1] of links to connected nodes */
   NetInst *inst;       /* Model Instance (if one exists, else NULL) */   
   NetNode *chain;
   int aux;
};

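The info field stores the pronunciation of the start node. The 62 word nodes generated here are all stored in wnHashTab.

The next step is to handle the connections between the nodes of the whole Network, together with its initial and final nodes.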