How the ProcessCrossWordLinks function works in HTK

hjx5200

Category: Speech Recognition. Tags: HTK, network expansion, cross-word links

Article link: https://blog.csdn.net/hjx5200/article/details/117790044

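First, the main purpose of this function.

On the first call, heap != NULL: the function creates the NetNode word nodes and counts their links.

How are the NetNode objects created, and what are they based on? They are based on the elements of the Lattice.

A Lattice is a word lattice: its nodes are words, and its arcs describe which words can follow which.

So the first step is to create a NetNode for each lattice node (more precisely, one for each pronunciation instance attached to the node). These objects are of type n_word and are stored in wnHashTab, a hash table whose buckets hold pointers to NetNode objects.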

/* Process the cross word links, the first time heap!=NULL and */
/*  the links are counted and wordnodes created, the second    */
/*  time heap==NULL and link likelihoods/destinations are set. */
void ProcessCrossWordLinks(MemHeap *heap,Lattice *lat,int xc)
{
   PronHolder *lInst,*rInst;
   NetNode *wordNode;
   LArc *thisLArc;
   int i,lc,rc,type;

   /*  Currently a new word end is created for all logical contexts */
   /*  This is only needed for single phone words for which several */
   /*  models (in different contexts) connect to a single word end. */
   /*  For multi-phone words no models are shared and so a single   */
   /*  word end per distinct physical model would be fine.          */
   for (i=0; i<lat->na; i++) {
      thisLArc = NumbLArc(lat, i);
      for (lInst=(PronHolder*)thisLArc->start->sublat;
           lInst!=NULL;lInst=lInst->next)
         for (rInst=(PronHolder*)thisLArc->end->sublat;
              rInst!=NULL;rInst=rInst->next) {
            if (xc==0) {
               wordNode = FindWordNode(heap,lInst->pron,lInst,n_word);
               if (heap!=NULL)
                  wordNode->tag=SafeCopyString(heap,thisLArc->start->tag); 
               if (heap==NULL) {
                  wordNode->links[wordNode->nlinks].node=rInst->starts;
                  wordNode->links[wordNode->nlinks].like=thisLArc->lmlike;
               }
               wordNode->nlinks++;
            }
         }
   }
}

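The code above iterates over every arc of the Lattice and, for the left (start) node of each arc, creates a NetNode of type n_word. The creation is driven by the node's pronunciation instances, which hang off its sublat field. The pronunciation instances of all lattice nodes were created earlier by InitPronHolders, which was covered in the previous post.

FindWordNode does the actual creation.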

/* Use hash table to lookup word end node */
static NetNode *FindWordNode(MemHeap *heap,Pron pron,
                             PronHolder *pInst,NetNodeType type)
{
   union {
      Ptr ptrs[3];
      unsigned char chars[12];
   }
   un;
   unsigned int hash,i;
   NetNode *node;

   hash=0;
   un.ptrs[0]=pron;un.ptrs[1]=pInst;un.ptrs[2]=(Ptr)type;
   for (i=0;i<12;i++)
      hash=((hash<<8)+un.chars[i])%WNHASHSIZE;

   for (node=wnHashTab[hash];node!=NULL;node=node->chain)
      if (node->info.pron==pron && node->inst==(NetInst*)pInst &&
          node->type==type) break;

   if (node==NULL) {
      nwe++;
      node=(NetNode *) New(heap,sizeof(NetNode));
      node->info.pron=pron;
      node->type=type;
      node->inst=(NetInst*)pInst;
      node->nlinks=0;
      node->links=NULL;
      node->tag=NULL;
      node->aux=0;
      node->chain=wnHashTab[hash];
      wnHashTab[hash]=node;
   }
   return(node);
}

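The function computes a bucket index in wnHashTab from pron, pInst and type, then walks that bucket's chain looking for a node with the same three values. If none is found, it allocates a new NetNode, fills in pron, type and inst, and inserts it at the head of the bucket: the new node's chain takes over the old bucket head, and wnHashTab[hash] then points to the new node.

wnHashTab is therefore a hash table holding the word nodes, and the node built above is a typical entry.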
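The find-or-create pattern described above can be sketched on its own. This is a minimal illustration, not HTK code: `Node`, `table`, `TABLE_SIZE`, `hash_bytes` and `find_or_create` are hypothetical names, and `malloc` stands in for HTK's heap allocator. One deliberate difference is that the sketch hashes `sizeof` bytes of each key field, whereas the original hashes a fixed 12 bytes of the union. On a platform with 8-byte pointers that covers only part of the key, which affects how evenly the buckets fill but not correctness, because the lookup still compares all three fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TABLE_SIZE 13            /* hypothetical; stands in for WNHASHSIZE */

typedef struct Node {
    const void *pron;            /* key part 1 */
    const void *inst;            /* key part 2 */
    int type;                    /* key part 3 */
    int nlinks;
    struct Node *chain;          /* next node in the same bucket */
} Node;

static Node *table[TABLE_SIZE];  /* zero-initialised: all buckets empty */
static int created = 0;          /* number of nodes allocated so far */

/* Fold n bytes into the running hash, byte by byte. */
static unsigned hash_bytes(unsigned h, const void *data, size_t n)
{
    const unsigned char *p = (const unsigned char *)data;
    size_t i;
    for (i = 0; i < n; i++)
        h = ((h << 8) + p[i]) % TABLE_SIZE;
    return h;
}

/* Return the node for (pron, inst, type), creating it if it is missing. */
static Node *find_or_create(const void *pron, const void *inst, int type)
{
    unsigned h = 0;
    Node *node;

    h = hash_bytes(h, &pron, sizeof pron);
    h = hash_bytes(h, &inst, sizeof inst);
    h = hash_bytes(h, &type, sizeof type);

    /* Search the bucket's chain for an exact match on all three fields. */
    for (node = table[h]; node != NULL; node = node->chain)
        if (node->pron == pron && node->inst == inst && node->type == type)
            return node;

    /* Not found: allocate and insert at the head of the bucket. */
    node = (Node *)malloc(sizeof(Node));
    assert(node != NULL);
    node->pron = pron;
    node->inst = inst;
    node->type = type;
    node->nlinks = 0;
    node->chain = table[h];      /* old head becomes the second entry */
    table[h] = node;             /* new node becomes the head */
    created++;
    return node;
}
```

Calling `find_or_create` twice with the same three arguments returns the same pointer, which is what lets the second pass of ProcessCrossWordLinks reach the nodes built by the first pass.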

On the second call to ProcessCrossWordLinks, heap == NULL, so the following branch is taken:

               if (heap==NULL) {
                  wordNode->links[wordNode->nlinks].node=rInst->starts;
                  wordNode->links[wordNode->nlinks].like=thisLArc->lmlike;
               }

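This raises two questions. First, has the storage for links been allocated yet? Second, what does rInst->starts point to?

The code that runs between the two calls to ProcessCrossWordLinks is what settles both questions.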

 /* Build models on basis of contexts seen */
   net->teeWords=FALSE;
   for (i=0; i < lat->nn; i++) {
      thisLNode = lat->lnodes+i;
      thisWord = thisLNode->word;

      for(pInst=(PronHolder*)thisLNode->sublat;
          pInst!=NULL;pInst=pInst->next) {

            p=0;
            q=pInst->nphones-1;
         
         pInst->tee=TRUE;
         /* Make wrd-int cd phones (possibly none!) */
         CreateWIModels(pInst,p,q,net,hci);
         if (hci->xc==0) {
            /* Word internal context only */
            CreateIEModels(thisWord,pInst,p,q,net,hci);
         } 
      }
   }

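This code sets up the links between the nodes inside each word. I have removed the code for special cases, such as the !NULL node, whose nphones is 0.

In our pronunciation dictionary, the sp phone has been appended to the phone sequence of every word.

Now look at the details of CreateWIModels: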

void CreateWIModels(PronHolder *pInst,int p,int q, Network *net,HMMSetCxtInfo *hci)
{
   NetNode *node;
   HLink hmm;
   int j;
   
   for(j=q-1;j>p;j--) {
      hmm=GetHCIModel(hci,FindLContext(hci,pInst,j,0),
                      pInst->phones[j],
                      FindRContext(hci,pInst,j,0));
      if (hmm->transP[1][hmm->numStates]<LSMALL) pInst->tee=FALSE;
      
      nwi++;
      node=NewNode(net->heap,hmm,(pInst->chain==NULL?0:1));
      if (pInst->chain!=NULL) {
         nil++;
         node->links[0].node=pInst->chain;
         node->links[0].like=pInst->fct;
      }
      node->chain=pInst->chain;
      pInst->chain=node;
   }
}

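Inside the for loop, j and p can be read as indices into the phone sequence. Taking the pronunciation s t iy v sp as the example, p points at s and j starts at v. As j moves down, hmm is the model for v, then iy, then t, and NewNode builds a node for each. Every new node is inserted at the head of pInst->chain and linked to the previous head, so after the loop the chain reads t, iy, v.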
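To make the resulting order concrete, here is a small sketch of the same head-insertion loop. It is only an illustration under simplifying assumptions: `PhoneNode`, `build_chain` and `chain_to_string` are hypothetical names, a phone is just a string, and a single `link` pointer stands in for `links[0].node`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct PhoneNode {
    const char *name;
    struct PhoneNode *link;    /* stands in for links[0].node */
    struct PhoneNode *chain;   /* next node in the word-internal chain */
} PhoneNode;

/* Build nodes for phones[p+1 .. q-1], walking j downwards and pushing each
   new node onto the head of the chain, as the loop in CreateWIModels does. */
static PhoneNode *build_chain(const char **phones, int p, int q)
{
    PhoneNode *head = NULL;
    int j;
    for (j = q - 1; j > p; j--) {
        PhoneNode *node = (PhoneNode *)malloc(sizeof(PhoneNode));
        assert(node != NULL);
        node->name = phones[j];
        node->link = head;     /* successor is the previous head (NULL at first) */
        node->chain = head;
        head = node;
    }
    return head;
}

/* Write the chain's phone names, separated by spaces, into buf. */
static void chain_to_string(const PhoneNode *head, char *buf, size_t size)
{
    buf[0] = '\0';
    for (; head != NULL; head = head->chain) {
        strncat(buf, head->name, size - strlen(buf) - 1);
        if (head->chain != NULL)
            strncat(buf, " ", size - strlen(buf) - 1);
    }
}
```

With the phones s, t, iy, v, sp and p = 0, q = 4, `build_chain` visits v, iy, t in that order, yet the finished chain starts at t, because every insertion happens at the head.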

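The remaining two phones, the first and the last, are not handled here. They are left to the next function because they are the ones affected by cross-word expansion. Since we are currently using monophone models, that complication does not arise.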

void CreateIEModels(Word thisWord,PronHolder *pInst,int p,int q,
                    Network *net,HMMSetCxtInfo *hci)
{
   NetNode *node,*wordNode;
   HLink hmm;

      /* End */
      hmm=GetHCIModel(hci,FindLContext(hci,pInst,q,0),
                      pInst->phones[q],0);
      if (hmm->transP[1][hmm->numStates]<LSMALL) pInst->tee=FALSE;

      wordNode = FindWordNode(NULL,pInst->pron,pInst,n_word);
      
      nfi++; nil++;
      node=NewNode(net->heap,hmm,1);
      node->links[0].node=wordNode;
      node->links[0].like=pInst->fct;
      
      pInst->ends=node;
      pInst->nend=1;
      
      /* Start */
      hmm=GetHCIModel(hci,0,pInst->phones[p],
                      FindRContext(hci,pInst,p,0));
      if (hmm->transP[1][hmm->numStates]<LSMALL) pInst->tee=FALSE;
      
      nin++; nil++;
      node=NewNode(net->heap,hmm,1);
      node->links[0].node=(pInst->chain?pInst->chain:pInst->ends);
      node->links[0].like=pInst->fct;
      pInst->starts=node;
      pInst->nstart=1;
      
      /* Chain */
      if (pInst->chain!=NULL) {
         for (node=pInst->chain;node->chain!=NULL;
              node=node->chain);
         node->nlinks=1;
         nil++;
         node->links=(NetLink*) New(net->heap,
                                    sizeof(NetLink));
         node->links[0].node=pInst->ends;
         node->links[0].like=pInst->fct;
      }

}

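The code under the /* End */ comment handles the last phone, sp.

It first fetches the HMM for sp, then uses pInst to build or look up the word node, here STEVE, which becomes wordNode. It then creates a node for the sp HMM with a link count of 1 and points that link at wordNode. Recall that the last phone of every pronunciation must lead to the word node. The original post illustrates this with a figure; the part of the network being updated is exactly what the /* End */ code builds.

Next comes the /* Start */ part.

It creates the node for s, sets pInst->starts to it, and points the node's links[0] at pInst->chain (or directly at pInst->ends when the chain is empty).

From this it follows that tokens are passed in this order: starts --> chain ... --> ends.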
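The three wiring steps (End, Start, Chain) and the resulting traversal order can be checked with a small sketch. The names `SNode`, `wire_word` and `trace_path` are hypothetical, each node has a single successor `next` standing in for `links[0].node`, and the word-internal chain is assumed to be linked already, as CreateWIModels leaves it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct SNode {
    const char *name;
    struct SNode *next;    /* stands in for links[0].node */
    struct SNode *chain;   /* word-internal chain pointer */
} SNode;

/* Mirror the three steps of CreateIEModels for one pronunciation. */
static void wire_word(SNode *start, SNode *chain_head, SNode *end, SNode *word)
{
    SNode *n;
    end->next = word;                              /* End: last phone -> word node */
    start->next = chain_head ? chain_head : end;   /* Start: first phone -> chain or end */
    if (chain_head != NULL) {                      /* Chain: tail of chain -> end */
        for (n = chain_head; n->chain != NULL; n = n->chain)
            ;
        n->next = end;
    }
}

/* Follow successor links from start; write the names into buf and return
   the number of nodes visited. */
static int trace_path(const SNode *start, char *buf, size_t size)
{
    int count = 0;
    buf[0] = '\0';
    for (; start != NULL; start = start->next) {
        if (count > 0)
            strncat(buf, " ", size - strlen(buf) - 1);
        strncat(buf, start->name, size - strlen(buf) - 1);
        count++;
    }
    return count;
}
```

For the example word, tracing from the start node visits s, t, iy, v, sp and finally the word node STEVE. A pronunciation with only a first phone plus sp has an empty chain, so its start node links straight to its end node.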

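At this point it should be clear why ProcessCrossWordLinks is called twice. The first call creates the word nodes and counts their links. The models inside each word are then chained together. Finally the second call completes the word-to-word links, where rInst->starts points to the initial phone model of the pronunciation instance (PronHolder) belonging to the right-hand node of the LArc.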
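The two-pass scheme can be sketched as a count, allocate, fill sequence. This is an illustration under stated assumptions, not HTK's implementation: `WNode`, `Link`, `Arc`, `process_arcs` and `build_links` are hypothetical names, and a link here targets the destination word entry directly rather than the first phone model of the right-hand word. The excerpts above do not show where HTK allocates links and resets nlinks between the two calls, so that middle step is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct WNode WNode;
typedef struct { WNode *node; float like; } Link;
struct WNode { int nlinks; Link *links; };
typedef struct { int start, end; float lmlike; } Arc;

/* One pass over the arcs. fill == 0: only count links per start node.
   fill != 0: also write the link target and likelihood. */
static void process_arcs(WNode *words, const Arc *arcs, int na, int fill)
{
    int i;
    for (i = 0; i < na; i++) {
        WNode *w = &words[arcs[i].start];
        if (fill) {
            w->links[w->nlinks].node = &words[arcs[i].end];
            w->links[w->nlinks].like = arcs[i].lmlike;
        }
        w->nlinks++;
    }
}

/* Count, allocate exactly what was counted, reset the counters, then fill. */
static void build_links(WNode *words, int nn, const Arc *arcs, int na)
{
    int i;
    for (i = 0; i < nn; i++) {
        words[i].nlinks = 0;
        words[i].links = NULL;
    }
    process_arcs(words, arcs, na, 0);          /* first pass: count */
    for (i = 0; i < nn; i++) {
        if (words[i].nlinks > 0) {
            words[i].links = (Link *)malloc(sizeof(Link) * (size_t)words[i].nlinks);
            assert(words[i].links != NULL);
        }
        words[i].nlinks = 0;                   /* reused as the write index */
    }
    process_arcs(words, arcs, na, 1);          /* second pass: fill */
}
```

Reusing nlinks as the write index in the second pass is why the counter has to be back at zero before that pass begins. After the pass it again holds the true link count.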