The Differences and Connections Between the !NULL Node, SENT_END, and SENT_START in HTK

hjx5200

Published 2021-06-10 18:04:27

Category: Speech Recognition. Tags: HTK, virtual node, initial node

Article link: https://blog.csdn.net/hjx5200/article/details/117782758

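SENT_START and SENT_END are real entries in the dictionary (dict). Each has the pronunciation "sil", so each is a single-phone word, and each is stored in the hash table voc->wtab under its own name (for example "SENT_START"). !NULL, by contrast, is a virtual node of the word lattice (Lattice). The vocabulary voc also holds an object for this virtual node, but it has no pronunciation. When the dictionary is initialized, a Word for the LabId of "!NULL" is added to voc->wtab; the DictEntry that this Word points to has its pronunciation pointer and pronunciation count set to NULL and 0 respectively, which marks it as silent.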
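To make the contrast concrete, here is a minimal sketch in plain C. The types SketchPron and SketchWord and the helper sketch_is_silent are hypothetical stand-ins invented for this illustration; they are not HTK's Pron, DictEntry, or any HTK function, and they only mirror the fields discussed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for HTK's Pron and DictEntry. */
typedef struct SketchPron {
    int nphones;               /* number of phones in this pronunciation */
    const char **phones;       /* phone names, NULL when nphones == 0 */
    struct SketchPron *next;   /* next pronunciation of the same word */
} SketchPron;

typedef struct SketchWord {
    const char *name;          /* word identifier */
    SketchPron *pron;          /* first pronunciation, NULL if none */
    int nprons;                /* number of pronunciations */
} SketchWord;

/* A word is silent when none of its pronunciations contains a phone. */
static int sketch_is_silent(const SketchWord *w)
{
    const SketchPron *p;
    assert(w != NULL);
    for (p = w->pron; p != NULL; p = p->next)
        if (p->nphones != 0)
            return 0;
    return 1;
}

static const char *sil_phones[] = { "sil" };
static SketchPron sil_pron = { 1, sil_phones, NULL };

/* SENT_START modelled as a dictionary word pronounced "sil". */
static SketchWord sent_start = { "SENT_START", &sil_pron, 1 };

/* !NULL modelled as it looks right after initialization: no pronunciation. */
static SketchWord null_word = { "!NULL", NULL, 0 };
```

With these definitions, sketch_is_silent(&null_word) is true and sketch_is_silent(&sent_start) is false, which is the whole difference between the two kinds of node at the dictionary level.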

typedef struct _DictEntry{
   LabId wordName;  /* word identifier */
   Pron pron;       /* first pronunciation */
   int nprons;      /* number of prons for this word */
   Word next;       /* next word in hash table chain */
   void *aux;       /* hook used by HTK library modules for temp info */
} DictEntry;

A dictionary is a collection of these structures; each one carries a wordName, a pron, and an integer nprons.

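When the Lattice is built, every node in the network description becomes an LNode, which is built on a DictEntry from the dictionary. An LNode holds word, a pointer to a DictEntry, and sublat, which points to the concrete phone-level sub-network expanded from the word, its pronunciations, and voc. This sub-network is what the recognition network is built from.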

What sublat points to is the following structure:

typedef struct pronholder
{
   LNode *ln;       /* Node that created this instance */
   Pron pron;       /* Actual pronunciation */
   short nphones;   /* Number of phones for this instance */
   LabId *phones;   /* Phone sequence for the instance */
   LogFloat fct;    /* LM likelihood to be factored into each phone */

   int nstart;      /* Number of models in starts chain */
   int nend;        /* Number of models in ends chain */
   NetNode *starts; /* Chain of initial models */
   NetNode *ends;   /* Chain of final models */
   NetNode *chain;  /* Chain of other nodes in word */
   struct pronholder *next;
}
PronHolder;

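It is the cornerstone of the Network. A PronHolder holds the starts, ends, and chain pointers. The NetNode objects on these chains come in several types, such as word nodes and HMM nodes.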

static int InitPronHolders(Network *net,Lattice *lat,HMMSetCxtInfo *hci,
                           Vocab *voc,MemHeap *heap,char *frcSil)
{
   PronHolder *pInst;
   NetNode *wordNode;
   Pron thisPron;
   Word thisWord;
   LNode *thisLNode;
   PInstInfo *pii;
   LabId silPhones[MAXPHONES],addPhones[MAXPHONES],labid;
   int i,j,k,l,n,t,lc,type,nNull,npii,nSil,nAdd;
   char *ptr,*p,*nxt,name[MAXSTRLEN],st;

   /* Reset hash table prior to processing lattice */
   for (i=0; i<WNHASHSIZE; i++)
      wnHashTab[i]=NULL;

   /* Determine if we have a real !NULL word */
   net->nullWord = GetWord(voc,GetLabId("!NULL", TRUE),TRUE);
   for (thisPron=net->nullWord->pron;thisPron!=NULL;thisPron=thisPron->next)
      if (thisPron->nphones!=0) {
         net->nullWord=NULL;
         break;
      }
   if (net->nullWord!=NULL) {
      if (net->nullWord->pron==NULL)
         NewPron(voc,net->nullWord,0,NULL,net->nullWord->wordName,1.0);
   }

   nSil=nAdd=0;

   /* Create instance for each pronunciation in lattice */
   for (i=0,nNull=0,t=0; i < lat->nn; i++) {
      thisLNode = lat->lnodes+i;
      thisWord = thisLNode->word;
      if (thisWord==NULL) thisWord=voc->nullWord;
      if (thisWord==voc->subLatWord)
         HError(8220,"InitPronHolders: Expand lattice before making network");
      thisLNode->sublat=NULL;
      if (thisWord->nprons<=0)
         HError(8220,"InitPronHolders: Word %s not defined in dictionary",
                thisWord->wordName->name);

      pii=(PInstInfo *) New(&gstack,(thisWord->nprons+1)*(nAdd+1)*sizeof(PInstInfo));
      pii--;
      /* Scan current pronunciations and make modified ones */
      for (j=1,thisPron=thisWord->pron,npii=0; thisPron!=NULL;
           j++,thisPron=thisPron->next) {
         n=0;   /* nSil==0 here, so there is no trailing silence phone to strip */

         if (thisPron->nphones==0 || nAdd==0 || n==0) {
            /* Just need one pronunciation */
            if (thisPron->nphones==0) {
               if (thisWord!=net->nullWord && (trace&T_CXT)) 
                  printf("InitPronHolders: Word %s has !NULL pronunciation\n",
                         thisWord->wordName->name);
               nNull++;
            }
            if (n==0) n=thisPron->nphones;
            pii[++npii].pron=thisPron; pii[npii].silId=-1;
            pii[npii].n=n;pii[npii].t=n;
            pii[npii].phones=thisPron->phones;
         }

      }

      /* Now make the PronHolders */
      for (j=1; j<=npii; j++) {
         /* Don't add duplicates */
         if (pii[j].pron==NULL) continue;
         /* Build inst for each pron */
         pInst=NewPronHolder(heap,hci,pii[j].pron,pii[j].t,pii[j].phones);
         pInst->ln = thisLNode;
         pInst->next = (PronHolder*)thisLNode->sublat;
         thisLNode->sublat = (SubLatDef*) pInst;
         if (pInst->nphones<=0) pInst->fct = 0.0;
         else pInst->fct = thisLNode->score/pInst->nphones;

         /* Fake connections from SENT_[START/END] */

         if (thisLNode->foll==NULL) {
            wordNode = FindWordNode(net->heap,pInst->pron,pInst,n_word);
            wordNode->tag=SafeCopyString(net->heap,thisLNode->tag);
            wordNode->nlinks = 0;
         }
      }
      Dispose(&gstack,++pii);
   }
   
   return(nNull);
}

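I removed a lot of code that is not needed to understand the principle. Let's walk through what remains and see how this function initializes the PronHolder objects.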

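First, net->nullWord is set to the Word for "!NULL", the same object as voc->nullWord. As analyzed above, its pron is NULL after initialization, so execution reaches the NewPron call. This gives it a valid pron whose nphones is 0 and whose phones is NULL. (Had the dictionary given !NULL a pronunciation with phones, the loop before that call would have set net->nullWord to NULL instead.) This information is used further down.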
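The check described here can be sketched in isolation. The names SketchPron, SketchWord, sketch_add_empty_pron, and sketch_resolve_null_word are hypothetical and exist only for this illustration; sketch_add_empty_pron loosely plays the role of the NewPron call but does not reproduce its signature or memory handling.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for HTK's Pron and Word. */
typedef struct SketchPron {
    int nphones;               /* number of phones in this pronunciation */
    struct SketchPron *next;   /* next pronunciation of the same word */
} SketchPron;

typedef struct SketchWord {
    SketchPron *pron;          /* first pronunciation, NULL if none */
    int nprons;                /* number of pronunciations */
} SketchWord;

/* Give the word one empty (zero-phone) pronunciation. */
static void sketch_add_empty_pron(SketchWord *w)
{
    SketchPron *p = malloc(sizeof *p);
    assert(p != NULL);
    p->nphones = 0;
    p->next = w->pron;
    w->pron = p;
    w->nprons++;
}

/* Return w if it can act as the network's null word, otherwise NULL. */
static SketchWord *sketch_resolve_null_word(SketchWord *w)
{
    SketchPron *p;
    assert(w != NULL);
    for (p = w->pron; p != NULL; p = p->next)
        if (p->nphones != 0)
            return NULL;           /* !NULL was given real phones */
    if (w->pron == NULL)
        sketch_add_empty_pron(w);  /* ensure nprons >= 1 from here on */
    return w;
}
```

The empty pronunciation matters because the loop over the lattice nodes rejects any word whose nprons is 0 or less with the "not defined in dictionary" error; after this step a !NULL node passes that check while still contributing no phones.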

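Next, nSil and nAdd are set to 0. I am not yet sure what they are for; judging from the silPhones and addPhones arrays declared alongside them, they presumably count silence phones to strip from or append to pronunciations, which this trimmed version never does.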

After that, the function loops over the LNodes of the Lattice and creates the pronunciation instances, the PronHolders.

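This itself takes two steps. The pronunciation information is first saved in a PInstInfo list; a node (which corresponds to one Word) may have several pronunciations, so this is done in a loop.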

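Step 1: [If a pronunciation has nphones equal to 0, nNull is incremented; it counts the zero-phone pronunciations seen, which is how !NULL nodes are tallied.] The pronunciation information, such as the phone count, the phone sequence, and the pron object, is saved for use in the next step.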

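Step 2: NewPronHolder is called with the pron object, the phone count, and the phone sequence. The new pInst is then tied to the Lattice node that produced it: pInst is pushed onto the LNode's sublat chain, and pInst->ln points back to that node.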
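The two steps can be mimicked with a small stand-alone sketch. Everything here (SketchPron, SketchHolder, SketchNode, sketch_expand_node, and the fixed buffer size) is hypothetical and chosen for illustration; the real function uses PInstInfo records on HTK's own memory heaps rather than malloc.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_MAX_PRONS 16        /* arbitrary limit for this sketch */

typedef struct SketchPron {
    int nphones;                   /* number of phones in this pronunciation */
    struct SketchPron *next;       /* next pronunciation of the same word */
} SketchPron;

typedef struct SketchHolder {
    const SketchPron *pron;        /* pronunciation this instance came from */
    struct SketchHolder *next;     /* next instance for the same node */
} SketchHolder;

typedef struct SketchNode {
    const SketchPron *prons;       /* pronunciations of the node's word */
    SketchHolder *sublat;          /* head of the instance chain */
} SketchNode;

/* Returns the number of zero-phone pronunciations, or -1 if malloc fails. */
static int sketch_expand_node(SketchNode *node)
{
    const SketchPron *collected[SKETCH_MAX_PRONS];
    const SketchPron *p;
    int n = 0, n_null = 0, j;

    assert(node != NULL);
    /* Step 1: record every pronunciation and count the empty ones. */
    for (p = node->prons; p != NULL && n < SKETCH_MAX_PRONS; p = p->next) {
        if (p->nphones == 0)
            n_null++;
        collected[n++] = p;
    }
    /* Step 2: build one instance per record and push it on the chain. */
    node->sublat = NULL;
    for (j = 0; j < n; j++) {
        SketchHolder *h = malloc(sizeof *h);
        if (h == NULL)
            return -1;
        h->pron = collected[j];
        h->next = node->sublat;    /* new instance becomes the head */
        node->sublat = h;
    }
    return n_null;
}
```

Because each new instance is pushed at the head, the chain ends up in the reverse of the dictionary order. The same holds for the pInst->next and thisLNode->sublat assignments in the listing above.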

At this point every LNode in the Lattice has received its initial expansion, and the special nodes, SENT_START/END and !NULL, have been covered as well.

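However, so far we only have the pronunciation information inside each word (each word has zero, one, or several phones) and the skeleton of the network. The links between words, and between the starts, chain, and ends nodes inside a word, have not been built yet.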