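Variables play an important role in ChatScript: much of the important user information and system information is held temporarily in variables. The variables discussed here are script variables, not variables in the underlying engine code. As the previous post explained, ChatScript consists roughly of two parts, the underlying engine and the scripts; this post covers the variables used in scripts.
ChatScript has five kinds of variables:
1. User variables, which start with $ or $$. A $ variable behaves like a global: it persists across topics and across volleys. A $$ variable behaves like a local: it exists only within a single volley.
2. Match variables (also called wildcard variables), which start with "_" and hold the text matched by a wildcard or pattern element.
3. Fact sets, which start with @.
4. Function variables, which start with ^ and are used when calling a function in a script.
5. System variables, which start with %.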
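The prefix rules in the list above can be summarized in a few lines of Python. This is only an illustrative model of the naming convention, not engine code; the function name classify_variable and the returned labels are made up for this sketch.

```python
def classify_variable(name: str) -> str:
    """Return the kind of ChatScript variable implied by the prefix of a name."""
    # "$$" must be tested before "$", because every "$$" name also starts with "$".
    if name.startswith("$$"):
        return "user variable (local, one volley)"
    if name.startswith("$"):
        return "user variable (global)"
    if name.startswith("_"):
        return "match variable"
    if name.startswith("@"):
        return "fact set"
    if name.startswith("^"):
        return "function variable"
    if name.startswith("%"):
        return "system variable"
    return "not a variable"


for example in ["$food", "$$tmp", "_0", "@1", "^query", "%input"]:
    print(example, "->", classify_variable(example))
```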
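1. Match variables
At the natural language processing level, a dialogue system has three main parts: natural language understanding (NLU), dialogue management (DM), and natural language generation (NLG). NLU processes the user's text, and some of the words in that text are often needed later by DM and NLG. Those words usually sit at a specific position or belong to a particular type; they are matched with wildcards and patterns, and stored once matched.
For example, in a rule whose pattern places _~meat after the word eat, _~meat matches a word or phrase belonging to the concept ~meat that follows eat. Replacing _~meat with _* matches everything after eat. _0 reads back the matched word or phrase; the 0 means the first match variable. The engine sets the highest match variable number to 20, so match variables run from _0 to _20. When the volley ends, match variables are cleared automatically. (As I understand it, a volley is one round of dialogue: the user says one or more sentences and the bot replies with one or more sentences.)
Once a match variable is matched, the system stores its original form, its canonical form, and its position in the text. To illustrate original versus canonical: the four words (小朋友, 小孩, 小娃娃, 孩子) have similar meanings, so the system unifies them to the single word 小朋友. When the bot replies it uses the canonical form; to reply with the original form, put a ' in front, as in '_0.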
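The three pieces of information stored for a match variable can be modeled as a small record. The following Python sketch is an illustration only: the canonical table reuses the four words from the example above, while the names MatchVariable, capture, and render, and the 1-based position, are assumptions of this sketch rather than anything taken from the engine.

```python
from dataclasses import dataclass

# Illustrative canonical-form table built from the four words in the text.
CANONICAL = {"小朋友": "小朋友", "小孩": "小朋友", "小娃娃": "小朋友", "孩子": "小朋友"}


@dataclass
class MatchVariable:
    original: str   # the word exactly as the user typed it
    canonical: str  # the unified (canonical) form
    position: int   # word position in the sentence (1-based in this sketch)


def capture(words, index):
    """Store the word at words[index] the way a match variable would."""
    word = words[index]
    return MatchVariable(word, CANONICAL.get(word, word), index + 1)


def render(variable, keep_original=False):
    """Model _0 (canonical form) versus '_0 (original form) in output."""
    return variable.original if keep_original else variable.canonical


match0 = capture(["我", "喜欢", "小孩"], 2)
print(render(match0))                      # canonical form, like _0
print(render(match0, keep_original=True))  # original form, like '_0
```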
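In a rule whose pattern contains, in order, _~fruit, [_~animal _bear], and _~like, _0 holds the match for _~fruit, _1 holds the match for [_~animal _bear] (where [] means choose one of the alternatives), and _2 holds the match for _~like.
When a match variable has matched nothing, or has been explicitly set to null, using it raises no error but also produces no output.
To match a number, you cannot write (_1) in a pattern, because _1 denotes match variable 1; write _~number _0=1 instead.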
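2. User variables
As noted above, match variables live only within one volley; user variables let content be kept for longer. There are two kinds, global and local. Global variables start with $ and persist until they are cleared; local variables start with $$ and, like match variables, last for only one volley.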
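The different lifetimes can be pictured with a small Python model. It is a sketch of the behavior described in this post (globals survive, $$ locals and match variables are dropped at the end of a volley, assigning null clears a variable, and an unset variable produces no output); the class VariableStore and its methods are hypothetical names, not part of ChatScript.

```python
class VariableStore:
    """Toy model of variable lifetimes across volleys."""

    def __init__(self):
        self.values = {}

    def set(self, name, value):
        # Assigning None models assigning null, which clears the variable.
        if value is None:
            self.values.pop(name, None)
        else:
            self.values[name] = value

    def get(self, name):
        # An unset variable is not an error; it simply yields no output.
        return self.values.get(name, "")

    def end_volley(self):
        # Drop $$ locals and match variables; keep $ globals.
        for name in list(self.values):
            if name.startswith("$$") or name.startswith("_"):
                del self.values[name]


store = VariableStore()
store.set("_0", "beef")
store.set("$$tmp", "scratch")
store.set("$food", store.get("_0"))  # save the match into a global
store.end_volley()
print(repr(store.get("_0")), repr(store.get("$$tmp")), repr(store.get("$food")))
```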
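The most common use of a user variable is to save the value of a match variable, for example by assigning _0 to a user variable.
The assignment operator for user variables is =. There must be at least one space between = and both the variable and the value; otherwise the value often fails to be read, as I have confirmed by testing.
User variables support simple arithmetic through operators such as +=, -=, *=, /=, %=, |=, &=, ^=, and |^=. An assignment can also compute its value from other variables, but parentheses cannot be used to control the order of evaluation.
Fact sets can also be used with these operators, although there the operations behave more like set operations.
The order of evaluation inside CS differs from that of C: in an assignment that combines -= with a later *, the -= is applied first and the * afterwards.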
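To make the difference from C concrete, here is a small Python sketch that evaluates an assignment strictly left to right, which is how the text above describes CS. The function evaluate_assignment and its token-list input are inventions of this sketch, and it covers only +, -, and *; it models the described order and is not the engine's implementation.

```python
import operator

OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate_assignment(current, tokens):
    """Evaluate tokens such as ['-=', 2, '*', 3] strictly left to right.

    The compound assignment is applied to the current value first, and every
    later operator is then applied to the running result, ignoring the usual
    precedence rules.
    """
    assign_op, first_operand, *rest = tokens
    if len(assign_op) != 2 or assign_op[1] != "=" or assign_op[0] not in OPERATORS:
        raise ValueError(f"unsupported assignment operator: {assign_op}")
    if len(rest) % 2 != 0:
        raise ValueError("each operator needs an operand")
    result = OPERATORS[assign_op[0]](current, first_operand)
    for op, operand in zip(rest[0::2], rest[1::2]):
        result = OPERATORS[op](result, operand)
    return result


value = 10
left_to_right = evaluate_assignment(value, ["-=", 2, "*", 3])  # (10 - 2) * 3
c_style = value - (2 * 3)                                      # 10 - 6
print(left_to_right, c_style)
```

With a starting value of 10, the left-to-right reading gives 24, while C precedence would give 4.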
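User variables can also appear in patterns. In a pattern that tests $gender=male and then matches the words I like boys, = is equivalent to == in C: it is a comparison, not an assignment. Such a pattern means that if $gender has already been set to male and the user says "I like boys", the bot responds "Oh, dear". The test $gender=male and the words I like boys are combined as if by &&.
Match variables and user variables can also be used with logical operators.
A match variable is not limited to storing matched text; it can also be assigned another value. If it is assigned a value that did not come from a match, it stores only an original form, with no canonical form or position. If it is assigned another match variable, its canonical form and position are taken from that variable.
In the engine, the module dedicated to match variables and user variables is variableSystem.
To clear a variable, assign null to it.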
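3. System variables
System variables start with % and are mainly used to read and store certain system values and states. They fall into five groups:
(1). System date and time
(2). User input
(3). Bot output
(4). (Bot) system variables
(5). Build data
The user input variables are: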
variable | description |
%bot | current bot responding |
%revisedinput | Boolean is current input from ^input not direct from user |
%command | Boolean was the user input a command |
%foreign | Boolean is bulk of the sentence composed of foreign words |
%impliedyou | Boolean was the user input having you as implied subject |
%input | the count of the number of volleys this user has made ever |
%ip | ip address supplied |
%language | current dictionary language |
%length | the length in tokens of the current sentence |
%more | Boolean is there another sentence after this |
%morequestion | Boolean is there a ? or question word in the pending sentences |
%originalinput | all sentences user passed into volley, before adjusted in any way except OOB data is stripped off |
%originalsentence | the current sentence after tokenization but before any adjustments |
%parsed | Boolean was current input parsed successfully |
%question | Boolean was the user input a question – same as ? in a pattern |
%quotation | Boolean is current input a quotation |
%sentence | Boolean does it seem like a sentence (subject/verb or command) |
%tense | past , present, or future simple tense (present perfect is a past tense) |
%user | user login name supplied |
%userfirstline | value of %input that is at the start of this conversation start |
%userinput | Boolean is the current input from the user (vs the chatbot) |
%voice | active or passive on current input |
Bot output:
variable | description |
%inputrejoinder | rule tag of any pending rejoinder for input or 0 if none |
%lastoutput | the text of the last generated response for the current volley |
%lastquestion | Boolean did last output end in a ? |
%outputrejoinder | rule tag if system set a rejoinder for its current output or 0 |
%response | number of committed responses that have been generated for this sentence (see Advanced User- Advanced Output: Committed Responses) |
System variables:
variable | description |
%all | Boolean is the :all flag on? (:all to set) |
%document | Boolean is :document running |
%fact | Numeric value most recent fact id |
%freetext | kb of available text space |
%freedict | number of unused dictionary words |
%freefact | number of unused facts |
%maxmatchvariables | highest number of match variables, currently 20 |
%maxfactsets | highest number of @factsets, currently 20 |
%host | name of the current host machine or "local" |
%regression | Boolean is the regression flag on |
%server | Boolean is the system running in server mode |
%rule | get a tag to the current executing rule. Can be used in place of a label |
%topic | name of the current "real" topic . if control is currently in a topic or called from a topic which is not system or nostay, then that is the topic. Otherwise the most recent pending topic is found |
%actualtopic | literally the current topic being processed (system or not) |
%trace | Numeric value of the trace flag (:trace to set) |
%httpresponse | return code of most recent ^jsonopen call |
%pid | Linux process id or 0 for other systems |
%restart | You can set and retrieve a value here across a system restart. |
%timeout | Boolean tells if a timeout has happened, based on the timelimit command line parameter |
Build data
variable | description |
%dict | date/time the dictionary was built |
%engine | date/time the engine was compiled |
%os | os involved (linux windows mac ios) |
%script | date/time build1 was compiled |
%version | engine version number |
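4. Fact sets
A fact is a triple, similar to the triples in a knowledge graph, consisting of a subject, a verb, and an object. For example, (I own dog) is a triple with subject I, verb own, and object dog.
Words, numbers, and other facts can all serve as field values of a fact.
The operations available on facts include ^createfact(), ^find(), and ^query(). A fact set starts with @ and is used to store the results of ^query().
The form of a query is:
^query(kind subject verb object count fromset toset propagate match),
where kind can be one of the following: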
query flag | description |
direct_s | find all facts with the given subject |
direct_v | find all facts with the given verb |
direct_o | find all facts with the given object |
direct_sv | find all facts with the given subject and verb |
direct_so | find all facts with the given subject and object |
direct_vo | find all facts with the given object and verb |
direct_svo | find all facts given all fields (prove that this fact exists) |
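Subject, verb, and object need not all be given; a field that is given must match.
count is the number of results to return; fromset names the fact set that holds the initial facts; toset names the fact set that receives the results. The remaining two arguments are rarely needed. Omitted arguments default to ?, with count defaulting to -1 (no limit) and toset defaulting to @0.
The short form is the one generally used: ^query( kind subject verb object )
The results of ^query are stored in a fact set; fact sets are named @0, @1, and so on. A fact set is a collection of facts, and it can also be read as a collection of one of the s, v, o fields (depending on kind).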
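The direct_* query kinds can be modeled in Python as a filter over a list of triples. This is a sketch of the matching rule only (each letter after direct_ names a field that must equal the given value); the Fact type, the query function, the sample facts, and the result order are assumptions of this sketch and say nothing about how the engine stores or orders facts.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    subject: str
    verb: str
    obj: str


FIELD_INDEX = {"s": 0, "v": 1, "o": 2}


def query(facts, kind, subject="?", verb="?", obj="?", count=-1):
    """Return the facts whose fields named by kind equal the given values."""
    prefix = "direct_"
    fields = kind[len(prefix):]
    if not kind.startswith(prefix) or not fields or any(f not in FIELD_INDEX for f in fields):
        raise ValueError(f"unsupported query kind: {kind}")
    given = (subject, verb, obj)
    matched = [
        fact for fact in facts
        if all(fact[FIELD_INDEX[f]] == given[FIELD_INDEX[f]] for f in fields)
    ]
    # count == -1 means no limit, mirroring the default described above.
    return matched if count < 0 else matched[:count]


facts = [Fact("I", "own", "dog"), Fact("I", "own", "cat"), Fact("you", "own", "dog")]
print(query(facts, "direct_sv", "I", "own"))          # everything I own
print(query(facts, "direct_svo", "I", "own", "dog"))  # does this fact exist?
print(query(facts, "direct_o", obj="dog", count=1))   # at most one result
```

A non-empty result for direct_svo plays the role of a successful existence check; an empty list corresponds to a failed query.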
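Using a query in a rule:
When a rule's pattern queries for the fact (I own dog), the bot searches all facts for it. If the fact is found, the rule matches and outputs "yes"; otherwise the match fails and other rules are tried.
A fact set can also be assigned, and then used after the assignment.
When a fact set is used, a suffix on the set name selects what is read: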
fields | description |
@1subject | means use the subject field |
@1verb | means use the verb field |
@1object | means use the object field |
@1fact | means keep the fact intact (a reference to the fact) – required if assigning to another set. |
@1+ | means spread the subject,verb,object onto successive match variables – only valid with match variables |
@1- | means spread the object,verb,subject onto successive match variables– only valid with match variables |
@1all | means the same as @1+, spread subject,verb,object,flags onto match variables. |
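The suffixes in the table can be read as selectors applied to one fact. The Python sketch below models that reading; the Fact type with a flags field, the read_fact function, and the use of plain lists to stand in for successive match variables are all assumptions made for illustration.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    subject: str
    verb: str
    obj: str
    flags: int = 0


def read_fact(fact, selector):
    """Model what a suffix such as subject, fact, + or - selects from a fact."""
    if selector == "subject":
        return fact.subject
    if selector == "verb":
        return fact.verb
    if selector == "object":
        return fact.obj
    if selector == "fact":
        return fact  # the whole fact, kept intact
    if selector == "+":
        return [fact.subject, fact.verb, fact.obj]  # spread onto _0, _1, _2
    if selector == "-":
        return [fact.obj, fact.verb, fact.subject]  # reverse order
    if selector == "all":
        return [fact.subject, fact.verb, fact.obj, fact.flags]
    raise ValueError(f"unknown selector: {selector}")


fact = Fact("I", "own", "dog")
for selector in ["subject", "verb", "object", "+", "-", "all"]:
    print(selector, "->", read_fact(fact, selector))
```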
The functions that operate on fact sets include:
function | description |
^first(factset) | returns the first fact |
^last(factset) | returns the last fact |
^pick(factset) | returns a random fact |
^sort(factset{more fact-sets} ) | sorts |
^delete( factset ) | deletes |
^length( fact-set ) | returns the number of facts |
^nth(factset count) | retrieves the count-th fact |
5. Function variables
The functions provided by the CS system fall mainly into the following groups:
Topic functions |
Marking functions |
Input functions |
Number functions |
Output functions |
Control flow functions |
External access functions |
JSON functions |
Word manipulation functions |
Multipurpose functions |
Facts functions |
For the functions in each group, see https://github.com/bwilcox-1234/ChatScript/blob/master/WIKI/ChatScript-System-Functions-Manual.md