ChatScript 5 Advanced User Manual -- 2 advanced tokenization

CopperDong

Article link: https://blog.csdn.net/QFire/article/details/79083922

5.2 advanced tokenization

The CS (ChatScript) natural language workflow takes the user's input text, splits it into tokens, and stops each time it reaches a perceived sentence boundary.

The $cs_token variable gives you some control over how this works.

Sentence punctuation has exceptions, such as the period in a floating-point number, in an abbreviation, or in a web address.

So if you need a token to keep embedded punctuation, you can list it in the LIVEDATA/ENGLISH/SUBSTITUTES/abbreviations.txt file and the tokenizer will respect it.
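
To show why an abbreviations list matters, here is a toy Python model of sentence splitting. It is not ChatScript's tokenizer (which is written in C++ and tuned through $cs_token); the ABBREVIATIONS set is a hypothetical stand-in for entries read from abbreviations.txt.

```python
# Hypothetical stand-in for entries in
# LIVEDATA/ENGLISH/SUBSTITUTES/abbreviations.txt
ABBREVIATIONS = {"Dr.", "Mr.", "e.g."}

SENTENCE_END = ".!?"


def split_sentences(text: str) -> list[list[str]]:
    """Split text into sentences, each a list of whitespace-separated tokens.

    A token ending in '.', '!' or '?' closes the sentence unless it is a
    listed abbreviation. Periods inside a token (3.14, www.example.com)
    never end a sentence because they are not at the token's end.
    """
    sentences: list[list[str]] = []
    current: list[str] = []
    for token in text.split():
        if token in ABBREVIATIONS:
            current.append(token)  # keep the embedded period as part of the token
            continue
        if token[-1] in SENTENCE_END:
            if len(token) > 1:
                current.append(token[:-1])  # the word without its final mark
            current.append(token[-1])       # the mark as its own token
            sentences.append(current)
            current = []
        else:
            current.append(token)
    if current:
        sentences.append(current)  # trailing text with no final punctuation
    return sentences
```

Without the "Dr." entry, "I met Dr. Smith." would be split after "Dr", which is exactly the mistake the abbreviations file prevents.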

5.3 advanced concepts

Concepts can have part-of-speech information attached to them (using values from SRC/dictionarySystem.h). For example:

concept: ~mynouns NOUN NOUN_SINGULAR ( boxdead foxtrot)
concept: ~myadjectives ADJECTIVE ADJECTIVE_BASIC (moony dizcious)

You can combine part-of-speech declarations with ignorespelling.
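
One way to picture a part-of-speech declaration is that the concept's flags get attached to each member word, which matters for made-up words such as boxdead that a dictionary would not otherwise know. The sketch below models that idea in Python; the Concept class and the lexicon dictionary are hypothetical and do not reflect ChatScript's internal data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Toy model of an enumerated concept with optional POS flags."""
    name: str
    pos_flags: set[str]
    members: set[str] = field(default_factory=set)


def register(concept: Concept, lexicon: dict[str, set[str]]) -> None:
    """Attach the concept's POS flags to every member word in the lexicon."""
    for word in concept.members:
        lexicon.setdefault(word, set()).update(concept.pos_flags)


lexicon: dict[str, set[str]] = {}
# Mirrors: concept: ~mynouns NOUN NOUN_SINGULAR ( boxdead foxtrot)
register(Concept("~mynouns", {"NOUN", "NOUN_SINGULAR"}, {"boxdead", "foxtrot"}), lexicon)
# Mirrors: concept: ~myadjectives ADJECTIVE ADJECTIVE_BASIC (moony dizcious)
register(Concept("~myadjectives", {"ADJECTIVE", "ADJECTIVE_BASIC"}, {"moony", "dizcious"}), lexicon)
```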

Note: the system has two kinds of concepts.

Enumerated concepts are formed from an explicit list of members; whatever appears in a concept: ~xxx() definition is one.
There are also internal concepts, which the system marks itself. These include a word's part of speech (which requires the POS tagger to decide, from the input, which of possibly several parts of speech the word has), grammatical roles, words from infinite sets such as ~number, ~placenumber and ~weburl, and so forth.

All internal concepts are members of the concept ~internal_concepts.
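
The difference between the two kinds can be sketched in Python: an enumerated concept is a fixed set, while an internal concept is decided by code, because no list could hold every number or URL. The predicates below are rough, hypothetical approximations of ~number and ~weburl, not ChatScript's actual rules.

```python
import re

# Enumerated concept: an explicit member list, like concept: ~xxx()
ENUMERATED = {"~mynouns": {"boxdead", "foxtrot"}}

# Internal concepts: membership is computed, not listed.
# These predicates are hypothetical approximations.
INTERNAL = {
    "~number": lambda w: re.fullmatch(r"-?\d+(\.\d+)?", w) is not None,
    "~weburl": lambda w: w.startswith(("http://", "https://", "www.")),
}

# Every internal concept is a member of ~internal_concepts.
INTERNAL_CONCEPTS = set(INTERNAL)


def concepts_of(word: str) -> set[str]:
    """Return every concept (enumerated or internal) that the word belongs to."""
    found = {name for name, members in ENUMERATED.items() if word in members}
    found |= {name for name, test in INTERNAL.items() if test(word)}
    return found
```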

5.4 advanced topics

Topic Execution : When a topic is executing rules, it does not stop just because a rule matches. It will keep executing rules until some rule generates output for the user or something issues an appropriate ^end or ^fail call.
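
A minimal Python sketch of this loop, assuming each rule is a function that returns an output (or None) plus an optional control signal. Encoding ^end and ^fail as the strings "end" and "fail" is a hypothetical simplification; the point is that a rule matching without producing output does not stop the topic.

```python
from typing import Callable, Optional

# A rule returns (output, control); control is None, "end" or "fail".
# This encoding of ^end / ^fail is a hypothetical simplification.
Rule = Callable[[str], tuple[Optional[str], Optional[str]]]


def run_topic(rules: list[Rule], user_input: str) -> Optional[str]:
    """Try rules in order until one produces output or signals end/fail."""
    for rule in rules:
        output, control = rule(user_input)
        if output is not None:
            return output   # user output ends the topic's turn
        if control in ("end", "fail"):
            return None     # an explicit ^end / ^fail also stops it
        # otherwise the rule may have matched and done work silently: keep going
    return None
```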

Topic Control Flags : these control a topic's overall behavior, e.g.:

topic: ~rust keep random [ rust iron oxide ]

Rules that erase and repeat :

Normally a rule that successfully generates output erases itself directly so it won't run again. Both gambits and responders do this.

Gambits will erase themselves even if they don't generate output. They are intended to tell a story or progress some action, and so do their thing and then disappear automatically.

Rejoinders don't erase individually; they disappear when the rule that controls them disappears. A rule that is marked keep will not erase itself. Nor will responders in a topic marked keep (but gambits still will).

Responders that generate output erase themselves. Responders that cause others to generate output will not normally erase themselves ( unless... ):

For example, with u: ( * ) ^respond(~reactor), the rule in the ~reactor topic that actually generates the output will erase itself. But if that rule is marked keep, then, since someone has to pay the price for the output, this calling rule erases itself instead.

Repeat does not stop a rule from firing; it merely suppresses its output.
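
The erase rules above can be condensed into a small Python model. RuleState and its field names are hypothetical; the logic follows the text: gambits erase even without output, responders erase only when they produce output, rejoinders never erase on their own, and keep (on the rule, or on the topic for responders) prevents erasure. The ^respond case is modeled separately.

```python
from dataclasses import dataclass


@dataclass
class RuleState:
    """Toy model of one rule; field names are hypothetical."""
    kind: str                 # "gambit", "responder" or "rejoinder"
    keep: bool = False        # rule itself marked keep
    topic_keep: bool = False  # enclosing topic marked keep
    erased: bool = False


def erasable(rule: RuleState) -> bool:
    """Can this rule ever erase itself?"""
    if rule.kind == "rejoinder" or rule.keep:
        return False  # rejoinders go with their parent; keep rules persist
    if rule.kind == "responder" and rule.topic_keep:
        return False  # a keep topic protects its responders...
    return True       # ...but not its gambits


def after_fire(rule: RuleState, produced_output: bool) -> None:
    """Gambits erase even without output; other rules need output."""
    if (rule.kind == "gambit" or produced_output) and erasable(rule):
        rule.erased = True


def after_respond(caller: RuleState, callee: RuleState) -> None:
    """caller ran ^respond(...) and callee generated the output."""
    if callee.keep:
        if erasable(caller):
            caller.erased = True  # someone has to pay: the calling rule
    else:
        after_fire(callee, produced_output=True)
```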

Keywords vs Control Script : A topic can be invoked as a result of its keywords or by a direct call from the control script or some other topic.

Pending Topics : Control flow passes through various topics, some of which become pending, meaning you want to continue in those topics when talking to the user. Topics that can never be pending are: system topics, blocked topics (you can block a topic so it won't execute), and nostay topics.

One of two things makes a remaining topic pending: either the system is currently executing rules in the topic, or the system previously generated a user response from the topic. When the system leaves a topic that didn't say anything to the user, that topic is no longer pending. But once a topic has said something, the system expects to continue in that topic or resume it.

The system has an ordered list of pending topics. The order is:

1. A topic whose rules are being executed right now comes first.
2. Otherwise, the most recently added (or revived) topic is the most pending.

You can get the name of the current most pending topic (%topic), add pending topics yourself (^addtopic()), and remove a topic from the list (^poptopic()).
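
The ordering rules can be modeled as a stack in Python. This is a hypothetical sketch: the methods only loosely mirror %topic, ^addtopic() and ^poptopic() and are not ChatScript's API, and representing topic kinds as flag strings is an assumption.

```python
from typing import Optional


class PendingTopics:
    """Toy ordered list of pending topics; the last element is most pending."""

    # Topic kinds that can never become pending (hypothetical flag names).
    BARRED_FLAGS = {"system", "blocked", "nostay"}

    def __init__(self) -> None:
        self._stack: list[str] = []

    def add(self, topic: str, flags: frozenset[str] = frozenset()) -> None:
        """^addtopic analogue: push a topic, or revive it to the top."""
        if flags & self.BARRED_FLAGS:
            return                      # these topics are never pending
        if topic in self._stack:
            self._stack.remove(topic)   # reviving moves it to the top
        self._stack.append(topic)

    def pop(self, topic: Optional[str] = None) -> None:
        """^poptopic analogue: remove the named topic, or the top one."""
        if topic is None:
            if self._stack:
                self._stack.pop()
        elif topic in self._stack:
            self._stack.remove(topic)

    def current(self, executing: Optional[str] = None) -> Optional[str]:
        """%topic analogue: a topic executing rules now outranks the list."""
        if executing is not None:
            return executing
        return self._stack[-1] if self._stack else None
```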

Random Gambit : the random gambit, r:, starts a subtopic; the t: gambits that follow it continue that subtopic. For example:

topic: ~beach repeat keep [beach sand ocean sand_castle]
# subtopic about swimming
r: Do you like the ocean?
t: I like swimming in the ocean.
t: I often go to the beach to swim.

# subtopic about sand castles.
r: Have you made sand castles?
   a: (~yes) Maybe sometime you can make some that I can go see.
   a: (~no) I admire those who make luxury sand castles.
t: I've seen pictures of some really grand sand castles.

Note that any t: gambits occurring before the first r: gambit will be executed in order before any r: gambit can fire.
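
A Python sketch of how the ~beach gambits could be grouped and ordered. It assumes the engine picks an unused r: subtopic at random and then plays that subtopic's t: gambits in order; rejoinders (a:) are left out. The function names are hypothetical.

```python
import random


def group_gambits(gambits: list[tuple[str, str]]) -> tuple[list[str], list[list[str]]]:
    """Split (kind, text) gambits into a leading t: run and r:-headed subtopics."""
    prefix: list[str] = []
    subtopics: list[list[str]] = []
    for kind, text in gambits:
        if kind == "r":
            subtopics.append([text])     # an r: gambit opens a new subtopic
        elif subtopics:
            subtopics[-1].append(text)   # a t: continues the latest subtopic
        else:
            prefix.append(text)          # t: before any r: runs linearly first
    return prefix, subtopics


def gambit_order(gambits: list[tuple[str, str]], rng: random.Random) -> list[str]:
    """One possible issue order: the prefix, then subtopics in random order."""
    prefix, subtopics = group_gambits(gambits)
    rng.shuffle(subtopics)               # which r: fires first is random
    return prefix + [text for sub in subtopics for text in sub]
```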

Overview of the control script :
Normally you start using the system with the supplied control script. But it's just a topic, and you can modify it or write your own.

The typical flow of control is for the control script to try to invoke a pending rejoinder. This allows the system to directly test rules related to its last output, rules that anticipate how the user will respond.

Unlike with responders and gambits, the engine keeps trying the rejoinders below a rule until one's pattern matches and its output doesn't fail.

Not failing does not require generating user output, merely not returning a fail code. Responders and gambits, by contrast, are tried until user output is generated (or until the topic runs out of them).
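
The rejoinder rule can be sketched in Python, assuming each rejoinder is a (pattern, action) pair and each action reports its output together with a failed flag. Note that the search can succeed while producing no output; the pair encoding is hypothetical.

```python
from typing import Callable, Optional

# Hypothetical encoding: (pattern, action); action returns (output, failed).
Rejoinder = tuple[Callable[[str], bool], Callable[[str], tuple[Optional[str], bool]]]


def try_rejoinders(rejoinders: list[Rejoinder], user_input: str) -> tuple[bool, Optional[str]]:
    """Stop at the first rejoinder whose pattern matches and action doesn't fail.

    Returns (handled, output). handled can be True with output None,
    because not failing is enough to end the search.
    """
    for pattern, action in rejoinders:
        if not pattern(user_input):
            continue
        output, failed = action(user_input)
        if not failed:
            return True, output
    return False, None
```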

If no output is generated from rejoinders, the system would test responders, first in the current topic, to see whether that topic can be continued directly. If that fails to generate output, the system would check other topics whose keywords match the input to see whether they have matching responders. If that fails, the system would call explicitly named topics that do not involve keywords. These are generic topics you might have set up.

If finding a responder fails, the system would try to issue a gambit. First, from a topic with matching keywords. If that fails, the system would try to issue a gambit from the current topic. If that fails, the system would generate a random gambit.
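
The search order just described can be written as a Python sketch. The stage names and the handler dictionary are hypothetical labels for illustration; in ChatScript this logic lives in the control script, not in a table like this.

```python
from typing import Callable, Optional

# Each handler is a hypothetical callable returning output text or None.
Handler = Callable[[str], Optional[str]]

# Order in which the control flow looks for output.
ORDER = [
    "rejoinder",           # rejoinders of the rule that produced the last output
    "responder:current",   # can the current topic continue directly?
    "responder:keyword",   # other topics whose keywords match the input
    "responder:generic",   # explicitly named topics without keywords
    "gambit:keyword",      # gambit from a topic with matching keywords
    "gambit:current",      # gambit from the current topic
    "gambit:random",       # last resort: a random gambit
]


def control_turn(user_input: str, handlers: dict[str, Handler]) -> tuple[Optional[str], Optional[str]]:
    """Return (stage, output) for the first stage that produces output."""
    for stage in ORDER:
        handler = handlers.get(stage)
        if handler is None:
            continue
        out = handler(user_input)
        if out is not None:
            return stage, out
    return None, None
```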

Once an output is found, the system's work is nominally done. It records which rule generated the output, so it can check the rejoinders attached to that rule on the next input. It also records the current topic, so that topic will be favored when responding to the next input. Then the system is done; the next input starts the search for appropriate rules anew.

There are actually three control scripts ( or one invoked multiple ways ). The first is the preprocess, called before any user sentences are analyzed. The main script is invoked for each input sentence. The postprocess is invoked after all user input is complete. It allows you to examine what was generated ( but not to generate new output except using special routines ^postprintbefore and ^postprintafter ).
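
The three phases can be modeled in Python as one function that runs preprocess once, the main script once per sentence, and postprocess once at the end. The callables are hypothetical placeholders; postprocess receives a copy of the outputs and may only return text to place before or after them, loosely mirroring ^postprintbefore and ^postprintafter.

```python
from typing import Callable, Optional


def run_volley(
    text: str,
    split: Callable[[str], list[str]],
    preprocess: Callable[[str], None],
    main: Callable[[str], Optional[str]],
    postprocess: Callable[[list[str]], tuple[list[str], list[str]]],
) -> list[str]:
    """Run the three control phases over one user input."""
    preprocess(text)                    # once, before any sentence is analyzed
    outputs: list[str] = []
    for sentence in split(text):        # main script: once per input sentence
        out = main(sentence)
        if out is not None:
            outputs.append(out)
    # postprocess gets a copy, so it can inspect but not edit the outputs;
    # it can only add text before or after them.
    before, after = postprocess(list(outputs))
    return before + outputs + after
```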