Getting Started with Watson Explorer (7): Configuring the UIMA Pipeline

Purpose of configuring the UIMA pipeline

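The main task in configuring a UIMA pipeline is to define the sequence of resources that analyze your documents. Language resources are the tools that analyze and annotate documents. "Annotation" is the key concept in UIMA: it is how natural-language text is turned into annotated, semi-structured text. In practice, the UIMA pipeline configuration is where the parameters for these language resources are stored.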
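As a concrete illustration of what "annotation" means, the following Python sketch models a pipeline as an ordered list of stages, each with its own parameters, that add typed spans (annotations) over text that is never itself modified. It is a toy model under stated assumptions, not the UIMA or Content Analytics Studio API; `Annotation`, `Stage`, `run_pipeline`, the `Token` and `Person` types, and the `names` parameter are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Annotation:
    # A typed span over the original text; the text itself is never changed.
    type: str
    begin: int
    end: int
    features: Dict[str, str] = field(default_factory=dict)


@dataclass
class Stage:
    name: str
    params: Dict[str, object]  # this stage's slot in the (hypothetical) pipeline configuration
    process: Callable[[str, List[Annotation], Dict[str, object]], List[Annotation]]


def tokenize(text: str, annotations: List[Annotation], params: Dict[str, object]) -> List[Annotation]:
    # Toy tokenizer: every run of word characters becomes a Token annotation.
    return [Annotation("Token", m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def tag_names(text: str, annotations: List[Annotation], params: Dict[str, object]) -> List[Annotation]:
    # Toy dictionary lookup: mark Tokens whose text appears in the configured name list.
    names = params["names"]
    return [Annotation("Person", a.begin, a.end)
            for a in annotations
            if a.type == "Token" and text[a.begin:a.end] in names]


def run_pipeline(text: str, stages: List[Stage]) -> List[Annotation]:
    annotations: List[Annotation] = []
    for stage in stages:  # stages run strictly in the configured order
        annotations.extend(stage.process(text, annotations, stage.params))
    return annotations


pipeline = [
    Stage("Tokens", {}, tokenize),
    Stage("Names", {"names": {"Alice", "Bob"}}, tag_names),  # hypothetical dictionary
]
text = "Alice met Bob in Paris."
for a in run_pipeline(text, pipeline):
    print(a.type, a.begin, a.end, text[a.begin:a.end])
```

Running it prints five Token annotations, then Person annotations for "Alice" (offsets 0 to 5) and "Bob" (offsets 10 to 13). The later stage can only find names because the earlier stage already produced Tokens, which is why the order of resources in the pipeline matters.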

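Note
If a stage in the UIMA pipeline requires an input type that no earlier stage provides, a warning icon is displayed next to that input type. Right-click the missing type and click Find to see the list of Content Analytics Studio resources in your project that generate the missing type. Then add one of those resources to the pipeline.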
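The condition behind that warning can be approximated with a simple forward pass: track which annotation types the earlier stages have produced and flag any stage whose required inputs are not yet available. The sketch below assumes each stage declares its input and output types; the stage names, type names, and the functions `missing_inputs` and `find_providers` are hypothetical, and Content Analytics Studio's own check may work differently.

```python
from typing import Dict, List, NamedTuple, Set


class StageSpec(NamedTuple):
    name: str
    inputs: Set[str]   # annotation types the stage requires
    outputs: Set[str]  # annotation types the stage creates


def missing_inputs(pipeline: List[StageSpec]) -> Dict[str, Set[str]]:
    """Map each stage to the input types that no earlier stage provides."""
    available: Set[str] = set()
    problems: Dict[str, Set[str]] = {}
    for stage in pipeline:
        missing = stage.inputs - available
        if missing:
            problems[stage.name] = missing  # this is where a warning icon would appear
        available |= stage.outputs
    return problems


def find_providers(missing_type: str, catalog: List[StageSpec]) -> List[str]:
    """Rough analogue of Find: project resources that create the missing type."""
    return [s.name for s in catalog if missing_type in s.outputs]


pipeline = [
    StageSpec("Lexical Analysis", set(), {"Token", "Sentence"}),
    StageSpec("Parsing Rules", {"Token", "Person"}, {"Meeting"}),
]
catalog = [
    StageSpec("People Dictionary", {"Token"}, {"Person"}),
    StageSpec("Place Dictionary", {"Token"}, {"Place"}),
]

print(missing_inputs(pipeline))           # Parsing Rules lacks Person
print(find_providers("Person", catalog))  # candidate resources to add

# Insert the provider before the stage that needs it, then re-check.
fixed = pipeline[:1] + [catalog[0]] + pipeline[1:]
print(missing_inputs(fixed))
```

Placing the provider before the consuming stage clears the warning; appending it after "Parsing Rules" would not, because types only become available to later stages.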

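Configuring the UIMA pipeline is an iterative process. As you create more resources, such as new custom dictionaries and parsing rule databases, you must go back and edit the UIMA pipeline configuration file to include those resources in the analysis.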

Configuration steps

  1. In the Studio Explorer view, right-click the Configuration/Annotators directory in your project and click New > UIMA Pipeline Configuration.

  2. Configure the stages of the UIMA pipeline:

(a) In the UIMA Pipeline Stages list, click Document Language and specify a method for identifying the language of each document. If all documents are in the same language, you can manually specify that language.

Tip If you accept the default option to automatically determine the document language, edit the Acceptable Languages list to specify the languages for which you expect to have documents. Specifying the list of possible languages helps to ensure that Content Analytics Studio identifies the correct language for each document.

(b) Click Lexical Analysis and specify a list of resources such as lexical dictionaries, character rules dictionaries, and custom dictionaries for each language in which you expect to have documents. You can also specify which break rules to use for splitting a document into paragraphs, sentences, and tokens.

(c) If your pipeline includes a parsing rules stage, click Parsing Rules and specify a list of parsing rule files for each language in which you expect to have documents.

Tip If you specify multiple parsing rule files, the order in which you list the files affects the order in which the rules are processed. That is, rules in the first file are processed first, followed by the rules in the second file. If the rules in a file depend on annotations that are created by rules in a different file, ensure that the files are listed in the correct order.

(d) Optional: Add and configure additional pipeline stages. For example, you can add a PEAR stage to include annotators that are packaged as a PEAR file. You can also add a semantic analysis stage to find connections between annotations that are identified in the document. You can add a condition or switch stage to run an annotator stage only under certain conditions, such as running different lexical analysis stages with particular sets of dictionaries depending on the source of the document.

(e) Click Clean Up and select the annotation types to exclude from the final output.

Tip
If you want to remove some intermediary types from the final output but still view these types in the Content Analytics Studio annotation editor, select the Show removed types in the annotation editor check box. For example, you might want to view these intermediary types in the annotation editor so that you can use these types as inputs to a parsing rule.
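The steps above can be sketched as a toy Python model of three choices they describe: restricting language identification to the Acceptable Languages list (Document Language), processing parsing rule files in the order they are listed (Parsing Rules), and dropping intermediary types from the final output while optionally keeping them visible in the editor (Clean Up). This is an illustration under stated assumptions, not how Content Analytics Studio is implemented; the detector scores, the one-line rule format, and the functions `pick_language`, `apply_rule_files`, and `clean_up` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Annotation = Tuple[str, str]  # (annotation type, covered text): a toy stand-in
Rule = Tuple[str, str]        # (type the rule needs, type it creates): hypothetical format


def pick_language(scores: Dict[str, float], acceptable: List[str]) -> str:
    # Document Language: only languages on the Acceptable Languages list are candidates.
    candidates = {lang: s for lang, s in scores.items() if lang in acceptable}
    if not candidates:
        return acceptable[0]  # assumption: fall back to the first acceptable language
    return max(candidates, key=candidates.get)


def apply_rule_files(annotations: List[Annotation], rule_files: List[List[Rule]]) -> List[Annotation]:
    # Parsing Rules: files are processed in list order, and a rule only sees
    # annotations that already exist when it runs.
    result = list(annotations)
    for rules in rule_files:
        for needs, creates in rules:
            result += [(creates, covered) for kind, covered in result if kind == needs]
    return result


def clean_up(annotations: List[Annotation], removed: Set[str],
             show_removed: bool) -> Tuple[List[Annotation], List[Annotation]]:
    # Clean Up: removed types never reach the final output, but can stay
    # visible in the annotation editor.
    final = [a for a in annotations if a[0] not in removed]
    editor_view = list(annotations) if show_removed else final
    return final, editor_view


# Hypothetical detector scores: "nl" scores highest but is not an expected language.
print(pick_language({"en": 0.48, "nl": 0.50, "de": 0.02}, ["en", "de"]))

tokens = [("Token", "Alice"), ("Token", "Paris")]
file_a = [("Token", "Candidate")]   # creates Candidate from Token
file_b = [("Candidate", "Entity")]  # depends on the Candidate annotations from file_a

good = apply_rule_files(tokens, [file_a, file_b])
bad = apply_rule_files(tokens, [file_b, file_a])
print(sum(kind == "Entity" for kind, _ in good), sum(kind == "Entity" for kind, _ in bad))

final, editor_view = clean_up(good, {"Token", "Candidate"}, show_removed=True)
print(final)
print(len(editor_view))
```

Reversing the two rule files leaves no Entity annotations at all, which is the ordering pitfall the Parsing Rules tip warns about. With `show_removed=True`, the Token and Candidate annotations are gone from `final` but still present in `editor_view`, mirroring the "Show removed types in the annotation editor" option.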
Author: 许野平

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值