(1) UIMA Analysis Engine

UIMA: Unstructured Information Management Architecture, a framework for building unstructured information management applications.

  1. An Analysis Engine (AE) is a program that analyzes artifacts (e.g. documents) and infers information from them.

  2. Analysis Engines are constructed from building blocks called Annotators.

  3. An Analysis Engine may contain a single annotator (this is referred to as a Primitive AE).

  4. Alternatively, it may be a composition of other Analysis Engines and therefore contain multiple annotators (this is referred to as an Aggregate AE).

  5. All feature structures, including annotations, are represented in the UIMA Common Analysis Structure (CAS). The CAS is the central data structure through which all UIMA components communicate.


Step 1: Define the CAS feature structure types in an XML file called a Type System Descriptor.

<?xml version="1.0" encoding="UTF-8" ?>
  <typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
    <name>TutorialTypeSystem</name>
    <description>Type System Definition for the tutorial examples - 
        as of Exercise 1</description>
    <vendor>Apache Software Foundation</vendor>
    <version>1.0</version>
    <types>
      <typeDescription>
        <name>org.apache.uima.tutorial.RoomNumber</name>
        <description></description>
        <supertypeName>uima.tcas.Annotation</supertypeName>
        <features>
          <featureDescription>
            <name>building</name>
            <description>Building containing this room</description>
            <rangeTypeName>uima.cas.String</rangeTypeName>
          </featureDescription>
        </features>
      </typeDescription>
    </types>
  </typeSystemDescription>
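
As a quick sanity check on a descriptor like the one above, the type names, supertypes, and features can be listed with the JDK's DOM parser. The sketch below is only an illustration. It does not use the UIMA API, and the class name TypeSystemInspector and its method are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper (not part of UIMA): lists the types declared in a type system descriptor. */
class TypeSystemInspector {

    /** Returns one line per type, e.g. "a.b.Type extends uima.tcas.Annotation [feature:range]". */
    static List<String> describeTypes(String descriptorXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(descriptorXml)));
            List<String> lines = new ArrayList<>();
            NodeList types = doc.getElementsByTagName("typeDescription");
            for (int i = 0; i < types.getLength(); i++) {
                Element type = (Element) types.item(i);
                List<String> features = new ArrayList<>();
                NodeList featureNodes = type.getElementsByTagName("featureDescription");
                for (int j = 0; j < featureNodes.getLength(); j++) {
                    Element feature = (Element) featureNodes.item(j);
                    features.add(childText(feature, "name") + ":" + childText(feature, "rangeTypeName"));
                }
                lines.add(childText(type, "name") + " extends "
                        + childText(type, "supertypeName") + " " + features);
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse descriptor", e);
        }
    }

    /** Text of the first direct child element with the given tag, or "" if there is none. */
    private static String childText(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && tag.equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return "";
    }
}
```

Calling TypeSystemInspector.describeTypes with the descriptor text returns one summary line per typeDescription element.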

Step 2: Generate the Java class source files for the CAS types, here org.apache.uima.tutorial.RoomNumber.

Use the JCasGen plugin or tool.

Step 3: Write the annotator code.

All annotators implement a standard interface, AnalysisComponent, usually by extending the adapter class JCasAnnotator_ImplBase. The three main methods are initialize, process, and destroy.

The framework calls initialize once, when it first creates an instance of the annotator class. It calls process once per item being processed. The application may call destroy when it is done using the annotator.
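
The lifecycle can be sketched with a plain-JDK stand-in. This is not the UIMA API: a real annotator would extend JCasAnnotator_ImplBase and write annotations into the CAS instead of returning a list. The class RoomNumberFinder, its Span type, and both regular expressions are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Plain-JDK stand-in for an annotator's lifecycle; NOT the UIMA interface. */
class RoomNumberFinder {

    /** A found room number: character offsets plus a building label. */
    static final class Span {
        final int begin;
        final int end;
        final String building;

        Span(int begin, int end, String building) {
            this.begin = begin;
            this.end = end;
            this.building = building;
        }
    }

    // Illustrative patterns only; the real room-number formats are an assumption.
    private Pattern yorktown;
    private Pattern hawthorne;

    /** Called once per instance: do expensive setup (here, compiling patterns). */
    void initialize() {
        yorktown = Pattern.compile("\\b[0-4]\\d-[0-2]\\d\\d\\b");
        hawthorne = Pattern.compile("\\b[G1-4][NS]-[A-Z]\\d\\d\\b");
    }

    /** Called once per document: find every match and record its offsets. */
    List<Span> process(String documentText) {
        if (yorktown == null || hawthorne == null) {
            throw new IllegalStateException("initialize() must be called before process()");
        }
        List<Span> spans = new ArrayList<>();
        collect(yorktown, "Yorktown", documentText, spans);
        collect(hawthorne, "Hawthorne", documentText, spans);
        return spans;
    }

    /** Called when the instance is no longer needed: release what initialize() set up. */
    void destroy() {
        yorktown = null;
        hawthorne = null;
    }

    private static void collect(Pattern pattern, String building, String text, List<Span> out) {
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            out.add(new Span(m.start(), m.end(), building));
        }
    }
}
```

The begin and end offsets play the role that an annotation's span would play in the CAS, and the building label corresponds to the building feature defined in the type system.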

Step 4: Write the Analysis Engine Descriptor.

The UIMA architecture requires that descriptive information about an annotator be represented in an XML file and provided along with the annotator class file(s) to the UIMA framework at run time. This XML file is called an Analysis Engine Descriptor.

<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
	<frameworkImplementation>org.apache.uima.java</frameworkImplementation>
	<primitive>true</primitive>
	<annotatorImplementationName>org.apache.uima.tutorial.ex1.RoomNumberAnnotator</annotatorImplementationName>

	<analysisEngineMetaData>
		<name>Room Number Annotator</name>
		<description>An example annotator that searches for room numbers in the IBM Watson
			research buildings.</description>
		<version>1.0</version>
		<vendor>The Apache Software Foundation</vendor>
		
		<configurationParameters/>
		<configurationParameterSettings/>
		<!-- TypeSystem Definition -->
		<typeSystemDescription>
			<imports>
				<import location="TutorialTypeSystem.xml"/>
			</imports>
		</typeSystemDescription>
		
		<typePriorities/>
		<fsIndexCollection/>
		<!-- Capabilities: Inputs, Outputs, and Preconditions -->
		<capabilities>
			<capability>
				<inputs/>
				<outputs>
					<type>org.apache.uima.tutorial.RoomNumber</type>
					<feature>org.apache.uima.tutorial.RoomNumber:building</feature>
				</outputs>
				<languagesSupported/>
			</capability>
		</capabilities>
		<operationalProperties>
			<modifiesCas>true</modifiesCas>
			<multipleDeploymentAllowed>true</multipleDeploymentAllowed>
			<outputsNewCASes>false</outputsNewCASes>
		</operationalProperties>
	</analysisEngineMetaData>
	<resourceManagerConfiguration/>
</analysisEngineDescription>

For an Aggregate AE, the descriptor configures multiple annotators as delegates:


<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
	<frameworkImplementation>org.apache.uima.java</frameworkImplementation>
	<primitive>false</primitive>

	<delegateAnalysisEngineSpecifiers>
		<delegateAnalysisEngine key="RoomNumber">
			<import location="../ex2/RoomNumberAnnotator.xml"/>
		</delegateAnalysisEngine>

		<delegateAnalysisEngine key="DateTime">
			<import location="TutorialDateTime.xml"/>
		</delegateAnalysisEngine>
	</delegateAnalysisEngineSpecifiers>

	<analysisEngineMetaData>
		<name>Aggregate TAE - Room Number and DateTime Annotators</name>
		<description>Detects Room Numbers, Dates, and Times</description>
		
		<configurationParameters searchStrategy="language_fallback"/>
		<configurationParameterSettings/>
		<flowConstraints>
			<fixedFlow>
				<node>RoomNumber</node>
				<node>DateTime</node>
			</fixedFlow>
		</flowConstraints>
		
		<typePriorities/>
		<fsIndexCollection/>
		<capabilities>
			<capability>
				<inputs/>
				<outputs>
					<type allAnnotatorFeatures="true">org.apache.uima.tutorial.RoomNumber</type>
					<type allAnnotatorFeatures="true">org.apache.uima.tutorial.DateAnnot</type>
					<type allAnnotatorFeatures="true">org.apache.uima.tutorial.TimeAnnot</type>
				</outputs>
				<languagesSupported>
					<language>en</language>
				</languagesSupported>
			</capability>
		</capabilities>
		<operationalProperties>
			<modifiesCas>true</modifiesCas>
			<multipleDeploymentAllowed>true</multipleDeploymentAllowed>
			<outputsNewCASes>false</outputsNewCASes>
		</operationalProperties>
	</analysisEngineMetaData>
	<resourceManagerConfiguration/>
</analysisEngineDescription>

Note the fixedFlow element: the delegates run in the order listed, RoomNumber first and then DateTime.
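
The effect of a fixed flow can be illustrated with a toy model: each delegate is run, in the listed order, against the same shared result store, so a later delegate can see what an earlier one produced. This is a simplified sketch, not how the UIMA flow controller is implemented. The class FixedFlowSketch and the use of a plain list in place of the CAS are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy model of a fixed flow; a plain list stands in for the CAS. */
class FixedFlowSketch {

    /** Runs each delegate named in `flow`, in order, against one shared result list. */
    static List<String> run(List<String> flow, Map<String, Consumer<List<String>>> delegates) {
        List<String> sharedResults = new ArrayList<>();
        for (String key : flow) {
            Consumer<List<String>> delegate = delegates.get(key);
            if (delegate == null) {
                throw new IllegalArgumentException("Unknown delegate key: " + key);
            }
            delegate.accept(sharedResults);
        }
        return sharedResults;
    }

    /** Two hypothetical delegates keyed like the descriptor: RoomNumber, then DateTime. */
    static List<String> demo() {
        Map<String, Consumer<List<String>>> delegates = new LinkedHashMap<>();
        delegates.put("RoomNumber", results -> results.add("RoomNumber"));
        // The second delegate reads the shared list before adding its own result.
        delegates.put("DateTime", results ->
                results.add("DateTime (saw " + results.size() + " earlier result(s))"));
        return run(Arrays.asList("RoomNumber", "DateTime"), delegates);
    }
}
```

The keys in the flow list correspond to the key attributes of the delegateAnalysisEngine elements, which is why the node entries in the descriptor use those keys and not file names.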

Step 5: Run the "UIMA Document Analyzer" launch configuration, or use the documentAnalyzer.sh command.


How do you access the results produced by upstream annotators?

Annotators can also use the results of other annotators.

The CAS maintains indexes of annotations, and from an index you can obtain an iterator that allows you to step through all annotations of a particular type. 

FSIndex timeIndex = aJCas.getAnnotationIndex(TimeAnnot.type);
Iterator timeIter = timeIndex.iterator();   
while (timeIter.hasNext()) {
  TimeAnnot time = (TimeAnnot)timeIter.next();

  //do something
}


How do annotators run under multithreading?

The UIMA framework ensures that an Annotator instance is called by only one thread at a time.

When multithreading is wanted for performance, multiple instances of the Annotator are created, each one running on just one thread.
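
The one-instance-per-thread idea can be sketched with the JDK's concurrency utilities. This is not UIMA's own pooling mechanism. It is a minimal illustration in which the class PerThreadInstances and its Worker type are hypothetical, and a ThreadLocal hands each pool thread its own non-thread-safe instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative only: one worker instance per thread, so instances need no locking. */
class PerThreadInstances {

    /** Hypothetical non-thread-safe worker standing in for an annotator instance. */
    static final class Worker {
        final Set<String> callerThreads = ConcurrentHashMap.newKeySet();
        private int processed = 0; // unsynchronized: safe only because one thread owns this instance

        int process(String document) {
            callerThreads.add(Thread.currentThread().getName());
            processed++;
            return document.length();
        }

        int processedCount() {
            return processed;
        }
    }

    /** Processes all documents on `threads` threads and returns the workers that were created. */
    static List<Worker> processAll(List<String> documents, int threads) {
        List<Worker> created = new CopyOnWriteArrayList<>();
        // Each pool thread lazily creates its own Worker on first use.
        ThreadLocal<Worker> perThread = ThreadLocal.withInitial(() -> {
            Worker worker = new Worker();
            created.add(worker);
            return worker;
        });
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (String doc : documents) {
                pending.add(pool.submit(() -> perThread.get().process(doc)));
            }
            for (Future<Integer> result : pending) {
                result.get(); // wait for completion; also makes the workers' state visible here
            }
        } catch (Exception e) {
            throw new IllegalStateException("Processing failed", e);
        } finally {
            pool.shutdown();
        }
        return created;
    }
}
```

Because each instance is only ever touched by its own thread, the worker's internal state needs no synchronization. This is the same reason annotator code can usually be written as if it were single-threaded.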


How do you access external resource files?

Declaration:

<resourceManagerConfiguration>
	<externalResources>
		<externalResource>
			<name>UimaAcronymTableFile</name>
			<description>A table containing UIMA acronyms and their expanded forms.</description>
			<fileResourceSpecifier>
				<fileUrl>file:org/apache/uima/tutorial/ex6/uimaAcronyms.txt</fileUrl>
			</fileResourceSpecifier>
			<!-- Optional: if omitted, the resource is read as an input stream -->
			<implementationName>org.apache.uima.tutorial.ex6.StringMapResource_impl</implementationName>
		</externalResource>
	</externalResources>
	<externalResourceBindings>
		<externalResourceBinding>
			<key>AcronymTable</key>
			<resourceName>UimaAcronymTableFile</resourceName>
		</externalResourceBinding>
	</externalResourceBindings>
</resourceManagerConfiguration>

Dependency (declared by the annotator that uses the resource):

<externalResourceDependencies>
	<externalResourceDependency>
		<key>UimaTermTable</key>
		<description>Map whose keys are UIMA terms.</description>
		<interfaceName>org.apache.uima.tutorial.ex6.StringMapResource</interfaceName>
		<optional>false</optional>
	</externalResourceDependency>
</externalResourceDependencies>

Access (the lookup key is the dependency key declared by the annotator, here AcronymTable):

StringMapResource mMap = 
  (StringMapResource)getContext().getResourceObject("AcronymTable");
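
What a map-style resource object does behind such a lookup can be sketched in plain Java: load the file once, then answer key lookups from memory. The sketch below does not use the UIMA resource interfaces. The class AcronymTable is hypothetical, and the tab-separated "key<TAB>value" line format is an assumption, because the contents of uimaAcronyms.txt are not shown here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a map-backed resource; not the tutorial's StringMapResource_impl. */
class AcronymTable {
    private final Map<String, String> map = new HashMap<>();

    /** Loads lines of the assumed form "key<TAB>value"; lines without a tab are skipped. */
    static AcronymTable load(Reader source) {
        AcronymTable table = new AcronymTable();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab > 0) {
                    table.map.put(line.substring(0, tab).trim(), line.substring(tab + 1).trim());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return table;
    }

    /** Returns the expanded form for the key, or null if the key is unknown. */
    String get(String key) {
        return map.get(key);
    }
}
```

Loading the data once into a shared object is what makes it practical for several annotators to bind to the same resource instead of each re-reading the file.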

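An aggregate can share and override the resource bindings of its delegate annotators. In the aggregate descriptor, each binding key is prefixed with the delegate's key, as below. Inside each annotator, the dependency key is written without the annotator name.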

<externalResourceBindings>
	<externalResourceBinding>
		<key>UimaAcronymAnnotator/AcronymTable</key>
		<resourceName>UimaAcronymTableFile</resourceName>
	</externalResourceBinding>
	<externalResourceBinding>
		<key>UimaMeetingAnnotator/UimaTermTable</key>
		<resourceName>UimaAcronymTableFile</resourceName>
	</externalResourceBinding>
</externalResourceBindings>


Reposted from: https://my.oschina.net/elleneye/blog/604352
