(2) UIMA CPE

The UIMA Analysis Engine interface provides support for developing and integrating algorithms that analyze unstructured data. 

The Collection Processing Architecture defines additional components for reading raw data formats from data collections, preparing the data for processing by Analysis Engines, executing the analysis, extracting analysis results, and deploying the overall flow in a variety of local and distributed configurations.

The functionality defined in the Collection Processing Architecture is implemented by a Collection Processing Engine (CPE).

A CPE includes an Analysis Engine and adds a Collection Reader, a CAS Initializer (deprecated as of version 2), and CAS Consumers. The part of the UIMA Framework that supports the execution of CPEs is called the Collection Processing Manager, or CPM.

Collection Reader – interfaces to a collection of data items (e.g., documents) to be analyzed. Collection Readers return CASes that contain the documents to analyze, possibly along with additional metadata.

CAS Initializer – prepares an individual data item for analysis and loads it into the CAS.

CAS Consumer – consumes the enriched CAS produced by the sequence of Analysis Engines before it, and produces an application-specific data structure, such as a search engine index or database.
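To make the division of labor concrete, here is a minimal, self-contained Java sketch of the reader → analysis → consumer flow that the CPM drives. It does not use the UIMA API: `SimpleCas`, `DocReader`, `CasProcessor`, `ListReader`, `TokenCounter`, `CountIndex`, and `MiniCpm` are all hypothetical stand-ins, and the deprecated CAS Initializer step is folded into the reader. The real CPM adds threading, error handling, and deployment on top of a loop like this one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a CAS: the document text plus named analysis results.
class SimpleCas {
    final String documentText;
    final Map<String, Object> annotations = new HashMap<>();

    SimpleCas(String documentText) {
        this.documentText = documentText;
    }
}

// Hypothetical Collection Reader role: hands out one CAS per document.
interface DocReader extends Iterator<SimpleCas> {}

// Hypothetical CAS Processor role: both analysis engines and consumers fit here.
interface CasProcessor {
    void process(SimpleCas cas);
}

// Reader over an in-memory "collection"; it also loads the text into each CAS.
class ListReader implements DocReader {
    private final Iterator<String> docs;

    ListReader(List<String> docs) { this.docs = docs.iterator(); }

    public boolean hasNext() { return docs.hasNext(); }

    public SimpleCas next() { return new SimpleCas(docs.next()); }
}

// Toy analysis engine: stores a whitespace token count in the CAS.
class TokenCounter implements CasProcessor {
    public void process(SimpleCas cas) {
        String text = cas.documentText.trim();
        int count = text.isEmpty() ? 0 : text.split("\\s+").length;
        cas.annotations.put("tokenCount", count);
    }
}

// Toy CAS consumer: builds an application-specific structure (a list of counts).
class CountIndex implements CasProcessor {
    final List<Integer> counts = new ArrayList<>();

    public void process(SimpleCas cas) {
        counts.add((Integer) cas.annotations.get("tokenCount"));
    }
}

// Minimal stand-in for the CPM: pull each CAS and pass it through every processor in order.
class MiniCpm {
    static void run(DocReader reader, List<CasProcessor> processors) {
        while (reader.hasNext()) {
            SimpleCas cas = reader.next();
            for (CasProcessor p : processors) {
                p.process(cas);
            }
        }
    }
}
```

Because the consumer sits after the analysis engine in the processor list, it only ever sees CASes that have already been enriched.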

[Figure: 142539_2X5f_188382.png]

Analysis Engines and CAS Consumers are both instances of CAS Processors. A Collection Processing Engine (CPE) may contain multiple CAS Processors. An Analysis Engine contained in a CPE may itself be a Primitive or an Aggregate (composed of other Analysis Engines). Aggregates may contain CAS Consumers. While Collection Readers and CAS Initializers always run in the same JVM as the CPM, a CAS Processor may be deployed in a variety of local and distributed modes, providing a number of options for scalability and robustness.
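The Primitive/Aggregate relationship is essentially the composite pattern. The hypothetical Java sketch below (not UIMA's `AnalysisEngine` API) shows an aggregate delegating to a primitive, a nested aggregate, and a consumer through one shared interface. The "CAS" is reduced to a list recording the order in which components ran, and the simple fixed delegation order is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical shared interface for every CAS Processor. The "CAS" is just a
// list that records which components have run, in order.
interface CasProcessorNode {
    void process(List<String> cas);
}

// Primitive: wraps a single unit of analysis logic.
class PrimitiveEngine implements CasProcessorNode {
    private final String name;

    PrimitiveEngine(String name) { this.name = name; }

    public void process(List<String> cas) { cas.add(name); }
}

// Consumer: also just a CAS Processor, which is why an aggregate can contain one.
class CollectingConsumer implements CasProcessorNode {
    final List<String> seen = new ArrayList<>();

    public void process(List<String> cas) { seen.add(String.join(">", cas)); }
}

// Aggregate: delegates to its components in a fixed order (an assumption here).
// A component may itself be an aggregate, so nesting can go arbitrarily deep.
class AggregateEngine implements CasProcessorNode {
    private final List<CasProcessorNode> delegates;

    AggregateEngine(CasProcessorNode... delegates) {
        this.delegates = Arrays.asList(delegates);
    }

    public void process(List<String> cas) {
        for (CasProcessorNode delegate : delegates) {
            delegate.process(cas);
        }
    }
}
```

Since the caller only sees `CasProcessorNode`, it cannot tell whether it is talking to a primitive, an aggregate, or a consumer.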


Deployment:

There are three deployment modes for CAS Processors (Analysis Engines and CAS Consumers) in a CPE:

Integrated (runs in the same Java instance as the CPM),

Managed (runs in a separate process on the same machine), and

Non-managed (runs in a separate process, perhaps on a different machine).
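The three modes differ along two axes: whether the CAS Processor leaves the CPM's JVM, and whether it may also leave the CPM's machine. The hypothetical `DeploymentMode` enum below encodes just that, plus one possible rule of thumb (`choose`, also hypothetical) for picking a mode. It is an illustration only, not a UIMA configuration API.

```java
// Hypothetical summary of the three CAS Processor deployment modes.
enum DeploymentMode {
    //          separateProcess, mayBeRemote
    INTEGRATED(false, false),   // same JVM as the CPM
    MANAGED(true, false),       // separate process, same machine
    NON_MANAGED(true, true);    // separate process, possibly another machine

    final boolean separateProcess;
    final boolean mayBeRemote;

    DeploymentMode(boolean separateProcess, boolean mayBeRemote) {
        this.separateProcess = separateProcess;
        this.mayBeRemote = mayBeRemote;
    }

    // Any processor outside the CPM's JVM needs the CAS shipped to it.
    boolean requiresCasTransport() {
        return separateProcess;
    }

    // One possible rule of thumb (hypothetical): go remote only when the work must
    // run on another host, use a local separate process when the CPM should be
    // isolated from the processor, and otherwise stay in-process.
    static DeploymentMode choose(boolean isolateFromCpm, boolean runOnOtherHost) {
        if (runOnOtherHost) return NON_MANAGED;
        if (isolateFromCpm) return MANAGED;
        return INTEGRATED;
    }
}
```

The trade-off is that every step away from `INTEGRATED` buys isolation or scale-out at the price of transporting the CAS between processes.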


For both managed and non-managed CAS Processors, the CAS must be transmitted between separate processes and possibly between separate computers. This is done using Vinci, a communication protocol used by the CPM and provided as part of Apache UIMA. Vinci handles service naming and location as well as data transport. Service naming and location are provided by a Vinci Naming Service, or VNS. For managed CAS Processors, the CPE uses its own internal VNS; for non-managed CAS Processors, a separate VNS must already be running.
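The naming step can be modeled as a lookup table from service name to network address. In the sketch below, `NamingService` and `ServiceLocator` are hypothetical classes, not the Vinci API, and any service names or `host:port` values used with them are made up. The point is only the difference described above: a managed processor is registered in a VNS the CPE owns, while a non-managed one can only be found if a separately running VNS already knows about it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical naming service: maps a service name to a "host:port" string.
class NamingService {
    private final Map<String, String> registry = new HashMap<>();

    void register(String serviceName, String hostPort) {
        registry.put(serviceName, hostPort);
    }

    Optional<String> lookup(String serviceName) {
        return Optional.ofNullable(registry.get(serviceName));
    }
}

// Hypothetical locator showing how the two non-integrated modes find their services.
class ServiceLocator {
    private final NamingService internalVns = new NamingService();  // owned by the CPE
    private final NamingService externalVns;                         // null if none is running

    ServiceLocator(NamingService externalVns) {
        this.externalVns = externalVns;
    }

    // Managed: the CPE launches the service itself and registers it in its own VNS.
    String locateManaged(String serviceName, String hostPort) {
        internalVns.register(serviceName, hostPort);
        return internalVns.lookup(serviceName).get();
    }

    // Non-managed: the service was started independently, so it must already be
    // registered in a separately running VNS.
    String locateNonManaged(String serviceName) {
        if (externalVns == null) {
            throw new IllegalStateException("non-managed CAS Processor requires a running VNS");
        }
        return externalVns.lookup(serviceName)
                .orElseThrow(() -> new IllegalStateException("service not registered: " + serviceName));
    }
}
```

This is also why the non-managed example needs its extra setup: the external VNS and the service registration exist outside the CPE's control.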

[Figure: 015418_nOaJ_188382.png]


Running the non-managed example requires some additional steps:

Start a VNS service by running the startVNS script in the /bin directory, or using the Eclipse launcher “UIMA Start Vinci Service”.

Deploy the Meeting Detector Analysis Engine as a Vinci service by running the startVinciService script in the /bin directory (or the corresponding Eclipse launcher) and passing it the location of the descriptor to deploy, in this case %UIMA_HOME%/examples/deploy/vinci/Deploy_MeetingDetectorTAE.xml. If you are using Eclipse and have the uimaj-examples project in your workspace, you can instead use Eclipse Menu → Run → Run... and pick the launch configuration “UIMA Start Vinci Service”.

Finally, run the runCPE script (or, in Eclipse, the launch configuration “UIMA Run CPE”), passing it the CPE descriptor for the non-managed version (%UIMA_HOME%/examples/descriptors/collection_processing_engine/MeetingFinderCPE_NonManaged.xml).
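The three steps have to happen in order: the VNS must be up before the service registers with it, and the service must be registered before the CPE tries to locate it. The Java sketch below only assembles the command lines in that order and does not run anything. The class name `NonManagedExampleCommands` is hypothetical, and the `.sh`/`.bat` script extensions are an assumption, so check them against the /bin directory of your UIMA distribution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that assembles (but does not start) the commands for the
// non-managed example, in the order they must be launched.
class NonManagedExampleCommands {
    static List<List<String>> build(String uimaHome, boolean windows) {
        String sep = windows ? "\\" : "/";
        String ext = windows ? ".bat" : ".sh";   // assumed script extensions
        String bin = uimaHome + sep + "bin" + sep;

        String aeDescriptor = String.join(sep, uimaHome, "examples", "deploy", "vinci",
                "Deploy_MeetingDetectorTAE.xml");
        String cpeDescriptor = String.join(sep, uimaHome, "examples", "descriptors",
                "collection_processing_engine", "MeetingFinderCPE_NonManaged.xml");

        List<List<String>> steps = new ArrayList<>();
        steps.add(Arrays.asList(bin + "startVNS" + ext));                         // 1. naming service
        steps.add(Arrays.asList(bin + "startVinciService" + ext, aeDescriptor));  // 2. deploy the AE
        steps.add(Arrays.asList(bin + "runCPE" + ext, cpeDescriptor));            // 3. run the CPE
        return steps;
    }
}
```

For example, `NonManagedExampleCommands.build("/opt/uima", false)` (with a hypothetical install path) yields three command lists, each of which could be handed to `new ProcessBuilder(cmd).inheritIO().start()`. The first two start long-running servers, so in practice they run in their own terminals or as background processes while the third executes.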


Reposted from: https://my.oschina.net/elleneye/blog/605780
