Watson Natural Language Understanding

 

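A development tutorial for the Watson Natural Language Understanding (NLU) service: learn how to use the IBM Bluemix Watson NLU service to analyze the content of natural-language text.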

Import the Watson Python SDK

You may need to install the Python SDK manually. Install or update it with: pip install --upgrade watson-developer-cloud
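If you are not sure whether the SDK is already installed, a quick check can save a confusing ImportError later. This is a minimal sketch using only the standard library's importlib; the helper name sdk_installed is hypothetical, and it only checks that the top-level package watson_developer_cloud can be found, not which version is installed.

```python
import importlib.util


def sdk_installed(module_name="watson_developer_cloud"):
    """Return True if the top-level module can be found on the import path.

    Hypothetical helper: it does not import the module or check its version.
    """
    return importlib.util.find_spec(module_name) is not None


if not sdk_installed():
    print("SDK not found. Run: pip install --upgrade watson-developer-cloud")
```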

In [1]:

from watson_developer_cloud import NaturalLanguageUnderstandingV1
from watson_developer_cloud.natural_language_understanding_v1 import (
    Features, EntitiesOptions, KeywordsOptions)

Connect to the service with a username and password

Both the username and the password can be found in the service credentials of your own service instance.
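Rather than pasting credentials directly into a notebook that may be shared, you can read them from environment variables. This is a sketch under an assumption: the variable names NLU_USERNAME and NLU_PASSWORD and the helper load_credentials are hypothetical, so use whatever names you export in your shell.

```python
import os


def load_credentials(user_var="NLU_USERNAME", pass_var="NLU_PASSWORD"):
    """Read the service username and password from environment variables.

    The variable names are hypothetical defaults. A missing variable raises
    KeyError, so a misconfigured environment fails loudly instead of
    sending empty credentials to the service.
    """
    return os.environ[user_var], os.environ[pass_var]
```

Usage, assuming both variables are set: `username, password = load_credentials()`.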

In [2]:

# Replace with the values from your own service credentials
username = 'YOUR_USERNAME'
password = 'YOUR_PASSWORD'

In [3]:

nlu = NaturalLanguageUnderstandingV1(version='2017-12-01', username=username, password=password)

In [4]:

# Request two analysis features: entity extraction and keyword extraction
features = Features(entities=EntitiesOptions(), keywords=KeywordsOptions())

Prepare some sample text for testing

Note that the Chinese-language version of the service is still under development, so we recommend testing with English text.

In [5]:

texts = [
    '''Welcome to the official documentation of Godot Engine,
    the free and open source community-driven 2D and 3D game engine!
    If you are new to this documentation,
    we recommend that you read the introduction page to get
    an overview of what this documentation has to offer.''',
    
    '''Godot Engine is an open source project developed by a community of volunteers.
    It means that the documentation team can always use your feedback and help to
    improve the tutorials and class reference. If you do not manage to understand something,
    or cannot find what you are looking for in the docs,
    help us make the documentation better by letting us know!''' 
]

Analyze the first text (texts[0])

In [6]:

nlu.analyze(features=features, text=texts[0])

Out[6]:

{'entities': [{'count': 1,
   'relevance': 0.992572,
   'text': 'Godot Engine',
   'type': 'Company'},
  {'count': 1, 'relevance': 0.338173, 'text': 'official', 'type': 'JobTitle'}],
 'keywords': [{'relevance': 0.908999, 'text': 'Godot Engine'},
  {'relevance': 0.741794, 'text': 'open source'},
  {'relevance': 0.702832, 'text': 'introduction page'},
  {'relevance': 0.674454, 'text': 'official documentation'},
  {'relevance': 0.660084, 'text': 'game engine'},
  {'relevance': 0.230546, 'text': 'overview'}],
 'language': 'en',
 'usage': {'features': 2, 'text_characters': 282, 'text_units': 1}}

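The service identified two entities: "Godot Engine" of type "Company" and "official" of type "JobTitle". It also returned a list of keywords, such as Godot Engine, open source, introduction page, and official documentation, each with a relevance score, plus usage statistics: the number of characters analyzed and the number of text units.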
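Since the result shown in Out[6] is a plain Python dict, the entities, keywords, and usage counts can be pulled out with ordinary dict and list operations. The sketch below uses a trimmed copy of that output so it runs offline; the helper name summarize is hypothetical, and if your SDK version wraps the result in a response object you would need to extract the dict first.

```python
def summarize(response):
    """Extract (text, type) entity pairs, keyword texts, and usage counts from an NLU result dict."""
    entities = [(e["text"], e["type"]) for e in response.get("entities", [])]
    keywords = [k["text"] for k in response.get("keywords", [])]
    usage = response.get("usage", {})
    return entities, keywords, usage.get("text_characters"), usage.get("text_units")


# Trimmed copy of the Out[6] result (only the first two keywords kept)
sample = {
    "entities": [
        {"count": 1, "relevance": 0.992572, "text": "Godot Engine", "type": "Company"},
        {"count": 1, "relevance": 0.338173, "text": "official", "type": "JobTitle"},
    ],
    "keywords": [
        {"relevance": 0.908999, "text": "Godot Engine"},
        {"relevance": 0.741794, "text": "open source"},
    ],
    "language": "en",
    "usage": {"features": 2, "text_characters": 282, "text_units": 1},
}

entities, keywords, chars, units = summarize(sample)
print(entities)      # [('Godot Engine', 'Company'), ('official', 'JobTitle')]
print(keywords)      # ['Godot Engine', 'open source']
print(chars, units)  # 282 1
```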

Wrap the call in a helper function and test it on the second text (texts[1])

In [7]:

def NLUA(text):
    """Analyze `text` with the features configured above."""
    return nlu.analyze(features=features, text=text)
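With the call wrapped in a single-argument helper, it is easy to run every text through the service and gather all keywords into one flat list. This is a sketch; collect_keywords and fake_analyze are hypothetical names, and fake_analyze is an offline stand-in that mimics the response shape so the example runs without credentials. Against the real service you would pass the helper itself, e.g. `collect_keywords(texts, NLUA)`.

```python
def collect_keywords(texts, analyze):
    """Run `analyze` on each text and flatten the returned keywords into rows.

    `analyze` is any callable that takes a string and returns a dict shaped
    like the NLU responses shown in this notebook.
    """
    rows = []
    for doc_id, text in enumerate(texts):
        response = analyze(text)
        for kw in response.get("keywords", []):
            rows.append({"doc": doc_id, "text": kw["text"], "relevance": kw["relevance"]})
    return rows


def fake_analyze(text):
    """Offline stand-in: treat the first word of the text as its only keyword."""
    return {"keywords": [{"text": text.split()[0], "relevance": 0.5}]}


print(collect_keywords(["Godot Engine docs", "Blender Game Engine"], fake_analyze))
# [{'doc': 0, 'text': 'Godot', 'relevance': 0.5}, {'doc': 1, 'text': 'Blender', 'relevance': 0.5}]
```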

In [8]:

NLUA(texts[1])

Out[8]:

{'entities': [],
 'keywords': [{'relevance': 0.973304, 'text': 'open source project'},
  {'relevance': 0.838422, 'text': 'Godot Engine'},
  {'relevance': 0.624781, 'text': 'class reference'},
  {'relevance': 0.60158, 'text': 'documentation team'},
  {'relevance': 0.369586, 'text': 'docs'},
  {'relevance': 0.32839, 'text': 'feedback'},
  {'relevance': 0.28266, 'text': 'community'},
  {'relevance': 0.282402, 'text': 'volunteers'},
  {'relevance': 0.274328, 'text': 'tutorials'}],
 'language': 'en',
 'usage': {'features': 2, 'text_characters': 372, 'text_units': 1}}
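This time no entities were found, but the keyword list is long and many entries have low relevance. A simple relevance cut-off keeps only the strongest keywords. This is a minimal sketch; top_keywords is a hypothetical helper, the data is a subset of the Out[8] keywords, and the 0.5 threshold is an arbitrary illustrative value, not a recommendation from the service.

```python
def top_keywords(response, min_relevance=0.5):
    """Return keyword texts with relevance >= min_relevance, highest relevance first."""
    kept = [k for k in response.get("keywords", []) if k["relevance"] >= min_relevance]
    kept.sort(key=lambda k: k["relevance"], reverse=True)
    return [k["text"] for k in kept]


# Subset of the Out[8] result
out8 = {
    "entities": [],
    "keywords": [
        {"relevance": 0.973304, "text": "open source project"},
        {"relevance": 0.838422, "text": "Godot Engine"},
        {"relevance": 0.624781, "text": "class reference"},
        {"relevance": 0.60158, "text": "documentation team"},
        {"relevance": 0.369586, "text": "docs"},
        {"relevance": 0.32839, "text": "feedback"},
    ],
}

print(top_keywords(out8))
# ['open source project', 'Godot Engine', 'class reference', 'documentation team']
```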

Test a larger text and save the keywords to a CSV file

Use the Python data analysis library pandas to save the keyword information to a CSV file, so it can be browsed conveniently in tools such as Excel.
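For comparison, the same export can be done without pandas using the standard library's csv module. This is a swapped-in alternative, not the method used in this notebook, and write_keywords_csv is a hypothetical helper. It writes with the utf-8-sig encoding, which prepends a byte-order mark that helps Excel recognize UTF-8 text such as the curly apostrophe in "Blender’s tool".

```python
import csv
import os
import tempfile


def write_keywords_csv(keywords, path):
    """Write NLU keyword dicts (with 'text' and 'relevance' keys) to a CSV file."""
    # newline="" is what the csv module expects for files it writes
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "relevance"])
        writer.writeheader()
        for kw in keywords:
            writer.writerow({"text": kw["text"], "relevance": kw["relevance"]})


# Two keywords taken from the Out[11] result, written to a temporary location
path = os.path.join(tempfile.gettempdir(), "keywords.csv")
write_keywords_csv(
    [{"relevance": 0.968481, "text": "Blender Game Engine"},
     {"relevance": 0.642034, "text": "Blender’s tool"}],
    path,
)
```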

In [9]:

import pandas as pd

In [10]:

text = '''
Introduction

The Blender Game Engine (BGE) is Blender’s tool for real time projects,
from architectural visualizations and simulations to games.

A word of warning, before you start any big or serious project with the Blender Game Engine,
you should note that it is currently not very supported and that there are plans for its retargeting and
refactoring that, in the very least, will break compatibility. For further information,
you should get in touch with the developers via mailing list or IRC and read the development roadmap.

Use Cases and Sample Games

Blender has its own built in Game Engine that allows you to create interactive 3D applications or simulations.
The major difference between Game Engine and the conventional Blender system is in the rendering process.
In the normal Blender engine, images and animations are built off-line – once rendered they cannot be modified.
Conversely, the Blender Game Engine renders scenes continuously in real-time,
and incorporates facilities for user interaction during the rendering process.'''

In [11]:

response = NLUA(text)
response

Out[11]:

{'entities': [{'count': 1,
   'relevance': 0.325141,
   'text': 'Blender',
   'type': 'Company'},
  {'count': 1, 'relevance': 0.109868, 'text': 'BGE', 'type': 'Organization'}],
 'keywords': [{'relevance': 0.968481, 'text': 'Blender Game Engine'},
  {'relevance': 0.672281, 'text': 'normal Blender engine'},
  {'relevance': 0.642034, 'text': 'Blender’s tool'},
  {'relevance': 0.585455, 'text': 'conventional Blender'},
  {'relevance': 0.544693, 'text': 'real time projects'},
  {'relevance': 0.516305, 'text': 'Engine renders scenes'},
  {'relevance': 0.50586, 'text': 'interactive 3D applications'},
  {'relevance': 0.503355, 'text': 'rendering process'},
  {'relevance': 0.440065, 'text': 'architectural visualizations'},
  {'relevance': 0.42788, 'text': 'development roadmap'},
  {'relevance': 0.415892, 'text': 'mailing list'},
  {'relevance': 0.415685, 'text': 'major difference'},
  {'relevance': 0.411704, 'text': 'Sample Games'},
  {'relevance': 0.411643, 'text': 'user interaction'},
  {'relevance': 0.358522, 'text': 'simulations'},
  {'relevance': 0.348923, 'text': 'retargeting'},
  {'relevance': 0.326577, 'text': 'compatibility'},
  {'relevance': 0.325289, 'text': 'warning'},
  {'relevance': 0.324025, 'text': 'Introduction'},
  {'relevance': 0.322093, 'text': 'BGE'},
  {'relevance': 0.321985, 'text': 'IRC'},
  {'relevance': 0.320995, 'text': 'plans'},
  {'relevance': 0.320874, 'text': 'touch'},
  {'relevance': 0.320867, 'text': 'information'},
  {'relevance': 0.320457, 'text': 'word'},
  {'relevance': 0.319156, 'text': 'developers'},
  {'relevance': 0.319064, 'text': 'Cases'}],
 'language': 'en',
 'usage': {'features': 2, 'text_characters': 1050, 'text_units': 1}}

In [12]:

keywords = pd.DataFrame(response['keywords'])
keywords

Out[12]:

    relevance                          text
0    0.968481           Blender Game Engine
1    0.672281         normal Blender engine
2    0.642034                Blender’s tool
3    0.585455          conventional Blender
4    0.544693            real time projects
5    0.516305         Engine renders scenes
6    0.505860   interactive 3D applications
7    0.503355             rendering process
8    0.440065  architectural visualizations
9    0.427880           development roadmap
10   0.415892                  mailing list
11   0.415685              major difference
12   0.411704                  Sample Games
13   0.411643              user interaction
14   0.358522                   simulations
15   0.348923                   retargeting
16   0.326577                 compatibility
17   0.325289                       warning
18   0.324025                  Introduction
19   0.322093                           BGE
20   0.321985                           IRC
21   0.320995                         plans
22   0.320874                         touch
23   0.320867                   information
24   0.320457                          word
25   0.319156                    developers
26   0.319064                         Cases

In [13]:

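# utf-8-sig adds a byte-order mark so Excel reads characters such as ’ correctly
keywords.to_csv('keywords.csv', encoding='utf-8-sig')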

Reposted from: https://my.oschina.net/u/3341527/blog/1584354
