IBM Watson Discovery Knowledge Graph

Last Updated: 2018-06-09

Knowledge graphs go beyond data and information by making connections within your data across documents and generating new knowledge. We provide the AI technology that automatically creates custom knowledge graphs from unstructured data by extracting and disambiguating entities and relationships, enriching the relationships using algorithmic techniques, and ranking the results using relevance algorithms. Knowledge Graphs can function as the "knowledge hub" for your company and can be used for enterprise search, summarization, recommendation engines, and other decision-making processes, for example, detecting fraud, waste, or abuse. Using a custom model (created in Knowledge Studio) in the Knowledge Graph creation process can help build domain-specific Knowledge Graphs (KGs) that apply to domains such as finance, technology, security, intelligence, healthcare, and many others. See Integrating with IBM Watson™ Knowledge Studio for more information about integrating Discovery with Knowledge Studio.

Two RESTful endpoints added to IBM Watson™ Discovery provide the ability to search for disambiguated, enriched entities and relations across documents in unstructured document collections. Search results can be rank-ordered by relevance or popularity. In addition to a search token, the APIs accept optional context words or passages to find more relevant entities and relations within the large, automatically created knowledge graph.

The following figure shows how Knowledge Graph fits in the current IBM Watson™ Discovery pipeline. Natural Language Understanding enrichments use a custom Knowledge Studio model (en-news) to extract entities and relations at the individual document level. During Knowledge Graph creation, implicit (automatic) entity resolution and graph expansion techniques are used to automatically create a connected graph of entities and relations across documents. In addition to the Knowledge Graph being created, the Knowledge Graph analytics service adds relevance-ranking techniques to return results.

Knowledge Graph Process

This connected graph of knowledge and ranking techniques facilitates the following use cases:

  • Disambiguated entities, found by using a fuzzy search token, type information (optional), and context (optional). Example: Searching for Steve in the context of Apple returns Steve Jobs on top, while searching for Steve in the context of Microsoft returns Steve Ballmer on top.
  • Relevance-ranked relationships, found by inputting a fuzzy search token and context (optional). Relevance ranking uses the global properties of the graph to surface more specific information. Example: Searching for relationships of Obama in the context of health returns Affordable Care Act and other related entities.
  • Inferences and aggregations across documents, made by querying for entities and relationships in a connected graph of knowledge. Some examples of such queries are: How is person X connected to person Y? What is the sphere of influence of person X?

Service requirements

During the beta release, Knowledge Graph functionality and the methods associated with it are only available for service instances that are subscribed to Advanced plans, Premium plans, and all dedicated environments.

This beta feature is currently supported in English only. See Language support for details.

Collection requirements

Discovery uses Entities and Relationships extracted from ingested documents to form the Knowledge Graph and allow entity and relationship queries.

Note: Entity similarity, Evidence, and Canonicalization and filtering are available in all collections. For collections created before 03-05-2018 (5 March 2018), you need to reingest your documents to use these features.

Note: Knowledge Graph can be used on private data collections only; it is not designed for use with Discovery News.

To use Knowledge Graph, your collection must be configured to meet specific requirements as follows:

  • Both the entities and relations enrichments must be specified for the fields which will utilize Knowledge Graph and each enrichment must use the same custom model. If the public model is used (available without the use of Knowledge Studio) it must be specified in the form of a custom model model="en-news".

  • The relations enrichments must be specified as follows:

     

    "relations": { "model": "en-news" }

  • The entities enrichment must be specified as follows and must also have the mentions, mention_types, and sentence_locations parameters specified:

     

    "entities": { "mentions": true, "mention_types": true, "sentence_locations": true, "model": "en-news" }

    Other optional enrichments options such as "sentiment": true can also be specified if desired. They will be stored in the discovery index as enrichments, but will not be used as nodes in the knowledge graph itself.
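The requirements above can be sketched in Python as a small helper that builds one enrichment entry in which the entities and relations options share the same model. Only the two option objects come from this page. The surrounding keys (source_field, destination_field, enrichment, options, features) and the helper name kg_enrichment are assumptions made for illustration and should be checked against the configuration reference before use.

```python
import json


def kg_enrichment(source_field="text", model="en-news"):
    """Build one enrichment entry whose entities and relations share a model.

    The outer keys (source_field, destination_field, enrichment, options,
    features) are assumptions about the configuration layout, not taken
    from this page.
    """
    if not model:
        raise ValueError("a model is required, e.g. 'en-news' for the public model")
    return {
        "source_field": source_field,
        "destination_field": "enriched_" + source_field,
        "enrichment": "natural_language_understanding",
        "options": {
            "features": {
                # Option objects exactly as required for Knowledge Graph.
                "entities": {
                    "mentions": True,
                    "mention_types": True,
                    "sentence_locations": True,
                    "model": model,
                },
                "relations": {"model": model},
            }
        },
    }


entry = kg_enrichment()
print(json.dumps(entry, indent=2))
```

Passing the model once and reusing it for both features makes it hard to violate the rule that each enrichment must use the same model.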

These options cannot be added using the Discovery tooling; a custom configuration must be uploaded using the API. A copy of the default configuration, modified to enrich the text field so that the collection can be used with Knowledge Graph and the public model, is available here.

Create a custom configuration as follows, after creating a Discovery service instance:

  1. Issue the following command to create an environment that is called my-first-environment. Replace {apikey_value} with the value of your service's API key:

     

    curl -X POST -u "apikey":"{apikey_value}" -H "Content-Type: application/json" -d '{ "name":"my-first-environment", "description":"exploring environments"}' "api/v1/environments?version=2017-11-07"

    The API returns information such as your environment ID, environment status, and how much storage your environment is using.

    You will need the {environment_id} that is returned; make sure to save that ID for later use.

  2. Next, create the custom configuration. This procedure assumes that you are uploading the one found here. If you want to build your own custom configuration, see the configuration reference.

     

    curl -X POST -u "apikey":"{apikey_value}" -H "Content-Type: application/json" -d @config-default-kg.json "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/configurations?version=2017-11-07"

    If you already have a custom configuration, and would like to update it and use it, use the {configuration ID} of your custom configuration in this command.

     

    curl -X PUT -u "apikey":"{apikey_value}" -H "Content-Type: application/json" -d @config-default-kg.json "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/configurations/{configuration ID}?version=2017-11-07"

  3. After the custom configuration has been uploaded, it can be used in any collection that you create. Any method of uploading documents can be used, as long as the custom configuration is specified. If you are unfamiliar with creating collections and uploading documents, see Getting started with the tooling. When you get to step 3, select Knowledge Graph Configuration instead of creating a new configuration.
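For readers who script these steps rather than run curl, the following is a minimal Python sketch of the create (POST) and update (PUT) configuration calls from step 2. It reuses the host and version string shown in the commands above and only builds the request object without sending it. The function name configuration_request, the sample key, the sample IDs, and the placeholder configuration body are hypothetical; in practice the body would be the contents of config-default-kg.json.

```python
import base64
import json
import urllib.request

# Host and version taken from the curl commands above.
BASE_URL = "https://gateway.watsonplatform.net/discovery/api"
VERSION = "2017-11-07"


def configuration_request(apikey, environment_id, config, configuration_id=None):
    """Return an unsent Request that creates (POST) or updates (PUT) a configuration."""
    url = f"{BASE_URL}/v1/environments/{environment_id}/configurations"
    if configuration_id is not None:
        url += f"/{configuration_id}"
    url += f"?version={VERSION}"
    # Equivalent of curl's -u "apikey":"{apikey_value}" (HTTP Basic auth).
    token = base64.b64encode(f"apikey:{apikey}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(config).encode("utf-8"),
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
        method="PUT" if configuration_id is not None else "POST",
    )


# Placeholder body for illustration only.
req = configuration_request("my-key", "env-123", {"name": "Knowledge Graph Configuration"})
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the URL and headers before any call reaches the service.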

Canonicalization and filtering

All entities in documents ingested on or after 5 March 2018 will automatically be normalized with canonical names derived from a public dictionary. In addition, any pronouns included in entities or relations (for example: he, she, they, or it) will automatically be filtered out before ingestion into Knowledge Graph. Documents ingested before 5 March 2018 will not include this level of canonicalization and filtering; you should create new collections and reingest your documents to use this feature.

When building an entities query or a relations query in Knowledge Graph, you can enter either the canonical name or the original text of the entity into the text field of the query_entities or query_relations method.

Entities queries

The beta release of the Knowledge Graph entities query supports context-based entity disambiguation and similarity queries. A Knowledge Graph entity query is performed by POSTing a JSON object to the v1/environments/{environment_id}/collections/{collection_id}/query_entities endpoint.

You can query entities using the API, or with the Discovery tooling. See Querying Knowledge Graph using the Discovery tooling for tooling information.

The Knowledge Graph entity query JSON object takes the following form:

 

{ "feature": "disambiguate", "entity": { "text": "Steve", "type": "Person", "exact": false }, "context": { "text": "iphone" }, "count": 10, "evidence_count": 0 }

  • "feature": string required - the entity query feature to be used. Supported features are: disambiguate and similar_entities.
  • "entity": {} required - an object that contains the entity information to disambiguate.
    • "text": string required - the entity text that will be disambiguated
    • "type": string optional - the optional entity type to disambiguate against, if not specified, all types are included.
    • "exact": boolean optional - If false, implicit disambiguation is performed. Implicit disambiguation will use the top one disambiguated entity for each input entity object. Should be set to false for "feature": "disambiguate". The default is false.
  • "context": {} optional - an optional object that includes contextual requirements for the disambiguation.
    • "text": string optional - entity text to provide context for the queried entity and rank based on that association. For example, if you wanted to query the city of London in England your query would look for London with the context of England. Input can be partial names or large passages containing relevant entity terms. Multiple terms can be passed together.
  • "count": INT optional - The number of disambiguated entities to return. The default is 10. The maximum is 1000.
  • "evidence_count": INT optional - The number of evidence instances to return for each identified entity. The default is 0. The maximum value for the evidence_count field is 10,000 divided by the number specified in the count field. See the Evidence section of this page for a detailed description and examples.
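The field rules above can be sketched as a small Python builder that assembles the entity query object and rejects values outside the documented limits before anything is sent. The function name build_entity_query is hypothetical, and the lower bound of 1 for count is an assumption; the supported features, the maximum count of 1000, and the evidence limit of 10,000 divided by count come from the list above.

```python
import json

MAX_COUNT = 1000             # documented maximum for "count"
MAX_TOTAL_EVIDENCE = 10000   # evidence_count may not exceed 10,000 / count


def build_entity_query(text, feature="disambiguate", entity_type=None,
                       context=None, count=10, evidence_count=0):
    """Assemble a query_entities body, validating the documented limits."""
    if feature not in ("disambiguate", "similar_entities"):
        raise ValueError("feature must be 'disambiguate' or 'similar_entities'")
    if not 1 <= count <= MAX_COUNT:  # lower bound of 1 is an assumption
        raise ValueError("count must be between 1 and 1000")
    if evidence_count < 0 or evidence_count * count > MAX_TOTAL_EVIDENCE:
        raise ValueError("evidence_count must not exceed 10,000 / count")

    entity = {"text": text, "exact": False}
    if entity_type is not None:
        entity["type"] = entity_type
    query = {"feature": feature, "entity": entity,
             "count": count, "evidence_count": evidence_count}
    if context:
        query["context"] = {"text": context}
    return query


print(json.dumps(build_entity_query("Steve", entity_type="Person", context="iphone")))
```

The resulting dictionary serializes to the same JSON form shown above and would be POSTed to the query_entities endpoint.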

The query returns results of the following form:

 

{ "entities": [ { "text": "Steve Jobs", "type": "PERSON" }, { "text": "Steve Wozniak", "type": "PERSON" } ] }

If no match is found, the following JSON object is returned:

 

{ "entities": [] }

Entity disambiguation

Knowledge Graph entities query provides context-based entity disambiguation. Based on the entity text provided and optional context text, disambiguation identifies unique entities and returns a list of the entities ranked based on the context information.

An entity disambiguation query is requested by specifying "disambiguate" as the value for the "feature": field in the knowledge graph query object.

For example, disambiguating the entity text Steve in the context of iphone could result in Steve Jobs and Steve Wozniak being returned.

Entity similarity

Knowledge Graph entities query provides context-based entity similarity detection. Based on the entity text provided and optional context text, similar_entities identifies unique entities and returns a list of the entities ranked based on the context information.

An entity similarity query is requested by specifying "similar_entities" as the value for the "feature": field in the knowledge graph query object.

For example, if you looked for similar entities to Ford in the context car, similar entity results could include GM, Toyota, and Nissan.

Relations queries

Knowledge Graph relations queries support finding the most relevant relationships based on input entities using implicit entity disambiguation, context-based relationships, sorting by relevance score or mention count, and filtering by types and document IDs.

You can query relations using the API, or with the Discovery tooling. See Querying Knowledge Graph using the Discovery tooling for tooling information.

A Knowledge Graph relations query is performed by POSTing a JSON object to the v1/environments/{environment_id}/collections/{collection_id}/query_relations endpoint. The Knowledge Graph relations query JSON object takes the following form:

 

{ "entities": [ { "text": "Steve Jobs", "type": "PERSON", "exact": true } ], "context": { "text": "iphone" }, "sort": "score", "filter": { "relation_types": { "exclude": ["colocation"], "include": ["locatedAt", "employedBy", "managerOf", "founderOf"] }, "entity_types": { "exclude": ["EVENT"], "include": ["PERSON", "GPE", "ORGANIZATION"] }, "document_ids": ["b95df4c1-d00f-4771-abb2-a52baea0444a", "ad340635-bf3e-47a5-bea5-5e778f600c32"] }, "count": 10, "evidence_count": 0 }

  • "entities": [] required - an array that contains the entities for which relationships will be queried. All neighbor relationships are returned if only one entity object is defined. When more than one entity object is defined, mutual pairwise relations are returned. Mutual pairwise relations are the direct relations between the input entities rather than the relations with all entity neighbors. Each entity object contains:
    • "text": string required - the entity text.
    • "type": string optional - the optional entity type. This field is required if "exact" is true.
    • "exact": boolean optional - If false, implicit disambiguation is performed. Implicit disambiguation will use the top one disambiguated entity for each input entity object. The default is false.
  • "context": {} optional - an optional object that includes contextual requirements.
    • "text": string optional - Entity text to provide context for the queried entity and rank based on that association. For example, if you wanted to query the city of London in England your query would look for London with the context of England. Input can be partial names or large passages containing relevant entity terms. Multiple terms can be passed together.
  • "sort": string optional - the sorting method for the relationships, can be score or frequency. The default is score. score is based on relevance of relations and neighbors to the input entity and relevance to context if context is provided. frequency is the number of unique times each relation is identified.
  • "filter": {} optional - an object containing the relation types, entity types, and specific documents to filter by for this query. By default nothing is excluded.
    • "relation_types": {} optional - a list of relation types to filter.
      • "exclude": [] optional - a comma-separated list of relation types to exclude from the query.
      • "include": [] optional - a comma-separated list of relation types to explicitly include in the query. If specified, all other types are considered excluded.
    • "entity_types": {} optional - a list of entity types to filter neighbors. Not applicable for multi-entity input because no new neighbor is returned.
      • "exclude": [] optional - a comma-separated list of entity types to exclude from the query.
      • "include": [] optional - a comma-separated list of entity types to explicitly include in the query. If specified, all other types are considered excluded.
    • "document_ids": [] optional - a comma-separated list of documents on which to perform the relationship query.
  • "count": INT optional - The number of relations to return. The default is 10. The maximum is 1000.
  • "evidence_count": INT optional - The number of evidence instances to return for each identified relation. The default is 0. The maximum value for the evidence_count field is 10,000 divided by the number specified in the count field. See the Evidence section of this page for a detailed description and examples.
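As an illustration of how these fields fit together, the following Python sketch builds a relations query object and enforces two rules from the list above: "type" is required when "exact" is true, and "sort" must be score or frequency. The helper names entity_ref and build_relations_query are hypothetical, and the sketch is one possible way to assemble the body, not a client supplied by the service.

```python
import json


def entity_ref(text, entity_type=None, exact=False):
    """One entry of the "entities" array."""
    if exact and entity_type is None:
        raise ValueError('"type" is required when "exact" is true')
    ref = {"text": text, "exact": exact}
    if entity_type is not None:
        ref["type"] = entity_type
    return ref


def build_relations_query(entities, context=None, sort="score",
                          relation_types=None, entity_types=None,
                          document_ids=None, count=10, evidence_count=0):
    """Assemble a query_relations body; filters are added only when given."""
    if not entities:
        raise ValueError("at least one entity is required")
    if sort not in ("score", "frequency"):
        raise ValueError("sort must be 'score' or 'frequency'")
    if count > 1000 or count * evidence_count > 10000:
        raise ValueError("count or evidence_count exceeds the documented limits")

    query = {"entities": list(entities), "sort": sort,
             "count": count, "evidence_count": evidence_count}
    if context:
        query["context"] = {"text": context}
    filters = {}
    if relation_types:
        filters["relation_types"] = relation_types  # {"include": [...], "exclude": [...]}
    if entity_types:
        filters["entity_types"] = entity_types
    if document_ids:
        filters["document_ids"] = list(document_ids)
    if filters:
        query["filter"] = filters
    return query


query = build_relations_query(
    [entity_ref("Steve Jobs", "PERSON", exact=True)],
    context="iphone",
    relation_types={"include": ["founderOf", "employedBy"]},
)
print(json.dumps(query, indent=2))
```

Passing two or more entity_ref entries would request mutual pairwise relations, in which case the entity_types filter has no effect, as noted above.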

The query returns results in the following form:

 

{ "relations": [ { "type": "FOUNDEROF", "frequency": 7, "arguments": [ { "entities": [ { "type": "PERSON", "text": "Steve Jobs" } ] }, { "entities": [ { "type": "ORGANIZATION", "text": "Apple" } ] } ] } ] }

In each object in the relationship array, an arguments array is returned containing a pair of entities arrays, the first being the source or subject and the second being the target or object of the relationship.

If no match is found, the following JSON object is returned:

 

{ "relations": [] }
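Because each relation carries its source and target as a pair of entities arrays, a response can be flattened into simple (source, relation type, target, frequency) tuples. The sketch below does this for the two response shapes shown above; the function name relation_triples is hypothetical, and the code assumes every arguments array holds exactly two elements, as described.

```python
def relation_triples(response):
    """Flatten a query_relations response into (source, type, target, frequency)."""
    triples = []
    for relation in response.get("relations", []):
        # First argument is the source (subject), second is the target (object).
        source, target = relation["arguments"]
        for src in source["entities"]:
            for tgt in target["entities"]:
                triples.append(
                    (src["text"], relation["type"], tgt["text"], relation.get("frequency"))
                )
    return triples


sample = {
    "relations": [
        {
            "type": "FOUNDEROF",
            "frequency": 7,
            "arguments": [
                {"entities": [{"type": "PERSON", "text": "Steve Jobs"}]},
                {"entities": [{"type": "ORGANIZATION", "text": "Apple"}]},
            ],
        }
    ]
}

print(relation_triples(sample))             # [('Steve Jobs', 'FOUNDEROF', 'Apple', 7)]
print(relation_triples({"relations": []}))  # []
```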

Evidence

For some entity or relationship queries it may be valuable to understand where the connections were identified. Evidence of the connections will let you reference the original document, clarify the results, or further disambiguate as appropriate. Beginning with collections created after 03-05-2018, both the query_entities and query_relations endpoints have the option of providing evidence in the returned results. This feature is available for collections created before 03-05-2018, but documents will need to be reingested to use this feature on those older collections.

Evidence is returned by adding the "evidence_count": INT field to the query object. This number represents the number of evidence items that will be returned per response item. For example, if you specify a "count": of 5 response items, and "evidence_count": 2, the response would contain a total of 10 evidence items (2 per response item). The maximum number of evidence items returned in total for a single query is 10,000.

In query_entities responses, each object in the entities array will contain the specified number of evidence objects. These objects include the document_id of the document where the evidence was found, which field it was located in, the location of the evidence within that field, and the exact location of the identified entity.

 

{ "text": "Steve Jobs", "type": "Person", "evidence": [ { "document_id": "cb77ce6b-bb93-42a0-8643-dfb523e14da8", "field": "description", "start_offset": 305, "end_offset": 392, "entities": [ { "type": "Person", "text": "Steve Jobs", "start_offset": 311, "end_offset": 321 } ] } ] }

In query_relations responses, each object in the relations array will contain the specified number of evidence objects. The returned evidence is structured the same as in query_entities, with the locations of all related entities specified:

 

{ "type": "founderOf", "frequency": 7, "arguments": [ { "entities": [ { "type": "Person", "text": "Steve Jobs" } ] }, { "entities": [ { "type": "Organization", "text": "Apple" } ] } ], "evidence": [ { "document_id": "b95df4c1-d00f-4771-abb2-a52baea0444a", "field": "text", "start_offset": 243, "end_offset": 303, "entities": [ { "type": "Organization", "text": "Apple", "start_offset": 293, "end_offset": 298 }, { "type": "Person", "text": "Steve Jobs", "start_offset": 243, "end_offset": 253 } ] } ] }
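One practical use of the offsets is locating each entity mention inside its evidence passage. The sketch below converts the entity offsets from the relations evidence above into positions relative to the start of the passage. It assumes that all offsets are character positions within the same field and that end_offset is exclusive, which is consistent with the ten-character span reported for "Steve Jobs" but is not stated on this page. The function name mention_spans is hypothetical.

```python
def mention_spans(evidence):
    """Return (entity text, start, end) relative to the evidence passage.

    Assumes character offsets into the same field, with an exclusive end.
    """
    base = evidence["start_offset"]
    return [
        (entity["text"], entity["start_offset"] - base, entity["end_offset"] - base)
        for entity in evidence["entities"]
    ]


evidence = {
    "document_id": "b95df4c1-d00f-4771-abb2-a52baea0444a",
    "field": "text",
    "start_offset": 243,
    "end_offset": 303,
    "entities": [
        {"type": "Organization", "text": "Apple", "start_offset": 293, "end_offset": 298},
        {"type": "Person", "text": "Steve Jobs", "start_offset": 243, "end_offset": 253},
    ],
}

print(mention_spans(evidence))  # [('Apple', 50, 55), ('Steve Jobs', 0, 10)]
```

With the passage text retrieved from the named field of the document, these relative spans could be used to highlight the mentions.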

Querying Knowledge Graph using the Discovery tooling

Those with service instances subscribed to the Advanced plan can query private collections with Knowledge Graph using the Discovery tooling.

To access Knowledge Graph querying in the Discovery tooling:

  1. Click the Query icon to open the query page.
  2. Select your collection and click Get started.
  3. On the Build queries screen, choose the Knowledge graph tab, then Entities or Relationships.

Note: Not all Knowledge Graph features are available when using the Discovery tooling.

https://console.bluemix.net/docs/services/discovery/building-kg.html#watson-discovery-knowledge-graph

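### Answer 1:

Knowledge graph embedding is a technique that maps the entities and relations of a knowledge graph into a low-dimensional vector space. It helps us better understand and use the information in a knowledge graph, such as the similarity between entities and the strength of relations. Knowledge graph embedding is widely used in natural language processing, recommender systems, question answering systems, and other fields.

### Answer 2:

Knowledge graph embedding is the process of representing the entities and relations of a knowledge graph as low-dimensional vectors so that computers can process and analyze the data more easily. A knowledge graph is usually expressed as triples of the form (head entity, relation, tail entity). This representation has drawbacks: the data is sparse, complex semantic reasoning is not possible, and it is poorly suited to large-scale machine learning.

Knowledge graph embedding methods place entities and relations in a low-dimensional vector space so that the similarity among entities and among relations can be quantified. Common methods include TransE, TransR, and TransH. These methods embed entities and relations while preserving a degree of semantic and structural consistency, which makes inference over entities and relations possible.

The technique applies to many fields, such as natural language processing, recommender systems, and question answering systems. In natural language processing, embedding entities and relations supports understanding of and inference over entity relationships, which improves the accuracy of question answering systems. In recommender systems, users and items can be embedded in a low-dimensional vector space so that the similarity between users and items can be computed, which improves recommendation quality.

In short, knowledge graph embedding addresses the problem of representing entities and relations and improves a computer's ability to process and analyze knowledge graph data.

### Answer 3:

Knowledge graph embedding is a technique for encoding the entities, relations, and other complex structures of a knowledge graph. A knowledge graph is a graph-structured database for storing and presenting knowledge about the world. It consists of entities (for example, people, places, and events) and the relations between them (for example, owns, was born in, and is). Embedding makes a knowledge graph easier for machine learning algorithms to process and understand.

The traditional approach converts the knowledge graph into tuples for processing, but this approach suffers from sparsity and does not lend itself to computation. Knowledge graph embedding addresses this by placing entities and relations in a continuous vector space, mapping nonlinear patterns in a high-dimensional space to a low-dimensional one so that distance computation and relational reasoning become practical. The embedding vectors preserve the semantic relationships between entities and relations in the knowledge graph.

Applications span many fields, such as natural language processing, computer vision, and recommender systems. In natural language processing, embedding places words and phrases in a vector space so that their similarity can be computed. In recommender systems, embedding places users and items in a vector space so that distance and similarity between them can be established, which improves personalized recommendation.

Entity embedding methods are currently divided mainly into methods based on matrix factorization, methods based on hop-count prediction, and methods based on neural networks. Relation embedding methods are divided mainly into rotation-based models, distance-based models, and neural-network-based models. All of these learn embedded representations of entities and relations in order to support semantic modeling, reasoning, and knowledge graph completion.

In short, knowledge graph embedding is an efficient way to place the entities and relations of a knowledge graph in a vector space, and its applications have reached many fields.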