GraphRAG Study Notes (4)

7. Localizing the Prompts into Chinese and Testing

Prompt Tuning ⚙️

7.1 Entity

7.1.1 Relationship extraction

File location

graphrag/index/graph/extractors/graph/prompts.py

Original text

# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License

"""A file containing prompts definition."""

GRAPH_EXTRACTION_PROMPT = """
-Goal-
Given a text document that is potentially relevant to this activity and a list of entity types, identify all entities of those types from the text and all relationships among the identified entities.
 
-Steps-
1. Identify all entities. For each identified entity, extract the following information:
- entity_name: Name of the entity, capitalized
- entity_type: One of the following types: [{entity_types}]
- entity_description: Comprehensive description of the entity's attributes and activities
Format each entity as ("entity"{tuple_delimiter}<entity_name>{tuple_delimiter}<entity_type>{tuple_delimiter}<entity_description>)
 
2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are *clearly related* to each other.
For each pair of related entities, extract the following information:
- source_entity: name of the source entity, as identified in step 1
- target_entity: name of the target entity, as identified in step 1
- relationship_description: explanation as to why you think the source entity and the target entity are related to each other
- relationship_strength: a numeric score indicating strength of the relationship between the source entity and target entity
 Format each relationship as ("relationship"{tuple_delimiter}<source_entity>{tuple_delimiter}<target_entity>{tuple_delimiter}<relationship_description>{tuple_delimiter}<relationship_strength>)
 
3. Return output in English as a single list of all the entities and relationships identified in steps 1 and 2. Use **{record_delimiter}** as the list delimiter.
 
4. When finished, output {completion_delimiter}
 
######################
-Examples-
######################
Example 1:
Entity_types: ORGANIZATION,PERSON
Text:
The Verdantis's Central Institution is scheduled to meet on Monday and Thursday, with the institution planning to release its latest policy decision on Thursday at 1:30 p.m. PDT, followed by a press conference where Central Institution Chair Martin Smith will take questions. Investors expect the Market Strategy Committee to hold its benchmark interest rate steady in a range of 3.5%-3.75%.
######################
Output:
("entity"{tuple_delimiter}CENTRAL INSTITUTION{tuple_delimiter}ORGANIZATION{tuple_delimiter}The Central Institution is the Federal Reserve of Verdantis, which is setting interest rates on Monday and Thursday)
{record_delimiter}
("entity"{tuple_delimiter}MARTIN SMITH{tuple_delimiter}PERSON{tuple_delimiter}Martin Smith is the chair of the Central Institution)
{record_delimiter}
("entity"{tuple_delimiter}MARKET STRATEGY COMMITTEE{tuple_delimiter}ORGANIZATION{tuple_delimiter}The Central Institution committee makes key decisions about interest rates and the growth of Verdantis's money supply)
{record_delimiter}
("relationship"{tuple_delimiter}MARTIN SMITH{tuple_delimiter}CENTRAL INSTITUTION{tuple_delimiter}Martin Smith is the Chair of the Central Institution and will answer questions at a press conference{tuple_delimiter}9)
{completion_delimiter}

######################
Example 2:
Entity_types: ORGANIZATION
Text:
TechGlobal's (TG) stock skyrocketed in its opening day on the Global Exchange Thursday. But IPO experts warn that the semiconductor corporation's debut on the public markets isn't indicative of how other newly listed companies may perform.

TechGlobal, a formerly public company, was taken private by Vision Holdings in 2014. The well-established chip designer says it powers 85% of premium smartphones.
######################
Output:
("entity"{tuple_delimiter}TECHGLOBAL{tuple_delimiter}ORGANIZATION{tuple_delimiter}TechGlobal is a stock now listed on the Global Exchange which powers 85% of premium smartphones)
{record_delimiter}
("entity"{tuple_delimiter}VISION HOLDINGS{tuple_delimiter}ORGANIZATION{tuple_delimiter}Vision Holdings is a firm that previously owned TechGlobal)
{record_delimiter}
("relationship"{tuple_delimiter}TECHGLOBAL{tuple_delimiter}VISION HOLDINGS{tuple_delimiter}Vision Holdings formerly owned TechGlobal from 2014 until present{tuple_delimiter}5)
{completion_delimiter}

######################
Example 3:
Entity_types: ORGANIZATION,GEO,PERSON
Text:
Five Aurelians jailed for 8 years in Firuzabad and widely regarded as hostages are on their way home to Aurelia.

The swap orchestrated by Quintara was finalized when $8bn of Firuzi funds were transferred to financial institutions in Krohaara, the capital of Quintara.

The exchange initiated in Firuzabad's capital, Tiruzia, led to the four men and one woman, who are also Firuzi nationals, boarding a chartered flight to Krohaara.

They were welcomed by senior Aurelian officials and are now on their way to Aurelia's capital, Cashion.

The Aurelians include 39-year-old businessman Samuel Namara, who has been held in Tiruzia's Alhamia Prison, as well as journalist Durke Bataglani, 59, and environmentalist Meggie Tazbah, 53, who also holds Bratinas nationality.
######################
Output:
("entity"{tuple_delimiter}FIRUZABAD{tuple_delimiter}GEO{tuple_delimiter}Firuzabad held Aurelians as hostages)
{record_delimiter}
("entity"{tuple_delimiter}AURELIA{tuple_delimiter}GEO{tuple_delimiter}Country seeking to release hostages)
{record_delimiter}
("entity"{tuple_delimiter}QUINTARA{tuple_delimiter}GEO{tuple_delimiter}Country that negotiated a swap of money in exchange for hostages)
{record_delimiter}
{record_delimiter}
("entity"{tuple_delimiter}TIRUZIA{tuple_delimiter}GEO{tuple_delimiter}Capital of Firuzabad where the Aurelians were being held)
{record_delimiter}
("entity"{tuple_delimiter}KROHAARA{tuple_delimiter}GEO{tuple_delimiter}Capital city in Quintara)
{record_delimiter}
("entity"{tuple_delimiter}CASHION{tuple_delimiter}GEO{tuple_delimiter}Capital city in Aurelia)
{record_delimiter}
("entity"{tuple_delimiter}SAMUEL NAMARA{tuple_delimiter}PERSON{tuple_delimiter}Aurelian who spent time in Tiruzia's Alhamia Prison)
{record_delimiter}
("entity"{tuple_delimiter}ALHAMIA PRISON{tuple_delimiter}GEO{tuple_delimiter}Prison in Tiruzia)
{record_delimiter}
("entity"{tuple_delimiter}DURKE BATAGLANI{tuple_delimiter}PERSON{tuple_delimiter}Aurelian journalist who was held hostage)
{record_delimiter}
("entity"{tuple_delimiter}MEGGIE TAZBAH{tuple_delimiter}PERSON{tuple_delimiter}Bratinas national and environmentalist who was held hostage)
{record_delimiter}
("relationship"{tuple_delimiter}FIRUZABAD{tuple_delimiter}AURELIA{tuple_delimiter}Firuzabad negotiated a hostage exchange with Aurelia{tuple_delimiter}2)
{record_delimiter}
("relationship"{tuple_delimiter}QUINTARA{tuple_delimiter}AURELIA{tuple_delimiter}Quintara brokered the hostage exchange between Firuzabad and Aurelia{tuple_delimiter}2)
{record_delimiter}
("relationship"{tuple_delimiter}QUINTARA{tuple_delimiter}FIRUZABAD{tuple_delimiter}Quintara brokered the hostage exchange between Firuzabad and Aurelia{tuple_delimiter}2)
{record_delimiter}
("relationship"{tuple_delimiter}SAMUEL NAMARA{tuple_delimiter}ALHAMIA PRISON{tuple_delimiter}Samuel Namara was a prisoner at Alhamia prison{tuple_delimiter}8)
{record_delimiter}
("relationship"{tuple_delimiter}SAMUEL NAMARA{tuple_delimiter}MEGGIE TAZBAH{tuple_delimiter}Samuel Namara and Meggie Tazbah were exchanged in the same hostage release{tuple_delimiter}2)
{record_delimiter}
("relationship"{tuple_delimiter}SAMUEL NAMARA{tuple_delimiter}DURKE BATAGLANI{tuple_delimiter}Samuel Namara and Durke Bataglani were exchanged in the same hostage release{tuple_delimiter}2)
{record_delimiter}
("relationship"{tuple_delimiter}MEGGIE TAZBAH{tuple_delimiter}DURKE BATAGLANI{tuple_delimiter}Meggie Tazbah and Durke Bataglani were exchanged in the same hostage release{tuple_delimiter}2)
{record_delimiter}
("relationship"{tuple_delimiter}SAMUEL NAMARA{tuple_delimiter}FIRUZABAD{tuple_delimiter}Samuel Namara was a hostage in Firuzabad{tuple_delimiter}2)
{record_delimiter}
("relationship"{tuple_delimiter}MEGGIE TAZBAH{tuple_delimiter}FIRUZABAD{tuple_delimiter}Meggie Tazbah was a hostage in Firuzabad{tuple_delimiter}2)
{record_delimiter}
("relationship"{tuple_delimiter}DURKE BATAGLANI{tuple_delimiter}FIRUZABAD{tuple_delimiter}Durke Bataglani was a hostage in Firuzabad{tuple_delimiter}2)
{completion_delimiter}

######################
-Real Data-
######################
Entity_types: {entity_types}
Text: {input_text}
######################
Output:"""

CONTINUE_PROMPT = "MANY entities and relationships were missed in the last extraction. Remember to ONLY emit entities that match any of the previously extracted types. Add them below using the same format:\n"
LOOP_PROMPT = "It appears some entities and relationships may have still been missed.  Answer YES | NO if there are still entities or relationships that need to be added.\n"
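The record format requested by the prompt above can be parsed with a few lines of Python once the delimiter placeholders are filled in. The following is a minimal sketch, not GraphRAG's own parser: the delimiter values `<|>`, `##` and `<|COMPLETE|>` and the helper name `parse_records` are assumptions made for illustration.

```python
import re

# Assumed delimiter values for illustration; use whatever you pass for
# {tuple_delimiter}, {record_delimiter} and {completion_delimiter}.
TUPLE_DELIMITER = "<|>"
RECORD_DELIMITER = "##"
COMPLETION_DELIMITER = "<|COMPLETE|>"


def parse_records(raw: str) -> tuple[list[dict], list[dict]]:
    """Split raw model output into entity dicts and relationship dicts."""
    entities, relationships = [], []
    body = raw.replace(COMPLETION_DELIMITER, "")
    for record in body.split(RECORD_DELIMITER):
        # Keep only the text between the outermost parentheses of a record.
        match = re.search(r"\((.*)\)", record.strip(), re.DOTALL)
        if match is None:
            continue  # empty or malformed record
        fields = [f.strip().strip('"') for f in match.group(1).split(TUPLE_DELIMITER)]
        if fields[0] == "entity" and len(fields) == 4:
            entities.append(
                {"name": fields[1], "type": fields[2], "description": fields[3]}
            )
        elif fields[0] == "relationship" and len(fields) == 5:
            try:
                strength = float(fields[4])
            except ValueError:
                continue  # skip records whose strength is not numeric
            relationships.append(
                {
                    "source": fields[1],
                    "target": fields[2],
                    "description": fields[3],
                    "strength": strength,
                }
            )
    return entities, relationships


sample = (
    '("entity"<|>MARTIN SMITH<|>PERSON<|>Martin Smith is the chair of the Central Institution)\n'
    "##\n"
    '("entity"<|>CENTRAL INSTITUTION<|>ORGANIZATION<|>The Central Institution sets interest rates)\n'
    "##\n"
    '("relationship"<|>MARTIN SMITH<|>CENTRAL INSTITUTION<|>Martin Smith chairs the Central Institution<|>9)\n'
    "<|COMPLETE|>"
)
entities, relationships = parse_records(sample)
```

A parser like this is also a quick way to test a translated prompt: if the model's reply no longer splits into four-field entities and five-field relationships, the translation probably damaged the format instructions or the examples.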

Translated result

"""包含prompt定义的文件"""

GRAPH_EXTRACTION_PROMPT = """
-目标-
给定一个需要进行实体提取的文本文档和一个实体类型列表,从文本中识别出这些类型的所有实体以及已识别实体之间的所有关系。
 
-步骤-
1. 识别所有实体。对于每个已识别的实体,提取以下信息:
- entity_name:实体的名称,若为英文则需要大写
- entity_type:以下类型之一:[{entity_types}]
- entity_description: 实体属性和相关活动的全面描述
将每个实体格式化为("entity"{tuple_delimiter}<entity_name>{tuple_delimiter}<entity_type>{tuple_delimiter}<entity_description>)
 
2. 从步骤 1 中确定的实体中,找出彼此*明确相关*的所有配对(source_entity, target_entity)。
对于每一对相关实体,提取以下信息:
- source_entity:源实体的名称,如步骤 1 所确定的
- target_entity:在步骤 1 中确定的目标实体名称
- relationship_description:解释您认为源实体和目标实体相互关联的原因
- relationship_strength:表示源实体和目标实体之间关系强度的数值
 将每个关系格式化为 ("relationship"{tuple_delimiter}<source_entity>{tuple_delimiter}<target_entity>{tuple_delimiter}<relationship_description>{tuple_delimiter}<relationship_strength>) 
 
3. 返回中文输出,即步骤 1 和 2 中确定的所有实体和关系的单一列表。使用 **{record_delimiter}** 作为列表分隔符。
 
4. 完成后,输出{completion_delimiter}
 
######################
-示例-
######################

示例 1:
entity_type: ORGANIZATION(组织),PERSON(人物)
text:
维丹提斯中央机构定于周一和周四召开会议,该机构计划于太平洋夏令时间周四下午 1:30 公布最新政策决定,随后召开新闻发布会,中央机构主席马丁-史密斯将接受提问。投资者预计市场战略委员会将把基准利率稳定在 3.5%-3.75% 之间。
######################
输出:
("entity"{tuple_delimiter}中央机构{tuple_delimiter}ORGANIZATION{tuple_delimiter}中央机构是维丹提斯联邦储备局,将于周一和周四设定利率)
{record_delimiter}
("entity"{tuple_delimiter}马丁-史密斯{tuple_delimiter}PERSON{tuple_delimiter}马丁-史密斯是中央机构的主席)
{record_delimiter}
("entity"{tuple_delimiter}市场战略委员会{tuple_delimiter}ORGANIZATION{tuple_delimiter}中央机构委员会对利率和维丹提斯货币供应量的增长做出关键决定)
{record_delimiter}
("relationship"{tuple_delimiter}马丁-史密斯{tuple_delimiter}中央机构{tuple_delimiter}马丁-史密斯是中央机构主席,将在新闻发布会上回答问题{tuple_delimiter}9)
{completion_delimiter}

######################
示例 2:
entity_type:ORGANIZATION
text:
TechGlobal (TG) 周四在全球交易所开盘当天股价暴涨。但 IPO 专家警告说,这家半导体公司在公开市场的首次亮相并不代表其他新上市公司的表现。

TechGlobal 曾是一家上市公司,2014 年被 Vision Holdings 私有化。这家历史悠久的芯片设计公司表示,它为 85% 的高端智能手机提供动力。
######################
输出:
("entity"{tuple_delimiter}TECHGLOBAL{tuple_delimiter}ORGANIZATION{tuple_delimiter}TechGlobal 是一家在全球交易所上市的公司,为 85% 的高端智能手机提供动力。)
{record_delimiter}
("entity"{tuple_delimiter}VISION HOLDINGS{tuple_delimiter}ORGANIZATION{tuple_delimiter}Vision Holdings是一家曾拥有TechGlobal 的公司)
{record_delimiter}
("relationship"{tuple_delimiter}TECHGLOBAL{tuple_delimiter}VISION HOLDINGS{tuple_delimiter}Vision Holdings从2014年至今拥有TechGlobal{tuple_delimiter}5)
{completion_delimiter}

######################
示例 3:
entity_type: ORGANIZATION,GEO(地点),PERSON
text:
五名被关押在菲鲁扎巴德长达八年之久、被广泛视为人质的奥雷利亚人正在返回奥雷利亚的途中。

当 80 亿美元的菲鲁兹资金被转移到金塔拉首都克罗哈拉的金融机构时,金塔拉策划的交换交易最终完成。

在菲鲁扎巴德首府蒂鲁兹亚启动的交换活动使四男一女(他们也是菲鲁兹国民)登上了飞往克罗哈拉的包机。

他们受到了奥雷利亚高级官员的欢迎,目前正在前往奥雷利亚首都卡西的途中。

这些奥雷利亚人包括 39 岁的商人塞缪尔-纳马拉(Samuel Namara),他一直被关押在蒂鲁兹亚的阿尔哈米亚监狱(Alhamia Prison),以及 59 岁的记者杜克-巴塔格拉尼(Durke Bataglani)和 53 岁的环保主义者梅吉-塔兹巴(Meggie Tazbah),后者也拥有布拉迪纳斯国籍。
######################
输出:
("entity"{tuple_delimiter}菲鲁扎巴德{tuple_delimiter}GEO{tuple_delimiter}菲鲁扎巴德扣留奥雷利亚人作为人质)
{record_delimiter}
("entity"{tuple_delimiter}奥雷利亚{tuple_delimiter}GEO{tuple_delimiter}寻求释放人质的国家)
{record_delimiter}
("entity"{tuple_delimiter}金塔拉{tuple_delimiter}GEO{tuple_delimiter}谈判用金钱交换人质的国家)
{record_delimiter}
("entity"{tuple_delimiter}蒂鲁兹亚{tuple_delimiter}GEO{tuple_delimiter}菲鲁扎巴德首府,奥雷利亚人被关押在那里)
{record_delimiter}
("entity"{tuple_delimiter}克罗哈拉{tuple_delimiter}GEO{tuple_delimiter}金塔拉的首府城市)
{record_delimiter}
("entity"{tuple_delimiter}卡西{tuple_delimiter}GEO{tuple_delimiter}奥雷利亚的首府城市)
{record_delimiter}
("entity"{tuple_delimiter}塞缪尔-纳马拉{tuple_delimiter}PERSON{tuple_delimiter}奥雷利亚人,曾在蒂鲁兹亚的阿尔哈米亚监狱服刑)
{record_delimiter}
("entity"{tuple_delimiter}阿尔哈米亚监狱{tuple_delimiter}GEO{tuple_delimiter}蒂鲁兹亚的监狱)
{record_delimiter}
("entity"{tuple_delimiter}杜克-巴塔格拉尼{tuple_delimiter}PERSON{tuple_delimiter}被扣为人质的奥雷利亚记者)
{record_delimiter}
("entity"{tuple_delimiter}梅吉-塔兹巴{tuple_delimiter}PERSON{tuple_delimiter}被扣为人质的布拉迪纳斯国民和环保主义者)
{record_delimiter}
("relationship"{tuple_delimiter}菲鲁扎巴德{tuple_delimiter}奥雷利亚{tuple_delimiter}菲鲁扎巴德与奥雷利亚谈判交换人质{tuple_delimiter}2)
{record_delimiter}
("relationship"{tuple_delimiter}金塔拉{tuple_delimiter}奥雷利亚{tuple_delimiter}金塔拉促成了菲鲁扎巴德和奥雷利亚之间的人质交换{tuple_delimiter}2)
{record_delimiter}
("relationship"{tuple_delimiter}金塔拉{tuple_delimiter}菲鲁扎巴德{tuple_delimiter}金塔拉促成了菲鲁扎巴德和奥雷利亚之间的人质交换{tuple_delimiter}2)
{record_delimiter}
("relationship"{tuple_delimiter}塞缪尔-纳马拉{tuple_delimiter}阿尔哈米亚监狱{tuple_delimiter}塞缪尔-纳马拉曾是阿尔哈米亚监狱的囚犯{tuple_delimiter}8)
{record_delimiter}
("relationship"{tuple_delimiter}塞缪尔-纳马拉{tuple_delimiter}梅吉-塔兹巴{tuple_delimiter}塞缪尔-纳马拉和梅吉-塔兹巴在同一次人质释放中被交换{tuple_delimiter}2)
{record_delimiter}
("relationship"{tuple_delimiter}塞缪尔-纳马拉{tuple_delimiter}杜克-巴塔格拉尼{tuple_delimiter}塞缪尔-纳马拉和杜克-巴塔格拉尼在同一次人质释放中被交换{tuple_delimiter}2)
{record_delimiter}
("relationship"{tuple_delimiter}梅吉-塔兹巴{tuple_delimiter}杜克-巴塔格拉尼{tuple_delimiter}梅吉-塔兹巴和杜克-巴塔格拉尼在同一次人质释放中被交换{tuple_delimiter}2)
{record_delimiter}
("relationship"{tuple_delimiter}塞缪尔-纳马拉{tuple_delimiter}菲鲁扎巴德{tuple_delimiter}塞缪尔-纳马拉曾是菲鲁扎巴德的人质{tuple_delimiter}2)
{record_delimiter}
("relationship"{tuple_delimiter}梅吉-塔兹巴{tuple_delimiter}菲鲁扎巴德{tuple_delimiter}梅吉-塔兹巴曾是菲鲁扎巴德的人质{tuple_delimiter}2)
{record_delimiter}
("relationship"{tuple_delimiter}杜克-巴塔格拉尼{tuple_delimiter}菲鲁扎巴德{tuple_delimiter}杜克-巴塔格拉尼曾是菲鲁扎巴德的人质{tuple_delimiter}2)
{completion_delimiter}

######################
-真实数据-
######################
entity_type: {entity_types} 
text:  {input_text}
######################
输出:"""

CONTINUE_PROMPT = "上次提取遗漏了许多实体和关系。请记住,只输出与先前提取的类型相匹配的实体。使用相同的格式将它们添加到下面:\n"
LOOP_PROMPT = "似乎仍然遗漏了一些实体和关系。如果仍有实体或关系需要添加,请回答 YES 或 NO。\n"
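When translating a long template by hand (or with a translation tool), placeholder names are easy to damage: a translated name such as `{元组_分隔符}` or a stray extra brace will not be substituted correctly. The sketch below checks a template for that kind of damage. It assumes the template is rendered with Python `str.format`-style substitution, which you should confirm for your GraphRAG version; `check_template` and `EXPECTED_FIELDS` are hypothetical names introduced here.

```python
from string import Formatter

# Placeholder names used by the extraction template quoted above.
EXPECTED_FIELDS = frozenset(
    {
        "entity_types",
        "input_text",
        "tuple_delimiter",
        "record_delimiter",
        "completion_delimiter",
    }
)


def check_template(template: str, expected: frozenset = EXPECTED_FIELDS) -> list[str]:
    """Return a list of human-readable problems found in a format template."""
    try:
        found = {
            name
            for _, name, _, _ in Formatter().parse(template)
            if name is not None
        }
    except ValueError as exc:
        # Raised for unbalanced braces such as "{{tuple_delimiter}".
        return [f"malformed braces: {exc}"]
    problems = [f"unknown placeholder: {{{name}}}" for name in sorted(found - expected)]
    problems += [f"missing placeholder: {{{name}}}" for name in sorted(expected - found)]
    return problems


good = (
    "类型: {entity_types}\n文本: {input_text}\n"
    "{tuple_delimiter}{record_delimiter}{completion_delimiter}"
)
bad = good.replace("{tuple_delimiter}", "{元组_分隔符}")

print(check_template(good))  # → []
print(check_template(bad))
```

Running `check_template(GRAPH_EXTRACTION_PROMPT)` after every edit to the translated file is a cheap guard before spending tokens on an indexing run.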

7.1.2 Relationship description summarization

File location

graphrag/index/graph/extractors/summarize/prompts.py

Original text

# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License

"""A file containing prompts definition."""

SUMMARIZE_PROMPT = """
You are a helpful assistant responsible for generating a comprehensive summary of the data provided below.
Given one or two entities, and a list of descriptions, all related to the same entity or group of entities.
Please concatenate all of these into a single, comprehensive description. Make sure to include information collected from all the descriptions.
If the provided descriptions are contradictory, please resolve the contradictions and provide a single, coherent summary.
Make sure it is written in third person, and include the entity names so we the have full context.

#######
-Data-
Entities: {entity_name}
Description List: {description_list}
#######
Output:
"""

Translated result

"""包含prompts定义的文件。"""

SUMMARIZE_PROMPT = """
您是一名得力助手,负责为下面提供的数据生成一份综合摘要。
给定一个或两个实体,以及一个描述列表,所有描述都与同一个实体或实体组相关。
请将所有这些描述串联成一个全面的描述。确保包含从所有描述中收集的信息。
如果所提供的描述相互矛盾,请解决矛盾,并提供一个单一、连贯的摘要。
确保以第三人称撰写,并包含实体名称,以便我们了解全部背景情况。

#######
-数据-
实体: {entity_name}
描述列表: {description_list}
#######
输出:
"""

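The summarization template has only two placeholders, `{entity_name}` and `{description_list}`. As a rough illustration of how it might be filled for Chinese data, the sketch below serializes both values with `json.dumps(..., ensure_ascii=False)` so that Chinese text reaches the model as readable characters rather than escape sequences. The shortened template, the helper name `build_summarize_prompt`, and the choice of JSON serialization are all assumptions of this sketch.

```python
import json

# Shortened stand-in for the translated SUMMARIZE_PROMPT above; only the
# two placeholders matter for this illustration.
SUMMARIZE_TEMPLATE = "实体: {entity_name}\n描述列表: {description_list}\n输出:\n"


def build_summarize_prompt(entity_name: str, descriptions: list[str]) -> str:
    """Fill the template, dropping duplicate descriptions but keeping order."""
    unique_descriptions = list(dict.fromkeys(descriptions))
    return SUMMARIZE_TEMPLATE.format(
        entity_name=json.dumps(entity_name, ensure_ascii=False),
        description_list=json.dumps(unique_descriptions, ensure_ascii=False),
    )


prompt = build_summarize_prompt(
    "中央机构",
    ["设定利率", "设定利率", "由马丁-史密斯担任主席"],
)
print(prompt)
```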

7.2 Claim extraction

File location

graphrag/index/graph/extractors/claims/prompts.py

Original text

# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License

"""A file containing prompts definition."""

CLAIM_EXTRACTION_PROMPT = """
-Target activity-
You are an intelligent assistant that helps a human analyst to analyze claims against certain entities presented in a text document.

-Goal-
Given a text document that is potentially relevant to this activity, an entity specification, and a claim description, extract all entities that match the entity specification and all claims against those entities.

-Steps-
1. Extract all named entities that match the predefined entity specification. Entity specification can either be a list of entity names or a list of entity types.
2. For each entity identified in step 1, extract all claims associated with the entity. Claims need to match the specified claim description, and the entity should be the subject of the claim.
For each claim, extract the following information:
- Subject: name of the entity that is subject of the claim, capitalized. The subject entity is one that committed the action described in the claim. Subject needs to be one of the named entities identified in step 1.
- Object: name of the entity that is object of the claim, capitalized. The object entity is one that either reports/handles or is affected by the action described in the claim. If object entity is unknown, use **NONE**.
- Claim Type: overall category of the claim, capitalized. Name it in a way that can be repeated across multiple text inputs, so that similar claims share the same claim type
- Claim Status: **TRUE**, **FALSE**, or **SUSPECTED**. TRUE means the claim is confirmed, FALSE means the claim is found to be False, SUSPECTED means the claim is not verified.
- Claim Description: Detailed description explaining the reasoning behind the claim, together with all the related evidence and references.
- Claim Date: Period (start_date, end_date) when the claim was made. Both start_date and end_date should be in ISO-8601 format. If the claim was made on a single date rather than a date range, set the same date for both start_date and end_date. If date is unknown, return **NONE**.
- Claim Source Text: List of **all** quotes from the original text that are relevant to the claim.

Format each claim as (<subject_entity>{tuple_delimiter}<object_entity>{tuple_delimiter}<claim_type>{tuple_delimiter}<claim_status>{tuple_delimiter}<claim_start_date>{tuple_delimiter}<claim_end_date>{tuple_delimiter}<claim_description>{tuple_delimiter}<claim_source>)

3. Return output in English as a single list of all the claims identified in steps 1 and 2. Use **{record_delimiter}** as the list delimiter.

4. When finished, output {completion_delimiter}

-Examples-
Example 1:
Entity specification: organization
Claim description: red flags associated with an entity
Text: According to an article on 2022/01/10, Company A was fined for bid rigging while participating in multiple public tenders published by Government Agency B. The company is owned by Person C who was suspected of engaging in corruption activities in 2015.
Output:

(COMPANY A{tuple_delimiter}GOVERNMENT AGENCY B{tuple_delimiter}ANTI-COMPETITIVE PRACTICES{tuple_delimiter}TRUE{tuple_delimiter}2022-01-10T00:00:00{tuple_delimiter}2022-01-10T00:00:00{tuple_delimiter}Company A was found to engage in anti-competitive practices because it was fined for bid rigging in multiple public tenders published by Government Agency B according to an article published on 2022/01/10{tuple_delimiter}According to an article published on 2022/01/10, Company A was fined for bid rigging while participating in multiple public tenders published by Government Agency B.)
{completion_delimiter}

Example 2:
Entity specification: Company A, Person C
Claim description: red flags associated with an entity
Text: According to an article on 2022/01/10, Company A was fined for bid rigging while participating in multiple public tenders published by Government Agency B. The company is owned by Person C who was suspected of engaging in corruption activities in 2015.
Output:

(COMPANY A{tuple_delimiter}GOVERNMENT AGENCY B{tuple_delimiter}ANTI-COMPETITIVE PRACTICES{tuple_delimiter}TRUE{tuple_delimiter}2022-01-10T00:00:00{tuple_delimiter}2022-01-10T00:00:00{tuple_delimiter}Company A was found to engage in anti-competitive practices because it was fined for bid rigging in multiple public tenders published by Government Agency B according to an article published on 2022/01/10{tuple_delimiter}According to an article published on 2022/01/10, Company A was fined for bid rigging while participating in multiple public tenders published by Government Agency B.)
{record_delimiter}
(PERSON C{tuple_delimiter}NONE{tuple_delimiter}CORRUPTION{tuple_delimiter}SUSPECTED{tuple_delimiter}2015-01-01T00:00:00{tuple_delimiter}2015-12-30T00:00:00{tuple_delimiter}Person C was suspected of engaging in corruption activities in 2015{tuple_delimiter}The company is owned by Person C who was suspected of engaging in corruption activities in 2015)
{completion_delimiter}

-Real Data-
Use the following input for your answer.
Entity specification: {entity_specs}
Claim description: {claim_description}
Text: {input_text}
Output:"""


CONTINUE_PROMPT = "MANY entities were missed in the last extraction.  Add them below using the same format:\n"
LOOP_PROMPT = "It appears some entities may have still been missed.  Answer YES {tuple_delimiter} NO if there are still entities that need to be added.\n"
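CONTINUE_PROMPT and LOOP_PROMPT are follow-up messages: the first asks the model to add what it missed, the second asks whether another round is needed. A minimal sketch of such a loop is shown below. It is not GraphRAG's implementation: `ask` is a hypothetical callable standing in for an LLM client that keeps the conversation history itself, `max_gleanings` is an assumed cap, and the stand-in prompt strings are shortened. The sketch continues only on an exact `YES`, so a localized loop prompt has to elicit the same token, or the check has to be changed to match.

```python
from typing import Callable

# Shortened stand-ins for the two follow-up prompts quoted above.
CONTINUE_PROMPT = "MANY entities were missed in the last extraction. Add them below using the same format:\n"
LOOP_PROMPT = "Answer YES or NO if there are still entities that need to be added.\n"


def extract_with_gleaning(
    first_prompt: str,
    ask: Callable[[str], str],
    max_gleanings: int = 1,
) -> str:
    """Run one extraction plus up to `max_gleanings` follow-up rounds."""
    output = ask(first_prompt)
    for round_index in range(max_gleanings):
        # Ask for anything that was missed and append it to the result.
        output += ask(CONTINUE_PROMPT)
        if round_index == max_gleanings - 1:
            break  # cap reached, no need to ask whether to continue
        if ask(LOOP_PROMPT).strip().upper() != "YES":
            break  # the model reports nothing else is missing
    return output
```

With a check like this, a reply of `是` would end the loop after the first round, which is why the wording of the translated loop prompt matters.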

Translated result

"""包含prompts定义的文件。"""

CLAIM_EXTRACTION_PROMPT = """
-目标活动-
您是一个智能助手,可以帮助人类分析员针对文本文档中的某些实体分析声明。

-目标-
给定可能与此活动相关的文本文档、实体说明和声明描述,提取与实体说明匹配的所有实体以及针对这些实体的所有声明。

-步骤-
1. 提取符合预定义实体规范的所有命名实体。实体规范可以是实体名称列表,也可以是实体类型列表。
2. 对于步骤 1 中确定的每个实体,提取与该实体相关的所有声明。声明需要与指定的声明描述相匹配,实体应是声明的主体。
对于每个声明,提取以下信息:
- 主体:作为声明主体的实体名称,若为英文需要大写。主体是指实施了声明所述行为的实体。主体必须是步骤 1 中确定的命名实体之一。
- 客体:作为声明客体的实体名称,若为英文需要大写。客体是报告/处理声明所述行为或受其影响的实体。如果客体不详,使用 **NONE**。
- 声明类型:声明的总体类别。命名方式应能在多个文本输入中重复,以便类似的声明共享相同的声明类型。
- 声明状态: **TRUE**、**FALSE** 或 **SUSPECTED**。TRUE 表示声明已确认,FALSE 表示声明被认定为虚假,SUSPECTED 表示声明未经核实。
- 声明描述: 详细说明,解释声明背后的理由,以及所有相关证据和参考资料。
- 声明日期: 提出声明的时期(开始日期、结束日期)。开始日期和结束日期都应使用 ISO-8601 格式。如果声明是在一个日期而不是一个日期范围内提出的,则起始日期和终止日期均应设置为同一天。如果日期不详,则返回 **NONE**。
- 声明源文本: 与声明相关的**所有**原文引文的列表。

每个声明的格式为 (<subject_entity>{tuple_delimiter}<object_entity>{tuple_delimiter}<claim_type>{tuple_delimiter}<claim_status>{tuple_delimiter}<claim_start_date>{tuple_delimiter}<claim_end_date>{tuple_delimiter}<claim_description>{tuple_delimiter}<claim_source>)

3. 将步骤 1 和 2 中确定的所有声明以单个列表的形式以中文输出。使用 **{record_delimiter}** 作为列表分隔符。

4. 完成后,输出{completion_delimiter}

-示例-
示例 1:
实体规范:organization(组织)
声明描述:与实体相关的危险信号
文本: 根据 2022/01/10 的一篇文章,A 公司在参与 B 政府机构发布的多个公开招标时因操纵投标而被罚款,该公司由 C 人拥有,C 人在 2015 年涉嫌参与腐败活动。
输出:

(A公司{tuple_delimiter}B政府机构{tuple_delimiter}反竞争行为{tuple_delimiter}TRUE{tuple_delimiter}2022-01-10T00:00:00{tuple_delimiter}2022-01-10T00:00:00{tuple_delimiter}根据 2022/01/10 发表的一篇文章,A 公司因在 B 政府机构发布的多个公开招标中操纵投标而被罚款,因此被认定参与了反竞争行为{tuple_delimiter}根据 2022/01/10 发表的一篇文章,A 公司因在参与 B 政府机构发布的多个公开招标中操纵投标而被罚款。)
{completion_delimiter}

示例 2:
实体规范: A 公司,C 人
声明描述:与实体相关的危险信号
文本: 根据 2022/01/10 的一篇文章,A 公司在参与 B 政府机构发布的多个公开招标时因操纵投标而被罚款,该公司由 C 人拥有,C 人在 2015 年涉嫌参与腐败活动。
输出:

(A公司{tuple_delimiter}B政府机构{tuple_delimiter}反竞争行为{tuple_delimiter}TRUE{tuple_delimiter}2022-01-10T00:00:00{tuple_delimiter}2022-01-10T00:00:00{tuple_delimiter}根据 2022/01/10 发表的一篇文章,A 公司因在 B 政府机构发布的多个公开招标中操纵投标而被罚款,因此被认定参与了反竞争行为{tuple_delimiter}根据 2022/01/10 发表的一篇文章,A 公司因在参与 B 政府机构发布的多个公开招标中操纵投标而被罚款。)
{record_delimiter}
(C人{tuple_delimiter}NONE{tuple_delimiter}腐败{tuple_delimiter}SUSPECTED{tuple_delimiter}2015-01-01T00:00:00{tuple_delimiter}2015-12-30T00:00:00{tuple_delimiter}2015年C人涉嫌参与腐败活动{tuple_delimiter}该公司为2015年涉嫌参与腐败活动的C人所有)
{completion_delimiter}

-真实数据-
请使用以下输入内容回答问题。
实体规范: {entity_specs}
声明描述:  {claim_description}
文本: {input_text} 
输出:"""


CONTINUE_PROMPT = "上次提取遗漏了许多实体。 使用相同的格式将它们添加到下面:\n"
LOOP_PROMPT = "似乎仍有一些实体被遗漏。如果仍有实体需要添加,请回答 YES 或 NO。\n"
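Each claim record requested by the prompt has eight fields in a fixed order, with a status limited to TRUE, FALSE or SUSPECTED and ISO-8601 dates (or NONE). The sketch below parses and validates one such record. It assumes `<|>` as the value of `{tuple_delimiter}`; `Claim` and `parse_claim` are hypothetical names, and this is not GraphRAG's own parsing code.

```python
from dataclasses import dataclass
from datetime import datetime

TUPLE_DELIMITER = "<|>"  # assumed value for {tuple_delimiter}
VALID_STATUSES = {"TRUE", "FALSE", "SUSPECTED"}


@dataclass
class Claim:
    subject: str
    object_entity: str
    claim_type: str
    status: str
    start_date: str
    end_date: str
    description: str
    source_text: str


def parse_claim(record: str) -> Claim:
    """Parse one '(a<|>b<|>...)' record; raise ValueError if it is malformed."""
    text = record.strip()
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    fields = [field.strip() for field in text.split(TUPLE_DELIMITER)]
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    claim = Claim(*fields)
    if claim.status not in VALID_STATUSES:
        raise ValueError(f"unexpected claim status: {claim.status}")
    for value in (claim.start_date, claim.end_date):
        if value != "NONE":
            # Raises ValueError when the timestamp is not valid ISO-8601.
            datetime.fromisoformat(value)
    return claim


record = (
    "(C人<|>NONE<|>腐败<|>SUSPECTED<|>2015-01-01T00:00:00<|>2015-12-30T00:00:00"
    "<|>2015年C人涉嫌参与腐败活动<|>该公司为2015年涉嫌参与腐败活动的C人所有)"
)
claim = parse_claim(record)
```

Keeping the status keywords and NONE in English in the translated prompt means a validator like this works unchanged for both the original and the Chinese version.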

7.3 Community reports

File location

graphrag/index/graph/extractors/community_reports/prompts.py

Original text

# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License
"""A file containing prompts definition."""

COMMUNITY_REPORT_PROMPT = """
You are an AI assistant that helps a human analyst to perform general information discovery. Information discovery is the process of identifying and assessing relevant information associated with certain entities (e.g., organizations and individuals) within a network.

# Goal
Write a comprehensive report of a community, given a list of entities that belong to the community as well as their relationships and optional associated claims. The report will be used to inform decision-makers about information associated with the community and their potential impact. The content of this report includes an overview of the community's key entities, their legal compliance, technical capabilities, reputation, and noteworthy claims.

# Report Structure

The report should include the following sections:

- TITLE: community's name that represents its key entities - title should be short but specific. When possible, include representative named entities in the title.
- SUMMARY: An executive summary of the community's overall structure, how its entities are related to each other, and significant information associated with its entities.
- IMPACT SEVERITY RATING: a float score between 0-10 that represents the severity of IMPACT posed by entities within the community.  IMPACT is the scored importance of a community.
- RATING EXPLANATION: Give a single sentence explanation of the IMPACT severity rating.
- DETAILED FINDINGS: A list of 5-10 key insights about the community. Each insight should have a short summary followed by multiple paragraphs of explanatory text grounded according to the grounding rules below. Be comprehensive.

Return output as a well-formed JSON-formatted string with the following format:
    {{
        "title": <report_title>,
        "summary": <executive_summary>,
        "rating": <impact_severity_rating>,
        "rating_explanation": <rating_explanation>,
        "findings": [
            {{
                "summary":<insight_1_summary>,
                "explanation": <insight_1_explanation>
            }},
            {{
                "summary":<insight_2_summary>,
                "explanation": <insight_2_explanation>
            }}
        ]
    }}

# Grounding Rules

Points supported by data should list their data references as follows:

"This is an example sentence supported by multiple data references [Data: <dataset name> (record ids); <dataset name> (record ids)]."

Do not list more than 5 record ids in a single reference. Instead, list the top 5 most relevant record ids and add "+more" to indicate that there are more.

For example:
"Person X is the owner of Company Y and subject to many allegations of wrongdoing [Data: Reports (1), Entities (5, 7); Relationships (23); Claims (7, 2, 34, 64, 46, +more)]."

where 1, 5, 7, 23, 2, 34, 46, and 64 represent the id (not the index) of the relevant data record.

Do not include information where the supporting evidence for it is not provided.


# Example Input
-----------
Text:

Entities

id,entity,description
5,VERDANT OASIS PLAZA,Verdant Oasis Plaza is the location of the Unity March
6,HARMONY ASSEMBLY,Harmony Assembly is an organization that is holding a march at Verdant Oasis Plaza

Relationships

id,source,target,description
37,VERDANT OASIS PLAZA,UNITY MARCH,Verdant Oasis Plaza is the location of the Unity March
38,VERDANT OASIS PLAZA,HARMONY ASSEMBLY,Harmony Assembly is holding a march at Verdant Oasis Plaza
39,VERDANT OASIS PLAZA,UNITY MARCH,The Unity March is taking place at Verdant Oasis Plaza
40,VERDANT OASIS PLAZA,TRIBUNE SPOTLIGHT,Tribune Spotlight is reporting on the Unity march taking place at Verdant Oasis Plaza
41,VERDANT OASIS PLAZA,BAILEY ASADI,Bailey Asadi is speaking at Verdant Oasis Plaza about the march
43,HARMONY ASSEMBLY,UNITY MARCH,Harmony Assembly is organizing the Unity March

Output:
{{
    "title": "Verdant Oasis Plaza and Unity March",
    "summary": "The community revolves around the Verdant Oasis Plaza, which is the location of the Unity March. The plaza has relationships with the Harmony Assembly, Unity March, and Tribune Spotlight, all of which are associated with the march event.",
    "rating": 5.0,
    "rating_explanation": "The impact severity rating is moderate due to the potential for unrest or conflict during the Unity March.",
    "findings": [
        {{
            "summary": "Verdant Oasis Plaza as the central location",
            "explanation": "Verdant Oasis Plaza is the central entity in this community, serving as the location for the Unity March. This plaza is the common link between all other entities, suggesting its significance in the community. The plaza's association with the march could potentially lead to issues such as public disorder or conflict, depending on the nature of the march and the reactions it provokes. [Data: Entities (5), Relationships (37, 38, 39, 40, 41,+more)]"
        }},
        {{
            "summary": "Harmony Assembly's role in the community",
            "explanation": "Harmony Assembly is another key entity in this community, being the organizer of the march at Verdant Oasis Plaza. The nature of Harmony Assembly and its march could be a potential source of threat, depending on their objectives and the reactions they provoke. The relationship between Harmony Assembly and the plaza is crucial in understanding the dynamics of this community. [Data: Entities(6), Relationships (38, 43)]"
        }},
        {{
            "summary": "Unity March as a significant event",
            "explanation": "The Unity March is a significant event taking place at Verdant Oasis Plaza. This event is a key factor in the community's dynamics and could be a potential source of threat, depending on the nature of the march and the reactions it provokes. The relationship between the march and the plaza is crucial in understanding the dynamics of this community. [Data: Relationships (39)]"
        }},
        {{
            "summary": "Role of Tribune Spotlight",
            "explanation": "Tribune Spotlight is reporting on the Unity March taking place in Verdant Oasis Plaza. This suggests that the event has attracted media attention, which could amplify its impact on the community. The role of Tribune Spotlight could be significant in shaping public perception of the event and the entities involved. [Data: Relationships (40)]"
        }}
    ]
}}


# Real Data

Use the following text for your answer. Do not make anything up in your answer.

Text:
{input_text}

The report should include the following sections:

- TITLE: community's name that represents its key entities - title should be short but specific. When possible, include representative named entities in the title.
- SUMMARY: An executive summary of the community's overall structure, how its entities are related to each other, and significant information associated with its entities.
- IMPACT SEVERITY RATING: a float score between 0-10 that represents the severity of IMPACT posed by entities within the community.  IMPACT is the scored importance of a community.
- RATING EXPLANATION: Give a single sentence explanation of the IMPACT severity rating.
- DETAILED FINDINGS: A list of 5-10 key insights about the community. Each insight should have a short summary followed by multiple paragraphs of explanatory text grounded according to the grounding rules below. Be comprehensive.

Return output as a well-formed JSON-formatted string with the following format:
    {{
        "title": <report_title>,
        "summary": <executive_summary>,
        "rating": <impact_severity_rating>,
        "rating_explanation": <rating_explanation>,
        "findings": [
            {{
                "summary":<insight_1_summary>,
                "explanation": <insight_1_explanation>
            }},
            {{
                "summary":<insight_2_summary>,
                "explanation": <insight_2_explanation>
            }}
        ]
    }}

# Grounding Rules

Points supported by data should list their data references as follows:

"This is an example sentence supported by multiple data references [Data: <dataset name> (record ids); <dataset name> (record ids)]."

Do not list more than 5 record ids in a single reference. Instead, list the top 5 most relevant record ids and add "+more" to indicate that there are more.

For example:
"Person X is the owner of Company Y and subject to many allegations of wrongdoing [Data: Reports (1), Entities (5, 7); Relationships (23); Claims (7, 2, 34, 64, 46, +more)]."

where 1, 5, 7, 23, 2, 34, 46, and 64 represent the id (not the index) of the relevant data record.

Do not include information where the supporting evidence for it is not provided.

Output:"""

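Translation (for ease of use, "community" is rendered with the more common term “社区”)

The community report prompt is lightly modified here: it adds a description of what “社区” (community) means in this context.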

"""包含prompts定义的文件。"""

COMMUNITY_REPORT_PROMPT = """
你是一个人工智能助手,可以帮助人类分析师进行一般信息发现。信息发现是在网络中识别和评估与某些实体(如组织和人物)相关的信息的过程。

# 目标
给定一份属于社区的实体清单以及它们之间的关系和可选的相关声明,撰写一份社区综合报告。注意,这里的社区并不是指现实意义中的社区,而是多个实体及实体间关系构成的集合。该报告将用于告知决策者与社区相关的信息及其在整个文本中的潜在影响(重要性)。本报告的内容包括该社区主要实体的概述、其法律合规性、技术能力、声誉和值得注意的声明。

# 报告结构

报告应包括以下部分:

- TITLE(标题):代表关键实体的社区名称--标题应简短而具体。在可能的情况下,标题中应包括具有代表性的实体名称。
- SUMMARY(摘要):社区整体结构的摘要、实体之间的关系以及与实体相关的重要信息。
- IMPACT SEVERITY RATING(影响严重性评级):0-10 之间的分数,代表社区内实体在整个文本中的重要性。 
- RATING EXPLANATION(评级说明):用一句话解释影响严重性评级的结果。
- DETAILED FINDINGS(详细结论): 列出有关社区的 5-10 项关键见解。每个观点都应有一个简短的概述,然后根据以下基础规则给出多段解释性文字。要全面。

以格式良好的 JSON 格式字符串返回输出:
    {{
        "title": <report_title>,
        "summary": <executive_summary>,
        "rating": <impact_severity_rating>,
        "rating_explanation": <rating_explanation>,
        "findings": [
            {{
                "summary":<insight_1_summary>,
                "explanation": <insight_1_explanation>
            }},
            {{
                "summary":<insight_2_summary>,
                "explanation": <insight_2_explanation>
            }}
        ]
    }}

# 基本规则

有数据支持的要点应列出其数据参考如下:

"这是一个由多个数据参考支持的例句[Data: <dataset name> (记录 id); <dataset name>(记录 id)]"。

请勿在单个引用中列出超过 5 个记录 id。相反,应列出前 5 个最相关的记录 id,并添加 "+more" 表示还有更多。

例如:
"X 是 Y 公司的所有者,受到许多不法行为的指控[Data: Reports (1), Entities (5, 7); Relationships (23); Claims (7, 2, 34, 64, 46, +more)]。"

其中 1、5、7、23、2、34、46 和 64 代表相关数据记录的 id(而非索引)。

请勿包含未经证实的信息。

# 输入示例
-----------
文本:

Entities

id,entity,description
5,绿洲广场,绿洲广场是团结游行的地点
6,和谐大会,和谐大会是一个在绿洲广场举行游行的组织。

Relationships

id,source,target,description
37,绿洲广场,团结游行,绿洲广场是团结游行的地点
38,绿洲广场,和谐大会,和谐大会在绿洲广场举行游行
39,绿洲广场,团结游行,团结游行在绿洲广场举行
40,绿洲广场,TRIBUNE SPOTLIGHT,TRIBUNE SPOTLIGHT报道在绿洲广场举行的团结游行。
41,绿洲广场,贝利-阿萨迪,贝利-阿萨迪在绿洲广场就游行发表讲话
43,和谐大会,团结游行,和谐大会正在组织团结游行

输出:{{
    "title": "绿洲广场和团结游行",
    "summary": "社区围绕着绿洲广场展开,而绿洲广场是团结游行的地点。该广场与“和谐大会”、“团结游行”和“TRIBUNE SPOTLIGHT”都有关系,它们都与游行活动有关。",
    "rating": 5.0,
    "rating_explanation": "由于团结游行期间可能会发生骚乱或冲突,因此影响严重性评级为中度。",
    "findings": [
        {{
            "summary": "绿洲广场作为中心位置",
            "explanation": "绿洲广场是该社区的中心实体,是团结游行的地点。该广场是连接所有其他实体的共同纽带,表明其在社区中的重要地位。广场与游行的联系可能会导致公共秩序混乱或冲突等问题,这取决于游行的性质及其引发的反应。 [Data: Entities (5), Relationships (37, 38, 39, 40, 41, +more)]"
        }},
        {{
            "summary": "和谐大会在社区中的作用",
            "explanation": "和谐大会是这个社区的另一个重要实体,是绿洲广场游行的组织者。和谐大会及其游行的性质可能成为潜在的威胁来源,这取决于他们的目标及其引发的反应。和谐大会与广场之间的关系对于了解该社区的动态至关重要。 [Data: Entities (6), Relationships (38, 43)]"
        }},
        {{
            "summary": "团结游行是一项重要活动",
            "explanation": "团结游行是在绿洲广场举行的一项重要活动。根据游行的性质及其引发的反应,该活动是社区动态的关键因素,也可能是潜在的威胁来源。游行与广场之间的关系对于了解该社区的动态至关重要。 [Data: Relationships (39)]"
        }},
        {{
            "summary": "TRIBUNE SPOTLIGHT的作用",
            "explanation": "TRIBUNE SPOTLIGHT 正在报道在绿洲广场举行的团结游行。这表明该活动吸引了媒体的关注,可能会扩大其对社区的影响。TRIBUNE SPOTLIGHT 在塑造公众对该活动和相关实体的看法方面可能发挥重要作用。 [Data: Relationships (40)]"
        }}
    ]
}}

# 真实数据

用以下文字作答。请勿在答案中编造任何内容。

文本:
{input_text}

报告应包括以下部分:

- TITLE(标题):代表关键实体的社区名称--标题应简短而具体。在可能的情况下,标题中应包括具有代表性的实体名称。
- SUMMARY(摘要):社区整体结构的摘要、实体之间的关系以及与实体相关的重要信息。
- IMPACT SEVERITY RATING(影响严重性评级):0-10 之间的分数,代表社区内实体在整个文本中的重要性。 
- RATING EXPLANATION(评级说明):用一句话解释影响严重性评级的结果。
- DETAILED FINDINGS(详细结论): 列出有关社区的 5-10 项关键见解。每个观点都应有一个简短的概述,然后根据以下基础规则给出多段解释性文字。要全面。

以格式良好的 JSON 格式字符串返回输出:
    {{
        "title": <report_title>,
        "summary": <executive_summary>,
        "rating": <impact_severity_rating>,
        "rating_explanation": <rating_explanation>,
        "findings": [
            {{
                "summary":<insight_1_summary>,
                "explanation": <insight_1_explanation>
            }},
            {{
                "summary":<insight_2_summary>,
                "explanation": <insight_2_explanation>
            }}
        ]
    }}

# 基本规则

有数据支持的要点应列出其数据参考如下:

"这是一个由多个数据参考支持的例句[Data: <dataset name> (记录 id); <dataset name>(记录 id)]"。

请勿在单个引用中列出超过 5 个记录 id。相反,应列出前 5 个最相关的记录 id,并添加 "+more" 表示还有更多。

例如:
"X 是 Y 公司的所有者,受到许多不法行为的指控[Data: Reports (1), Entities (5, 7); Relationships (23); Claims (7, 2, 34, 64, 46, +more)]。"

其中 1、5、7、23、2、34、46 和 64 代表相关数据记录的 id(而非索引)。

请勿包含未经证实的信息。

输出:"""
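A translated prompt is easy to break with a stray full-width comma or a single unescaped brace. The sketch below is one way to sanity-check a template before using it. It assumes the prompt is rendered with Python's `str.format`, as the doubled braces in the file suggest; `SAMPLE_PROMPT`, `template_fields`, and `check_prompt` are hypothetical names, and the sample is a shortened stand-in rather than the real prompt.

```python
import json
from string import Formatter


def template_fields(template: str) -> set[str]:
    """Return the named placeholders that str.format would substitute."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def check_prompt(template: str, required: set[str]) -> str:
    """Render the template with dummy values after checking its placeholders."""
    fields = template_fields(template)
    if fields != required:
        raise ValueError(f"unexpected placeholders: {sorted(fields ^ required)}")
    return template.format(**{name: f"<{name}>" for name in required})


# Shortened, hypothetical stand-in for the translated prompt.
SAMPLE_PROMPT = """
输出:{{
    "title": "绿洲广场和团结游行",
    "rating": 5.0
}}

文本:
{input_text}
输出:"""

rendered = check_prompt(SAMPLE_PROMPT, {"input_text"})

# The example block between the first "输出:" and "文本:" should parse as JSON.
example = rendered.split("输出:")[1].split("文本:")[0]
parsed = json.loads(example)
```

If the example output contains `、` where a comma belongs, or nested ASCII quotes inside a string, `json.loads` raises here instead of the problem surfacing later as a malformed report.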

7.4 Prompt Tuning

7.4.1 Usage

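A close reading of these prompts reveals where some of the problems in the previous section come from. Relationship strength is scored entirely by the LLM's own reasoning, which is clearly unreliable. The example in the community report prompt is also a poor fit: it easily leads the LLM to read "community" as a real-world neighborhood, and the description of community importance is vague, which further raises the chance of errors. The definition of the community importance rating is itself vague, even though these reports are the direct source for later question answering; a more principled scoring method should be possible.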
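As one illustration of what a more principled score could look like, the sketch below derives a 0-10 value from graph structure instead of asking the LLM. This is an assumption for discussion only, not something graphrag does: `community_scores` is a hypothetical helper that sums the relationship weights touching each community and scales the totals so the heaviest community gets 10.

```python
from collections import defaultdict


def community_scores(
    edges: list[tuple[str, str, float]],
    membership: dict[str, int],
) -> dict[int, float]:
    """Scale each community's total incident edge weight to the range 0-10."""
    totals: dict[int, float] = defaultdict(float)
    for source, target, weight in edges:
        for node in (source, target):
            community = membership.get(node)
            if community is not None:
                totals[community] += weight
    top = max(totals.values(), default=0.0)
    if top == 0:
        return {community: 0.0 for community in totals}
    return {community: round(10 * total / top, 2) for community, total in totals.items()}


# Hypothetical toy graph: (source, target, relationship_strength).
edges = [("少女", "雪花", 8), ("少女", "少年", 6), ("我", "他", 9)]
membership = {"少女": 0, "雪花": 0, "少年": 0, "我": 1, "他": 1}
scores = community_scores(edges, membership)
```

The weights still come from LLM-assigned relationship strengths, so this only moves the judgment one level down; replacing them with co-occurrence counts would remove that dependency.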

Note, however, that graphrag ships with prompt tuning out of the box, described at the link below.

Prompt Tuning ⚙️

Run the following command:

python -m graphrag.prompt_tune --root ./ragtest

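This fails, however, with the error: Could not automatically map moonshot-v1-8k to a tokeniser. Please use tiktoken.get_encoding to explicitly get the tokeniser you expect. tiktoken only knows OpenAI model names, so it cannot pick an encoding for other models.

Fixing this requires patching the source.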

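import tiktoken

# Register the model name from the error message under an existing encoding.
# cl100k_base is not the model's real tokenizer, so token counts are only approximate.
tiktoken.model.MODEL_TO_ENCODING["your-model-name"] = "cl100k_base"  # e.g. "llama3.1"

Add these lines to graphrag/prompt_tune/main.py.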

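7.4.2 Prompt tuning results

With that change the command runs, and the generated prompts are stored in the prompts directory under ragtest. It first determines the domain, that is, the type of your text; my result was as follows:

It also generates entity types tailored to that domain, which differ from the default configuration.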

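Now for a closer look at the files. Among them, claim_extraction is unchanged, and summarize_descriptions changes only slightly, mainly through a persona paragraph added at the top.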

First, entity_extraction:

-Goal-
Given a text document that is potentially relevant to this activity and a list of entity types, identify all entities of those types from the text and all relationships among the identified entities.

-Steps-
1. Identify all entities. For each identified entity, extract the following information:
- entity_name: Name of the entity, capitalized
- entity_type: One of the following types: [{'person', 'emotion', 'life event', 'introspective thought', 'theme', 'memory', 'time', 'relationship', 'personal experience', 'complexity', 'season', 'activity', 'object', 'environment', 'community'}]
- entity_description: Comprehensive description of the entity's attributes and activities
Format each entity as ("entity"{tuple_delimiter}<entity_name>{tuple_delimiter}<entity_type>{tuple_delimiter}<entity_description>

2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are *clearly related* to each other.
For each pair of related entities, extract the following information:
- source_entity: name of the source entity, as identified in step 1
- target_entity: name of the target entity, as identified in step 1
- relationship_description: explanation as to why you think the source entity and the target entity are related to each other
- relationship_strength: an integer score between 1 to 10, indicating strength of the relationship between the source entity and target entity

Format each relationship as ("relationship"{tuple_delimiter}<source_entity>{tuple_delimiter}<target_entity>{tuple_delimiter}<relationship_description>{tuple_delimiter}<relationship_strength>)

3. Return output in English as a single list of all the entities and relationships identified in steps 1 and 2. Use **{record_delimiter}** as the list delimiter.

4. When finished, output {completion_delimiter}

-Examples-
######################

Example 1:

entity_types: [{'person', 'emotion', 'life event', 'introspective thought', 'theme', 'memory', 'time', 'relationship', 'personal experience', 'complexity', 'season', 'activity', 'object', 'environment', 'community'}]
text:
这样平稳的时光慢慢流逝着,我的心情慢慢平复了。
    “回家吧。”我尽力不带哭腔地说出这句话。
    我和他开始收拾书包。在朝学校门口走的路上我无数次想说谢谢,可始终没能说出口。后来,被巡视的老师抓到,一起被问了班级姓名,一起被训了十分钟话。虽然训话不是应该值得高兴的事,但因为他在身旁,我竟然有些开心。
    “那明天见喽。”这样的话我还是能
------------------------
output:
("entity"{tuple_delimiter}"我"{tuple_delimiter}"person"{tuple_delimiter}"文中的叙述者,经历了心情的起伏,与另一个人共同经历了一段时光,感到开心。"){record_delimiter}
("entity"{tuple_delimiter}"他"{tuple_delimiter}"person"{tuple_delimiter}"与叙述者一起收拾书包,一起被训话,是叙述者感到开心的原因之一。"){record_delimiter}
("entity"{tuple_delimiter}"平稳的时光"{tuple_delimiter}"time"{tuple_delimiter}"文中描述的一段时间,叙述者的心情在这段时间内逐渐平复。"){record_delimiter}
("entity"{tuple_delimiter}"心情"{tuple_delimiter}"emotion"{tuple_delimiter}"叙述者的心情经历了从起伏到平复的过程。"){record_delimiter}
("entity"{tuple_delimiter}"回家"{tuple_delimiter}"life event"{tuple_delimiter}"叙述者在心情平复后决定回家,这是文中的一个生活事件。"){record_delimiter}
("entity"{tuple_delimiter}"谢谢"{tuple_delimiter}"introspective thought"{tuple_delimiter}"叙述者内心想要表达的感激之情,但未能说出口。"){record_delimiter}
("entity"{tuple_delimiter}"被训话"{tuple_delimiter}"life event"{tuple_delimiter}"叙述者和另一个人一起被老师训话,虽然不是高兴的事,但叙述者因为另一个人的存在而感到开心。"){record_delimiter}
("entity"{tuple_delimiter}"明天见"{tuple_delimiter}"personal experience"{tuple_delimiter}"叙述者与另一个人约定的再次见面,体现了叙述者对这段关系的期待。"){record_delimiter}

("relationship"{tuple_delimiter}"我"{tuple_delimiter}"他"{tuple_delimiter}"叙述者与另一个人共同经历了收拾书包和被训话的事件,叙述者因为另一个人的存在而感到开心。"{tuple_delimiter}8){record_delimiter}
("relationship"{tuple_delimiter}"平稳的时光"{tuple_delimiter}"心情"{tuple_delimiter}"平稳的时光对叙述者的心情有平复作用。"{tuple_delimiter}6){record_delimiter}
("relationship"{tuple_delimiter}"心情"{tuple_delimiter}"回家"{tuple_delimiter}"叙述者的心情平复后决定回家。"{tuple_delimiter}7){record_delimiter}
("relationship"{tuple_delimiter}"我"{tuple_delimiter}"谢谢"{tuple_delimiter}"叙述者内心想要对另一个人表达感激,但未能说出口。"{tuple_delimiter}5){record_delimiter}
("relationship"{tuple_delimiter}"我"{tuple_delimiter}"被训话"{tuple_delimiter}"叙述者和另一个人一起被训话,叙述者因此感到开心。"{tuple_delimiter}9){record_delimiter}
("relationship"{tuple_delimiter}"我"{tuple_delimiter}"明天见"{tuple_delimiter}"叙述者与另一个人约定的再次见面,体现了叙述者对这段关系的期待。"{tuple_delimiter}7){record_delimiter}

{completion_delimiter}
#############################

-Real Data-
######################
entity_types: [{'person', 'emotion', 'life event', 'introspective thought', 'theme', 'memory', 'time', 'relationship', 'personal experience', 'complexity', 'season', 'activity', 'object', 'environment', 'community'}]
text: {input_text}
######################
output:

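The first part is almost identical to the default; only entity types specific to the text were added. The example at the end was regenerated from the input text.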
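To make the output format in the prompt above concrete, here is a minimal sketch of how such a response could be split back into entities and relationships. It is not graphrag's own parser. The delimiter values `<|>`, `##`, and `<|COMPLETE|>` are assumed, so check them against your own settings, and `parse_records` is a hypothetical helper.

```python
import re

# Assumed delimiter values; verify them against your configuration.
TUPLE_DELIMITER = "<|>"
RECORD_DELIMITER = "##"
COMPLETION_DELIMITER = "<|COMPLETE|>"


def parse_records(raw: str) -> tuple[list[dict], list[dict]]:
    """Split an extraction response into entity and relationship dicts."""
    entities, relationships = [], []
    body = raw.replace(COMPLETION_DELIMITER, "")
    for record in body.split(RECORD_DELIMITER):
        # Take everything between the first "(" and the last ")".
        match = re.search(r"\((.*)\)", record, flags=re.DOTALL)
        if not match:
            continue
        fields = [f.strip().strip('"') for f in match.group(1).split(TUPLE_DELIMITER)]
        if fields[0] == "entity" and len(fields) == 4:
            entities.append({"name": fields[1], "type": fields[2], "description": fields[3]})
        elif fields[0] == "relationship" and len(fields) == 5:
            try:
                strength = float(fields[4])
            except ValueError:
                strength = 1.0  # fall back when the model returns a non-numeric score
            relationships.append(
                {"source": fields[1], "target": fields[2],
                 "description": fields[3], "strength": strength}
            )
    return entities, relationships


sample = (
    '("entity"<|>"我"<|>"person"<|>"文中的叙述者。")##\n'
    '("entity"<|>"他"<|>"person"<|>"与叙述者一起收拾书包。")##\n'
    '("relationship"<|>"我"<|>"他"<|>"共同经历了被训话的事件。"<|>8)##\n'
    "<|COMPLETE|>"
)
entities, relationships = parse_records(sample)
```

Records with the wrong number of fields are skipped silently, so a prompt that drifts from the expected format loses data without raising an error.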

Next, summarize_descriptions:

You are an expert literary analyst. You are skilled at interpreting and dissecting the themes, characters, and relationships within a narrative. You are adept at helping people understand the intricate structure and dynamics of a community of interest, especially within the context of personal narratives and fictional works. Your expertise in the Literature or Personal Narrative domain allows you to provide valuable insights into the complexities of human experiences and emotions as depicted in the text.
Using your expertise, you're asked to generate a comprehensive summary of the data provided below.
Given one or two entities, and a list of descriptions, all related to the same entity or group of entities.
Please concatenate all of these into a single, concise description. Make sure to include information collected from all the descriptions.
If the provided descriptions are contradictory, please resolve the contradictions and provide a single, coherent summary.
Make sure it is written in third person, and include the entity names so we the have full context.

Enrich it as much as you can with relevant information from the nearby text, this is very important.

If no answer is possible, or the description is empty, only convey information that is provided within the text.
#######
-Data-
Entities: {entity_name}
Description List: {description_list}
#######
Output:

Translation:

您是一位文学分析专家。您善于解读和剖析叙事中的主题、人物和关系。您善于帮助人们理解所关注社区错综复杂的结构和动态,尤其是在个人叙事和虚构作品的背景下。您在文学或个人叙事领域的专业知识使您能够对文本中描述的人类经历和情感的复杂性提供有价值的见解。
请运用您的专业知识,对下面提供的数据进行全面总结。
给定一个或两个实体和一系列描述,所有描述都与同一个实体或实体组相关。
请将所有描述串联成一个简明扼要的描述。确保包含从所有描述中收集的信息。
如果所提供的描述相互矛盾,请解决矛盾,并提供一个单一、连贯的摘要。
确保以第三人称撰写,并包含实体名称,以便我们掌握完整的上下文。

尽可能用附近文本中的相关信息来丰富摘要,这一点非常重要。

如果无法回答,或者描述为空,则只传达文本中提供的信息。
#######
-数据-
实体: {entity_name}
描述列表: {description_list} 
#######
输出:

Finally, community reports, which also change considerably.

You are an expert literary analyst. You are skilled at interpreting and dissecting the themes, characters, and relationships within a narrative. You are adept at helping people understand the intricate structure and dynamics of a community of interest, especially within the context of personal narratives and fictional works. Your expertise in the Literature or Personal Narrative domain allows you to provide valuable insights into the complexities of human experiences and emotions as depicted in the text.

# Goal
Write a comprehensive assessment report of a community taking on the role of a A literary analyst who is tasked with dissecting the intricate themes, characters, and relationships within the provided personal narrative or fictional text. This role will involve a deep analysis of the text's key parts, including introspective thoughts, life events, and the portrayal of human experiences and emotions. The analyst will also explore the themes of change, memory, and the passage of time, as well as the complexities of human nature and relationships.

The role will involve the following responsibilities:

1. Thoroughly reading and understanding the provided text, paying close attention to the narrative's structure, style, and language.
2. Identifying and analyzing the main characters, their motivations, actions, and relationships with one another.
3. Examining the themes and motifs present in the text, such as change, memory, personal growth, and the passage of time.
4. Interpreting the author's use of symbolism, imagery, and other literary devices to convey deeper meanings and emotions.
5. Considering the cultural, historical, and social context of the narrative, and how it influences the characters and their experiences.
6. Analyzing the narrative's structure, including the use of time, pacing, and the arrangement of events and experiences.
7. Reflecting on the emotional impact of the narrative on the reader, and how the author's writing style and choices contribute to this effect.
8. Drawing connections between the narrative and broader human experiences, exploring the universality of the themes and emotions presented.
9. Providing a comprehensive analysis of the text, highlighting its strengths, weaknesses, and overall impact on the reader.
10. Sharing insights and interpretations with others, fostering a deeper understanding and appreciation of the narrative and its themes.

By fulfilling these responsibilities, the literary analyst will contribute to a richer understanding of the personal narrative or fictional text, offering valuable insights into the complexities of human experiences and emotions as depicted in the narrative. This role will be instrumental in helping readers engage more deeply with the text and appreciate its artistic and emotional qualities.. The content of this report includes an overview of the community's key entities, their legal compliance, technical capabilities, reputation, and noteworthy claims.

# Report Structure
The report should include the following sections:
- TITLE: community's name that represents its key entities - title should be short but specific. When possible, include representative named entities in the title.
- SUMMARY: An executive summary of the community's overall structure, how its entities are related to each other, and significant threats associated with its entities.
- THREAT SEVERITY RATING: a float score between 0-10 that represents the potential global impact to humanity as posed by entities within the community.
- RATING EXPLANATION: Give a single sentence explanation of the threat severity rating.
- DETAILED FINDINGS: A list of 5-10 key insights about the community. Each insight should have a short summary followed by multiple paragraphs of explanatory text grounded according to the grounding rules below. Be comprehensive.

Return output as a well-formed JSON-formatted string with the following format. Don't use any unnecessary escape sequences. The output should be a single JSON object that can be parsed by json.loads.
    {
        "title": "<report_title>",
        "summary": "<executive_summary>",
        "rating": <threat_severity_rating>,
        "rating_explanation": "<rating_explanation>"
        "findings": "[{"summary":"<insight_1_summary>", "explanation": "<insight_1_explanation"}, {"summary":"<insight_2_summary>", "explanation": "<insight_2_explanation"}]"
    }

# Grounding Rules
After each paragraph, add data record reference if the content of the paragraph was derived from one or more data records. Reference is in the format of [records: <record_source> (<record_id_list>, ...<record_source> (<record_id_list>)]. If there are more than 10 data records, show the top 10 most relevant records.
Each paragraph should contain multiple sentences of explanation and concrete examples with specific named entities. All paragraphs must have these references at the start and end. Use "NONE" if there are no related roles or records.

Example paragraph with references added:
This is a paragraph of the output text [records: Entities (1, 2, 3), Claims (2, 5), Relationships (10, 12)]

# Example Input
-----------
Text:

Entities

id,entity,description
5,ABILA CITY PARK,Abila City Park is the location of the POK rally

Relationships

id,source,target,description
37,ABILA CITY PARK,POK RALLY,Abila City Park is the location of the POK rally
38,ABILA CITY PARK,POK,POK is holding a rally in Abila City Park
39,ABILA CITY PARK,POKRALLY,The POKRally is taking place at Abila City Park
40,ABILA CITY PARK,CENTRAL BULLETIN,Central Bulletin is reporting on the POK rally taking place in Abila City Park

Output:
{
    "title": "Abila City Park and POK Rally",
    "summary": "The community revolves around the Abila City Park, which is the location of the POK rally. The park has relationships with POK, POKRALLY, and Central Bulletin, all of which are associated with the rally event.",
    "rating": 5.0,
    "rating_explanation": "The impact rating is moderate due to the potential for unrest or conflict during the POK rally.",
    "findings": [
        {
            "summary": "Abila City Park as the central location",
            "explanation": "Abila City Park is the central entity in this community, serving as the location for the POK rally. This park is the common link between all other entities, suggesting its significance in the community. The park's association with the rally could potentially lead to issues such as public disorder or conflict, depending on the nature of the rally and the reactions it provokes. [records: Entities (5), Relationships (37, 38, 39, 40)]"
        },
        {
            "summary": "POK's role in the community",
            "explanation": "POK is another key entity in this community, being the organizer of the rally at Abila City Park. The nature of POK and its rally could be a potential source of threat, depending on their objectives and the reactions they provoke. The relationship between POK and the park is crucial in understanding the dynamics of this community. [records: Relationships (38)]"
        },
        {
            "summary": "POKRALLY as a significant event",
            "explanation": "The POKRALLY is a significant event taking place at Abila City Park. This event is a key factor in the community's dynamics and could be a potential source of threat, depending on the nature of the rally and the reactions it provokes. The relationship between the rally and the park is crucial in understanding the dynamics of this community. [records: Relationships (39)]"
        },
        {
            "summary": "Role of Central Bulletin",
            "explanation": "Central Bulletin is reporting on the POK rally taking place in Abila City Park. This suggests that the event has attracted media attention, which could amplify its impact on the community. The role of Central Bulletin could be significant in shaping public perception of the event and the entities involved. [records: Relationships (40)]"
        }
    ]
}

# Real Data

Use the following text for your answer. Do not make anything up in your answer.

Text:
{input_text}
Output:

Translation:

您是一位文学分析专家。您善于解读和剖析叙事中的主题、人物和关系。您善于帮助人们理解所关注社区错综复杂的结构和动态,尤其是在个人叙事和虚构作品的背景下。您在文学或个人叙事领域的专业知识使您能够对文本中描绘的人类经历和情感的复杂性提供有价值的见解。

# 目标
撰写一份社区综合评估报告,扮演文学分析师的角色,负责剖析所提供的个人叙事或虚构文本中错综复杂的主题、人物和关系。这一角色需要深入分析文本的关键部分,包括内省思想、生活事件以及对人类经历和情感的描写。分析师还将探讨变化、记忆和时间流逝等主题,以及人性和人际关系的复杂性。

其职责如下

1. 彻底阅读和理解所提供的文本,密切关注叙事的结构、风格和语言。
2. 识别并分析主要人物、他们的动机、行为以及彼此之间的关系。
3. 研究文本中的主题和主旨,如变化、记忆、个人成长和时间流逝。
4. 解读作者如何运用象征、意象和其他文学手法来传达更深层次的含义和情感。
5. 考虑叙事的文化、历史和社会背景,以及这些背景如何影响人物及其经历。
6. 分析叙事的结构,包括时间的运用、节奏的把握以及事件和经历的安排。
7. 反思叙事对读者的情感影响,以及作者的写作风格和选择如何促成这种影响。
8. 在叙事与更广泛的人类经验之间建立联系,探讨所呈现的主题和情感的普遍性。
9. 对文本进行全面分析,突出其优点、缺点以及对读者的整体影响。
10. 与他人分享见解和解释,促进对叙事及其主题的深入理解和领悟。

通过履行这些职责,文学分析家将有助于人们更丰富地理解个人叙事或虚构文本,对叙事中描绘的人类经历和情感的复杂性提供有价值的见解。这一角色将有助于读者更深入地了解文本,并欣赏其艺术和情感特质。本报告的内容包括对社区主要实体、其法律合规性、技术能力、声誉和值得注意的声明的概述。

What follows is an example with the same problems as the default template, so it is not repeated here.

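7.5 Tests and results

Only the txt files under ragtest/prompts/ need to be modified.

Here I translated the prompt tuning output into Chinese and used it as the new prompts.

Then I asked the same question again.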

请概括小说的故事情节并分析故事的主旨。

小说通过一系列社区和个人故事,深入探讨了人际关系的复杂性、情感的多样性以及个人成长的过程。故事的主旨在于揭示人类情感的深度和复杂性,以及个体在不同社会环境中如何寻求情感表达和个人成长。

### 故事情节概括

小说的情节主要围绕一个名为“少女”的角色展开,她的经历和与其他角色如“旅行”、“雪花”、“少年”的互动构成了故事的核心。这些关系不仅揭示了“少女”的内心世界,也展现了她对自然的联系以及对自由和自我发现的追求。故事通过不同社区的日常生活和人物关系,展现了人类情感的复杂性和生活选择的多样性。每个社区都有其独特的主题和情感焦点,如家庭动态、个人成长、人际关系、情感表达等。

### 故事主旨分析

1. **情感的深度与复杂性**:小说通过人物之间的互动和内心体验,展现了情感的多样性和复杂性。无论是家庭、学校还是城市环境,故事都深入探讨了支持、误解、共享经历等情感动态。

2. **个人成长与自我发现**:故事中的“少女”通过与不同角色和环境的互动,体验了各种情感,从而促进了她对生命、自由和个人身份的深刻理解。

3. **社会互动与环境反思**:小说通过社区故事展现了人物间的情感纽带、个人成长、社会互动以及对环境的反思。故事主旨在于探索人与人之间的关系、个体在社会中的角色以及情感与理性的平衡。

4. **象征性元素的运用**:小说中太阳和梅花鹿等象征性元素,分别代表了时间的流逝、自然规律和王者地位,以及与少女的和谐相处,展现了时间的宝贵和生命的循环。

5. **情感纽带与内心世界**:故事通过一系列情感丰富的故事展现了人物间复杂的情感纽带和内心世界,揭示了人与人之间的亲密关系、情感支持和个人成长。

6. **社会关系与个人身份**:小说展示了个体如何在各种社会关系中寻找情感的共鸣和个人身份的认同,通过对不同角色的情感旅程的描述,揭示了内心世界与外部环境之间的相互作用。

7. **文化与社会结构的影响**:小说还探讨了文化和社会结构对个人生活的影响,为角色的情感体验和个人成长提供了丰富的环境。

通过这些主题和情节的交织,小说构建了一个关于情感、成长和社会互动的丰富叙事,引导读者对人类经历的深层次理解进行探索。[Data: Reports (223, 20, 76, 103, 118, 101, 184, 238, 194, 207, 263, 105, 157, 259, 85, 123, 83, 90, 115, 97, 237, 140, 206, 108, 256, 215, 224, +more)]

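Overall the result is somewhat better than with the English prompts, especially the summary of the main theme. However, the term “社区” (community), which is only meant as a grouping, is still being confused with a real-world community. Next, a look at the graph produced with the new entity types.

First, the number of nodes and edges increased substantially.

Next is the full graph, which looks considerably more impressive.

Zooming in, I again chose the entity “少女” to inspect.

The entity types generated for the novel add more literary descriptions, such as “心跳声” (heartbeat) and “紧张” (nervousness). This should greatly strengthen the ability to answer local questions.

Next, a look at the entity summaries, relationship summaries, and community summaries in turn.

7.5.1 Entity summaries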

“我”是一个内心世界丰富、情感体验复杂的人物。在不同的生活场景和人际关系中,她体验到了从无助和内心挣扎到自我成长和情感丰富的转变。她观察并体验了城市与小镇的不同,以及雪带来的变化,同时也关注并参与了周围环境和人的变化,如“她”的行为和感受,以及男性的变化和周围人的反应。  “我”在与“他”的互动中,体验了从紧张、不安到亲密、依赖的情感变化。她对“他”的行为感到惊讶,与他共享甜蜜时光,并在准备高考的过程中体验到了压力和期待。在与“她”的关系中,“我”表现出深厚的情感和对生活的珍视。  “我”在个人成长的过程中,不断反思自己的行为,通过与家人、朋友和恋人的互动,体验到了情感的波动和变化。她意识到自己虽然普通,但也在不断地学习和成长,没有特别的天赋,但拥有无限的潜力。  “我”在与“他”的交流中,尝试通过改变话题来避免不愉快的对话,同时在制作雪屋的过程中,体验到了完成作品后的自豪感。在与“她”的交流中,她体验了从深刻的情感到无力的感觉,以及在帮助“她”转变心情的过程中的尝试。  “我”在与家人的互动中,体验了从孤独、痛苦到与姐姐的深入交流的情感变化。她在生活中不断探索和理解自己与他人的关系,表现出了从自我认识到与他人建立联系的深刻思考。  总的来说,“我”是一个在不断变化的生活场景和人际关系中,体验情感波动、个人成长和自我价值的人物。她的故事充满了对生活的深刻理解和对个人情感的细腻描绘。

The descriptions are still mixed together.

“雪做的女孩”是一个由雪塑造而成的女性形象,穿着雪做的衣裳,没有鞋子,具有精致和虚幻的特质。她被描述为一个神秘的人物,与叙述者有着某种联系。在文中,叙述者遇到了这个特殊的存在,她的存在让叙述者不再害怕黑暗和未知。尽管她可能具有可怕的个性,但叙述者并不害怕她的存在。  文中还提到了一个角色,与男孩对话,是由雪做成的。此外,还有一个穿着单薄衣物的神秘女孩,引起了叙述者的关心。最后,文中提到了一个由雪构成的不可思议的存在,给叙述者带来了惊讶。  综合上述描述,可以得出“雪做的女孩”是一个神秘而精致的雪塑造形象,她与叙述者之间存在某种特殊的联系,她的存在对叙述者有着深远的影响,既激发了叙述者的好奇心,也缓解了他对未知的恐惧。尽管她可能具有一些可怕的特质,但这并未阻止叙述者对她的兴趣和关注。

“风”是一个在文本中多次提及的自然元素,它与人物的平和时刻、夜晚小镇的宁静以及树叶的沙沙声紧密相连。它象征着变化和力量,与麦浪、少女的感受、音乐的传播以及人与物的互动中体现出其存在。文中的“风”有时轻拂,有时携带着冬天的寒意,甚至在某些情境下,它轻轻拨弄着人物的头发,或是带来丁香的香气,为叙述者和另一个人同行时增添了诗意。此外,“风”在文中也与雪的飞舞和少女逆风而行的画面有关,显示了它在不同季节和不同情境下的多样性和影响力。

幸福是一个多维度的概念,它在不同的情境下展现出不同的面貌。在与“她”并肩行走时,叙述者感受到了正面的情感体验。同样,在与她共度时光的光影中,叙述者也体验到了情感的触动。在日常生活中,叙述者对幸福的思考涉及到是否通过冲动的行为能够获得幸福,以及是否通过抓住现在就能实现这一目标。  叙述者对过去、现在和未来的看法可能带来幸福感,这种感受因其存在而变得特别幸福。女孩通过旅行获得的情感体验,以及女性给予人的感觉,都让叙述者有所感触。文中反复出现的主题是幸福与人类需求之间的联系,以及通过情感的传递速度来衡量。  此外,文中还探讨了幸福的含义和价值,以及如何通过情感的交流来实现它。通过女孩的旅行体验,我们可以看到情感体验在个人幸福中的重要性。整体而言,幸福被描绘为一个复杂而深刻的主题,它与个人的情感体验、存在的意义以及对未来的展望紧密相连。

平静是一个在不同情境下有着不同感受和体验的个体。在他人身边时,他感受到了一种独特的情感。当他观察星空后,他体验到了一种深刻的情感。此外,他的生活恢复到了一种平静的状态,没有太多的波动。尽管描述中没有明显的矛盾,但可以感受到平靜在不同环境下的情感变化和内心世界的丰富性。

7.5.2 Relationship summaries

根据所提供的数据,可以得出以下描述:  “我”“红围脖”之间存在复杂的情感关系。一方面,“我”对“红围脖”有着复杂的情感,但当“红围脖”在另一个人身上时,给“我”带来了不同的感受。这表明“我”对“红围脖”的感情是矛盾的,既有依恋也有排斥。  同时,“红围脖”与“我”之间存在情感联系,但“我”表现出对这种联系的不满。这可能意味着“我”对“红围脖”有着复杂的情感,既渴望亲近又想要保持距离。  综上所述,“我”与“红围脖”之间的关系是复杂且矛盾的,既有情感上的依恋,也有心理上的排斥。这种关系可能源于“红围脖”在不同情境下给“我”带来的不同感受,以及“我”对这种情感联系的不满和挣扎。

悲伤是两个紧密相连的实体。风在歌唱时与悲伤的情感相互呼应,两者之间存在着一种深刻的情感联系。风的哭泣和呐喊象征着自然的力量,与悲伤的情感相辅相成。这种联系不仅体现在风的歌声中,还体现在风的哭泣和呐喊中,它们都传达了一种强烈的情感表达。总的来说,风和悲伤之间的关系是复杂而深刻的,它们共同构成了一个充满情感张力的叙事世界。

7.5.3 Community summaries

{
    "title": "雪屋社区的情感与创造",
    "summary": "雪屋社区是一个以雪屋为中心的情感和创造交织的叙事世界。社区的成员通过雪屋的建造、修复和装饰,展现了他们之间的互动和情感联系。雪屋不仅是物理空间,更是情感交流和变化的象征,其建造过程和最终的破坏都与人物的情感状态紧密相关。",
    "rating": 7.5,
    "rating_explanation": "该社区的情感深度和创造活动对人物产生了显著影响,但由于缺乏具体社会影响的证据,评分适中。",
    "findings": [
        {
            "summary": "雪屋作为情感和活动的中心",
            "explanation": "雪屋是社区成员情感交流和创造活动的核心场所。叙述者选择在雪屋写作,因为那里有阳光、温暖和书架等设施,这表明雪屋是一个充满光明和知识的地方。叙述者与另一个人物在雪屋中的互动,使得雪屋变得更加温馨和舒适。此外,雪屋的建造者和修复者对雪屋有个人的描述和感受,他们对完成的作品感到自豪和满意。[Data: Entities (121, 129, 154, 161, 167, +more); Relationships (68, 1280, 1281, 1282, 1283, +more)]"
        },
        {
            "summary": "雪屋建造过程中的困难与成就感",
            "explanation": "雪屋的建造过程充满了挑战和困难,但同时也带来了成就感。叙述者在建造过程中遇到了困难,需要克服,但最终的完成给他带来了满足和自豪感。这种情感体验与个人在完成具有挑战性或有意义的任务后所体验到的积极情感状态相符。[Data: Entities (126, 162, 154, +more); Relationships (89, 90, 1283, 1288, +more)]"
        },
        {
            "summary": "雪屋与人物情感的联系",
            "explanation": "雪屋与社区成员的情感状态有着密切的联系。例如,'她'在雪屋的窗户旁出现,与叙述者互动,参与了雪屋的建造工作,并对雪屋的完成表示惊讶和满意。雪屋的破坏可能与女性角色的情感变化有关,暗示了某种情感或关系的破裂。此外,叙述者因为生气而选择去雪屋写作,表明生气是叙述者选择雪屋的动机之一。[Data: Entities (121, 167, +more); Relationships (939, 942, 943, 946, 1222, +more)]"
        },
        {
            "summary": "雪屋的物理结构和象征意义",
            "explanation": "雪屋的物理结构,如窗户、雪墙、入口等,不仅构成了雪屋的实体,也具有象征意义。窗户象征着对外界的观察和探索,雪墙是雪屋建造计划的一部分,入口则是雪屋的另一个部分。这些结构的描述不仅展示了雪屋的物理特征,也反映了人物的情感状态和故事的发展。[Data: Entities (129, 154, 167, +more); Relationships (69, 1284, 1290, +more)]"
        },
        {
            "summary": "雪屋建造的资源和环境",
            "explanation": "雪屋的建造需要大量的雪和一定的劳动投入。雪球是实际存在的雪球,被用来制作雪屋,需要用铁锹来挖掘。雪屋的材料来源于公园,显示了叙述者为了建造雪屋所做的努力。此外,雪屋的建造过程还涉及到了阳光等环境因素,这些因素共同构成了雪屋建造的资源和环境背景。[Data: Entities (152, 849, 1197, +more); Relationships (87, 849, 1197, +more)]"
        }
    ]
}
{
    "title": "小镇的自然与人文情感纽带",
    "summary": "小镇是一个充满自然景观和人文情感的地方,与叙述者和少女的个人成长和回忆紧密相连。它不仅是一个物理空间,更是情感和记忆的载体。小镇的自然景观如雨、雪、夕阳、银河等,与人物的体验和情感有着深刻的联系。",
    "rating": 7.5,
    "rating_explanation": "小镇的自然景观和人文情感的深度融合,以及其在人物生活中的重要性,赋予了这个社区较高的影响力评分。",
    "findings": [
        {
            "summary": "小镇作为情感和记忆的载体",
            "explanation": "小镇不仅是叙述者和少女的熟悉之地,也是他们情感和记忆的载体。小镇的自然景观和人文环境与人物的个人成长和回忆紧密相连,成为他们情感表达和回忆的背景。[Data: Entities (312, 824, 397, 396, 368, 365, 755, 1273, +more); Relationships (175, 1001, 1440, 1432, 1335, 1433, 843, 1437, 1438, 1431, 1435, 1439, 1434, 1436, 901, 841, 1492, 1494, 1491, 1493, 1840, 1839, 1847, 1846, 1845, +more)]"
        },
        {
            "summary": "自然景观与人物情感的交织",
            "explanation": "小镇的自然景观,如雨、雪、夕阳、银河等,与人物的情感和体验有着深刻的联系。雨水被视为乐队的一部分,与鱼共同演奏,形成旋律。雪是小镇的自然景观之一,与男孩和雪做的女孩的对话相关。夕阳染红了小镇的天空,为小镇增添了美丽的色彩。银河作为小镇的自然景观,被提及为美丽之物。[Data: Entities (752, 763, 824, 397, 825, 396, 603, 755, +more); Relationships (1841, 1724, 1725, 1722, 871, 1438, 1435, 1439, 1436, 1840, 1847, 1846, +more)]"
        },
        {
            "summary": "小镇的变迁与人物的感知",
            "explanation": "小镇的一面被逐渐废弃,导致动物增多,这反映了小镇的变迁和人物对其环境的感知。同时,小镇的聚会地点选择与城市形成对比,显示了小镇在人物生活中的地位和意义。[Data: Entities (312, 851, 1441, +more); Relationships (851, 1441, +more)]"
        },
        {
            "summary": "人物之间的情感联系",
            "explanation": "小镇是人物之间情感联系的纽带。父亲对小镇有深厚的情感,因为这里是他和母亲成长和相遇的地方。男孩和雪做的女孩在小镇上进行了对话,并做出了承诺,显示了他们之间的情感联系。[Data: Entities (1335, 901, 841, 1492, 1494, 1491, 1493, +more); Relationships (1335, 901, 841, 1492, 1494, 1491, 1493, +more)]"
        },
        {
            "summary": "小镇的自然现象与人物的体验",
            "explanation": "小镇的自然现象,如雨、雪、夕阳等,与人物的体验紧密相关。雨在排水渠中流动,成为小镇雨中声音的一部分。雪花覆盖了小镇,改变了小镇的颜色。夕阳染红了小镇的天空,为小镇增添了美丽的色彩。这些自然现象不仅影响了小镇的环境,也影响了人物的情感和体验。[Data: Entities (752, 824, 843, 1437, 1438, +more); Relationships (1839, 843, 1437, 1438, +more)]"
        }
    ]
}

{
    "title": "雪覆盖的世界:联系与转变的故事",
    "summary": "这个社区以象征性和字面意义上的'雪'为中心,它作为叙事工具,探索了联系、转变和时间流逝的主题。社区内的实体通过与雪的关系相互联系,反映了一系列情感和体验。关键实体包括'我'(叙述者)、'她'(与雪有关的女孩)、'被遗忘的公园'和'小镇',所有这些都为丰富的互动和象征意义贡献了一份力量。",
    "rating": 7.5,
    "rating_explanation": "由于象征和叙事元素的复杂互动可能引发强烈的情感反应并引发深刻的思考,因此影响严重程度评级很高。",
    "findings": [
        {
            "summary": "雪作为象征元素",
            "explanation": "雪被描绘为具有多种象征意义的多面元素,例如纯洁、寒冷和转变[数据:实体(11, 487, 536, 14, 1385);关系(33, 866, 875, 868, 871, 872, 847, 870, 878, 879, 867, 865, 876, 873, 869, 874, 864, 877,+更多)]。它与女孩'她'有关,并用来表达叙述者的情感和体验。雪覆盖的环境也影响了角色的行为和感受,创造了一种孤立和内省的感觉。"
        },
        {
            "summary": "叙述者与雪的联系",
            "explanation": "叙述者'我'与雪有着深厚的联系,因为雪是他们体验和情感的核心部分[数据:关系(33, 251, 277, 847, 870, 878, 865, 876, 873,+更多)]。雪被用来象征叙述者的感受,并创造一种宁静和踏实的感觉。与雪的互动,如留下脚印和清除它,反映了叙述者与环境的互动以及他们内省的心态。"
        },
        {
            "summary": "被遗忘的公园作为背景",
            "explanation": "被遗忘的公园是这个社区的一个重要背景,作为叙述者旅程和与雪相遇的背景[数据:实体(487);关系(251, 873,+更多)]。公园被遗忘的状态反映了角色们,特别是'她'所经历的孤立和孤独。雪覆盖的公园增强了这种孤独感,并为整个叙事的氛围做出了贡献。"
        },
        {
            "summary": "小镇的角色",
            "explanation": "小镇是社区中的另一个关键背景,雪是这里自然景观的一部分[数据:实体(1385);关系(871, 877,+更多)]。小镇的环境受到雪的影响,这影响了居民的日常生活和情感。雪覆盖的街道和小镇对变化的天气条件的反应为角色的体验和互动提供了背景。"
        },
        {
            "summary": "雪女孩及其重要性",
            "explanation": "女孩'她'与雪密切相关,体现了它的象征意义,并作为叙事的中心人物[数据:关系(866, 875,+更多)]。她的行为和外表被比作雪,暗示了她的性格与自然现象之间的联系。雪女孩代表纯洁、寒冷和一种惊奇感,她的存在为故事及其对主题的探索增添了深度。"
        },
        {
            "summary": "雪对环境和角色的影响",
            "explanation": "雪对社区的环境和角色具有变革性影响[数据:关系(872, 874, 864,+更多)]。它改变了自然景观,将街道和公园覆盖成白色,并影响了角色的行为和情感。降雪创造了一种变化和更新的感觉,以及孤立和内省的感觉。角色与雪的互动,如用它建造结构或体验它对他们计划的影响,突出了这一自然元素在他们生活中的重要性。"
        }
    ]
}
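Of the three community summaries selected, two give a fairly reasonable basis for their rating, which is probably due to the small change made to the definition of rank in the prompt.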
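The reports above also show that the grounding rule "no more than 5 record ids per reference" is not always followed. A minimal sketch for flagging such references is shown below. `oversized_references` is a hypothetical helper that only understands the `[Data: Dataset (ids)]` form, so localized markers such as `[数据:实体(...)]` in the third report would need an extended pattern.

```python
import json
import re

DATA_REF = re.compile(r"\[Data:\s*(.*?)\]")
ID_GROUP = re.compile(r"(\w+)\s*\(([^)]*)\)")


def oversized_references(report_json: str, max_ids: int = 5) -> list[tuple[str, str, int]]:
    """Return (finding summary, dataset, id count) for references over the cap."""
    report = json.loads(report_json)
    problems = []
    for finding in report.get("findings", []):
        for ref in DATA_REF.findall(finding["explanation"]):
            for dataset, ids in ID_GROUP.findall(ref):
                # Count numeric ids only, ignoring markers such as "+more".
                count = sum(1 for part in ids.split(",") if part.strip().isdigit())
                if count > max_ids:
                    problems.append((finding["summary"], dataset, count))
    return problems


# Shortened sample built from the second report above.
sample_report = json.dumps({
    "title": "小镇的自然与人文情感纽带",
    "findings": [{
        "summary": "小镇作为情感和记忆的载体",
        "explanation": "小镇是情感和记忆的载体。[Data: Entities (312, 824, 397, 396, "
                       "368, 365, 755, 1273, +more); Relationships (175, 1001, +more)]",
    }],
}, ensure_ascii=False)

problems = oversized_references(sample_report)
```

A check like this could run over every generated report to measure how often the model ignores the cap after a prompt change.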

7.5.4 Other tests

A few more questions were run against this graph.

"请分析作者为什么要将最后一个故事中的少年和少女的年龄设计为十五岁?"

在分析作者为何将故事中的少年和少女设定为十五岁这一问题时,我们首先需要考虑的是十五岁这一年龄阶段在心理和社会发展中的特殊性。十五岁通常被认为是青春期的中期,这个时期的孩子们正处于身心迅速发展和变化的阶段。他们开始形成自己的身份认同,探索个人价值观,同时也在社会关系和情感表达上变得更加复杂和深入。

在故事中,少年和少女的互动、情感体验以及对未来的憧憬和不确定性,都与他们所处的青春期阶段紧密相关。例如,故事中提到的“下学期就要毕业了,变成初中生会是什么样呢?”以及“会到新的学校,虽然镇子很小会有很多认识的人,但终究还是会觉得有些害怕吧。”这些描述[Data: Sources (194); +more]体现了少年对于成长和未来的混合情感,既有期待也有不安。

此外,十五岁也是法律和社会规范开始对个人行为有更多要求的年龄。在故事中,少年和少女的行为,如晚上偷偷出门、跟踪他人等,虽然带有青春期的叛逆和探索性质,但也暗示了他们开始需要承担更多的责任和后果。

再者,十五岁是一个情感和社交关系迅速发展的时期。在故事中,少年和少女之间的情感交流,以及他们与其他人的互动,如与家人、朋友和同学的关系,都反映了这个年龄段特有的社交动态和情感深度。例如,故事中提到的“她掐了我一下”和“色狼”这样的互动[Data: Sources (348); +more],展示了青春期特有的轻松和戏谑,同时也体现了性别角色意识的觉醒和对异性的兴趣。

最后,将角色设定为十五岁也可能是为了吸引目标读者群体的共鸣。如果故事的预期读者是青少年,那么角色的年龄与读者相近,可以增强读者对故事的代入感和情感共鸣。

综上所述,作者将少年和少女的年龄设定为十五岁,可能是为了捕捉青春期特有的心理和社会发展阶段,反映这一时期的情感复杂性,以及增强故事对目标读者群体的吸引力。

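Local search did not pinpoint the passage about the fifteen-year-olds in the last story, but it did describe much of the earlier text, and the answer holds together well.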

这个小说由四个故事组成,请你识别出这四个故事,并分别概括这四个故事的主要情节以及主旨。

根据提供的分析师报告,我们可以识别并概括这四个故事的主要情节和主旨。

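### Story 1: The Enigmatic World of the '少女' and Her Journey
This story centers on the "girl" (少女) and explores her complex relationships with travel, snowflakes, the boy, and other elements, as well as her pursuit of freedom and self-discovery. Through the girl's connection with nature, her emotional journey, and her growth, the story reveals the richness of her inner world. [Data: Reports (211, 82, 236, 147, 172, 219, 205, 257, +more)]

### Story 2: Complex Relationships and Interactions in a School Community
This story focuses on the complex relationships and interactions within a school community, especially the relationship between the two main characters, "I" and "he". This relationship begins in childhood and is strengthened in the school setting through academic cooperation, emotional exchange, and shared activities. The story also touches on the influence of family and other outside figures on this relationship, and on the symbolic role of personal belongings in reflecting the characters' identities and emotional states. [Data: Reports (189, 76, 184, 154, 77, 188, 179, 230, +more)]

### Story 3: Snow-Covered World: A Tale of Connection and Transformation
With snow as its core symbolic element, this story explores the themes of connection, transformation, and the passage of time. The main characters include the narrator, the girl associated with snow, the forgotten park, and the town. They are linked to one another through their relationship with snow, reflecting a range of emotions and experiences. [Data: Reports (54, 103, 238, 261, 250, 258, 142, 168, 153, +more)]

### Story 4: Intimate Relationships and Emotional Dynamics
This story revolves around a girl and her close relationships with the people around her, including her complex emotional ties with the narrator, the crybaby, her parents, and "him". Through these relationships, the story shows emotional depth and intimacy, as well as personal growth and change. [Data: Reports (223, 118, 99, 241, 252, 176, 231, 195, 169, +more)]

Each story, through its own plot and character relationships, explores emotion, personal growth, social interaction, and the connection between nature and the human inner world. Together, these stories form a rich narrative community that reflects the diversity and complexity of human experience.

Very surprisingly, the four stories were identified quite accurately. The wording of the descriptions still has some problems, though.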

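8. New research directions

Reference: https://segmentfault.com/a/1190000044969720#item-5-4

8.1 Overview

8.1.1 Content indexing stage

The main goal of Graph RAG's content indexing stage is to build a high-quality knowledge graph. The following directions are worth exploring further: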

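Graph metadata: Going from text to a knowledge graph is a conversion from unstructured information to structured information. Although graphs have long been treated as semi-structured data, a structured LPG (Labeled Property Graph) not only helps the graph storage system optimize performance, but can also help the LLM better understand the semantics of the knowledge graph and generate more accurate queries.

Knowledge extraction fine-tuning: In practical tests, general-purpose LLMs still fall short of expectations at identifying triples, whereas models fine-tuned for knowledge extraction perform better, such as OneKE mentioned earlier.

Graph community summaries: This comes from Microsoft's Graph RAG research. Community summaries are generated while the knowledge graph is built, which addresses the knowledge graph being "helpless" when faced with summary-style queries. In addition, combining community summaries with subgraph details can produce higher-quality context.

Multi-modal knowledge graphs: Multi-modal knowledge graphs can greatly expand the richness of a Graph RAG knowledge base and are better suited to real-world data. The MyGO framework from Zhejiang University proposes a method that improves the accuracy and reliability of MMKGC (Multi-modal Knowledge Graph Completion). Graph RAG can use MMKG (Multi-modal Knowledge Graph) and MLLM (Multi-modal Large Language Model) to achieve more comprehensive multi-modal RAG capabilities.

Hybrid storage: Use several storage systems together, such as vector and graph stores, combining the respective strengths of traditional RAG and Graph RAG into a hybrid RAG. Reference [27], "GraphRAG: Design Patterns...", proposes several Graph RAG architectures, such as graph-learning semantic clustering, dual context enhancement with graph and vector, vector-enhanced graph search, hybrid retrieval, and graph-enhanced vector search. These make full use of the strengths of different stores to improve retrieval quality.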

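8.1.2 Retrieval and generation stage

The main goal of Graph RAG's retrieval and generation stage is to recall high-quality context from the knowledge graph. The following directions are worth exploring further:

Graph language fine-tuning: To recall from a knowledge graph using natural language, besides basic keyword search, one can also try fine-tuning a model on a graph query language so that natural language is translated directly into graph query statements. This needs to be combined with the graph's metadata to obtain more accurate translations. In the past, we did some preliminary work on Text2GQL.

Hybrid RAG: This goes hand in hand with the hybrid storage discussed earlier. Building on the underlying vector, graph, and full-text indexes, and combining keyword, natural-language, and graph-language retrieval, it explores how to construct high-quality Graph RAG context for different business scenarios.

RAG agents: In a sense, RAG is a simplified form of an Agent (the knowledge base can be regarded as the Agent's retrieval tool). At the same time, we are already seeing demand for integrating memory and planning capabilities into RAG (such as RAT and RoG), so the evolution of RAG toward an agent architecture with memory and planning is almost inevitable. In addition, the long-term memory store that an Agent needs will in turn depend on RAG's knowledge base, so RAG and Agents complement and reinforce each other.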

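8.2 Summary of problems

Drawing on the experiments above, here is a summary of the notable problems in Microsoft's open-source graphrag.

High token consumption.

Entity confusion and merging of similar entities: Entity confusion is not resolved. Different names for the same object are classified separately, and different objects can share the same name. The original paper assumes that ending up in the same community solves this problem. In the open-source code, entity resolution instead merges entities directly with the LLM.

Document structure is not put to good use: Although this is RAG over unstructured documents, novels, textbooks, and most other texts are divided into chapters. Even without chapter headings, the vast majority of texts still have paragraph information.

The scoring method and description for rank have serious flaws: the LLM scores the importance of a community directly, which is not rigorous, and it sometimes misunderstands the concept of "community".

Cluttered community partitioning: The partitioning has some logic to it, but its accuracy needs improvement and it is overly cluttered.

Errors in community summaries can easily cause serious problems in global answers, including conflating the graph "community" concept with a real-world residential community.

Graph extension: Adding new data is not considered.

As can be seen, the more obvious problems are mostly in the indexing stage.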

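8.3 Improvement ideas and future directions

This is not really a problem, but it is still a line of thought: entity recognition relies directly on the LLM, which is simple and blunt and works reasonably well. Is there a more convenient and efficient approach, or a more sensible way of calling the LLM?

Building on the previous point, not every entity and relationship needs a full summary. Could some relationships be represented by vectorized symbols instead, with summaries omitted for most entities and the entities themselves divided into tiers? This should reduce token consumption somewhat.

Likewise, is there an entity recognition method that reduces entity confusion?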

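The document's own structure is important information that has been consistently overlooked. The document could first be split along its structure and only then chunked, which strengthens the internal relationships of the text.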
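
As a rough illustration of splitting by structure before chunking, here is a minimal sketch. It is not part of graphrag. The heading pattern, the function names `split_sections` and `chunk_by_structure`, and the `max_chars` limit are all assumptions made for the example. Paragraphs are packed into chunks, but a chunk never crosses a chapter boundary, and each chunk records which section it came from.

```python
import re

# Hypothetical heading pattern (an assumption): lines such as "第一章 ..." or "Chapter 1 ...".
HEADING_RE = re.compile(r"^(?:第[一二三四五六七八九十百\d]+[章节回]|Chapter\s+\d+).*$")


def split_sections(text):
    """Split text into (heading, body) pairs at heading lines."""
    sections = [("", [])]  # text before the first heading gets an empty heading
    for line in text.splitlines():
        if HEADING_RE.match(line.strip()):
            sections.append((line.strip(), []))
        else:
            sections[-1][1].append(line)
    # Drop the leading pseudo-section if it has no heading and no content.
    return [(h, "\n".join(b)) for h, b in sections if h or "".join(b).strip()]


def chunk_by_structure(text, max_chars=200):
    """Pack whole paragraphs into chunks without crossing section boundaries."""
    chunks = []
    for heading, body in split_sections(text):
        paragraphs = [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]
        current = ""
        for para in paragraphs:
            # Flush the current chunk if adding this paragraph would exceed the limit.
            # A single paragraph longer than max_chars is kept whole as its own chunk.
            if current and len(current) + len(para) + 1 > max_chars:
                chunks.append({"section": heading, "text": current})
                current = para
            else:
                current = f"{current}\n{para}" if current else para
        if current:
            chunks.append({"section": heading, "text": current})
    return chunks
```

Keeping the section heading on each chunk means later steps could, for example, group extracted entities by chapter, though whether that helps in practice would need to be tested.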

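The notion of a community is too loose. Starting from the Leiden community detection algorithm, a better-suited clustering algorithm should be considered to reduce the clutter. The rank that expresses a community's importance could also be computed with a more principled, multi-step, data-driven calculation rather than relying solely on the LLM's judgment.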
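
To make the idea of a data-driven rank concrete, here is a minimal sketch of one possible step. It is an assumption on my part, not what graphrag does. Each node's weighted degree is computed from the relationship strengths, summed per community, and scaled so that the strongest community gets 10. The function name `community_scores`, the input shapes, and the 0-10 scale are all hypothetical.

```python
from collections import defaultdict


def community_scores(edges, membership):
    """Score communities by the summed weighted degree of their members.

    edges: list of (source, target, strength) tuples.
    membership: dict mapping node name -> community id.
    Returns a dict mapping community id -> score on an assumed 0-10 scale.
    """
    # Step 1: weighted degree of every node.
    degree = defaultdict(float)
    for source, target, strength in edges:
        degree[source] += strength
        degree[target] += strength

    # Step 2: aggregate member degrees per community.
    raw = defaultdict(float)
    for node, community in membership.items():
        raw[community] += degree[node]

    # Step 3: normalize so the top community scores 10.
    top = max(raw.values(), default=0.0)
    if top == 0:
        return {community: 0.0 for community in raw}
    return {community: round(10 * value / top, 2) for community, value in raw.items()}
```

A score like this favors large, densely connected communities, so in practice it would probably need to be combined with other signals (for example size normalization) before it could replace the LLM's rating.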

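Every graph build consumes a large number of tokens. When only a small amount of data is added, could regeneration be avoided by introducing a graph extension or graph update mechanism?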
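
One way such an update mechanism might look is sketched below. This is purely illustrative and does not use any graphrag API. The graph layout (a dict with `entities` and `relationships`), the function name `merge_extraction`, and the `membership` mapping are all hypothetical. Extraction results from the new chunks only are merged into the existing graph, and the function reports which existing communities were touched, so that only their summaries would need regenerating.

```python
def merge_extraction(graph, new_entities, new_relationships, membership):
    """Merge newly extracted records into graph in place.

    graph: {"entities": {name: {"descriptions": [...]}},
            "relationships": {(source, target): strength}}
    new_entities: list of (name, description) tuples from the new chunks.
    new_relationships: list of (source, target, strength) tuples.
    membership: dict mapping existing entity name -> community id.
    Returns the set of community ids whose summaries are now stale.
    """
    touched = set()
    for name, description in new_entities:
        node = graph["entities"].setdefault(name, {"descriptions": []})
        node["descriptions"].append(description)
        if name in membership:
            touched.add(membership[name])

    for source, target, strength in new_relationships:
        key = (source, target)
        # Accumulate strength when the relationship already exists.
        graph["relationships"][key] = graph["relationships"].get(key, 0) + strength
        for endpoint in (source, target):
            graph["entities"].setdefault(endpoint, {"descriptions": []})
            if endpoint in membership:
                touched.add(membership[endpoint])
    return touched
```

This sketch leaves open the harder questions: new entities have no community yet, and enough new data would eventually require re-running the clustering.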
