KAG Is Here, and RAG Is Getting Nervous!

Original post by 热爱AI的, NLP前沿, October 31, 2024, 11:55, Hubei

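Last week, OpenSPG open-sourced the KAG framework. It combines the strengths of knowledge graphs and vector retrieval to enhance LLMs and knowledge graphs bidirectionally in four areas, with the goal of addressing the challenges RAG faces. RAG suffers from a large gap between vector similarity and the relevance needed for knowledge reasoning, and it is insensitive to knowledge logic such as numerical values, temporal relations, and expert rules. These problems all hinder the deployment of professional knowledge services.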

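The framework has two parts, kg-builder and kg-solver:

  • kg-builder implements an LLM-friendly knowledge representation. It supports both schema-free information extraction and schema-constrained construction of professional knowledge, and it maintains a mutual index between the graph structure and the original text chunks.

  • kg-solver uses a logical-form-guided hybrid solving and reasoning engine with three types of operators (planning, reasoning, and retrieval), turning a natural-language question into a problem-solving process that combines language and symbols.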
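To make the mutual index between graph structure and text chunks more concrete, here is a minimal sketch in Python. It is not KAG's or OpenSPG's actual API: the `MutualIndex` class, its method names, and the sample chunk IDs are all hypothetical and exist only to illustrate the idea of a two-way lookup between entities and the chunks that mention them.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MutualIndex:
    """Hypothetical two-way index between graph entities and text chunks.

    This is an illustration only, not the data structure KAG uses.
    """

    chunks: dict = field(default_factory=dict)
    entity_to_chunks: dict = field(default_factory=lambda: defaultdict(set))
    chunk_to_entities: dict = field(default_factory=lambda: defaultdict(set))

    def add_chunk(self, chunk_id, text, entities):
        # Store the raw text and link it to every entity extracted from it.
        self.chunks[chunk_id] = text
        for entity in entities:
            self.entity_to_chunks[entity].add(chunk_id)
            self.chunk_to_entities[chunk_id].add(entity)

    def chunks_for(self, entity):
        # Graph -> text: which chunks mention this entity?
        return sorted(self.entity_to_chunks.get(entity, set()))

    def entities_for(self, chunk_id):
        # Text -> graph: which entities were extracted from this chunk?
        return sorted(self.chunk_to_entities.get(chunk_id, set()))


index = MutualIndex()
index.add_chunk("c1", "OpenSPG open-sourced the KAG framework.", ["OpenSPG", "KAG"])
index.add_chunk("c2", "KAG includes kg-builder and kg-solver.", ["KAG", "kg-builder", "kg-solver"])

print(index.chunks_for("KAG"))
print(index.entities_for("c1"))
```

With an index like this, a retrieval step can start from a graph node and pull the supporting source text, or start from a retrieved chunk and jump to the entities it mentions.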

Knowledge representation:

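KAG draws on the DIKW (data, information, knowledge, wisdom) hierarchy and upgrades SPG into an LLM-friendly version that can handle unstructured data, structured information, and the experience of business experts. Using techniques such as layout analysis, knowledge extraction, property normalization, and semantic alignment, it merges raw business data and expert rules into a unified business knowledge graph.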

Reasoning steps:

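  1. Convert the natural-language question into an executable logical expression. This step relies on the project's concept modeling; see the black-market mining documentation.

  2. Submit the converted logical expression to the OpenSPG reasoner for execution to obtain the user's classification result.

  3. Generate the answer from the user's classification result.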
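The three steps above can be sketched as a small pipeline. This is a toy illustration under stated assumptions, not the OpenSPG reasoner or KAG's real interface: the `LogicalForm` type, the regex-based parser standing in for the LLM conversion, the in-memory `FACTS` store, and the sample user and category are all invented for this example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogicalForm:
    """Hypothetical logical expression: look up `relation` for `subject`."""

    subject: str
    relation: str


# Toy fact store standing in for the knowledge graph (invented data).
FACTS = {("user_42", "belongsTo"): "gambling app developer"}


def to_logical_form(question: str) -> LogicalForm:
    # Step 1: a fixed pattern stands in for the LLM plus concept model.
    match = re.fullmatch(r"Which category does (\w+) belong to\?", question)
    if match is None:
        raise ValueError(f"unsupported question: {question!r}")
    return LogicalForm(subject=match.group(1), relation="belongsTo")


def execute(form: LogicalForm, facts: dict) -> Optional[str]:
    # Step 2: a dictionary lookup stands in for the reasoner.
    return facts.get((form.subject, form.relation))


def generate_answer(form: LogicalForm, result: Optional[str]) -> str:
    # Step 3: a template stands in for LLM answer generation.
    if result is None:
        return f"No classification found for {form.subject}."
    return f"{form.subject} is classified as: {result}."


def answer(question: str, facts: dict = FACTS) -> str:
    form = to_logical_form(question)
    return generate_answer(form, execute(form, facts))


print(answer("Which category does user_42 belong to?"))
```

Keeping the logical form as an explicit intermediate value is the point of the design: the symbolic step can be inspected and executed deterministically, while the language model only handles translation in and out.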

How well does it work?

KAG performs well on multi-hop question answering. Compared with other methods such as NaiveRAG and HippoRAG, its F1 score improves by 19.6% on hotpotQA and by 33.5% on 2wiki.
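The post does not say how the F1 scores were computed. As a point of reference, the sketch below shows the token-overlap F1 commonly used for question-answering benchmarks, assuming simple normalization (lowercasing, removing punctuation and English articles). The function names and normalization details are this example's assumptions, not taken from the KAG evaluation.

```python
import re
import string
from collections import Counter


def normalize(text):
    # Lowercase, drop punctuation and English articles, split on whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction, gold):
    # Harmonic mean of token-level precision and recall.
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("Eiffel Tower in Paris", "Eiffel Tower"))
```

Under this metric, a prediction that contains the gold answer plus extra words is penalized on precision but not on recall, so partially correct multi-hop answers still earn partial credit.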
