Equity Relation Extraction with DeepDive: A Practical Walkthrough

LeafCC

Article link: https://blog.csdn.net/cc815107613/article/details/102766864

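1. Set up the project skeleton and connect it to the database

Create the database from inside psql, then, back in the shell and in the project directory, write the connection string to db.url (use plain ASCII double quotes, not typographic ones):
psql postgres
CREATE DATABASE first OWNER leafccc;
echo "postgresql://leafccc@localhost:5432/first" > db.url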
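
Since db.url has to hold a bare connection string, a quick sanity check can catch copy-paste problems such as typographic quotes ending up inside the file. The following is a minimal sketch, not part of DeepDive; the helper name parse_db_url is hypothetical, and it only assumes the URL layout shown above.

```python
from urllib.parse import urlparse


def parse_db_url(text):
    """Split a db.url-style connection string into its parts.

    Raises ValueError when the text does not start with the
    postgresql:// scheme (for example because stray quote
    characters were written into the file).
    """
    url = urlparse(text.strip())
    if url.scheme != "postgresql":
        raise ValueError(f"unexpected scheme: {url.scheme!r}")
    return {
        "user": url.username,
        "host": url.hostname,
        "port": url.port,
        "database": url.path.lstrip("/"),
    }


if __name__ == "__main__":
    # Assumes the script is run from the project directory that holds db.url.
    with open("db.url", encoding="utf-8") as fh:
        print(parse_db_url(fh.read()))
```

Run from the project directory, it prints the user, host, port and database name it found, or fails loudly if the file content is not a clean postgresql:// URL.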

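2. Import the prior data and the articles

To import the prior data (company pairs already known to be related), add the following to app.ddlog: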

@source
transaction_dbdata(
    @key
    company1_name text,
    @key
    company2_name text
).
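
Both columns of transaction_dbdata are annotated with @key, so each row is meant to be one distinct pair of company names. Before loading, it may help to check the input file for malformed or repeated rows. The sketch below assumes the prior data is a two-column CSV; the function name find_bad_rows and the reason strings are hypothetical.

```python
import csv
import io


def find_bad_rows(csv_text):
    """Return (row_number, reason) for rows that are not clean company pairs."""
    problems = []
    seen = set()
    for n, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        # Every record should carry exactly two non-empty names.
        if len(row) != 2 or not all(cell.strip() for cell in row):
            problems.append((n, "expected two non-empty columns"))
            continue
        pair = (row[0].strip(), row[1].strip())
        # The (company1_name, company2_name) pair should not repeat.
        if pair in seen:
            problems.append((n, "duplicate pair"))
        seen.add(pair)
    return problems
```

An empty list means every record has two non-empty names and no pair occurs twice.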

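Then run the command that loads the data into Postgres.
(Notes: 1. every change to app.ddlog requires running deepdive compile again; 2. when the command opens its execution plan in the vi-style editor, type :wq to save and exit so that the run proceeds.)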

deepdive compile && deepdive do transaction_dbdata

Check the result with a query:

deepdive query '?- transaction_dbdata(company1_name, company2_name).'

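Next, import the articles to extract from by loading articles_market.csv into Postgres:
(Note: either rename your file to match, or change the article file name in the code yourself, and trim the article file down to only a few dozen lines.)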

@source
articles_market(
    id text,
    content text
).
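
The trimming mentioned in the note above can be done by hand, but article text often contains quoted fields that span several physical lines, so cutting by line count alone can split a record. A minimal sketch that keeps the first N CSV records instead; the function name trim_csv_text and the default of 50 are assumptions for illustration.

```python
import csv
import io


def trim_csv_text(text, max_rows=50):
    """Return CSV text holding at most the first max_rows records of text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # newline="" leaves embedded line breaks inside quoted fields untouched.
    reader = csv.reader(io.StringIO(text, newline=""))
    for n, row in enumerate(reader):
        if n >= max_rows:
            break
        writer.writerow(row)
    return out.getvalue()
```

To apply it to a file, read the original with open(path, newline="", encoding="utf-8"), pass the text through trim_csv_text, and write the result to the file name the project expects.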

Then run:

deepdive compile && deepdive do articles_market

Check the result:

deepdive query '?- articles_market(id, _).'

3. Process the text with the NLP model

Add the following to the app.ddlog file:

sentences(
    doc_id         text,
    sentence_index int,
    sentence_text  text,
    tokens         text[],
    lemmas         text[],
    pos_tags       text[],
    ner_tags       text[],
    doc_offsets    int[],
    dep_types      text[],
    dep_tokens     int[]
).


function nlp_markup over(
    doc_id text,
    content text
) returns rows like sentences
implementation "udf/nlp_markup.sh" handles tsv lines.


sentences += nlp_markup(doc_id, content) :-
    articles_market(doc_id, content).
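
To make the contract of nlp_markup more concrete: the function receives one article per input line as doc_id and content separated by a tab, and is expected to return one record per sentence with the ten columns of the sentences relation, in that order. The sketch below is only a toy stand-in that illustrates this row shape. It is not the real parser: the sentence splitting on "。", the whitespace tokenizer, and every tag and offset value are placeholders, and the names SentenceRow and toy_markup are hypothetical. Serializing the records back to TSV is left out.

```python
from typing import List, NamedTuple


class SentenceRow(NamedTuple):
    # Field order mirrors the sentences relation declared in app.ddlog.
    doc_id: str
    sentence_index: int
    sentence_text: str
    tokens: List[str]
    lemmas: List[str]
    pos_tags: List[str]
    ner_tags: List[str]
    doc_offsets: List[int]
    dep_types: List[str]
    dep_tokens: List[int]


def toy_markup(tsv_line):
    """Turn one 'doc_id<TAB>content' line into SentenceRow records."""
    doc_id, content = tsv_line.rstrip("\n").split("\t", 1)
    # Placeholder sentence splitter: break on the Chinese full stop only.
    sentences = [s.strip() for s in content.split("。") if s.strip()]
    rows = []
    for index, text in enumerate(sentences):
        tokens = text.split()  # placeholder tokenizer: whitespace only
        n = len(tokens)
        rows.append(SentenceRow(
            doc_id, index, text, tokens,
            lemmas=list(tokens),         # placeholder: lemma equals token
            pos_tags=["?"] * n,          # placeholder POS tags
            ner_tags=["O"] * n,          # placeholder NER tags
            doc_offsets=list(range(n)),  # placeholder offsets
            dep_types=["?"] * n,         # placeholder dependency labels
            dep_tokens=[0] * n,          # placeholder dependency heads
        ))
    return rows
```

In this sketch the array columns are kept parallel, one entry per token, which is the assumption the placeholder values are built around.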

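Copy the bazzar folder and the nlp_markup.sh file from the transaction/udf/ directory into the udf/ directory of your own project. This module has to be recompiled: change into the bazzar/parser directory and run the build command: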

sbt/sbt stage

Finally, run:

deepdive compile && deepdive do sentences

Once it has completed successfully, you can inspect the result:

deepdive query '
doc_id, index, tokens, ner_tags
?- sentences(doc_id, index, text, tokens, lemmas, pos_tags, ner_tags, _, _, _).'
