An Overview of Large-Scale Pre-trained Language Models for LegalAI (continuously updated…)

诸神缄默不语 – personal CSDN blog post index

1. General Large-Scale Pre-trained Language Models

English:

  1. LegalBERT
    1. Original paper: (2020 EMNLP) LEGAL-BERT: The Muppets straight out of Law School - ACL Anthology
    2. Download: Hugging Face
  2. CaseLaw-BERT / Custom Legal-BERT
    1. Original paper: (2021 ICAIL) When does pretraining help?: assessing self-supervised learning for law and the CaseHOLD dataset of 53,000+ legal holdings
    2. Download: https://huggingface.co/casehold/custom-legalbert
  3. BERTLaw
    1. Original paper: (2021) Sublanguage: A Serious Issue Affects Pretrained Models in Legal Domain
    2. Download: https://huggingface.co/nguyenthanhasia/BERTLaw
  4. PolBERT
    1. Original paper: (2022 NeurIPS) Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
  5. legal-longformer
    1. Download: https://huggingface.co/saibo/legal-longformer-base-4096
  6. (India) InLegalBERT
    1. Original paper: (2023 ICAIL) Pre-trained Language Models for the Legal Domain: A Case Study on Indian Law
    2. Download: https://huggingface.co/law-ai/InLegalBERT
  7. (multinational) LexLM (RoBERTa backbone)
    1. Original paper: (2023 ACL) LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development
    2. Checkpoints are available via the transformers package:
      from transformers import AutoModel, AutoTokenizer
      
      model = AutoModel.from_pretrained("lexlms/legal-roberta-base")
      tokenizer = AutoTokenizer.from_pretrained("lexlms/legal-roberta-base")
      
  8. (USA) (2024 CIKM) LawLLM: Law Large Language Model for the US Legal System
    Similar case retrieval, precedent recommendation, and judgment prediction

Chinese:

  1. InternLM-Law
    1. Original paper: (2024) InternLM-Law: An Open Source Chinese Legal Large Language Model
  2. Lawformer
    1. Original paper: (2021) Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents
    2. Download: thunlp/LegalPLMs: Source code and checkpoints for legal pre-trained language models.
  3. Civil-law BERT & criminal-law BERT (民事BERT & 刑事BERT)
    https://github.com/thunlp/OpenCLaP

Italian:

  1. ITALIAN-LEGAL-BERT
    1. Original paper: (2022) ITALIAN-LEGAL-BERT: A Pre-trained Transformer Language Model for Italian Law
    2. Download: https://huggingface.co/dlicari/Italian-Legal-BERT

Romanian:

  1. jurBERT
    1. Original paper: (2021 NLLP) jurBERT: A Romanian BERT Model for Legal Judgement Prediction

Spanish:

  1. RoBERTalex
    1. Original paper: (2021) Spanish Legalese Language Model and Corpora
    2. Download: PlanTL-GOB-ES/RoBERTalex · Hugging Face

Turkish:

  1. BERTurk
    1. Original paper: (2024) LegalTurk Optimized BERT for Multi-Label Text Classification and NER

Multilingual:

  1. ParaLaw Nets (per the paper, Japanese and English)
    1. Original paper: (2021 COLIEE) ParaLaw Nets – Cross-lingual Sentence-level Pretraining for Legal Text Processing
    2. Download (my best guess): nguyenthanhasia/XLM-Paralaw · Hugging Face
  2. LegalXLMs
    1. Original paper: (2023) MultiLegalPile: A 689GB Multilingual Legal Corpus
    2. Download: too many checkpoints to list; to be added

Vietnamese:

  1. nguyenthanhasia/VNBertLaw · Hugging Face
  2. PhoBERT
    1. Original paper: (2020 EMNLP) PhoBERT: Pre-trained language models for Vietnamese
    2. Official GitHub repo (lists the locations and download instructions for each pre-trained checkpoint): VinAIResearch/PhoBERT: PhoBERT: Pre-trained language models for Vietnamese (EMNLP-2020 Findings)

French:

  1. JuriBERT
    1. Original paper: (2022) JuriBERT: A Masked-Language Model Adaptation for French Legal Text
    2. Download: http://master2-bigdata.polytechnique.fr/resources#juribert (loadable with the transformers package)

Portuguese:

  1. JurisBERT (Brazil)
    1. Original paper: (2023 ICCSA) JurisBERT: A New Approach that Converts a Classification Corpus into an STS One
    2. Download: https://huggingface.co/alfaneo

Benchmarks:

  1. open-compass/LawBench: Benchmarking Legal Knowledge of Large Language Models

Federated-learning LLMs:

  1. (2024 DASFAA) FedJudge: Federated Legal Large Language Model

2. Dialogue Models

Chinese:

  1. Lawyer LLaMA
    1. Original paper: (2023) Lawyer LLaMA Technical Report
    2. Official GitHub repo: AndrewZhe/lawyer-llama: Chinese legal LLaMA (中文法律LLaMA)
      Local deployment: lawyer-llama-13b-beta1.0 has been released (lawyer-llama/run_inference.md at main · AndrewZhe/lawyer-llama · GitHub), but it requires the original LLaMA weights, and I am still on the LLaMA waitlist, so this will have to wait
  2. 智海-录问
    zhihaiLLM/wisdomInterrogatory
  3. LawGPT
    pengxiao-song/LaWGPT: 🎉 Repo for LaWGPT, Chinese-Llama tuned with Chinese Legal knowledge (a large language model built on Chinese legal knowledge)
  4. LexiLaw
    CSHaitao/LexiLaw: LexiLaw – a Chinese legal LLM
  5. JurisLMs
    seudl/JurisLMs: JurisLMs: Jurisprudential Language Models
  6. ChatLaw
    ChatLaw – legal AI for the future: still on the waitlist
    Official GitHub repo: PKU-YuanGroup/ChatLaw (Chinese legal LLM): many issues complain that it will not run, so I am not going to try it
  7. BaoLuo-LawAssistant-sftglm-6b (宝锣法律大模型 1.0)
    https://huggingface.co/xuanxuanzl/BaoLuo-LawAssistant-sftglm-6b
    The author's official Zhihu post: 宝锣法律大模型及法律AI助理开源 - 知乎
  8. davidpig/lychee_law: 律知, a legal-consultation LLM

English:

  1. Insolvency bot legal n l p | Fast Data Science: specialized in corporate insolvency scenarios
  2. LawGPT 1.0
    No code has been released.
    1. Original paper: A Brief Report on LawGPT 1.0: A Virtual Legal Assistant Based on GPT-3

Applications:

  1. Harvey
  2. CoCounsel: document review, legal research memos, deposition preparation, and contract analysis
  3. DoNotPay
  4. DemandsAI: drafting demand letters
  5. 幂律智能

3. Sentence Segmentation

Multilingual:

  1. https://huggingface.co/models?search=rcds/distilbert-sbd (English, Spanish, German, Italian, Portuguese, French)
    1. Original paper: (2023 ICAIL) MultiLegalSBD: A Multilingual Legal Sentence Boundary Detection Dataset
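Sentence boundary detection is usually framed as token classification: the model tags where each sentence starts, and a post-processing step groups tokens into sentences. A minimal pure-Python sketch of that grouping step, assuming an illustrative "B"/"I" label scheme (the actual label set of the rcds/distilbert-sbd models may differ):

```python
# Sketch: turn per-token sentence-boundary labels into sentences.
# Assumes a token classifier that marks the first token of each
# sentence with "B" and all other tokens with "I" (illustrative
# scheme, not necessarily the released models' exact output).

def tokens_to_sentences(tokens, labels):
    """Group tokens into sentences, starting a new one at each "B"."""
    sentences, current = [], []
    for tok, lab in zip(tokens, labels):
        if lab == "B" and current:   # a new sentence begins: flush the old one
            sentences.append(" ".join(current))
            current = []
        current.append(tok)
    if current:                      # flush the final sentence
        sentences.append(" ".join(current))
    return sentences

tokens = ["Art.", "1", "applies", "here", ".", "See", "Art.", "2", "."]
labels = ["B",    "I", "I",       "I",    "I", "B",   "I",    "I", "I"]
print(tokens_to_sentences(tokens, labels))
# → ['Art. 1 applies here .', 'See Art. 2 .']
```

Legal text is exactly where naive punctuation-based splitting fails ("Art.", "v.", citation strings), which is why these models predict boundaries instead of relying on periods.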

4. Text Classification

Multilingual:

  1. PyEuroVoc (languages of EU member and candidate states): classifies documents by EuroVoc descriptors; BERT-based
    1. Original paper: (2021 RANLP) PyEuroVoc: A Tool for Multilingual Legal Document Classification with EuroVoc Descriptors
    2. Download: https://pypi.org/project/pyeurovoc/
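EuroVoc classification is multi-label: each document can carry several descriptors, so the classifier scores every descriptor independently and keeps those above a threshold. A pure-Python sketch of that selection step (the descriptor IDs and scores below are made up for illustration; PyEuroVoc's actual API differs):

```python
# Sketch: multi-label selection of EuroVoc descriptors from per-label
# scores, as produced by a sigmoid-output BERT classifier. All IDs and
# scores here are invented for illustration.

def select_descriptors(scores, threshold=0.5):
    """Return descriptor IDs whose score passes the threshold, best first."""
    chosen = [(label, s) for label, s in scores.items() if s >= threshold]
    return [label for label, _ in sorted(chosen, key=lambda x: -x[1])]

scores = {
    "1086 (EU law)":      0.91,
    "2755 (environment)": 0.12,
    "1308 (judiciary)":   0.67,
}
print(select_descriptors(scores))
# → ['1086 (EU law)', '1308 (judiciary)']
```

This is the key contrast with single-label classification (which would take only the argmax): a regulation can legitimately be about both EU law and the judiciary at once.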

5. Information Extraction

  1. FPDM
    The original model is a transfer from the open domain to specific domains; in the legal domain it mainly targets contract review (extracting key information)
    1. Original paper: (2023) FPDM: Domain-Specific Fast Pre-training Technique using Document-Level Metadata
    2. Code and datasets: https://drive.google.com/drive/folders/1RT7g_cTR_twz75xmFjDgQmCPWC8sZSFK

6. Case Retrieval

  1. SAILER
    1. Original paper: (2023 SIGIR) SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval
      Formerly named Caseformer; screenshots of the talk slides are in my earlier post: Caseformer talk PPT截屏
    2. CSHaitao/SAILER: The official repo for our SIGIR'23 Full paper: Structure-aware Pre-trained Language Model for Legal Case Retrieval
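Dense case retrieval with an encoder like this boils down to: embed the query case and every candidate case, then rank candidates by vector similarity. A pure-Python sketch of that ranking step, with toy 3-dimensional vectors standing in for real encoder outputs (in practice the embeddings would come from a model such as SAILER, and the similarity search would use an ANN index rather than brute force):

```python
import math

# Sketch of the ranking step in dense case retrieval: candidates are
# ordered by cosine similarity to the query embedding. The vectors and
# case IDs here are toy stand-ins for real encoder outputs.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_cases(query_vec, candidates):
    """Return candidate case IDs sorted by descending similarity."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in candidates.items()]
    return [cid for cid, _ in sorted(scored, key=lambda x: -x[1])]

query = [1.0, 0.0, 1.0]
candidates = {"case_A": [1.0, 0.1, 0.9], "case_B": [0.0, 1.0, 0.0]}
print(rank_cases(query, candidates))
# → ['case_A', 'case_B']
```

SAILER's contribution is in how the embeddings are pre-trained (exploiting the fact/reasoning/decision structure of legal judgments), not in the ranking arithmetic itself, which is standard dense retrieval.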

7. Text Summarization

Multilingual:

  1. PRIMERA and other architectures
    Original paper: (2022) Multi-LexSum: Real-World Summaries of Civil Rights Lawsuits at Multiple Granularities
    Downloads are also in the dataset's official GitHub repo: https://github.com/multilexsum/dataset