
CAIL2018: A Large-Scale Legal Dataset for Judgment Prediction

Chaojun Xiao1∗ Haoxi Zhong1∗ Zhipeng Guo1 Cunchao Tu1 Zhiyuan Liu1 Maosong Sun1 Yansong Feng2 Xianpei Han3 Zhen Hu4 Heng Wang4 Jianfeng Xu5

1 Department of Computer Science and Technology, Tsinghua University, China

2 Institute of Computer Science and Technology, Peking University, China

3 Institute of Software, Chinese Academy of Sciences, China

4 China Justice Big Data Institute

5 Supreme People’s Court, China

Abstract

In this paper, we introduce the Chinese AI and Law challenge dataset (CAIL2018), the first large-scale Chinese legal dataset for judgment prediction. CAIL2018 contains more than 2.6 million criminal cases published by the Supreme People's Court of China, several times more than the datasets used in existing works on judgment prediction. Moreover, the annotations of judgment results are more detailed and richer: they consist of applicable law articles, charges, and prison terms, which are expected to be inferred from the fact descriptions of cases. For comparison, we implement several conventional text classification baselines for judgment prediction. Experimental results show that predicting the judgment results of legal cases, especially prison terms, remains a challenge for current models. To help researchers make improvements on legal judgment prediction, both CAIL2018 and the baselines will be released after the CAIL competition.1

*indicates equal contribution.

1 http://cail.cipsc.org.cn/

2 http://wenshu.court.gov.cn/

1 Introduction

The task of Legal Judgment Prediction (LJP) aims to empower machines to predict the judgment results of legal cases after reading fact descriptions. It has been studied for decades. Due to the limited availability of publicly accessible cases, early works (Lauderdale and Clark, 2012; Segal, 1984; Keown, 1980; Ulmer, 1963; Nagel, 1963; Kort, 1957) usually conduct statistical analysis of the judgment results over a small number of cases rather than predicting them. With the development of machine learning algorithms, some works take LJP as a text classification task and propose to extract efficient features from fact descriptions (Liu and Chen, 2017; Sulea et al., 2017; Aletras et al., 2016; Lin et al., 2012; Liu and Hsieh, 2006). These works are still restricted to particular case types and suffer from generalization issues when applied to other scenarios.

Inspired by the success of deep learning techniques on natural language processing tasks, researchers attempt to employ neural models to handle the judgment prediction task under the text classification framework (Luo et al., 2017; Hu et al., 2018). However, there is not yet a publicly accessible high-quality dataset for LJP. Therefore, we collect and release the first large-scale dataset for LJP, i.e., CAIL2018, to encourage further exploration of this task and other advanced legal intelligence algorithms.

CAIL2018 consists of more than 2.6 million criminal cases, which are collected from http://wenshu.court.gov.cn/, published by the Supreme People's Court of China. These documents serve as references that help legal professionals improve their working efficiency, and are expected to benefit research on legal intelligence systems.

Specifically, each case in CAIL2018 consists of two parts, i.e., a fact description and the corresponding judgment result. Here, the judgment result of each case is refined into three representative components: relevant law articles, charges, and prison terms. Compared with the datasets used in existing LJP works, CAIL2018 is larger in scale and preserves richer annotations of judgment results. In total, CAIL2018 contains 2,676,075 criminal cases, which are annotated with 183 criminal law articles and 202 criminal charges. Both the number of cases and the number of labels are several times larger than those of other closed-source LJP datasets.

In the following parts, we give a detailed introduction to the construction of CAIL2018 and the LJP results of baseline methods on this dataset.

2 Dataset Construction

We construct CAIL2018 from 5,730,302 criminal documents collected from China Judgments Online.2 These criminal case documents belong to five types: judgments, verdicts, conciliation statements, decision letters, and notices. For LJP, we are only concerned with cases that have judgment results. Therefore, we keep only the judgment documents for training LJP models.

Each original document is well structured and divided into several parts, e.g., fact description, court view, parties, judgment result, and other information. Therefore, we take the fact part as input and extract the applicable law articles, charges, and prison terms from the judgment result with regular expressions.
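As a rough illustration of this extraction step, a minimal Python sketch might look like the following; the regular expressions and the example sentence are simplified assumptions, not the exact patterns used to build CAIL2018:

```python
import re

def extract_judgment(document_text):
    """Pull the cited law article, charge and prison term out of a judgment
    document with simple regular expressions (patterns are illustrative only)."""
    article = re.search(r"刑法第([〇一二三四五六七八九十百\d]+)条", document_text)
    charge = re.search(r"犯(.{1,15}?)罪", document_text)
    term = re.search(r"判处有期徒刑([〇一二三四五六七八九十\d]+)个月", document_text)
    return {
        "law_article": article.group(1) if article else None,
        "charge": charge.group(1) if charge else None,
        "prison_term_months": term.group(1) if term else None,
    }

# Toy judgment sentence, for illustration only.
print(extract_judgment("被告人胡某犯故意伤害罪,依照刑法第234条,判处有期徒刑12个月。"))
```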

Since many criminal cases involve multiple defendants, which would greatly increase the difficulty of LJP, we only retain the cases with a single defendant.

In addition, there are many low-frequency charges (e.g., insulting the national flag, jailbreak) and law articles. We filter out cases with charges or law articles whose frequency is smaller than 30. Besides, since the top 102 law articles in the Chinese Criminal Law are not relevant to specific charges, we filter out these law articles and the corresponding charges as well.
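A minimal sketch of this filtering step might look as follows, assuming each case has already been parsed into a dictionary with charge and (integer) law_article fields; the in-memory format is an assumption:

```python
from collections import Counter

MIN_FREQ = 30

def filter_cases(cases):
    """Keep cases whose charge and law article each occur at least 30 times,
    and drop the top 102 articles, which are not tied to specific charges."""
    charge_freq = Counter(c["charge"] for c in cases)
    article_freq = Counter(c["law_article"] for c in cases)
    return [
        c for c in cases
        if c["law_article"] > 102                       # skip general provisions
        and charge_freq[c["charge"]] >= MIN_FREQ
        and article_freq[c["law_article"]] >= MIN_FREQ
    ]
```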

After preprocessing, the dataset contains 2,676,075 criminal cases, 183 criminal law articles, 202 charges, and prison terms. We also show an instance from CAIL2018 in Table 1.

It is worth noting that the distribution of different categories in CAIL2018 is quite imbalanced. Considering the number of various charges, the top 10 charges cover 79.0% of the cases, while the bottom 10 charges only cover 0.12%. This imbalance makes it challenging to predict low-frequency charges and law articles.

3 Experiments

In this section, we implement and evaluate several typical text classification baselines on the three sub-tasks of LJP: law articles, charges, and prison terms.

3.1 Baselines

We select the following three baselines for comparison:

① TFIDF+SVM

Term-frequency inverse document frequency (TFIDF) (Salton and Buckley, 1988) is an efficient method to extract word features, and Support Vector Machine (SVM) (Suykens and Vandewalle, 1999) is a representative classification model. We use TFIDF to extract text features and employ an SVM with a linear kernel to train the classifier.
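As a rough illustration (not the authors' exact implementation), a minimal scikit-learn sketch of this baseline might look as follows; the toy training texts and labels are assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Pre-segmented fact descriptions (space-separated words) and their charge labels.
train_facts = ["被告人 胡某 故意 伤害 他人 身体", "被告人 李某 秘密 窃取 他人 财物"]
train_charges = ["故意伤害", "盗窃"]

# TFIDF features (capped at 5,000, cf. Section 3.2) and a linear-kernel SVM.
clf = make_pipeline(TfidfVectorizer(max_features=5000), LinearSVC())
clf.fit(train_facts, train_charges)
print(clf.predict(["被告人 王某 故意 伤害 他人"]))
```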

② FastText

FastText (Joulin et al., 2017) is a simple and efficient approach for text classification based on N-grams and hierarchical softmax (Mikolov et al., 2013).
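A minimal sketch using the open-source fasttext Python package might look like the following; the training file name, its label format, and the parameter choices other than N-grams and hierarchical softmax are assumptions:

```python
import fasttext

# Each line of cail_train.txt is assumed to look like
# "__label__<charge> <space-separated fact words>",
# e.g. "__label__故意伤害 被告人 胡某 故意 伤害 他人 身体".
model = fasttext.train_supervised(
    input="cail_train.txt",
    wordNgrams=2,   # N-gram features
    loss="hs",      # hierarchical softmax
    dim=200,
)
print(model.predict("被告人 王某 故意 伤害 他人"))
```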

③ CNN

Convolutional Neural Networks (CNNs) have been proven efficient in text classification (Kim, 2014). We employ a CNN with multiple filter widths to encode fact descriptions.

3.2 Implementation Details

For all methods, we randomly select 1,710,856 cases for training and 965,219 cases for testing. Since all fact descriptions are written in Chinese, we employ THULAC (Sun et al., 2016) for word segmentation. For the TFIDF+SVM model, we limit the feature size to 5,000. For the neural models, we employ the Skip-Gram model (Mikolov et al., 2013) to train word embeddings with 200 dimensions.
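A minimal sketch of this preprocessing, assuming the thulac and gensim (>= 4.0) Python packages and a toy in-memory corpus, might look as follows; everything except the embedding size and the Skip-Gram choice is an assumption:

```python
import thulac
from gensim.models import Word2Vec

seg = thulac.thulac(seg_only=True)       # segmentation only, no POS tagging

# fact_texts stands in for the raw Chinese fact descriptions.
fact_texts = ["被告人胡某故意伤害他人身体。"]
segmented = [seg.cut(text, text=True).split() for text in fact_texts]

# 200-dimensional Skip-Gram (sg=1) word embeddings, as in Section 3.2.
embeddings = Word2Vec(sentences=segmented, vector_size=200, sg=1, min_count=1)
print(list(embeddings.wv.index_to_key)[:5])   # a few vocabulary entries
```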

For CNN, we set the maximum length of a fact description to 4,096, the filter widths to (2, 3, 4, 5), and the number of filters of each width to 64 for consistency.

For training, we employ Adam (Kingma and Ba, 2015) as the optimizer. We set the learning rate to 0.001, the dropout rate to 0.5, and the batch size to 128.
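Putting these hyper-parameters together, a minimal PyTorch sketch of the multi-filter CNN encoder might look as follows; the vocabulary size, the class count used here, and all layer names are assumptions rather than the authors' code:

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Multi-filter CNN over fact descriptions (hyper-parameters from Section 3.2)."""
    def __init__(self, vocab_size, num_classes, emb_dim=200,
                 filter_widths=(2, 3, 4, 5), num_filters=64, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, num_filters, kernel_size=w) for w in filter_widths
        )
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(filter_widths), num_classes)

    def forward(self, token_ids):                   # token_ids: (batch, seq_len <= 4096)
        x = self.embedding(token_ids).transpose(1, 2)        # (batch, emb_dim, seq_len)
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=1))    # (batch, 64 * 4)
        return self.fc(features)                              # unnormalized class scores

model = TextCNN(vocab_size=100000, num_classes=202)           # 202 charges; vocab size assumed
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```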

3.3 Results and Analysis

We evaluate the baseline models with several metrics that are widely used in classification tasks: accuracy (Acc.), macro-precision (MP), and macro-recall (MR). Experimental results on the test set are shown in Table 2.
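For reference, these metrics can be computed with scikit-learn, for example; the toy labels below are purely illustrative:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

def evaluate(y_true, y_pred):
    """Accuracy, macro-precision and macro-recall, as reported in Table 2."""
    return {
        "Acc": accuracy_score(y_true, y_pred),
        "MP": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "MR": recall_score(y_true, y_pred, average="macro", zero_division=0),
    }

print(evaluate([1, 2, 2, 3], [1, 2, 3, 3]))   # toy label lists
```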

From this table, we find that current models can achieve considerable accuracy on charge prediction and relevant law article prediction. However, the MP and MR results show that LJP is still a huge challenge due to the lack of training data and the label imbalance issue.

  Fact                  被告人胡某 (The Defendant Hu)
  Relevant Law Article  刑法第234条 (the 234th article of criminal law)
  Charge                故意伤害 (intentional injury)
  Prison Term           12个月 (12 months)
  Defendant             胡某 (Miss./Mr. Hu)

Table 1: An example in CAIL2018.

                Charges               Relevant Articles     Terms of Penalty
                Acc.%  MP%   MR%      Acc.%  MP%   MR%      Acc.%  MP%   MR%
  FastText      94.3   50.9  39.7     93.3   45.8  38.1     74.6   48.0  24.5
  TFIDF+SVM     94.0   73.9  56.2     92.9   71.8  52.4     75.4   75.4  46.1
  CNN           97.6   37.0  21.4     97.6   37.4  21.8     78.2   45.5  36.1

Table 2: LJP results on CAIL2018.

4 Conclusion

In this work, we release the first large-scale legal judgment prediction dataset, CAIL2018. Compared with existing LJP datasets, CAIL2018 is the largest LJP dataset so far and is publicly available. Moreover, CAIL2018 preserves more detailed annotations, which is consistent with real-world scenarios. Experiments demonstrate that LJP is still challenging and leaves plenty of room for improvement.

References

Nikolaos Aletras, Dimitrios Tsarapatsanis, Daniel Preotiuc-Pietro, and Vasileios Lampos. 2016. Predicting judicial decisions of the european court of human rights: A natural language processing perspective. PeerJ Computer Science 2.

Zikun Hu, Xiang Li, Cunchao Tu, Zhiyuan Liu, and Maosong Sun. 2018. Few-shot charge prediction with discriminative legal attributes. In Proceedings of COLING.

Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2017. Bag of tricks for efficient text classification. In Proceedings of EACL.

R Keown. 1980. Mathematical models for legal prediction. Computer/LJ 2:829.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of EMNLP.

Diederik Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of ICLR.

Fred Kort. 1957. Predicting supreme court decisions mathematically: A quantitative analysis of the "right to counsel" cases. American Political Science Review 51(1):1–12.

Benjamin E Lauderdale and Tom S Clark. 2012. The supreme court’s many median justices. American Political Science Review 106(4):847–866.

Wan-Chen Lin, Tsung-Ting Kuo, Tung-Jia Chang, Chueh-An Yen, Chao-Ju Chen, and Shou-de Lin. 2012. Exploiting machine learning models for chinese legal documents labeling, case classification, and sentencing prediction. In Proceedings of ROCLING. page 140.

Chao-Lin Liu and Chwen-Dar Hsieh. 2006. Exploring phrase-based classification of judicial documents for criminal charges in chinese. In Proceedings of ISMIS. pages 681–690.

Yi Hung Liu and Yen Liang Chen. 2017. A two-phase sentiment analysis approach for judgement prediction. Journal of Information Science.

Bingfeng Luo, Yansong Feng, Jianbo Xu, Xiang Zhang, and Dongyan Zhao. 2017. Learning to predict charges for criminal cases with legal basis. In Proceedings of EMNLP.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of NIPS. pages 3111–3119.

Stuart S Nagel. 1963. Applying correlation analysis to case prediction. Tex. L. Rev. 42:1006.

Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information Processing & Management 24(5):513–523.

Jeffrey A Segal. 1984. Predicting supreme court cases probabilistically: The search and seizure cases, 1962-1981. American Political Science Review 78(4):891–900.

Octavia Maria Sulea, Marcos Zampieri, Mihaela Vela, and Josef Van Genabith. 2017. Exploring the use of text classification in the legal domain. In Proceedings of the ASAIL workshop.

Maosong Sun, Xinxiong Chen, Kaixu Zhang, Zhipeng Guo, and Zhiyuan Liu. 2016. THULAC: An efficient lexical analyzer for chinese.


Johan AK Suykens and Joos Vandewalle. 1999. Least squares support vector machine classifiers. Neural processing letters 9(3):293–300.

Duyu Tang, Bing Qin, and Ting Liu. 2015. Document modeling with gated recurrent neural network for sentiment classification. In Proceedings of EMNLP. pages 1422–1432.

S Sidney Ulmer. 1963. Quantitative analysis of judicial processes: Some practical and theoretical applications. Law & Contemp. Probs. 28:164.
