LLM Quality Model

Using an LLM to evaluate text quality

Paper: Exploring the Use of Large Language Models for Reference-Free Text Quality Evaluation: An Empirical Study

We compared three kinds of reference-free evaluation methods. The experimental results prove that ChatGPT is capable of evaluating text quality effectively from various perspectives without reference and demonstrates superior performance than most existing automatic metrics.
In particular, the Explicit Score (asking the model to output a score directly), which utilizes ChatGPT to generate a numeric score measuring text quality, is the most effective and reliable method among the three exploited approaches. However, directly comparing the quality of two texts may lead to sub-optimal results. We believe this paper will provide valuable insights for evaluating text quality with LLMs and have released the used data.

How accurately can ChatGPT assess text quality without references?

It is feasible for ChatGPT to evaluate text quality without reference, and it outperforms commonly used metrics even with a simple prompt design.
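The notes do not say how "outperforms" is measured. A common way to compare evaluation metrics is to correlate each metric's scores with human ratings, so the sketch below assumes Spearman rank correlation. The function names and the toy numbers are illustrative only and are not taken from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (not from the paper): human ratings and one metric's scores.
human = [1, 2, 3, 4, 5]
metric = [2, 1, 4, 3, 5]
print(spearman(human, metric))
```

Under this assumption, a metric "outperforms" another when its correlation with the human ratings is higher on the same set of texts.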

What is the most effective approach to evaluate text quality using ChatGPT?

Generally, using ChatGPT to generate an explicit score for text quality is the best and most stable method among the three we compared. We suggest using greedy decoding for more reliable results.
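A minimal sketch of the Explicit Score idea: ask the model for a number, decode greedily, and parse the reply. The prompt wording, the 1 to 5 scale, and the `generate(prompt, temperature)` callable are all assumptions made for illustration; the paper's actual prompts and API calls may differ. Temperature 0 is used here as the usual way to approximate greedy decoding through a chat API.

```python
import re
from typing import Callable, Optional

# Hypothetical prompt wording; the paper's actual prompts may differ.
PROMPT_TEMPLATE = (
    "Score the following text for {aspect} on a scale from {low} to {high}, "
    "where a higher score means better quality. "
    "Reply with the number only.\n\nText: {text}\n\nScore:"
)


def build_prompt(text: str, aspect: str = "fluency", low: int = 1, high: int = 5) -> str:
    """Fill the scoring prompt for one text and one quality aspect."""
    return PROMPT_TEMPLATE.format(aspect=aspect, low=low, high=high, text=text)


def parse_score(reply: str, low: int = 1, high: int = 5) -> Optional[float]:
    """Return the first number in the reply if it lies in [low, high], else None."""
    match = re.search(r"-?\d+(?:\.\d+)?", reply)
    if match is None:
        return None
    value = float(match.group())
    return value if low <= value <= high else None


def explicit_score(
    text: str,
    generate: Callable[[str, float], str],
    aspect: str = "fluency",
) -> Optional[float]:
    """Ask the model for a numeric score using temperature 0 (greedy-style decoding)."""
    reply = generate(build_prompt(text, aspect), 0.0)
    return parse_score(reply)


# Stand-in for a real LLM client, used only to make the sketch runnable.
def fake_generate(prompt: str, temperature: float) -> str:
    return "4"


print(explicit_score("The cat sat on the mat.", fake_generate))  # prints 4.0
```

Passing the model call in as a parameter keeps the sketch independent of any particular client library. Returning `None` for unparseable or out-of-range replies lets the caller decide whether to retry or drop the sample.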

Why may directly comparing two texts using ChatGPT yield suboptimal results?

Mainly because it is hard to define what counts as high-quality text.

Why is Implicit Score generally less effective than Explicit Score?

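The paper ran this experiment with the text-davinci model. The results show that the Implicit Score distribution is narrow and sharply peaked, whereas the Explicit Score distribution is smoother.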
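These notes do not define the Implicit Score. The sketch below assumes it is derived from the probabilities the model assigns to a "Yes" or "No" answer to a quality question, taken here as p(yes) - p(no). The formula, the function name, and the token log-probabilities are assumptions for illustration, not values from the paper.

```python
import math
from typing import Dict


def implicit_score(top_logprobs: Dict[str, float]) -> float:
    """Return p(yes) - p(no) from the log-probabilities of candidate first tokens."""
    p_yes = 0.0
    p_no = 0.0
    for token, logprob in top_logprobs.items():
        # Treat casing and leading-space variants of a token as the same answer.
        word = token.strip().lower()
        if word == "yes":
            p_yes += math.exp(logprob)
        elif word == "no":
            p_no += math.exp(logprob)
    return p_yes - p_no


# Toy log-probabilities (not from the paper) for the first generated token.
example = {"Yes": math.log(0.9), " yes": math.log(0.05), "No": math.log(0.03)}
print(round(implicit_score(example), 2))
```

Under this assumed definition, a model that is confident in its answer puts most of the probability mass on one token, so scores land close to the extremes. That would be consistent with a narrow, peaked distribution, though the notes do not state the cause.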


References

https://arxiv.org/pdf/2304.00723.pdf
