lm-evaluation-harness: a hands-on walkthrough

向上Claire

Last modified: 2025-03-23 19:49:41


Tags: notes, LLM evaluation

First published: 2024-05-27 18:40:55

Article link: https://blog.csdn.net/weixin_44522477/article/details/139231971


It took me a really long time of tweaking, and I even filed an issue, but I finally got it running. Here is how I did it.

Open LLM Leaderboard

This is the leaderboard I wanted to reproduce. Its site lists the evaluation data and the scoring method.

Here are the task descriptions and the number of shots (few-shot examples) used for each task.
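The leaderboard runs each task with its own few-shot count, while lm_eval's `--num_fewshot` flag (real, but not covered in the parameter list below) applies to every task in a single run. A minimal sketch, assuming the v1 leaderboard settings as I remember them from its About page (ARC 25-shot, HellaSwag 10-shot, MMLU 5-shot, TruthfulQA 0-shot, Winogrande 5-shot, GSM8K 5-shot) and the v0.4 task names; check both against the screenshot and `lm_eval --tasks list`. It groups tasks by shot count so each group can be one run:

```python
from __future__ import annotations

from collections import defaultdict

# Assumed v1 Open LLM Leaderboard settings, keyed by
# lm-evaluation-harness v0.4 task names (verify before relying on them).
LEADERBOARD_V1_SHOTS = {
    "arc_challenge": 25,
    "hellaswag": 10,
    "mmlu": 5,
    "truthfulqa_mc2": 0,
    "winogrande": 5,
    "gsm8k": 5,
}


def group_tasks_by_shots(shots_per_task: dict[str, int]) -> dict[int, list[str]]:
    """Group tasks sharing a few-shot count, since one run uses one --num_fewshot."""
    groups: dict[int, list[str]] = defaultdict(list)
    for task, shots in shots_per_task.items():
        groups[shots].append(task)
    # Sort keys and task names so the output is stable and easy to read.
    return {shots: sorted(tasks) for shots, tasks in sorted(groups.items())}


if __name__ == "__main__":
    for shots, tasks in group_tasks_by_shots(LEADERBOARD_V1_SHOTS).items():
        print(f"--num_fewshot {shots} --tasks {','.join(tasks)}")
```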

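This is the harness version the leaderboard uses, which differs from the current latest release (as of May 27, 2024). The second red box is the task list they give. Because their version differs from the latest one, MMLU in particular was a problem: in the latest version's task list I couldn't find an MMLU list at all.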
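As far as I know, MMLU was renamed rather than dropped: the older harness the leaderboard pinned named its MMLU subtasks `hendrycksTest-<subject>`, while v0.4 groups them under `mmlu` with subtasks named `mmlu_<subject>`. A small helper, assuming exactly that naming convention (run `lm_eval --tasks list` to confirm what your install really has):

```python
import re

# Assumed convention: old harness "hendrycksTest-<subject>" -> v0.4 "mmlu_<subject>".
_LEGACY_MMLU = re.compile(r"hendrycksTest-(\w+)")


def to_v04_task_name(task: str) -> str:
    """Translate a legacy MMLU task name; leave every other name unchanged."""
    match = _LEGACY_MMLU.fullmatch(task)
    return f"mmlu_{match.group(1)}" if match else task


if __name__ == "__main__":
    legacy = ["hendrycksTest-abstract_algebra", "hendrycksTest-anatomy", "hellaswag"]
    print([to_v04_task_name(t) for t in legacy])
```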

LangGPT

This model is a fine-tuned version of /data/huggingface/Qwen1.5-4B-Chat on the LangGPT_community and LangGPT_kind datasets.

On May 16, 2024, there were only
Qwen-sft-la0v0.1 and Qwen-sft-ls-v0.1

When I checked today (05-27), there were 5 models.

lm-evaluation-harness

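It's well maintained and well written; once you get it configured, I think it's perfectly usable.
It took me a whole week of fixes!!!
Almost every problem came down to errors with data files.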

Installation

1. Repository

https://github.com/EleutherAI/lm-evaluation-harness

2. Set up the environment

conda create -n cp python=3.10
conda activate cp
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
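Before debugging anything else, it helps to confirm which versions actually landed in the active environment, since the first error below turned out to be a version problem. A minimal sketch using only the standard library; the package list is my own choice, not something the harness asks you to check:

```python
from __future__ import annotations

import importlib.metadata as md
import sys

# Packages that come up in the errors below (my selection, adjust as needed).
PACKAGES = ("lm_eval", "transformers", "evaluate", "markupsafe")


def environment_report(packages=PACKAGES) -> dict[str, str | None]:
    """Map each package to its installed version, or None if it is missing."""
    report: dict[str, str | None] = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = md.version(name)
        except md.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    print("interpreter:", sys.executable)  # should point inside the cp env
    for name, version in environment_report().items():
        print(f"{name:>12}: {version or 'NOT INSTALLED'}")
```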

3. Parameters

# What the parameters mean
--model: the model backend; hf means Hugging Face transformers, and vllm and the OpenAI API backends are other options

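--model_args: arguments for loading the model. The key one is pretrained=, which takes a Hugging Face Hub repo name (not a URL), e.g. EleutherAI/pythia-160m, or a local path such as ./xxxxx
Note that model_args can carry many parameters, separated by commas: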
--model_args pretrained=EleutherAI/gpt-j-6b,parallelize=True,load_in_4bit=True,peft=nomic-ai/gpt4all-j-lora \
--model_args pretrained={model_name},tensor_parallel_size={number of GPUs to use},dtype=auto,gpu_memory_utilization=0.8 \
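In other words, model_args can hold many model-loading options, such as parallelism, quantization, and the location of the PEFT adapter for a fine-tuned model. The first example is for the hf backend; the second (tensor_parallel_size, gpu_memory_utilization) is for vllm.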

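--tasks: the tasks to evaluate, as a comma-separated list, e.g. openbookqa,arc_easy,winogrande,hellaswag,arc_challenge,piqa,boolq
--device: the GPU to use, e.g. cuda:0. Only one device can be given here; for multiple GPUs, use the tensor_parallel_size argument (vllm) or launch with accelerate
--batch_size: auto picks the batch size automatically; auto:4 re-selects the batch size automatically 4 times during the run
--output_path: where to save the evaluation results
--use_cache: path to a SQLite file that caches results from previous runs, so identical (model, task) pairs are not re-run just to re-score them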
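Putting the parameters together: --model_args is one comma-separated key=value string, and the rest are ordinary flags. A minimal sketch that assembles the command as an argument list; the helper names are hypothetical, the flags are the ones listed above plus --num_fewshot, and it assumes the lm_eval console script installed by pip install -e . is on PATH. It prints the command rather than running it:

```python
from __future__ import annotations

import shlex


def build_model_args(**kwargs) -> str:
    """Join key=value pairs with commas, the format --model_args expects."""
    for key, value in kwargs.items():
        # A comma inside a value would be parsed as the start of a new argument.
        if "," in str(value):
            raise ValueError(f"value for {key!r} contains a comma: {value!r}")
    return ",".join(f"{key}={value}" for key, value in kwargs.items())


def build_lm_eval_cmd(model: str, model_args: str, tasks: list[str], *,
                      device: str | None = None, batch_size: str = "auto",
                      num_fewshot: int | None = None,
                      output_path: str | None = None) -> list[str]:
    """Assemble an lm_eval command line (hypothetical helper, not part of the harness)."""
    cmd = ["lm_eval", "--model", model, "--model_args", model_args,
           "--tasks", ",".join(tasks), "--batch_size", batch_size]
    if device is not None:
        cmd += ["--device", device]
    if num_fewshot is not None:
        cmd += ["--num_fewshot", str(num_fewshot)]
    if output_path is not None:
        cmd += ["--output_path", output_path]
    return cmd


if __name__ == "__main__":
    args = build_model_args(pretrained="EleutherAI/pythia-160m", dtype="auto")
    cmd = build_lm_eval_cmd("hf", args, ["hellaswag", "arc_easy"],
                            device="cuda:0", batch_size="auto:4",
                            output_path="./results")
    # Paste into a shell, or run it with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building a list instead of one big string keeps paths with spaces safe if you later hand it to subprocess.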

Reference: https://zhuanlan.zhihu.com/p/671235487

Let's go

At first the server itself had problems, so I had to sort those out before anything else; see the article I wrote about that.

Error 1: ImportError: cannot import name 'soft_unicode' from 'markupsafe'

ImportError: cannot import name 'soft_unicode' from 'markupsafe'

Check whether the package is installed:
pip show markupsafe
(cp) shensijia@xiaoniu04:~/lm-evaluation-harness$ pip show markupsafe
Name: MarkupSafe
Version: 2.1.5

It is installed.

Following a fix I found online:
https://blog.csdn.net/weixin_45438997/article/details/124261720

python -m pip install markupsafe==2.0.1 -i https://pypi.tuna.tsinghua.edu.cn/simple

The error was still there.

Fix for error 1:

I found a post from June 2023:

Cannot import name 'soft_unicode' from 'markupsafe' [Solved]
The error "ImportError: cannot import name 'soft_unicode' from 'markupsafe'" occurs because soft_unicode was removed in markupsafe 2.1.0.
To fix it, run pip install markupsafe==2.0.1, which installs the last markupsafe version that still provides soft_unicode.

Then check again whether the right version actually got installed.

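It hadn't: markupsafe was still at 2.1.5, not the version we asked for.
So uninstall it and reinstall (pip uninstall markupsafe, then run pip install markupsafe==2.0.1 again).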
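To confirm which MarkupSafe the current interpreter really sees (the trap above was that the downgrade hadn't taken effect), a small check. It relies on the fact stated above that soft_unicode was removed in 2.1.0; the helper names are my own:

```python
from __future__ import annotations

import importlib.metadata as md
import re


def provides_soft_unicode(version: str) -> bool:
    """MarkupSafe removed soft_unicode in 2.1.0, so only versions below 2.1 have it."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (2, 1)


def check_markupsafe() -> tuple[str | None, bool]:
    """Return (installed version or None, whether soft_unicode should be importable)."""
    try:
        installed = md.version("markupsafe")
    except md.PackageNotFoundError:
        return None, False
    return installed, provides_soft_unicode(installed)


if __name__ == "__main__":
    version, ok = check_markupsafe()
    print(f"markupsafe {version}: soft_unicode {'available' if ok else 'missing'}")
```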

Error 2:

  File "/data/home/shensijia/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--exact_match/009c8b5313309ea5b135d526433d5ee76508ba1554cbe88310a30f85bb57ec88/exact_match.py", line 1
    <!DOCTYPE html>
    ^
SyntaxError: invalid syntax

https://github.com/EleutherAI/lm-evaluation-harness/issues/1820

For the record, this is where exact_match gets loaded:

File "/kercing/ssj22/lm-evaluation-harness/lm_eval/api/metrics.py", line 168, in <module>
    exact_match = hf_evaluate.load("exact_match")

The problem was this exact_match file all along.

What was clear was that exact_match.py had to be replaced;
the hard part was finding that file.
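The traceback above already shows where the broken file lives: under the evaluate module cache, in a hash-named folder. My guess (see issue 1820) is that downloading the metric script returned an HTML page instead of Python, and that page got cached as exact_match.py. A sketch that finds such files; the default path assumes the standard ~/.cache/huggingface location, so pass another cache_dir if you moved the cache (for example via HF_HOME):

```python
from __future__ import annotations

from pathlib import Path

# Default location seen in the traceback; adjust if your HF cache lives elsewhere.
DEFAULT_METRICS_CACHE = (
    Path.home() / ".cache" / "huggingface" / "modules" / "evaluate_modules" / "metrics"
)


def find_html_metric_scripts(name: str = "exact_match",
                             cache_dir: Path = DEFAULT_METRICS_CACHE) -> list[Path]:
    """Return cached <name>.py files whose content is actually an HTML page."""
    pattern = f"evaluate-metric--{name}/*/{name}.py"
    broken = []
    for script in sorted(Path(cache_dir).glob(pattern)):
        head = script.read_text(encoding="utf-8", errors="replace")[:256]
        if head.lstrip().lower().startswith(("<!doctype html", "<html")):
            broken.append(script)
    return broken


if __name__ == "__main__":
    for path in find_html_metric_scripts():
        print("HTML instead of Python:", path)
```

Deleting the reported hash folder forces a fresh download next time, which only helps if the network problem behind the HTML page is gone.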

Fix for error 2


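# exact_match = hf_evaluate.load("exact_match")
# ssj's change (in lm_eval/api/metrics.py, where evaluate is imported as hf_evaluate):
# load a local copy of the metric script instead of the broken cached download
exact_match = hf_evaluate.load("/data/home/shensijia/lm-evaluation-harness/exact_match.py")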

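This exact_match.py is the file ErikaaWang provided in https://github.com/huggingface/evaluate/issues/590.
In that issue it is dropped in directly in place of the cached file,
but I couldn't find where that file lived, so I created my own copy and pointed the load call at it instead.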
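If you would rather not depend on downloading the metric script at all, another option is a local stand-in. The sketch below is my own re-implementation of what the Hugging Face exact_match metric computes, as I understand its options (regexes_to_ignore, ignore_case, ignore_punctuation, ignore_numbers); it is not ErikaaWang's file. The LocalExactMatch wrapper is hypothetical and assumes your metrics.py only ever calls exact_match.compute(**kwargs), so check that first:

```python
from __future__ import annotations

import re
import string


def exact_match(predictions: list[str], references: list[str],
                regexes_to_ignore: list[str] | None = None,
                ignore_case: bool = False, ignore_punctuation: bool = False,
                ignore_numbers: bool = False) -> dict[str, float]:
    """Fraction of predictions identical to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    removed = (string.punctuation if ignore_punctuation else "") + (
        string.digits if ignore_numbers else "")
    table = str.maketrans("", "", removed)

    def normalize(text: str) -> str:
        for pattern in regexes_to_ignore or []:
            text = re.sub(pattern, "", text)
        if ignore_case:
            text = text.lower()
        return text.translate(table)  # drop punctuation and/or digits

    if not predictions:
        return {"exact_match": 0.0}  # convention chosen here for empty input
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return {"exact_match": hits / len(predictions)}


class LocalExactMatch:
    """Hypothetical drop-in for the object returned by hf_evaluate.load("exact_match")."""

    def compute(self, **kwargs) -> dict[str, float]:
        return exact_match(**kwargs)
```

Under that assumption, the patched line in metrics.py would become exact_match = LocalExactMatch().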

Error 3:

    with open(json_file, "r", encoding="utf-8") as reader:
PermissionError: [Errno 13] Permission denied: '/data/LLMs/qwen/Qwen1___5-7B/config.json'

This looks like a problem with the local model: the current user is not allowed to read its config.json.
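A PermissionError on config.json most likely means the model folder belongs to another user (here it sits under a shared /data/LLMs directory). A quick sketch that lists every file in a model directory the current user cannot read; the fix is then to ask the owner for read access (for example chmod -R a+rX on the folder) or to copy the model somewhere you own. Note that os.access reports everything as readable when running as root:

```python
from __future__ import annotations

import os
import sys
from pathlib import Path


def unreadable_files(model_dir: str | os.PathLike) -> list[Path]:
    """List files under model_dir that the current user is not allowed to read."""
    root = Path(model_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"not a directory: {root}")
    return sorted(p for p in root.rglob("*") if p.is_file() and not os.access(p, os.R_OK))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_perms.py /data/LLMs/qwen/Qwen1___5-7B
    for path in unreadable_files(sys.argv[1]):
        print("cannot read:", path)
```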

Switch the Hub endpoint (mirror):

export HF_ENDPOINT=https://huggingface.co/

Then the error turned into this:

OSError: We couldn't connect to 'https://huggingface.co/' to load this file, couldn't find it in the cached files and it looks like EleutherAI/gpt-j-6B is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
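The new error says the server cannot reach https://huggingface.co/ at all, and that EleutherAI/gpt-j-6B is not a local directory, so the files would have to come from the Hub. huggingface_hub reads the endpoint from HF_ENDPOINT (defaulting to https://huggingface.co); in mainland China people often point it at a mirror such as https://hf-mirror.com instead. A small sketch to see which endpoint is in effect and whether it answers; the helper names are mine. If nothing is reachable, download the model on another machine and pass a local directory as pretrained=, optionally with HF_HUB_OFFLINE=1:

```python
from __future__ import annotations

import os
import urllib.error
import urllib.request


def hf_endpoint() -> str:
    """Endpoint huggingface_hub will use: HF_ENDPOINT if set, else the official site."""
    return (os.environ.get("HF_ENDPOINT") or "https://huggingface.co").rstrip("/")


def reachable(url: str, timeout: float = 5.0) -> bool:
    """True if the server answers at all, even with an HTTP error status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # got an HTTP response, so the network path works
    except (urllib.error.URLError, OSError):
        return False  # DNS failure, refused connection, timeout, ...


if __name__ == "__main__":
    endpoint = hf_endpoint()
    print(endpoint, "reachable" if reachable(endpoint) else "NOT reachable")
```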

Something came up, so I'll stop here for now.

_____##########

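I've seen a few folks reach out to ask me about this task.
This was something I pushed through for fun during my internship.
I really should have finished this write-up back then, since I also spent ages hunting for fixes to these bugs.
But I'm lazy. I remember writing documentation at the time, but I can't be bothered to dig it up.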

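I think plenty of experts have written much more detailed guides, so go check those out.
Once I land a job I'm happy with, I'll come back and finish this.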