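Project overview:
In medicine, diagnostic accuracy is critical to patient treatment and prognosis. Misdiagnosis rates nevertheless remain high, reaching roughly 20% in primary care. To improve accuracy, the medical community has kept refining traditional methods and adopting new technology, including artificial intelligence (AI). In recent years, natural language processing built on large language models (LLMs) has shown great potential in medicine, promising more intelligent and precise diagnostic workflows. This study introduces MedFound, a generalist medical language model for diagnostic assistance. The model is pretrained on a large-scale medical corpus and combines clinical knowledge with natural language understanding, with the aim of improving diagnostic accuracy and efficiency. In a series of experiments, MedFound showed broad application potential across multiple clinical specialties. The study also proposes CLEVER, a framework for evaluating LLM performance that covers medical case comprehension, diagnostic reasoning, and other aspects. The results suggest that, as the technology advances, LLMs will play an increasingly important role in medical diagnosis.
Model: MedFound, a generalist medical language model, and MedFound-DX-PA, a variant optimized for diagnostic assistance.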
1. Clone the repository or download the code manually
git clone https://github.com/medfound/medfound
2. Create and activate a Python virtual environment:
1. Create the environment: conda create -n env_name python=3.8
2. Activate the environment: conda activate env_name
3. Install dependencies
pip install -r requirements.txt
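This should install every required Python dependency, but in this reproduction the command failed partway through, so installing the dependencies manually, one by one, is recommended. Tip: the deepspeed and faiss installs both failed here; before retrying them, check that PyTorch itself installed successfully (version 2.2.1 was used).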
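When installing dependencies by hand, it helps to see at a glance which ones actually landed in the environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is made up for illustration, and the three package names are assumed to match what requirements.txt asks for.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# PyPI distribution names (assumed); adjust them to match requirements.txt.
for name, ver in installed_versions(["torch", "deepspeed", "faiss-cpu"]).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Run it inside the activated conda environment; a missing torch entry means deepspeed should not be attempted yet.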
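deepspeed installation error: on Windows, installing DeepSpeed fails with FileNotFoundError: [WinError 2] The system cannot find the file specified, raised from subprocess.check_output.
Suggested fix: see the CSDN blog post "Windows下安装DeepSpeed报错_error: failed building wheel for deepspeed".
The steps in that post did not work here. The actual cause turned out to be that Git had not been added to the PATH environment variable; see the CSDN blog post "Git的安装和环境配置_git环境" for how to set it up.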
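Since the failure here came down to Git not being on PATH, a quick check before retrying the DeepSpeed install can save time. This is a rough sketch; `find_tool` is a hypothetical helper name, and the check only confirms that an executable called git can be found and run.

```python
import shutil
import subprocess


def find_tool(name):
    """Return the full path of an executable on PATH, or None if it is not found."""
    return shutil.which(name)


git_path = find_tool("git")
if git_path is None:
    print("git is not on PATH; add Git's install directory to PATH and reopen the terminal")
else:
    # Ask git for its version to confirm the executable actually runs.
    result = subprocess.run([git_path, "--version"], capture_output=True, text=True)
    print(git_path, "->", result.stdout.strip())
```

After changing PATH on Windows, open a new terminal (or restart PyCharm) so the updated value is picked up.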
faiss installation error:
ERROR: Could not find a version that satisfies the requirement faiss (from versions: none)
ERROR: No matching distribution found for faiss
Fix: the package is published on PyPI as faiss-cpu (one word, with a hyphen), so install it with: pip install faiss-cpu
4. Following the instructions in the GitHub project, create a demo.py file in PyCharm
import torch
import pandas as pd
from transformers import AutoTokenizer, AutoModelForCausalLM
model_path = "D:/python-project/medfound-main/MedFound-Llama3-8B-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
data = pd.read_json('data/test.zip', lines=True).iloc[1]
input_text = f"### User:{data['context']}\n\nPlease provide a detailed and comprehensive diagnostic analysis of this medical record.\n### Assistant:"
# Move the prompt tokens to the model's device before generating.
input_ids = tokenizer.encode(input_text, return_tensors="pt", add_special_tokens=False).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=200, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt.
generated_text = tokenizer.decode(output_ids[0,len(input_ids[0]):], skip_special_tokens=True)
print("Generated Output:\n", generated_text)
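The prompt format and the "drop the prompt tokens" slicing in demo.py are easy to get subtly wrong, so here they are pulled out as two small pure-Python functions. This is only an illustrative sketch: `build_prompt` and `strip_prompt` are hypothetical names, not part of the MedFound repository, and plain lists stand in for token tensors.

```python
PROMPT_TEMPLATE = (
    "### User:{context}\n\n"
    "Please provide a detailed and comprehensive diagnostic analysis "
    "of this medical record.\n### Assistant:"
)


def build_prompt(context):
    """Wrap a medical record in the User/Assistant template used by demo.py."""
    return PROMPT_TEMPLATE.format(context=context)


def strip_prompt(output_ids, prompt_len):
    """Keep only the tokens generated after the prompt."""
    return output_ids[prompt_len:]


prompt = build_prompt("Age: 30 years old\nGender: Female")
print(prompt)
# With a 3-token prompt, only the last two ids are newly generated.
print(strip_prompt([11, 12, 13, 21, 22], 3))
```

In demo.py the same slicing is done on a tensor with output_ids[0, len(input_ids[0]):], because generate returns the prompt followed by the new tokens.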
5. Download the pretrained model medicalai/MedFound-Llama3-8B-finetuned from Hugging Face
6. Place the downloaded MedFound-Llama3-8B-finetuned folder inside the medfound-main directory
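As an alternative to downloading through the browser, the model folder can probably be fetched straight into the project directory with the huggingface_hub library. The sketch below assumes that huggingface_hub is installed and that the repository can be downloaded without extra authentication; `local_model_dir` and `download_model` are hypothetical helper names.

```python
from pathlib import Path

REPO_ID = "medicalai/MedFound-Llama3-8B-finetuned"


def local_model_dir(project_root, repo_id=REPO_ID):
    """Folder inside the project where the model files should end up."""
    return Path(project_root) / repo_id.split("/")[-1]


def download_model(project_root, repo_id=REPO_ID):
    """Download every file of the model repository into the project folder."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_model_dir(project_root, repo_id)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


# Shows where the files would go; call download_model(...) to actually fetch them.
print(local_model_dir("D:/python-project/medfound-main"))
```

The resulting folder name matches the model_path used in demo.py, so no path change should be needed there.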
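7. Run demo.py in PyCharm; the results are shown below:
Note: the test dataset is a JSONL file (one JSON record per line). Opened in a browser, it looks like this:
The case shown in the run output is the record read with iloc[1], which is the second record because indexing is zero-based, i.e.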
data = pd.read_json('data/test.zip', lines=True).iloc[1]
{"qid":"2218808","context":"Patient Information:\nAge: 30 years old\nGender: Female\nChief Complaint: Enlarged thyroid gland with nodules found during physical examination for 1 week\nPresent Illness: The patient discovered thyroid abnormalities during a physical examination. An ultrasound showed \"enlarged thyroid with heterogeneous echogenicity and a small cystic nodule in the right lobe.\" There is no associated pain or hoarseness, no fatigue, no palpitations, no edema, and no other discomfort. She is here for consultation in our department today.\nPhysical Examination: No exophthalmos. Thyroid gland not palpable, no nodules or tenderness detected. Heart rate 80 bpm, regular rhythm, no murmurs. Fine tremor (-), no edema in both lower limbs.","diagnosis":"Thyroid cyst","diagnosis_code":"E04.101"}
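The record above is one line of a JSON Lines file, and selecting it by position is easy to mix up. Here is a small standard-library sketch of the same selection; the two sample records are made up for illustration and only mimic the field names of the real data.

```python
import json

# Hypothetical two-line sample with the same field names as the real test data.
SAMPLE_JSONL = (
    '{"qid": "0001", "context": "first record", "diagnosis": "A", "diagnosis_code": "X00"}\n'
    '{"qid": "0002", "context": "second record", "diagnosis": "B", "diagnosis_code": "X01"}\n'
)


def read_jsonl(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


records = read_jsonl(SAMPLE_JSONL)
record = records[1]  # zero-based, like .iloc[1]: this is the second record
print(record["qid"], record["context"])
```

Changing the index in demo.py in the same way selects a different case from the test set.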
Run result:
Generated Output:
Zhong:
1. Age 30 years old, female: The patient is within the age group most commonly affected by thyroid dysfunction and nodules.
2. Chief complaint of enlarged thyroid gland with nodules during physical examination for 1 week: This raises the possibility of thyroid nodules or thyroid disease.
3. Ultrasound showing "enlarged thyroid with heterogeneous echogenicity and a small cystic nodule in the right lobe": The ultrasound findings indicate a thyroid nodule, but further analysis is needed to determine whether it is benign or malignant.
4. Patient has no pain, no hoarseness, no fatigue, no palpitations, no edema, and no other discomfort: This information helps exclude some common symptoms of thyroid dysfunction and nodules, such as pain, hoarseness, fatigue, and palpitations.
5. No exophthalmos, thyroid gland not palpable, no nodules or tenderness detected: These findings help exclude some serious thyroid diseases, such