Quantizing a model when GPU memory is not enough but you still want to run it

Loading the weights in 8-bit through bitsandbytes roughly halves the VRAM footprint compared with fp16:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the weights to 8-bit on load via bitsandbytes
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path,
    device_map="cuda:0",
    trust_remote_code=True,
    quantization_config=quantization_config,
    # max_memory expects a dict mapping device id to bytes (or a string such as "10GiB");
    # it guides placement when an automatic device_map (e.g. "auto") is used
    max_memory={0: torch.cuda.get_device_properties(0).total_memory},
).eval()
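Once loaded, the quantized model is used like any other causal LM. Below is a minimal inference sketch, assuming `path` above points to a causal-LM checkpoint; the prompt and generation settings are illustrative only:

# Sanity check: one short generation with the 8-bit model
prompt = "Hello, please introduce yourself."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

If 8-bit still does not fit, BitsAndBytesConfig(load_in_4bit=True) shrinks the weights further, at some additional accuracy cost.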