
/lmg/ Model Links and Torrents

  1. Changelog (MDY)
  2. 4-bit GPU Model Requirements
  3. 4-bit CPU/llama.cpp RAM Requirements
  4. LLaMA 16-bit Weights
  5. LLaMA 4-bit Weights
  6. BluemoonRP 13B (05/07/2023)
  7. Vicuna 13B Cocktail (05/07/2023)
  8. GPT4-x-AlpacaDente2-30B (05/05/2023)
  9. Vicuna 13B Free v1.1 (05/01/2023)
  10. Pygmalion/Metharme 7B (04/30/2023)
  11. GPT4-X-Alpasta 30B (04/29/2023)
  12. OpenAssistant LLaMa 30B SFT 6 (04/23/2023)
  13. SuperCOT (04/22/2023)
  14. Previous Model List

Changelog (MDY)

[05-07-2023] - Added Vicuna 13B Cocktail, bluemoonrp-13b & AlpacaDente2
[05-05-2023] - Added CPU quantization variation links
[05-02-2023] - Initial Rentry

4-bit GPU Model Requirements

VRAM Required takes full context (2048 tokens) into account. You may be able to load the model on a GPU with slightly less VRAM, but you will not be able to run at full context. If you do not have enough RAM to load the model, it will spill into swap. Groupsize models increase VRAM usage, as does running a LoRA alongside the model. A rough way to estimate these figures yourself is sketched after the table.

| Model Parameters | VRAM Required | GPU Examples | RAM to Load |
| --- | --- | --- | --- |
| 7B | 8 GB | GTX 1660, RTX 2060, AMD 5700 XT, RTX 3050, RTX 3060, RTX 3070 | 6 GB |
| 13B | 12 GB | AMD 6900 XT, RTX 2060 12GB, RTX 3060 12GB, RTX 3080 12GB, A2000 | 12 GB |
| 30B | 24 GB | RTX 3090, RTX 4090, A4500, A5000, 6000, Tesla V100 | 32 GB |
| 65B | 42 GB | A100 80GB, NVIDIA Quadro RTX 8000, Quadro RTX A6000 | 64 GB |
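
As a sanity check on the table above, a 4-bit model's footprint is roughly half a byte per parameter for the weights, plus an fp16 KV cache that grows with context. A minimal sketch, using LLaMA-13B's published shape (40 layers, hidden size 5120); treat the output as an estimate, not a guarantee:

```python
# Rough 4-bit VRAM estimate: quantized weights + fp16 KV cache at full context.
# Shape figures are LLaMA-13B's (40 layers, hidden size 5120); other sizes differ.
params = 13e9            # parameter count
n_layers, d_model = 40, 5120
ctx = 2048               # full context length

weights_gb = params * 0.5 / 1e9                        # 4 bits = 0.5 bytes/param
kv_cache_gb = 2 * n_layers * ctx * d_model * 2 / 1e9   # K and V tensors, 2 bytes each
print(f"~{weights_gb + kv_cache_gb:.1f} GB before activations and runtime overhead")
```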

4-bit CPU/llama.cpp RAM Requirements

5-bit to 8-bit quantized models are becoming more common and will naturally require more RAM. I will update these columns when I have the numbers; in the meantime, a rough approximation is sketched after the table.

| Model | 4-bit | 5-bit | 8-bit |
| --- | --- | --- | --- |
| 7B | 3.9 GB | TBD | TBD |
| 13B | 7.8 GB | TBD | TBD |
| 30B | 19.5 GB | TBD | TBD |
| 65B | 38.5 GB | TBD | TBD |
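
The 4-bit column tracks simple arithmetic: bits per weight divided by 8, plus a small allowance for the per-block scales that quantization formats store. Assuming file size is dominated by weight storage, the missing 5-bit and 8-bit columns can be approximated the same way (the effective bits-per-weight values below are assumptions, not measurements):

```python
# Approximate GGML RAM/file size from effective bits per weight.
# Effective bpw includes per-block quantization scales, so it runs slightly
# above the nominal bit width; the values here are rough assumptions.
def approx_gb(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * bits_per_weight / 8  # params in billions -> GB

for nominal, effective in ((4, 4.5), (5, 5.5), (8, 8.5)):
    print(f"7B at {nominal}-bit: ~{approx_gb(7, effective):.1f} GB")
```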

LLaMA 16-bit Weights

The original LLaMA weights converted to Transformers @ 16-bit. A torrent is available as well, but it uses outdated configuration files that will need to be updated. Note that these aren’t for general use, as the VRAM requirements are beyond consumer scope. A loading sketch follows the table for reference.

Filtering Status : None

| Model | Type | Download |
| --- | --- | --- |
| 7B 16-bit | HF Format | HuggingFace |
| 13B 16-bit | HF Format | HuggingFace |
| 30B 16-bit | HF Format | HuggingFace |
| 65B 16-bit | HF Format | HuggingFace |
| All the above | HF Format | [Torrent Magnet](magnet:?xt=urn:btih:8d634925911a03f787d9f68ac075a9b24281573a&dn=Safe-LLaMA-HF-v2 (4-04-23)&tr=http%3a%2f%2fbt2.archive.org%3a6969%2fannounce&tr=http%3a%2f%2fbt1.archive.org%3a6969%2fannounce) |
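
At 16 bits, every parameter costs two bytes, so even 7B wants roughly 14 GB for the weights alone, which is why these checkpoints are mainly a base for conversion and finetuning. A minimal loading sketch with transformers (paths are placeholders, and `device_map="auto"` needs the accelerate package):

```python
# Load HF-format 16-bit LLaMA weights (transformers >= 4.28 has the Llama classes).
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

path = "path/to/llama-7b-hf"  # placeholder: local dir or HF repo id
model = LlamaForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.float16,  # 2 bytes/param: ~14 GB of weights for 7B
    device_map="auto",          # requires accelerate; offloads if VRAM runs out
)
tokenizer = LlamaTokenizer.from_pretrained(path)
```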

LLaMA 4-bit Weights

The original LLaMA weights quantized to 4-bit. The GPU CUDA versions have outdated tokenizer and configuration files. It is recommended to either update them with this or use the universal LLaMA tokenizer; a fix-up sketch follows the table.

Filtering Status : None

| Model | Type | Download |
| --- | --- | --- |
| 7B, 13B, 30B, 65B | CPU | Torrent Magnet |
| 7B, 13B, 30B, 65B | GPU CUDA (no groupsize) | Torrent Magnet |
| 7B, 13B, 30B, 65B | GPU CUDA (128gs) | Torrent Magnet |
| 7B, 13B, 30B, 65B | GPU Triton | Neko Institute of Science HF page |
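
One way to do the tokenizer/config fix-up mentioned above is to load a known-good HF-format tokenizer and save it over the quantized checkpoint's directory. A sketch with placeholder paths (the universal LLaMA tokenizer linked above serves the same purpose):

```python
# Overwrite a 4-bit checkpoint's stale tokenizer files with known-good ones.
from transformers import LlamaTokenizer

good = LlamaTokenizer.from_pretrained("path/to/known-good-llama-hf")  # placeholder
good.save_pretrained("path/to/llama-30b-4bit")  # writes tokenizer.model + configs
```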

BluemoonRP 13B (05/07/2023)

An RP/ERP-focused finetune of LLaMA 13B trained on BluemoonRP logs. It is designed to simulate a 2-person RP session. Two versions are provided: a standard 13B with 2K context and an experimental 13B with 4K context. It has a non-standard format (LEAD/ASSOCIATE), so ensure that you read the model card and use the correct syntax (an example follows the table).

Filtering Status : Very light

| Model | Type | Download |
| --- | --- | --- |
| 13B | GPU & CPU | https://huggingface.co/reeducator/bluemoonrp-13b |
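
A sketch of what a LEAD/ASSOCIATE prompt might look like; the role names come from the description above, but exact spacing and turn separators should be checked against the model card:

```python
# Illustrative two-person RP prompt in the LEAD/ASSOCIATE style (verify on the card).
prompt = (
    "LEAD: The rain hasn't stopped since we left the harbor.\n"
    "ASSOCIATE:"  # the model continues from here as the second participant
)
```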

Vicuna 13B Cocktail (05/07/2023)

Vicuna 1.1 13B finetune incorporating various datasets in addition to the unfiltered ShareGPT. This is an experiment attempting to enhance the creativity of Vicuna 1.1 while reducing censorship as much as possible. All datasets have been cleaned, and only the “instruct” portion of GPTeacher has been used. It has a non-standard format (USER/ASSOCIATE), so ensure that you read the model card and use the correct syntax (an example follows the table).

Filtering Status : Very light

| Model | Type | Download |
| --- | --- | --- |
| 13B | GPU & CPU | https://huggingface.co/reeducator/vicuna-13b-cocktail |
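
As with BluemoonRP, the exact syntax matters. A hedged sketch of the USER/ASSOCIATE turn format (the preamble line is an assumption modeled on Vicuna-style prompts; confirm against the model card):

```python
# Illustrative USER/ASSOCIATE prompt; the preamble wording is assumed, not official.
prompt = (
    "A chat between a user and an associate. "
    "The associate gives helpful, detailed answers.\n"
    "USER: Write a limerick about quantized llamas.\n"
    "ASSOCIATE:"
)
```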

GPT4-x-AlpacaDente2-30B (05/05/2023)

ChanSung’s Alpaca-LoRA-30B-elina merged with Open Assistant’s second finetune. Testing in progress.

Filtering Status : Medium

| Model | Type | Download |
| --- | --- | --- |
| 30B GGML | CPU | Q5 |
| 30B | GPU | [Q4 CUDA](https://huggingface.co/askmyteapot/GPT4-x-AlpacaDente2-30b-4bit) |

Vicuna 13B Free v1.1 (05/01/2023)

A work-in-progress, community-driven attempt to make an unfiltered version of Vicuna. It currently has an early-stopping bug, and a partial workaround has been posted on the repo’s model card.

Filtering Status : Very light

| Model | Type | Download |
| --- | --- | --- |
| 13B | GPU & CPU | https://huggingface.co/reeducator/vicuna-13b-free |

Pygmalion/Metharme 7B (04/30/2023)

Pygmalion 7B is a dialogue model that uses LLaMA-7B as a base. The dataset includes RP/ERP content. Metharme 7B is an experimental instruct-tuned variation, which can be guided using natural language like other instruct models (an example prompt follows below).

PygmalionAI intends to use the same dataset on the higher-parameter LLaMA models. No ETA as of yet.
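
A sketch of a Metharme-style instruct prompt, using the special role tokens from its model card at the time; verify before relying on them:

```python
# Illustrative Metharme prompt; role tokens per its model card, verify before use.
prompt = (
    "<|system|>Enter instruction mode. Write engaging, coherent prose."
    "<|user|>Describe a lighthouse keeper's last night on duty."
    "<|model|>"
)
```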

Filtering Status : None

| Model | Type | Download |
| --- | --- | --- |
| 7B Pygmalion/Metharme | XOR | https://huggingface.co/PygmalionAI/ |
| 7B Pygmalion GGML | CPU | Q4, Q5, Q8 |
| 7B Metharme GGML | CPU | Q4, Q5 |
| 7B Pygmalion | GPU | Q4 Triton, Q4 CUDA 128gs |
| 7B Metharme | GPU | Q4 Triton, Q4 CUDA |
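
The XOR rows here (and in the OpenAssistant entry below) are weight releases XORed against the original LLaMA weights, so they are useless without them. Each repo ships its own conversion script, which is what you should actually run; conceptually, though, the reconstruction is just a byte-wise XOR:

```python
# Conceptual only: finetuned weights = released XOR file ^ original LLaMA file.
# Use the conversion script shipped in the model repo for the real procedure.
def xor_decode(release: bytes, original: bytes) -> bytes:
    assert len(release) == len(original), "files must pair up byte-for-byte"
    return bytes(a ^ b for a, b in zip(release, original))
```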

GPT4-X-Alpasta 30B (04/29/2023)

An attempt at improving Open Assistant’s performance as an instruct model while retaining its excellent prose. The merge consists of ChanSung’s GPT4-Alpaca LoRA and Open Assistant’s native finetune.

It is an extremely coherent model for logic-based instruct outputs. While the prose is generally very good, it does suffer from the “Assistant” personality bleed-through that plagues the OpenAssistant dataset, which can give you dry dialogue for creative writing/chatbot purposes. However, several accounts claim it is nowhere near as bad as OA’s own finetunes, and that the gains in prose and coherence make up for it.

Filtering Status : Medium

| Model | Type | Download |
| --- | --- | --- |
| 30B 4-bit | CPU & GPU CUDA | https://huggingface.co/MetaIX/GPT4-X-Alpasta-30b-4bit |

OpenAssistant LLaMa 30B SFT 6 (04/23/2023)

An open-source alternative to OpenAI’s ChatGPT/GPT-3.5 Turbo. However, it seems to suffer from overfitting and is heavily filtered. Not recommended for creative writing or chatbots, as the “assistant” personality constantly bleeds through, giving you dry dialogue.

Filtering Status : Heavy

| Model | Type | Download |
| --- | --- | --- |
| 30B | XOR | https://huggingface.co/OpenAssistant/oasst-sft-6-llama-30b-xor |
| 30B GGML | CPU | Q4 |
| 30B | GPU | Q4 CUDA, Q4 CUDA 128gs |

SuperCOT (04/22/2023)

SuperCOT is a LoRA trained with the aim of making LLaMA follow prompts for LangChain better, by infusing chain-of-thought datasets, code explanations and instructions, snippets, logical deductions and Alpaca GPT-4 prompts.

Though designed to improve LangChain use, it is quite versatile and works very well for other tasks like creative writing and chatbots. The author also pruned a number of filters from the datasets. As of early May 2023, it is the most recommended model on /lmg/. A loading sketch follows the table.

Filtering Status : Very Light

| Model | Type | Download |
| --- | --- | --- |
| Original LoRA | LoRA | https://huggingface.co/kaiokendev/SuperCOT-LoRA |
| 13B GGML | CPU | Q4, Q8 |
| 30B GGML | CPU | Q4, Q5, Q8 |
| 13B | GPU | Q4 CUDA 128gs |
| 30B | GPU | Q4 CUDA, Q4 CUDA 128gs |
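
Since the original release is a LoRA, it can be applied on top of 16-bit base weights at load time. A minimal sketch using the peft library; the base path is a placeholder, the 16-bit VRAM caveats above apply, and if the repo keeps the adapter in a subfolder you will need to point at it (the quantized merges in the table avoid this step entirely):

```python
# Apply the SuperCOT LoRA to a 16-bit LLaMA base with peft.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_path = "path/to/llama-13b-hf"  # placeholder: HF-format base weights
base = LlamaForCausalLM.from_pretrained(
    base_path, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "kaiokendev/SuperCOT-LoRA")  # adapter on top
tokenizer = LlamaTokenizer.from_pretrained(base_path)
```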