[safetensors] Introduction and basic code

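This article covers safetensors, the safe tensor library used by Hugging Face, EleutherAI, and StabilityAI: how to install it, how to save and load model weights with it, and how to convert existing model weights to the safe format.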

safetensors is widely used by Hugging Face, EleutherAI, and StabilityAI.

Introduction

File format

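  • Header: this is what characterizes the format. Trying to open a pickle file or an empty (dangling) symlink as safetensors raises an error. For the fix, see: debug (link to another tutorial)
  • Structure and parameters
    Data structure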
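
The header-first layout can be checked by hand before a file is passed to the library. The sketch below assumes the documented layout: an 8-byte little-endian unsigned integer giving the header length, followed by that many bytes of JSON. `diagnose_safetensors` is a hypothetical helper name, not part of safetensors, and the messages it returns are illustrative.

```python
import json
import os
import struct


def diagnose_safetensors(path):
    """Return a short diagnosis string for `path` (hypothetical helper)."""
    # A dangling symlink exists as a link, but its target does not.
    if os.path.islink(path) and not os.path.exists(path):
        return "dangling symlink"
    if not os.path.isfile(path):
        return "not a file"

    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return "too short to hold a header length"
        # First 8 bytes: header length as a little-endian unsigned 64-bit int.
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > file_size - 8:
            return "header length exceeds file size (pickle or other format?)"
        raw_header = f.read(header_len)

    try:
        header = json.loads(raw_header)
    except ValueError:  # covers invalid UTF-8 and invalid JSON
        return "header is not valid JSON"
    if not isinstance(header, dict):
        return "header is not a JSON object"
    return "ok"
```

A pickle checkpoint typically fails the length check, because its first bytes decode to an implausibly large header length.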

Installation

with pip:

pip install safetensors
with conda:

conda install -c huggingface safetensors

Usage

Documentation: https://huggingface.co/docs/safetensors/index
GitHub: https://github.com/huggingface/safetensors

Testing the installation

import torch
from safetensors import safe_open
from safetensors.torch import save_file

tensors = {
   "weight1": torch.zeros((1024, 1024)),
   "weight2": torch.zeros((1024, 1024))
}
save_file(tensors, "model.safetensors")

tensors = {}
with safe_open("model.safetensors", framework="pt", device="cpu") as f:
   for key in f.keys():
       tensors[key] = f.get_tensor(key)

Loading

Documentation: https://huggingface.co/docs/diffusers/using-diffusers/using_safetensors

from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_single_file(
    "https://huggingface.co/WarriorMama777/OrangeMixs/blob/main/Models/AbyssOrangeMix/AbyssOrangeMix.safetensors"
)

Load tensors


from safetensors import safe_open

tensors = {}
with safe_open("model.safetensors", framework="pt", device=0) as f:
    for k in f.keys():
        tensors[k] = f.get_tensor(k)
# Loading only part of the tensors (interesting when running on multiple GPU)

from safetensors import safe_open

tensors = {}
with safe_open("model.safetensors", framework="pt", device=0) as f:
    tensor_slice = f.get_slice("embedding")
    vocab_size, hidden_dim = tensor_slice.get_shape()
    tensor = tensor_slice[:, :hidden_dim]
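
To see why loading part of a tensor can be cheap, note that the header records each tensor's dtype, shape, and byte offsets, so a reader can seek straight to the rows it needs. The sketch below illustrates that idea with the standard library only. It is not how `get_slice` is implemented; `read_rows` is a hypothetical helper, and it assumes a row-major 2-D `F32` tensor whose `data_offsets` are relative to the first byte after the header.

```python
import json
import struct


def read_rows(path, name, start, stop):
    """Read rows [start, stop) of a 2-D F32 tensor without loading the rest."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        info = header[name]
        if info["dtype"] != "F32" or len(info["shape"]) != 2:
            raise ValueError("this sketch only handles 2-D F32 tensors")
        n_rows, n_cols = info["shape"]
        # Clamp the requested range to the rows that actually exist.
        start = max(0, min(start, n_rows))
        stop = max(start, min(stop, n_rows))
        if n_cols == 0:
            return [[] for _ in range(stop - start)]
        row_bytes = n_cols * 4  # 4 bytes per float32 value
        # Offsets in the header are relative to the first byte after the header.
        data_start = 8 + header_len + info["data_offsets"][0]
        f.seek(data_start + start * row_bytes)
        raw = f.read((stop - start) * row_bytes)
    values = struct.unpack(f"<{len(raw) // 4}f", raw)
    return [list(values[i:i + n_cols]) for i in range(0, len(values), n_cols)]
```

Only the bytes for the requested rows are read from disk, which is the property that makes slicing useful when a large tensor is split across devices.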

Saving


import torch
from safetensors.torch import save_file

tensors = {
    "embedding": torch.zeros((2, 2)),
    "attention": torch.zeros((2, 3))
}
save_file(tensors, "model.safetensors")

Converting to safetensors

  • Online, using Hugging Face

The easiest way to convert your model weights is to use the Convert Space, given your model weights are already stored on the Hub. The Convert Space downloads the pickled weights, converts them, and opens a Pull Request to upload the newly converted .safetensors file to your repository.

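import os

import torch
from safetensors.torch import load_file, save_file


# Main function of the conversion script. shared_pointers and check_file_size
# are helper functions defined elsewhere in that script.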
def convert_file(
    pt_filename: str,
    sf_filename: str,
):
    loaded = torch.load(pt_filename, map_location="cpu")
    if "state_dict" in loaded:
        loaded = loaded["state_dict"]
    shared = shared_pointers(loaded)
    for shared_weights in shared:
        for name in shared_weights[1:]:
            loaded.pop(name)

    # For tensors to be contiguous
    loaded = {k: v.contiguous() for k, v in loaded.items()}

    dirname = os.path.dirname(sf_filename)
    # os.makedirs("") raises, so only create the directory when there is one
    if dirname:
        os.makedirs(dirname, exist_ok=True)
    save_file(loaded, sf_filename, metadata={"format": "pt"})
    check_file_size(sf_filename, pt_filename)
    reloaded = load_file(sf_filename)
    for k in loaded:
        pt_tensor = loaded[k]
        sf_tensor = reloaded[k]
        if not torch.equal(pt_tensor, sf_tensor):
            raise RuntimeError(f"The output tensors do not match for key {k}")
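
`convert_file` calls two helpers that are not shown here, `shared_pointers` and `check_file_size`. A minimal sketch of what they might look like follows. The bodies are assumptions inferred from how the function uses them (group tensor names that share storage, and compare file sizes), and the 1% size tolerance is an illustrative choice.

```python
import os
from collections import defaultdict


def shared_pointers(tensors):
    """Group tensor names that point at the same memory.

    Assumes each value exposes `data_ptr()`, as torch.Tensor does.
    """
    names_by_ptr = defaultdict(list)
    for name, tensor in tensors.items():
        names_by_ptr[tensor.data_ptr()].append(name)
    # Only groups with more than one name are actually shared.
    return [names for names in names_by_ptr.values() if len(names) > 1]


def check_file_size(sf_filename, pt_filename, tolerance=0.01):
    """Raise if the converted file is noticeably larger than the original."""
    sf_size = os.stat(sf_filename).st_size
    pt_size = os.stat(pt_filename).st_size
    if pt_size > 0 and (sf_size - pt_size) / pt_size > tolerance:
        raise RuntimeError(
            f"{sf_filename} is {sf_size} bytes, more than "
            f"{tolerance:.0%} larger than {pt_filename} ({pt_size} bytes)"
        )
```

With these groups, `convert_file` keeps the first name of each group and drops the rest, so shared weights are stored only once.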

Example

Parsing the header

import requests # pip install requests
import struct

def parse_single_file(url):
    # Fetch the first 8 bytes of the file
    headers = {'Range': 'bytes=0-7'}
    response = requests.get(url, headers=headers)
    # Interpret the bytes as a little-endian unsigned 64-bit integer
    length_of_header = struct.unpack('<Q', response.content)[0]
    # Fetch length_of_header bytes starting from the 9th byte
    headers = {'Range': f'bytes=8-{7 + length_of_header}'}
    response = requests.get(url, headers=headers)
    # Interpret the response as a JSON object
    header = response.json()
    return header

url = "https://huggingface.co/gpt2/resolve/main/model.safetensors"
header = parse_single_file(url)

print(header)
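
The header returned by `parse_single_file` is a plain dict, so it can be summarized without downloading any weights. The sketch below counts parameters per dtype. `summarize_header` is a hypothetical helper, and it assumes each tensor entry carries `dtype` and `shape` keys while the optional `__metadata__` entry does not describe a tensor.

```python
from collections import Counter
from math import prod


def summarize_header(header):
    """Return {dtype: number of elements} for a parsed safetensors header."""
    counts = Counter()
    for name, info in header.items():
        if name == "__metadata__":
            continue  # free-form string metadata, not a tensor
        # prod([]) is 1, which matches a scalar tensor with an empty shape.
        counts[info["dtype"]] += prod(info["shape"])
    return dict(counts)
```

For example, `summarize_header(header)` on the dict printed above gives a quick per-dtype parameter count for the checkpoint.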
### Integrating `timm` and `safetensors` in a PyTorch project

#### Installing the dependencies

To use `timm` and `safetensors` in a PyTorch project, first install both packages with pip:

```bash
pip install timm safetensors
```

#### Loading a pretrained model and saving it in safetensors format

`timm` provides a large number of pretrained models, while `safetensors` is a safe and efficient tensor storage format. The following example loads a pretrained model and saves its weights to a `.safetensors` file:

```python
import torch
from timm import create_model
from safetensors.torch import save_file

model_name = "resnet50"
model = create_model(model_name, pretrained=True)

state_dict = model.state_dict()
save_path = f"{model_name}.safetensors"

# safetensors.torch.save_file expects a dict of contiguous torch tensors
safe_state_dict = {k: v.detach().cpu().contiguous() for k, v in state_dict.items()}

save_file(safe_state_dict, save_path)
print(f"Model weights saved as '{save_path}' using safetensors format.")
```

#### Loading model parameters from the safetensors file

To restore the saved state, read the file and apply its contents back to the PyTorch model instance:

```python
from safetensors.torch import load_file

# load_file already returns torch tensors, so no NumPy conversion is needed
loaded_tensors = load_file(save_path)

model.load_state_dict(loaded_tensors)
print("Loaded model parameters from safetensor file successfully.")
```

This makes it easy to draw on the wide range of models in `timm` within a PyTorch project while benefiting from the safety and efficiency of `safetensors`[^1].