Luchang-Li - CSDN Blog

Original: guidellm, an LLM performance benchmarking tool

guidellm, an LLM performance benchmarking tool

2025-11-09 19:37:16 598

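In addition, set the environment variable SGLANG_DG_CACHE_DIR (SGL_DG_CACHE_DIR in earlier versions) to specify the cache directory, and keep setting it when you later start the model with launch_server, so that the model starts faster. Launch compile_deep_gemm exactly the way you launch the model; for example, a multi-node deployment still needs a multi-node launch. In fact, you only need to change python3 -m sglang.launch_server in the model launch command to python3 -m sglang.compile_deep_gemm. This usage is described rather vaguely.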
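
The command swap described above can be sketched in Python. This is only an illustration under stated assumptions: the model path, the `--tp 8` value, and the cache directory are hypothetical placeholders, and the helper names `to_compile_command` and `with_cache_dir` are invented for this sketch; only the two module names and the environment variable come from the summary.

```python
import os
import shlex


def to_compile_command(launch_cmd: str) -> str:
    """Turn a launch_server command line into the matching compile_deep_gemm one.

    Every other argument is kept unchanged, mirroring the advice to launch
    the compile step the same way as the model.
    """
    tokens = shlex.split(launch_cmd)
    swapped = [
        "sglang.compile_deep_gemm" if tok == "sglang.launch_server" else tok
        for tok in tokens
    ]
    if swapped == tokens:
        raise ValueError("command does not invoke sglang.launch_server")
    return shlex.join(swapped)


def with_cache_dir(env, cache_dir: str) -> dict:
    """Return a copy of env with the DeepGEMM cache directory variable set."""
    new_env = dict(env)
    new_env["SGLANG_DG_CACHE_DIR"] = cache_dir
    return new_env


# Hypothetical launch command; the path and parallelism are placeholders.
launch = "python3 -m sglang.launch_server --model-path /models/example --tp 8"
compile_cmd = to_compile_command(launch)

# Hypothetical cache directory, used for both the compile step and later launches.
env = with_cache_dir(os.environ, "/tmp/deep_gemm_cache")

print(compile_cmd)
```

The resulting command string and `env` could then be passed to `subprocess.run(shlex.split(compile_cmd), env=env)` on each node.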

2025-10-24 09:34:36 331

Original: SGLang PD disaggregation workflow details

SGLang PD disaggregation workflow details

2025-09-17 15:10:38 623

Original: SGLang HiCache KV Cache offload

SGLang HiRadix Cache KV Cache offload

2025-09-15 16:43:43 1496 1

Original: LMCache KV cache storage

LMCache KV cache storage

2025-06-12 16:56:23 2749

Original: sglang/VLLM performance benchmarking: the bench_serving tool

sglang performance benchmarking and analysis

2025-05-29 08:33:03 3294

Original: VLLM/sglang accuracy evaluation with evalscope/lm_eval (MMLU and others)

VLLM/sglang accuracy evaluation with lm_eval (MMLU and others)

2025-05-29 08:18:10 858

Original: sglang radix tree KV cache management

sglang KV cache management

2025-05-16 12:49:45 1838

Original: sglang PD-disaggregated deployment for dense LLMs

sglang PD-disaggregated deployment for dense LLMs

2025-05-06 18:40:32 2375

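Original: Job opportunities in sglang/VLLM LLM inference optimization without the rat race - Wuhan - Beijing

Job opportunities in LLM inference optimization without the rat race - Wuhan - Beijing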

2025-04-28 11:42:18 424 1

Original: VLLM V1 serve: basic online inference flow

VLLM V1 serve: basic online inference flow

2025-04-15 19:42:18 1906

Original: VLLM V1 part 4 - KV cache management

VLLM V1 part 4 - KV cache block management

2025-04-08 10:59:32 2911 1

Original: VLLM V1 part 5 - graph capture

VLLM V1 part 5 - graph capture

2025-04-02 18:21:22 2654 2

Original: VLLM V1 part 3 - Scheduler

VLLM V1 part 3 - Scheduler

2025-03-28 16:58:00 1230

Original: VLLM V1 offline inference 2 - Model Executor

VLLM V1 offline inference 2 - Model Executor

2025-03-28 15:09:10 1391

Original: VLLM V1 offline inference 1 - basic flow

VLLM V1 offline inference 1 - basic flow

2025-03-26 15:10:48 1889

Original: Building NVIDIA Dynamo from source

Building NVIDIA Dynamo from source

2025-03-20 16:20:01 1560

Original: How DeepSeek MLA works

DeepSeek MLA Explained

2025-03-06 11:27:34 1431

Original: DeepSeek group-limited expert routing and load balancing

DeepSeek group-limited expert routing

2025-03-03 19:24:42 1964

Original: Wavelet-transform background estimation: MATLAB, Python, and PyTorch examples

Supplementary code for "Sparse deconvolution improves the resolution of live-cell super-resolution fluorescence microscopy".

2025-02-23 21:19:58 582

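Original: Deep learning inference engines - memory sharing algorithms

Deep learning inference engines - a memory sharing algorithm based on prioritized graph coloring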

2025-01-25 09:53:29 1239

Original: Converting OpenCV to PyTorch

opencv to pytorch

2024-11-18 17:11:58 403

Original: TensorRT Serialization assertion stdVersionRead == kSERIALIZATION_VERSION failed

[TRT] [E] IRuntime::deserializeCudaEngine: Error Code 1: Serialization (Serialization assertion stdVersionRead == kSERIALIZATION_VERSION failed.Version tag does not match. Note: Current Version: 237, Serialized Engine Version: 239

2024-10-18 12:56:18 1192

Original: RuntimeError: Cannot insert a Tensor that requires grad as a constant. Consider making it a paramete

RuntimeError: Cannot insert a Tensor that requires grad as a constant. Consider making it a paramete

2024-10-15 16:19:07 723

Original: Several methods for high-performance reduce on GPUs with CUDA

Several methods for high-performance reduce on GPUs with CUDA

2024-09-11 16:52:01 1435

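Original: Basic principles of 4-bit vector quantization / lookup-table quantization for LLM weights

For 4-bit quantization of LLM weights, besides the widely used group-wise uniform quantization (GPTQ, AWQ, and so on), Apple has proposed a lookup table (LUT) quantization technique called Palettization, and Qualcomm has proposed a new vector quantization technique. The two techniques rest on essentially the same principle.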

2024-09-06 10:12:49 2456

Original: Recommended linear algebra textbooks

Introduction to Linear Algebra, Gilbert Strang; a Chinese translation is available.

2024-09-03 17:01:08 1502

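Original: Normalization operators and numerically stable methods for computing variance

Normalization operators and numerically stable methods for computing variance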

2024-09-02 10:28:17 1651

Original: NVIDIA GPU profiling with Nsight Systems

GPU profiling

2024-08-13 09:32:58 3336 1

Original: Quantization and model export with TensorRT Model Optimizer

Quantization and model export with TensorRT Model Optimizer

2024-08-07 17:38:45 1609 1

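Original: Unsupported: ONNX export of convolution for kernel of unknown shape

The error message reads: Caused by the value '28 defined in (%28 : Float(*, *, *, *, strides=[199692, 66564, 258, 1], requires_grad=0, device=cpu). This problem usually appears when the convolution weight is not an ordinary trained parameter but is computed by another branch of the graph. The message points to line 10 of test.py, that is, the pad statement. This is actually a bug in the underlying shape inference. This makes the shape of x fully statically determined again.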

2024-07-19 17:02:16 1721

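Original: Understanding TP, TN, FP, FN (true positive, false negative, etc.)

For example, if you judge something to be positive but your judgment is wrong, that is a false positive. The second word, positive or negative, says whether the judgment made was positive or negative. The first word, true or false, says whether that judgment was correct or wrong.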
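
The naming rule above can be written out as a short Python sketch. The function names `outcome` and `count_outcomes` and the sample data are made up for illustration; they are not taken from the post.

```python
def outcome(predicted_positive: bool, actually_positive: bool) -> str:
    """Name one prediction using the two-word rule.

    First word: was the judgment correct ("true") or wrong ("false")?
    Second word: what was the judgment itself ("positive" or "negative")?
    """
    first = "true" if predicted_positive == actually_positive else "false"
    second = "positive" if predicted_positive else "negative"
    return f"{first} {second}"


def count_outcomes(predictions, labels) -> dict:
    """Count TP, TN, FP, and FN over paired predictions and ground-truth labels."""
    counts = {
        "true positive": 0,
        "true negative": 0,
        "false positive": 0,
        "false negative": 0,
    }
    for predicted, actual in zip(predictions, labels):
        counts[outcome(predicted, actual)] += 1
    return counts


# Made-up example data: True means "positive".
predictions = [True, True, False, False, True]
labels = [True, False, False, True, True]
counts = count_outcomes(predictions, labels)
print(counts)
```

Note that the second word always follows the prediction, never the ground truth, which is why a wrongly predicted positive is a false positive even though the actual label is negative.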

2024-07-13 23:52:46 291

Original: Suppressing activation outliers in LLM quantization

Breakthrough technique: suppressing activation outliers in LLM quantization

2024-06-03 10:15:23 5701 6

Original: Transformer: from attention to grouped query attention (GQA)

Transformer: from attention to grouped query attention (GQA)

2024-05-28 16:38:54 2050

Original: Accelerating LLMs with activation sparsity

Accelerating LLMs with activation sparsity

2024-04-10 09:21:08 522

Original: Exporting RWKV models to ONNX

Exporting RWKV models to ONNX

2024-03-11 13:01:54 784

Original: Exporting Google's Gemma model to ONNX

Exporting the Gemma model to ONNX

2024-03-08 21:05:54 1432 6

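Original: Exporting LLaMA, ChatGLM2, and other LLMs to ONNX

An ONNX model can run on any inference engine that supports ONNX inference, so the LLM can be deployed on a wider range of platforms. Other advantages include avoiding the PyTorch dependency and getting better performance.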

2023-08-05 19:15:07 7704 2

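Original: Building SentencePiece with the Android NDK

LLMs such as LLaMA generally use the SentencePiece tokenizer, and on-device deployment requires building and using its C++ version. Building with the NDK on Android requires some changes to CMakeLists.txt: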

2023-07-27 09:37:17 1461 1

Original: AWQ model quantization in practice

AWQ model quantization in practice

2023-06-28 10:23:02 16914 8

Usage of the Verilog generate statement

An excellent, very detailed explanation.

2016-05-20

Introduction to Digital Speech Processing Rabiner 2008

Very detailed material on digital speech signal processing; a good document.

2014-12-26

FPGA Prototyping By Verilog Examples source code

Source code for the book FPGA Prototyping By Verilog Examples

2014-08-06

MATLAB screenshot tool ScreenCapture

A fairly good MATLAB screenshot script written abroad; it is simple to use, just unzip it.

2015-03-25

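Altera official Shanghai three-day Quartus II timing analysis training materials and exercises_day3

Altera official Shanghai three-day Quartus II timing analysis training materials and exercises_day3; includes explanatory documents and code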

2015-01-30

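Altera official Shanghai three-day Quartus II timing analysis training materials and exercises_day2

Altera official Shanghai three-day Quartus II timing analysis training materials and exercises_day2; includes explanatory documents and code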

2015-01-30

Free screen recording software FreeScreenVideoRecorder

Free, easy-to-use screenshot and screen recording software; the recorded video is very clear.

2016-11-08

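SystemVerilog auto-completion plugin for Sublime Text 2/3

SystemVerilog auto-completion plugin for Sublime Text 2/3; it works very well. I modified the original version so that it better fits our coding style. The style can be changed through the included configuration file; feel free to ask me if you get stuck.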

2015-02-01

Laser Fundamentals William: laser basics and laser principles

Laser Fundamentals Second Edition William: laser basics

2017-08-20

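Altera official Shanghai three-day Quartus II timing analysis training materials and exercises_day1

Altera official Shanghai three-day Quartus II timing analysis training materials and exercises_day1; includes explanatory documents and code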

2015-01-30

Hidden Markov Models: an introduction by Phil Blunsom

A very comprehensive yet concise introduction to hidden Markov models; easy to understand.

2014-12-26

Free HD screen recording software FreeScreenVideoRecorder_3.0.45.1027

Free HD screen recording software FreeScreenVideoRecorder_3.0.45.1027; simple and easy to use

2016-11-10

DAC MCP4725 STM32 code

My own STM32 code for the MCP4725 12-bit rail-to-rail DAC over the I2C protocol; tested and working.

2017-06-18

Computer Generated Holograms Techniques and Application

Lee hologram; an important reference on computer-generated holography

2017-08-02

Deep Learning book, MIT, Ian Goodfellow, Aaron Courville, and Yoshua Bengio

An excellent deep learning textbook. MIT Deep Learning Book in PDF format. This book was downloaded in HTML form and conveniently joined as a single PDF file for your enjoyment. Please notice the known issues in the web page, especially with regards to some symbols not rendering well or not at all. From http://www.deeplearningbook.org/. An MIT Press book. Ian Goodfellow, Yoshua Bengio and Aaron Courville. The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular. The online version of the book is now complete and will remain available online for free. The print version will be available for sale soon.

2016-04-22

Principles of Fluorescence Spectroscopy Third Edition Joseph R Lakowicz.pdf

Principles of Fluorescence Spectroscopy Third Edition Joseph R Lakowicz

2017-08-13

The best code editor, Sublime Text3.59

The best code editor, Sublime Text3.59; includes installers for 32-bit and 64-bit systems

2014-04-27

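Top ten machine learning algorithms, PDF with bookmarks

The top ten machine learning algorithms, merged and organized into one file with bookmarks; rich in content.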

2014-11-04

practical-PID-control

Study material on PID automatic control

2016-04-29

nnImplementationV2: a neural network implementation in C++

nnImplementationV2, a C++ neural network implementation from abroad. For detailed explanations see https://takinginitiative.wordpress.com/2008/04/23/basic-neural-network-tutorial-c-implementation-and-source-code/ and https://takinginitiative.wordpress.com/2008/04/03/basic-neural-network-tutorial-theory/

2016-04-21

Programming in Parallel with CUDA A Practical Guide Richard Ansorge 2022-Cambridge-University

Programming in Parallel with CUDA A Practical Guide Richard Ansorge 2022-Cambridge-University, English edition

2025-01-25

Peking University Graduate Thesis Writing Guide 2014.pdf

A writing guide for master's and doctoral theses: Peking University Graduate Thesis Writing Guide 2014.pdf

2019-06-11

OpenCL image from buffer intel

Intel's introduction to creating an OpenCL image from a buffer

2023-12-09

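Hand-book on statistical distributions for experimentalists (probability distribution handbook)

A rare, comprehensive handbook covering all kinds of probability distributions, mainly intended as a lookup reference for experimental researchers.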

2018-03-01

Installing TensorFlow on Windows (official TensorFlow guide)

The official site may be unreachable, so this is a PDF printout of its page on installing TensorFlow on Windows.

2018-05-03

STM32CubeMX 4.26.1

STM32CubeMX 4.26.1, the latest version as of July 2018; downloading from the official site is very slow and also requires registration.

2018-07-20

Netron-Setup-4.5.0.zip

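A great tool for visualizing deep learning models; the latest version as of 2020-09-12. Downloading from GitHub is slow. Note: after installing, remember to accept the terms on first launch; otherwise models will fail to open.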

2020-09-12

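CRLB explanatory slides

The Cramer-Rao Lower Bound (CRLB) gives the best precision that an unbiased estimator can achieve, so it is often used to compute the theoretically best achievable estimation precision and to evaluate parameter estimation methods (whether they come close to the CRLB). This post merges and summarizes the content of several slide decks.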

2017-10-03

Principles of Optics 7th ed M.Born,E.Wolf.pdf

Principles of Optics 7th ed M.Born,E.Wolf.pdf. A good book on the principles of optics, comprehensive and rich in content: geometrical optics, wave optics, and optics of materials.

2017-09-01

STM32CubeMX 4.24.0

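Downloading from the official site requires registering and logging in, which is a hassle; this is the original STM32CubeMX 4.24.0 as downloaded from the official site.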

2018-03-05

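EndNote reference style for Huazhong University of Science and Technology doctoral theses

EndNote reference style for Huazhong University of Science and Technology master's and doctoral theses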

2019-09-04

BFGS Optimization curve fitting

My own implementation of the BFGS optimization algorithm, with a curve fitting example; tested, with results very close to MATLAB's. Simple and easy to use.

2017-12-07

DeepSpeed System Optimizations Enable Training Deep Learning

DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters

2021-10-23

Practical data acquisition for instrumentation and control systems

book of Practical data acquisition for instrumentation and control systems

2018-12-10

Handbook of Fluorescence Spectroscopy and Imaging

Handbook of Fluorescence Spectroscopy and Imaging From Single Molecules to Ensembles

2017-11-23

Netron-Setup-3.9.8.zip

Netron-Setup-3.9.8.exe, Netron Setup 3.9.8; downloading from GitHub is very slow, so it can be downloaded here

2020-03-14

Protocol Buffer plugin for Sublime Text 3

Protocol Buffer plugin for Sublime Text 3; official site: https://github.com/vihangm/sublime-protobuf-syntax

2018-08-06

Berkeley classic common algorithms.zip

Classic, commonly used computer algorithms. Chapter 0 Prologue; Chapter 1 Algorithms with numbers; Chapter 2 Divide-and-conquer algorithms; Chapter 3 Decompositions of graphs; Chapter 4 Paths in graphs; Chapter 5 Greedy algorithms; Chapter 6 Dynamic programming; Chapter 7 Linear programming and reductions; Chapter 8 NP-complete problems; Chapter 9 Coping with NP-completeness; Chapter 10 Quantum algorithms

2019-09-04

cmake-3.17.2-win64-x64.zip

cmake 3.17.2 win64 x64 msi zip; downloading from the official site is very slow, and it can be downloaded quickly here

2020-05-23

Berkeley classic common algorithms.pdf

Chapter 0 Prologue; Chapter 1 Algorithms with numbers; Chapter 2 Divide-and-conquer algorithms; Chapter 3 Decompositions of graphs; Chapter 4 Paths in graphs; Chapter 5 Greedy algorithms; Chapter 6 Dynamic programming; Chapter 7 Linear programming and reductions; Chapter 8 NP-complete problems; Chapter 9 Coping with NP-completeness; Chapter 10 Quantum algorithms

2019-09-04
