[Original] LLM Benchmarks
We often see a menagerie of performance benchmarks listed in LLM papers to showcase the "breakthroughs", while very likely knowing very little about the specifics of each particular test suite. There, then, lies a danger of being misled and manipulated b…
2024-04-08 11:47:21 553
[Original] 1-bit LLM and 1-trit LLM
In light of NVIDIA's recent addition of fp4, I'm once again curious about the bottom line for LLMs, at least for inference; let's go back to this BitNet paper from Microsoft, featuring a 1-bit LLM with 1-bit weights trained from scratch, and later on another feat…
2024-03-22 18:17:10 969
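For reference, the follow-up BitNet work ("1.58 bits", i.e. one trit) quantizes weights to the ternary set {-1, 0, +1} using absmean scaling. A minimal numpy sketch of that idea; the helper name and example values are illustrative, not from the paper's code:

```python
import numpy as np

def absmean_ternary_quantize(w, eps=1e-5):
    """Quantize a weight matrix to ternary {-1, 0, +1} values,
    scaled by the mean absolute weight (absmean scaling)."""
    gamma = np.mean(np.abs(w)) + eps           # absmean scale factor
    w_q = np.clip(np.round(w / gamma), -1, 1)  # round, then clip to {-1, 0, 1}
    return w_q, gamma                          # dequantize as w_q * gamma

w = np.array([[0.8, -0.05, -1.2],
              [0.3,  1.1,  -0.4]])
w_q, gamma = absmean_ternary_quantize(w)
# w_q now holds only -1, 0, and +1 entries
```

Each weight thus carries log2(3) ≈ 1.58 bits of information, which is where the "1-trit" framing comes from.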
[Original] SORA: Text-to-Video Generator by OpenAI
Sources: 1. OpenAI's blog piece: Video generation models as world simulators; 2. DiTs (Diffusion Transformers): Scalable Diffusion Models with Transformers. This is so far the most contentious point for SORA, regarding whether it is "learning" physics and gene…
2024-02-24 21:47:08 1035
[Reposted] Layer Normalization (LN)
From here on I use the Pinecone article as the main source, as it is the more comprehensive and more readable one. As you can read from the abstract of the original paper, LN is proposed as an alternative or complement to BN, hence it's best to start with a solid unders…
2024-01-19 12:29:04 56
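A minimal numpy sketch of what LN computes: each sample is normalized over its own feature dimension, in contrast to BN's per-feature statistics taken over the batch. Names and values here are illustrative:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Layer normalization: standardize each row (sample) over its
    feature axis, then apply a learnable scale and shift."""
    mu = x.mean(axis=-1, keepdims=True)    # per-sample mean
    var = x.var(axis=-1, keepdims=True)    # per-sample variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # standardized features
    return gamma * x_hat + beta            # learnable affine transform

x = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])
y = layer_norm(x, gamma=np.ones(3), beta=np.zeros(3))
# each row of y has (approximately) zero mean and unit variance
```

Because the statistics are computed per sample, LN behaves identically at batch size 1, which is one reason it displaced BN in sequence models.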
[Reposted] vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention
Paper: https://arxiv.org/pdf/2309.06180.pdf; repo: GitHub - vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs; highlights blog by the authors: vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention | vLLM Blog. LLMs…
2023-12-05 20:49:41 164
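A toy sketch of the core PagedAttention bookkeeping: the KV cache is split into fixed-size blocks, and each sequence keeps a "block table" mapping logical positions to physical blocks, so memory is claimed on demand rather than reserved up front for the maximum length. The class and method names below are illustrative, not vLLM's actual API:

```python
BLOCK_SIZE = 4  # tokens per KV-cache block (small for illustration)

class PagedKVCache:
    def __init__(self, num_physical_blocks):
        self.free = list(range(num_physical_blocks))  # free-list of physical blocks
        self.tables = {}                              # seq_id -> list of physical block ids

    def append_token(self, seq_id, pos):
        """Record token `pos` of sequence `seq_id`; allocate a new
        physical block only when the current logical block fills up."""
        table = self.tables.setdefault(seq_id, [])
        if pos % BLOCK_SIZE == 0:          # logical block boundary: grab a free block
            table.append(self.free.pop())
        # return (physical block, offset) where this token's KV lives
        return table[pos // BLOCK_SIZE], pos % BLOCK_SIZE

cache = PagedKVCache(num_physical_blocks=8)
for pos in range(6):
    cache.append_token("s0", pos)
# 6 tokens with BLOCK_SIZE=4 -> exactly 2 physical blocks allocated
```

The attention kernel then gathers K/V through the block table, which is what makes the non-contiguous layout transparent to the math.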
[Reposted] Trailing Comma "Feature" in Python
[Code] Trailing Comma "Feature" in Python.
2023-11-25 17:44:43 71
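The "feature" in question: in Python a comma, not the parentheses, makes a tuple, so a stray trailing comma silently changes a value's type, while trailing commas inside multi-line literals are harmless. A short demonstration:

```python
x = (1)       # just the int 1: the parentheses only group
y = (1,)      # a one-element tuple: the comma makes it
z = 1,        # also a tuple, even without parentheses
assert isinstance(x, int) and isinstance(y, tuple) and isinstance(z, tuple)

# trailing commas inside multi-line collection literals are legal
# and keep diffs clean when items are appended later:
nums = [
    1,
    2,
    3,  # <- allowed
]
assert nums == [1, 2, 3]
```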
[Reposted] Understanding Gated Recurrent Unit (GRU) in Deep Learning
Source: GRU stands for Gated Recurrent Unit, a type of recurrent neural network (RNN) architecture similar to LSTM (Long Short-Term Memory). Like LSTM, GRU is designed to model sequential data by allowing information to be selectively remembe…
2023-11-07 19:01:17 81
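The gating described above can be sketched in a few lines of numpy: the update gate z interpolates between the old state and a candidate state, and the reset gate r controls how much history feeds the candidate. Biases are omitted and names are illustrative:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_cell(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step (biases omitted for brevity)."""
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
    return (1 - z) * h + z * h_tilde          # interpolated new state

rng = np.random.default_rng(0)
d_in, d_h = 3, 4
x, h = rng.normal(size=d_in), np.zeros(d_h)
W = [rng.normal(size=(d_h, d_in)) for _ in range(3)]
U = [rng.normal(size=(d_h, d_h)) for _ in range(3)]
h_next = gru_cell(x, h, W[0], U[0], W[1], U[1], W[2], U[2])
```

Compared with LSTM, the GRU merges the forget and input gates into z and drops the separate cell state, so it has fewer parameters per unit.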
[Original] The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
Paper: https://owainevans.github.io/reversal_curse.pdf; blog with interactions with the authors: Paper: LLMs trained on "A is B" fail to learn "B is A" — LessWrong. This is a linkpost for https://owainevans.github.io/reversal_curse.pdf. This post is the copy of…
2023-09-28 18:07:11 394
[Reposted] Illustrated Stable Diffusion
AI image generation is the most recent AI capability blowing people’s minds (mine included). The ability to create striking visuals from text descriptions has a magical quality to it and points clearly to a shift in how humans create art.
2023-08-17 14:02:34 140
[Original] FlashAttention-2
FlashAttention is a fusion trick that merges multiple operational steps (ops) in the attention layers of transformer networks to achieve a better end-to-end result; the performance gain comes mainly from better memory reuse, given that the vanilla version is memory…
2023-07-29 12:01:51 319
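The fusion is enabled by the online-softmax trick: attention scores can be consumed tile by tile, rescaling a running max and normalizer as new tiles arrive, so the full score matrix is never materialized. A standalone scalar-per-step sketch of the idea (not the actual kernel, which works on blocks):

```python
import numpy as np

def online_softmax_weighted_sum(scores, values):
    """Streaming softmax-weighted sum over (score, value) pairs,
    numerically stable via a running max m and normalizer l."""
    m = -np.inf                                  # running max of scores seen
    l = 0.0                                      # running softmax denominator
    acc = np.zeros_like(values[0], dtype=float)  # running weighted sum
    for s, v in zip(scores, values):
        m_new = max(m, s)
        scale = np.exp(m - m_new)  # rescale previous accumulator to new max
        p = np.exp(s - m_new)
        l = l * scale + p
        acc = acc * scale + p * v
        m = m_new
    return acc / l

scores = np.array([0.5, 2.0, -1.0])
values = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
out = online_softmax_weighted_sum(scores, values)
# out matches the reference softmax(scores) @ values
```

Because each partial result can be corrected later by `scale`, the kernel only needs on-chip (SRAM) tiles plus a few running scalars per row, which is the source of the memory-bandwidth win.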
[Original] Automatic Differentiation
For beginners, the most daunting aspect of deep learning algorithms is perhaps Back-Propagation (BP), which requires derivations of some highly complex mathematical expressions. Luckily, when actually implementing BP, we do not have to rely on summary symbolic…
2023-07-28 13:46:54 163
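The point of the excerpt: implementing BP needs only each op's local derivative plus the chain rule, not full symbolic derivations. A toy reverse-mode sketch (a teaching example, not any library's API; a real implementation would topologically sort the graph instead of recursing per path):

```python
class Var:
    """Minimal reverse-mode autodiff node: each op records its parents
    and the local derivative w.r.t. each; backward() applies the chain
    rule and accumulates gradients."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuples of (parent, local_gradient)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(x*y)/dx = y, d(x*y)/dy = x
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self, seed=1.0):
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)  # chain rule, accumulated per path

x, y = Var(3.0), Var(4.0)
z = x * y + x   # z = x*y + x  =>  dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
```

Note that `x` receives contributions from both paths (through the product and through the addition), which the accumulation `self.grad += seed` handles automatically.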
[Original] Introduction to Verilog
Sources: edited and padded GPT content; if you prefer human sources: Verilog Data Types. This article focuses on Verilog as a programming language, i.e. the simulation part is not covered. Verilog is C-like, with a few quirks tweaked for the HDL side of things. V…
2023-05-06 16:32:27 492
[Original] FlashAttention
Paper: https://arxiv.org/abs/2205.14135; an informal talk by the author Tri Dao: https://www.youtube.com/watch?v=FThvfkXWqtE; code repo: GitHub - HazyResearch/flash-attention: Fast and memory-efficient exact attention; introduction to transformers: Transformer a…
2023-05-03 18:30:32 1314
[Original] Cross-Domain Signal Integrity in Asynchronous Designs
Conventional two flip-flop synchronizer, from Synchronizer Techniques for Multi-Clock Domain SoCs & FPGAs - EDN. In general, a conventional two flip-flop synchronizer is used for synchronizing a single-bit level signal. As shown in Figure 1 and Figure 2, fli…
2023-04-22 18:50:56 545
[Reposted] Moore vs. Mealy Machine
Slides from https://inst.eecs.berkeley.edu/~cs150/fa05/Lectures/07-SeqLogicIIIx2.pdf ==> output is not dependent on input, but next state still is; with the merging rule stated in the beginning: (starting from the Moore diagram, but change it to a Mealy first,…
2023-04-13 17:24:40 75
[Original] Initial Block and Testbenches in Verilog
[Code] Initial Block and Testbenches in Verilog.
2023-04-11 16:22:36 389
[Reposted] Common Architectures in Convolutional Neural Networks
From: https://www.jeremyjordan.me/convnet-architectures/#lenet5 ==> most of the graphs cannot be copied to this platform, so just check the linked original. In this post, I'll discuss commonly used architectures for convolutional networks. As you'll see, almo…
2023-02-22 18:56:56 133
[Reposted] Domain-Specific Compiling: The Past and Present of Domain Compilers • AI-Oriented Compilation Techniques
Author bio: Zhang Shuoming, a PhD student in computer architecture at the Institute of Computing Technology, Chinese Academy of Sciences, advised by researcher Cui Huimin; his main research direction is AI compilation. [email protected]. This article has two parts; the first is a survey (The Past and Present of Domain Compilers • Survey), while this part focuses on AI-oriented compilation techniques. 0. Preface: With the arrival of the AI era, the emergence of numerous AI applications has driven the development of domain compilation, most visibly in the popularization and adoption of various AI compilers. Several important characteristics of the AI field present AI compilers with many new opportunities and challenges: first, programming in the AI field…
2023-02-21 18:59:50 474
[Reposted] C vs. Python Operator Precedence: Beware of (Bitwise) Logical Op.
Comparison operators in Python and C have different precedence relative to bitwise and logical operators; beware.
2022-10-13 10:16:12 131
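A concrete instance of the trap: the same expression `x & 1 == 0` parses differently in the two languages, because C's equality operator binds tighter than bitwise `&`, while Python's `&` binds tighter than `==`:

```python
# In C, `x & 1 == 0` parses as `x & (1 == 0)`, i.e. x & 0.
# In Python, the same text parses as `(x & 1) == 0`.
x = 6  # an even number

python_parse = (x & 1) == 0  # how Python reads `x & 1 == 0` -> True
c_parse = x & (1 == 0)       # how C reads it: x & 0 -> 0 (falsy)
assert (x & 1 == 0) == python_parse
assert c_parse == 0
```

So a C-style evenness test `x & 1 == 0` is always false in C for any x, while the identical source text does the intended thing in Python; the mismatch bites hardest when porting bit-twiddling code between the two.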
[Reposted] Python: Function Annotation and the "inspect" Module
https://peps.python.org/pep-3107/. This PEP introduces a syntax for adding arbitrary metadata annotations to Python functions [1]. Because Python's 2.x series lacks a standard way of annotating a function's parameters and return values, a variety of tools and…
2022-09-14 18:48:53 144
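A short sketch of reading PEP 3107 annotations back at runtime via the standard `inspect` module; the function `scale` is a made-up example:

```python
import inspect

# Annotations are arbitrary metadata: Python stores them on the
# function object but does not enforce or even look at them.
def scale(x: float, k: float = 2.0) -> float:
    return x * k

sig = inspect.signature(scale)
assert sig.parameters["x"].annotation is float
assert sig.parameters["k"].default == 2.0
assert sig.return_annotation is float
assert scale.__annotations__["return"] is float  # the raw PEP 3107 storage
```

`inspect.signature` is the structured view; `__annotations__` is the plain dict the syntax populates. Type checkers and frameworks build on exactly this machinery.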
[Original] TorchSparse: 3D SC/SSC Acceleration on GPU
Paper: TorchSparse: Efficient Point Cloud Inference Engine. Notation: mapping to get the output position set: when down-sampling, since we want to sample as many sparse input sites as possible, we relax the SSC i/o mapping condition to p < s*...
2022-05-26 17:19:00 464
[Original] 3D (Input) Sparse Convolution
Review: 2D sparsity in DNNs: Sparsity in Deep Learning_EverNoob的博客-CSDN博客 ==> the above-mentioned 2D sparsity is decidedly different from the 3D sparsity situation, in that we manually created the structured sparsity to cut down the memory footprint, whil…
2022-05-24 17:28:13 644
[Reposted] Setup and Hold Time
Setup and Hold Time in an FPGA. What is setup and hold time in an FPGA? Setup time and hold time are important concepts for every digital designer to understand. This article explains what setup and hold times are and how they are used inside of an FPGA.
2022-05-11 11:54:31 300
[Reposted] Schematic Symbols for Circuit Design
Passive Components: https://www.allaboutcircuits.com/technical-articles/schematic-symbols-electronic-components-passives-resistors-capacitors/ ==> a well-loaded article with most passive unit electrical components and follow-up links on details of the…
2022-05-10 17:06:24 791
[Reposted] Power Supply Nomenclature
What are the meanings of Vdd and Vss? Vcc and Vee? GND? - Mis Circuitos. The nomenclature of these power voltages (Vdd and Vss, or Vcc and Vee) has always been a bit intriguing and even confusing. The following pictures are worth a thousand words…
2022-05-10 16:13:22 106
[Reposted] RC, RL, LC, RLC
All taken from Wikipedia. RC: https://en.wikipedia.org/wiki/RC_circuit, using Kirchhoff's current law. Series circuit: here we exploit reactance/impedance, see later in the capacitor section. s is for "second", since Q = I * t, C = Q/V ==> 1/C ~ Oh…
2022-05-10 15:11:44 362
[Original] NNs for Point Cloud: PRNN and PV-CRNN
For basics on point clouds, see: (3D Imaging) Point Cloud_EverNoob的博客-CSDN博客. Moving Point Cloud Processing: PointRNN, https://arxiv.org/abs/1910.08287. In this paper, we introduce a Point Recurrent Neural Network (PointRNN) for moving point cloud process…
2022-05-09 20:04:49 950
c++: using seekp on cout raises no error, but the result does not match expectations
2021-04-08
c++: how to write an n-level nested vector with templates (a ByteDance interview question review)?
2021-04-02
Cpp: details of the syntax for a base-class pointer referring to a derived-class object created with new
2021-03-23
C++ a^=b b^=a a^=b bug: seeking an explanation
2021-03-03
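For context on the likely bug behind this question: the XOR-swap idiom `a^=b; b^=a; a^=b` destroys the value when both operands alias the same storage location, since the first step zeroes the shared value and every later XOR keeps it zero. A Python demonstration on list slots:

```python
def xor_swap(xs, i, j):
    """Classic XOR swap of xs[i] and xs[j]; unsafe when i == j."""
    xs[i] ^= xs[j]   # if i == j, the shared slot becomes 0 here
    xs[j] ^= xs[i]
    xs[i] ^= xs[j]

xs = [5, 7]
xor_swap(xs, 0, 1)  # distinct slots: works, xs becomes [7, 5]
ys = [5, 7]
xor_swap(ys, 0, 0)  # aliased slots: value destroyed, ys[0] becomes 0
```

The same failure occurs in C++ whenever the two references point at the same object (e.g. `swap(v[i], v[i])` inside a sorting routine).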