
Steven Li's Zone

A fighting panda. Learning makes me happy.

  • Blog (9)

[Original] Study Note: RoofLine Model

Some background knowledge: here is the connection between latency, throughput and concurrency [1]. The factors that influence runtime and performance are latency and throughput.

2016-02-29 16:00:47 2032

[Original] Study Note: Schedule Optimisation and math_intrinsic in CUDA Programming

Let us introduce a new term first [1]: occupancy. It is the ratio of active warps to the maximum number of warps (32). It depends on three parameters: 1) threads/block (set in <<<...>>>) 2) registers/th

2016-02-29 09:41:27 650

[Original] Study Note: Instruction Optimisation of CUDA programming

Consideration 1: Branch Divergence. Before we talk about this, let us go through what actually goes on in the GPU. Here is an abstract model of an SM [1]: Every SM has one con

2016-02-28 22:55:24 938

[Original] Study Note: Shared Memory Optimisation -- avoiding bank conflicts

This article is based on compute capability 2.x devices. Typically, shared memory is 16 KB in total, and it has 32 banks on 2.x devices. A bank is a unit of parallel read

2016-02-28 13:11:37 1328

[Original] Study Note: Global memory optimisation of CUDA programming

Global memory coalescing: Global memory on the GPU is stored in row-major order because there are no two-dimensional arrays on the GPU. Take a matrix as an example [1]: Knowledge of

2016-02-27 23:49:19 745

[Repost] Self summary: Ruby (RVM, gem, bundle)

Setting up the development environment: https://ruby-china.org/wiki/install_ruby_guide Evolution of bundler: Nowadays, a Ruby installation includes the gem command for getting the oth

2016-02-16 20:09:30 892

[Original] Self Summary: Basic concepts of GPU

Some basic concepts of GPU programming: Here is an overview of a GPU (Fermi architecture) [1]: It is a 16-way many-core (16 SM) GPU. Each of the 16 ways has an architecture like t

2016-02-09 22:10:22 651

[Repost] Locale in Linux

The following content is from http://www.blog.chinaunix.net/uid-641896-id-338729.html: Linux uses the locale to set the language environment for running programs. Locale is supported by ANSI

2016-02-06 22:50:39 382

[Repost] C/C++: Inline function, calloc vs malloc

An inline function is like a macro definition. When it is called in another function, control is not transferred to it. The compiler will just replace the line of inline functio

2016-02-02 18:52:20 513
