CUDA Toolkit 4.1

iteye_702

Available For CUDA Registered Developers (Nov 2011)

The first CUDA 4.1 release candidate (RC1) is now available to CUDA Registered Developers.

This release includes a new LLVM-based CUDA compiler, 1000+ new image processing functions, and a redesigned Visual Profiler with automated performance analysis and integrated expert guidance.

We’re looking forward to hearing about your experience with this release (good and bad) through the Registered Developer feedback form. Download it from your CUDA Registered Developer account, or sign up for a free account.

Join us for the CUDA 4.1 Feature Overview Webinar on Tuesday, Nov. 22 at 10 a.m. PST. Sign up today.

Release Highlights

Try The New Compiler!

New LLVM-based compiler delivers up to 10% faster performance for many applications

New & Improved “Drop-In” Acceleration With GPU-Accelerated Libraries

Over 1000 new image processing functions in the NPP library
New cuSPARSE tri-diagonal solver up to 10x faster than MKL on a 6-core CPU
New support in cuRAND for MRG32k3a and Mersenne Twister (MTGP11213) RNG algorithms
Bessel functions now supported in the CUDA standard Math library
Up to 2x faster sparse matrix-vector multiply using the ELL hybrid format
Learn more about all the great GPU-Accelerated Libraries
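
To make the ELL hybrid format mentioned in the list above more concrete, here is a minimal CPU-side sketch in plain Python. It assumes the usual description of a hybrid (HYB) layout: each row stores up to a fixed number of nonzeros in a padded ELL array, and any remaining entries overflow into a COO list of (row, column, value) triples. This only illustrates the storage idea and is not the cuSPARSE API; the names `to_hyb`, `hyb_spmv`, and `ell_width` are hypothetical and chosen for this example.

```python
def to_hyb(dense, ell_width):
    """Split a dense matrix (list of rows) into ELL and COO parts.

    Hypothetical helper for illustration only; not a cuSPARSE function.
    """
    ell_cols, ell_vals, coo = [], [], []
    for i, row in enumerate(dense):
        nonzeros = [(j, v) for j, v in enumerate(row) if v != 0]
        head, tail = nonzeros[:ell_width], nonzeros[ell_width:]
        pad = ell_width - len(head)
        # Pad short rows so every ELL row has exactly ell_width slots.
        ell_cols.append([j for j, _ in head] + [-1] * pad)
        ell_vals.append([v for _, v in head] + [0.0] * pad)
        # Entries beyond ell_width overflow into COO triples.
        coo.extend((i, j, v) for j, v in tail)
    return ell_cols, ell_vals, coo


def hyb_spmv(ell_cols, ell_vals, coo, x):
    """Compute y = A * x from the ELL and COO parts."""
    y = [0.0] * len(ell_cols)
    # ELL part: every row has the same number of slots.
    for i, (cols, vals) in enumerate(zip(ell_cols, ell_vals)):
        for j, v in zip(cols, vals):
            if j >= 0:  # a column index of -1 marks a padding slot
                y[i] += v * x[j]
    # COO part: the few entries that did not fit in the ELL slots.
    for i, j, v in coo:
        y[i] += v * x[j]
    return y


A = [
    [4.0, 0.0, 0.0, 1.0],
    [0.0, 3.0, 0.0, 0.0],
    [1.0, 2.0, 5.0, 6.0],  # four nonzeros: two overflow into COO
]
x = [1.0, 2.0, 3.0, 4.0]
ell_cols, ell_vals, coo = to_hyb(A, ell_width=2)
y = hyb_spmv(ell_cols, ell_vals, coo, x)
print(y)
```

The design trade-off the sketch tries to show: a uniform row width keeps the ELL part regular, which tends to suit GPU memory access, while the COO overflow avoids padding every row out to the length of the longest one.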

Enhanced & Redesigned Developer Tools

Redesigned Visual Profiler with automated performance analysis and expert guidance
CUDA-GDB support for debugging MPI applications, multi-context debugging, and assert() in device code
CUDA-MEMCHECK now detects out-of-bounds accesses to memory allocated in device code
Parallel Nsight 2.1 CUDA warp watch visualizes variables and expressions across an entire CUDA warp
Parallel Nsight 2.1 CUDA profiler now analyzes kernel memory activities, execution stalls and instruction throughput
Learn more about debugging and performance analysis tools for GPU developers on our CUDA Tools and Ecosystem Summary Page

Advanced Programming Features

Access to 3D surfaces and cube maps from device code
Enhanced no-copy pinning of system memory: cudaHostRegister() alignment and size restrictions removed
Peer-to-peer communication between processes
Support for resetting a GPU without rebooting the system in nvidia-smi

New & Improved SDK Code Samples

simpleP2P sample now supports peer-to-peer communication with any Fermi GPU
New grabcutNPP sample demonstrates interactive foreground extraction using iterated graph cuts
New samples showing how to implement the Horn-Schunck method for optical flow, perform volume filtering, and read cube map textures

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Previously, P2P was supported only on Tesla-class cards; as of 4.1, it is supported on all Fermi GPUs.
