CUDA Toolkit 4.1

Available For CUDA Registered Developers (Nov 2011)

The first CUDA 4.1 release candidate (RC1) is now available to CUDA Registered Developers.

This release includes a new LLVM-based CUDA compiler, 1000+ new image processing functions, and a redesigned Visual Profiler with automated performance analysis and integrated expert guidance.

We look forward to hearing about your experience with this release, good and bad, through the Registered Developer feedback form. Download the release from your CUDA Registered Developer account, or sign up for a free account.

Join us for the CUDA 4.1 Feature Overview Webinar on Tuesday, Nov. 22 at 10 a.m. PST. Sign up today.

Release Highlights

Try The New Compiler!

  • New LLVM-based compiler delivers up to 10% faster performance for many applications

New & Improved “Drop-In” Acceleration With GPU-Accelerated Libraries

  • Over 1000 new image processing functions in the NPP library
  • New cuSPARSE tri-diagonal solver, up to 10x faster than MKL on a 6-core CPU
  • New support in cuRAND for MRG32k3a and Mersenne Twister (MTGP11213) RNG algorithms
  • Bessel functions now supported in the CUDA standard Math library
  • Up to 2x faster sparse matrix-vector multiplication using the ELL hybrid format
  • Learn more about all the great GPU-Accelerated Libraries
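Two of the library additions listed above can be tried together in a few lines. The following is a minimal sketch, not an official sample: it selects the MRG32k3a generator through the cuRAND host API, then applies the single-precision Bessel function `j0f` to each value on the GPU. The helper name `mrg32k3aBesselDemo`, the kernel name, and the range check are assumptions made for illustration, and `n` is assumed to be positive.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <curand.h>

// Replace each element with the order-0 Bessel function of the first kind.
__global__ void besselJ0InPlace(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] = j0f(x[i]);
    }
}

// Hypothetical helper (not part of any NVIDIA sample): fills a device buffer
// with MRG32k3a uniforms, applies j0f on the GPU, and checks the results on
// the host. Returns true only if every call succeeded and every value is in
// the expected range. Assumes n > 0.
bool mrg32k3aBesselDemo(int n, unsigned long long seed)
{
    float *d_x = NULL;
    if (cudaMalloc((void **)&d_x, n * sizeof(float)) != cudaSuccess) {
        return false;
    }

    // Select the MRG32k3a generator through the cuRAND host API.
    curandGenerator_t gen;
    bool ok = (curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_MRG32K3A)
               == CURAND_STATUS_SUCCESS);
    if (ok) {
        ok = (curandSetPseudoRandomGeneratorSeed(gen, seed)
              == CURAND_STATUS_SUCCESS)
          && (curandGenerateUniform(gen, d_x, n) == CURAND_STATUS_SUCCESS);
        curandDestroyGenerator(gen);
    }

    std::vector<float> h_x(n);
    if (ok) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        besselJ0InPlace<<<blocks, threads>>>(d_x, n);
        // The blocking copy waits for the kernel to finish.
        ok = (cudaMemcpy(&h_x[0], d_x, n * sizeof(float),
                         cudaMemcpyDeviceToHost) == cudaSuccess);
    }
    cudaFree(d_x);

    // The uniforms lie in (0, 1], where j0 decreases from 1 to about 0.765.
    for (int i = 0; ok && i < n; ++i) {
        ok = (h_x[i] > 0.76f) && (h_x[i] < 1.0001f);
    }
    return ok;
}
```

Calling `mrg32k3aBesselDemo(1024, 1234ULL)` from a host program should return true on a working installation. The program needs to be linked against cuRAND (for example with `-lcurand`).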

Enhanced & Redesigned Developer Tools

  • Redesigned Visual Profiler with automated performance analysis and expert guidance
  • CUDA-GDB support for debugging MPI applications, multi-context debugging, and assert() in device code
  • CUDA-MEMCHECK now detects out-of-bounds accesses to memory allocated in device code
  • Parallel Nsight 2.1 CUDA warp watch visualizes variables and expressions across an entire CUDA warp
  • Parallel Nsight 2.1 CUDA profiler now analyzes kernel memory activities, execution stalls and instruction throughput
  • Learn more about debugging and performance analysis tools for GPU developers on our CUDA Tools and Ecosystem Summary Page
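The `assert()` support in device code mentioned above can be illustrated with a small kernel. This is a sketch under stated assumptions: the kernel and wrapper names are hypothetical, the input is assumed to be non-empty, and a GPU of compute capability 2.0 or higher is assumed.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Each thread asserts that its element is non-negative.
// Device-side assert() requires compute capability 2.0 or higher.
__global__ void checkNonNegative(const int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        assert(data[i] >= 0);
    }
}

// Hypothetical wrapper: copies the input to the GPU, runs the check, and
// returns the status of the synchronization that follows the kernel.
cudaError_t runNonNegativeCheck(const int *h_data, int n)
{
    int *d_data = NULL;
    cudaError_t err = cudaMalloc((void **)&d_data, n * sizeof(int));
    if (err != cudaSuccess) {
        return err;
    }

    err = cudaMemcpy(d_data, h_data, n * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        checkNonNegative<<<blocks, threads>>>(d_data, n);
        // A failed device assert is reported here as cudaErrorAssert.
        err = cudaDeviceSynchronize();
    }
    cudaFree(d_data);
    return err;
}
```

With all-valid input the wrapper returns `cudaSuccess`. If any element is negative, the assertion message is printed to stderr and the synchronization returns `cudaErrorAssert`; the device context is then generally unusable until it is reset, so this is a debugging aid and not a way to validate input in production code.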

Advanced Programming Features

  • Access to 3D surfaces and cube maps from device code
  • Enhanced no-copy pinning of system memory: cudaHostRegister() alignment and size restrictions removed
  • Peer-to-peer communication between processes
  • Support for resetting a GPU without rebooting the system in nvidia-smi
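To illustrate the relaxed `cudaHostRegister()` restrictions, the sketch below pins a range of an ordinary `malloc()` buffer that starts at an odd offset and has an arbitrary length, then round-trips it through device memory. The helper name `pinnedRoundTrip` and the fill pattern are assumptions for illustration; `bytes` is assumed to be positive.

```cuda
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: pins part of a malloc() buffer in place, copies it to
// the GPU and back, and returns true if the data survived the round trip.
bool pinnedRoundTrip(size_t bytes)
{
    // One extra byte so the registered range can start at an odd offset.
    unsigned char *raw = (unsigned char *)malloc(bytes + 1);
    unsigned char *out = (unsigned char *)malloc(bytes);
    if (!raw || !out) {
        free(raw);
        free(out);
        return false;
    }
    unsigned char *src = raw + 1;  // deliberately not page-aligned
    for (size_t i = 0; i < bytes; ++i) {
        src[i] = (unsigned char)(i % 251);
    }

    void *d_buf = NULL;
    bool ok = (cudaMalloc(&d_buf, bytes) == cudaSuccess);

    // Pin the existing allocation without copying it.
    bool registered = false;
    if (ok) {
        registered = (cudaHostRegister(src, bytes, cudaHostRegisterDefault)
                      == cudaSuccess);
        ok = registered;
    }
    if (ok) {
        // Both copies use the default stream, so they run in order.
        ok = (cudaMemcpyAsync(d_buf, src, bytes,
                              cudaMemcpyHostToDevice, 0) == cudaSuccess)
          && (cudaMemcpy(out, d_buf, bytes,
                         cudaMemcpyDeviceToHost) == cudaSuccess);
    }
    if (registered) {
        cudaHostUnregister(src);
    }
    cudaFree(d_buf);

    ok = ok && (memcmp(src, out, bytes) == 0);
    free(raw);
    free(out);
    return ok;
}
```

A call such as `pinnedRoundTrip(1000003)` uses a length that is not a multiple of the page size, which is the case the release note says is now permitted.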

New & Improved SDK Code Samples

  • simpleP2P sample now supports peer-to-peer communication with any Fermi GPU
  • New grabcutNPP sample demonstrates interactive foreground extraction using iterated graph cuts
  • New samples showing how to implement the Horn-Schunck method for optical flow, perform volume filtering, and read cube map textures

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Previously, P2P was supported only on Tesla-class cards; with 4.1 it is supported on all Fermi GPUs.
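Whether peer-to-peer access is actually available between the GPUs in a given machine can be queried at run time. The following is a minimal sketch using the runtime API; the helper name `countPeerAccessPairs` is an assumption for illustration and is not taken from the simpleP2P sample.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: counts ordered device pairs (a, b) for which device a
// can directly access memory on device b. Returns -1 if a query fails.
int countPeerAccessPairs()
{
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess) {
        return -1;
    }

    int pairs = 0;
    for (int a = 0; a < deviceCount; ++a) {
        for (int b = 0; b < deviceCount; ++b) {
            if (a == b) {
                continue;
            }
            int canAccess = 0;
            if (cudaDeviceCanAccessPeer(&canAccess, a, b) != cudaSuccess) {
                return -1;
            }
            if (canAccess) {
                printf("GPU %d can access memory on GPU %d\n", a, b);
                ++pairs;
            }
        }
    }
    return pairs;
}
```

On a single-GPU system the function returns 0. For a pair that reports access, the usual next step is to call `cudaSetDevice(a)` followed by `cudaDeviceEnablePeerAccess(b, 0)`, after which `cudaMemcpyPeer()` can move data directly between the two devices.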

