Using CUDA and Thrust with Visual Studio 2010

Sunday, March 6, 2011 – 11:16 am, by Ade Miller

Using CUDA 4.0 RC2? Read the update post here.

I was working on setting up some new CUDA projects as I’m doing some spiking (prototyping, for the not so agile crowd) to figure out how best to use CUDA 4.0. I’ve turned that work into a quick tutorial on how to write a simple application that uses both CUDA and the latest C++0x features in Visual Studio 2010.

Because the current CUDA SDK requires projects to compile using the v90 toolset (Visual Studio 2008), the solution requires two projects: a DLL project that contains the CUDA code and targets v90, and an application project that contains the C++ code and targets v100 (VS 2010).


Installing dependencies

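Make sure you have Visual Studio 2010, Visual Studio 2008 (CUDA needs its v90 toolset) and the CUDA SDK installed. If you don’t have CUDA 4.0 you will also need the Thrust library, which 4.0 includes. I built this walkthrough using the 4.0 RC but it should work with 3.2.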

Setting up the solution

Create a solution containing two projects. Two projects are required because one targets the V100 (VS 2010) compiler to allow access to the latest C++0x language features and one targets the V90 (VS 2008) compiler because this is required by CUDA.

1) Create a Win32 console application called HelloWorld. Select the defaults for the remaining pages in the wizard. This project will contain the main entry point to your application and any Windows specific code, like the Parallel Patterns Library (PPL) code used for managing threads.

1_HelloWorld

2) Create a second Win32 project called HelloWorldCuda. This is the DLL that will contain your CUDA code. In the application settings screen select DLL for the application type and check the empty project box.

2_HelloWorldCuda

2a_HelloWorldCuda

Configure the CUDA project

There are a number of settings that need to be configured on the HelloWorldCuda project.

3) Configure the HelloWorldCuda project.

3.1) Select the Project | Build Customizations… menu item. In the dialog select the CUDA 4.0 item. This adds support for CUDA C/C++ files but there needs to be a .CU file in the project before the build settings appear in the project properties. If you don’t have CUDA 4.0 then use the 3.2 rules.

3_1_HelloWorldCuda

3.2) Add two new items to the project: a C++ file (.cpp) and a header file (.h) called Hello.cpp and Hello.h. Then rename the .cpp file to Hello.cu. Your solution should look like this:

3_2_HelloWorldCuda

3.3) Select the Hello.cu file and open its property pages. In the General tab change the Item Type to “CUDA C/C++”.

3_3_HelloWorldCuda

3.4) Select the project and open its properties (ALT-Enter). In the General tab set the Platform Toolset field to v90. If you are not able to do this then you probably don’t have VS 2008 installed, which CUDA requires.

3_4_HelloWorldCuda

3.5) Open the Linker | General properties page and add “$(CUDA_PATH_V4_0)\lib\$(Platform);” to the Additional Library Directories field.

3_5_HelloWorldCuda

Note that the CUDA C/C++ properties tab is now visible.

3.6) Open the Linker | Input properties page and add “cudart.lib;” to the Additional Dependencies field.

3_6_HelloWorldCuda

3.7) Make sure that your projects will always build in the correct order. Right click on the HelloWorld project and select Project Dependencies. Check the box next to HelloWorldCuda. This will force the HelloWorldCuda project to build before HelloWorld.

4) Build the solution. At this point the solution should build without any warnings or errors. It doesn’t do anything yet but all the pieces are in place.

Adding some CUDA/Thrust code

Now it’s time to add some code. We need to write some CUDA code in the HelloWorldCuda DLL and export it so that the HelloWorld application can execute it.

5) Configure the HelloWorld project. It needs to link against the HelloWorldCuda library and also have access to the appropriate header files.

5.1) Open the Linker | General properties page and add “..\$(Configuration);$(CUDA_PATH_V4_0)\lib\$(Platform);” to the Additional Library Directories field.

5_1_HelloWorld

5.2) Open the Linker | Input properties page and add “cudart.lib;HelloWorldCuda.lib;” to the Additional Dependencies field.

5_2_HelloWorld

5.3) Open the C/C++ | General properties page and add “..\HelloWorldCuda\; $(CUDA_PATH_V4_0)\Include;” to the Additional Include Directories field.

5_3_HelloWorld

5.4) Open the Project | Project Dependencies menu item and check the HelloWorldCuda box to make the CUDA project a dependency of the main Win32 application project.

5_4_HelloWorld

6) Now it’s time to write some code. CUDA 4.0 now comes with Thrust so we’re going to use Thrust in our example. If you’re not using 4.0 then you need to download the latest Thrust library (link below) and copy it into a Thrust folder inside the CUDA SDK include folder %CUDA_PATH%\include\thrust.

This is a Hello World application so the code is very simple. It’s a variation of the first example on the Thrust project homepage.

Add the following class declaration to Hello.h. Most of the code is there to fix up compilation warnings. All it really does is declare a class that is constructed from a host_vector<unsigned long> and has some methods that execute CUDA code and return results.

    #pragma once
    #pragma warning(push)
    #pragma warning(disable: 4996)      // Thrust's use of strerror
    #pragma warning(disable: 4251)      // STL class exports
    #include "thrust/host_vector.h"
    #include "thrust/device_vector.h"
    #pragma warning(pop)

    // See: http://support.microsoft.com/default.aspx?scid=KB;EN-US;168958
    //      http://msdn.microsoft.com/en-us/library/esew7y1w.aspx
    //      http://www.unknownroad.com/rtfm/VisualStudio/warningC4251.html

    // NVCC defines __CUDACC__, so the DLL project exports these symbols
    // while the application project, built by the VS 2010 compiler, imports them.
    #if defined(__CUDACC__)
    #   define DECLSPECIFIER  __declspec(dllexport)
    #   define EXPIMP_TEMPLATE
    #else
    #   define DECLSPECIFIER  __declspec(dllimport)
    #   define EXPIMP_TEMPLATE extern
    #endif

    // Explicitly instantiate and export the Thrust templates that Hello uses.
    #pragma warning(push)
    #pragma warning(disable: 4231)      // "extern" before explicit instantiation
    EXPIMP_TEMPLATE template class
        DECLSPECIFIER thrust::device_vector<unsigned long>;
    EXPIMP_TEMPLATE template class
        DECLSPECIFIER thrust::detail::vector_base<unsigned long,
            thrust::device_malloc_allocator<unsigned long> >;
    #pragma warning(pop)

    class DECLSPECIFIER Hello
    {
    private:
        thrust::device_vector<unsigned long> m_device_data;   // lives in GPU memory

    public:
        Hello(const thrust::host_vector<unsigned long>& data);
        unsigned long Sum();
        unsigned long Max();
    };
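
The export/import switch in Hello.h is easier to see in isolation. Here is a minimal sketch of the same pattern with the Thrust parts removed. The names SKETCH_BUILDING_DLL, SKETCH_API and Counter are made up for illustration; in Hello.h the role of SKETCH_BUILDING_DLL is played by __CUDACC__, which NVCC defines. On non-Windows compilers the macro expands to nothing, so the sketch should compile anywhere.

```cpp
// Hypothetical stand-in for "this translation unit is part of the DLL".
// Hello.h tests __CUDACC__ instead, because NVCC compiles the DLL's code.
#define SKETCH_BUILDING_DLL 1

#if defined(_WIN32)
#   if defined(SKETCH_BUILDING_DLL)
#       define SKETCH_API __declspec(dllexport)   // DLL side: publish the symbols
#   else
#       define SKETCH_API __declspec(dllimport)   // client side: consume them
#   endif
#else
#   define SKETCH_API                             // no decoration needed elsewhere
#endif

// One declaration serves both projects; only the macro expansion differs.
class SKETCH_API Counter
{
public:
    Counter() : m_count(0) {}
    void Add(unsigned long n) { m_count += n; }
    unsigned long Total() const { return m_count; }

private:
    unsigned long m_count;
};
```

A client project would include the same header without SKETCH_BUILDING_DLL defined, which is why a single Hello.h can be shared between HelloWorldCuda and HelloWorld.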

Hello.cu defines the constructor and the Sum and Max methods. The constructor copies the data onto the device, while the Sum and Max methods call Thrust algorithms to carry out calculations on the GPU.

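    #include "Hello.h"
    #include <thrust/reduce.h>
    #include <thrust/extrema.h>
    #include <thrust/functional.h>

    // Assigning a host_vector to a device_vector copies the data to the GPU.
    Hello::Hello(const thrust::host_vector<unsigned long>& data)
    {
        m_device_data = data;
    }

    // The initial value is 0UL so that the reduction accumulates in
    // unsigned long; a plain 0 would make the result an int.
    unsigned long Hello::Sum()
    {
        return thrust::reduce(m_device_data.cbegin(), m_device_data.cend(),
            0UL, thrust::plus<unsigned long>());
    }

    // max_element returns an iterator; dereferencing it reads the value
    // back from the device.
    unsigned long Hello::Max()
    {
        return *thrust::max_element(m_device_data.cbegin(), m_device_data.cend(),
            thrust::less<unsigned long>());
    }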
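
Sum and Max are thin wrappers, and the same calculations can be sketched on the host with the standard library. One detail carries over from std::accumulate to thrust::reduce: the result has the type of the initial value, not of the elements, so an unsigned long sum needs an unsigned long initial value such as 0UL. The function names host_sum and host_max below are illustrative and not part of the sample.

```cpp
#include <algorithm>
#include <numeric>
#include <type_traits>
#include <vector>

typedef std::vector<unsigned long>::const_iterator Iter;

// The accumulator type follows the initial value, not the element type.
static_assert(std::is_same<decltype(std::accumulate(Iter(), Iter(), 0)),
                           int>::value,
              "an int initial value yields an int result");
static_assert(std::is_same<decltype(std::accumulate(Iter(), Iter(), 0UL)),
                           unsigned long>::value,
              "an unsigned long initial value yields an unsigned long result");

// Host-side counterpart of Hello::Sum (hypothetical helper).
unsigned long host_sum(const std::vector<unsigned long>& data)
{
    return std::accumulate(data.begin(), data.end(), 0UL);
}

// Host-side counterpart of Hello::Max (hypothetical helper).
// Precondition: data is not empty, because end() must not be dereferenced.
unsigned long host_max(const std::vector<unsigned long>& data)
{
    return *std::max_element(data.begin(), data.end());
}
```

Helpers like these give a reference answer to compare the device results against, which is the same check the application performs with Thrust on a host_vector.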

Finally HelloWorld.cpp contains the application’s entry point and executes the CUDA/Thrust code. It also calculates the answers on the host’s CPU so that you can check for correctness.

    #include "stdafx.h"
    #include <cstdlib>                  // rand
    #include <iostream>
    #include <algorithm>
    #include <vector>
    #include "Hello.h"
    #include <thrust/generate.h>
    #include <thrust/reduce.h>
    #include <thrust/extrema.h>
    #include <thrust/functional.h>

    using namespace ::std;

    int _tmain(int argc, _TCHAR* argv[])
    {
        cout << "Generating data..." << endl;
        thrust::host_vector<unsigned long> host_data(100000);
        thrust::generate(host_data.begin(), host_data.end(), rand);
        cout << "generated " << host_data.size() << " numbers" << endl;

        // Calculate the reference answers on the CPU.
        cout << endl << "Running host code..." << endl;
        unsigned long host_result = thrust::reduce(host_data.cbegin(), host_data.cend(),
                                                   0UL, thrust::plus<unsigned long>());
        cout << "The sum is: " << host_result << endl;
        host_result = *thrust::max_element(host_data.cbegin(), host_data.cend(),
                                           thrust::less<unsigned long>());

        cout << "The max is: " << host_result << endl;

        cout << endl << "Copying data to device..." << endl;
        Hello hello(host_data);

        // Calculate the same answers on the GPU.
        cout << endl << "Running CUDA device code..." << endl;
        unsigned long device_result = hello.Sum();
        cout << "The sum is: " << device_result << endl;

        cout << endl << "Running CUDA device code..." << endl;
        device_result = hello.Max();
        cout << "The max is: " << device_result << endl;

        return 0;
    }

Run the application and you should see the following output:

6_Console

You may see lots of warnings of the form “Cannot tell what pointer points to…” (see Resolving Thrust/CUDA warnings). This appears to be a known issue. The warnings only appear when the NVCC compiler’s -G0 flag is set and/or the project is compiling against arch sm_10.

Making use of the Parallel Patterns Library and C++ lambdas

So now we have a Win32 application that runs CUDA code using the Thrust template library. So far, though, nothing has needed the v100 toolset; we could have done this with a single project that targeted v90. To make use of the second project, update the HelloWorld.cpp file to use the PPL’s parallel_invoke algorithm and C++ lambdas to run the host and device code in parallel.

    #include "stdafx.h"
    #include <cstdlib>                  // rand
    #include <iostream>
    #include <algorithm>
    #include <vector>
    #include "ppl.h"

    #include "Hello.h"
    #include <thrust/generate.h>
    #include <thrust/reduce.h>
    #include <thrust/extrema.h>
    #include <thrust/functional.h>

    using namespace ::std;
    using namespace ::Concurrency;

    int _tmain(int argc, _TCHAR* argv[])
    {
        cout << "Generating data..." << endl;
        thrust::host_vector<unsigned long> host_data(100000);
        thrust::generate(host_data.begin(), host_data.end(), rand);
        cout << "generated " << host_data.size() << " numbers" << endl;

        // parallel_invoke returns only after both lambdas have finished, so
        // capturing host_data by reference is safe and avoids copying it twice.
        parallel_invoke(
            [&host_data]()
            {
                cout << endl << "Running host code..." << endl;
                unsigned long host_result = thrust::reduce(host_data.cbegin(),
                    host_data.cend(), 0UL, thrust::plus<unsigned long>());
                cout << "The sum is: " << host_result << endl;
                host_result = *thrust::max_element(host_data.cbegin(),
                    host_data.cend(), thrust::less<unsigned long>());
                cout << "The max is: " << host_result << endl;
            },
            [&host_data]()
            {
                cout << endl << "Copying data to device..." << endl;
                Hello hello(host_data);

                cout << endl << "Running CUDA device code..." << endl;
                unsigned long device_result = hello.Sum();
                cout << "The sum is: " << device_result << endl;

                cout << endl << "Running CUDA device code..." << endl;
                device_result = hello.Max();
                cout << "The max is: " << device_result << endl;
            }
        );
        return 0;
    }

Notice how the output ordering has changed. The call to parallel_invoke takes two lambda expressions, and the code they contain now runs in parallel, so the host and device messages are interleaved.

7_Console
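
parallel_invoke is specific to Microsoft’s Concurrency Runtime. As a rough, portable illustration of what it does with two lambdas, the sketch below starts one on another thread with std::async, runs the other on the calling thread, and waits for both. This is a stand-in rather than how the PPL is implemented, and the names invoke_both and sum_and_max are invented for this example. It assumes a C++11 standard library with <future>, which as far as I know Visual Studio 2010 does not ship, so treat it as a sketch for newer compilers.

```cpp
#include <algorithm>
#include <future>
#include <numeric>
#include <utility>
#include <vector>

// Run `first` on another thread and `second` on the calling thread,
// returning only when both have finished.
template <typename F1, typename F2>
void invoke_both(F1 first, F2 second)
{
    std::future<void> pending = std::async(std::launch::async, first);
    second();
    pending.get();   // wait for `first` and rethrow any exception it raised
}

// Compute the sum and the maximum of the same data concurrently.
// Precondition: data is not empty.
std::pair<unsigned long, unsigned long>
sum_and_max(const std::vector<unsigned long>& data)
{
    unsigned long sum = 0;
    unsigned long max = 0;

    // Capturing by reference is safe here because invoke_both blocks until
    // both lambdas are done, and each lambda writes a different variable.
    invoke_both(
        [&data, &sum]() { sum = std::accumulate(data.begin(), data.end(), 0UL); },
        [&data, &max]() { max = *std::max_element(data.begin(), data.end()); });

    return std::make_pair(sum, max);
}
```

Each lambda writes only its own result variable, so no locking is needed. The sample’s lambdas both write to cout, which is why their messages can interleave.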

The complete code for this sample is available here.

Other resources

How to create/upgrade a CUDA project in VS2008 and VS2010 to work with Nsight 1.5 and CUDA 3.2 (NVIDIA forum post)

Thrust (Project homepage on Google Code)

The Parallel Patterns Library

Lambda expressions in C++ Visual Studio 2010

Source: http://www.ademiller.com/blogs/tech/2011/03/using-cuda-and-thrust-with-visual-studio-2010/