CUDA Programming

Cancelled Posted May 3, 2014 Paid on delivery

HW4A: knapAlgo [30 points]

Start from the sequential code that you wrote for HW3, and implement the fill_table function in CUDA (you will essentially be using the code that you wrote for HW0, which performs a sequence of kernel calls, one for each row of the table). Make sure that your program now produces correct answers for large values of the capacity. Then modify the program so that beyond a certain depth, the fill_table function is executed on the host rather than on the GPU, and determine the depth at which the cost of transferring data to and from the GPU outweighs the parallelism gains.
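As a rough illustration of Part A, the sketch below fills the table one row per kernel launch and keeps only two rows on the device. It assumes a 0/1 knapsack with integer weights and values; the names fill_row and knapsack_best and the use_gpu flag are hypothetical and not part of the assignment. The caller is assumed to decide, for example from the current depth, whether a given call goes to the GPU or stays on the host. Error checking of the CUDA calls is omitted for brevity.

```cuda
#include <algorithm>
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// One launch fills one table row: next[c] is the best value for capacity c
// after the current item (weight, value) has been considered.
__global__ void fill_row(const int *prev, int *next, int weight, int value, int cap)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c > cap) return;
    int best = prev[c];                       // skip the item
    if (c >= weight)                          // or take it, if it fits
        best = max(best, prev[c - weight] + value);
    next[c] = best;
}

// Hypothetical wrapper: returns the last table entry (best total value).
// use_gpu selects the device path; otherwise the same recurrence runs on the host.
int knapsack_best(const std::vector<int> &w, const std::vector<int> &v,
                  int cap, bool use_gpu)
{
    int n = static_cast<int>(w.size());
    std::vector<int> row(cap + 1, 0);         // row for "no items considered"

    if (!use_gpu) {
        // Host path: in-place update, capacities in decreasing order.
        for (int i = 0; i < n; ++i)
            for (int c = cap; c >= w[i]; --c)
                row[c] = std::max(row[c], row[c - w[i]] + v[i]);
        return row[cap];
    }

    size_t bytes = (static_cast<size_t>(cap) + 1) * sizeof(int);
    int *d_prev = nullptr, *d_next = nullptr;
    cudaMalloc(&d_prev, bytes);
    cudaMalloc(&d_next, bytes);
    cudaMemset(d_prev, 0, bytes);

    int threads = 256;
    int blocks = (cap + threads) / threads;   // ceil((cap + 1) / threads)
    for (int i = 0; i < n; ++i) {
        fill_row<<<blocks, threads>>>(d_prev, d_next, w[i], v[i], cap);
        std::swap(d_prev, d_next);            // the new row becomes the previous row
    }

    // The blocking copy also waits for the queued kernels to finish.
    cudaMemcpy(row.data(), d_prev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_prev);
    cudaFree(d_next);
    return row[cap];
}
```

Calling knapsack_best with use_gpu set to true and then false on the same input gives a simple correctness check, and timing both paths for different problem sizes is one way to look for the crossover point the assignment asks about.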

HW4B: knapAlgoOptSmallN (Making the CUDA compute bound) [25 points]

Using the techniques described in the lecture, modify the CUDA code so that the fill_table function is evaluated in a single kernel call. We suggest that you proceed in the following steps.

First, write a simple CUDA function that correctly synchronizes the thread blocks. Think of writing a toy program similar to Waruna's example for syncthreads (except that that example synchronized threads within a single threadblock, whereas this one must synchronize multiple threadblocks).

Next, extend this with the code that correctly updates (with only one thread per threadblock active) the section of the array that is allocated to this threadblock. Make sure that the correct values are written to and read from global memory between the synchronizations. For this to work, there must be enough global memory for each threadblock to store its entire "output" (i.e., N*WMAX or N*sigma(wi) memory per threadblock). Hence, in order to maximize the number of active threadblocks, this scheme will only work for relatively small values of N.

After ensuring that this code produces the correct answers, parallelize the computation within a threadblock. This part was not completely detailed in class, since there are a few different options that you could pursue.
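The first step above, synchronizing across threadblocks, might look like the toy sketch below. It is not the lecture's method: it uses a common hand-rolled barrier built on an atomic counter in global memory, and the names grid_barrier, neighbor_exchange, and run_barrier_demo are hypothetical. The barrier only works when every block of the grid is resident on the GPU at the same time, so the sketch assumes a small grid; with too many blocks it would deadlock. The counter is used for a single barrier here; reusing it would require raising the target by the number of blocks each time.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Barrier across all blocks of the grid. Assumes all blocks are co-resident.
// count must be zero before the launch; target is the number of arrivals awaited.
__device__ void grid_barrier(unsigned int *count, unsigned int target)
{
    __syncthreads();                          // every thread of this block has arrived
    if (threadIdx.x == 0) {
        __threadfence();                      // make this block's global writes visible
        atomicAdd(count, 1u);                 // announce arrival
        while (atomicAdd(count, 0u) < target) { /* spin until all blocks arrive */ }
    }
    __syncthreads();                          // release the rest of the block
}

// Toy test: each block publishes a value, then reads its neighbor's value.
// Without the barrier a block could read the neighbor's slot too early.
__global__ void neighbor_exchange(volatile int *data, int *out, unsigned int *count)
{
    int b = blockIdx.x;
    if (threadIdx.x == 0) data[b] = b + 1;    // phase 1: write own slot
    grid_barrier(count, gridDim.x);
    if (threadIdx.x == 0)                     // phase 2: read the next block's slot
        out[b] = data[(b + 1) % gridDim.x];
}

// Hypothetical host driver; keep blocks small so that all blocks are co-resident.
std::vector<int> run_barrier_demo(int blocks)
{
    int *d_data = nullptr, *d_out = nullptr;
    unsigned int *d_count = nullptr;
    cudaMalloc(&d_data, blocks * sizeof(int));
    cudaMalloc(&d_out, blocks * sizeof(int));
    cudaMalloc(&d_count, sizeof(unsigned int));
    cudaMemset(d_data, 0, blocks * sizeof(int));
    cudaMemset(d_count, 0, sizeof(unsigned int));

    neighbor_exchange<<<blocks, 32>>>(d_data, d_out, d_count);

    std::vector<int> out(blocks);
    cudaMemcpy(out.data(), d_out, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_out);
    cudaFree(d_count);
    return out;
}
```

With 4 blocks, block b should read the value b + 2 from its neighbor, except for the last block, which wraps around and reads 1. Newer CUDA toolkits also provide a grid-wide sync through cooperative groups, which may be a safer alternative where it is available.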

HW4C: knapAlgoOpt2 [25 points]

Now modify your program so that an arbitrary value of N can be handled. To do this, call the code of Part B repeatedly, in a sequence of kernel calls.

CUDA

Project ID: #5891041

About the project

3 proposals Remote project Active May 9, 2014

3 freelancers are bidding on average $229 for this job

cudabigdata

Hi, I'd like to finish this project for you.

$136 USD in 3 days
(8 Reviews)
4.0
prad08

Hi, I am a master's student in Embedded Systems and am doing my graduation thesis in OpenCL, the platform-independent counterpart of CUDA. While I have not actually worked on CUDA, OpenCL is conceptually the same and it m...

$225 USD in 5 days
(0 Reviews)
0.0
tulebaev

A proposal has not been submitted yet

$150 USD in 3 days
(1 Review)
0.0
patriczhao

I am very familiar with CUDA programming, and you can check out my resume on LinkedIn. Since I am not full-time on Freelancer, I estimate that I can deliver your code in 15 days. Contact me if you have any...

$311 USD in 15 days
(0 Reviews)
0.0