CUDA Programming
$30-250 USD
Paid on delivery
HW4A: knapAlgo [30 points]
Start from the sequential code that you wrote for HW3 and implement the fill_table function in CUDA. You will essentially be reusing the code that you wrote for HW0, which issues a sequence of kernel calls, one for each row of the table. Make sure that your program now produces correct answers for large values of the capacity. Then modify the program so that beyond a certain depth the fill_table function is executed on the host rather than on the GPU, and study the value of depth at which the data transfer to/from the GPU is no longer worth the parallelism gains.
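As a reference point for checking the CUDA output, the row-by-row structure can be sketched on the host. This is a minimal sketch and not the HW3 code: the name fill_table_rows, its signature, and the choice to keep only two rows are assumptions made for illustration, and weights are assumed to be non-negative. It shows why one kernel call per row is natural: every cell of a row reads only the previous row, so the inner loop over capacities has no dependences between its iterations.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference; the assignment's real fill_table may differ.
// Returns the final row: result[c] is the best profit achievable with capacity c.
std::vector<long long> fill_table_rows(const std::vector<int>& weight,
                                       const std::vector<int>& profit,
                                       std::size_t capacity) {
    assert(weight.size() == profit.size());
    // std::size_t indices and long long profits leave headroom for large capacities.
    std::vector<long long> prev(capacity + 1, 0), cur(capacity + 1, 0);
    for (std::size_t i = 0; i < weight.size(); ++i) {
        // One iteration of this outer loop corresponds to one kernel call.
        const std::size_t w = static_cast<std::size_t>(weight[i]);
        for (std::size_t c = 0; c <= capacity; ++c) {
            // Each c is independent: it reads prev[] only, never cur[].
            long long best = prev[c];                       // skip item i
            if (c >= w) best = std::max(best, prev[c - w] + profit[i]);  // take item i
            cur[c] = best;
        }
        std::swap(prev, cur);  // the row just written becomes the input row
    }
    return prev;
}
```

In a CUDA version the inner loop would typically become the thread index, with prev and cur as two device buffers whose roles are swapped between launches.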
HW4B: knapAlgoOptSmallN (Making the CUDA code compute bound) [25 points]
Using the techniques described in the lecture, modify the CUDA code so that the fill_table function is evaluated in a single kernel call. We suggest that you proceed in the following steps.
First write a simple CUDA function that accomplishes the correct synchronization between the thread blocks. Think of writing a toy program similar to Waruna's example for syncthreads (except that that one was for threads within a single threadblock, and this one is for multiple threadblocks).
Next embellish this with the code that correctly updates (with only one thread per threadblock active) the section of the array that is allocated to this threadblock. Make sure that the correct values are written to and read from global memory between the synchronizations. For this to work there should be enough global memory such that each threadblock can store its entire "output" (i.e., N*WMAX or N*sigma(wi) memory per threadblock). Hence, in order to maximize the number of active threadblocks, this scheme will only work for relatively small values of N.
After ensuring that this code produces the correct answers, parallelize the computation of a threadblock. This is the part that was not completely detailed in the class, since there are a few different options that you could pursue.
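The synchronization pattern in the steps above can be tried out on the host before moving it to the GPU. The sketch below is a host-thread analogue, not CUDA and not the scheme from the lecture: each std::thread stands in for one threadblock, and CounterBarrier and fill_table_blocks are hypothetical names. The barrier is a plain arrival counter plus a generation number. On a GPU the two counters would live in global memory, and the scheme only works if every participating threadblock is resident at the same time, since a waiting block otherwise spins forever.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Reusable counter barrier (illustrative; stands in for inter-block sync).
class CounterBarrier {
public:
    explicit CounterBarrier(int participants) : total_(participants) {}
    void wait() {
        const int gen = generation_.load();
        if (arrived_.fetch_add(1) + 1 == total_) {
            arrived_.store(0);         // reset before releasing the others
            generation_.fetch_add(1);  // releases everyone spinning on gen
        } else {
            while (generation_.load() == gen) std::this_thread::yield();
        }
    }
private:
    const int total_;
    std::atomic<int> arrived_{0};
    std::atomic<int> generation_{0};
};

// Each worker owns a contiguous slice of the capacity axis, as a threadblock would.
std::vector<long long> fill_table_blocks(const std::vector<int>& weight,
                                         const std::vector<int>& profit,
                                         std::size_t capacity, int blocks) {
    assert(weight.size() == profit.size() && blocks > 0);
    std::vector<long long> a(capacity + 1, 0), b(capacity + 1, 0);
    CounterBarrier barrier(blocks);
    auto worker = [&](int id) {
        const std::size_t cells = capacity + 1;
        const std::size_t lo = cells * static_cast<std::size_t>(id) / blocks;
        const std::size_t hi = cells * static_cast<std::size_t>(id + 1) / blocks;
        long long* prev = a.data();  // row i-1, shared and read-only this step
        long long* cur = b.data();   // row i, each worker writes only [lo, hi)
        for (std::size_t i = 0; i < weight.size(); ++i) {
            const std::size_t w = static_cast<std::size_t>(weight[i]);
            for (std::size_t c = lo; c < hi; ++c) {
                long long best = prev[c];
                if (c >= w) best = std::max(best, prev[c - w] + profit[i]);
                cur[c] = best;
            }
            barrier.wait();          // row i is complete everywhere past this point
            std::swap(prev, cur);
        }
    };
    std::vector<std::thread> pool;
    for (int id = 0; id < blocks; ++id) pool.emplace_back(worker, id);
    for (auto& t : pool) t.join();
    // Row i is written to b when i is even and to a when i is odd.
    return (weight.size() % 2 == 0) ? a : b;
}
```

With two alternating buffers, one barrier per row is enough: no worker can start writing a buffer until every worker has finished reading it in the previous step. A version that keeps the full N*WMAX output per threadblock, as the assignment describes, would index a new row each step instead of swapping.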
HW4C: knapAlgoOpt2 [25 points]
Now modify your program so that an arbitrary value of N can be handled. For this you will repeatedly (in a sequence of kernel calls) call the code of Part B.
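One way to picture the repeated calls is as a driver that feeds the table through a fixed-size step. The following is a minimal host-side sketch under stated assumptions: fill_chunk is a hypothetical stand-in for the Part B kernel, max_rows stands for however many rows Part B can handle in one call, and only the boundary row is carried from one call to the next. It is not the assignment's required design.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one Part B kernel call: advances `row` through
// items [first, last). On entry `row` is the table row before item `first`.
void fill_chunk(const std::vector<int>& weight, const std::vector<int>& profit,
                std::size_t first, std::size_t last, std::vector<long long>& row) {
    std::vector<long long> next(row.size());
    for (std::size_t i = first; i < last; ++i) {
        const std::size_t w = static_cast<std::size_t>(weight[i]);
        for (std::size_t c = 0; c < row.size(); ++c) {
            long long best = row[c];
            if (c >= w) best = std::max(best, row[c - w] + profit[i]);
            next[c] = best;
        }
        row.swap(next);
    }
}

// Driver: handles any number of items by calling fill_chunk on at most
// max_rows items at a time, one call per chunk.
std::vector<long long> fill_table_chunked(const std::vector<int>& weight,
                                          const std::vector<int>& profit,
                                          std::size_t capacity, std::size_t max_rows) {
    assert(weight.size() == profit.size() && max_rows > 0);
    std::vector<long long> row(capacity + 1, 0);
    for (std::size_t first = 0; first < weight.size(); first += max_rows) {
        const std::size_t last = std::min(weight.size(), first + max_rows);
        fill_chunk(weight, profit, first, last, row);  // one kernel call per chunk
    }
    return row;
}
```

The last row of one chunk is the only input the next chunk needs, so the per-call memory stays bounded by max_rows regardless of N.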
Project ID: #5891041
About the project
3 freelancers are bidding on average $229 for this job
Hi, I'd like to finish this project for you.
I am very familiar with CUDA programming, and you can check my resume on LinkedIn. Since I am not on Freelancer full time, I estimate I can deliver your code in 15 days. Contact me if you have any…