Since NVIDIA released CUDA, a parallel computing platform and application programming interface (API) model, in 2007, the learning curve for programming GPUs has dropped drastically. Combined with the performance benefits that GPUs promise for a wide variety of applications, this has made the new hardware/software stack very attractive to software developers and researchers who want faster turnaround on their problems.
This thesis implements the computationally expensive Maximum Likelihood Expectation Maximization (MLEM) algorithm for image reconstruction in Positron Emission Tomography (PET) on GPUs. The cuBLAS, cuSPARSE and NVML libraries provided by NVIDIA are used extensively to run the algorithm and to harness the full power of the GPUs.
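For reference, MLEM in its standard textbook form updates every voxel multiplicatively. The notation here is assumed rather than taken from the thesis: $a_{ij}$ is the system matrix entry linking line of response $i$ to voxel $j$, $g_i$ are the measured counts and $f_j^{(k)}$ is the image estimate at iteration $k$.

$$
f_j^{(k+1)} = \frac{f_j^{(k)}}{\sum_{i} a_{ij}} \sum_{i} a_{ij} \, \frac{g_i}{\sum_{l} a_{il} f_l^{(k)}}
$$

The inner sum $\sum_l a_{il} f_l^{(k)}$ is the forward projection $A f$ (a sparse matrix-vector product), the outer sum over $i$ is the back projection $A^{T} r$ of the ratios $r_i = g_i / (A f)_i$ (a transpose sparse matrix-vector product), and the sensitivity $\sum_i a_{ij} = (A^{T}\mathbf{1})_j$ is constant and can be computed once before iterating.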
The most expensive operation in the entire process is the transpose sparse matrix-vector multiplication (SPMV_T). It was first computed with the routines provided by the cuSPARSE library, which were later benchmarked against custom kernels developed during the thesis. In addition, the effects of multiple GPUs, CUDA-aware MPI, pinned memory and hybrid computing on both performance and the accuracy of the results have been studied.
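To illustrate why SPMV_T tends to be the costly step, here is a minimal CPU sketch in C++. It assumes the system matrix is stored in CSR format with one row per line of response and one column per voxel; that layout is an assumption, since the abstract does not state it. The `Csr` struct and the functions `spmv`, `spmv_t` and `mlem_step` are hypothetical helpers, not cuSPARSE calls. The forward product gathers into each output entry independently, while the transpose product scatters into output entries shared between rows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR container for the system matrix A (rows = lines of response,
// columns = image voxels); this is not a cuSPARSE type.
struct Csr {
    int rows;
    int cols;
    std::vector<int> row_ptr;  // rows + 1 offsets into col_idx/val
    std::vector<int> col_idx;  // column of each nonzero
    std::vector<float> val;    // value of each nonzero
};

// y = A x. Each row writes only y[i] (a gather), so a GPU kernel can give
// every row its own thread or warp without write conflicts.
std::vector<float> spmv(const Csr& A, const std::vector<float>& x) {
    assert(x.size() == static_cast<std::size_t>(A.cols));
    std::vector<float> y(A.rows, 0.0f);
    for (int i = 0; i < A.rows; ++i)
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            y[i] += A.val[k] * x[A.col_idx[k]];
    return y;
}

// y = A^T x from the same CSR arrays. Row i scatters into y[col_idx[k]], and
// different rows can hit the same entry, so a row-parallel GPU kernel needs
// atomicAdd (or an explicitly transposed copy of A) to stay correct.
std::vector<float> spmv_t(const Csr& A, const std::vector<float>& x) {
    assert(x.size() == static_cast<std::size_t>(A.rows));
    std::vector<float> y(A.cols, 0.0f);
    for (int i = 0; i < A.rows; ++i)
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            y[A.col_idx[k]] += A.val[k] * x[i];
    return y;
}

// One MLEM iteration: f_new = f / sens * A^T (g / (A f)), where
// sens = A^T 1 is the sensitivity image, computed once before iterating.
// Division-by-zero guards (returning 0) are a choice made for this sketch.
std::vector<float> mlem_step(const Csr& A, const std::vector<float>& g,
                             const std::vector<float>& f,
                             const std::vector<float>& sens) {
    std::vector<float> proj = spmv(A, f);  // forward projection
    std::vector<float> ratio(A.rows, 0.0f);
    for (int i = 0; i < A.rows; ++i)
        ratio[i] = proj[i] > 0.0f ? g[i] / proj[i] : 0.0f;
    std::vector<float> back = spmv_t(A, ratio);  // back projection (SPMV_T)
    std::vector<float> next(A.cols, 0.0f);
    for (int j = 0; j < A.cols; ++j)
        next[j] = sens[j] > 0.0f ? f[j] * back[j] / sens[j] : 0.0f;
    return next;
}
```

As a sanity check, if `g` equals the forward projection of `f`, every ratio is 1, the back-projected ratios equal the sensitivity image, and `mlem_step` returns `f` unchanged.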
Finally, the last section discusses the limitations of the present implementation and how they could be overcome by making the code resource-aware. It also discusses how performance could be improved by rewriting the most expensive loop operation, SPMV_T, using merge-based SPMV.
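Merge-based SpMV splits the combined sequence of row boundaries and nonzeros (the merge path of `row_ptr[1..rows]` against the nonzero indices `0..nnz-1`) into equal-sized pieces, so every thread does the same amount of work however unevenly the nonzeros are spread over rows. The sequential C++ sketch that follows simulates `workers` threads to show the partitioning and the carry-out fix-up. The function names are hypothetical, and applying the technique to SPMV_T presumably requires storing the transposed matrix explicitly in CSR, which is an assumption rather than something stated here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct MergeCoord { int x; int y; };  // x: row index, y: nonzero index

// Binary search on merge diagonal d: find where the merge path between the row
// end offsets (row_ptr[1..rows]) and the counting sequence 0..nnz-1 crosses d.
MergeCoord merge_path_search(int d, const std::vector<int>& row_ptr, int rows, int nnz) {
    int x_min = std::max(d - nnz, 0);
    int x_max = std::min(d, rows);
    while (x_min < x_max) {
        int pivot = (x_min + x_max) / 2;
        if (row_ptr[pivot + 1] <= d - pivot - 1)  // row end vs nonzero index
            x_min = pivot + 1;
        else
            x_max = pivot;
    }
    return {std::min(x_min, rows), d - x_min};
}

// y = A x with each simulated worker consuming an equal share of rows + nnz items.
std::vector<float> merge_spmv(const std::vector<int>& row_ptr, const std::vector<int>& col_idx,
                              const std::vector<float>& val, const std::vector<float>& x,
                              int workers) {
    assert(workers > 0);
    const int rows = static_cast<int>(row_ptr.size()) - 1;
    const int nnz = row_ptr[rows];
    const int items = rows + nnz;
    const int per_worker = (items + workers - 1) / workers;
    std::vector<float> y(rows, 0.0f);
    std::vector<int> carry_row(workers);
    std::vector<float> carry_val(workers);

    for (int t = 0; t < workers; ++t) {  // on a GPU each t would run concurrently
        MergeCoord c = merge_path_search(std::min(t * per_worker, items), row_ptr, rows, nnz);
        MergeCoord end = merge_path_search(std::min((t + 1) * per_worker, items), row_ptr, rows, nnz);
        float sum = 0.0f;
        for (; c.x < end.x; ++c.x) {  // rows this worker finishes
            for (; c.y < row_ptr[c.x + 1]; ++c.y)
                sum += val[c.y] * x[col_idx[c.y]];
            y[c.x] = sum;
            sum = 0.0f;
        }
        for (; c.y < end.y; ++c.y)  // partial row left open at the end
            sum += val[c.y] * x[col_idx[c.y]];
        carry_row[t] = end.x;
        carry_val[t] = sum;
    }
    // Fix-up pass: add each worker's partial sum to the row it stopped in.
    for (int t = 0; t < workers; ++t)
        if (carry_row[t] < rows) y[carry_row[t]] += carry_val[t];
    return y;
}
```

A GPU version would run each iteration of the `t` loop in its own thread or thread block and perform the fix-up in a second pass; the result matches the plain row-by-row product for any `workers >= 1`.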