Inspiration
My initial interest came from my school research, where I had used CUDA for benchmarking various NVIDIA GPUs. I was deeply intrigued by the possibility of porting this code to a wider range of platforms, especially after learning about Intel's oneAPI, the DPC++ SYCL compiler, and their new Data Center GPU Max Series.
What I Learnt
This hackathon was an incredible learning curve. I gained firsthand experience with oneAPI and the SYCL framework and saw the potential of cross-platform accelerator programming. The process taught me how to migrate code from CUDA to SYCL, giving me a broader understanding of vendor architectures beyond NVIDIA's.
How I Built It
Starting with my original CUDA code, which performed Hermitian matrix multiplication, I took on the task of migrating it to SYCL. By following the provided guidelines and drawing on Intel Developer forum discussions, I transitioned my codebase to Intel's ecosystem. My focus was not just on making the code work but on ensuring good performance on the new platforms.
Challenges I Faced
Migrating from CUDA to SYCL was not completely straightforward. Even after automatic porting with the c2s tool, some CUDA libraries had no direct SYCL equivalents, so I had to find alternatives or write custom subroutines. Performance tuning was another hurdle, given the many differences between NVIDIA and Intel hardware architectures. Insights from Intel VTune and Advisor were invaluable for pinpointing bottlenecks and areas for improvement.