The delay-and-sum (DAS) algorithm is widely used in applications such as medical ultrasound imaging, radar signal emission and reception, and directional signal formation in antenna arrays. It is not usually considered computationally demanding, but in new settings such as cloud-based medical imaging services, it must run fast enough to keep pace with the communication speed of the cloud network. Conventional cloud infrastructure, which relies on central processing units (CPUs) as its main computing resource, is not adequate for fast image formation. We therefore used the SuperVessel cloud, which is accelerated with heterogeneous field-programmable gate arrays (FPGAs), as the implementation platform for the parallel delay-and-sum algorithm in this design.

What it does

This design accelerates the DAS algorithm with a CAPI acceleration core on the SuperVessel platform. The SuperVessel heterogeneous implementation is about 22 times faster than the same algorithm running on a central processing unit (CPU), even with data transfer time included. This makes it fast enough to use in a medical imaging cloud service.

Here is the design source code with its CAPI accelerator.

How I built it

The block diagram of the FPGA implementation of the parallel DAS algorithm is shown in the figure below.


As seen from this figure, the FPGA implementation consists of five parts: input data splitter, loop of RowsCount generator, signal selector, summation, and output data controller.
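For readers unfamiliar with DAS, the signal selector and summation stages can be sketched in software. The following is a minimal C sketch of the generic textbook algorithm, not the Verilog used in this design. The function names (`das_sample`, `das_row`), the channel-major data layout, and the precomputed whole-sample delay table are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Delay-and-sum for one output sample (generic form, hypothetical names).
 *   echo  : channel-major samples, echo[ch * samples_per_channel + t]
 *   delay : per-channel delay in whole samples for this output point
 */
float das_sample(const float *echo, const size_t *delay,
                 size_t channels, size_t samples_per_channel)
{
    assert(echo != NULL && delay != NULL);
    float sum = 0.0f;
    for (size_t ch = 0; ch < channels; ++ch) {
        /* Signal selector: pick the delayed sample of this channel.
           Delays outside the recorded window contribute nothing. */
        if (delay[ch] < samples_per_channel) {
            /* Summation: accumulate across channels. */
            sum += echo[ch * samples_per_channel + delay[ch]];
        }
    }
    return sum;
}

/* Form one row of output: delays holds `channels` entries per pixel. */
void das_row(const float *echo, const size_t *delays, float *out,
             size_t pixels, size_t channels, size_t samples_per_channel)
{
    assert(out != NULL);
    for (size_t p = 0; p < pixels; ++p) {
        out[p] = das_sample(echo, delays + p * channels,
                            channels, samples_per_channel);
    }
}
```

In a parallel hardware implementation, the per-channel selections and additions of the inner loop can be performed concurrently instead of one after another.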

In our design, each echo signal dataset contains 8388608 single-precision floating-point numbers. Transferring a dataset from DDR to BRAM takes about 10.9 ms, which dominates the run time. The calculation itself takes far less time: only 76160 cycles. Detailed timing is shown in the following table:

Data transfer    Calculation    Total
10.9 ms          0.3 ms         11.2 ms

By comparison, calculating the same image on a CPU takes about 246 ms, so our design is about 22 times faster (246 ms / 11.2 ms).
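The timing figures above can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted in this section; the helper names are hypothetical, and the derived throughput and clock frequency are estimates inferred from rounded values, not measurements reported by the design. It assumes 4-byte floats.

```c
#include <assert.h>

/* Figures quoted in the text. */
#define DATASET_FLOATS 8388608.0  /* single-precision values per dataset */
#define TRANSFER_MS    10.9       /* DDR to BRAM transfer time */
#define CALC_MS        0.3        /* calculation time */
#define CALC_CYCLES    76160.0    /* calculation cycles */
#define CPU_MS         246.0      /* CPU time for the same image */

/* Total accelerator time per image, in milliseconds. */
double total_ms(void) { return TRANSFER_MS + CALC_MS; }

/* End-to-end speedup over the CPU run, transfer time included. */
double speedup(void) { return CPU_MS / total_ms(); }

/* Effective transfer throughput in GB/s (1 GB = 1e9 bytes),
   assuming each value occupies 4 bytes. */
double transfer_gb_per_s(void)
{
    double bytes = DATASET_FLOATS * 4.0;
    return bytes / (TRANSFER_MS * 1e-3) / 1e9;
}

/* Clock frequency implied by cycles / time, in MHz. Because 0.3 ms
   is a rounded figure, this is only a rough estimate. */
double implied_clock_mhz(void)
{
    return CALC_CYCLES / (CALC_MS * 1e-3) / 1e6;
}
```

Under these assumptions, the speedup works out to just under 22, the effective transfer rate to roughly 3.1 GB/s, and the implied clock to roughly 250 MHz.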

We also built a simple demonstration to verify that our design can be used for medical cloud computing, as shown in the following figure:


Challenges I ran into

First, because documentation on the CAPI interface and the SuperVessel platform was scarce, we spent a lot of time working out how to use them correctly.

Second, debugging CAPI programs is fairly difficult. After several attempts, however, we settled on a debugging method that works well: we first completed the basic modules and verified each one with unit tests, then assembled the modules step by step, making corrections as needed.
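One common way to support this kind of bottom-up testing is to compare each module's output against a software reference. The helper below is a hypothetical illustration in C, not part of the original design; the name `first_mismatch` and the absolute-tolerance comparison are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Compare a module's output against reference values.
 * Returns the index of the first element that differs by more than
 * tol (or is NaN), or n if every element matches.
 */
size_t first_mismatch(const float *actual, const float *expected,
                      size_t n, float tol)
{
    assert(actual != NULL && expected != NULL);
    for (size_t i = 0; i < n; ++i) {
        float d = actual[i] - expected[i];
        if (d < 0.0f) {
            d = -d;
        }
        /* !(d <= tol) is also true when d is NaN, so NaNs are flagged. */
        if (!(d <= tol)) {
            return i;
        }
    }
    return n;
}
```

Reporting the first failing index, rather than a plain pass or fail, makes it easier to trace a discrepancy back to a specific module when the modules are assembled one at a time.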

Accomplishments that I'm proud of

In this design, we built an acceleration core for the DAS algorithm on the SuperVessel platform and achieved a speedup of almost 22 times over a C implementation on a central processing unit, with data transfer time included.

We also developed a useful method for debugging CAPI acceleration cores and used it to complete our design successfully.

Finally, this design won first prize in the IBM-Xilinx First Heterogeneous Computing Contest.

What I learned

In this project, I learned:

  1. how to use the CAPI interface
  2. how to construct an acceleration core on an FPGA
  3. how to use the SuperVessel platform
  4. how to debug a CAPI implementation
  5. how to test and verify a design

What's next for Medical Ultrasound Imaging Acceleration Based on CAPI

We will continue to optimize the current acceleration core design for resource usage and clock speed. In the near future, we also plan to implement an acceleration core for the synthetic aperture imaging algorithm on the CAPI FPGA of the SuperVessel platform.

Built With

  • verilog
  • capi
  • c