A graphics processing unit (GPU) is better equipped than a central processing unit (CPU) to form images at higher resolutions and faster frame rates because the GPU features hundreds of compute units that can process thousands of data sets in parallel.
The parallel data structure and high thread count make GPUs inherently more suitable for applications like medical imaging and video games that demand compute-heavy features, such as concurrent visualization and interactive segmentation.
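To make the data-parallel idea concrete, here is a minimal sketch of a threshold-based segmentation in which every image row is processed independently. It is an illustration only: the names `threshold_row` and `segment_image` are hypothetical, and a CPU thread pool stands in for the GPU's compute units, so it shows the structure of the work rather than real GPU performance.

```python
from concurrent.futures import ThreadPoolExecutor


def threshold_row(row, cutoff):
    """Label each pixel in one row: 1 if intensity >= cutoff, else 0."""
    return [1 if pixel >= cutoff else 0 for pixel in row]


def segment_image(image, cutoff, workers=4):
    """Segment an image by thresholding every row independently.

    Rows share no state, so they can be processed in any order or all at
    once. A GPU exploits the same independence across thousands of threads;
    the thread pool here is only a stand-in for that hardware.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: threshold_row(row, cutoff), image))


# A tiny 2x3 grayscale "image" with intensities in the range 0..255.
image = [
    [10, 200, 30],
    [220, 40, 250],
]
mask = segment_image(image, cutoff=128)
```

Because no pixel depends on any other, the same function maps naturally onto one GPU thread per pixel or per row, which is why workloads such as segmentation scale well with thread count.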
Multi-core processor designs housing both a CPU and a GPU have existed for many years. In fact, almost every notebook, smartphone, and tablet PC now boasts a multi-core processor with an integrated GPU and many other accelerators for audio, networking and other features. However, in these multi-core processor designs, a GPU usually doesn’t access application memory directly and thus acts as a slave to the CPU.
A few years ago, AMD introduced the concept of an accelerated processing unit (APU) that incorporates cache-coherent memory for both the CPU and GPU inside the processor. The idea of combining the two processing units on the same bus to increase the processor throughput eventually led to the creation of the Heterogeneous System Architecture (HSA) Foundation in 2012.
The set of standards and specifications within HSA facilitates the common bus and shared memory for the CPU, GPU, and other accelerators in a bid to make these vastly different architectures work in tandem. Industry leaders like AMD, ARM, MediaTek, and Texas Instruments are part of this effort, which marks a significant break from the existing multi-core processor design approach.
HSA takes the existing heterogeneous computing to the next level.
For a start, HSA 1.0 aimed to unlock the GPU potential in embedded computing by automating the offload of calculations from the CPU to the GPU, and vice versa. HSA enables software to dispatch tasks to the GPU with much lower latency and dramatically reduced overhead. Through the shared virtual memory (SVM) feature, GPU tasks can directly and securely access data in system memory and walk data structures in application process memory (ptr-is-ptr). All of this can now be done without the host CPU provisioning data buffers, as legacy GPU compute APIs required.
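The difference between the two models can be sketched conceptually. The following is not HSA or OpenCL code; it is a plain Python simulation, and the names `Node`, `legacy_dispatch`, and `shared_dispatch` are hypothetical. It assumes a pointer-linked data structure, which in a legacy buffer-based model must first be flattened and copied by the host, whereas in a shared-virtual-memory model the accelerator task follows the same pointers the application uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One element of a linked list living in application memory."""
    value: int
    next: Optional["Node"] = None


def build_list(values):
    """Build a singly linked list and return its head (None if empty)."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def legacy_dispatch(head):
    """Simulated legacy model: the host flattens and copies the data first.

    Returns (sum of squares, number of elements the host had to copy).
    """
    staging = []
    node = head
    while node is not None:  # host walks the structure on the device's behalf
        staging.append(node.value)
        node = node.next
    device_buffer = list(staging)  # stands in for a host-to-device copy
    return sum(v * v for v in device_buffer), len(device_buffer)


def shared_dispatch(head):
    """Simulated SVM model: the task walks the application's own pointers.

    Returns (sum of squares, number of elements copied), and the copy count
    is always 0 because no staging buffer is provisioned.
    """
    total = 0
    node = head
    while node is not None:  # same references the application uses
        total += node.value * node.value
        node = node.next
    return total, 0


head = build_list([1, 2, 3])
legacy_result = legacy_dispatch(head)
shared_result = shared_dispatch(head)
```

Both paths compute the same answer; the sketch only highlights that the shared-memory path skips the host-side flattening and copy step, which is where much of the dispatch overhead in buffer-based APIs comes from.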
Upcoming releases of the HSA standards integrate digital signal processors (DSPs) into the architecture and also improve interoperation with non-HSA-enabled programmable and fixed-function accelerators in the system.
Next up, HSA is a strong foundation for general-purpose GPU (GPGPU) APIs like OpenCL, with its fine-grain and coarse-grain shared virtual memory features. Beyond that, many high-level languages and compiler toolchains have been ported and optimized to natively target HSA platforms, including C++17, GCC, LLVM/Clang, and Python. Work is also ongoing to optimize software frameworks and libraries such as Caffe, BLAS, Charm++, FFT, Sparse, FLAME, and Docker to make it easier for developers to efficiently program and use heterogeneous parallel devices directly.
GPU acceleration offers exceptional speed to efficiently fulfill medical imaging’s unique data throughput and post-processing needs.
Two tutorials and a panel address how HSA is transforming the heterogeneous design environment by effectively dealing with professional workloads, and how it will impact the medical and print imaging segments.
Specifically, the tutorials are "The Heterogeneous System Architecture – A foundation for the next generation of heterogeneous computing" and "GPU compute in medical and print imaging," while the panel is titled "Heterogeneous Systems Architectures: Power, performance, and programming for the future."
2016 PAUL BLINZER, FELLOW, AMD, CHAIRPERSON, SYSTEM ARCHITECTURE WORKGROUP OF THE HSA FOUNDATION