FPGA Acceleration

Research Overview

Research Lead: Dr. Rajesh Gupta, Professor, Computer Science and Engineering, UC San Diego

Effective parallelization is key to energy-efficient delivery of performance. A long history of custom-designed machines that expressly map a given application onto maximally parallel hardware has shown two to three orders of magnitude improvement in performance, cost, or even energy efficiency over a general-purpose solution, only to be superseded by the continuing advances in general-purpose computing driven by Moore's law. Rather than build a custom machine for a given application code, we need architectures and methods that enable application-specific customization of the code and its execution on machines that use specialized co-processor assists. These assists may be integrated at various levels (i.e., distances from the CPU instruction stream), from Instruction Set Architecture (ISA) extensions to memory-mapped co-processors to stand-alone parallel processors.

Recent work on scaling trends in computing (size, speed, and resolution) points to an emerging workload convergence that allows us to classify workloads in architecturally useful ways, such as the recognition, mining, synthesis (RMS) framework. Using such a characterization, we can now reason about the efficiency of machine architectures in terms closer to application characteristics (e.g., math solvers, data mining, visualization, and learning, as opposed to traditional memory behavior) and systematically use these characteristics to build specialized machines. We plan to capture and use such 'semantic information' from the application in the co-design process by 'compressing the abstraction layers' from application to architecture and its implementation, thus providing important application-level information to the low-level mapping and implementation tools. The key to this compression is the emergence of semi-structured data and methods to manipulate it with SOAs.

Currently, such exploration requires setting up diverse pieces of hardware, their compilation/synthesis/mapping tool chains, and, of course, training students to work with diverse development platforms. Consequently, little hard data exists that can be analyzed to determine which algorithmic features, and in what context, are better suited to implementing a function on an FPGA or on a CPU.
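The kind of per-kernel "hard data" in question could be gathered with a harness along these lines. This is a minimal sketch, not part of the project; the kernel, the recorded features, and the `flops_per_elem` metric are all hypothetical illustrations of features one might correlate with FPGA or CPU suitability:

```python
import time

def profile_kernel(name, kernel, data, flops_per_elem):
    """Time a kernel and record features relevant to CPU-vs-FPGA suitability."""
    start = time.perf_counter()
    kernel(data)
    elapsed = time.perf_counter() - start
    n = len(data)
    return {
        "kernel": name,
        "elements": n,
        # Arithmetic intensity proxy: useful work per datum (hypothetical metric).
        "flops_per_element": flops_per_elem,
        "runtime_s": elapsed,
        "throughput_eps": n / elapsed,  # elements processed per second
    }

# Example: a trivial element-wise kernel (one multiply-add per element).
record = profile_kernel("saxpy-like",
                        lambda xs: [2.0 * x + 1.0 for x in xs],
                        list(range(10000)),
                        flops_per_elem=2)
print(record["kernel"], record["elements"])
```

Accumulating such records across many kernels and platforms is one way to turn the currently anecdotal FPGA-vs-CPU question into an analyzable dataset.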

The GreenLight Instrument will offer a diverse array of reconfigurable co-processing options (coarse-grain and fine-grain/memory-coherent), accelerated datapaths under processor control, systolic arrays, and direct graph execution machines. Our strategy is to use commodity hardware to quickly build platforms that can be integrated into the web-centric access methods of the GreenLight Instrument. For this reason, we will prototype architectural configurations using general-purpose reconfigurable hardware. For tighter integration of FPGAs in a parallel architecture, we will work with Convey Computer Corporation, which has taken co-processing to a new level of sophistication by enabling coherent access to the FPGAs in order to reduce development costs. On another fabric, one more silicon-efficient than FPGAs, we will work with Ambric Systems to provide their massively parallel CPU array as an architectural alternative for exploiting parallelism and building direct graph execution machines. The GreenLight Instrument will allow us to migrate whole computations from one fabric to another, at various levels of processing/communications/storage granularity, and thus enable researchers to devise metrics and optimization methods that automatically determine the right fabric and mapping methods for a given computation and its runtime behavior.
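To make the idea of automatically choosing a fabric concrete, here is a minimal sketch of one possible selection metric. It is not the project's method; the fabric names, measurements, and the choice of energy-delay product as the objective are all hypothetical assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class Measurement:
    """Runtime profile of one kernel on one fabric (hypothetical numbers)."""
    fabric: str       # e.g. "cpu", "fpga", "mppa"
    runtime_s: float  # measured execution time
    power_w: float    # average power draw during execution

def energy_delay_product(m: Measurement) -> float:
    # EDP = energy * time = (power * time) * time; penalizes both slow and
    # power-hungry mappings at once.
    return m.power_w * m.runtime_s ** 2

def best_fabric(measurements: list[Measurement]) -> str:
    """Pick the fabric whose measured run minimizes energy-delay product."""
    return min(measurements, key=energy_delay_product).fabric

# Hypothetical profile of one kernel migrated across three fabrics:
profile = [
    Measurement("cpu",  runtime_s=2.0, power_w=90.0),  # fast but power-hungry
    Measurement("fpga", runtime_s=3.0, power_w=15.0),  # slower, far lower power
    Measurement("mppa", runtime_s=2.5, power_w=25.0),
]
print(best_fabric(profile))  # prints "fpga": 15*3^2 beats 90*2^2 and 25*2.5^2
```

In a real instrument the measurements would come from live power and performance monitoring rather than fixed numbers, and the objective could be swapped for cost, throughput, or any other metric the researchers devise.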

Technical White Papers
FPGA Acceleration 2010
FPGA Acceleration 2009