Thread 02: Benchmarking
There is growing evidence that, for traditional Boolean circuits and von Neumann architectures, beyond-CMOS devices will struggle to compete with CMOS technology if they are used simply as drop-in replacements. However, exploiting the unique characteristics of beyond-CMOS devices together with alternative or new computing models, architectures, and circuits can potentially offer orders-of-magnitude improvements in power, performance, and functional capability over their CMOS counterparts. Achieving this goal requires cross-layer design efforts, both top-down and bottom-up, that span devices, circuits, architectures, and algorithms. Such efforts are particularly valuable for developing domain-specific accelerators that address energy-efficiency concerns. To (i) assess and compare the many cross-layer approaches currently being investigated for domain-specific accelerator design, and (ii) identify bottlenecks in meeting energy, performance, and accuracy objectives, it is important to conduct “end-to-end” evaluations of device-level efforts over a representative set of application domains. A critical need in such a cross-layer effort is a uniform modeling approach that can capture the huge design space of device, circuit, architecture, and algorithm options. This project targets precisely this need: we aim to develop a uniform modeling framework that takes descriptions of algorithms and architectures, together with technology parameters, as inputs and predicts system performance and energy.
Deep Neural Network Benchmarking
As a representative example, Deep Neural Networks (DNNs) have recently achieved tremendous success in many application domains, e.g., speech recognition, image processing, and playing complex games. Inspired by this success, specialized accelerators have been, and continue to be, developed to process DNN workloads energy-efficiently. Although existing accelerators differ in architectural design, data mapping strategy, microarchitecture, and device technology, nearly all of them employ a memory hierarchy to reduce expensive, long-range data movement. We have developed a uniform modeling framework for DNN accelerators that estimates the energy required for a given network. To estimate the total energy, we model the number of accesses to each level of the memory hierarchy and to the functional units, along with the energy cost of each access. To model the energy cost of an individual access, we employ a device-level benchmarking approach. Our approach can accurately model energy contributions from the device technology, circuits, architecture, data mapping strategy, and network. We applied our model to three accelerator architectures from the literature, namely Eyeriss, ShiDianNao, and TrueNorth, and demonstrated that it can reliably estimate the energy required by the individual components of accelerators with different architectural topologies for a given workload or network.
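The accounting described above can be summarized as E_total = Σ_c N_c · e_c, where c ranges over the memory levels and functional units, N_c is the number of accesses to c for a given network, and e_c is the energy of a single access to c. The Python sketch below illustrates only this bookkeeping step, under stated assumptions: the component names, access counts, and per-access energies are hypothetical placeholders, and deriving N_c from a data mapping strategy or e_c from device-level benchmarking is outside its scope. It is not the framework from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Component:
    """A memory level or functional unit in a hypothetical accelerator model."""
    name: str
    accesses: int                # access count N_c for the workload (assumed given by a data mapping)
    energy_per_access_pj: float  # per-access energy e_c in pJ (assumed given by device-level benchmarking)

    def energy_pj(self) -> float:
        # Energy contributed by this component: access count times per-access cost.
        return self.accesses * self.energy_per_access_pj


def energy_breakdown(components: Iterable[Component]) -> Dict[str, float]:
    """Map each component name to its energy contribution in pJ."""
    return {c.name: c.energy_pj() for c in components}


def total_energy_pj(components: Iterable[Component]) -> float:
    """Total energy is the sum of all per-component contributions."""
    return sum(c.energy_pj() for c in components)


if __name__ == "__main__":
    # Placeholder numbers chosen for illustration only; they are not
    # measurements of Eyeriss, ShiDianNao, TrueNorth, or any real device.
    example = [
        Component("DRAM", 2_000, 100.0),
        Component("global_buffer", 20_000, 5.0),
        Component("register_file", 200_000, 0.25),
        Component("MAC", 100_000, 0.5),
    ]
    for name, e in energy_breakdown(example).items():
        print(f"{name:>14}: {e:>10.1f} pJ")
    print(f"{'total':>14}: {total_energy_pj(example):>10.1f} pJ")
```

Keeping the per-component breakdown, rather than only the total, reflects the goal of attributing energy to individual components; with the placeholder values above, the DRAM entry accounts for half of the total.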
Representative work: Indranil Palit, Qiuwen Lou, Robert Perricone, Michael Niemier, and X. Sharon Hu, “A Uniform Modeling Methodology for Benchmarking DNN Accelerators,” in International Conference on Computer-Aided Design (ICCAD), pp. 1-7, 2019.