RP2 - Architecture and Heterogeneous System Integration

RP2 explores new architectural and system-integration solutions for efficient neuromorphic computing on platforms ranging from the cloud to smart Internet of Things (IoT) devices.

RP2-1: Communication Interface and OS/Architectural Support for CPU-Accelerator Heterogeneous Architecture

Figure RP2-1: Different interface designs between the CPU and the hardware accelerator

Project RP2-1 focuses on optimizing the memory communication interface between the CPU and the dedicated accelerator. In addition, the multi-accelerator, multi-host systems used for cloud training require a communication interface and OS/architectural support that are highly scalable and efficient. Dedicated optimization of the memory subsystem is proposed to address the challenges introduced by near-memory/in-memory communication. Architectural innovations and OS support are further investigated to facilitate efficient memory-centric computing in a multi-user environment.

RP2-2: Architecture Optimization Considering the Characteristics of Deep Neural Networks

Figure RP2-2: Video comprehension network integrating tensorization and quantization

Project RP2-2 will encompass new and promising initiatives in neural network modeling and design algorithms to produce compressed networks that achieve an optimal complexity-accuracy tradeoff and reduced storage. At the same time, we will study the fundamental building blocks of deep neural networks (DNNs) and search for globally optimal DNN structures by learning how to learn. The corresponding architectures that can exploit the optimized tensor ranks, bit quantization, and optimal building-block structures will also be developed.
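As a rough illustration of how rank reduction and bit quantization combine to shrink storage, the sketch below uses a plain rank-r matrix factorization as a simplified stand-in for tensorization, together with a uniform quantizer. The function names, layer size, rank, and bit widths are assumptions chosen for illustration and are not taken from the project.

```python
def storage_bits(rows, cols, rank=None, bits=32):
    """Bits needed to store a rows x cols weight matrix.

    With rank=None the matrix is stored densely. Otherwise it is stored
    as two factors of shapes (rows x rank) and (rank x cols).
    """
    params = rows * cols if rank is None else rank * (rows + cols)
    return params * bits


def quantize_uniform(values, bits):
    """Map floats to integer codes on a uniform grid with 2**bits levels.

    Returns (codes, dequantized), where dequantized holds the grid values
    the codes stand for.
    """
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    dequantized = [lo + c * scale for c in codes]
    return codes, dequantized


# Hypothetical 1024 x 1024 layer: dense 32-bit versus rank-16 with 4-bit weights.
dense = storage_bits(1024, 1024)
compressed = storage_bits(1024, 1024, rank=16, bits=4)
reduction = dense // compressed

codes, approx = quantize_uniform([-1.0, -0.25, 0.1, 0.5, 1.0], bits=3)
```

The storage figure alone says nothing about accuracy. In practice the rank and bit width would be chosen jointly against a validation metric, which is the complexity-accuracy tradeoff the paragraph refers to.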

RP2-3: Energy-Efficient AI Accelerators for Sensor-Based Edge Devices

Figure RP2-3: Mixed analog/digital in-memory computation architecture interfacing with a sensor

In Project RP2-3, a mixed analog/digital in-memory computation architecture interfacing with a sensor array is devised for the inference engine. This project includes embedding some data-processing functionality within the sensors to reduce data movement between the sensor and the computation engine, and an in-memory computation architecture to reduce costly memory movement. Different analog computation architectures will be developed for different types of sensor readout interfaces, along with low-resolution networks that reduce the costly high-resolution A/D conversion and the computational complexity of the digital layers.
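A back-of-the-envelope sketch can make the data-movement argument concrete. It assumes, purely for illustration, that the in-sensor processing is block pooling and that each output sample is digitized at a fixed A/D resolution. The function name, frame size, pooling factor, and bit widths are hypothetical and do not describe the project's design.

```python
def readout_bits(height, width, adc_bits, pool=1):
    """Bits leaving the sensor per frame.

    Assumes the sensor reduces each pool x pool block of pixels to one
    sample before A/D conversion, and that every sample is digitized
    at adc_bits of resolution.
    """
    out_h = height // pool
    out_w = width // pool
    return out_h * out_w * adc_bits


# Hypothetical 640 x 480 sensor.
baseline = readout_bits(640, 480, adc_bits=10)         # full-resolution readout
reduced = readout_bits(640, 480, adc_bits=4, pool=2)   # in-sensor pooling, low-resolution A/D
savings = baseline / reduced
```

Under these assumptions the per-frame traffic falls by the product of the pooling area and the ratio of bit widths. Whether the network tolerates that reduction is a separate question that the low-resolution network design would have to answer.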

RP2-4: Architectural Optimization of In-Memory CMOS-ReRAM Accelerator for Large-Scale Neural Networks

Figure RP2-4: An in-memory accelerator architecture based on CMOS-ReRAM

Project RP2-4 focuses on developing an AI accelerator architecture based on emerging non-volatile memory technologies. Using a ReRAM-based crossbar as the in-memory computation engine, a scalable and energy-efficient architecture is proposed to realize large-scale quantization-trained networks. Most-significant-bits-first (MSBF) computation paradigms and memory compression algorithms will be proposed to reduce weight storage and increase computation sparsity. The goal is to improve energy efficiency by another order of magnitude.
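The text does not spell out the MSBF paradigm, so the following is only one plausible reading: a bit-serial dot product that processes weight bit-planes from the most significant bit downward and can stop early, with a bound on what the skipped low-order bits could still contribute. It assumes non-negative integer inputs and unsigned fixed-point weights. The function name and parameters are hypothetical.

```python
def msb_first_dot(inputs, weights, bits=8, planes=None):
    """Dot product accumulated one weight bit-plane at a time, MSB first.

    inputs  : non-negative integers (assumed, e.g. quantized activations)
    weights : unsigned integers that fit in `bits` bits
    planes  : number of bit-planes to process, all of them if None

    Returns (partial_sum, max_remaining), where max_remaining is the
    largest amount the unprocessed low-order planes could still add.
    """
    if planes is None:
        planes = bits
    if not 0 <= planes <= bits:
        raise ValueError("planes must be between 0 and bits")

    partial = 0
    for b in range(bits - 1, bits - 1 - planes, -1):
        # Sum the inputs whose weight has bit b set, then scale by 2**b.
        plane_sum = sum(x for x, w in zip(inputs, weights) if (w >> b) & 1)
        partial += plane_sum << b

    remaining = bits - planes
    max_remaining = ((1 << remaining) - 1) * sum(inputs)
    return partial, max_remaining


xs = [3, 1, 4, 1, 5]
ws = [200, 17, 96, 255, 8]
exact = sum(x * w for x, w in zip(xs, ws))

full, full_bound = msb_first_dot(xs, ws)           # all 8 planes
early, early_bound = msb_first_dot(xs, ws, planes=4)  # top 4 planes only
```

When all planes are processed the result matches the ordinary dot product. Stopping early trades a bounded error for fewer plane evaluations, and planes in which no weight bit is set contribute nothing, which is one way such a scheme could expose sparsity.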