RP2 explores new architectural and system-integration solutions for efficient neuromorphic computing on platforms ranging from the cloud to smart Internet of Things (IoT) devices.
Communication Interface and OS / Architectural Support for CPU-Accelerator Heterogeneous Architecture
Project RP2-1 focuses on optimizing the memory communication interface between the CPU and a dedicated accelerator. At the same time, the multi-accelerator, multi-host systems used in cloud training require a communication interface and OS/architectural support with high scalability and efficiency. Dedicated optimization of the memory subsystem is proposed to address the challenges introduced by near-memory/in-memory communication. Architectural innovations and OS support are further investigated to efficiently facilitate memory-centric computing in a multi-user environment.
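To make the CPU-accelerator communication pattern concrete, the toy model below sketches a shared-memory descriptor ring in which the CPU (producer) hands work to the accelerator (consumer) by reference rather than by copying payloads. This is a generic illustration of memory-centric communication; the class and method names are illustrative and do not come from the project.

```python
from collections import deque


class DescriptorRing:
    """Toy shared-memory descriptor ring between a CPU (producer) and an
    accelerator (consumer). Work items are (offset, length) references into a
    shared buffer, so payloads are never copied across the interface."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.ring = deque()

    def submit(self, offset, length):
        """CPU side: enqueue a work descriptor; fail if the ring is full."""
        if len(self.ring) == self.capacity:
            return False  # ring full: producer must back off
        self.ring.append((offset, length))
        return True

    def pop(self):
        """Accelerator side: dequeue the oldest descriptor, if any."""
        return self.ring.popleft() if self.ring else None


ring = DescriptorRing(capacity=4)
ring.submit(0, 256)        # CPU posts a 256-byte region at offset 0
ring.submit(256, 128)      # and a 128-byte region at offset 256
desc = ring.pop()          # accelerator fetches the first descriptor
```

A real implementation would place the ring itself in shared or near-memory storage and use head/tail indices with appropriate memory ordering; the sketch only captures the by-reference hand-off that the memory-subsystem optimization targets.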
Architecture Optimization Considering the Characteristics of the Deep Neural Networks
Project RP2-2 will encompass new and promising initiatives in neural-network modeling and design algorithms to produce compressed networks that achieve an optimal complexity-accuracy tradeoff and reduced storage. At the same time, we will study the fundamental building blocks of deep neural networks and search for globally optimal DNN structures by learning how to learn. Corresponding architectures that exploit the optimized tensor ranks, bit quantization, and optimal building-block structures will also be developed.
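As a minimal illustration of the two compression levers mentioned above, the sketch below combines a truncated-SVD low-rank factorization with uniform symmetric bit quantization on a plain dense weight matrix. It assumes NumPy; the function names and the specific rank/bit-width choices are illustrative, not taken from the project.

```python
import numpy as np


def low_rank_compress(w, rank):
    """Approximate a weight matrix with a rank-`rank` factorization via
    truncated SVD: w (m x n) is replaced by a (m x r) @ b (r x n)."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # fold singular values into the left factor
    b = vt[:rank, :]
    return a, b


def quantize(w, bits):
    """Uniform symmetric quantization to a signed `bits`-bit integer grid."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    q = np.round(w / scale).astype(np.int8)
    return q, scale


rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))

a, b = low_rank_compress(w, rank=8)      # 64*64 -> 2*64*8 stored values
q, scale = quantize(a @ b, bits=8)       # then 8-bit instead of float storage
err = np.linalg.norm(w - q * scale) / np.linalg.norm(w)
```

Here storage drops from 4096 floats to 1024 factor entries, and quantization shrinks each stored value to 8 bits; a trained network would recover most of the resulting accuracy loss by fine-tuning, which is the complexity-accuracy tradeoff the project studies.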
Energy Efficient AI Accelerators for Sensor-based Edge Devices
In Project RP2-3, a mixed analog/digital in-memory computation architecture interfacing with a sensor array is devised for the inference engine. The project embeds data-processing functionality within the sensors to reduce data movement between the sensor and the computation engine, and uses in-memory computation to reduce costly memory traffic. Different analog computation architectures will be developed for different types of sensor readout interfaces, along with low-resolution networks that reduce both the costly high-resolution A/D conversion and the computational complexity of the digital layers.
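To show why low-resolution networks relax the A/D requirement, the sketch below evaluates a 1-bit (binarized) layer: once activations and weights are mapped to {-1, +1}, the dot products need only sign flips and small-integer accumulation rather than high-resolution multiplies. This is a generic binarized-network illustration, not the project's specific design; NumPy is assumed.

```python
import numpy as np


def binarize(x):
    """Map real values to {-1, +1}: a 1-bit representation that a single
    comparator at the sensor readout could produce directly."""
    return np.where(x >= 0, 1, -1).astype(np.int8)


def binary_layer(x_q, w_q):
    """1-bit x 1-bit dot products: each output is a sum of +/-1 terms, so no
    high-resolution multipliers or ADC codes are needed in the digital layers."""
    return x_q.astype(np.int32) @ w_q.astype(np.int32).T


rng = np.random.default_rng(1)
x = rng.standard_normal(16)        # stand-in for one sensor readout vector
w = rng.standard_normal((4, 16))   # 4 binary neurons, 16 inputs each
y = binary_layer(binarize(x), binarize(w))
```

Each output lies in [-16, +16], so even the analog accumulation only needs about 5 bits of dynamic range, which is the kind of savings that makes low-resolution conversion viable at the sensor interface.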
Architectural Optimization of In-Memory CMOS-ReRAM Accelerator for Large-Scale Neural Network
Project RP2-4 focuses on developing an AI accelerator architecture based on emerging non-volatile memory technologies. Using ReRAM-based crossbars as the in-memory computation engine, a scalable and energy-efficient architecture is proposed to realize large-scale quantization-trained networks. Most-significant-bits-first (MSBF) computation paradigms and memory-compression algorithms will be developed to reduce the amount of weight storage and increase computation sparsity, with the goal of improving energy efficiency by a further order of magnitude.
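The MSBF idea can be sketched in a few lines: a quantized weight vector is split into bit planes, and the dot product is accumulated one plane at a time starting from the most significant bit, so a coarse partial result is available early and could gate the remaining (low-value) plane reads. The sketch below, assuming NumPy and unsigned 4-bit weights, is a simplified software model; the actual crossbar readout and early-termination policy are project-specific.

```python
import numpy as np


def msbf_dot(x, w_q, bits=4):
    """Most-significant-bits-first dot product: unsigned `bits`-bit weights are
    processed one bit plane at a time, MSB plane first, so a coarse partial sum
    exists before the low-order planes are ever read."""
    acc = 0.0
    for b in range(bits - 1, -1, -1):    # walk planes from MSB down to LSB
        plane = (w_q >> b) & 1           # one crossbar read per bit plane
        acc += (plane @ x) * (1 << b)    # shift-and-accumulate the plane's sum
    return acc


rng = np.random.default_rng(2)
x = rng.standard_normal(8)               # input activations
w_q = rng.integers(0, 16, size=8)        # 4-bit unsigned quantized weights
y = msbf_dot(x, w_q)                     # matches w_q @ x once all planes run
```

Because the high-order planes dominate the result, an accelerator can skip remaining planes when the partial sum already decides the outcome (e.g. a ReLU sign or an argmax), which is where the sparsity and energy savings come from.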