All Analog CNN Accelerator with RRAMs for Fast Inference

Author: Minghan Chao
Languages : en

Book Description
As AI applications become more prevalent and powerful, the performance demands on deep neural networks keep growing, and the need for fast, energy-efficient circuits to compute them is urgent. Most current research proposes dedicated hardware that is reused thousands of times. However, while reusing the same hardware to perform the same computation repeatedly saves area, it comes at the expense of execution time. This presents a critical obstacle, because real-time, rapid AI requires a fundamentally faster approach to implementing neural networks. The focus of this thesis is to duplicate the key operation, the multiply-and-accumulate (MAC) computation unit, in hardware so that there is no hardware reuse, enabling the entire neural network to be physically fabricated on a single chip. Because neural networks today often require hundreds of thousands to tens of millions of MAC computation units, this requires designing the smallest possible MAC unit so that all of the operations fit on chip. Here, we present an initial analysis of a convolutional neural network (CNN) accelerator that implements such a system, optimized for inference speed. The accelerator duplicates all of the computation hardware, eliminating the need to move data back and forth while reusing the same hardware. We propose a novel design for memory cells using resistive random access memory (RRAM) and for computation units that exploit the analog behavior of transistors. This circuit classifies one CIFAR-10 image in 6 μs (160k frames/s) at 2.4 μJ of energy per classification, with an accuracy of 85%. It contains 7.5 million MAC units and achieves a density of 5 million MAC/mm².
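
The headline figures in the description can be cross-checked with simple arithmetic. The sketch below is only a back-of-the-envelope illustration, not part of the thesis: the function name `derived_figures` and its dictionary keys are hypothetical. It assumes classifications run back to back with no pipelining, and that each MAC unit is used once per classification, which is how the description's "no hardware reuse" is read here.

```python
# Back-of-the-envelope check of the figures quoted in the description.
# All four constants are taken from the description; everything derived
# from them is an estimate under the assumptions stated in the lead-in.
LATENCY_S = 6e-6             # time to classify one CIFAR-10 image
ENERGY_J = 2.4e-6            # energy per classification
MAC_UNITS = 7.5e6            # number of MAC units on chip
MAC_DENSITY_PER_MM2 = 5e6    # MAC units per square millimetre


def derived_figures(latency_s, energy_j, mac_units, mac_density_per_mm2):
    """Return quantities implied by the quoted numbers (hypothetical helper)."""
    return {
        # Assumes one classification finishes before the next starts.
        "frames_per_s": 1.0 / latency_s,
        # Average power while classifying continuously.
        "avg_power_w": energy_j / latency_s,
        # Area occupied by the MAC units alone, excluding any other circuitry.
        "mac_area_mm2": mac_units / mac_density_per_mm2,
        # Assumes each MAC unit fires exactly once per classification.
        "energy_per_mac_j": energy_j / mac_units,
    }


figures = derived_figures(LATENCY_S, ENERGY_J, MAC_UNITS, MAC_DENSITY_PER_MM2)
for name, value in figures.items():
    print(f"{name}: {value:.3g}")
```

Under these assumptions the numbers work out to roughly 167k frames/s (close to the quoted 160k), about 0.4 W of average power, about 1.5 mm² of MAC area, and about 0.32 pJ per MAC operation.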