
Supporting Approximate Computing on Coarse Grained Re-configurable Array Accelerators

Author: Jonathan Dickerson
Publisher:
ISBN:
Category : Accelerator-driven systems
Languages : en
Pages : 56

Book Description
Recent research has shown that approximate computing and Coarse-Grained Reconfigurable Arrays (CGRAs) are promising computing paradigms for reducing energy consumption in compute-intensive environments. CGRAs provide a promising middle ground between energy-inefficient yet flexible Field Programmable Gate Arrays (FPGAs) and energy-efficient yet inflexible Application Specific Integrated Circuits (ASICs). Integrating approximate computing into CGRAs yields substantial gains in energy efficiency at the cost of arithmetic precision. However, some applications require a certain level of accuracy in their calculations to perform their tasks effectively, and the ability to control the accuracy of approximate computing at run-time is an emerging topic. This paper presents a rudimentary way to achieve run-time control of approximation on the CGRA by profiling a function and then generating tables that meet a given approximation accuracy. During the profiling stage, the application is run with all types of approximation, producing a file that contains the errors for each approximation type (zero, first, and third order) alongside the exact value. After the profiling stage, the output is parsed and a table is created with the highest possible order of approximation type and its associated error. Using the auto-generated table, the given tolerance is met while maintaining the highest order of approximation type, which yields the best power savings. The simulation records the metrics associated with each approximation type and uses them to calculate the power savings achieved for each run.
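
A minimal sketch of the table-generation step described above, assuming the profiling stage emits a CSV file with one column per approximation type plus the exact value; the file name, column names, and relative-error metric are illustrative assumptions rather than the thesis's actual tooling:

```python
# Illustrative sketch, not the thesis's actual profiler or table format.
# Assumed CSV columns: exact, zero, first, third (one row per profiled sample).
import csv

# Approximation types in increasing order; per the description, the highest
# order that still meets the tolerance yields the best power savings.
APPROX_ORDERS = ["zero", "first", "third"]

def build_table(profile_csv, tolerance):
    """For each profiled sample, keep the highest-order approximation whose
    relative error stays within the given tolerance, plus that error."""
    table = []
    with open(profile_csv, newline="") as f:
        for row in csv.DictReader(f):
            exact = float(row["exact"])
            chosen, chosen_err = "exact", 0.0  # fall back to exact computation
            for order in APPROX_ORDERS:
                err = abs(float(row[order]) - exact) / max(abs(exact), 1e-12)
                if err <= tolerance:
                    chosen, chosen_err = order, err  # later (higher) orders win
            table.append({"order": chosen, "error": chosen_err})
    return table

if __name__ == "__main__":
    # Hypothetical profiling output and a 5% error tolerance.
    for entry in build_table("profile_output.csv", tolerance=0.05):
        print(entry)
```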

Architecture and Programming Model Support for Reconfigurable Accelerators in Multi-Core Embedded Systems

Author: Satyajit Das
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Book Description
Emerging trends in embedded systems and applications need high throughput and low power consumption. Due to the increasing demand for low-power computing and diminishing returns from technology scaling, industry and academia are turning with renewed interest toward energy-efficient hardware accelerators. The main drawback of hardware accelerators is that they are not programmable: because each one performs a single specific function, its utilization can be low, and increasing the number of accelerators in a system on chip (SoC) causes scalability issues. Programmable accelerators provide flexibility and solve the scalability issues. A Coarse-Grained Reconfigurable Array (CGRA) architecture, consisting of several processing elements with word-level granularity, is a promising choice for a programmable accelerator. Inspired by the promising characteristics of programmable accelerators, this thesis studies the potential of CGRAs in near-threshold computing platforms and develops an end-to-end CGRA research framework. The major contributions of this framework are CGRA design, implementation, integration in a computing system, and compilation for the CGRA. First, the design and implementation of a CGRA named Integrated Programmable Array (IPA) is presented. Next, the problem of mapping applications with control and data flow onto a CGRA is formulated. From this formulation, several efficient algorithms are developed that use the internal resources of the CGRA, with a vision for low-power acceleration. The algorithms are integrated into an automated compilation flow. Finally, the IPA accelerator is integrated into PULP, a Parallel Ultra-Low-Power Processing Platform, to explore heterogeneous computing.
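
To make concrete what mapping an application's data flow onto a CGRA involves, the toy sketch below greedily assigns the operations of a small dataflow graph to processing elements in a 2D array; the example graph, array size, and greedy policy are illustrative assumptions and not the IPA compilation flow itself:

```python
# Toy sketch only: not the IPA mapper or its algorithms. It shows the basic idea of
# placing dataflow-graph operations onto word-level processing elements (PEs).

# Hypothetical dataflow graph: operation -> operations it depends on.
dfg = {
    "load_a": [],
    "load_b": [],
    "mul":    ["load_a", "load_b"],
    "add":    ["mul", "load_a"],
    "store":  ["add"],
}

ROWS, COLS = 4, 4  # assumed PE array dimensions

def map_dfg(dfg):
    """Greedy placement: visit operations in dependency order and assign each
    one to the next free PE coordinate."""
    free = [(r, c) for r in range(ROWS) for c in range(COLS)]
    placement, placed = {}, set()
    while len(placed) < len(dfg):
        for op, deps in dfg.items():
            if op not in placed and all(d in placed for d in deps):
                placement[op] = free.pop(0)
                placed.add(op)
    return placement

print(map_dfg(dfg))
```

A real CGRA compiler must additionally schedule operations in time, route operands over the interconnect, and handle control flow, which is what makes the mapping formulation non-trivial.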

Efficient, Scalable and High-Throughput Runtime Reconfigurable Arrays for Accelerator as a Service

Author: Sumeet Singh Nagi
Publisher:
ISBN:
Category :
Languages : en
Pages : 197

Book Description
Advancements in silicon processing are responsible for the exponential growth in computing performance and algorithmic development. With the end of Dennard scaling, conventional computing architectures such as CPUs are unable to keep up with the increasing computation requirements of modern algorithms. Hardware accelerators are designed for each such computation-heavy algorithm and incorporated into the system; a modern System-on-Chip (SoC) for phones can have up to 30 different accelerators. Modern high-compute applications such as 5G, machine learning, and autonomous driving vehicles require accelerators to keep up with their rapidly evolving standards and computation needs. However, with the rising design costs at newer technology nodes, the iterative development of inflexible accelerators becomes prohibitively expensive. Reconfigurable architectures, with their ability to adapt to rapidly evolving standards and to accommodate several such high-performance applications in the system, provide an ideal solution. The motivation of this dissertation is to develop such a Coarse Grain Reconfigurable Architecture, called the Universal Digital Signal Processor (UDSP), which could replace accelerator blocks in an SoC, and to develop a hardware management system that enables concurrent multiprogram functionality on reconfigurable architectures. UDSP consists of 196 Compute Elements (CEs) and a statistics-based scalable, delayless, high-speed routing network. It is developed using an algorithm-driven framework to allow for faster development of each successive revision of the design. The tileable and scalable nature of UDSP allowed us to put together 4 UDSP dies on a 10 µm fine-pitch Silicon Interconnect Fabric interposer as a 2×2 UDSP Multi-Chip Module (MCM), quadrupling the number of compute resources. The 2×2 UDSP assembly has a peak throughput of 3,450 Giga-Operations per second (GOPS), or 1,725 Giga-Multiply-Accumulates per second (GMACs), at a 1.1 GHz clock frequency while consuming 6 W of power, including 0.38 pJ/bit to transfer data across dies, in TSMC 16nm. It achieves a peak efficiency of 785 GMACs/J (0.42 V, 315 MHz). UDSP lies within a 4.2× energy-efficiency and 6.4× area-efficiency gap relative to ASICs at nominal operating conditions (0.8 V, 1.1 GHz).

Multiprogram tenancy on conventional reconfigurable arrays requires high manual effort from the programmer to foresee and account for runtime program dynamics during compilation. The inability to predict runtime and multiprogram dynamics places the recompilation time of programs in the critical timing path, leading to long reconfiguration times, poor active resource utilization, and low acceleration performance. We developed an active hardware resource management system for reconfigurable arrays that automatically accounts for multiprogram dynamics at runtime, eases the workload of the programmer, and improves the array's performance. These hardware management techniques enable dynamic runtime relocation of programs on the Runtime Reconfigurable Array (RTRA) with minimal reconfiguration latency overhead, which allows the array to offer Accelerator as a Service (ACAS). The ACAS architecture virtualizes the array by spatially and temporally scheduling multiple programs on its available resources, thus achieving higher active utilization for the mapped programs on the array. ACAS allows developers to compile programs for acceleration on the reconfigurable array without requiring additional manual steps for runtime resource planning at compile time. Provided with high program pressure, ACAS exceeds 90% active utilization of the arrays. For signal processing workloads, our simulated 9×12 RTRA uses 3× less area and delivers 3.2–4.3× more throughput than an 18×18 UDSP, and the 18×18 RTRA delivers 8–14× more throughput compared to its equivalent-sized 18×18 UDSP counterpart.
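
As a quick consistency check, the quoted peak-throughput figures line up with 196 CEs per die, four dies, and a 1.1 GHz clock if each CE is assumed to complete two multiply-accumulates per cycle; that per-CE factor is an assumption made here to reconcile the numbers, not a figure stated in the description:

```python
# Back-of-the-envelope check of the quoted 2x2 UDSP MCM throughput figures.
ces_per_die = 196   # Compute Elements per UDSP die (from the description)
dies        = 4     # 2x2 multi-chip module
clock_ghz   = 1.1   # nominal clock frequency
macs_per_ce = 2     # assumed MACs per CE per cycle (not stated in the description)

gmacs = ces_per_die * dies * macs_per_ce * clock_ghz  # ~1,725 GMAC/s
gops  = gmacs * 2                                     # a MAC counted as 2 ops -> ~3,450 GOPS
print(f"{gmacs:.0f} GMAC/s, {gops:.0f} GOPS")
```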

From Variability Tolerance to Approximate Computing in Parallel Integrated Architectures and Accelerators

Author: Abbas Rahimi
Publisher: Springer
ISBN: 3319537687
Category : Technology & Engineering
Languages : en
Pages : 204

Book Description
This book focuses on computing devices and their design at various levels to combat variability. The authors provide a review of key concepts with particular emphasis on timing errors caused by various variability sources. They discuss methods to predict and prevent, detect and correct such errors, and finally the conditions under which such errors can be accepted; they also consider the implications for cost, performance, and quality. Coverage includes a comparative evaluation of methods for deployment across various layers of the system, from circuits and architecture to application software. These can be combined in various ways to achieve specific goals related to the observability and controllability of variability effects, providing means to achieve cross-layer or hybrid resilience.

Architecture and Compiler Support for a VLIW Execution Model of a Coarse-grained Reconfigurable Array

Author: Nathaniel McVicar
Publisher:
ISBN:
Category : Array processors
Languages : en
Pages : 67

Book Description


Efficient Processing of Deep Neural Networks

Author: Vivienne Sze
Publisher: Springer Nature
ISBN: 3031017668
Category : Technology & Engineering
Languages : en
Pages : 254

Book Description
This book provides a structured treatment of the key principles and techniques for enabling efficient processing of deep neural networks (DNNs). DNNs are currently widely used for many artificial intelligence (AI) applications, including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, this accuracy comes at the cost of high computational complexity. Therefore, techniques that enable efficient processing of deep neural networks, improving key metrics such as energy efficiency, throughput, and latency without sacrificing accuracy or increasing hardware costs, are critical to enabling the wide deployment of DNNs in AI systems. The book includes background on DNN processing; a description and taxonomy of hardware architectural approaches for designing DNN accelerators; key metrics for evaluating and comparing different designs; features of DNN processing that are amenable to hardware/algorithm co-design to improve energy efficiency and throughput; and opportunities for applying new technologies. Readers will find a structured introduction to the field as well as a formalization and organization of key concepts from contemporary work that provide insights that may spark new ideas.

Adaptable Embedded Systems

Author: Antonio Carlos Schneider Beck
Publisher: Springer Science & Business Media
ISBN: 1461417465
Category : Technology & Engineering
Languages : en
Pages : 321

Book Description
As embedded systems become more complex, designers face a number of challenges at different levels: they need to boost performance while keeping energy consumption as low as possible, they need to reuse existing software code, and at the same time they need to take advantage of the extra logic available on the chip, represented by multiple processors working together. This book describes several strategies to achieve these different and interrelated goals through the use of adaptability. Coverage includes reconfigurable systems, dynamic optimization techniques such as binary translation and trace reuse, new memory architectures including homogeneous and heterogeneous multiprocessor systems, communication issues and NoCs, fault tolerance against fabrication defects and soft errors, and finally, how several of these techniques can be combined to achieve higher levels of performance and adaptability. The discussion also covers how to employ specialized software to improve this new adaptive system, and how this new kind of software must be designed and programmed.

High-Performance Computing Using FPGAs

Author: Wim Vanderbauwhede
Publisher: Springer Science & Business Media
ISBN: 1461417910
Category : Technology & Engineering
Languages : en
Pages : 798

Book Description
High-Performance Computing Using FPGAs covers the area of high-performance reconfigurable computing (HPRC), providing an overview of architectures, tools, and applications for HPRC. FPGAs offer very high I/O bandwidth and fine-grained, custom, and flexible parallelism; together with ever-increasing computational needs, the frequency/power wall, the growing maturity and capabilities of FPGAs, and the advent of multicore processors, this has driven the acceptance of parallel computational models. The part on architectures introduces different FPGA-based HPC platforms: attached co-processor HPRC architectures such as CHREC's Novo-G and EPCC's Maxwell systems; tightly coupled HPRC architectures, e.g. the Convey hybrid-core computer; reconfigurably networked HPRC architectures, e.g. the QPACE system; and standalone HPRC architectures such as EPFL's CONFETTI system. The part on tools focuses on high-level programming approaches for HPRC, with chapters on C-to-gate tools (such as Impulse-C, AutoESL, Handel-C, MORA-C++), graphical tools (MATLAB-Simulink, NI LabVIEW), domain-specific languages, and languages for heterogeneous computing (for example OpenCL and Microsoft's Kiwi and Alchemy projects). The part on applications presents cases from several application domains where HPRC has been used successfully, such as bioinformatics and computational biology, financial computing, stencil computations, information retrieval, lattice QCD, astrophysics simulations, and weather and climate modeling.

Dynamically Reconfigurable Systems

Author: Marco Platzner
Publisher: Springer Science & Business Media
ISBN: 9048134854
Category : Technology & Engineering
Languages : en
Pages : 455

Book Description
Dynamically Reconfigurable Systems is the first book to focus on the emerging field of dynamically reconfigurable computing systems. While programmable logic and design-time configurability are well elaborated and covered by various texts, this book presents a unique overview of the state of the art and recent results for dynamic and run-time reconfigurable computing systems. Reconfigurable hardware is not only of utmost importance for large manufacturers and vendors of microelectronic devices and systems, but also a very attractive technology for smaller and medium-sized companies. Hence, Dynamically Reconfigurable Systems also addresses researchers and engineers actively working in the field and provides them with information on the newest developments and trends in dynamic and run-time reconfigurable systems.

Reconfigurable Computing

Author: Joao Cardoso
Publisher: Springer Science & Business Media
ISBN: 1461400619
Category : Technology & Engineering
Languages : en
Pages : 308

Book Description
As the complexity of modern embedded systems increases, it becomes less practical to design monolithic processing platforms. As a result, reconfigurable computing is being adopted widely for more flexible design. Reconfigurable computers offer the spatial parallelism and fine-grained customizability of application-specific circuits with the post-fabrication programmability of software. To make the most of this unique combination of performance and flexibility, designers need to be aware of both hardware and software issues. FPGA users must think not only about the gates needed to perform a computation but also about the software flow that supports the design process. The goal of this book is to help designers become comfortable with these issues, and thus be able to exploit the vast opportunities possible with reconfigurable logic.