PDF/Textbook Efficient Inference Using Deep Convolutional Neural Networks On Resource Constrained Platforms Full Download

Efficient Inference Using Deep Convolutional Neural Networks on Resource-constrained Platforms PDF

Author: Mohammad Motamedi
Publisher:
ISBN: 9781085572187
Category :
Languages : en
Pages :

Book Description
Deep Convolutional Neural Networks (CNNs) exhibit remarkable performance in many pattern recognition, segmentation, classification, and comprehension tasks that were widely considered open problems for most of computing history. For example, CNNs have been shown to outperform humans in certain visual object recognition tasks. Given the significant potential of CNNs in advancing autonomy and intelligence in systems, the Internet of Things (IoT) research community has witnessed a surge in demand for CNN-enabled data processing, technically referred to as inference, for critical tasks such as visual, voice, and language comprehension. Inference using modern CNNs involves billions of operations on millions of parameters, and thus their deployment requires significant compute, storage, and energy resources. However, such resources are scarce in many resource-constrained IoT applications. Designing an efficient CNN architecture is the first step in alleviating this problem. Asymmetric kernels, breadth control techniques, and reduce-expand structures are among the most important approaches that can effectively decrease the parameter budget and computational intensity of CNNs. The architectural efficiency can be further improved by eliminating ineffective neurons using pruning algorithms, and by quantizing the parameters to decrease the model size. Hardware-driven optimization is the subsequent step in addressing the computational demands of deep neural networks. Mobile Systems on Chip (SoCs), which usually include a mobile GPU, a DSP, and a number of CPU cores, are great candidates for CNN inference on embedded platforms. Depending on the application, it is also possible to develop customized FPGA-based and ASIC-based accelerators. ASIC-based acceleration drastically outperforms other approaches in terms of both power consumption and execution time.
However, using this approach is reasonable only if designing a new chip is economically justifiable for the target application. This dissertation aims to bridge the gap between the computational demands of CNNs and the computational capabilities of embedded platforms. We contend that one has to strike a judicious balance between the functional requirements of a CNN and its resource requirements for an IoT application to be able to utilize the CNN. We investigate several concrete formulations of this broad concept, and propose effective approaches for addressing the identified challenges. First, we target platforms that are equipped with reconfigurable fabric, such as Field Programmable Gate Arrays (FPGAs), and offer a framework for generation of optimized FPGA-based CNN accelerators. Our solution leverages an analytical approach to characterization and exploration of the accelerator design space, through which it synthesizes an efficient accelerator for a given CNN on a specific FPGA. Second, we investigate the problem of CNN inference on mobile SoCs, propose effective approaches for CNN parallelization targeting such platforms, and explore the underlying tradeoffs. Finally, in the last part of this dissertation, we investigate utilization of an existing optimized CNN model to automatically generate a competitive CNN for an IoT application whose objects of interest are a fraction of the categories that the original CNN was designed to classify, such that the resource requirement of inference using the synthesized CNN is proportionally scaled down. We use the term resource scalability to refer to this concept and propose solutions for automated synthesis of context-aware, resource-scalable CNNs that meet the functional requirements of the target IoT application at a fraction of the resource requirements of the original CNN.
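
The description above names asymmetric kernels as one way to shrink a CNN's parameter budget. As a rough illustration only, the sketch below counts weights for a standard k x k convolution and for a k x 1 convolution followed by a 1 x k convolution. The function names and the 64-channel layer sizes are assumptions chosen for the example; they are not taken from the dissertation.

```python
def conv_params(k_h, k_w, c_in, c_out):
    """Weight count of a convolution layer (bias terms ignored)."""
    return k_h * k_w * c_in * c_out


def asymmetric_pair_params(k, c_in, c_out):
    """Weight count when a k x k convolution is replaced by a k x 1
    convolution (c_in -> c_out) followed by a 1 x k convolution
    (c_out -> c_out). This factorization is an illustrative assumption."""
    return conv_params(k, 1, c_in, c_out) + conv_params(1, k, c_out, c_out)


# Hypothetical layer: 3 x 3 kernel, 64 input and 64 output channels.
full = conv_params(3, 3, 64, 64)
pair = asymmetric_pair_params(3, 64, 64)
print(full, pair, pair / full)
```

With equal input and output channel counts, the asymmetric pair uses 2/k of the weights of the full kernel, so the saving grows with kernel size.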

Efficient Implementation of Deep Neural Networks on Resource-constrained Devices PDF

Author: Maedeh Hemmat
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Book Description
In recent years, Deep Neural Networks (DNNs) have emerged as an impressively successful model for performing complicated tasks, including object classification, speech recognition, and autonomous driving. To provide better accuracy, state-of-the-art neural network models are designed to be deeper (i.e., having more layers) and larger (i.e., having more parameters within each layer). This has in turn increased the computational and memory costs of DNNs, mandating their efficient hardware implementation, especially on resource-constrained devices such as embedded systems and mobile devices. This challenge can be investigated from two aspects: computation and storage. On one hand, state-of-the-art DNNs require the execution of billions of operations for each inference, while the computational power of embedded systems is tightly limited. On the other hand, DNN models require storage of several megabytes of parameters, which cannot fit in the on-chip memory of these devices. More importantly, these systems are usually battery-powered, with a limited energy budget for accessing memory and performing computations. This dissertation aims to make contributions towards improving the efficiency of DNN deployments on resource-constrained devices. Our contributions can be categorized into three aspects. First, we propose an iterative framework that enables dynamic reconfiguration of an already-trained Convolutional Neural Network (CNN) in hardware during inference. The reconfiguration enables input-dependent approximation of the CNN at run-time, leading to significant energy savings without any significant degradation in classification accuracy. Our proposed framework breaks each inference into several iterations and fetches only a fraction of the weights from off-chip memory at each iteration to perform the computations. It then decides to either terminate the network or fetch more weights to do the inference, based on the difficulty of the received input.
The termination condition can also be adjusted to trade off classification accuracy and energy consumption at run-time. Second, we exploit the user-dependent behavior of DNNs and propose a personalized inference framework that prunes an already-trained neural network model based on the preferences of individual users, without the need to retrain the network. Our key observation is that an individual user may only encounter a tiny fraction of the trained classes on a regular basis. Hence, storing trained models (pruned or not) for all possible classes on local devices is costly and unnecessary for the user's needs. Our personalized framework minimizes the memory, computation, and energy consumption of the network on the local device, as it processes neurons on an as-needed basis (i.e., only when the user expects to encounter a specific output class). Third, we propose a framework for distributed inference of DNNs across multiple edge devices to reduce the communication and latency overheads. Our framework utilizes many parallel, independently running edge devices, which communicate only once with a single 'back-end' device (also an edge device) to aggregate their predictions and produce the result of the inference. To achieve this distributed implementation, our framework first partitions the classes of the complex DNN into subsets to be assigned across the available edge devices while considering the computational resources of each device. The DNN is then aggressively pruned for each device for its set of assigned classes. Each smaller DNN (SNN) is further configured to return a 'Don't Know' when it encounters an input from an unassigned class. Each SNN is generated from the complex DNN at the beginning and then loaded onto its corresponding edge device, without the need for retraining. To perform inference, each SNN will perform an inference based on its received input.
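
The description stops before saying how the back-end device combines the SNN outputs. The sketch below shows one plausible aggregation rule, purely as an assumption: ignore every 'Don't Know' answer and return the remaining label with the highest reported confidence. The `aggregate` function and the (label, confidence) tuple format are hypothetical and are not taken from the dissertation.

```python
DONT_KNOW = "Don't Know"


def aggregate(predictions):
    """Combine per-device predictions on the back-end device.

    predictions: list of (label, confidence) pairs, one per edge device.
    Assumed rule (not from the source): discard 'Don't Know' answers and
    pick the most confident remaining label.
    """
    known = [(label, conf) for label, conf in predictions if label != DONT_KNOW]
    if not known:
        # No device recognized the input as one of its assigned classes.
        return DONT_KNOW
    best_label, _ = max(known, key=lambda pair: pair[1])
    return best_label


# Hypothetical outputs from three edge devices for a single input.
result = aggregate([(DONT_KNOW, 0.9), ("cat", 0.7), ("dog", 0.4)])
print(result)
```

Because each device reports once, the back-end needs only a single pass over the collected answers, which matches the single round of communication described above.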

Efficient Processing of Deep Neural Networks PDF

Author: Vivienne Sze
Publisher: Springer Nature
ISBN: 3031017668
Category : Technology & Engineering
Languages : en
Pages : 254

Book Description
This book provides a structured treatment of the key principles and techniques for enabling efficient processing of deep neural networks (DNNs). DNNs are currently widely used for many artificial intelligence (AI) applications, including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Therefore, techniques that enable efficient processing of deep neural networks to improve key metrics—such as energy-efficiency, throughput, and latency—without sacrificing accuracy or increasing hardware costs are critical to enabling the wide deployment of DNNs in AI systems. The book includes background on DNN processing; a description and taxonomy of hardware architectural approaches for designing DNN accelerators; key metrics for evaluating and comparing different designs; features of DNN processing that are amenable to hardware/algorithm co-design to improve energy efficiency and throughput; and opportunities for applying new technologies. Readers will find a structured introduction to the field as well as formalization and organization of key concepts from contemporary work that provide insights that may spark new ideas.

Towards Efficient Inference and Improved Training Efficiency of Deep Neural Networks PDF

Author: Ravi Shanker Raju (Ph.D.)
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Book Description
In recent years, deep neural networks have surpassed human performance on image classification and speech recognition tasks. While current models can reach state-of-the-art performance on stand-alone benchmarks, deploying them on embedded systems with real-time latency deadlines either causes them to miss these deadlines or severely degrades their performance in order to meet the stated specifications. This requires intelligent design of the network architecture in order to minimize the accuracy degradation when deployed on the edge. Similarly, deep learning often has a long turnaround time due to the volume of experiments over different hyperparameters, which consumes time and resources. This motivates a need for developing training strategies that allow researchers who do not have access to large computational resources to train large models without waiting for exorbitant training cycles to be completed. This dissertation addresses these concerns through data-dependent pruning of deep learning computation. First, regarding inference, we propose an integration of two different conditional execution strategies, which we call FBS-pruned CondConv, by noticing that if we use input-specific filters instead of standard convolutional filters, we can aggressively prune at higher rates and mitigate accuracy degradation for significant computation savings. Then, regarding long training times, we introduce our dynamic data pruning framework, which takes ideas from active learning and reinforcement learning to dynamically select subsets of data to train the model. Finally, as opposed to pruning data and in the same spirit of reducing training time, we investigate the vision transformer and introduce a unique training method called PatchDrop (originally designed for robustness to occlusions on transformers [1]), which uses the self-supervised DINO [2] model to identify the salient patches in an image and train on the salient subsets of an image.
These strategies and training methods take a step toward making models more accessible to deploy on edge devices in an efficient inference context and lower the barrier for independent researchers to train deep learning models that would otherwise require immense computational resources, pushing towards the democratization of machine learning.
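
The idea of training only on salient patches can be sketched in a few lines. The example below assumes that a saliency score per patch is already available (in the description these would come from the DINO model) and simply keeps the highest-scoring fraction. The function name, the `keep_ratio` parameter, and the scores are hypothetical; this is a simplification, not the dissertation's PatchDrop procedure.

```python
def select_salient_patches(scores, keep_ratio):
    """Return the indices of the most salient patches, in image order.

    scores: one saliency value per patch (assumed to be precomputed).
    keep_ratio: fraction of patches to keep, between 0 and 1.
    """
    if not scores:
        return []
    keep = max(1, int(len(scores) * keep_ratio))
    # Rank patch indices by descending saliency and keep the top ones.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


# Hypothetical saliency scores for a 2 x 2 grid of patches.
kept = select_salient_patches([0.1, 0.9, 0.3, 0.8], keep_ratio=0.5)
print(kept)
```

Returning the indices in their original order keeps the positional information of the surviving patches intact for the model that consumes them.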

IoT-enabled Convolutional Neural Networks: Techniques and Applications PDF

Author: Mohd Naved
Publisher: CRC Press
ISBN: 1000879690
Category : Computers
Languages : en
Pages : 409

Book Description
Convolutional neural networks (CNNs) are a type of deep neural network that has become dominant in a variety of computer vision tasks. In recent years, CNNs have attracted interest across a variety of domains due to their high efficiency at extracting meaningful information from visual imagery. CNNs excel at a wide range of machine learning and deep learning tasks. As sensor-enabled internet of things (IoT) devices pervade every aspect of modern life, it is becoming increasingly critical to run CNN inference, a computationally intensive application, on resource-constrained devices. Through this edited volume, we aim to provide a structured presentation of CNN-enabled IoT applications in vision, speech, and natural language processing. This book discusses a variety of CNN techniques and applications, including but not limited to IoT-enabled CNNs for speech denoising, a smart app for visually impaired people, disease detection, ECG signal analysis, weather monitoring, and texture analysis. Unlike other books on the market, this book covers the tools, techniques, and challenges associated with the implementation of CNN algorithms, computation time, and the complexity associated with reasoning and modelling various types of data. We have also included current research trends and future directions for CNNs.

Efficient Inference of Convolutional Neural Networks on General Purpose Hardware Using Weight Repetition PDF

Author: Rohit Agrawal
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description

Early Soft Error Reliability Assessment of Convolutional Neural Networks Executing on Resource-Constrained IoT Edge Devices PDF

Author: Geancarlo Abich
Publisher: Springer Nature
ISBN: 3031185994
Category : Technology & Engineering
Languages : en
Pages : 143

Book Description
This book describes an extensive and consistent soft error assessment of convolutional neural network (CNN) models from different domains through more than 14.8 million fault injections, considering different precision bit-width configurations, optimization parameters, and processor models. The authors also analyze the trade-offs among relative performance, memory utilization, and soft error reliability for different CNN models, comparing a compiler-based technique with traditional redundancy approaches.
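
A soft error is commonly modeled as a single bit flip in a stored value. The sketch below shows what one such fault injection could look like for an 8-bit two's-complement weight. The `flip_bit_int8` helper and the example value are assumptions made for illustration; the book's actual fault-injection framework is not described here.

```python
def flip_bit_int8(value, bit):
    """Flip one bit of a signed 8-bit integer and return the new value.

    value: integer in [-128, 127], e.g. a quantized weight.
    bit: position to flip, 0 (least significant) to 7 (sign bit).
    """
    if not -128 <= value <= 127:
        raise ValueError("value must fit in a signed 8-bit integer")
    if not 0 <= bit <= 7:
        raise ValueError("bit must be between 0 and 7")
    raw = (value & 0xFF) ^ (1 << bit)        # flip in the unsigned view
    return raw - 256 if raw >= 128 else raw  # back to two's complement


# Hypothetical weight of 5: a flip in the low bit barely changes it,
# while a flip in the sign bit changes it drastically.
print(flip_bit_int8(5, 0), flip_bit_int8(5, 7))
```

The same fault can therefore be harmless or severe depending on which bit it hits, which is one reason the bit-width configuration matters in a reliability assessment.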

Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing PDF

Author: Sudeep Pasricha
Publisher: Springer Nature
ISBN: 3031399323
Category : Technology & Engineering
Languages : en
Pages : 481

Book Description
This book presents recent advances towards the goal of enabling efficient implementation of machine learning models on resource-constrained systems, covering different application domains. The focus is on presenting interesting and new use cases of applying machine learning to innovative application domains; exploring the efficient hardware design of machine learning accelerators and memory optimization techniques; illustrating model compression and neural architecture search techniques for energy-efficient and fast execution on resource-constrained hardware platforms; and understanding hardware-software codesign techniques for achieving even greater energy, reliability, and performance benefits. The book discusses efficient implementation of machine learning in embedded, CPS, IoT, and edge computing; offers comprehensive coverage of hardware design, software design, and hardware/software co-design and co-optimization; and describes real applications to demonstrate how embedded, CPS, IoT, and edge applications benefit from machine learning.

Author: Sudeep Pasricha
Publisher: Springer Nature
ISBN: 303140677X
Category : Technology & Engineering
Languages : en
Pages : 571

Book Description
This book presents recent advances towards the goal of enabling efficient implementation of machine learning models on resource-constrained systems, covering different application domains. The focus is on presenting interesting and new use cases of applying machine learning to innovative application domains; exploring the efficient hardware design of machine learning accelerators and memory optimization techniques; illustrating model compression and neural architecture search techniques for energy-efficient and fast execution on resource-constrained hardware platforms; and understanding hardware-software codesign techniques for achieving even greater energy, reliability, and performance benefits. The book discusses efficient implementation of machine learning in embedded, CPS, IoT, and edge computing; offers comprehensive coverage of hardware design, software design, and hardware/software co-design and co-optimization; and describes real applications to demonstrate how embedded, CPS, IoT, and edge applications benefit from machine learning.