Deep Structured Models for Large Scale Object Co-detection and Segmentation

Author: Zeeshan Hayder
Language: en

Book Description
Many image and scene understanding tasks in computer vision require structured decisions, including object detection, localization and semantic segmentation. Structured prediction deals with learning inherent structure by incorporating contextual information from several images and multiple tasks. However, it is very challenging on large-scale image datasets, where performance is limited by high computational cost and by the expressive power of the underlying representation learning techniques. In this thesis, we present efficient and effective deep structured models for context-aware object detection, co-localization and instance-level semantic segmentation.

First, we introduce a principled formulation for object co-detection using a fully-connected conditional random field (CRF). We build an explicit graph whose vertices represent object candidates (instead of pixel values) and whose edges encode object similarity via simple yet effective pairwise potentials. More specifically, we design a weighted mixture of Gaussian kernels for class-specific object similarity and formulate kernel weight estimation as a least-squares regression problem, whose solution can therefore be obtained in closed form. Furthermore, in contrast with traditional co-detection approaches, it has been shown that inference in such fully-connected CRFs can be performed efficiently using an approximate mean-field method with high-dimensional Gaussian filtering. This lets us effectively leverage information from multiple images.

Next, we extend our class-specific co-detection framework to multiple object categories. We model object candidates with rich, high-dimensional features learned using a deep convolutional neural network. In particular, our max-margin and direct-loss structural boosting algorithms enable us to learn the features that best encode pairwise similarity relationships within our CRF framework.
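
The mean-field inference over object candidates described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names (`gaussian_mixture_kernel`, `mean_field`), the Potts-style label compatibility, the two-label setup (0 = background, 1 = object) and all numeric values are hypothetical. The sketch also evaluates the dense pairwise sum explicitly, which costs O(n^2) per iteration; the high-dimensional Gaussian filtering mentioned in the description is what avoids that cost and is not reproduced here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def gaussian_mixture_kernel(f_i, f_j, weights, bandwidths):
    """Weighted mixture of Gaussian kernels on the feature distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(f_i, f_j))
    return sum(w * math.exp(-sq_dist / (2.0 * s * s))
               for w, s in zip(weights, bandwidths))


def mean_field(unary, features, weights, bandwidths, num_iters=5):
    """Approximate marginals of a fully-connected CRF over candidates.

    unary[i][l] is the energy (lower is better) of label l for candidate i.
    Assumption: Potts compatibility, so a candidate is penalized for
    disagreeing with candidates that have similar features.
    """
    n, num_labels = len(unary), len(unary[0])
    # Dense kernel matrix, computed naively (no Gaussian filtering here).
    kernel = [[0.0 if i == j else
               gaussian_mixture_kernel(features[i], features[j],
                                       weights, bandwidths)
               for j in range(n)] for i in range(n)]
    # Initialize marginals from the unary energies alone.
    q = [softmax([-u for u in unary[i]]) for i in range(n)]
    for _ in range(num_iters):
        new_q = []
        for i in range(n):
            scores = []
            for l in range(num_labels):
                # Mass that similar candidates place on labels other than l.
                penalty = sum(kernel[i][j] * (1.0 - q[j][l]) for j in range(n))
                scores.append(-unary[i][l] - penalty)
            new_q.append(softmax(scores))
        q = new_q  # parallel update: all candidates use the previous marginals
    return q


# Hypothetical toy pool: candidates 0 and 1 are confident objects, candidate 2
# is ambiguous but looks like them, candidate 3 is ambiguous and dissimilar.
unary = [[2.0, 0.1], [2.0, 0.1], [0.5, 0.5], [0.5, 0.5]]
features = [[0.0, 0.0], [0.1, 0.0], [0.05, 0.1], [5.0, 5.0]]
marginals = mean_field(unary, features, weights=[1.0, 0.5], bandwidths=[0.5, 1.0])
for i, m in enumerate(marginals):
    print(i, [round(p, 3) for p in m])
```

In this toy pool, the ambiguous candidate 2 is pulled toward the object label by its similar, confident neighbours, while the dissimilar candidate 3 stays at its unary belief. That is the sense in which a fully-connected CRF lets candidates from several images inform each other.
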
Furthermore, our approach guarantees time and space complexity of O(n t), where n is the total number of candidate boxes in the pool and t is the number of mean-field iterations. Moreover, our experiments demonstrate the importance of learning rich similarity measures to account for contextual relations across object classes and instances.

However, all these methods rely on precomputed object candidates (or proposals), so localization performance is limited by the quality of the bounding boxes. To address this, we present an efficient object proposal co-generation technique that leverages the collective power of multiple images. In particular, we design a deep neural network layer that takes unary and pairwise features as input, builds a fully-connected CRF and produces mean-field marginals as output. It also lets us backpropagate the gradient through the entire network by unrolling the iterations of CRF inference. Furthermore, this layer simplifies end-to-end learning, thus effectively benefiting from multiple candidates to co-generate high-quality object proposals.

Finally, we develop a multi-task strategy to jointly learn object detection, localization and instance-level semantic segmentation in a single network. In particular, we introduce a novel representation based on the distance transform of the object masks. To this end, we design a new residual-deconvolution architecture that infers such a representation and decodes it into the final binary object mask. We show that the predicted masks can extend beyond the bounding boxes and that the multiple tasks can benefit from each other.

In summary, in this thesis, we exploit the joint power of multiple images as well as multiple tasks to improve the generalization performance of structured learning. Our novel deep structured models, similarity learning techniques and residual-deconvolution architecture can be used to make accurate and reliable inferences for key vision tasks.
Furthermore, our quantitative and qualitative experiments on large-scale, challenging image datasets demonstrate the superiority of the proposed approaches over state-of-the-art methods.
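
The distance-transform mask representation mentioned in the description can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: the helper names (`distance_transform`, `decode_mask`), the brute-force Euclidean distance, and the decoding rule (a union of open disks) are not taken from the thesis, whose network predicts the representation rather than computing it from a known mask.

```python
import math


def distance_transform(mask):
    """Distance from each foreground pixel to the nearest background pixel.

    Background pixels get 0. Brute force, for illustration only.
    """
    h, w = len(mask), len(mask[0])
    background = [(r, c) for r in range(h) for c in range(w) if not mask[r][c]]
    dist = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                dist[r][c] = min(
                    (math.hypot(r - br, c - bc) for br, bc in background),
                    default=math.inf,  # mask with no background at all
                )
    return dist


def decode_mask(dist):
    """Recover a binary mask as the union of open disks.

    Each pixel with a positive value d contributes every pixel that lies
    strictly closer than d to it.
    """
    h, w = len(dist), len(dist[0])
    mask = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            radius = dist[r][c]
            if radius <= 0:
                continue
            for qr in range(h):
                for qc in range(w):
                    if math.hypot(qr - r, qc - c) < radius:
                        mask[qr][qc] = 1
    return mask


# Hypothetical 5x6 object mask.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
dist = distance_transform(mask)
decoded = decode_mask(dist)
```

One reason such a representation is attractive is that every pixel carries information about the object's extent, not only about its own label, so the mask can be recovered from the distance values alone.
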