Integrated Feature Subset Selection/extraction with Applications in Bioinformatics

Integrated Feature Subset Selection/extraction with Applications in Bioinformatics PDF Author:
Publisher:
ISBN:
Category :
Languages : en
Pages : 209

Book Description
Feature subset selection and extraction algorithms are actively and extensively studied in the machine learning literature as a way to reduce the dimensionality of the feature space, since many machine learning and pattern recognition algorithms do not handle high-dimensional data sets efficiently or effectively. In the analysis of large-scale bioinformatics data sets, such as microarray gene expression data sets, the high dimensionality of the feature space, compounded by the low dimensionality of the sample space, creates even more problems for data analysis algorithms. Two foremost characteristics of microarray gene expression data sets are: (1) the correlation between features (genes) and (2) the availability of domain knowledge in computable format. In this dissertation, we study effective feature selection and extraction algorithms with applications to the analysis of newly emerging data sets in the bioinformatics domain. Microarray gene expression data, the result of large-scale RNA profiling techniques, are our primary focus. Several novel feature (gene) selection and extraction algorithms are proposed to deal with the peculiarities of microarray gene expression data sets. To address the first characteristic, we first propose a general feature selection algorithm called Boost Feature Subset Selection (BFSS), based on permutation analysis, to broaden the scope of the selected gene set and thus improve classification performance. In BFSS, subsequent features to be selected focus on those samples where previously selected features fail. Our experiments showed the benefit of BFSS for t-score and S2N (signal-to-noise) based single-gene scores on a variety of publicly available microarray gene expression data sets. We then examine the correlations among features (genes) explicitly to see whether such correlations are informative for the purpose of sample classification.
This results in our gene extraction algorithm, called virtual gene. A virtual gene is a group of genes whose expression levels are combined linearly. The combined expression levels of a virtual gene, instead of the real gene expression levels, are used for sample classification. Our experiments confirm that by taking into consideration the correlations between gene pairs, we could indeed build a better sample classifier. A microarray gene expression data set represents only one aspect of our knowledge of the underlying biological system. Currently, a great deal of biological knowledge in computable format can be accessed from the Internet. Continuing to address the second characteristic of the microarray gene expression data set, we investigate the integration of domain knowledge, such as that embedded in gene ontology annotations, for use in gene selection and extraction. (Abstract shortened by UMI.)
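The two ideas named in the abstract, a single-gene S2N score and a virtual gene formed as a linear combination of genes, can be illustrated with a minimal sketch. This is not the dissertation's BFSS or virtual gene algorithm: the S2N formula used here is the common (mean difference) / (sum of standard deviations) definition, and the function names, the synthetic data, and the equal weights are assumptions chosen only for illustration.

```python
import numpy as np

def s2n_scores(X, y):
    """Signal-to-noise score per gene: (mean1 - mean0) / (std1 + std0).

    X: (n_samples, n_genes) expression matrix; y: binary labels (0/1).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    eps = 1e-12  # guards against division by zero for constant genes
    spread = pos.std(axis=0, ddof=1) + neg.std(axis=0, ddof=1) + eps
    return (pos.mean(axis=0) - neg.mean(axis=0)) / spread

def virtual_gene(X, gene_idx, weights):
    """Linearly combine the chosen gene columns into a single feature.

    The weights are hypothetical; the abstract does not say how they are set.
    """
    X = np.asarray(X, dtype=float)
    return X[:, gene_idx] @ np.asarray(weights, dtype=float)

# Synthetic data: 20 samples, 5 genes, gene 2 made informative on purpose.
rng = np.random.default_rng(0)
y = np.array([0] * 10 + [1] * 10)
X = rng.normal(size=(20, 5))
X[:, 2] += 5.0 * y

scores = s2n_scores(X, y)
top_gene = int(np.argmax(np.abs(scores)))   # index of the best-scoring gene
vg = virtual_gene(X, [0, 2], [0.5, 0.5])    # one combined feature per sample
```

Ranking genes by absolute S2N score gives a simple filter-style selection; the virtual gene column could then replace the individual genes as classifier input.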

Feature Extraction

Feature Extraction PDF Author: Isabelle Guyon
Publisher: Springer
ISBN: 3540354883
Category : Computers
Languages : en
Pages : 765

Book Description
This book is both a reference for engineers and scientists and a teaching resource, featuring tutorial chapters and research papers on feature extraction. Until now there has been insufficient consideration of feature selection algorithms, no unified presentation of leading methods, and no systematic comparisons.

Feature Extraction, Construction and Selection

Feature Extraction, Construction and Selection PDF Author: Huan Liu
Publisher: Springer Science & Business Media
ISBN: 1461557259
Category : Computers
Languages : en
Pages : 418

Book Description
There is broad interest in feature extraction, construction, and selection among practitioners from statistics, pattern recognition, and data mining to machine learning. Data preprocessing is an essential step in the knowledge discovery process for real-world applications. This book compiles contributions from many leading and active researchers in this growing field and paints a picture of the state-of-the-art techniques that can boost the capabilities of many existing data mining tools. The objective of this collection is to increase the awareness of the data mining community about research on feature extraction, construction and selection, which is currently conducted mainly in isolation. This book is part of our endeavor to produce a contemporary overview of modern solutions, to create synergy among these seemingly different branches, and to pave the way for developing meta-systems and novel approaches. Even with today's advanced computer technologies, discovering knowledge from data can still be fiendishly hard due to the characteristics of computer-generated data. Feature extraction, construction and selection are a set of techniques that transform and simplify data so as to make data mining tasks easier. Feature construction and selection can be viewed as two sides of the representation problem.

Feature Selection for High-Dimensional Data

Feature Selection for High-Dimensional Data PDF Author: Verónica Bolón-Canedo
Publisher: Springer
ISBN: 3319218581
Category : Computers
Languages : en
Pages : 163

Book Description
This book offers a coherent and comprehensive approach to feature subset selection in the scope of classification problems, explaining the foundations, real application problems and the challenges of feature selection for high-dimensional data. The authors first focus on the analysis and synthesis of feature selection algorithms, presenting a comprehensive review of basic concepts and experimental results of the most well-known algorithms. They then address different real scenarios with high-dimensional data, showing the use of feature selection algorithms in different contexts with different requirements and information: microarray data, intrusion detection, tear film lipid layer classification and cost-based features. The book then delves into the scenario of big dimension, paying attention to important problems under high-dimensional spaces, such as scalability, distributed processing and real-time processing, scenarios that open up new and interesting challenges for researchers. The book is useful for practitioners, researchers and graduate students in the areas of machine learning and data mining.

Data Mining for Bioinformatics

Data Mining for Bioinformatics PDF Author: Sumeet Dua
Publisher: CRC Press
ISBN: 1466588667
Category : Computers
Languages : en
Pages : 351

Book Description
Covering theory, algorithms, and methodologies, as well as data mining technologies, Data Mining for Bioinformatics provides a comprehensive discussion of data-intensive computations used in data mining with applications in bioinformatics. It supplies a broad, yet in-depth, overview of the application domains of data mining for bioinformatics to he

Unsupervised Feature Extraction Applied to Bioinformatics

Unsupervised Feature Extraction Applied to Bioinformatics PDF Author: Y-h. Taguchi
Publisher: Springer Nature
ISBN: 3030224562
Category : Technology & Engineering
Languages : en
Pages : 321

Book Description
This book proposes applications of tensor decomposition to unsupervised feature extraction and feature selection. The author posits that although supervised methods including deep learning have become popular, unsupervised methods have their own advantages. He argues that this is the case because unsupervised methods are easy to learn, since tensor decomposition is a conventional linear methodology. This book starts from very basic linear algebra and reaches cutting-edge methodologies applied to difficult situations in which there are many features (variables) while only a small number of samples are available. The author includes advanced descriptions of tensor decomposition, including Tucker decomposition using higher-order singular value decomposition as well as higher-order orthogonal iteration, and tensor train decomposition. The author concludes by showing unsupervised methods and their application to a wide range of topics. Allows readers to analyze data sets with small samples and many features; Provides a fast algorithm, based upon linear algebra, to analyze big data; Includes several applications to multi-view data analyses, with a focus on bioinformatics.
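As a rough illustration of the Tucker decomposition via higher-order singular value decomposition mentioned above, the following NumPy sketch unfolds a tensor along each mode, takes the leading left singular vectors as factor matrices, and projects the tensor onto them to obtain the core. It is a generic textbook-style sketch, not code from the book; the function names and the random test tensor are assumptions.

```python
import numpy as np

def unfold(T, mode):
    """Matricize tensor T so that rows index the given mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    out = np.tensordot(M, T, axes=(1, mode))  # contracted axis is replaced, new axis comes first
    return np.moveaxis(out, 0, mode)

def hosvd(T, ranks):
    """Truncated HOSVD: return the core tensor and one factor matrix per mode."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])  # leading r left singular vectors
    core = T
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Rebuild a tensor from a core and its factor matrices."""
    T = core
    for mode, U in enumerate(factors):
        T = mode_product(T, U, mode)
    return T

rng = np.random.default_rng(0)
T = rng.normal(size=(4, 5, 6))

# Full ranks: the factors are orthogonal, so reconstruction is exact.
core_full, factors_full = hosvd(T, (4, 5, 6))
T_hat = reconstruct(core_full, factors_full)

# Truncated ranks: a smaller core gives a low-rank approximation.
core_small, factors_small = hosvd(T, (2, 3, 3))
```

In an unsupervised feature extraction setting, the columns of each factor matrix play the role of singular vectors for one mode (for example genes or samples), which is what makes the approach usable when samples are few and features are many.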

Guide to Convolutional Neural Networks

Guide to Convolutional Neural Networks PDF Author: Hamed Habibi Aghdam
Publisher: Springer
ISBN: 3319575503
Category : Computers
Languages : en
Pages : 303

Book Description
This must-read text/reference introduces the fundamental concepts of convolutional neural networks (ConvNets), offering practical guidance on using libraries to implement ConvNets in applications of traffic sign detection and classification. The work presents techniques for optimizing the computational efficiency of ConvNets, as well as visualization techniques to better understand the underlying processes. The proposed models are also thoroughly evaluated from different perspectives, using exploratory and quantitative analysis. Topics and features: explains the fundamental concepts behind training linear classifiers and feature learning; discusses the wide range of loss functions for training binary and multi-class classifiers; illustrates how to derive ConvNets from fully connected neural networks, and reviews different techniques for evaluating neural networks; presents a practical library for implementing ConvNets, explaining how to use a Python interface for the library to create and assess neural networks; describes two real-world examples of the detection and classification of traffic signs using deep learning methods; examines a range of varied techniques for visualizing neural networks, using a Python interface; provides self-study exercises at the end of each chapter, in addition to a helpful glossary, with relevant Python scripts supplied at an associated website. This self-contained guide will benefit those who seek to both understand the theory behind deep learning, and to gain hands-on experience in implementing ConvNets in practice. As no prior background knowledge in the field is required to follow the material, the book is ideal for all students of computer vision and machine learning, and will also be of great interest to practitioners working on autonomous cars and advanced driver assistance systems.

Feature Selection and Ensemble Methods for Bioinformatics

Feature Selection and Ensemble Methods for Bioinformatics PDF Author: Oleg Okun
Publisher: IGI Global
ISBN: 9781609605575
Category : Computers
Languages : en
Pages : 0

Book Description
"This book offers a unique perspective on machine learning aspects of microarray gene expression based cancer classification, combining computer science, and biology"--Provided by publisher.

Data Mining for Bioinformatics Applications

Data Mining for Bioinformatics Applications PDF Author: He Zengyou
Publisher: Woodhead Publishing
ISBN: 008100107X
Category : Computers
Languages : en
Pages : 100

Book Description
Data Mining for Bioinformatics Applications provides valuable information on the data mining methods that have been widely used for solving real bioinformatics problems, including problem definition, data collection, data preprocessing, modeling, and validation. The text uses an example-based method to illustrate how to apply data mining techniques to solve real bioinformatics problems, containing 45 bioinformatics problems that have been investigated in recent research. For each example, the entire data mining process is described, ranging from data preprocessing to modeling and result validation. Provides valuable information on the data mining methods that have been widely used for solving real bioinformatics problems; Uses an example-based method to illustrate how to apply data mining techniques to solve real bioinformatics problems; Contains 45 bioinformatics problems that have been investigated in recent research.

Advanced AI Techniques and Applications in Bioinformatics

Advanced AI Techniques and Applications in Bioinformatics PDF Author: Loveleen Gaur
Publisher: CRC Press
ISBN: 100046301X
Category : Technology & Engineering
Languages : en
Pages : 220

Book Description
Advanced AI techniques are essential for resolving various problematic aspects emerging in the field of bioinformatics. This book covers recent approaches in artificial intelligence and machine learning methods and their applications in genome and gene editing, cancer drug discovery classification, and protein folding algorithms, among others. Deep learning, which is widely used in image processing, is also applicable in bioinformatics as one of the most popular artificial intelligence approaches. The wide range of applications discussed in this book makes it an indispensable resource for computer scientists, engineers, biologists, mathematicians, physicians, and medical informaticists. Features: Focusses on the cross-disciplinary relation between computer science and biology and the role of machine learning methods in resolving complex problems in bioinformatics; Provides a comprehensive and balanced blend of topics and applications using various advanced algorithms; Presents cutting-edge research methodologies in the area of AI methods when applied to bioinformatics and innovative solutions; Discusses the AI/ML techniques, their use, and their potential for use in common and future bioinformatics applications; Includes recent achievements in AI and bioinformatics contributed by a global team of researchers.