Improving text classification with Boolean retrieval for rare categories

Author: Robert F. Chew
Publisher: RTI Press
ISBN:
Category : Self-Help
Languages : en
Pages : 18

Book Description
Advancements in machine learning and natural language processing have made text classification increasingly attractive for information retrieval. However, developing text classifiers is challenging when no prior labeled data are available for a rare category of interest. Finding instances of the rare class using a uniform random sample can be inefficient and costly due to the rare category’s low base rate. This work presents an approach that combines the strengths of text classification and Boolean retrieval to help learn rare concepts of interest. As a motivating example, we use the task of finding conversations that reference firearm injury or violence in the Crisis Text Line database. Identifying rare categories, like firearm injury or violence, can improve crisis lines' abilities to support people with firearm-related crises or provide appropriate resources. Our approach outperforms a set of iteratively refined Boolean queries and results in a recall of 0.91 on a test set generated from a process independent of our study. Our results suggest that text classification with Boolean retrieval initialization can be effective for finding rare categories of interest and improve on the precision of using Boolean retrieval alone.
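The Boolean retrieval step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy messages, the seed terms, and the helper names `tokenize`, `build_index`, and `boolean_or` are all hypothetical. It shows only how a simple OR query over an inverted index pulls out a small candidate set that is likely to be richer in the rare category than a uniform random sample would be.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a message and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def boolean_or(index, terms):
    """Return the ids of documents matching ANY of the query terms."""
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return hits


# Hypothetical toy corpus (assumption); not data from the study.
docs = [
    "i had a rough day at school",
    "my brother keeps a gun in the house and it scares me",
    "i cannot sleep and feel anxious",
    "someone was shot near my home last night",
    "my parents are fighting again",
    "i failed my exam",
]

# Hypothetical seed query (assumption); not the query used in the paper.
seed_terms = ["gun", "shot", "firearm"]

index = build_index(docs)
candidates = sorted(boolean_or(index, seed_terms))
print(candidates)
```

In a workflow like the one the description outlines, the retrieved candidates would then be labeled by hand and used as initial training data for a text classifier; that step is omitted here.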

Introduction to Information Retrieval

Author: Christopher D. Manning
Publisher: Cambridge University Press
ISBN: 1139472100
Category : Computers
Languages : en
Pages :

Book Description
Class-tested and coherent, this textbook teaches classical and web information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. It gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it perfect for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured in order to make teaching more natural and effective. Slides and additional exercises (with solutions for lecturers) are also available through the book's supporting website to help course instructors prepare their lectures.

Applied Text Analysis with Python

Author: Benjamin Bengfort
Publisher: "O'Reilly Media, Inc."
ISBN: 1491962992
Category : Computers
Languages : en
Pages : 328

Book Description
From news and speeches to informal chatter on social media, natural language is one of the richest and most underutilized sources of data. Not only does it come in a constant stream, always changing and adapting in context; it also contains information that is not conveyed by traditional data sources. The key to unlocking natural language is through the creative application of text analytics. This practical book presents a data scientist’s approach to building language-aware products with applied machine learning. You’ll learn robust, repeatable, and scalable techniques for text analysis with Python, including contextual and linguistic feature engineering, vectorization, classification, topic modeling, entity resolution, graph analysis, and visual steering. By the end of the book, you’ll be equipped with practical methods to solve any number of complex real-world problems. You’ll learn to preprocess and vectorize text into high-dimensional feature representations; perform document classification and topic modeling; steer the model selection process with visual diagnostics; extract key phrases, named entities, and graph structures to reason about data in text; build a dialog framework to enable chatbots and language-driven interaction; and use Spark to scale processing power and neural networks to scale model complexity.

Information Retrieval

Author: Stefan Buttcher
Publisher: MIT Press
ISBN: 0262528878
Category : Computers
Languages : en
Pages : 633

Book Description
An introduction to information retrieval, the foundation for modern search engines, that emphasizes implementation and experimentation. This textbook offers an introduction to the core topics underlying modern search technologies, including algorithms, data structures, indexing, retrieval, and evaluation. Each chapter includes exercises and suggestions for student projects. Wumpus, a multiuser open-source information retrieval system developed by one of the authors and available online, provides model implementations and a basis for student work. The modular structure of the book allows instructors to use it in a variety of graduate-level courses, including courses taught from a database systems perspective, traditional information retrieval courses with a focus on IR theory, and courses covering the basics of Web retrieval. In addition to its classroom use, Information Retrieval will be a valuable reference for professionals in computer science, computer engineering, and software engineering.

KI 2006

Author: Christian Freksa
Publisher: Springer
ISBN: 3540699120
Category : Computers
Languages : en
Pages : 464

Book Description
This book constitutes the thoroughly refereed post-proceedings of the 29th Annual German Conference on Artificial Intelligence, KI 2006, held in Bremen, Germany, in June 2006. The conference was co-located with RoboCup 2006, the innovative robot soccer world championship, and with ACTUATOR 2006, the 10th International Conference on New Actuators. The 29 revised full papers presented together with two invited contributions were carefully reviewed and selected from 112 submissions.

Managing Social and Economic Change with Information Technology

Author: Information Resources Management Association. International Conference
Publisher: IGI Global
ISBN: 9781878289261
Category : Business & Economics
Languages : en
Pages : 564

Book Description
Many experts believe that through the utilization of information technology, organizations can better manage social and economic change. This book investigates the challenges involved in the use of information technologies in managing these changes.

Information Retrieval with Verbose Queries

Author: Manish Gupta
Publisher:
ISBN: 9781680830446
Category : Computers
Languages : en
Pages : 170

Book Description
The first monograph to provide a coherent and organized survey on this topic. It puts together the various research pieces of the puzzle, provides a comprehensive and structured overview of diverse proposed methods, and lists several application scenarios where effective verbose query processing can make a significant difference.

Statistical Language Models for Information Retrieval

Author: ChengXiang Zhai
Publisher: Morgan & Claypool Publishers
ISBN: 159829590X
Category : Computers
Languages : en
Pages : 142

Book Description
As online information grows dramatically, search engines such as Google are playing an increasingly important role in our lives. Critical to all search engines is the problem of designing an effective retrieval model that can rank documents accurately for a given query. This has been a central research problem in information retrieval for several decades. In the past ten years, a new generation of retrieval models, often referred to as statistical language models, has been successfully applied to solve many different information retrieval problems. Compared with traditional models such as the vector space model, these new models have a sounder statistical foundation and can leverage statistical estimation to optimize retrieval parameters. They can also be more easily adapted to model non-traditional and complex retrieval problems. Empirically, they tend to achieve comparable or better performance than a traditional model with less effort on parameter tuning. This book systematically reviews the large body of literature on applying statistical language models to information retrieval with an emphasis on the underlying principles, empirically effective language models, and language models developed for non-traditional retrieval tasks. All the relevant literature has been synthesized to make it easy for a reader to digest the research progress achieved so far and see the frontier of research in this area. The book also offers practitioners an informative introduction to a set of practically useful language models that can effectively solve a variety of retrieval problems. No prior knowledge of information retrieval is required, but some basic knowledge of probability and statistics would be useful for fully digesting all the details.
Table of Contents: Introduction / Overview of Information Retrieval Models / Simple Query Likelihood Retrieval Model / Complex Query Likelihood Model / Probabilistic Distance Retrieval Model / Language Models for Special Retrieval Tasks / Language Models for Latent Topic Analysis / Conclusions
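As a rough illustration of the query likelihood idea named in the table of contents, the sketch below scores each document by the log-probability that its smoothed unigram language model generates the query. The blurb does not say which smoothing method the book uses; Dirichlet prior smoothing is assumed here as one common choice, and the toy documents, the function name `query_log_likelihood`, and the value of `mu` are illustrative assumptions only.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, coll_counts, coll_len, mu):
    """Log P(query | doc) under a Dirichlet-smoothed unigram model."""
    doc_counts = Counter(doc)
    score = 0.0
    for term in query:
        p_coll = coll_counts[term] / coll_len
        if p_coll == 0.0:
            continue  # term never occurs in the collection; skip it
        # Interpolate document evidence with the collection model.
        p_term = (doc_counts[term] + mu * p_coll) / (len(doc) + mu)
        score += math.log(p_term)
    return score


# Hypothetical toy collection (assumption), already tokenized.
docs = [
    ["language", "models", "for", "retrieval"],
    ["vector", "space", "model"],
    ["cooking", "pasta", "recipes"],
]
query = ["language", "retrieval"]

coll_counts = Counter(term for doc in docs for term in doc)
coll_len = sum(len(doc) for doc in docs)

# mu is a smoothing parameter; 10.0 is an arbitrary value for this toy data.
scores = [
    query_log_likelihood(query, doc, coll_counts, coll_len, mu=10.0)
    for doc in docs
]
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking[0])
```

Documents that do not contain a query term still receive a nonzero probability from the collection model, which is why smoothing matters for ranking.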

First Text Retrieval Conference (TREC-1)

Author: D. K. Harman
Publisher: DIANE Publishing
ISBN: 0788125214
Category :
Languages : en
Pages : 527

Book Description
Held in Gaithersburg, MD, Nov. 4-6, 1992. Evaluates new technologies in information retrieval. Numerous graphs, tables and charts.

Anaphora Resolution and Text Retrieval

Author: Helene Schmolz
Publisher: Walter de Gruyter GmbH & Co KG
ISBN: 3110416816
Category : Language Arts & Disciplines
Languages : en
Pages : 265

Book Description
This book covers anaphora resolution for the English language from a linguistic and computational point of view. First, a definition of anaphors that applies to linguistics as well as information technology is given. On this foundation, all types of anaphors and their characteristics for English are outlined. To examine how frequent each type of anaphor is, a corpus of different hypertexts has been established and analysed with regard to anaphors. The most frequent type is the non-finite clause anaphor, a type that has not been investigated so far. Therefore, the potential of non-finite clause anaphors is further explored with respect to anaphora resolution. After presenting the fundamentals of computational anaphora resolution and its application in text retrieval, rules for resolving non-finite clause anaphors are established. In this way, the book shows that a truly interdisciplinary approach can achieve results which would not have been possible otherwise. Open Access: In July 2019, this volume was retroactively turned into an Open Access publication thanks to the support of the Fachinformationsdienst Linguistik. https://www.linguistik.de/