Author: Mark S. Manasse
Publisher: Springer Nature
ISBN: 3031022963
Category : Computers
Languages : en
Pages : 80
Book Description
The time-worn aphorism "close only counts in horseshoes and hand grenades" is clearly inadequate. Close also counts in golf, shuffleboard, archery, darts, curling, and other games of accuracy in which hitting the precise center of the target isn't to be expected every time, or in which we can expect to be driven from the target by skilled opponents. This book is not devoted to sports discussions, but to efficient algorithms for determining pairs of closely related web pages—and a few other situations in which we have found that inexact matching is good enough — where proximity suffices. We will not, however, attempt to be comprehensive in the investigation of probabilistic algorithms, approximation algorithms, or even techniques for organizing the discovery of nearest neighbors. We are more concerned with finding nearby neighbors; if they are not particularly close by, we are not particularly interested. In thinking of when approximation is sufficient, remember the oft-told joke about two campers sitting around after dinner. They hear noises coming towards them. One of them reaches for a pair of running shoes, and starts to don them. The second then notes that even with running shoes, they cannot hope to outrun a bear, to which the first notes that most likely the bear will be satiated after catching the slower of them. We seek problems in which we don't need to be faster than the bear, just faster than the others fleeing the bear.
On the Efficient Determination of Most Near Neighbors
Author: Mark S. Manasse
Publisher: Springer Nature
ISBN: 3031022963
Category : Computers
Languages : en
Pages : 80
Book Description
The time-worn aphorism "close only counts in horseshoes and hand grenades" is clearly inadequate. Close also counts in golf, shuffleboard, archery, darts, curling, and other games of accuracy in which hitting the precise center of the target isn't to be expected every time, or in which we can expect to be driven from the target by skilled opponents. This book is not devoted to sports discussions, but to efficient algorithms for determining pairs of closely related web pages—and a few other situations in which we have found that inexact matching is good enough — where proximity suffices. We will not, however, attempt to be comprehensive in the investigation of probabilistic algorithms, approximation algorithms, or even techniques for organizing the discovery of nearest neighbors. We are more concerned with finding nearby neighbors; if they are not particularly close by, we are not particularly interested. In thinking of when approximation is sufficient, remember the oft-told joke about two campers sitting around after dinner. They hear noises coming towards them. One of them reaches for a pair of running shoes, and starts to don them. The second then notes that even with running shoes, they cannot hope to outrun a bear, to which the first notes that most likely the bear will be satiated after catching the slower of them. We seek problems in which we don't need to be faster than the bear, just faster than the others fleeing the bear.
On The Efficient Determination of Most Near Neighbors
Author: Mark Manasse
Publisher: Springer Nature
ISBN: 3031022815
Category : Computers
Languages : en
Pages : 80
Book Description
The time-worn aphorism "close only counts in horseshoes and hand-grenades" is clearly inadequate. Close also counts in golf, shuffleboard, archery, darts, curling, and other games of accuracy in which hitting the precise center of the target isn't to be expected every time, or in which we can expect to be driven from the target by skilled opponents. This lecture is not devoted to sports discussions, but to efficient algorithms for determining pairs of closely related web pages -- and a few other situations in which we have found that inexact matching is good enough; where proximity suffices. We will not, however, attempt to be comprehensive in the investigation of probabilistic algorithms, approximation algorithms, or even techniques for organizing the discovery of nearest neighbors. We are more concerned with finding nearby neighbors; if they are not particularly close by, we are not particularly interested. In thinking of when approximation is sufficient, remember the oft-told joke about two campers sitting around after dinner. They hear noises coming towards them. One of them reaches for a pair of running shoes, and starts to don them. The second then notes that even with running shoes, they cannot hope to outrun a bear, to which the first notes that most likely the bear will be satiated after catching the slower of them. We seek problems in which we don't need to be faster than the bear, just faster than the others fleeing the bear.
Explaining the Success of Nearest Neighbor Methods in Prediction
Author: George H. Chen
Publisher: Foundations and Trends (R) in Machine Learning
ISBN: 9781680834543
Category :
Languages : en
Pages : 264
Book Description
Explains the success of Nearest Neighbor Methods in Prediction, both in theory and in practice.
Transforming Technologies to Manage Our Information
Author: William Jones
Publisher: Springer Nature
ISBN: 3031023293
Category : Computers
Languages : en
Pages : 155
Book Description
With its theme, "Our Information, Always and Forever," Part I of this book covers the basics of personal information management (PIM) including six essential activities of PIM and six (different) ways in which information can be personal to us. Part I then goes on to explore key issues that arise in the "great migration" of our information onto the Web and into a myriad of mobile devices. Part 2 provides a more focused look at technologies for managing information that promise to profoundly alter our practices of PIM and, through these practices, the way we lead our lives. Part 2 is in five chapters:
- Chapter 5. Technologies of Input and Output. Technologies in support of gesture, touch, voice, and even eye movements combine to support a more natural user interface (NUI). Technologies of output include glasses and "watch" watches. Output will also increasingly be animated with options to "zoom".
- Chapter 6. Technologies to Save Our Information. We can opt for "life logs" to record our experiences with increasing fidelity. What will we use these logs for? And what isn’t recorded that should be?
- Chapter 7. Technologies to Search Our Information. The potential for personalized search is enormous and mostly yet to be realized. Persistent searches, situated in our information landscape, will allow us to maintain a diversity of projects and areas of interest without a need to continually switch from one to another to handle incoming information.
- Chapter 8. Technologies to Structure Our Information. Structure is key if we are to keep, find, and make effective use of our information. But how best to structure? And how best to share structured information between the applications we use, with other people, and also with ourselves over time? What lessons can we draw from the failures and successes in web-based efforts to share structure?
- Chapter 9. PIM Transformed and Transforming: Stories from the Past, Present and Future.
Part 2 concludes with a comparison between Licklider’s world of information in 1957 and our own world of information today. And then we consider what the world of information is likely to look like in 2057. Licklider estimated that he spent 85% of his "thinking time" in activities that were clerical and mechanical and might (someday) be delegated to the computer. What percentage of our own time is spent with the clerical and mechanical? What about in 2057?
Digital Libraries for Cultural Heritage
Author: Tatjana Aparac-Jelušić
Publisher: Springer Nature
ISBN: 3031023102
Category : Computers
Languages : en
Pages : 175
Book Description
European digital libraries have existed in diverse forms and with quite different functions, priorities, and aims. However, there are some common features of European-based initiatives that are relevant to non-European communities. There are now many more challenges and changes than ever before, and the development rate of new digital libraries is ever accelerating. Delivering educational, cultural, and research resources, especially from major scientific and cultural organizations, has become a core mission of these organizations. Using these resources they will be able to investigate, educate, and elucidate, in order to promote and disseminate and to preserve civilization. Extremely important in conceptualizing the digital environment priorities in Europe was its cultural heritage and the feeling that these rich resources should be open to Europe and the global community. In this book we focus on European digitized heritage and digital culture, and its potential in the digital age. We specifically look at the EU and its approaches to digitization and digital culture, problems detected, and achievements reached, all with an emphasis on digital cultural heritage. We seek to report on important documents that were prepared on digitization; copyright and related documents; research and education in the digital libraries field under the auspices of the EU; some other European and national initiatives; and funded projects. The aim of this book is to discuss the development of digital libraries in the European context by presenting, primarily to non-European communities interested in digital libraries, the phenomena, initiatives, and developments that dominated in Europe. We describe the main projects and their outcomes, and shine a light on the number of challenges that have been inspiring new approaches, cooperative efforts, and the use of research methodology at different stages of the digital libraries development.
The specific goals are reflected in the structure of the book, which can be conceived as a guide to several main topics and sub-topics. However, the author’s scope is far from being comprehensive, since the field of digital libraries is very complex and digital libraries for cultural heritage is even more so.
Social Monitoring for Public Health
Author: Michael J. Paul
Publisher: Morgan & Claypool Publishers
ISBN: 1681736101
Category : Computers
Languages : en
Pages : 188
Book Description
Public health thrives on high-quality evidence, yet acquiring meaningful data on a population remains a central challenge of public health research and practice. Social monitoring, the analysis of social media and other user-generated web data, has brought advances in the way we leverage population data to understand health. Social media offers advantages over traditional data sources, including real-time data availability, ease of access, and reduced cost. Social media allows us to ask, and answer, questions we never thought possible. This book presents an overview of the progress on uses of social monitoring to study public health over the past decade. We explain available data sources, common methods, and survey research on social monitoring in a wide range of public health areas. Our examples come from topics such as disease surveillance, behavioral medicine, and mental health, among others. We explore the limitations and concerns of these methods. Our survey of this exciting new field of data-driven research lays out future research directions.
Task Intelligence for Search and Recommendation
Author: Chirag Shah
Publisher: Springer Nature
ISBN: 3031023269
Category : Computers
Languages : en
Pages : 140
Book Description
While great strides have been made in the field of search and recommendation, there are still challenges and opportunities to address information access issues that involve solving tasks and accomplishing goals for a wide variety of users. Specifically, we lack intelligent systems that can detect not only the request an individual is making (what), but also understand and utilize the intention (why) and strategies (how) while providing information and enabling task completion. Many scholars in the fields of information retrieval, recommender systems, productivity (especially in task management and time management), and artificial intelligence have recognized the importance of extracting and understanding people's tasks and the intentions behind performing those tasks in order to serve them better. However, we are still struggling to support them in task completion, e.g., in search and assistance, and it has been challenging to move beyond single-query or single-turn interactions. The proliferation of intelligent agents has unlocked new modalities for interacting with information, but these agents will need to be able to work understanding current and future contexts and assist users at task level. This book will focus on task intelligence in the context of search and recommendation. Chapter 1 introduces readers to the issues of detecting, understanding, and using task and task-related information in an information episode (with or without active searching). This is followed by presenting several prominent ideas and frameworks about how tasks are conceptualized and represented in Chapter 2. In Chapter 3, the narrative moves to showing how task type relates to user behaviors and search intentions. A task can be explicitly expressed in some cases, such as in a to-do application, but often it is unexpressed. Chapter 4 covers these two scenarios with several related works and case studies. 
Chapter 5 shows how task knowledge and task models can contribute to addressing emerging retrieval and recommendation problems. Chapter 6 covers evaluation methodologies and metrics for task-based systems, with relevant case studies to demonstrate their uses. Finally, the book concludes in Chapter 7, with ideas for future directions in this important research area.
Images in Social Media
Author: Susanne Ørnager
Publisher: Springer Nature
ISBN: 3031023145
Category : Computers
Languages : en
Pages : 101
Book Description
This book focuses on the methodologies, organization, and communication of digital image collection research that utilizes social media content. ("Image" is here understood as a cultural, conventional, and commercial—stock photo—representation.) The lecture offers expert views that provide different interpretations of images and their potential implementations. Linguistic and semiotic methodologies as well as eye-tracking research are employed to both analyze images and comprehend how humans consider them, including which salient features generally attract viewers' attention. This literature review covers image—specifically photographic—research since 2005, when major social media platforms emerged. A citation analysis includes an overview of co-citation maps that demonstrate the nexus of image research literature and the journals in which they appear. Eye tracking tests whether scholarly templates focus on the proper features of an image, such as people, objects, time, etc., and if a prescribed theme affects the eye movements of the observer. The results may point to renewed requirements for building image search engines. As it stands, image management already requires new algorithms and a new understanding that involves text recognition and very large database processing. The aim of this book is to present different image research areas and demonstrate the challenges image research faces. The book's scope is, by necessity, far from comprehensive, since the field of digital image research does not cover fake news, image manipulation, mobile photos, etc.; these issues are very complex and need a publication of their own. This book should primarily be useful for students in library and information science, psychology, and computer science.
Automatic Disambiguation of Author Names in Bibliographic Repositories
Author: Anderson A. Ferreira
Publisher: Springer Nature
ISBN: 3031023226
Category : Computers
Languages : en
Pages : 126
Book Description
This book deals with a hard problem that is inherent to human language: ambiguity. In particular, we focus on author name ambiguity, a type of ambiguity that exists in digital bibliographic repositories, which occurs when an author publishes works under distinct names or distinct authors publish works under similar names. This problem may be caused by a number of reasons, including the lack of standards and common practices, and the decentralized generation of bibliographic content. As a consequence, the quality of the main services of digital bibliographic repositories such as search, browsing, and recommendation may be severely affected by author name ambiguity. The focal point of the book is on automatic methods, since manual solutions do not scale to the size of the current repositories or the speed at which they are updated. Accordingly, we provide an ample view of the problem of automatic disambiguation of author names, summarizing the results of more than a decade of research on this topic conducted by our group, which were reported in more than a dozen publications that have received over 900 citations so far, according to Google Scholar. We start by discussing its motivational issues (Chapter 1). Next, we formally define the author name disambiguation task (Chapter 2) and use this formalization to provide a brief, taxonomically organized overview of the literature on the topic (Chapter 3). We then organize, summarize, and integrate the efforts of our own group on developing solutions for the problem that have historically produced state-of-the-art (by the time of their proposals) results in terms of the quality of the disambiguation results. Thus, Chapter 4 covers HHC - Heuristic-based Clustering, an author name disambiguation method that is based on two specific real-world assumptions regarding scientific authorship.
Then, Chapter 5 describes SAND - Self-training Author Name Disambiguator, and Chapter 6 presents two incremental author name disambiguation methods, namely INDi - Incremental Unsupervised Name Disambiguation and INC - Incremental Nearest Cluster. Finally, Chapter 7 provides an overview of recent author name disambiguation methods that address new specific approaches such as graph-based representations, alternative predefined similarity functions, visualization facilities, and approaches based on artificial neural networks. The chapters are followed by three appendices that cover, respectively: (i) a pattern matching function for comparing proper names, used by some of the methods addressed in this book; (ii) a tool for generating synthetic collections of citation records for distinct experimental tasks; and (iii) a number of datasets commonly used to evaluate author name disambiguation methods. In summary, the book organizes a large body of knowledge and work in the area of author name disambiguation in the last decade, hoping to consolidate a solid basis for future developments in the field.
Information Retrieval Models
Author: Thomas Roelleke
Publisher: Springer Nature
ISBN: 3031023285
Category : Computers
Languages : en
Pages : 141
Book Description
Information Retrieval (IR) models are a core component of IR research and IR systems. The past decade brought a consolidation of the family of IR models, which by 2000 consisted of relatively isolated views on TF-IDF (Term-Frequency times Inverse-Document-Frequency) as the weighting scheme in the vector-space model (VSM), the probabilistic relevance framework (PRF), the binary independence retrieval (BIR) model, BM25 (Best-Match Version 25, the main instantiation of the PRF/BIR), and language modelling (LM). Also, the early 2000s saw the arrival of divergence from randomness (DFR). Regarding intuition and simplicity, though LM is clear from a probabilistic point of view, several people stated: "It is easy to understand TF-IDF and BM25. For LM, however, we understand the math, but we do not fully understand why it works." This book takes a horizontal approach, gathering the foundations of TF-IDF, PRF, BIR, Poisson, BM25, LM, probabilistic inference networks (PINs), and divergence-based models. The aim is to create a consolidated and balanced view of the main models. A particular focus of this book is on the "relationships between models." This includes an overview of the main frameworks (PRF, logical IR, VSM, generalized VSM) and a pairing of TF-IDF with other models. It becomes evident that TF-IDF and LM measure the same thing, namely the dependence (overlap) between document and query. The Poisson probability helps to establish probabilistic, non-heuristic roots for TF-IDF, and the Poisson parameter, average term frequency, is a binding link between several retrieval models and model parameters. Table of Contents: List of Figures / Preface / Acknowledgments / Introduction / Foundations of IR Models / Relationships Between IR Models / Summary & Research Outlook / Bibliography / Author's Biography / Index