This manual investigation involved the study of thousands of these queries. Word sense disambiguation for information retrieval. Word sense disambiguation and information retrieval. In Proceedings of the 17th International ACM SIGIR, pp. 49-57, Dublin, IE, 1994. In natural language processing, word sense disambiguation (WSD) is an open challenge whose solution improves the performance of applications such as machine translation and information retrieval systems. Word sense disambiguation (WSD) is the process of identifying the meanings of words in context. Word sense ambiguity is recognized as having a detrimental effect on the. One of the major applications of word sense disambiguation (WSD) is information retrieval (IR). Introduction: information retrieval [1] is the process of retrieving the relevant documents from a document database when the user enters a query in a search engine. Facing current challenges. Thesis report by David Martinez Iraola, prepared under the supervision of Eneko Agirre Bengoa and submitted for the degree of Doctor in Computer Science at the University of the Basque Country (Euskal Herriko Unibertsitatea), Donostia, October 2004.
In a collection of documents containing terms and a reference collection containing at least one meaning associated with a term, the method includes forming a vector space. It has often been thought that word sense ambiguity is a cause of poor performance in information retrieval (IR) systems. Challenges and practical approaches with word sense. Natural language processing, word sense disambiguation. This is particularly due to the Senseval evaluation exercises, which created standard data sets for the task. This article begins by discussing the origins of the problem in the earliest machine translation systems. The ambiguity problem appears in all of these tasks. Word sense disambiguation and information retrieval. Mark Sanderson, Department of Computing Science, University of Glasgow, Glasgow G12 8QQ, United Kingdom, email. Overall, the author concludes that keyword-in-context (KWIC) collocations still offer a commonsense solution to accurate word disambiguation.
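A keyword-in-context listing can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: the function name `kwic`, the tokenizer, the window width, and the sample text are all hypothetical and are not taken from any of the works mentioned above. It collects the words to the left and right of each occurrence of a keyword, so a reader (or a later disambiguation step) can inspect the collocations.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left, keyword, right) tuples for every occurrence of keyword.

    Hypothetical helper: tokenization is a simple lowercase regex split,
    and `width` is the number of context tokens kept on each side.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    target = keyword.lower()
    rows = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            rows.append((" ".join(left), tok, " ".join(right)))
    return rows


# Made-up sample text with two different senses of "bank".
sample = ("He sat on the bank of the river. "
          "She opened an account at the bank downtown.")

for left, kw, right in kwic(sample, "bank", width=2):
    print(f"{left:>12} [{kw}] {right}")
```

In this toy sample the neighbouring words ("of the river" versus "account at the") are what separate the two senses, which is the intuition behind collocation-based disambiguation.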
The second chapter describes some earlier approaches to word sense disambiguation and. Word sense disambiguation (WSD) is a subfield within computational linguistics, which is also referred to as natural language processing (NLP), where computer systems are designed to identify the correct meaning or sense of a word in a given context. Information retrieval database with WordNet word sense disambiguation. Word sense disambiguation in biomedical applications. Introduction: in all the major languages of the world, many words denote different meanings in different contexts. Retrieval, word sense disambiguation, WordNet, OWA operator. Word sense disambiguation, Yarowsky algorithm, information retrieval, natural language processing, Quran. This research work deals with natural language processing (NLP) and the extraction of essential information in an explicit form. Many spoken languages have many ambiguous words. An application of word sense disambiguation to information retrieval, Jason M. Graph-based word sense disambiguation in Telugu language. Word sense disambiguation [2] (WSD) is the solution to the problem.
Natural language processing has a set of phases that progress from lexical text analysis to pragmatic analysis, in which the author's intentions are revealed. Introduction: languages have several kinds of ambiguity, where many words can be understood in different ways depending on the context [1]. Previous works try to perform word sense disambiguation, the process of assigning a sense to a word within a specific context, creating algorithms under a supervised or unsupervised. DR systems may work as combine harvesters, which bring back useful material from the vast fields of raw material. The difficulty of this problem stems from the subtlety of word sense differences and the need for some level of understanding. Word sense disambiguation (WSD), the tagging of words in context with labels indicating the sense in which the words are used, has become an increasingly popular area of computational linguistics research.
Ontology-based word sense disambiguation for scientific. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 387-391, Uppsala, Sweden. Word sense disambiguation is the task of finding the correct sense of words and automatically assigning that sense to the words which are polysemous in a particu. Word sense disambiguation and information retrieval, CiteSeerX.
The book covers collocation finding, word sense disambiguation, probabilistic parsing, information retrieval, and other applications. For this reason, we propose in this paper a semi-supervised method for word sense disambiguation (WSD) for the scientific literature domain. Automatic as opposed to manual, and information as opposed to data or fact. Systems and methods for word sense disambiguation, including discerning one or more senses or occurrences, distinguishing between senses or occurrences, and determining a meaning for a sense or occurrence of a subject term. This is the first book to cover the entire topic of word sense disambiguation (WSD) including. It provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations. For instance, it is frequently the case that a gene, a protein encoded by the. Word sense disambiguation in information retrieval. The task we address is the disambiguation of scientific terms and acronyms used in scientific abstracts. Foundations of Statistical Natural Language Processing.
Before choosing the word sense disambiguation algorithm to be used in the indices, I ran a simple benchmark of several disambiguation algorithms using the Perl Benchmark module. A survey, 30 November 2000, by Ed Greengrass. Abstract: information retrieval (IR) is the discipline that deals with retrieval of unstructured data, especially textual documents, in response to a query or topic statement, which may itself be unstructured, e. In Proceedings of RANLP'05, Borovets, pages 525-531, 2005. Its applications lie in many different areas, including sentiment analysis, information retrieval (IR), machine translation, and knowledge graphs.
Word sense disambiguation in information retrieval. Article in Intelligent Information Management 102. Word sense disambiguation and information retrieval, White Rose. Most approaches to word sense disambiguation or to. The word sense disambiguation process consists of assigning to each given word in a context one distinguishable definition or meaning, whether a predefined sense or not. A breakthrough in this field would have a significant impact on many relevant web-based applications, such as web information retrieval, improved access to web services, information extraction, etc. Acronym and abbreviation sense resolution is considered a special case of word sense disambiguation (WSD) [9,10,11]. The author and publisher of this book have used their best efforts in preparing this book. The solution to this problem impacts other computer-related writing, such as discourse, improving relevance of search engines, anaphora resolution. The main approaches to tackling the problem were dictionary-based, connectionist, and statistical strategies. Information retrieval is a wide, often loosely-defined term, but in these pages I shall be concerned only with automatic information retrieval systems. Word sense disambiguation. Roberto Navigli and Paola Velardi. Abstract: word sense disambiguation (WSD) is traditionally considered an AI-hard problem. Word sense disambiguation in information retrieval revisited. For example, the word "back" in "back home" and "my back" has. Note that in his book van Rijsbergen betrays his preference for distance.
If one examines the words in a book, one at a time through an opaque mask. It has often been thought that word sense ambiguity is a cause of poor. Word sense disambiguation and information retrieval. While interpreting the specific meaning of acronyms and abbreviations within a sentence is often easy for a human reader, this process is nontrivial for a machine [10,11]. Aslam, advisor. Abstract: the problems of word sense disambiguation and document indexing for information retrieval have been extensively studied. Word sense disambiguation and information retrieval, SpringerLink. Word sense disambiguation [15] is a technique to find the exact sense of an ambiguous word in a particular context. While WSD, in general, has a number of important applications in various fields of artificial intelligence: information retrieval, text processing, machine. Chapter 1: Introduction, Eneko Agirre and Philip Edmonds. It is mainly developed for the purpose of word sense disambiguation in Indian languages. On the importance of word sense disambiguation for information retrieval. This algorithm is to be used in a cross-language information retrieval system, CINDOR, which indexes queries and documents in a language-neutral concept representation based on WordNet synsets.
To disambiguate word senses, a word is tagged with one of its senses, for instance a WordNet synset, induced by the context of the occurrence of the word, cf. Word sense ambiguity is recognized as having a detrimental effect on the precision of information retrieval systems in general and web search. Word sense disambiguation for cross-language information. The belief is that if ambiguous words can be correctly disambiguated, IR performance will increase. Word-sense disambiguation (WSD) is the process of identifying the meanings of words in context. A word sense disambiguation algorithm for information. This chapter describes the main approaches to the problem, methods for evaluating performance, and potential applications.
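Tagging a word with the sense induced by its context can be illustrated with a simplified Lesk-style gloss overlap, one of the dictionary-based techniques. This is a minimal sketch under stated assumptions, not the method of any work cited here: the sense inventory `SENSES`, its sense labels and glosses, the stopword list, and the function names are all hypothetical stand-ins for a real resource such as WordNet.

```python
import re

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "and", "or",
             "is", "was", "that", "for", "with", "he", "she", "they", "from"}

# Hypothetical toy sense inventory standing in for WordNet synsets and glosses.
SENSES = {
    "bank": {
        "bank.financial": "an institution that accepts deposits and lends money to customers",
        "bank.river": "sloping land beside a river or lake",
    },
}


def content_words(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word, context):
    """Pick the sense whose gloss shares the most content words with the context.

    Returns (sense_label, overlap_count). Ties go to the sense listed first.
    """
    context_words = content_words(context) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context_words & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap


print(simplified_lesk("bank", "She deposits money at the bank every week"))
print(simplified_lesk("bank", "They fished from the bank of the river"))
```

Exact word overlap is brittle (no stemming, no synonyms), which is one reason early dictionary-based attempts suffered from limited coverage; the sketch is meant only to make the tagging step concrete.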
In computational linguistics, word-sense disambiguation (WSD) is an open problem of natural language processing, which governs the process of identifying which sense of a word i. Proceedings of the LREC 2002 Workshop on Creating and Using Semantics for Information Retrieval and Filtering, Third International Conference on Language Resources and Evaluation, Las Palmas, Canary Islands, Spain, June. Work on word sense disambiguation continued throughout the next two decades in the framework of AI-based natural language understanding research, as well as in the fields of content analysis, stylistic and literary analysis, and information retrieval. Word sense disambiguation and information retrieval. New evaluation methods for word sense disambiguation. Word sense disambiguation is the process of removing and resolving the ambiguity between words. Ambiguity is a common phenomenon in text, especially in the biomedical domain. We have developed a word sense disambiguation algorithm, following Cheng and Wilensky (1997), to disambiguate among WordNet synsets. These efforts include the development, research, and testing of the theories. Retrieving with good sense. In Information Retrieval, vol.
In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and. However, it can be used for various other natural language processing (NLP) applications, such as machine translation, information retrieval, sentiment analysis, textual entailment, etc. Early attempts to solve the WSD problem suffered from a lack of coverage. Word sense disambiguation book: bibliography of WSD. It has been observed that indexing using disambiguated mean. This is the companion website for the following book. Our approach is based on the use of both contextual information from. Unfortunately the word information can be very misleading. Keywords: information retrieval, natural language processing, ambiguous word, sense score, word sense disambiguation. Word sense disambiguation in information retrieval revisited. Word sense disambiguation in information retrieval. The most common information management strategies are document retrieval (DR) and information filtering.