Synopses & Reviews
This book is intended for researchers who want to keep abreast of current developments in corpus-based natural language processing. It captures the essence of a series of highly successful workshops organized over the last few years. The papers cover a range of current research topics in this field including part-of-speech tagging, word sense disambiguation, parsing on real-life texts, working with parallel corpora and improved techniques for document processing.
Synopsis
ABOUT THIS BOOK

This book is intended for researchers who want to keep abreast of current developments in corpus-based natural language processing. It is not meant as an introduction to this field; for readers who need one, several entry-level texts are available, including those of (Church and Mercer, 1993; Charniak, 1993; Jelinek, 1997).

This book captures the essence of a series of highly successful workshops held in the last few years. The response in 1993 to the initial Workshop on Very Large Corpora (Columbus, Ohio) was so enthusiastic that we were encouraged to make it an annual event. The following year, we staged the Second Workshop on Very Large Corpora in Kyoto. As a way of managing these annual workshops, we then decided to register a special interest group called SIGDAT with the Association for Computational Linguistics. The demand for international forums on corpus-based NLP has been expanding so rapidly that in 1995 SIGDAT was led to organize not only the Third Workshop on Very Large Corpora (Cambridge, Mass.) but also a complementary workshop entitled From Texts to Tags (Dublin).

Obviously, the success of these workshops was in some measure a reflection of the growing popularity of corpus-based methods in the NLP community. But first and foremost, it was due to the fact that the workshops attracted so many high-quality papers.
Table of Contents
Introduction.
Implementation and Evaluation of a German HMM for POS Disambiguation; H. Feldweg.
Improvements in Part-of-Speech Tagging with an Application To German; H. Schmid.
Unsupervised Learning of Disambiguation Rules for Part-of-Speech Tagging; E. Brill, M. Pop.
Tagging French without Lexical Probabilities - Combining Linguistic Knowledge and Statistical Learning; E. Tzoukermann, et al.
Example-Based Sense Tagging of Running Chinese Text; X. Tong, et al.
Disambiguating Noun Groupings with Respect to WordNet Senses; P. Resnik.
A Comparison of Corpus-based Techniques for Restoring Accents in Spanish and French Text; D. Yarowsky.
Beyond Word N-Grams; F. Pereira, et al.
Statistical Augmentation of a Chinese Machine-Readable Dictionary; P. Fung, D. Wu.
Text Chunking Using Transformation-based Learning; L. Ramshaw, M.P. Marcus.
Prepositional Phrase Attachment through a Backed-off Model; M. Collins, J. Brooks.
On the Unsupervised Induction of Phrase-Structure Grammars; C. de Marcken.
Robust Bilingual Word Alignment for Machine Aided Translation; I. Dagan, et al.
Iterative Alignment of Syntactic Structures for a Bilingual Corpus; R. Grishman.
Trainable Coarse Bilingual Grammars for Parallel Text Bracketing; D. Wu.
Comparative Discourse Analysis of Parallel Texts; P. van der Eijk.
Comparing the Retrieval Performance of English and Japanese Text Databases; H. Fujii, W.B. Croft.
Inverse Document Frequency (IDF): A Measure of Deviations from Poisson; K. Church, W. Gale.
List of Authors.
Subject Index.