CS674: Natural Language Processing
Reaction Essay Readings
The primary readings in this list were chosen with a variety of
criteria in mind, including impact, accessibility to the non-expert, recency,
brevity, and on-line availability. Also, since these papers are meant to
serve as subjects for reaction essays, "provocativeness" was an
additional consideration. It was not always easy to balance all these conditions,
and in some cases potentially unorthodox choices were made. Hence, this
collection should not be regarded as a compilation of major papers, but
rather as a set of starting points for becoming acquainted with some issues
and ideas in natural language processing.
In this vein, the related references are meant simply to give some idea
of the range of work on the same topic. I've omitted papers covered in class.
Instructions: Reaction essays are due on the following Mondays:
February 4, 11, 18, and 25, and March 4. The intent is threefold: to acquaint
you with some important recent papers and subjects in the field, to help
you find a project topic, and to train you in how to read a research paper
both quickly and effectively. Of course, it is expected that you will look
at papers that aren't on the list as well!
Your reaction essay is just that: a short (one or two pages) critical
reading of one of the primary papers
(in red and marked with a filled-in bullet)
from the list below. Briefly (1-2 paragraphs) describe the problem attacked
and the solution proposed, but do not merely summarize the paper. Then
address questions such as: Is the evaluation fair and informative? Are
the underlying assumptions valid? When are the proposed methods applicable?
On the other hand, don't spend an inordinate amount of time on these essays
-- they are meant to be brief, and will be graded on a check-plus/check/check-minus basis.
A note about proper citation: it is not acceptable to copy out
sentences or phrases from papers without citation. If you have not placed
quotation marks around a passage, then you are claiming that these words
are your own. Violating the Code of Academic Integrity can result, and has
resulted, in failing the course. When in doubt, credit your source!
Eugene Charniak, 1997. Statistical
Parsing with a Context-free Grammar and Word Statistics.
Proceedings of the Fourteenth National Conference on Artificial
Intelligence (AAAI-97), pp. 598--603.
Abstract: We describe a parsing system based upon a language model for
English that is, in turn, based upon assigning probabilities to possible
parses for a sentence. This model is used in a parsing system by finding
the parse for the sentence with the highest probability. This system outperforms
previous schemes. As this is the third in a series of parsers by different
authors that are similar enough to invite detailed comparisons but different
enough to give rise to different levels of performance, we also report
on some experiments designed to identify what aspects of these systems
best explain their relative performance.
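The central idea above, scoring each candidate parse with a probability model and returning the argmax, can be sketched in miniature. The grammar, rule probabilities, and parse trees below are invented purely for illustration; Charniak's actual model also conditions on word statistics, which this sketch omits:

```python
# Toy PCFG: hypothetical rule probabilities, invented for illustration.
# A parse tree is a nested tuple (label, child, ...); leaves are words.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("I",)): 0.2,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("NP", "PP")): 0.2,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V", "NP", "PP")): 0.3,
    ("PP", ("P", "NP")): 1.0,
    ("V", ("saw",)): 1.0,
    ("Det", ("the",)): 1.0,
    ("N", ("man",)): 0.5,
    ("N", ("telescope",)): 0.5,
    ("P", ("with",)): 1.0,
}

def parse_prob(tree):
    """P(parse) = product of the probabilities of the rules it uses."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULE_PROB[(label, rhs)]
    for child in children:
        if isinstance(child, tuple):
            p *= parse_prob(child)
    return p

def best_parse(candidates):
    """The parsing decision: return the candidate with highest probability."""
    return max(candidates, key=parse_prob)

# Two parses of "I saw the man with the telescope": the PP attached to
# the verb phrase vs. to the noun phrase.
np_man = ("NP", ("Det", "the"), ("N", "man"))
pp = ("PP", ("P", "with"), ("NP", ("Det", "the"), ("N", "telescope")))
vp_attach = ("S", ("NP", "I"), ("VP", ("V", "saw"), np_man, pp))
np_attach = ("S", ("NP", "I"), ("VP", ("V", "saw"), ("NP", np_man, pp)))
```

Under these made-up probabilities, best_parse([vp_attach, np_attach]) selects the verb attachment; a real system must also generate the candidate parses and estimate rule probabilities from a treebank.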
Michael Collins, 1997. Three
generative, lexicalised models for statistical parsing
35th Annual Meeting of the Association for Computational Linguistics
and 8th Conference of the European Chapter of the Association for Computational
Linguistics: Proceedings of the Conference (ACL/EACL '97), pp 16-23.
Joshua Goodman, 1996. Parsing
algorithms and metrics.
Proceedings of the 34th ACL, pp. 177--183.
Eugene Charniak, 2000. A
Maximum-Entropy-Inspired Parser.
Proceedings of the First Meeting of the North American Chapter of
the Association for Computational Linguistics (NAACL 2000), pp. 132--139.
See also Technical
Report CS99-12, Department of Computer Science, Brown University.
Adwait Ratnaparkhi, 1999. Learning
to Parse Natural Language with Maximum Entropy Models.
Machine Learning 34 (1/2/3), pp. 151--175.
Michael Collins, 2000.
Discriminative Reranking for Natural Language Parsing.
Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000).
Mark Johnson, 2001. Joint
and conditional estimation of tagging and parsing models.
Proceedings of the 39th Annual Meeting of the Association for Computational
Linguistics/10th Conference of the European Chapter (ACL-2001).
David Chiang, 2000. Statistical
parsing with an automatically-extracted tree adjoining grammar.
Proceedings of the 38th Annual Meeting of the Association for Computational
Linguistics (ACL 2000), pp 456--463.
Abstract: We discuss the advantages of lexicalized tree-adjoining grammar
as an alternative to lexicalized PCFG for statistical parsing, describing
the induction of a probabilistic LTAG model from the Penn Treebank and
evaluating its parsing performance. We find that this induction method
is an improvement over the EM-based method of (Hwa, 1998), and that the
induced model yields results comparable to lexicalized PCFG.
- Chiang's home page
- Related references:
Yves Schabes and Richard C. Waters, 1995. Tree
Insertion Grammar: A Cubic-Time Parsable Formalism That Lexicalizes Context-Free
Grammar without Changing the Trees Produced.
Computational Linguistics 21, pp. 479--513.
Note: the version linked here is to MERL Technical Report TR94-13.
Rebecca Hwa, 1998. An
Empirical Evaluation of Probabilistic Lexicalized Tree Insertion Grammars.
Proceedings of ACL/COLING 1998, pp 557--563.
Fei Xia, Martha Palmer, and Aravind
Joshi, 2000. A
Uniform Method of Grammar Extraction and Its Applications.
Proceedings of Empirical Methods in Natural Language Processing
(EMNLP-2000), pp. 53--62.
Owen Rambow, K.
Vijay-Shanker, and David
Weir, 2001. D-Tree
Substitution Grammars.
Computational Linguistics 27(1), pp. 87--122.
The U. Penn XTAG system
Jong C. Park, 1995. Quantifier
Scope and Constituency.
Proceedings of ACL '95, pp. 205--212.
Abstract: Traditional approaches to quantifier scope typically need stipulation
to exclude readings that are unavailable to human understanders. This paper
shows that quantifier scope phenomena can be precisely characterized by
a semantic representation constrained by surface constituency, if the distinction
between referential and quantificational NPs is properly observed. A CCG
implementation is described and compared to other approaches.
- Park's home page
Jerry R. Hobbs and Andrew Kehler, 1997. A
theory of parallelism and the case of VP ellipsis.
35th Annual Meeting of the Association for Computational Linguistics
and 8th Conference of the European Chapter of the Association for Computational
Linguistics: Proceedings of the Conference, pp. 394--401.
Abstract: We provide a general account of parallelism in discourse, and
apply it to the special case of resolving possible readings for instances
of VP ellipsis. We show how several problematic examples are accounted
for in a natural and straightforward fashion. The generality of the approach
makes it directly applicable to a variety of other types of ellipsis.
- Author home pages: Hobbs
home page, Kehler home page
Marilyn Walker, 1996. Limited
Attention and Discourse Structure
Computational Linguistics, 22(2), pp. 255--264.
Abstract: This squib examines the role of limited attention in a theory
of discourse structure and proposes a model of attentional state that relates
current hierarchical theories of discourse structure to empirical evidence
about human discourse processing capabilities. First, I present examples
that are not predicted by Grosz and Sidner's stack model of attentional
state. Then I consider an alternative model of attentional state, the cache
model, which accounts for the examples, and which makes particular processing
predictions. Finally I suggest a number of ways that future research could
distinguish the predictions of the cache model and the stack model.
- Walker home page
Michael Strube, 1998. Never
Look Back: An Alternative to Centering.
Proceedings of COLING-ACL '98, pp. 1251--1257.
Abstract: I propose a model for determining the hearer's attentional state
which depends solely on a list of salient discourse entities (S-list).
The ordering among the elements of the S-list covers also the function
of the backward-looking center in the centering model. The ranking criteria
for the S-list are based on the distinction between hearer-old and hearer-new
discourse entities and incorporate preferences for inter- and intra-sentential
anaphora. The model is the basis for an algorithm which operates incrementally,
word by word.
- Strube homepage
Barbara Grosz and Candace
Sidner. 1998. Lost
Intuitions and Forgotten Intentions. In Centering Theory in Discourse,
Marilyn Walker, Aravind Joshi and Ellen Prince, eds., pp. 39--51.
Marilyn A. Walker, 1998. Centering,
anaphora resolution, and discourse structure.
In Marilyn A. Walker, Aravind K. Joshi, and Ellen F. Prince, editors,
Centering Theory in Discourse.
Eleni Miltsakaki and Karen Kukich, 2000. The
Role of Centering Theory's Rough-Shift in the Teaching and Evaluation of
Writing Skills.
Proceedings of the 38th ACL, pp. 408--415.
Ciprian Chelba and Frederick Jelinek, 1998. Exploiting
syntactic structure for language modeling.
COLING-ACL '98 36th Annual Meeting of the Association for Computational
Linguistics and 17th International Conference on Computational Linguistics:
Proceedings of the Conference, Vol. 1, pp. 225--231.
Abstract: The paper presents a language model that develops syntactic structure
and uses it to extract meaningful information from the word history, thus
enabling the use of long distance dependencies. The model assigns probability
to every joint sequence of words-binary-parse-structure with headword annotation
and operates in a left-to-right manner, and is therefore usable for automatic
speech recognition. The model, its probabilistic parameterization, and
a set of experiments meant to evaluate its predictive power are presented;
an improvement over standard trigram modeling is achieved.
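The trigram baseline that the structured model is compared against conditions each word only on the two preceding words. A minimal maximum-likelihood version, with a made-up two-sentence corpus and none of the smoothing a usable model needs:

```python
from collections import Counter

def train_trigram(sentences):
    """Collect trigram and bigram counts, with sentence-boundary padding."""
    tri, bi = Counter(), Counter()
    for s in sentences:
        toks = ["<s>", "<s>"] + s + ["</s>"]
        for u, v, w in zip(toks, toks[1:], toks[2:]):
            tri[(u, v, w)] += 1
            bi[(u, v)] += 1
    return tri, bi

def sentence_prob(s, tri, bi):
    """P(sentence) as a product of maximum-likelihood trigram probabilities."""
    toks = ["<s>", "<s>"] + s + ["</s>"]
    p = 1.0
    for u, v, w in zip(toks, toks[1:], toks[2:]):
        p *= tri[(u, v, w)] / bi[(u, v)]  # zero for any unseen trigram
    return p

# Invented toy corpus.
tri, bi = train_trigram([["the", "dog", "barks"], ["the", "dog", "sleeps"]])
```

Here sentence_prob(["the", "dog", "barks"], tri, bi) is 0.5, since the two training sentences disagree only on the third word. The structured language model replaces the fixed two-word history (u, v) with headwords exposed by a partial parse, which is how it reaches longer-distance dependencies.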
Stanley F. Chen and Joshua
Goodman, 1996. An
empirical study of smoothing techniques for language modeling
Proceedings of the 34th Meeting of the Association for Computational
Linguistics, pp 310--318.
TR version: TR-10-98,
Computer Science Group, Harvard University, 1998.
Ronald Rosenfeld, 2000. Two
Decades of Statistical Language Modeling: Where Do We Go From Here?
Proceedings of the IEEE, 88(8).
Ciprian Chelba and Milind Mahajan, 2001. Information
Extraction Using the Structured Language Model.
Proceedings of the 2001 Conference on Empirical Methods in Natural
Language Processing, Lillian Lee and Donna Harman, eds., pp. 74--81.
Lillian Lee and Fernando Pereira, 1999. Distributional
similarity models: Clustering vs. nearest neighbors.
Proceedings of the Thirty-Seventh Annual Meeting of the Association
for Computational Linguistics (ACL'99), pp 23-40.
Abstract: Distributional similarity is a useful notion in estimating the
probabilities of rare joint events. It has been employed both to cluster
events according to their distributions, and to directly compute averages
of estimates for distributional neighbors of a target event. Here, we examine
the tradeoffs between model size and prediction accuracy for cluster-based
and nearest neighbors distributional models of unseen events.
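To make the notion concrete: one common instance of distributional similarity compares, say, verbs by the distributions of objects they occur with. The sketch below uses Jensen-Shannon divergence on an invented handful of (verb, object) pairs; the paper itself studies several divergence-based measures and far larger models:

```python
from collections import Counter, defaultdict
from math import log2

def cond_dists(pairs):
    """Estimate P(object | verb) from (verb, object) co-occurrence pairs."""
    counts = defaultdict(Counter)
    for verb, obj in pairs:
        counts[verb][obj] += 1
    return {v: {o: n / sum(c.values()) for o, n in c.items()}
            for v, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence between two sparse distributions (dicts)."""
    d = 0.0
    for k in set(p) | set(q):
        pk, qk = p.get(k, 0.0), q.get(k, 0.0)
        m = (pk + qk) / 2
        if pk:
            d += 0.5 * pk * log2(pk / m)
        if qk:
            d += 0.5 * qk * log2(qk / m)
    return d

def nearest_neighbors(target, dists, k=2):
    """Rank the other events by distributional similarity to the target."""
    others = [v for v in dists if v != target]
    return sorted(others,
                  key=lambda v: js_divergence(dists[target], dists[v]))[:k]

# Invented co-occurrence data.
dists = cond_dists([
    ("drink", "water"), ("drink", "beer"), ("drink", "coffee"),
    ("gulp", "water"), ("gulp", "beer"),
    ("sip", "coffee"), ("sip", "tea"),
    ("eat", "bread"), ("eat", "meat"),
])
```

On this toy data, "gulp" comes out as the nearest neighbor of "drink" (shared objects), while "eat" is maximally distant (disjoint objects); estimates for a rare event can then be borrowed from its neighbors or its cluster.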
Hinrich Schütze, 1992. Dimensions of Meaning.
Proceedings of Supercomputing '92, pp. 787--796.
An HTML version is also available.
Fernando Pereira, Naftali
Tishby, and Lillian Lee, 1993. Distributional
Clustering of English Words
Proceedings of the 31st ACL, pp 183-90.
Thomas Hofmann and Jan Puzicha, 1998. Statistical
Models for Co-occurrence Data.
AI Memo 1625, CBCL Memo 159, MIT.
Dekang Lin, 1998. Automatic
Retrieval and Clustering of Similar Words.
COLING-ACL98, pp. 768--773.
Mats Rooth, Stefan
Riezler, Detlef Prescher, Glenn Carroll, and Franz Beil, 1999. Inducing
a Semantically Annotated Lexicon via EM-Based Clustering.
Proceedings of ACL '99, pp. 104--111.
Naftali Z. Tishby, Fernando
Pereira, and William Bialek, 1999. The
Information Bottleneck Method.
Proceedings of the 37th Allerton Conference on Communication, Control, and Computing.
David Yarowsky and Grace Ngai, 2001. Inducing
Multilingual POS Taggers and NP Bracketers via Robust Projection across
Aligned Corpora.
Proceedings of NAACL 2001, pp. 200--207.
Abstract: This paper investigates the potential for projecting linguistic
annotations including part-of-speech tags and base noun phrase bracketings
from one language to another via automatically word-aligned parallel corpora.
First, experiments assess the accuracy of unmodified direct transfer of
tags and brackets from the source language English to the target languages
French and Chinese, both for noisy machine-aligned sentences and for clean
hand-aligned sentences. Performance is then substantially boosted over
both of these baselines by using training techniques optimized for very
noisy data, yielding 94-96% core French part-of-speech tag accuracy and
90% French bracketing F-measure for stand-alone monolingual tools trained
without the need for any human-annotated data in the given language.
Kenji Yamada and Kevin Knight, 2001. A
Syntax-Based Statistical Translation Model.
Proceedings of ACL 2001, pp. 523--530.
Abstract: We present a syntax-based statistical translation model.
Our model transforms a source-language parse tree into a target-language
string by applying stochastic operations at each node. These operations
capture linguistic differences such as word order and case marking. Model
parameters are estimated in polynomial time using an EM algorithm. The
model produces word alignments that are better than those produced by IBM
Model 5.
For further reading (but not a reaction essay): Bonnie Dorr, Pamela Jordan,
and J. Benoit, 1999, A
Survey of Current Paradigms in Machine Translation
Advances in Computers, edited by Marvin V. Zelkowitz, Vol. 49.
David Yarowsky, 1995. Unsupervised
word sense disambiguation rivaling supervised methods
Proceedings of the 33rd Annual Meeting of the Association for Computational
Linguistics, pp 189--196.
Abstract: This paper presents an unsupervised learning algorithm for sense
disambiguation that, when trained on unannotated English text, rivals the
performance of supervised techniques that require time-consuming hand annotations.
The algorithm is based on two powerful constraints -- that words tend to
have one sense per discourse and one sense per collocation -- exploited
in an iterative bootstrapping procedure. Tested accuracy exceeds 96%.
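The bootstrapping loop can be caricatured in a few lines. Everything below -- the contexts, the seed collocations, the adoption rule -- is invented for illustration; Yarowsky's actual algorithm ranks collocations by log-likelihood in a decision list and additionally exploits the one-sense-per-discourse constraint:

```python
def bootstrap(contexts, seeds, iterations=5):
    """Grow a collocation -> sense map from a few seed collocations.

    contexts: one set of context words per occurrence of the ambiguous word.
    seeds: initial collocation words with known senses.
    """
    colloc = dict(seeds)
    for _ in range(iterations):
        # Label each occurrence whose known collocations agree on one sense.
        labels = []
        for ctx in contexts:
            senses = {colloc[w] for w in ctx if w in colloc}
            labels.append(next(iter(senses)) if len(senses) == 1 else None)
        # Adopt every word that co-occurs with exactly one sense
        # ("one sense per collocation") as a new collocation for that sense.
        votes = {}
        for ctx, sense in zip(contexts, labels):
            if sense is not None:
                for w in ctx:
                    votes.setdefault(w, set()).add(sense)
        colloc = dict(seeds)
        colloc.update({w: next(iter(s))
                       for w, s in votes.items() if len(s) == 1})
    return colloc

# Invented contexts for the ambiguous word "plant".
contexts = [
    {"factory", "workers"},
    {"workers", "union"},
    {"leaf", "green"},
    {"green", "growth"},
]
learned = bootstrap(contexts,
                    seeds={"factory": "industrial", "leaf": "botanical"})
```

After a couple of iterations, the contexts containing no seed word ({"workers", "union"} and {"green", "growth"}) have been labeled via collocations adopted along the way, which is the essence of the bootstrapping step.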
Dan Roth, 1998. Learning
to Resolve Natural Language Ambiguities: A Unified Approach
Proceedings of AAAI '98.
Abstract: We analyze a few of the commonly used statistics based and machine
learning algorithms for natural language disambiguation tasks and observe
that they can be re-cast as learning linear separators in the feature space.
Each of the methods makes a priori assumptions, which it employs, given
the data, when searching for its hypothesis. Nevertheless, as we show,
it searches a space that is as rich as the space of all linear separators.
We use this to build an argument for a data driven approach which merely
searches for a good linear separator in the feature space, without
further assumptions on the domain or a specific problem.
We present such an approach - a sparse network of linear separators,
utilizing the Winnow learning algorithm - and show how to use it in a variety
of ambiguity resolution problems. The learning approach presented is attribute-efficient
and, therefore, appropriate for domains having very large number of attributes.
In particular, we present an extensive experimental comparison of our
approach with other methods on several well studied lexical disambiguation
tasks such as context-sensitive spelling correction, prepositional phrase
attachment and part of speech tagging. In all cases we show that our approach
either outperforms other methods tried for these tasks or performs comparably
to the best.
Steven Abney and Marc Light, 1999. Hiding
a Semantic Class Hierarchy in a Markov Model
Proceedings of the ACL '99 Workshop on Unsupervised Learning in
Natural Language Processing, pp. 1--8.
Abstract: We introduce a new model of selectional preference induction.
Unlike previous approaches, we provide a stochastic generation model for
the words that appear as arguments of a predicate. More specifically, we
define a hidden Markov model with the general shape of a given semantic
class hierarchy. This model has a number of attractive features, among
them that selectional preference can be seen as distributions over words.
Initial results are promising. However, unsupervised parameter estimation
has proven problematic. A central problem is word sense ambiguity in the
training corpora. We describe attempts to modify the forward-backward algorithm,
an EM algorithm, to handle such disambiguation. Although these attempts
were unsuccessful at improving performance, we believe they give insight
into the nature of the bottlenecks and into the behavior of the EM algorithm.
Michele Banko and Eric Brill, 2001. Scaling
to Very Very Large Corpora for Natural Language Disambiguation
Proceedings of ACL 2001, pp. 26--33.
Abstract: The amount of readily available on-line text has reached hundreds
of billions of words and continues to grow. Yet for most core natural language
tasks, algorithms continue to be optimized, tested and compared after training
on corpora consisting of only one million words or less. In this paper,
we evaluate the performance of different learning methods on a prototypical
natural language disambiguation task, confusion set disambiguation, when
trained on orders of magnitude more labeled data than has previously been
used. We are fortunate that for this particular application, correctly
labeled training data is free. Since this will often not be the case, we
examine methods for effectively exploiting very large corpora when labeled
data comes at a cost.
CS674, Spring '02