CS674: Natural Language Processing
Spring 2000
Reaction Essay Readings
http://courses.cs.cornell.edu/cs674/2000sp/readings.html
The primary readings (in red and marked with a bullet) in this list
were chosen with a variety of criteria in mind, including impact, accessibility
to the non-expert, recency, brevity, and on-line availability. Also, since
these papers are meant to serve as subjects for reaction essays,
"provocativeness" was an additional consideration. It was not always easy
to balance all these conditions, and in some cases potentially unorthodox
choices were made. Hence, this collection should not be regarded as a compilation
of major papers, but rather as a set of starting points for becoming acquainted
with some issues and ideas in natural language processing.
In this vein, the related references are meant simply to give some idea
of the range of work on the same topic. I've omitted papers covered in
lecture.
Instructions: Reaction essays are due every Monday, starting
February 7, for the first half of the semester (see syllabus for exact
dates). The intent is both to acquaint you with some important recent papers
in the field and to help you find a project topic. Of course, it is expected
that you will look at papers that aren't on the list as well!
Your reaction essay is just that: a short (one or two pages) critical
reading of one of the primary papers (bulleted and in red) from
the list below. Briefly (1-2 paragraphs) describe the problem attacked
and the solution proposed, but do not merely summarize the paper. Then,
address such questions as: Is the evaluation fair and informative? Are
the underlying assumptions valid? When are the proposed methods applicable?
On the other hand, don't spend an inordinate amount of time on these essays
-- they are meant to be brief, and will be graded on a check-plus/check/check-minus
scale.
Topic index:
Parsing
TAG's
Finite state methods
Scope and ellipsis
Discourse
Word sense disambiguation
Language modeling
Clustering
Machine translation
Learning
Parsing
-
Eugene Charniak, 1997. Statistical
Parsing with a Context-free Grammar and Word Statistics
Proceedings of the Fourteenth National Conference on Artificial
Intelligence (AAAI-97), pp 598-603.
-
Abstract: We describe a parsing system based upon a language model for
English that is, in turn, based upon assigning probabilities to possible
parses for a sentence. This model is used in a parsing system by finding
the parse for the sentence with the highest probability. This system outperforms
previous schemes. As this is the third in a series of parsers by different
authors that are similar enough to invite detailed comparisons but different
enough to give rise to different levels of performance, we also report
on some experiments designed to identify what aspects of these systems
best explain their relative performance.
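To make the "highest-probability parse" idea concrete, here is a minimal sketch: score candidate parses under a toy probabilistic context-free grammar and return the argmax. The grammar, probabilities, and the two candidate parses are invented for illustration and are far simpler than Charniak's lexicalized model, which also conditions on head words (the "Word Statistics" of the title).

    import math

    # Toy PCFG: (lefthand side, righthand side) -> probability.
    # All rules and numbers are invented for illustration.
    RULES = {
        ("VP", ("V", "NP")): 0.6, ("VP", ("VP", "PP")): 0.4,
        ("NP", ("N",)): 0.7,      ("NP", ("NP", "PP")): 0.3,
        ("PP", ("P", "NP")): 1.0,
        ("V", ("saw",)): 1.0, ("P", ("with",)): 1.0,
        ("N", ("stars",)): 0.5, ("N", ("telescopes",)): 0.5,
    }

    def log_prob(tree):
        """Log probability of a parse: sum of the log probabilities of its rules."""
        label, *children = tree
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        lp = math.log(RULES[(label, rhs)])
        return lp + sum(log_prob(c) for c in children if not isinstance(c, str))

    # Two candidate parses of "saw stars with telescopes":
    high = ("VP", ("VP", ("V", "saw"), ("NP", ("N", "stars"))),
                  ("PP", ("P", "with"), ("NP", ("N", "telescopes"))))
    low  = ("VP", ("V", "saw"),
                  ("NP", ("NP", ("N", "stars")),
                         ("PP", ("P", "with"), ("NP", ("N", "telescopes")))))

    # The decision rule: return the candidate parse with the highest probability.
    print(max([("high attachment", high), ("low attachment", low)],
              key=lambda t: log_prob(t[1]))[0])      # -> high attachment
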
-
Related references:
-
David M. Magerman, 1995. Statistical
decision-tree models for parsing
33rd Annual Meeting of the Association for Computational Linguistics:
Proceedings of the Conference, pp. 276-283.
-
Michael Collins, 1996. A
New Statistical Parser Based on Bigram Lexical Dependencies
Proceedings of the 34th Annual Meeting of the ACL
-
Michael Collins, 1997. Three
generative, lexicalised models for statistical parsing
35th Annual Meeting of the Association for Computational Linguistics
and 8th Conference of the European Chapter of the Association for Computational
Linguistics: Proceedings of the Conference (ACL/EACL '97), pp 16-23.
-
Eugene Charniak, 1999. A
Maximum-Entropy-Inspired Parser
Technical Report CS99-12, Department of Computer Science, Brown University
-
John Henderson and Eric Brill, 1999. Exploiting
Diversity for Natural Language Processing: Combining Parsers.
Fourth Conference on Empirical Methods in Natural Language Processing/Very
Large Corpora.
TAG's
-
Stuart Shieber and Yves Schabes, 1990.
Synchronous
tree-adjoining grammars
Proceedings of the 13th International Conference on Computational
Linguistics, volume 3, pp 1-6.
-
Abstract: The unique properties of tree-adjoining grammars (TAG) present
a challenge for the application of TAGs beyond the limited confines of
syntax, for instance, to the task of semantic interpretation or automatic
translation of natural language. We present a variant of TAGs, called synchronous
TAGs, which characterize correspondences between languages. The formalism's
intended usage is to relate expressions of natural languages to their associated
semantics represented in a logical form language, or to their translates
in another natural language; in summary, we intend it to allow TAGs to
be used beyond their role in syntax proper. We discuss the application
of synchronous TAGs to concrete examples, mentioning primarily in passing
some computational issues that arise in its interpretation.
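The following toy sketch illustrates the synchronous-derivation idea in a drastically simplified form: it pairs flat context-free rules rather than elementary trees (so there is no adjunction), and the English-French rules are invented. The linked nonterminals (the #1, #2 indices) are what force the two sides to be rewritten in step, which is enough to show how one derivation can reorder arguments across languages.

    import random

    # Toy synchronous grammar, invented for illustration. Real synchronous TAGs
    # pair elementary *trees* and compose them by substitution and adjunction;
    # here the paired rules are flat and context-free. "NP#1" is a link: both
    # of its occurrences (one per side) must be expanded by the same rule.
    RULES = {
        "S":  [(["NP#1", "misses", "NP#2"], ["NP#2", "manque", "à", "NP#1"])],
        "NP": [(["John"], ["Jean"]), (["Mary"], ["Marie"])],
    }

    def derive(link, choices):
        """Expand one linked nonterminal on both sides simultaneously."""
        category = link.split("#")[0]
        eng_rhs, fr_rhs = choices.setdefault(link, random.choice(RULES[category]))
        eng = [w for sym in eng_rhs
                 for w in (derive(sym, choices)[0] if "#" in sym else [sym])]
        fr  = [w for sym in fr_rhs
                 for w in (derive(sym, choices)[1] if "#" in sym else [sym])]
        return eng, fr

    eng, fr = derive("S#0", {})
    print(" ".join(eng), " <-> ", " ".join(fr))
    # e.g.:  John misses Mary  <->  Marie manque à Jean
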
-
Related references:
Finite state methods
-
(To be read together)
Mehryar Mohri, Fernando Pereira and Michael
Riley, 1996. Weighted
automata in text and speech processing
and
András Kornai, 1996. Comments
on Mohri, Pereira and Riley
Workshop on Extended Finite State Models of Language (ECAI
'96)
-
Abstract: Finite-state automata are a very effective tool in natural language
processing. However, in a variety of applications and especially in speech
processing, it is necessary to consider more general machines in which
arcs are assigned weights or costs. We briefly describe some of the main
theoretical and algorithmic aspects of these machines. In particular, we
describe an efficient composition algorithm for weighted transducers,
and give examples illustrating the value of determinization and minimization
algorithms for weighted automata.
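As a concrete taste of the machinery, here is a minimal sketch of epsilon-free composition of two weighted transducers in the tropical semiring (costs add along a path and the minimum over paths is kept). Both toy machines are invented; a real implementation would handle epsilon transitions and only build state pairs reachable from the start states.

    from collections import defaultdict

    # Arcs are stored as state -> [(input, output, weight, next_state), ...].
    def compose(t1, t2):
        """Epsilon-free composition: pair up states, matching t1's output
        symbols against t2's input symbols and adding the weights."""
        result = defaultdict(list)
        for q1, arcs1 in t1.items():
            for q2, arcs2 in t2.items():
                for a, b, w1, r1 in arcs1:
                    for b2, c, w2, r2 in arcs2:
                        if b == b2:
                            result[(q1, q2)].append((a, c, w1 + w2, (r1, r2)))
        return dict(result)

    t1 = {0: [("a", "x", 0.5, 1), ("a", "y", 1.5, 1)], 1: []}   # rewrites a -> x or y
    t2 = {0: [("x", "z", 1.0, 1), ("y", "z", 0.1, 1)], 1: []}   # rewrites x or y -> z
    print(compose(t1, t2)[(0, 0)])
    # [('a', 'z', 1.5, (1, 1)), ('a', 'z', 1.6, (1, 1))]: two a->z paths; min cost 1.5
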
Scope and ellipsis
-
Jong C. Park, 1995. Quantifier
Scope and Constituency
ACL '95
-
Abstract: Traditional approaches to quantifier scope typically need stipulation
to exclude readings that are unavailable to human understanders. This paper
shows that quantifier scope phenomena can be precisely characterized by
a semantic representation constrained by surface constituency, if the distinction
between referential and quantificational NPs is properly observed. A CCG
implementation is described and compared to other approaches.
-
Related references:
-
Jerry R. Hobbs and Andrew Kehler, 1997.
A theory of
parallelism and the case of VP ellipsis
35th Annual Meeting of the Association for Computational Linguistics
and 8th Conference of the European Chapter of the Association for Computational
Linguistics: Proceedings of the Conference
-
Abstract: We provide a general account of parallelism in discourse, and
apply it to the special case of resolving possible readings for instances
of VP ellipsis. We show how several problematic examples are accounted
for in a natural and straightforward fashion. The generality of the approach
makes it directly applicable to a variety of other types of ellipsis and
reference.
-
Related references:
Discourse
-
David R. Traum and James F. Allen, 1994. Discourse
Obligations in Dialogue Processing
Proceedings of the 32nd Annual Meeting of the Association for Computational
Linguistics (ACL-94), pp 1-8.
-
Abstract: We show that in modeling social interaction, particularly dialogue,
the attitude of obligation can be a useful adjunct to the popularly considered
attitudes of belief, goal, and intention and their mutual and shared counterparts.
In particular, we show how discourse obligations can be used to account
in a natural manner for the connection between a question and its answer
in dialogue and how obligations can be used along with other parts of the
discourse context to extend the coverage of a dialogue system.
-
Related references:
-
Marilyn Walker, 1996. Limited
Attention and Discourse Structure
Computational Linguistics, 22(2)
-
Abstract: This squib examines the role of limited attention in a theory
of discourse structure and proposes a model of attentional state that relates
current hierarchical theories of discourse structure to empirical evidence
about human discourse processing capabilities. First, I present examples
that are not predicted by Grosz and Sidner's stack model of attentional
state. Then I consider an alternative model of attentional state, the cache
model, which accounts for the examples, and which makes particular processing
predictions. Finally I suggest a number of ways that future research could
distinguish the predictions of the cache model and the stack model.
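A minimal sketch of the cache intuition (not Walker's actual operations, parameters, or cost model): salient discourse entities live in a small fixed-capacity store, referring to a cached entity is cheap, and an uncached entity must be retrieved from main memory, displacing the least recently used entity. The entity names and capacity below are invented.

    from collections import OrderedDict

    class AttentionCache:
        def __init__(self, capacity=3):
            self.capacity = capacity
            self.entities = OrderedDict()            # entity -> None, ordered by recency

        def mention(self, entity):
            if entity in self.entities:
                self.entities.move_to_end(entity)    # refresh recency
                return entity + ": cheap (already in cache)"
            self.entities[entity] = None             # retrieve from main memory
            if len(self.entities) > self.capacity:
                self.entities.popitem(last=False)    # displace least recently used
            return entity + ": costly (retrieved)"

    cache = AttentionCache()
    for e in ["waiter", "menu", "soup", "waiter", "check", "menu"]:
        print(cache.mention(e))
    # "menu" at the end is costly again: it was displaced when "check" came in.
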
-
Related references:
-
Barbara J. Grosz and Peter C. Gordon, 1999. Conceptions of Limited Attention
and Discourse Focus.
Computational Linguistics 25(4), pp 617-624.
-
Michael Strube, 1998.
Never
Look Back: An Alternative to Centering
Proceedings of COLING-ACL '98
-
Abstract: I propose a model for determining the hearer's attentional state
which depends solely on a list of salient discourse entities (S-list).
The ordering among the elements of the S-list covers also the function
of the backward-looking center in the centering model. The ranking criteria
for the S-list are based on the distinction between hearer-old and hearer-new
discourse entities and incorporate preferences for inter- and intra-sentential
anaphora. The model is the basis for an algorithm which operates incrementally,
word by word.
Word sense disambiguation
-
Philip Resnik, 1995. Disambiguating
Noun Groupings with Respect to WordNet Senses
Proceedings of the 3rd Workshop on Very Large Corpora.
-
Abstract: Word groupings useful for language processing tasks are increasingly
available, as thesauri appear on-line, and as distributional word clustering
techniques improve. However, for many tasks, one is interested in relationships
among word senses, not words. This paper presents a method for automatic
sense disambiguation of nouns appearing within sets of related nouns ---
the kind of data one finds in on-line thesauri, or as the output of distributional
clustering algorithms. Disambiguation is performed with respect to WordNet
senses, which are fairly fine-grained; however, the method also permits
the assignment of higher-level WordNet categories rather than sense labels.
The method is illustrated primarily by example, though results of a more
rigorous evaluation are also presented.
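A highly simplified stand-in for the method: Resnik chooses, for each noun in a related group, the sense most supported by the rest of the group, using WordNet's IS-A hierarchy and an information-based measure. The sketch below replaces that machinery with a hand-made sense inventory and simple ancestor-overlap counting; all senses and categories are invented.

    SENSES = {   # noun -> {candidate sense: set of ancestor categories}
        "bass":   {"bass.fish": {"fish", "animal"}, "bass.voice": {"music", "sound"}},
        "trout":  {"trout.fish": {"fish", "animal"}},
        "salmon": {"salmon.fish": {"fish", "animal"}, "salmon.color": {"color"}},
    }

    def disambiguate(group):
        chosen = {}
        for noun in group:
            # Pool the ancestors of every candidate sense of the *other* nouns.
            pool = [anc for other in group if other != noun
                        for ancestors in SENSES[other].values() for anc in ancestors]
            support = lambda item: sum(pool.count(anc) for anc in item[1])
            chosen[noun] = max(SENSES[noun].items(), key=support)[0]
        return chosen

    print(disambiguate(["bass", "trout", "salmon"]))
    # {'bass': 'bass.fish', 'trout': 'trout.fish', 'salmon': 'salmon.fish'}
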
-
Related references:
-
David Yarowsky, 1995. Unsupervised
word sense disambiguation rivaling supervised methods
Proceedings of the 33rd Annual Meeting of the Association for Computational
Linguistics, pp 189--196.
-
Abstract: This paper presents an unsupervised learning algorithm for sense
disambiguation that, when trained on unannotated English text, rivals the
performance of supervised techniques that require time-consuming hand annotations.
The algorithm is based on two powerful constraints -- that words tend to
have one sense per discourse and one sense per collocation -- exploited
in an iterative bootstrapping procedure. Tested accuracy exceeds 96%.
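A bare-bones sketch of the bootstrapping loop (not Yarowsky's full decision-list learner, and with invented toy contexts for the word "plant"): seed collocations label a few contexts, new collocations are harvested from those contexts, and the process repeats. The one-sense-per-collocation constraint shows up as the requirement that a context be labeled only when its known collocations all point to one sense.

    contexts = [
        {"plant", "manufacturing", "equipment"},
        {"plant", "equipment", "workers"},
        {"plant", "flower", "garden"},
        {"plant", "garden", "soil"},
    ]
    rules = {"manufacturing": "factory", "flower": "living"}     # seed collocations

    labels, changed = {}, True
    while changed:
        changed = False
        for i, context in enumerate(contexts):
            if i in labels:
                continue
            senses = {rules[w] for w in context if w in rules}
            if len(senses) == 1:                  # unambiguous evidence only
                labels[i] = senses.pop()
                changed = True
                for w in context - {"plant"}:     # harvest new collocations
                    rules.setdefault(w, labels[i])

    print(labels)    # {0: 'factory', 1: 'factory', 2: 'living', 3: 'living'}
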
-
Related references:
Language modeling
-
Ciprian Chelba and Frederick Jelinek, 1998.
Exploiting
syntactic structure for language modeling
COLING-ACL '98 36th Annual Meeting of the Association for Computational
Linguistics and 17th International Conference on Computational Linguistics:
Proceedings of the Conference, Vol 1, pp 225--231
-
Abstract: The paper presents a language model that develops syntactic structure
and uses it to extract meaningful information from the word history, thus
enabling the use of long distance dependencies. The model assigns probability
to every joint sequence of words-binary-parse-structure with headword annotation
and operates in a left-to-right manner - therefore usable for automatic
speech recognition. The model, its probabilistic parameterization, and
a set of experiments meant to evaluate its predictive power are presented;
an improvement over standard trigram modeling is achieved.
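For reference, the baseline mentioned in the last sentence is a word trigram model; the sketch below estimates one by maximum likelihood from an invented toy corpus (no smoothing, which real systems rely on heavily). The Chelba-Jelinek model keeps the same left-to-right chain-rule shape but conditions each word on the two previous exposed headwords of the partial parse rather than on the two previous words.

    from collections import defaultdict

    corpus = ("the contract ended with a loss . "
              "the contract ended with a gain .").split()

    trigram, bigram = defaultdict(int), defaultdict(int)
    for u, v, w in zip(corpus, corpus[1:], corpus[2:]):
        trigram[(u, v, w)] += 1
        bigram[(u, v)] += 1

    def p(w, u, v):
        """P(w | u, v) estimated by relative frequency."""
        return trigram[(u, v, w)] / bigram[(u, v)] if bigram[(u, v)] else 0.0

    print(p("with", "contract", "ended"))    # 1.0 in this toy corpus
    print(p("loss", "with", "a"))            # 0.5: "a" is followed by loss or gain
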
-
Related references:
Clustering
-
Lillian Lee and Fernando Pereira, 1999.
Distributional
similarity models: Clustering vs. Nearest Neighbors
Proceedings of the Thirty-Seventh Annual Meeting of the Association
for Computational Linguistics (ACL'99), pp 23-40.
-
Abstract: Distributional similarity is a useful notion in estimating the
probabilities of rare joint events. It has been employed both to cluster
events according to their distributions, and to directly compute averages
of estimates for distributional neighbors of a target event. Here, we examine
the tradeoffs between model size and prediction accuracy for cluster-based
and nearest neighbors distributional models of unseen events.
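A minimal sketch of the nearest-neighbor side of the comparison: represent each noun by its distribution over the verbs that take it as an object, rank other nouns by Jensen-Shannon divergence, and borrow an estimate for an unseen verb-noun pair from the closest neighbor. The counts are invented, and the paper itself studies several similarity functions and weighted combinations of many neighbors rather than a single nearest neighbor.

    import math
    from collections import Counter

    cooc = {   # toy object-of-verb counts
        "wine":   Counter({"drink": 8, "pour": 3, "buy": 2}),
        "beer":   Counter({"drink": 9, "buy": 4}),
        "pencil": Counter({"buy": 5, "sharpen": 4}),
    }

    def dist(noun):
        total = sum(cooc[noun].values())
        return {v: n / total for v, n in cooc[noun].items()}

    def js_divergence(p, q):
        """Jensen-Shannon divergence between two verb distributions."""
        m = {v: 0.5 * (p.get(v, 0) + q.get(v, 0)) for v in set(p) | set(q)}
        kl = lambda a: sum(pa * math.log(pa / m[v]) for v, pa in a.items() if pa > 0)
        return 0.5 * kl(p) + 0.5 * kl(q)

    # "pour beer" is unseen; estimate it from beer's closest neighbor.
    p_beer = dist("beer")
    neighbors = sorted((n for n in cooc if n != "beer"),
                       key=lambda n: js_divergence(p_beer, dist(n)))
    print(neighbors)                                        # ['wine', 'pencil']
    print(round(dist(neighbors[0]).get("pour", 0.0), 2))    # 0.23, borrowed from "wine"
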
-
Related references:
-
Hinrich Schütze, 1992. Dimensions
of Meaning.
Proceedings of Supercomputing, pp 787-796.
-
Fernando Pereira, Naftali Tishby, and Lillian Lee, 1993. Distributional
Clustering of English Words
Proceedings of the 31st ACL, pp 183-90.
-
Dekang Lin, 1998. Automatic
Retrieval and Clustering of Similar Words.
COLING-ACL98.
-
Mats Rooth, Stefan Riezler, Detlef Prescher, Glenn Carroll, and
Franz Beil, 1999. Inducing a Semantically Annotated Lexicon via EM-Based
Clustering
Proceedings of ACL '99
Machine translation
-
David M. Zajic and Keith J. Miller, 1998.
Where
Interlingua Can Make a Difference
Proceedings of the Second AMTA SIG-IL Workshop on Interlinguas
-
Abstract:
Furnished with an English text and its equivalent in 13 foreign languages,
we set out to determine the potential for improving the quality of translation
results by an Interlingual (IL) approach to machine translation (MT) as
compared to transfer-based systems. In this paper, we analyze the errors
made by two commercial transfer-based MT systems, provide an observational
classification of the errors, and group the errors according to whether
or not an Interlingual approach would improve system output. We describe
an existing IL, Composed Lexical Conceptual Structures, and illustrate
with examples how some of the observed errors might be corrected using
this IL representation.
-
Michael Dorna and Martin Emele, 1996. Semantic-based
Transfer
Proceedings of the 16th International Conference on Computational
Linguistics (COLING-96), pp. 316-321.
-
Abstract:
This article presents a new semantic-based transfer approach developed and applied within the Verbmobil Machine Translation
project. We give an overview of the declarative transfer formalism together with its procedural realization. Our approach is
discussed and compared with several other approaches from the MT literature.
-
For further reading (but not a reaction essay): Bonnie Dorr, Pamela Jordan,
and J. Benoit, 1999, A
Survey of Current Paradigms in Machine Translation
Advances in Computers, edited by Marvin V. Zelkowitz, Vol 49,
Academic Press.
Learning
-
Eric Brill, 1993. Automatic
grammar induction and parsing free text: A transformation-based approach
31st Annual Meeting of the Association for Computational Linguistics:
Proceedings of the Conference, pp 259-265.
-
Abstract: In this paper we describe a new technique for parsing free
text: a transformational grammar is automatically learned that is
capable of accurately parsing text into binary-branching syntactic
trees with nonterminals unlabelled. The algorithm works by beginning
in a very naive state of knowledge about phrase structure. By
repeatedly comparing the results of bracketing in the current state to
proper bracketing provided in the training corpus, the system learns a
set of simple structural transformations that can be applied to reduce
error. After describing the algorithm, we present results and compare
these results to other recent results in automatic grammar
induction.
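The abstract describes the generic transformation-based learning loop: start from a naive initial analysis, repeatedly choose the transformation that most reduces error against the training corpus, and stop when nothing helps. The sketch below runs that loop on a toy retagging task rather than bracketing; the words, tags, and transformation templates are all invented, but the outer loop is the same.

    train_words = ["the", "can",  "rusted", "the", "can",  "holds", "water"]
    gold_tags   = ["DET", "NOUN", "VERB",   "DET", "NOUN", "VERB",  "NOUN"]

    # Naive initial state: every word gets a default tag.
    tags = ["DET" if w == "the" else "NOUN" for w in train_words]

    def errors(t):
        return sum(a != b for a, b in zip(t, gold_tags))

    def apply_rule(t, rule):
        """Rule (frm, to, prev): retag frm as to whenever the previous tag is prev."""
        frm, to, prev = rule
        return [to if tag == frm and i > 0 and t[i - 1] == prev else tag
                for i, tag in enumerate(t)]

    candidates = [(a, b, c) for a in ("NOUN", "VERB") for b in ("NOUN", "VERB")
                  for c in ("DET", "NOUN", "VERB") if a != b]

    learned = []
    while True:
        best = min(candidates, key=lambda r: errors(apply_rule(tags, r)))
        if errors(apply_rule(tags, best)) >= errors(tags):
            break                          # no transformation reduces error
        tags = apply_rule(tags, best)
        learned.append(best)

    print(learned)            # [('NOUN', 'VERB', 'NOUN'), ('VERB', 'NOUN', 'VERB')]
    print(tags == gold_tags)  # True on this toy corpus
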
-
Related references:
-
Dan Roth, 1998. Learning
to Resolve Natural Language Ambiguities: A Unified Approach
Proceedings of AAAI '98.
-
Abstract: We analyze a few of the commonly used statistics based and machine
learning algorithms for natural language disambiguation tasks and observe
that they can be re-cast as learning linear separators in the feature space.
Each of the methods makes a priori assumptions, which it employs, given
the data, when searching for its hypothesis. Nevertheless, as we show,
it searches a space that is as rich as the space of all linear separators.
We use this to build an argument for a data driven approach which merely
searches for a good linear separator in the feature space, without
further assumptions on the domain or a specific problem.
We present such an approach - a sparse network of linear separators,
utilizing the Winnow learning algorithm - and show how to use it in a variety
of ambiguity resolution problems. The learning approach presented is attribute-efficient
and, therefore, appropriate for domains having very large number of attributes.
In particular, we present an extensive experimental comparison of our
approach with other methods on several well studied lexical disambiguation
tasks such as context-sensitive spelling correction, prepositional phrase
attachment and part of speech tagging. In all cases we show that our approach
either outperforms other methods tried for these tasks or performs comparably
to the best.
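The Winnow algorithm mentioned in the abstract is a multiplicative-update rule for learning a linear separator over binary features; its mistake bound grows only logarithmically in the number of irrelevant features, which is what "attribute-efficient" refers to. Below is a minimal sketch on a toy problem; the feature indices, examples, and parameters are invented, and Roth's SNOW architecture arranges many such linear units in a sparse network over a very large feature space.

    def winnow(examples, n_features, promotion=2.0, demotion=0.5, epochs=10):
        weights = [1.0] * n_features
        threshold = float(n_features)             # standard Winnow threshold
        for _ in range(epochs):
            for active, label in examples:        # active = indices of features that fire
                predicted = sum(weights[i] for i in active) >= threshold
                if predicted and not label:       # false positive: demote active weights
                    for i in active:
                        weights[i] *= demotion
                elif label and not predicted:     # false negative: promote active weights
                    for i in active:
                        weights[i] *= promotion
        return weights

    # Toy task: the label is positive exactly when feature 0 is active.
    examples = [([0, 2], True), ([1, 3], False), ([0, 3], True), ([1, 2], False)]
    print(winnow(examples, n_features=4))         # feature 0 gets the largest weight
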
-
Related references:
-
Eric Brill and Grace Ngai, 1999. Man
[and Woman] vs. Machine: A Case Study in Base Noun Phrase
Learning
Proceedings of ACL'99
-
Abstract: A great deal of work has been done demonstrating the
ability of machine learning algorithms to automatically extract
linguistic knowledge from annotated corpora. Very little work has
gone into quantifying the difference in ability at this task between a
person and a machine. This paper is a first step in that direction.
CS674, Spring '00
Lillian Lee