CS674: Natural Language Processing
Spring 2000
Reaction Essay Readings
http://courses.cs.cornell.edu/cs674/2000sp/readings.html
The primary readings (in red and marked with a bullet) in this list
were chosen with a variety of criteria in mind, including impact, accessibility
to the non-expert, recency, brevity, and on-line availability. Also, since
these papers are meant to serve as subjects for reaction essays,
"provocativeness" was an additional consideration. It was not always easy
to balance all these conditions, and in some cases potentially unorthodox
choices were made. Hence, this collection should not be regarded as a compilation
of major papers, but rather as a set of starting points for becoming acquainted
with some issues and ideas in natural language processing.
In this vein, the related references are meant simply to give some idea
of the range of work on the same topic. I've omitted papers covered in
lecture.
Instructions: Reaction essays are due every Monday, starting
February 7, for the first half of the semester (see syllabus for exact
dates). The intent is both to acquaint you with some important recent papers
in the field and to help you find a project topic. Of course, it is expected
that you will look at papers that aren't on the list as well!
Your reaction essay is just that: a short (one or two pages) critical
reading of one of the primary papers (bulleted and in red) from
the list below. Briefly (1-2 paragraphs) describe the problem attacked
and the solution proposed, but do not merely summarize the paper. Then,
address such questions as: Is the evaluation fair and informative? Are
the underlying assumptions valid? When are the proposed methods applicable?
On the other hand, don't spend an inordinate amount of time on these essays
-- they are meant to be brief, and will be graded on a check-plus/check/check-minus
scale.
Topic index:
Parsing
TAG's
Finite state methods
Scope and ellipsis
Discourse
Word sense disambiguation
Language modeling
Clustering
Machine translation
Learning
Parsing
-
Eugene Charniak, 1997. Statistical
Parsing with a Context-free Grammar and Word Statistics
Proceedings of the Fourteenth National Conference on Artificial
Intelligence (AAAI-97), pp 598-603.
-
Abstract: We describe a parsing system based upon a language model for
English that is, in turn, based upon assigning probabilities to possible
parses for a sentence. This model is used in a parsing system by finding
the parse for the sentence with the highest probability. This system outperforms
previous schemes. As this is the third in a series of parsers by different
authors that are similar enough to invite detailed comparisons but different
enough to give rise to different levels of performance, we also report
on some experiments designed to identify what aspects of these systems
best explain their relative performance.
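To make the "highest-probability parse" idea concrete, here is a minimal sketch: score candidate parses under a toy probabilistic context-free grammar and return the argmax. The grammar, probabilities, and the two candidate parses are invented for illustration and are far simpler than Charniak's lexicalized model, which also conditions on head words (the "Word Statistics" of the title).

    import math

    # Toy PCFG: (lefthand side, righthand side) -> probability.
    # All rules and numbers are invented for illustration.
    RULES = {
        ("VP", ("V", "NP")): 0.6, ("VP", ("VP", "PP")): 0.4,
        ("NP", ("N",)): 0.7,      ("NP", ("NP", "PP")): 0.3,
        ("PP", ("P", "NP")): 1.0,
        ("V", ("saw",)): 1.0, ("P", ("with",)): 1.0,
        ("N", ("stars",)): 0.5, ("N", ("telescopes",)): 0.5,
    }

    def log_prob(tree):
        """Log probability of a parse: sum of the log probabilities of its rules."""
        label, *children = tree
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        lp = math.log(RULES[(label, rhs)])
        return lp + sum(log_prob(c) for c in children if not isinstance(c, str))

    # Two candidate parses of "saw stars with telescopes":
    high = ("VP", ("VP", ("V", "saw"), ("NP", ("N", "stars"))),
                  ("PP", ("P", "with"), ("NP", ("N", "telescopes"))))
    low  = ("VP", ("V", "saw"),
                  ("NP", ("NP", ("N", "stars")),
                         ("PP", ("P", "with"), ("NP", ("N", "telescopes")))))

    # The decision rule: return the candidate parse with the highest probability.
    print(max([("high attachment", high), ("low attachment", low)],
              key=lambda t: log_prob(t[1]))[0])      # -> high attachment
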
-
Related references:
-
David M. Magerman, 1995. Statistical
decision-tree models for parsing
33rd Annual Meeting of the Association for Computational Linguistics:
Proceedings of the Conference, pp. 276-283.
-
Michael Collins, 1996. A
New Statistical Parser Based on Bigram Lexical Dependencies
Proceedings of the 34th Annual Meeting of the ACL
-
Michael Collins, 1997. Three
generative, lexicalised models for statistical parsing
35th Annual Meeting of the Association for Computational Linguistics
and 8th Conference of the European Chapter of the Association for Computational
Linguistics: Proceedings of the Conference (ACL/EACL '97), pp 16-23.
-
Eugene Charniak, 1999. A
Maximum-Entropy-Inspired Parser
Technical Report CS99-12, Department of Computer Science, Brown University
-
John Henderson and Eric Brill, 1999. Exploiting
Diversity for Natural Language Processing: Combining Parsers.
Fourth Conference on Empirical Methods in Natural Language Processing/Very
Large Corpora.
TAG's
-
Stuart Shieber and Yves Schabes, 1990.
Synchronous
tree-adjoining grammars
Proceedings of the 13th International Conference on Computational
Linguistics, volume 3, pp 1-6.
-
Abstract: The unique properties of tree-adjoining grammars (TAG) present
a challenge for the application of TAGs beyond the limited confines of
syntax, for instance, to the task of semantic interpretation or automatic
translation of natural language. We present a variant of TAGs, called synchronous
TAGs, which characterize correspondences between languages. The formalism's
intended usage is to relate expressions of natural languages to their associated
semantics represented in a logical form language, or to their translates
in another natural language; in summary, we intend it to allow TAGs to
be used beyond their role in syntax proper. We discuss the application
of synchronous TAGs to concrete examples, mentioning primarily in passing
some computational issues that arise in its interpretation.
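The following toy sketch illustrates the synchronous-derivation idea in a drastically simplified form: it pairs flat context-free rules rather than elementary trees (so there is no adjunction), and the English-French rules are invented. The linked nonterminals (the #1, #2 indices) are what force the two sides to be rewritten in step, which is enough to show how one derivation can reorder arguments across languages.

    import random

    # Toy synchronous grammar, invented for illustration. Real synchronous TAGs
    # pair elementary *trees* and compose them by substitution and adjunction;
    # here the paired rules are flat and context-free. "NP#1" is a link: both
    # of its occurrences (one per side) must be expanded by the same rule.
    RULES = {
        "S":  [(["NP#1", "misses", "NP#2"], ["NP#2", "manque", "à", "NP#1"])],
        "NP": [(["John"], ["Jean"]), (["Mary"], ["Marie"])],
    }

    def derive(link, choices):
        """Expand one linked nonterminal on both sides simultaneously."""
        category = link.split("#")[0]
        eng_rhs, fr_rhs = choices.setdefault(link, random.choice(RULES[category]))
        eng = [w for sym in eng_rhs
                 for w in (derive(sym, choices)[0] if "#" in sym else [sym])]
        fr  = [w for sym in fr_rhs
                 for w in (derive(sym, choices)[1] if "#" in sym else [sym])]
        return eng, fr

    eng, fr = derive("S#0", {})
    print(" ".join(eng), " <-> ", " ".join(fr))
    # e.g.:  John misses Mary  <->  Marie manque à Jean
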
-
Related references:
Finite state methods
-
(To be read together)
Mehryar Mohri, Fernando Pereira and Michael
Riley, 1996. Weighted
automata in text and speech processing
and
András Kornai, 1996. Comments
on Mohri, Pereira and Riley
Workshop on Extended Finite State Models of Language (ECAI
'96)
-
Abstract: Finite-state automata are a very effective tool in natural language
processing. However, in a variety of applications and especially in speech
processing, it is necessary to consider more general machines in which
arcs are assigned weights or costs. We briefly describe some of the main
theoretical and algorithmic aspects of these machines. In particular, we
describe an efficient composition algorithm for weighted transducers,
and give examples illustrating the value of determinization and minimization
algorithms for weighted automata.
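As a concrete taste of the machinery, here is a minimal sketch of epsilon-free composition of two weighted transducers in the tropical semiring (costs add along a path and the minimum over paths is kept). Both toy machines are invented; a real implementation would handle epsilon transitions and only build state pairs reachable from the start states.

    from collections import defaultdict

    # Arcs are stored as state -> [(input, output, weight, next_state), ...].
    def compose(t1, t2):
        """Epsilon-free composition: pair up states, matching t1's output
        symbols against t2's input symbols and adding the weights."""
        result = defaultdict(list)
        for q1, arcs1 in t1.items():
            for q2, arcs2 in t2.items():
                for a, b, w1, r1 in arcs1:
                    for b2, c, w2, r2 in arcs2:
                        if b == b2:
                            result[(q1, q2)].append((a, c, w1 + w2, (r1, r2)))
        return dict(result)

    t1 = {0: [("a", "x", 0.5, 1), ("a", "y", 1.5, 1)], 1: []}   # rewrites a -> x or y
    t2 = {0: [("x", "z", 1.0, 1), ("y", "z", 0.1, 1)], 1: []}   # rewrites x or y -> z
    print(compose(t1, t2)[(0, 0)])
    # [('a', 'z', 1.5, (1, 1)), ('a', 'z', 1.6, (1, 1))]: two a->z paths; min cost 1.5
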
Scope and ellipsis
-
Jong C. Park, 1995. Quantifier
Scope and Constituency
ACL '95
-
Abstract: Traditional approaches to quantifier scope typically need stipulation
to exclude readings that are unavailable to human understanders. This paper
shows that quantifier scope phenomena can be precisely characterized by
a semantic representation constrained by surface constituency, if the distinction
between referential and quantificational NPs is properly observed. A CCG
implementation is described and compared to other approaches.
-
Related references:
-
Jerry R. Hobbs and Andrew Kehler, 1997.
A theory of
parallelism and the case of VP ellipsis
35th Annual Meeting of the Association for Computational Linguistics
and 8th Conference of the European Chapter of the Association for Computational
Linguistics: Proceedings of the Conference
-
Abstract: We provide a general account of parallelism in discourse, and
apply it to the special case of resolving possible readings for instances
of VP ellipsis. We show how several problematic examples are accounted
for in a natural and straightforward fashion. The generality of the approach
makes it directly applicable to a variety of other types of ellipsis and
reference.
-
Related references:
Discourse
-
David R. Traum and James F. Allen, 1994. Discourse
Obligations in Dialogue Processing
Proceedings of the 32nd Annual Meeting of the Association for Computational
Linguistics (ACL-94), pp 1-8.
-
Abstract: We show that in modeling social interaction, particularly dialogue,
the attitude of obligation can be a useful adjunct to the popularly considered
attitudes of belief, goal, and intention and their mutual and shared counterparts.
In particular, we show how discourse obligations can be used to account
in a natural manner for the connection between a question and its answer
in dialogue and how obligations can be used along with other parts of the
discourse context to extend the coverage of a dialogue system.
-
Related references:
-
Marilyn Walker, 1996. Limited
Attention and Discourse Structure
Computational Linguistics, 22(2)
-
Abstract: This squib examines the role of limited attention in a theory
of discourse structure and proposes a model of attentional state that relates
current hierarchical theories of discourse structure to empirical evidence
about human discourse processing capabilities. First, I present examples
that are not predicted by Grosz and Sidner's stack model of attentional
state. Then I consider an alternative model of attentional state, the cache
model, which accounts for the examples, and which makes particular processing
predictions. Finally I suggest a number of ways that future research could
distinguish the predictions of the cache model and the stack model.
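A minimal sketch of the cache intuition (not Walker's actual operations, parameters, or cost model): salient discourse entities live in a small fixed-capacity store, referring to a cached entity is cheap, and an uncached entity must be retrieved from main memory, displacing the least recently used entity. The entity names and capacity below are invented.

    from collections import OrderedDict

    class AttentionCache:
        def __init__(self, capacity=3):
            self.capacity = capacity
            self.entities = OrderedDict()            # entity -> None, ordered by recency

        def mention(self, entity):
            if entity in self.entities:
                self.entities.move_to_end(entity)    # refresh recency
                return entity + ": cheap (already in cache)"
            self.entities[entity] = None             # retrieve from main memory
            if len(self.entities) > self.capacity:
                self.entities.popitem(last=False)    # displace least recently used
            return entity + ": costly (retrieved)"

    cache = AttentionCache()
    for e in ["waiter", "menu", "soup", "waiter", "check", "menu"]:
        print(cache.mention(e))
    # "menu" at the end is costly again: it was displaced when "check" came in.
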
-
Related references:
-
Barbara J. Grosz and Peter C. Gordon, 1999. Conceptions of Limited Attention
and Discourse Focus.
Computational Linguistics 25(4), pp 617-624.
-
Michael Strube, 1998.
Never
Look Back: An Alternative to Centering
Proceedings of COLING-ACL '98
-
Abstract: I propose a model for determining the hearer's attentional state
which depends solely on a list of salient discourse entities (S-list).
The ordering among the elements of the S-list covers also the function
of the backward-looking center in the centering model. The ranking criteria
for the S-list are based on the distinction between hearer-old and hearer-new
discourse entities and incorporate preferences for inter- and intra-sentential
anaphora. The model is the basis for an algorithm which operates incrementally,
word by word.
Word sense disambiguation
-
Philip Resnik, 1995. Disambiguating
Noun Groupings with Respect to WordNet Senses
Proceedings of the 3rd Workshop on Very Large Corpora.
-
Abstract: Word groupings useful for language processing tasks are increasingly
available, as thesauri appear on-line, and as distributional word clustering
techniques improve. However, for many tasks, one is interested in relationships
among word senses, not words. This paper presents a method for automatic
sense disambiguation of nouns appearing within sets of related nouns ---
the kind of data one finds in on-line thesauri, or as the output of distributional
clustering algorithms. Disambiguation is performed with respect to WordNet
senses, which are fairly fine-grained; however, the method also permits
the assignment of higher-level WordNet categories rather than sense labels.
The method is illustrated primarily by example, though results of a more
rigorous evaluation are also presented.
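A highly simplified stand-in for the method: Resnik chooses, for each noun in a related group, the sense most supported by the rest of the group, using WordNet's IS-A hierarchy and an information-based measure. The sketch below replaces that machinery with a hand-made sense inventory and simple ancestor-overlap counting; all senses and categories are invented.

    SENSES = {   # noun -> {candidate sense: set of ancestor categories}
        "bass":   {"bass.fish": {"fish", "animal"}, "bass.voice": {"music", "sound"}},
        "trout":  {"trout.fish": {"fish", "animal"}},
        "salmon": {"salmon.fish": {"fish", "animal"}, "salmon.color": {"color"}},
    }

    def disambiguate(group):
        chosen = {}
        for noun in group:
            # Pool the ancestors of every candidate sense of the *other* nouns.
            pool = [anc for other in group if other != noun
                        for ancestors in SENSES[other].values() for anc in ancestors]
            support = lambda item: sum(pool.count(anc) for anc in item[1])
            chosen[noun] = max(SENSES[noun].items(), key=support)[0]
        return chosen

    print(disambiguate(["bass", "trout", "salmon"]))
    # {'bass': 'bass.fish', 'trout': 'trout.fish', 'salmon': 'salmon.fish'}
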
-
Related references:
-
David Yarowsky, 1995. Unsupervised
word sense disambiguation rivaling supervised methods
Proceedings of the 33rd Annual Meeting of the Association for Computational
Linguistics, pp 189--196.
-
Abstract: This paper presents an unsupervised learning algorithm for sense
disambiguation that, when trained on unannotated English text, rivals the
performance of supervised techniques that require time-consuming hand annotations.
The algorithm is based on two powerful constraints -- that words tend to
have one sense per discourse and one sense per collocation -- exploited
in an iterative bootstrapping procedure. Tested accuracy exceeds 96%.
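A bare-bones sketch of the bootstrapping loop (not Yarowsky's full decision-list learner, and with invented toy contexts for the word "plant"): seed collocations label a few contexts, new collocations are harvested from those contexts, and the process repeats. The one-sense-per-collocation constraint shows up as the requirement that a context be labeled only when its known collocations all point to one sense.

    contexts = [
        {"plant", "manufacturing", "equipment"},
        {"plant", "equipment", "workers"},
        {"plant", "flower", "garden"},
        {"plant", "garden", "soil"},
    ]
    rules = {"manufacturing": "factory", "flower": "living"}     # seed collocations

    labels, changed = {}, True
    while changed:
        changed = False
        for i, context in enumerate(contexts):
            if i in labels:
                continue
            senses = {rules[w] for w in context if w in rules}
            if len(senses) == 1:                  # unambiguous evidence only
                labels[i] = senses.pop()
                changed = True
                for w in context - {"plant"}:     # harvest new collocations
                    rules.setdefault(w, labels[i])

    print(labels)    # {0: 'factory', 1: 'factory', 2: 'living', 3: 'living'}
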
-
Related references:
Language modeling
-
Ciprian Chelba and Frederick Jelinek, 1998.
Exploiting
syntactic structure for language modeling
COLING-ACL '98 36th Annual Meeting of the Association for Computational
Linguistics and 17th International Conference on Computational Linguistics:
Proceedings of the Conference, Vol 1, pp 225--231
-
Abstract: The paper presents a language model that develops syntactic structure
and uses it to extract meaningful information from the word history, thus
enabling the use of long distance dependencies. The model assigns probability
to every joint sequence of words-binary-parse-structure with headword annotation
and operates in a left-to-right manner - therefore usable for automatic
speech recognition. The model, its probabilistic parameterization, and
a set of experiments meant to evaluate its predictive power are presented;
an improvement over standard trigram modeling is achieved.
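For reference, the baseline mentioned in the last sentence is a word trigram model; the sketch below estimates one by maximum likelihood from an invented toy corpus (no smoothing, which real systems rely on heavily). The Chelba-Jelinek model keeps the same left-to-right chain-rule shape but conditions each word on the two previous exposed headwords of the partial parse rather than on the two previous words.

    from collections import defaultdict

    corpus = ("the contract ended with a loss . "
              "the contract ended with a gain .").split()

    trigram, bigram = defaultdict(int), defaultdict(int)
    for u, v, w in zip(corpus, corpus[1:], corpus[2:]):
        trigram[(u, v, w)] += 1
        bigram[(u, v)] += 1

    def p(w, u, v):
        """P(w | u, v) estimated by relative frequency."""
        return trigram[(u, v, w)] / bigram[(u, v)] if bigram[(u, v)] else 0.0

    print(p("with", "contract", "ended"))    # 1.0 in this toy corpus
    print(p("loss", "with", "a"))            # 0.5: "a" is followed by loss or gain
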
-
Related references:
Clustering
-
Lillian Lee and Fernando Pereira, 1999.
Distributional
similarity models: Clustering vs. Nearest Neighbors
Proceedings of the Thirty-Seventh Annual Meeting of the Association
for Computational Linguistics (ACL'99), pp 23-40.
-
Abstract: Distributional similarity is a useful notion in estimating the
probabilities of rare joint events. It has been employed both to cluster
events according to their distributions, and to directly compute averages
of estimates for distributional neighbors of a target event. Here, we examine
the tradeoffs between model size and prediction accuracy for cluster-based
and nearest neighbors distributional models of unseen events.
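A minimal sketch of the nearest-neighbor side of the comparison: represent each noun by its distribution over the verbs that take it as an object, rank other nouns by Jensen-Shannon divergence, and borrow an estimate for an unseen verb-noun pair from the closest neighbor. The counts are invented, and the paper itself studies several similarity functions and weighted combinations of many neighbors rather than a single nearest neighbor.

    import math
    from collections import Counter

    cooc = {   # toy object-of-verb counts
        "wine":   Counter({"drink": 8, "pour": 3, "buy": 2}),
        "beer":   Counter({"drink": 9, "buy": 4}),
        "pencil": Counter({"buy": 5, "sharpen": 4}),
    }

    def dist(noun):
        total = sum(cooc[noun].values())
        return {v: n / total for v, n in cooc[noun].items()}

    def js_divergence(p, q):
        """Jensen-Shannon divergence between two verb distributions."""
        m = {v: 0.5 * (p.get(v, 0) + q.get(v, 0)) for v in set(p) | set(q)}
        kl = lambda a: sum(pa * math.log(pa / m[v]) for v, pa in a.items() if pa > 0)
        return 0.5 * kl(p) + 0.5 * kl(q)

    # "pour beer" is unseen; estimate it from beer's closest neighbor.
    p_beer = dist("beer")
    neighbors = sorted((n for n in cooc if n != "beer"),
                       key=lambda n: js_divergence(p_beer, dist(n)))
    print(neighbors)                                        # ['wine', 'pencil']
    print(round(dist(neighbors[0]).get("pour", 0.0), 2))    # 0.23, borrowed from "wine"
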
-
Related references:
-
Hinrich Schütze, 1992. Dimensions
of Meaning.
Proceedings of Supercomputing, pp 787-796.
-
Fernando Pereira, Naftali Tishby, and Lillian Lee, 1993. Distributional
Clustering of English Words
Proceedings of the 31st ACL, pp 183-90.
-
Dekang Lin, 1998. Automatic
Retrieval and Clustering of Similar Words.
COLING-ACL98.
-
Mats Rooth, Stefan Riezler, Detlef Prescher, Glenn Carroll, and
Franz Beil, 1999. Inducing a Semantically Annotated Lexicon via EM-Based
Clustering
Proceedings of ACL '99
Machine translation
-
David M. Zajic and Keith J. Miller, 1998.
Where
Interlingua Can Make a Difference
Proceedings of the Second AMTA SIG-IL Workshop on Interlinguas
-
Abstract:
Furnished with an English text and its equivalent in 13 foreign languages,
we set out to determine the potential for improving the quality of translation
results by an Interlingual (IL) approach to machine translation (MT) as
compared to transfer-based systems. In this paper, we analyze the errors
made by two commercial transfer-based MT systems, provide an observational
classification of the errors, and group the errors according to whether
or not an Interlingual approach would improve system output. We describe
an existing IL, Composed Lexical Conceptual Structures, and illustrate
with examples how some of the observed errors might be corrected using
this IL representation.
-
Michael Dorna and Martin Emele, 1996. Semantic-based
Transfer
Proceedings of the 16th International Conference on Computational
Linguistics (COLING-96), pp. 316-321.
-
Abstract:
This article presents a new semantic-based transfer approach developed and applied within the Verbmobil Machine Translation
project. We give an overview of the declarative transfer formalism together with its procedural realization. Our approach is
discussed and compared with several other approaches from the MT literature.
-
For further reading (but not a reaction essay): Bonnie Dorr, Pamela Jordan,
and J. Benoit, 1999, A
Survey of Current Paradigms in Machine Translation
Advances in Computers, edited by Marvin V. Zelkowitz, Vol 49,
Academic Press.
Learning
-
Eric Brill, 1993. Automatic
grammar induction and parsing free text: A transformation-based approach
31st Annual Meeting of the Association for Computational Linguistics:
Proceedings of the Conference, pp 259-265.
-
Abstract: In this paper we describe a new technique for parsing free
text: a transformational grammar is automatically learned that is
capable of accurately parsing text into binary-branching syntactic
trees with nonterminals unlabelled. The algorithm works by beginning
in a very naive state of knowledge about phrase structure. By
repeatedly comparing the results of bracketing in the current state to
proper bracketing provided in the training corpus, the system learns a
set of simple structural transformations that can be applied to reduce
error. After describing the algorithm, we present results and compare
these results to other recent results in automatic grammar
induction.
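The abstract describes the generic transformation-based learning loop: start from a naive initial analysis, repeatedly choose the transformation that most reduces error against the training corpus, and stop when nothing helps. The sketch below runs that loop on a toy retagging task rather than bracketing; the words, tags, and transformation templates are all invented, but the outer loop is the same.

    train_words = ["the", "can",  "rusted", "the", "can",  "holds", "water"]
    gold_tags   = ["DET", "NOUN", "VERB",   "DET", "NOUN", "VERB",  "NOUN"]

    # Naive initial state: every word gets a default tag.
    tags = ["DET" if w == "the" else "NOUN" for w in train_words]

    def errors(t):
        return sum(a != b for a, b in zip(t, gold_tags))

    def apply_rule(t, rule):
        """Rule (frm, to, prev): retag frm as to whenever the previous tag is prev."""
        frm, to, prev = rule
        return [to if tag == frm and i > 0 and t[i - 1] == prev else tag
                for i, tag in enumerate(t)]

    candidates = [(a, b, c) for a in ("NOUN", "VERB") for b in ("NOUN", "VERB")
                  for c in ("DET", "NOUN", "VERB") if a != b]

    learned = []
    while True:
        best = min(candidates, key=lambda r: errors(apply_rule(tags, r)))
        if errors(apply_rule(tags, best)) >= errors(tags):
            break                          # no transformation reduces error
        tags = apply_rule(tags, best)
        learned.append(best)

    print(learned)            # [('NOUN', 'VERB', 'NOUN'), ('VERB', 'NOUN', 'VERB')]
    print(tags == gold_tags)  # True on this toy corpus
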
-
Related references:
-
Dan Roth, 1998. Learning
to Resolve Natural Language Ambiguities: A Unified Approach
Proceedings of AAAI '98.
-
Abstract: We analyze a few of the commonly used statistics based and machine
learning algorithms for natural language disambiguation tasks and observe
that they can be re-cast as learning linear separators in the feature space.
Each of the methods makes a priori assumptions, which it employs, given
the data, when searching for its hypothesis. Nevertheless, as we show,
it searches a space that is as rich as the space of all linear separators.
We use this to build an argument for a data driven approach which merely
searches for a good linear separator in the feature space, without
further assumptions on the domain or a specific problem.
We present such an approach - a sparse network of linear separators,
utilizing the Winnow learning algorithm - and show how to use it in a variety
of ambiguity resolution problems. The learning approach presented is attribute-efficient
and, therefore, appropriate for domains having very large number of attributes.
In particular, we present an extensive experimental comparison of our
approach with other methods on several well studied lexical disambiguation
tasks such as context-sensitive spelling correction, prepositional phrase
attachment and part of speech tagging. In all cases we show that our approach
either outperforms other methods tried for these tasks or performs comparably
to the best.
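The Winnow algorithm mentioned in the abstract is a multiplicative-update rule for learning a linear separator over binary features; its mistake bound grows only logarithmically in the number of irrelevant features, which is what "attribute-efficient" refers to. Below is a minimal sketch on a toy problem; the feature indices, examples, and parameters are invented, and Roth's SNOW architecture arranges many such linear units in a sparse network over a very large feature space.

    def winnow(examples, n_features, promotion=2.0, demotion=0.5, epochs=10):
        weights = [1.0] * n_features
        threshold = float(n_features)             # standard Winnow threshold
        for _ in range(epochs):
            for active, label in examples:        # active = indices of features that fire
                predicted = sum(weights[i] for i in active) >= threshold
                if predicted and not label:       # false positive: demote active weights
                    for i in active:
                        weights[i] *= demotion
                elif label and not predicted:     # false negative: promote active weights
                    for i in active:
                        weights[i] *= promotion
        return weights

    # Toy task: the label is positive exactly when feature 0 is active.
    examples = [([0, 2], True), ([1, 3], False), ([0, 3], True), ([1, 2], False)]
    print(winnow(examples, n_features=4))         # feature 0 gets the largest weight
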
-
Related references:
-
Eric Brill and Grace Ngai, 1999. Man
[and Woman] vs. Machine: A Case Study in Base Noun Phrase
Learning
Proceedings of ACL'99
-
Abstract: A great deal of work has been done demonstrating the
ability of machine learning algorithms to automatically extract
linguistic knowledge from annotated corpora. Very little work has
gone into quantifying the difference in ability at this task between a
person and a machine. This paper is a first step in that direction.
CS674, Spring '00
Lillian Lee