CS 674 SP'00: Natural Language Processing
References from lecture
1/24:
- Bill Gates quote, Gartner symposium, 1997.
1/26:
- Noam Chomsky, 1956. Three models for the description of
language. IRE Transactions on Information Theory, 2(3):113-124.
- Christopher Culy, 1985. The complexity of the vocabulary of
Bambara. Linguistics and Philosophy, 8:345-351.
- John E. Hopcroft and Jeffrey D. Ullman, 1979. The Chomsky
Hierarchy (Chapter 9). Introduction
to Automata Theory, Languages, and Computation, Addison-Wesley.
- G. A. Miller, E. B. Newman, and E. A. Friedman, 1957. Some
effects of intermittent silence. American J. Psychology 70, 311-313.
- Geoff Pullum, 1991. Footloose
and context-free. In The Great Eskimo Vocabulary Hoax, U. of
Chicago Press.
- Stuart Shieber, 1985. Evidence against the context-freeness of natural language. Linguistics and Philosophy,
8:333-343.
- George Zipf, 1949. Human Behavior and the Principle of Least Effort. Addison-Wesley Press.
1/31:
- Ted Briscoe, 1996. Robust
Parsing. Chapter 3.9 of Survey of the State of the Art
in Human Language Technology, Ed. Ronald A. Cole, Joseph
Mariani, Hans Uszkoreit, Giovanni Battista Varile, Annie Zaenen,
Antonio Zampolli, and Victor Zue, Cambridge University Press.
- John E. Hopcroft and Jeffrey D. Ullman, 1979. Introduction
to Automata Theory, Languages, and Computation. [See pp. 139-141
for CKY pseudocode.]
- Howard Lasnik, 1990. Syntax. In An Invitation to
Cognitive Science: Language, Ed. Daniel N. Osherson and Howard
Lasnik. [See section 1.3 for a brief intro to trace theory]
2/2:
- Jay Earley, 1970. An
efficient context-free parsing algorithm. Communications of the ACM
- Mitchell P. Marcus, 1980. A theory of syntactic recognition
for natural language, MIT Press. (based on Marcus' '77 PhD
thesis).
See also Marcus's (1978) "A computational account of some constraints
on language", reprinted in Readings in Natural Language
Processing, Ed. Barbara J. Grosz, Karen Sparck Jones, and Bonnie
Lynn Webber, Morgan Kaufmann, 1986.
- Fernando Pereira and David Warren, 1983. Parsing
as deduction. Proceedings of the 21st Annual Meeting of the ACL.
- Andreas Stolcke, 1995. An Efficient Probabilistic Context-Free
Parsing Algorithm that Computes Prefix Probabilities. Computational
Linguistics 21(2), 165-201. Longer version: ICSI TR 93-065
2/7:
- Taylor L. Booth and Richard A. Thompson, 1973. Applying
probability measures to abstract languages. IEEE Transactions on
Computers C-22, pp. 442--450.
- Joshua Goodman, 1996. Parsing
algorithms and metrics. Proceedings of the 34th Annual Meeting of the ACL, pages 177-183.
2/9:
- Gerald Gazdar, Ewan Klein, Geoffrey Pullum, and Ivan Sag, 1985.
Generalized phrase structure grammar, Harvard University
Press.
- Maurice Gross, 1975. Méthodes en syntaxe, Hermann.
2/14:
- Aravind K. Joshi, Leon S. Levy, Masako Takahashi, 1975. Tree
Adjunct Grammars. Journal of Computer and System Sciences
10(1): 136-163.
- Aravind K. Joshi and Yves Schabes, Tree-adjoining
grammars. I don't know of a publication source for this.
2/16:
- Anne Abeille and Yves Schabes, 1989. Parsing idioms in
lexicalized TAGs. Fourth Conference of the European Chapter of
the Association for Computational Linguistics (EACL '89).
- Tilman Becker, Aravind K. Joshi, and Owen Rambow, 1991.
Long-distance scrambling and tree adjoining grammars. Fifth Conference of the European Chapter of
the Association for Computational Linguistics (EACL '91).
- Rebecca Hwa, 1998. An Empirical
Evaluation of Probabilistic Lexicalized Tree Insertion
Grammars. Proceedings of ACL-COLING, pp. 557--563.
- Aravind K. Joshi, 1985. Tree adjoining grammars: how much
context-sensitivity is required to provide reasonable structural
description. In David R. Dowty, Lauri Karttunen, and Arnold M. Zwicky,
eds, Natural Language Processing: psychological, computational,
and theoretical perspectives, Cambridge.
- James Rogers, 1994. Capturing
CFLs with Tree Adjoining Grammars. Proceedings of the 32nd
Annual Meeting of the ACL, pages 155--162.
- Giorgio Satta and William Schuler, 1998. Restrictions on Tree
Adjoining Languages. Proceedings of ACL-COLING, pp. 1176--1182.
- Yves Schabes and Roger Waters, 1993. Stochastic lexicalized
context-free grammar. Proceedings of the Third International
Workshop on Parsing Technologies, pp. 257--266.
(see also the 1994 Mitsubishi Electric Research Labs Technical Report
TR-94-13, Tree insertion grammar: a cubic-time parsable
formalism that lexicalizes context-free grammar without changing the
tree produced.)
- K. Vijay-Shanker and Aravind K. Joshi, 1988. Feature structure
based tree adjoining grammars. Proceedings of the 12th
International Conference on Computational Linguistics (COLING '88).
2/21:
2/23:
- Bernard Merialdo, 1994. Tagging English text with a
probabilistic model. Computational Linguistics 20 (2), pp. 155-172.
- Lawrence R. Rabiner, 1989. A tutorial on hidden Markov models and
selected applications in speech recognition. Proceedings of the
IEEE 77 (2), pp. 257--286.
- Lawrence R. Rabiner and B. H. Juang, 1986. An introduction to
hidden Markov models. IEEE ASSP Magazine, pp. 4-15.
2/28:
- Bob Carpenter, 1997. Type-Logical Semantics. MIT Press.
- Robin Cooper, 1983. Quantification and Semantic
Theory. D. Reidel.
- R. Montague, 1974. Formal Philosophy. Yale University
Press.
- Mark Steedman, 1996. Surface Structure and
Interpretation. MIT Press.
3/1:
3/6:
- Ralph Grishman, 1986. Computational Linguistics: An
Introduction. Cambridge.
- Barbara Grosz, Aravind Joshi, and Scott Weinstein, 1995. Centering:
A Framework for Modeling the Local Coherence of Discourse.
Computational Linguistics 21(2), pp. 203-225.
- Barbara J. Grosz and Candace L. Sidner, 1986. Attention,
Intentions, and the Structure of Discourse. Computational
Linguistics 12(3).
- Jerry R. Hobbs, 1978. Resolving Pronoun References. Reprinted
in Grosz, Sparck Jones, and Webber, Readings in Natural Language
Processing.
- Jerry R. Hobbs, 1979. Coherence and co-reference. Cognitive
Science 3(1), pages 67--82.
- Ray Jackendoff, 1972. Semantic Interpretation in Generative
Grammar. MIT Press.
- W. C. Mann and S. A. Thompson, 1983. Relational Propositions in
Discourse. TR RR-83-115, Information Sciences Institute, Marina del
Rey, CA.
- Johanna Moore and Martha Pollack, 1992. A problem for RST: The
need for multi-level discourse analysis. Computational
Linguistics 18(4), pp. 537--544.
- L. Polanyi and R. Scha, 1988. Discourse Syntax and Semantics. In
Livia Polanyi, ed., The Structure of Discourse, Ablex.
- R. Scha and L. Polanyi, 1988. An augmented context free grammar
for discourse. Proceedings of COLING.
- Yorick Wilks, 1975. An intelligent analyzer and understander of
English. Communications of the ACM 18(5), 264--274. Reprinted in Grosz et al,
Readings in Natural Language Processing
3/8:
- Kasparov vs. Deep Blue, match 1, game 2: commentary
3/13:
- Yehoshua Bar-Hillel, 1960. The present status of automatic
translation of languages. In Franz Alt, A. Donald Booth, and
R. E. Meagher, eds., Advances in Computers. Academic Press.
- Eric Brill, 1995. Transformation-Based
Error-Driven Learning and Natural Language Processing: A Case Study in
Part of Speech Tagging. Computational Linguistics.
- Kathleen G. Dahlgren, 1988. Naive Semantics for Natural
Language Understanding. Kluwer.
- William Gale, Kenneth Church, and David Yarowsky, 1992. A
method for disambiguating word senses in a large corpus.
Computers and the Humanities 26, pages 415--439.
- Nancy Ide and Jean Veronis, 1998. Introduction to the special issue on
word sense disambiguation: The state of the art. Computational
Linguistics 24(1), pages 1--40.
- Abraham Kaplan, 1950. An experimental study of ambiguity and
context. Mechanical Translation 2(2): 39-46 (issue appeared
in 1955).
- J. J. Katz and J. A. Fodor, 1963. The structure of semantic
theory. Language 39, pages 170--210.
- Bob Krovetz and Bruce Croft, 1992. Lexical Ambiguity and
Information Retrieval. ACM Transactions on Information
Systems 10(2), pages 115--141.
- Alpha Luk, 1995. Statistical Sense Disambiguation with
Relatively Small Corpora using Dictionary Definitions.
Proceedings of the 33rd ACL.
- Ray Mooney, 1996. Comparative Experiments on Disambiguating Word
Senses: An Illustration of the Role of Bias in Machine Learning. Proceedings of the 1996 Conference on Empirical Methods in Natural Language Processing, pp. 82-91.
- Erwin Reifler, 1955. The mechanical determination of meaning.
In William N. Locke and A. Donald Booth, eds., Machine
Translation of Languages. John Wiley and Sons.
- Mark Sanderson, 1994. Word
Sense Disambiguation and Information Retrieval. PhD Thesis,
Technical Report (TR-1997-7), Dept. of Computing Science, University
of Glasgow.
- Hinrich Schütze, 1998. Automatic Word Sense Discrimination.
Computational Linguistics 24(1), pages 97--124.
- Hinrich Schütze and Jan O. Pedersen, 1995. Information
Retrieval Based on Word Senses. Fourth Annual Symposium on
Document Analysis and Information Retrieval, pages 161-175.
- Yorick Wilks and Mark Stevenson, 1998. The
Grammar of Sense: Using part-of-speech tags as a first step in
semantic disambiguation. Journal of Natural Language
Engineering 4(2), pages 135-144. (See also cmp-lg/9607028)
- David Yarowsky, 1992. Word sense disambiguation using
statistical models of Roget's categories trained on large
corpora. Proceedings of COLING. Nantes, pp. 454-460.
3/15:
- Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra,
and Robert L. Mercer, 1991. Word-sense disambiguation using
statistical methods. Proceedings of the 29th ACL.
- Michael Lesk, 1986. Automatic sense disambiguation
using machine readable dictionaries: how to tell a pine cone from an
ice cream cone. Proceedings of the fifth international
conference on Systems documentation, pages 24--26.
- Raymond J. Mooney, 1996. Comparative
Experiments on Disambiguating Word Senses: An Illustration of the Role
of Bias in Machine Learning. Proceedings of the 1996
Conference on Empirical Methods in Natural Language Processing,
pages 82--91.
- Ronald L. Rivest, 1987. Learning
decision lists. Machine Learning 2(3), pages 229--246.
- Hinrich Schütze, 1992. Context space. Working Notes of the
AAAI Fall Symposium on Probabilistic Approaches to Natural
Language, pages 113--120.
- David Yarowsky, 1996. Homograph
Disambiguation in Speech Synthesis. In Julia Hirschberg, Richard
Sproat, and Jan van Santen, eds., Progress in Speech
Synthesis, pages 159--175.
3/27:
- Timothy C. Bell, John G. Cleary, and Ian H. Witten, 1990.
Text Compression. Prentice Hall.
- William A. Gale and Kenneth W. Church, 1990. Estimation
procedures for language context: poor estimates are worse than none.
COMPSTAT: Proceedings in Computational Statistics, pages
69--74.
- William A. Gale and Kenneth W. Church, 1994. What's wrong with
adding one? In Corpus-Based Research into Language,
N. Oostdijk and P. de Haan, eds., Rodopi.
- I. J. Good, 1953. The population frequencies of species and the
estimation of population parameters. Biometrika 40, pages
237--264.
- Frederick Jelinek and Robert L. Mercer, 1980. Interpolated
Estimation of Markov Source Parameters from Sparse Data.
Proceedings of the Workshop on Pattern Recognition in
Practice, pages 381--397.
- Slava M. Katz, 1987. Estimation of Probabilities from Sparse
Data for the Language Model Component of a Speech Recognizer.
IEEE Transactions on Acoustics, Speech and Signal Processing,
ASSP-35 (3), pp 400-401.
- Pierre Simon Laplace. Essai philosophique sur les
probabilités. There appear to be several editions; Ristad's A
natural law of succession dates it to 1775.
- Arthur Nadas, 1985. On Turing's formula for word probabilities.
IEEE Transactions on Acoustics, Speech, and Signal
Processing, ASSP-33 (6), pages 1414--1416.
3/29:
- Peter F. Brown, Vincent J. Della Pietra, Peter V. deSouza,
Jennifer C. Lai, and Robert L. Mercer, 1992. Class-based n-gram
models of natural language. Computational Linguistics 18(4),
pages 467--479.
- Frederick Jelinek, Robert L. Mercer and Salim Roukos, 1992.
Principles of Lexical Language Modeling for Speech Recognition. In
Sadaoki Furui and M. Mohan Sondhi, eds., Advances in Speech Signal
Processing, Marcel Dekker.
4/3:
- Thomas M. Cover and Joy A. Thomas, 1991. Elements of
Information Theory. Wiley.
- Bonnie Dorr, 1994. Machine
Translation Divergences: A Formal Description and Proposed Solution.
Computational Linguistics 20(4), pages 597--633.
- Eduard H. Hovy and Kevin Knight, 1996. Machine Translation.
Tutorial at ACL '96.
- Bente Maegaard, editor, 1999. Machine
Translation. In Multilingual Information
Management: Current Levels and Future Abilities, Eduard Hovy,
Nancy Ide, Robert Frederking, Joseph Mariani, and Antonio Zampolli,
editors.
4/5:
- Adam L. Berger, Peter F. Brown, Stephen A. Della Pietra, Vincent
J. Della Pietra, John R. Gillett, John D. Lafferty, Robert L. Mercer,
Harry Printz, and Lubos Ures, 1994. The
Candide System for Machine Translation. Proceedings of the
1994 ARPA Workshop on Human Language Technology
- Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra,
and Robert L. Mercer, 1993. The Mathematics of Statistical Machine
Translation. Computational Linguistics 19(2), pp. 263--311.
- Noam Chomsky, 1957. Syntactic Structures, Mouton.
- John Rupert Firth, 1957. A synopsis of linguistic theory
1930--1955. In Studies in Linguistic Analysis.
- Zellig S. Harris, 1951. Structural Linguistics,
U. Chicago Press.
- Claude Shannon, 1948. A
mathematical theory of communication. Bell System Technical
Journal, vol. 27, pp. 379-423 and 623-656. Republished as
"The mathematical theory of communication" in Warren Weaver and
Claude E. Shannon, eds., The Mathematical Theory of
Communication, U. Illinois Press, 1949.
4/10:
- E. Mark Gold, 1967. Language Identification in the Limit.
Information and Control 10(5), pp. 447--474.
4/12:
- L. Baum, 1972. An inequality and associated maximization
technique in statistical estimation of probabilistic functions of a
Markov process. Inequalities 3, pp. 1-8.
- Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin, 1977.
Maximum Likelihood From Incomplete Data via the EM Algorithm.
Journal of the Royal Statistical Society Series B, 39(1),
pp. 1-38.
- Other useful Baum-Welch references are chapter 9 from the Jelinek book,
and Chapter 11 of R. Durbin, S. Eddy, A. Krogh, and G. Mitchison,
Biological Sequence Analysis: Probabilistic Models of Proteins
and Nucleic Acids. A useful reference for EM in general is Michael Collins' exam paper The EM
Algorithm.
4/17:
- David W. Aha, Dennis Kibler, and Marc K. Albert, 1991.
Instance-based Learning Algorithms. Machine Learning 6,
pp. 37--66.
- James K. Baker, 1979. Trainable Grammars for Speech Recognition.
Proceedings of the Spring Conference of the Acoustical Society of
America, pp. 547--550.
- Glenn Carroll and Eugene Charniak, 1992. Two
Experiments on Learning Probabilistic Dependency Grammars from
Corpora. Brown University Tech Report CS-92-16.
- Lari and Young, 1990. The
estimation of stochastic context-free grammars using the
inside-outside algorithm. Computer Speech and Language
4, pp. 35--56.
- Fernando Pereira and Yves Schabes, 1992. Inside-outside reestimation from
partially bracketed corpora. Proceedings of the 30th Annual
Meeting of the ACL, pp. 128-135
4/19:
4/24:
- Frederick Mosteller and David L. Wallace, 1964. Inference
and disputed authorship: The Federalist. Addison-Wesley.
- Frederick Mosteller and David L. Wallace, 1984. Applied
Bayesian and classical inference: the case of the Federalist
papers. Springer-Verlag.
- The Federalist Papers themselves are available online.
4/25:
- Hugh Loebner. In Response
[to Shieber's article]
- James H. Moor, 1976. An
analysis of the Turing test, Philosophical Studies 30,
pp. 249--257.
- Alan M. Turing, 1950. Computing
machinery and intelligence. Mind LIX(236), pp. 433-460.
- Stuart Shieber, 1994. Lessons
from a Restricted Turing Test, Communications of the
Association for Computing Machinery, 37(6), pp. 70-78.
CS674, Spring '00
Lillian Lee