CS 674 SP'00: Natural Language Processing

References from lecture

1/24: Bill Gates quote, Gartner symposium, 1997.

1/26:

Noam Chomsky, 1956. Three models for the description of language. IRE Transactions on Information Theory, 2(3):113-124.
Christopher Culy, 1985. The complexity of the vocabulary of Bambara. Linguistics and Philosophy, 8:345-351.
John E. Hopcroft and Jeffrey D. Ullman, 1979. The Chomsky Hierarchy (Chapter 9). Introduction to Automata Theory, Languages, and Computation, Addison-Wesley.
G. A. Miller, E. B. Newman, and E. A. Friedman, 1957. Some effects of intermittent silence. American J. Psychology 70, 311-313.
Geoff Pullum, 1991. Footloose and context-free. In The Great Eskimo Vocabulary Hoax, U. of Chicago Press.
Stuart Shieber, 1985. Evidence against the context-freeness of natural language. Linguistics and Philosophy, 8:333-343.
George Zipf, 1949. Human Behavior and the Principle of Least Effort. Addison-Wesley Press.

1/31:

Ted Briscoe, 1996. Robust Parsing. Chapter 3.9 of Survey of the State of the Art in Human Language Technology, Ed. Ronald A. Cole, Joseph Mariani, Hans Uszkoreit, Giovanni Battista Varile, Annie Zaenen, Antonio Zampolli, and Victor Zue, Cambridge University Press.
John E. Hopcroft and Jeffrey D. Ullman, 1979. Introduction to Automata Theory, Languages, and Computation. [See pp. 139-141 for CKY pseudocode.]
Howard Lasnik, 1990. Syntax. In An Invitation to Cognitive Science: Language, Ed. Daniel N. Osherson and Howard Lasnik. [See section 1.3 for a brief intro to trace theory.]

2/2:

Jay Earley, 1970. An efficient context-free parsing algorithm. Communications of the ACM.
Mitchell P. Marcus, 1980. A theory of syntactic recognition for natural language, MIT Press (based on Marcus's 1977 PhD thesis). See also Marcus's (1978) "A computational account of some constraints on language", reprinted in Readings in Natural Language Processing, Ed. Barbara J. Grosz, Karen Sparck Jones, and Bonnie Lynn Webber, Morgan Kaufmann, 1986.
Fernando Pereira and David Warren, 1983. Parsing as deduction. Proceedings of the 21st Annual Meeting of the ACL.
Andreas Stolcke, 1995. An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities. Computational Linguistics 21(2), 165-201. Longer version: ICSI TR 93-065

2/7:

Taylor L. Booth and Richard A. Thompson, 1973. Applying probability measures to abstract languages. IEEE Transactions on Computers C-22.442--450.
Joshua Goodman, 1996. Parsing algorithms and metrics. Proceedings of the 34th Annual Meeting of the ACL, pages 177-183.

2/9:

Gerald Gazdar, Ewan Klein, Geoffrey Pullum, and Ivan Sag, 1985. Generalized phrase structure grammar, Harvard University Press.
Maurice Gross, 1975. Méthodes en syntaxe, Hermann.

2/14:

Aravind K. Joshi, Leon S. Levy, Masako Takahashi, 1975. Tree Adjunct Grammars. Journal of Computer and System Sciences 10(1): 136-163.
Aravind K. Joshi and Yves Schabes, Tree-adjoining grammars. I don't know of a publication source for this.

2/16:

Anne Abeillé and Yves Schabes, 1989. Parsing idioms in lexicalized TAGs. Fourth Conference of the European Chapter of the Association for Computational Linguistics (EACL '89).
Tilman Becker, Aravind K. Joshi, and Owen Rambow, 1991. Long-distance scrambling and tree adjoining grammars. Fifth Conference of the European Chapter of the Association for Computational Linguistics (EACL '91).
Rebecca Hwa, 1998. An Empirical Evaluation of Probabilistic Lexicalized Tree Insertion Grammars. Proceedings of ACL-COLING, pp. 557--563.
Aravind K. Joshi, 1985. Tree adjoining grammars: how much context-sensitivity is required to provide reasonable structural descriptions? In David R. Dowty, Lauri Karttunen, and Arnold M. Zwicky, eds., Natural Language Parsing: psychological, computational, and theoretical perspectives, Cambridge.
James Rogers, 1994. Capturing CFLs with Tree Adjoining Grammars. Proceedings of the 32nd Annual Meeting of the ACL, pages 155--162.
Giorgio Satta and William Schuler, 1998. Restrictions on Tree Adjoining Languages. Proceedings of ACL-COLING, pp. 1176--1182.
Yves Schabes and Roger Waters, 1993. Stochastic lexicalized context-free grammar. Proceedings of the Third International Workshop on Parsing Technologies, pp. 257--266. (See also the 1994 Mitsubishi Electric Research Labs Technical Report TR-94-13, Tree insertion grammar: a cubic-time parsable formalism that lexicalizes context-free grammar without changing the tree produced.)
K. Vijay-Shanker and Aravind K. Joshi, 1988. Feature structure based tree adjoining grammars. Proceedings of the 12th International Conference on Computational Linguistics (COLING '88).

2/21:

Mehryar Mohri, 1997. Finite-state transducers in language and speech processing. Computational Linguistics 23(2).
Emmanuel Roche, 1997. Parsing with finite-state transducers. Chapter 8 in Roche and Schabes, editors, Finite-State Language Processing.
Emmanuel Roche and Yves Schabes, 1997. Finite-State Language Processing. MIT Press.
Richard Sproat, Chilin Shih, William Gale, and Nancy Chang, 1994. A Stochastic Finite-State Word-Segmentation Algorithm for Chinese. Proc. ACL '94.

2/23:

Bernard Merialdo, 1994. Tagging English text with a probabilistic model. Computational Linguistics 20 (2), pp. 155-172.
Lawrence R. Rabiner, 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77 (2), pp. 257--286.
Lawrence R. Rabiner and B. H. Juang, 1986. An introduction to hidden Markov models. IEEE ASSP Magazine, pp. 4-15.

2/28:

Bob Carpenter, 1997. Type-Logical Semantics. MIT Press.
Robin Cooper, 1983. Quantification and Semantic Theory. D. Reidel.
R. Montague, 1974. Formal Philosophy. Yale University Press.
Mark Steedman, 1996. Surface Structure and Interpretation. MIT Press.

3/1:

Mary Dalrymple, Stuart M. Shieber, and Fernando C. N. Pereira, 1991. Ellipsis and Higher-Order Unification. Linguistics and Philosophy 14(4), pages 399-452.
Jerry Hobbs and Stuart Shieber, 1987. An Algorithm for Generating Quantifier Scopings. Computational Linguistics, 13(1-2), pages 47-63.
Fernando C. N. Pereira and Stuart Shieber, 1987. Prolog and Natural Language Analysis. CSLI Publications.

3/6:

Ralph Grishman, 1986. Computational Linguistics: An Introduction. Cambridge.
Barbara Grosz, Aravind Joshi, and Scott Weinstein, 1995. Centering: A Framework for Modeling the Local Coherence of Discourse. Computational Linguistics 21(2), pp. 203-225.
Barbara J. Grosz and Candace L. Sidner, 1986. Attention, Intentions, and the Structure of Discourse. Computational Linguistics 12(3).
Jerry R. Hobbs, 1978. Resolving Pronoun References. Reprinted in Grosz, Sparck Jones, and Webber, Readings in Natural Language Processing.
Jerry R. Hobbs, 1979. Coherence and co-reference. Cognitive Science 3(1), pages 67--82.
Ray Jackendoff, 1972. Semantic Interpretation in Generative Grammar. MIT Press.
W. C. Mann and S. A. Thompson, 1983. Relational Propositions in Discourse. TR RR-83-115, Information Sciences Institute, Marina del Rey, CA.
Johanna Moore and Martha Pollack, 1992. A problem for RST: The need for multi-level discourse analysis. Computational Linguistics 18(4), pp. 537--544.
L. Polanyi and R. Scha, 1988. Discourse Syntax and Semantics. In Livia Polanyi, ed., The Structure of Discourse, Ablex.
R. Scha and L. Polanyi, 1988. An augmented context free grammar for discourse. Proceedings of COLING.
Yorick Wilks, 1975. An intelligent analyzer and understander of English. Communications of the ACM 18(5), 264--274. Reprinted in Grosz et al, Readings in Natural Language Processing

3/8:

Kasparov vs. Deep Blue, Match 1, game 2 commentary

3/13:

Yehoshua Bar-Hillel, 1960. The present status of automatic translation of languages. In Franz Alt, A. Donald Booth, and R. E. Meagher, eds., Advances in Computers. Academic Press.
Eric Brill, 1995. Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part of Speech Tagging. Computational Linguistics.
Kathleen G. Dahlgren, 1988. Naive Semantics for Natural Language Understanding. Kluwer.
William Gale, Kenneth Church, and David Yarowsky, 1992. A method for disambiguating word senses in a large corpus. Computers and the Humanities 26, pages 415--439.
Nancy Ide and Jean Veronis, 1998. Introduction to the special issue on word sense disambiguation: The state of the art. Computational Linguistics 24(1), pages 1--40.
Abraham Kaplan, 1950. An experimental study of ambiguity and context. Mechanical Translation 2(2): 39-46 (issue appeared in 1955).
J. J. Katz and J. A. Fodor, 1963, The structure of semantic theory. Language 39, pages 170--210.
Bob Krovetz and Bruce Croft, 1992. Lexical Ambiguity and Information Retrieval. ACM Transactions on Information Systems 10(2), pages 115--141.
Alpha Luk, 1995. Statistical Sense Disambiguation with Relatively Small Corpora using Dictionary Definitions. Proceedings of the 33rd ACL.
Ray Mooney, 1996. Comparative Experiments on Disambiguating Word Senses: An Illustration of the Role of Bias in Machine Learning. Proceedings of the 1996 Conference on Empirical Methods in Natural Language Processing, pp. 82-91
Erwin Reifler, 1955. The mechanical determination of meaning. In William N. Locke and A. Donald Booth, eds., Machine Translation of Languages. John Wiley and Sons.
Mark Sanderson, 1994. Word Sense Disambiguation and Information Retrieval. PhD Thesis, Technical Report (TR-1997-7), Dept. of Computing Science, University of Glasgow.
Hinrich Schütze, 1998. Automatic Word Sense Discrimination. Computational Linguistics 24(1), pages 97--124.
Hinrich Schütze and Jan O. Pedersen, 1995. Information Retrieval Based on Word Senses. Fourth Annual Symposium on Document Analysis and Information Retrieval, pages 161-175.
Yorick Wilks and Mark Stevenson, 1998. The Grammar of Sense: Using part-of-speech tags as a first step in semantic disambiguation. Journal of Natural Language Engineering 4(2), pages 135-144. (See also cmp-lg/9607028)
David Yarowsky, 1992. Word sense disambiguation using statistical models of Roget's categories trained on large corpora. Proceedings of COLING. Nantes, pp. 454-460.

3/15:

Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer, 1991. Word-sense disambiguation using statistical methods. Proceedings of the 29th ACL.
Michael Lesk, 1986. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. Proceedings of the fifth international conference on Systems documentation, pages 24--26.
Raymond J. Mooney, 1996. Comparative Experiments on Disambiguating Word Senses: An Illustration of the Role of Bias in Machine Learning. Proceedings of the 1996 Conference on Empirical Methods in Natural Language Processing, pages 82--91.
Ronald L. Rivest, 1987. Learning decision lists. Machine Learning 2(3), pages 229--246.
Hinrich Schütze, 1992. Context space. Working Notes of the AAAI Fall Symposium on Probabilistic Approaches to Natural Language, pages 113--120.
David Yarowsky, 1996. Homograph Disambiguation in Speech Synthesis. In Julia Hirschberg, Richard Sproat, and Jan van Santen, eds., Progress in Speech Synthesis, pages 159--175.

3/27:

Timothy C. Bell, John G. Cleary, and Ian H. Witten, 1990. Text Compression. Prentice Hall.
William A. Gale and Kenneth W. Church, 1990. Estimation procedures for language context: poor estimates are worse than none. COMPSTAT: Proceedings in Computational Statistics, pages 69--74.
William A. Gale and Kenneth W. Church, 1994. What's wrong with adding one? In Corpus-Based Research into Language, N. Oostdijk and P. de Haan, eds., Rodopi.
I. J. Good, 1953. The population frequencies of species and the estimation of population parameters. Biometrika 40, pages 237--264.
Frederick Jelinek and Robert L. Mercer, 1980. Interpolated Estimation of Markov Source Parameters from Sparse Data. Proceedings of the Workshop on Pattern Recognition in Practice, pages 381--397.
Slava M. Katz, 1987. Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-35 (3), pp. 400-401.
Pierre Simon Laplace. Essai philosophique sur les probabilités. There appear to be several editions; Ristad's A natural law of succession dates it to 1775.
Arthur Nadas, 1985. On Turing's formula for word probabilities. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-33 (6), pages 1414--1416.

3/29:

Peter F. Brown and Vincent J. DellaPietra and Peter V. deSouza and Jennifer C. Lai and Robert L. Mercer, 1992. Class-based n-gram models of natural language. Computational Linguistics 18(4), pages 467--479.
Frederick Jelinek, Robert L. Mercer and Salim Roukos, 1992. Principles of Lexical Language Modeling for Speech Recognition. In Sadaoki Furui and M. Mohan Sondhi, eds., Advances in Speech Signal Processing, Marcel Dekker.

4/3:

Thomas M. Cover and Joy A. Thomas, 1991. Elements of Information Theory. Wiley.
Bonnie Dorr, 1994. Machine Translation Divergences: A Formal Description and Proposed Solution. Computational Linguistics 20(4), pages 597--633.
Eduard H. Hovy and Kevin Knight, 1996. Machine Translation. Tutorial at ACL '96.
Bente Maegaard, editor, 1999. Machine Translation. In Multilingual Information Management: Current Levels and Future Abilities, Eduard Hovy, Nancy Ide, Robert Frederking, Joseph Mariani, and Antonio Zampolli, editors.

4/5:

Adam L. Berger, Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, John R. Gillett, John D. Lafferty, Robert L. Mercer, Harry Printz, and Lubos Ures, 1994. The Candide System for Machine Translation. Proceedings of the 1994 ARPA Workshop on Human Language Technology
Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer, 1993. The Mathematics of Statistical Machine Translation. Computational Linguistics 19(2), pp. 263--311.
Noam Chomsky, 1957. Syntactic Structures, Mouton.
John Rupert Firth, 1957. A synopsis of linguistic theory 1930--1955. In Studies in Linguistic Analysis.
Zellig S. Harris, 1951. Structural Linguistics, U. Chicago Press.
Claude Shannon, 1948. A mathematical theory of communication. Bell System Technical Journal, vol. 27, pp. 379-423 and 623-656. Republished as "The mathematical theory of communication" in Warren Weaver and Claude E. Shannon, eds., The Mathematical Theory of Communication, U. Illinois Press, 1949.

4/10:

E. Mark Gold, 1967. Language Identification in the Limit. Information and Control 10(5), pp. 447--474.

4/12:

L. Baum, 1972. An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process. Inequalities 3, pp. 1-8.
Arthur P. Dempster and Nan M. Laird and Donald B. Rubin, 1977. Maximum Likelihood From Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society Series B, 39(1), pp. 1-38.
Other useful Baum-Welch references are chapter 9 from the Jelinek book, and Chapter 11 of R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. A useful reference for EM in general is Michael Collins' exam paper The EM Algorithm.

4/17:

David W. Aha, Dennis Kibler, and Marc K. Albert, 1991. Instance-based Learning Algorithms. Machine Learning 6, pp. 37--66.
James K. Baker, 1979. Trainable Grammars for Speech Recognition. Proceedings of the Spring Conference of the Acoustical Society of America, pp. 547--550.
Glenn Carroll and Eugene Charniak, 1992. Two Experiments on Learning Probabilistic Dependency Grammars from Corpora. Brown University Tech Report CS-92-16.
Lari and Young, 1990. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language 4, pp. 35--56.
Fernando Pereira and Yves Schabes, 1992. Inside-outside reestimation from partially bracketed corpora. Proceedings of the 30th Annual Meeting of the ACL, pp. 128-135

4/19:

Jean-Pierre Chanod and Pasi Tapanainen, 1995. Tagging French -- comparing a statistical and a constraint-based method. Proceedings of the EACL.
Christer Samuelsson and Atro Voutilainen, 1997. Comparing a Linguistic and a Stochastic Tagger. Proceedings of 35th ACL/8th Conference of the EACL. Also cmp-lg/97060
Pasi Tapanainen and Atro Voutilainen, 1994. Tagging accurately - Don't guess if you know, Proceedings of Fourth ACL Conference on Applied Natural Language Processing. Also cmp-lg/9408009

4/24:

Frederick Mosteller and David L. Wallace, 1964. Inference and disputed authorship: The Federalist. Addison-Wesley.
Frederick Mosteller and David L. Wallace, 1984. Applied Bayesian and classical inference : the case of the Federalist papers. Springer-Verlag.
The Federalist Papers themselves are available online.

4/25:

Hugh Loebner. In Response [to Shieber's article]
James H. Moor, 1976. An analysis of the Turing test, Philosophical Studies 30, pp. 249--257.
Alan M. Turing, 1950. Computing machinery and intelligence. Mind LIX(236), pp. 433-460.
Stuart Shieber, 1994. Lessons from a Restricted Turing Test, Communications of the Association for Computing Machinery, 37(6), pp. 70-78.

CS674, Spring '00
Lillian Lee