CS 430
Information Discovery
Fall 2001

Readings and References


Text Book

William B. Frakes and Ricardo Baeza-Yates, Information Retrieval: Data Structures and Algorithms. Prentice Hall, 1992.

Other books

Readings

Readings for discussion classes are to be studied in preparation for the classes on Wednesday evenings.

Week 1: Overview of information discovery

Discussion class
Other readings
  • Frakes, W.B., Introduction to information storage and retrieval systems. (Frakes and Baeza-Yates, Chapter 1)

Week 2: Basic concepts of information retrieval, Inverted files

Discussion class
  • Harman, D., Fox, E., Baeza-Yates, R.A., Inverted files. (Frakes and Baeza-Yates, Chapter 3)
Other readings
  • Baeza-Yates, R.A., Introduction to data structures and algorithms related to information retrieval. (Frakes and Baeza-Yates, Chapter 2)
  • Zipf, G. K., Human Behaviour and the Principle of Least Effort. Addison-Wesley, 1949.
  • Faloutsos, C., Signature files. (Frakes and Baeza-Yates, Chapter 4)
  • Gonnet, G.H., Baeza-Yates, R.A., Lee, W., New Indices for Text: PAT trees and PAT arrays. (Frakes and Baeza-Yates, Chapter 5)

Week 3:  

Discussion class
  • Fox, C., Lexical Analysis and Stoplists. (Frakes and Baeza-Yates, Chapter 7)
    [Do not study the details of the computer code in Sections 7.8, 7.9, and 7.10.]
Other readings

Week 4:  

Discussion class [No discussion class]
Other readings

Week 5: 

Discussion class
  • Frakes, W.B., Stemming Algorithms. (Frakes and Baeza-Yates, Chapter 8)
Other readings

Week 6: 

Discussion class [No discussion class]
Other readings
  • Cleverdon, Cyril William. Report on the testing and analysis of an investigation into the comparative efficiency of indexing systems. Cranfield, England: College of Aeronautics, 1962. 305 p. LC: 63-60414.

  • Cleverdon, Cyril William. The Cranfield tests on index language devices. ASLIB Proceedings, v. 19, n. 6, pp. 173-194, June 1967.

  • Text Retrieval Conferences (TREC).  http://trec.nist.gov/

Week 7: 

Discussion class
Other readings
  • E. Fox, S. Betrabet, M. Koushik, W. Lee, Extended Boolean models. (Frakes and Baeza-Yates, Chapter 15)

Week 8: 

Discussion class
Other readings

Week 9: 

Discussion class
  • Harman, D., Ranking algorithms (Frakes and Baeza-Yates, Chapter 14)
Other readings


Week 10:  

Discussion class [No discussion class]
Other readings
  • Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval. Addison Wesley, 1999. Pages 30 to 34 give an introduction to probabilistic information retrieval.

Week 11: 

Discussion class
Other readings

Week 12: 

Discussion class
Other readings

Week 13: 

Discussion class [No discussion class]
Other readings
  • D. Harman, Relevance feedback and other query modification techniques. (Frakes and Baeza-Yates, Chapter 11, Section 11.2.6)

Week 14: 

Discussion class
  • Srinivasan, P., Thesaurus construction. (Frakes and Baeza-Yates, Chapter 9)
Other readings
  • W. Y. Arms and C. R. Arms, Cluster analysis used on social science citations, Journal of Documentation, 34 (1) pp 1-11, March 1978.

  • Bruce Schatz, William H. Mischo, Timothy W. Cole, Joseph B. Hardin, Ann P. Bishop, and Hsinchun Chen, Federating Diverse Collections of Scientific Literature. IEEE Computer, May 1996.

Week 15: 

Discussion class
  • Rasmussen, E., Clustering algorithms. (Frakes and Baeza-Yates, Chapter 16)
Other readings


William Y. Arms
(wya@cs.cornell.edu)
Last changed: November 3, 2001