INFORMATION RETRIEVAL
17:610:551
Spring 2002
Gheorghe Muresan
SCHEDULE AND ASSIGNED READINGS (expect revisions)
You are requested to read at least two chapters or articles
from the list provided each week. Depending on your background, your interests,
the project you have in mind, and the time available, you are
encouraged to read the other items on the list, to read the recommended background
material, or to implement some of the algorithms we discuss.
Lecture 1 - Jan 24: Introduction
and overview of the course.
Lecture 2 - Jan 31: The
goals of IR. IR problems, the IR situation, and IR systems.
Readings: Hersh, Chapters
1 and 2. In Sparck Jones & Willett, "Overall introduction", and
Chapter Two, "Introduction". Belew, R. K. (2000), Chapter 1,
“Overview”. Belkin, N.J. (1980) Anomalous states of knowledge as a basis for
information retrieval. Canadian Journal of Information Science, v. 5:
133-143. Also: Belkin & Vickery (1985) Chapters 1 and 2; Ingwersen (1992),
Chapter 3; the introductory chapters to any of: Lancaster (1978); Lancaster
& Warner (1993); Meadow (1992); van Rijsbergen (1979),
chapter 1: Introduction; Salton & McGill (1983).
Lecture 3 - Feb 7: Fundamental concepts in IR. Information,
meaning, aboutness, relevance.
Readings: In Sparck Jones
& Willett, from Chapter 3: the "Introduction". Belkin, N.J.
(1978) Information concepts for information science. Journal of
Documentation, v. 34, no.1: 55-85. Hutchins, W.J. (1978) The concept of
"aboutness" in subject indexing. Aslib Proceedings, vol. 30:
172-181 (Also in Sparck Jones & Willett, pp. 93-97). Saracevic, T. (1975)
Relevance: a review of and a framework for the thinking on the topic. Journal
of the American Society for Information Science, vol. 26: 321-343 (Also in
Sparck Jones & Willett, pp. 143-165).
Lecture 4 - Feb 14: Actors
and processes in IR systems. What do we want from Information Retrieval?
Readings: Belkin, N.J.
(1993) Interaction with texts: Information retrieval as information-seeking
behavior. In: Information Retrieval '93: Von der Modellierung zur Anwendung.
Konstanz: Universitaetsverlag Konstanz, 55-66. Croft, W.B. (1995) What do people want
from information retrieval? D-Lib Magazine, November. In Kowalski
and Maybury: Chapter 2, “Information Retrieval System Capabilities”. Belkin, N.J.
& Croft, W.B. (1992) Information
filtering and information retrieval: Two sides of the same coin? Communications
of the ACM, v. 35 no. 12: 29-38.
Lecture 5 - Feb 21: Document and query representation. Manual vs.
automatic indexing.
Compulsory
readings: Hersh, Chapter 5: “Indexing”; J. D. Anderson & J. Perez-Carballo,
“The nature of indexing: how humans and machines analyze messages and texts for
retrieval. Part
I: Research, and the nature of human indexing; Part II:
Machine indexing, and the allocation of human versus machine effort”,
Information Processing and Management, vol. 37 (2001), p. 231-254, p. 255-277.
Other readings: In Sparck Jones
& Willett, from Chapter 6, the "Introduction" (especially the
section on Indexing). Foskett, D.J. (1980) Thesaurus. In A. Kent, J. Lancour
& J.E. Daily, eds., Encyclopedia of Library and Information Science, v.
30, pp. 416-462. New York: Marcel Dekker (Also in Sparck Jones &
Willett, pp. 111-134).
Lecture 6 - Feb
28: Automatic indexing. Lexical analysis. Weighting. Data structures.
Compulsory
readings: van Rijsbergen (1979), Chapter 2:
“Automatic text analysis”. Also review “Automatic indexing” from last
week.
Other readings: Hersh, Chapter
8: “Lexical-statistical systems”. Belew,
R. K. (2000), Chapter 2, “Extracting lexical features”. In Sparck Jones
& Willett, from Chapter 6, the "Introduction" (especially the
section on Indexing). Salton, G. & Buckley, C. (1988) “Term weighting
approaches in automatic text retrieval”, Information Processing and
Management, vol. 24: 513-523 (Also in Sparck Jones & Willett, pp.
323-328). Robertson, S. E. and Sparck Jones, K. (1997), “Simple, proven approaches to
text retrieval”, University of Cambridge Computer Laboratory
Technical Report no. 356, 1994 (updated 1996, 1997).
For stemming code or
a demo, see Martin Porter’s site.
Presentations:
Eakins, J. P. and Graham, M.
E. "Content-based
Image Retrieval: A Report to the JISC Technology Applications Programme"
- Stacy Adduci.
Mikheev, Andrei “Document
Centered Approach to Text Normalization”, SIGIR 2000, Athens - Craig Willard.
Homework!
Lecture 7 - Mar 7: Models
of IR. Interaction models. Indexing models. Relevance feedback.
Readings: In Sparck Jones
& Willett, from Chapter 5, the "Introduction". Cooper, W.S.
Getting beyond Boole. Information Processing and Management, vol. 24:
243-248. Also in Sparck Jones & Willett, pp. 265-267. Robertson, S.E. The
probability ranking principle in IR. Journal of Documentation. vol 33:
294-304 (Also in Sparck Jones & Willett, pp. 281-286). Salton, G., Wong, A.
& Yang, C.S. (1975) “A vector
space model for automatic indexing”, Communications of the ACM, vol 18:
613-620. Also in Sparck Jones and Willett, pp. 273-280. Saracevic, T. (1996).
Interactive models in information retrieval (IR): Progress, problems, proposal.
In Proceedings of the 1996 ASIS Annual Meeting. Medford, NJ: Learned
Information. Turtle, H. & Croft, W.B. (1990) “Inference
networks for document retrieval”, SIGIR 1990, New York: ACM, 1-24.
Readings proposed
for presentation:
Rajashekar, T.
B. and Croft, W. B. “Combining
Automatic and Manual Index Representations in Probabilistic Retrieval”,
JASIS, 1995.
Campbell, I. “Supporting Information
Needs by Ostensive Definition in an Adaptive Information Space”, MIRO’95.
Bates, Marcia J. “The Design of
Browsing and Berrypicking Techniques for the Online Search Interface”. Online Review 13
(October 1989): 407-424 - Sharon Kaye.
Lecture 8 - Mar 14: User
interfaces for IR systems.
Part I: Interaction models.
Compulsory readings: Chapter 10: “User
Interfaces and Visualization” by Marti Hearst in “Modern Information Retrieval”.
Recommended
readings: Journal of the American Society for Information Science, vol.
43, issue 2, 1992, special issue on Human-Computer Interface: “Introduction and
Overview” by Lunin and Harman, “Interfaces for end-user
information seeking” by Gary Marchionini, “User-friendly
systems instead of user-friendly front-ends” by Donna Harman, “Intelligent
information retrieval: An introduction” by Susan Gauch, “Models for
hypertext” by Mark F. Frisse and Steve B. Cousins; Muresan,
G. and Harper, D. J. “Document Clustering and Language Models for System-Mediated
Information Access”, ECDL’01, Darmstadt, p. 438-449.
Presentations:
Bates, M. (1990) “Where should
the person stop and the information search interface start?”, Information
Processing and Management, v 26(5): 575-591 - Cheryl Milburn.
O’Day, V. L. and
Jeffries, R. “Orienteering in an information landscape: how information
seekers get from here to there”, InterCHI’93, Amsterdam - Tamara Richman.
Hendry, D. G. and Harper, D.
J. “An
informal information-seeking environment”, JASIS 48 (11), 1997 -
Roman Santillan.
Lecture 9 - Mar
28: User interfaces for IR systems.
Part II: Tools and
techniques. Information Visualization. Structure. Categorization vs.
clustering.
Readings: Shneiderman,
Ben, chapter “Information Search and Visualization” in “Designing the User
Interface”, 3rd ed., 1997 (and associated webpage);
Belkin, N.J., Marchetti, P.-G., Cool, C. (1993) BRAQUE: Design of an interface
to support user interaction in information retrieval. Information Processing
and Management, 29 (3): 325-344; Chalmers, M. and Chitson, P. “Bead:
Exploration in information visualization”, SIGIR’92, Copenhagen, p.
330-337; Williamson, C., Shneiderman, B. (1992) “The
Dynamic HomeFinder: Evaluating dynamic queries in a real-estate information
exploration system”, SIGIR’92, New York, p. 338-346; Nowell, L. T.,
France, R. K., Hix, D., Heath, L. S. and Fox, E. A. (1996)
“Visualizing search results: some alternatives to
query-document similarity”, SIGIR’96,
Zurich, p. 67-75; Lin, Xia “Map
displays for information retrieval”, JASIS, 48(1), 1997,
p. 40-54.
Further readings on HCI:
Preece, J., Rogers, Y. and
Sharp, H. (2002) - “Interaction Design - Beyond Human-Computer Interaction” (and associated webpage).
Further readings on
Information Visualization (IV):
Spence, R. (2000) -
“Information Visualization”, ISBN: 0201596261; Chen, C. (1999) - “Information
Visualisation and Virtual Environments”, ISBN: 1852331364; Card, S.
K., MacKinlay, J. D. and Shneiderman, B. (1999) - “Readings in Information
Visualization: Using Vision to Think”, ISBN: 1558605339. Also,
University of Maryland’s HCI Lab website,
and InfoViz, a
repository for IV.
Readings proposed for
presentation:
Korfhage, Robert R. “To see, or not to see - is
Cutting, D. R., Pedersen, J.
O., Karger, D. and Tukey, J. W. “Scatter/Gather:
A cluster-based approach to browsing large document collections”, SIGIR’92,
Copenhagen, p. 318-329.
Gary Marchionini, “Interfaces for end-user
information seeking”, JASIS, 43(2), 1992 - Minsoo Park.
Lecture 10 - Apr 4: Evaluation of IR systems. Experimental vs.
operational IR systems.
Readings: Hersh, chapter
3: “System evaluation”, and chapter 7: “Evaluation”. In Baeza-Yates &
Ribeiro-Neto “Modern
Information Retrieval”, chapter 3: “Retrieval Evaluation”. In Sparck Jones
& Willett, from Chapter 4, the "Introduction" and the articles by
Saracevic, et al., Lancaster, and Harman. Su, L. (1992) Evaluation measures for
interactive information retrieval. Information Processing and Management,
28(4): 503-516; Harman, Donna “Overview of
the first TREC conference”, SIGIR’93, Pittsburgh.
In JASIS,
47(1), January 1996, Special Issue: Evaluation
of Information Retrieval: Tague-Sutcliffe, J. M. - “Some
perspectives on the evaluation of information retrieval systems”; Blair,
D. C. - “STAIRS
redux: Thoughts on the STAIRS evaluation, ten years after”; Hersh, W. et
al. - “A
task-oriented approach to information retrieval evaluation”; Ellis, D. - “The dilemma
of measurement in information retrieval research”; Beaulieu, M. et al. - “Evaluating
interactive systems in TREC”.
In Information
Processing and Management, 31 (3), May-June 1995, Special issue: TREC:
Harman, D. - “Overview of the Second Text
Retrieval Conference (TREC-2)”; Sparck Jones, K. - “Reflections
on TREC”; Robertson, S. E. et al. - “Large Test
Collection Experiments on an Operational, Interactive System: Okapi at
TREC”; Belkin, N. et al. - “Combining the
Evidence of Multiple Query Representations for Information Retrieval”.
In Information
Processing and Management, 36 (1), January 2000, Special issue: TREC:
Harman, D. - “Overview of the Sixth Text
REtrieval Conference (TREC-6)”; Sparck Jones, K. - “Further
reflections on TREC”; Robertson, S. E. et al. - “Experimentation
as a way of life: Okapi at TREC”.
The Text Retrieval Conference (TREC) webpage.
Presentations:
Brajnik, G., Mizzaro, S.,
Tasso, C. and Venuti, F. “Strategic
Help in User Interfaces for Information Retrieval”, JASIST, 53(5),
2002, p. 343-358 - Tina Marie Doody.
Saracevic, T. “Evaluation of
Evaluation in Information Retrieval”, SIGIR’95, Seattle - Dana
Knauff.
Lecture 11 - Apr
11: Evaluation of interactive IR systems. IR evaluation in context.
Readings: Hersh, Chapters
3, 7. In Sparck Jones & Willett, from Chapter 4, the
"Introduction" and the articles by Saracevic, et al., Lancaster, and Harman.
Su, L. (1992) Evaluation measures for interactive information retrieval.
Information Processing and Management, 28(4): 503-516; Borlund, P. and
Ingwersen, P. (1997) “The development
of a method for the evaluation of interactive information retrieval
systems”, Journal of Documentation, 53(3).
In Information
Processing and Management, 37 (3), May 2001, Special issue: Interactive
TREC: Hersh, W. and Over, P. - “Interactivity at the Text
Retrieval Conference (TREC)”; Over, P. - “The TREC
interactive track: an annotated bibliography”; Hersh et al. - “Challenging conventional
assumptions of automated information retrieval with real users: Boolean
searching and batch retrieval evaluations”; Belkin, N. et al. - “Iterative
exploration, design and evaluation of support for query reformulation in interactive
information retrieval”; Allan, J. et al. - “Evaluating
combinations of ranked lists and visualizations of inter-document
similarity”; Wu, M. et al. - “Using clustering
and classification approaches in interactive retrieval”; Larson, R. R. - “TREC
interactive with Cheshire II”; Bodner, R. C. et al. - “The impact of
text browsing on text retrieval performance”; Yang, K. - “Passage feedback
with IRIS”.
Belkin et al. “Rutgers' TREC 2001 Interactive
Track Experience”, at TREC 2001.
Preece, J., Rogers, Y. and
Sharp, H. (2002) - “Interaction Design - Beyond Human-Computer
Interaction” (and associated webpage)
- chapters on Evaluation.
Hull, D. “Using Statistical Testing in
the Evaluation of Retrieval Experiments”, SIGIR’93; Wilcox, R. R.
“Statistics for Social Sciences” or any other statistics book; also, a statistics textbook
online.
Presentations:
Reid, J. “A Task-Oriented Non-Interactive
Evaluation Methodology for Information Retrieval Systems”, Information
Retrieval, 2(1), Feb 2000 - Melissa Roll.
PROJECT TOPICS DUE.
Lecture 12 - Apr 18: Structure. Classification. Clustering.
Readings: van Rijsbergen
(1979), Chapter 3:
“Automatic classification”. In Sparck Jones & Willett, from Chapter 6, the
article by Griffiths, Luckhurst & Willett; from Chapter 8, the article by
Hayes, Knecht and Cellio and the article by Rau; Leuski, Anton "Evaluating
Document Clustering for Interactive Information Retrieval", CIKM'01,
33-40; Hearst, Marti “The Use of Categories and
Clusters in Information Access Interfaces”, in Natural Language
Information Retrieval, Strzalkowski (ed.), Kluwer Academic Publishers, 1999; Sanderson,
M. and Croft, W. B. “Deriving
concept hierarchies from text”, SIGIR 1999, Berkeley; Tombros, A., Villa,
R. and Van Rijsbergen, C. J. (2002) “The
effectiveness of query-specific hierarchic clustering in information
retrieval”, Information Processing and Management,
38(4); Yang, Yiming “An Evaluation of
Statistical Approaches to Text Categorization”, Information Retrieval 1,
1999, p. 69-90.
Hearst, M. A. and Pedersen,
J. O. “Reexamining the cluster hypothesis: scatter/gather on
retrieval results”, SIGIR’96,
Zurich, p. 76-84 - Mary Ellen Valverde.
Kural, Y. and Robertson, S.
and Jones, S. “Deciphering
cluster representations”, Information Processing and Management,
37, 2001, p. 593-601 - Brendan Banks.
Lecture 13 - Apr
25: IR on the Web.
Readings: See Journal of the
American Society for Information Science and Technology, 53(2), 2002 - Special issue on
Web research; Almind, T. C. and Ingwersen, P. (1997) “Informetric Analysis on the
World Wide Web: Methodological Approaches to
Webometrics”, Journal of Documentation, 53(4); Chu, H.
and Rosenthal, M. (1996) “Search
Engines for the World Wide Web: A Comparative Study and Evaluation
Methodology”, Proceedings of ASIS’96.
“The Internet: Bringing
Order from Chaos”, special report in Scientific American, March
1997.
“PageRank: Bringing Order
to the Web”, the model behind Google.
Spink, Amanda (2002) “A user-centered approach to
evaluating human interaction with Web search engines: an exploratory
study”, Information Processing and Management, 38(3) - Shilpa Shanbhag.
Ellis, D., Ford, N. and
Furner, J. (1998) “In search of
the unknown user: indexing, hypertext and the World Wide Web”, Journal
of Documentation, 54(1) - Jinyoung
Park.
Readings: Hersh, Chapter
9: “Linguistic Systems”. In Sparck Jones & Willett, from Chapter 8, the
"Introduction" and any other article there that looks interesting;
Chapter 9. SIGIR’99 Workshop on Recommender
Systems, UC Berkeley; Set of articles on Recommender Systems in Communications
of the ACM, 40 (3), March 1997 - leading article: Resnick, P. and Varian, H.
R. “Recommender Systems”; Xie, H. “Patterns between
Interactive Intentions and Information-Seeking Strategies”, Information
Processing and Management, 38, 2002; Chalmers,
Matthew “Paths
and Contextually Specific Recommendations”, DELOS Workshop, 2001;
Pazzani, M. and Billsus, D. “Learning and
Revising User Profiles: The Identification of Interesting Web Sites”, Machine
Learning 27, 1997, p. 313-331.
Gaizauskas, R. and Wilks, Y. (1998) “Information Extraction: Beyond Document Retrieval”, Journal
of Documentation, 54(1) - Fran Pfeffer.
Lecture 15 - May
9: Discussion/presentation of final projects.