Any questions about this schedule should be directed to the
general forum on IVLE. Note that no university holidays affect the scheduling for this course.
Supplemental readings are marked with a "*".
Unit
|
Date
|
Description
|
Deadlines
|
What is a DL |
Week 0: (3 Aug)
| Class cancelled due to school policy.
|
Building a DL |
Week 1: (10 Aug)
| Orientation / Fundamentals of information retrieval: course information,
policies and scope; breadth of research encompassed by DLs. Document
indexing, TF*IDF, Boolean retrieval model, vector space model, algebraic
models of retrieval.
Slides: Lecture Notes [ .htm ] [ .pdf ]
Readings:
- Vannevar Bush (1945) As We May Think, The Atlantic Monthly (selected parts during class) [ Section 6 ] [ Section 7 ]
- Baeza-Yates and Ribeiro-Neto (1999), Chapter 2, Sections 2.1-2.5.5
- *Lesk (1997), Chapter 1, Evolution of Libraries
- *Lesk (1997), Chapter 2, Text Access Methods
- *Lesk (1997), Chapter 5, Sections 5.5-5.8, Knowledge Representation Methods
- *Witten, Moffat and Bell (1999), Chapter 3.1 - 3.3 (up to Nonparameterized models) & 3.7
- *Witten, Moffat and Bell (1999), Chapter 4.4 - 4.6 (Query processing)
- *Witten, Moffat and Bell (1999), Chapter 5.1 - 5.2 (Index Construction)
|
|
Week 2: (17 Aug)
| Storing information: multimedia encodings for text (SGML, Unicode, TEI,
XML), page images (CCITT FAX IV), images (JPEG, PNG, PS), video (MPEG),
audio (MP3, WAV) and synchronized media (SMIL).
Slides: [ .htm ] [ .pdf ] Self-study module: Huffman Encoding [ .htm ] [ .pdf ]
Readings:
- David Bainbridge, Craig G. Nevill-Manning, Ian H. Witten, Lloyd A. Smith and Rodger J. McNab (1999) Towards a Digital Library of Popular Music.
Available from the ACM Digital Library, from LINC, or directly from Nevill-Manning's website.
- Lesk (1997), Chapter 3, Images of Pages.
- Lesk (1997), Chapter 4, Multimedia Storage and Access.
- *Witten, Moffat and Bell (1999), Chapter 6.1 - 6.2, 6.5 (GIF/PNG section only), 6.6 (JPEG section)
- *Witten, Moffat and Bell (1999), Chapter 7, Section 1
- *Witten, Moffat and Bell (1999), Chapter 8.
| Pick and finalize survey paper area (in class).
|
Week 3: (24 Aug)
| Classification: traditional classification
schemes (DDC, LCSH, MeSH), metadata types, Dublin Core, the Warwick
Framework.
Slides: [ .htm ] [ .pdf ] Self-study module: WordNet [ .htm ] [ .pdf ]
Readings:
|
|
Week 4: (31 Aug)
| DL policy, interoperability and access rights:
identifiers, the Open Archives Initiative, metadata harvesting, OpenURL.
DL economics, social policy and related issues.
Slides: [ .htm ] [ .pdf ]
Readings:
|
|
|
Week 5: (7 Sep)
| One-hour midterm, plus a short session to catch up on the material presented thus far.
Slides (an abbreviated form of last week's): [ .htm ] [ .pdf ]
Midterm test [ .pdf ]
|
|
Using the DL |
Week 6: (14 Sep)
| Bibliometrics and its applications:
laws of bibliometrics, citations and references, PageRank, HITS.
Slides: [ .htm ] [ .pdf ]
Readings:
- For more details on Google in general: Brin and Page (1998) The Anatomy of a Large-Scale Hypertextual Web Search Engine.
- For more details on the HITS algorithm: Jon Kleinberg (1998) Authoritative sources in a hyperlinked environment.
In Proc. Ninth Ann. ACM-SIAM Symp. Discrete Algorithms, pages 668-677, ACM Press, New York.
- Steve Lawrence, C. Lee Giles, Kurt Bollacker (1999) Digital
Libraries and Autonomous Citation Indexing
- ISI's Impact Factor: Essays by Eugene Garfield on citation analysis
(reprinted from Current Contents)
- *Simone Teufel and Marc Moens (2000) What's
yours and what's mine: Determining Intellectual Attribution in
Scientific Text. Proceedings of EMNLP, Hong Kong, Oct 2000.
- *Steve Hitchcock, Arouna Woukeu, Tim Brody, Les Carr, Wendy Hall and Stevan Harnad (2003) Evaluating Citebase: Key Usability Results
| Survey papers due on IVLE by the 17th. Form project teams and schedule a
meeting with Min to go over your project proposal.
|
Week 7: (18 Sep) one-hour make-up lecture
| Semantic Web: motivation for the Semantic Web, the SW layer cake,
overview of RDF and OWL.
Slides: [ .htm ] [ .pdf ]
Project proposal slides: [ .pdf ] [ .htm ]
Readings:
- Tim Berners-Lee, James Hendler and Ora Lassila (2001) The Semantic Web, Scientific American, May 2001.
- Frank Manola and Eric Miller (2004) RDF Primer, W3C Recommendation.
Read Sections 1, 2.1-2.2, 2.5, 3.1.
- *James Hendler (2003) Science and the Semantic Web, Science.
- *Michael K. Smith, Chris Welty, Deborah
McGuinness (2003) OWL Web
Ontology Language Guide, W3C Recommendation. Read Sections
1-3 and the relevant portions of Section 6. You should read the text
under the main headings and the first level of subheadings (e.g., Section 1,
Section 1.1). You can safely skip the material under the sub-subheadings
(e.g., 1.1.1).
|
Mid-semester Break (Sun 19 Sep - Thu 23 Sep 2004)
|
Week 8: (29 Sep)
| Information seeking: reference interviews,
the information-seeking process, anomalous state of knowledge (ASK).
Slides: [ .htm ] [ .pdf ]
Readings:
| Project proposals returned.
|
Week 9: (5 Oct)
| User interfaces for querying and displaying
documents: survey of query interfaces (text, Venn diagrams, faceted metadata) and
document displays (ranked lists, InfoCrystal, Table Lens, TileBars).
Slides: [ .htm ] [ .pdf ]
Readings:
- Lesk (1997), Chapter 7, Usability and Retrieval Evaluation, Sections 7.1-7.5.
- Hearst, Marti A. (1999) User Interfaces. In
Baeza-Yates and Ribeiro-Neto (eds.), Modern Information Retrieval, 1999.
| Survey paper grades returned.
|
Week 10: (12 Oct)
| Usage patterns in the DL: usage mining; how DLs
and web sites are used, and how this use relates to information seeking and
HCI.
Slides: [ .htm ] [ .pdf ]
Readings:
| Midterm grades returned.
Midterm Answers [ .pdf ]
|
Week 11: (19 Oct)
| Evaluation: traditional library evaluation;
review of standard IR evaluation metrics.
Slides: [ .htm ] [ .pdf ]
Readings:
- Lesk (1997), Chapter 7, Usability and Retrieval Evaluation, Section 7.6
- Witten, Moffat and Bell (1999) Managing Gigabytes, Section 4.5.
- *Baker and Lancaster (1991) The Measurement and Evaluation of Library Services, Information Resources Press.
|
|
Week 12: (26 Oct)
| Extended services for the DL:
collaborative filtering, recommender systems, reputation shilling,
authorship attribution, plagiarism detection.
Slides: [ .htm ] [ .pdf ]
Addendum on Naive Bayes: [ .htm ] [ .pdf ]
Readings:
- Breese, Heckerman and Kadie (1998) Empirical Analysis of Predictive
Algorithms for Collaborative Filtering, Proc. of Uncertainty in AI [ .ps ]
- Khmelev and Teahan (2004) A repetition based measure for
verification of text collections and for text categorization, Proc. of
WWW 2004.
- *Lam and Riedl (2004) Shilling recommender systems for fun and
profit. In Proc. of WWW 2004. [ ACM Portal link ]
- *Karlgren and Cutting (1994) Recognizing Text
Genres with Simple Metrics Using Discriminant Analysis, Proc. of
COLING-94.
- *Shivakumar and Garcia-Molina (1995) SCAM: A copy
detection mechanism for digital documents, Proc. of DL 95.
|
|
Week 13: (2 Nov)
| Instant messaging, email, weblogs and wikis: new media for
information. Their characteristics and use; tracking knowledge
development in new media.
Course revision.
Slides: [ .htm ] [ .pdf ]
Readings:
- Bellotti et al. (2003) Taking email to task: the design and evaluation of a task management centered email tool, Proc. CHI 2003.
- Gruhl et al. (2004) Information diffusion through blogspace, Proc. WWW 2004.
- *Kleinberg (2003) Bursty and Hierarchical Structure in Streams, Data Mining and Knowledge Discovery, 7(4)
- *Jackson et al. (2003) Understanding email interaction increases organizational productivity, CACM
- *Christopher Campbell et al. (2003) Expertise identification using email communications, Proc. CIKM 2003.
|
|
|
| Week 14: (9 Nov)
| Project presentations: no lecture; poster presentations are held in lieu of class.
| Final project poster presentation.
|
| Reading Week (Fri 12 Nov - Thu 18 Nov 2004)
|