The first four weeks will focus on building a digital library and
will be reinforced by a practical homework assignment. The remaining
lectures will focus on using digital libraries.
Any questions about this information should be directed to the
general forum on IVLE. Supplemental readings are marked with a "*".
Week 1: (16 Aug)
| Orientation / Fundamentals of information retrieval
Course information, policies and scope, breadth of
research encompassed by DLs. Document indexing, TF*IDF, Boolean
retrieval model, Vector space model, Algebraic models of retrieval.
Slides: [ .htm ] [ .pdf ]
- Baeza-Yates and Ribeiro-Neto (1999), Chapter 2.1 - 2.5.5
- Bush (1945) As we may think, The Atlantic Monthly (selected parts during class) [ Sections 6-7 ]
- Taylor (2004), Chapter 1, Organization of Recorded Information
- Witten, Moffat and Bell (1999), Chapter 3.1 - 3.3 (up to Nonparameterized models) & 3.7
- *Lesk (1997), Chapter 1, Evolution of Libraries
- *Lesk (1997), Chapter 2, Text Access Methods
- *Lesk (1997), Chapter 5, Sections 5.5-5.8, Knowledge Representation Methods
- *Witten, Moffat and Bell (1999), Chapter 4.4 - 4.6 (Query processing)
- *Witten, Moffat and Bell (1999), Chapter 5.1 - 5.2 (Index Construction)
Week 2: (23 Aug)
| Storing information
Multimedia encodings: text (SGML, Unicode, TEI, XML), page images (CCITT FAX IV), images (JPEG,
PNG, PS), video (MPEG), audio (MP3, WAV), synchronized media (SMIL).
Slides: [ .htm ] [ .pdf ] Self-study module: Huffman Encoding [ .htm ] [ .pdf ]
- Lesk (1997), Chapter 3, Images of Pages.
- Lesk (1997), Chapter 4, Multimedia Storage and Access.
- *Bainbridge, Nevill-Manning, Witten, Smith and McNab (1999) Towards a Digital Library of Popular Music.
(Available from the ACM Digital Library or LINC, or directly from Nevill-Manning's website.)
- *Witten, Moffat and Bell (1999), Chapter 6.1 - 6.2, 6.5 (GIF/PNG section only), 6.6 (JPEG section)
- *Witten, Moffat and Bell (1999), Chapter 7, Section 1
- *Witten, Moffat and Bell (1999), Chapter 8.
| - Pick and finalize survey paper area (in class).
- Homework #1 out (Building a digital library with Greenstone)
- Greenstone tutorial
Week 3: (Make-up lecture: 29 Aug 2005 SR 4 (SoC 1, Lvl 6 #12))
| Classification
Traditional classification schemes (DDC, LCSH, MeSH), metadata types, Dublin Core, Warwick
Slides: [ .htm ] [ .pdf ] Self-study module: WordNet [ .htm ] [ .pdf ]
- Lesk (1997), Chapter 5, Sections 5.1-5.3, Knowledge Representation Methods.
- Taylor (2004), Chapter 4, Encoding Standards.
- Taylor (2004), Chapter 6, Metadata.
- *Marshall, Catherine (1998), Making Metadata: a study of metadata creation for a mixed physical-digital collection. In Proc. of Digital Libraries 1998.
- *Ipeirotis et al. (2002) Extending SDARTS: Extracting Metadata from Web Databases and Interfacing with the Open Archives Initiative, in Proc. of the Second ACM+IEEE Joint Conference on Digital Libraries (JCDL), 2002.
- *Vellucci, Sherry L. "Metadata." Annual Review of Information
Science and Technology 33 (1998): 187-222. On reserve from the RBR.
Week 4: (30 Aug)
| DL policy, interoperability and access rights
Identifiers: Open Archives Initiative, OpenURL. DL economics, social policy and issues.
Slides: [ .htm ] [ .pdf ]
Week 5: (6 Sep)
| Bibliometrics and its applications
Laws of bibliometrics, Citations and references, PageRank, HITS.
Slides: [ .htm ] [ .pdf ]
- For more details on Google in general: Brin and Page (1998) The Anatomy of a Search Engine.
- For more details on the HITS algorithm: Jon Kleinberg (1998) Authoritative sources in a hyperlinked environment. In Proc. Ninth Ann. ACM-SIAM Symp. Discrete Algorithms, pages 668-677, ACM Press, New York.
- Steve Lawrence, C. Lee Giles, Kurt Bollacker (1999) Digital Libraries and Autonomous Citation Indexing
- ISI's Impact Factor: Essays by Eugene Garfield on citation analysis (reprinted from Current Contents)
- *Simone Teufel and Marc Moens (2000) What's yours and what's mine: Determining Intellectual Attribution in Scientific Text. Proceedings of EMNLP, Hong Kong, Oct 2000.
- *Steve Hitchcock, Arouna Woukeu, Tim Brody, Les Carr, Wendy Hall and Stevan Harnad (2003) Evaluating Citebase: Key Usability Results
- Homework #1 due directly to me by 6 Sep, 11:59 pm SGT. You may
submit your CDROM after class or on 7 Sep during office hours.
Week 6: (13 Sep)
| Information seeking
Reference interviews, Information seeking process, Anomalous state of knowledge.
Slides: [ .htm ] [ .pdf ]
- Survey papers due in IVLE workbin by 13 Sep 11:59 pm SGT.
- Form project teams and schedule a meeting with Min to go over your project proposal.
Mid-semester Break (Fri 16 Sep - Thu 22 Sep 2005)
Week 7: (27 Sep)
| User interfaces for querying and displaying documents
Survey of query interfaces (text, Venn, faceted metadata) and document displays (ranked list, InfoCrystal, Table Lens, TileBars).
Slides: [ .htm ] [ .pdf ]
- Lesk (1997), Chapter 7, Usability and Retrieval Evaluation, Sections 7.1-7.5.
- Hearst, Marti A. (1999) User Interfaces. In Baeza-Yates and Ribeiro-Neto (eds.), Modern Information Retrieval, 1999.
- Project proposals returned.
- Homework #1 grades returned.
Week 8: (10 Oct 8-10 *AM*, LT 33)
| Usage patterns in the DL
Usage mining. How DLs
and web sites are used, and their relation to information seeking and
Slides: [ .htm ] [ .pdf ]
Week 9: (11 Oct)
| Computational Analysis of Genre, Authorship and Duplication
Authorship attribution, Plagiarism detection.
Self-Study on Naive Bayes: [ .htm ] [ .pdf ]
Slides: [ .htm ] [ .pdf ]
- Khmelev and Teahan (04) A repetition based measure for
verification of text collections and for text categorization, Proc. of
WWW 2004.
- Karlgren & Cutting (94) Recognizing Text
Genres with Simple Metrics Using Discriminant Analysis, Proc. of
- *Mosteller & Wallace (63) Inference in an authorship problem, J American Statistical Association 58(3)
- *de Vel, Anderson, Corney & Mohay (01) Mining Email Content for Author Identification Forensics, SIGMOD Record
- *Foster (00) Author Unknown. Owl Books PE1421 Fos
- *Biber (89) A typology of English texts, Linguistics, 27(3)
- *Lee and Myaeng (02) Text genre classification with genre-revealing and subject-revealing features, SIGIR 02
- *Shivakumar & Garcia-Molina (95) SCAM: A copy
detection mechanism for digital documents, Proc. of DL 95
- *Belkouche et al. (04) Plagiarism Detection in Software Designs, ACM Southeast Conference
- *Bilenko and Mooney (03) Adaptive duplicate detection using learnable string similarity measures, Proc. of KDD 03.
- *Ramaswamy et al. (04) Automatic detection of fragments in dynamically generated web pages, Proc. WWW 04.
- Survey papers returned.
- Homework #2 out - (Authorship attribution of reviews)
- SVMlight tutorial (immediately following class)
Week 10: (18 Oct)
| Collaborative Filtering
- Breese, Heckerman, Kadie (98) Empirical Analysis of Predictive Algorithms for Collaborative Filtering, Proc. of Uncertainty in AI
- *Lam and Riedl (04) Shilling recommender systems for fun and profit. In Proc. of WWW 2004. [ ACM Portal link ]
Week 11: (25 Oct)
| Instant Messaging, Email, Web logs and Wikis: New media for information
Characteristics and their use, tracking knowledge development in new media, and course revision.
Slides: [ .htm ] [ .pdf ]
- Bellotti et al. (2003) Integrating tools and tasks: Taking email to task: the design and evaluation of a task management centered email tool, Proc. CHI 2003
- Gruhl et al. (2004) Information diffusion through blogspace, Proc. WWW 2004.
- *Kleinberg (2003) Bursty and Hierarchical Structure in Streams, Data Mining and Knowledge Discovery, 7(4)
- *Jackson et al. (2003) Understanding email interaction increases organizational productivity, CACM
- *Campbell et al. (2003) Expertise identification using email communications, Proc. CIKM 2003.
- Homework #2 due in IVLE workbin by 25 Oct 11:59 pm SGT
Week 12: (1 Nov)
| Deepavali: no class. Poster presentations will be held on 19 Nov in lieu of class.
Reading Week (Fri 12 Nov - Thu 18 Nov 2004)
19 Nov, Sat
| Project Presentations Poster presentations in Min's office (S15 05-05).
Final Exam (Tue 22 Nov 7:30-9:30 pm)