We will have a short orientation meeting (about 30 minutes) on
8 Aug. If you cannot make it, just review the slides in IVLE. The first
four weeks will focus on building a digital library and will be
reinforced by a practical homework assignment. The remaining lectures
will focus on using digital libraries.
I will be away from 3 to 9 October, so I'm rescheduling that week's
class to one day earlier, on Monday (venue to be announced). Hari Raya
Puasa falls on our lecture day so we have one fewer lecture as a
result.
The lecture notes here are not as complete as those in IVLE. You
should use the ones in IVLE if possible. Any questions about this
information should be directed to the general forum on IVLE.
Supplemental readings are marked with a "*".
The hyperlinks here all work as of Fri Jul 14 14:35:52 GMT-8 2006,
when I updated this page. Use a search engine with the appropriate
text if the links below stop working.
Date
|
Description
|
Deadlines
|
Week 0: (8 Aug)
| Orientation Course information, policies and scope
Slides: [ .htm ]
[ .pdf ] (same link as next week)
|
- Please fill out the pre-flight survey in IVLE
|
Week 1: (15 Aug)
| Orientation / Fundamentals of information
retrieval Course information, policies and scope, breadth of
research encompassed by DLs. Document indexing, TF*IDF, Boolean
retrieval model, Vector space model, Algebraic models of retrieval.
Slides: [ .htm ]
[ .pdf ]
Readings:
- Baeza-Yates and Ribeiro-Neto (1999), Chapter 2.1 - 2.5.5
- Bush (1945) As we may think, The Atlantic Monthly (selected parts during class)
- Taylor (2004), Chapter 1, Organization of Recorded Information
- Witten, Moffat and Bell (1999), Chapter 3.1 - 3.3 (up to Nonparameterized models) & 3.7
- *Lesk (1997), Chapter 1, Evolution of Libraries
- *Lesk (1997), Chapter 2, Text Access Methods
- *Lesk (1997), Chapter 5, Sections 5.5-5.8, Knowledge Representation Methods
- *Witten, Moffat and Bell (1999), Chapter 4.4 - 4.6 (Query processing)
- *Witten, Moffat and Bell (1999), Chapter 5.1 - 5.2 (Index Construction)
|
|
Week 2: (22 Aug)
| Storing information Multimedia encodings: text
(SGML, Unicode, TEI, XML), page images (CCITT FAX IV), images (JPEG,
PNG, PS), video (MPEG), audio (MP3, WAV), synchronized media (SMIL).
Slides: [ .htm ] [ .pdf ] Self-study module: Huffman Encoding [ .htm ] [ .pdf ]
Readings:
- Lesk (1997), Chapter 3, Images of Pages.
- Lesk (1997), Chapter 4, Multimedia Storage and Access.
- *Bainbridge, Nevill-Manning, Witten, Smith and McNab (1999) Towards a Digital Library of Popular Music.
(Available from the ACM Digital Library or LINC.)
- *Witten, Moffat and Bell (1999), Chapter 6.1 - 6.2, 6.5 (GIF/PNG section only), 6.6 (JPEG section)
- *Witten, Moffat and Bell (1999), Chapter 7, Section 1
- *Witten, Moffat and Bell (1999), Chapter 8.
| - Pick and finalize survey paper area (in class).
- Homework #1 out (Building a digital library with Greenstone)
- Greenstone tutorial
|
Week 3: (29 Aug)
| Classification Traditional classification
schemes (DDC, LCSH, MeSH), metadata types, Dublin core, Warwick
framework.
Slides: [ .htm ] [ .pdf ] Self-study module: WordNet [ .htm ] [ .pdf ]
Readings:
- Lesk (1997), Chapter 5, Sections 5.1-5.3, Knowledge
Representation Methods.
- Taylor (2004), Chapter 4, Encoding Standards.
- Taylor (2004), Chapter 6, Metadata.
- *Marshall, Catherine (1998), Making Metadata: a study of metadata creation for a mixed physical-digital collection. In Proc of Digital Libraries 1998
- *Ipeirotis et al. (2002) Extending SDARTS: Extracting Metadata from Web Databases and Interfacing with the Open Archives Initiative, in Proc. of the Second ACM+IEEE Joint Conference on Digital Libraries (JCDL), 2002.
- *Vellucci, Sherry L. "Metadata." Annual Review of Information
Science and Technology 33 (1998): 187-222. On reserve from the RBR.
|
|
Week 4: (5 Sep)
| DL policy, interoperability and access rights
Identifiers: Open Archives Initiative, OpenURL. DL economics and social policy and issues.
Slides: [ .htm ] [ .pdf ]
Readings:
|
|
Week 5: (12 Sep)
| Bibliometrics and its applications
Laws of bibliometrics, citations and references, PageRank, HITS.
Slides: [ .htm ] [ .pdf ]
Readings:
- For more details on Google in general: Brin and Page (1998) The Anatomy of a Large-Scale Hypertextual Web Search Engine.
- For more details on the HITS algorithm: Jon Kleinberg (1998) Authoritative sources in a hyperlinked environment.
In Proc. Ninth Ann. ACM-SIAM Symp. Discrete Algorithms, pages 668-677, ACM Press, New York.
- Steve Lawrence, C. Lee Giles, Kurt Bollacker (1999) Digital
Libraries and Autonomous Citation Indexing
- ISI's Impact Factor: Essays by Eugene Garfield on citation analysis
(reprinted from Current Contents)
- *Simone Teufel and Marc Moens (2000) What's yours and what's mine:
Determining Intellectual Attribution in Scientific Text. Proceedings
of EMNLP, Hong Kong, Oct 2000.
- *Steve Hitchcock, Arouna Woukeu, Tim Brody, Les Carr, Wendy Hall and Stevan Harnad (2003) Evaluating Citebase: Key Usability Results
|
- Homework #1 due midway through class on 12 Sep, 7:30 pm SGT. You may
submit your CD-ROM during the class break.
|
Week 6: (19 Sep)
| User interfaces for querying and displaying
documents Survey of query (text, Venn, faceted metadata) and
document displays (ranked list, Infocrystal, Table lens, tilebars)
Slides: [ .htm ] [ .pdf ]
Readings:
- Lesk (1997), Chapter 7, Usability and Retrieval Evaluation, Sections 7.1-7.5.
- Hearst, Marti A. (1999) User Interfaces, In
Baeza-Yates and Ribeiro-Neto (eds.), Modern Information Retrieval, 1999.
|
- Survey papers due in IVLE workbin by 19 Sep 11:59 pm SGT.
- Form project teams and schedule a meeting with Min to go over your project proposal.
|
Mid-semester Break (Fri 23 Sep - Fri 30 Sep 2006)
|
Week 7: (Special date: Mon, 2 Oct in TR 3, S16 03-09)
| Information seeking Reference interviews,
Information seeking process, Anomalous state of knowledge.
Slides: [ .htm ] [ .pdf ]
Readings:
|
- Project proposals returned.
- Homework #1 grades returned.
|
Week 8: (10 Oct)
| Usage patterns in the DL and the Web Usage mining. How DLs
and web sites are used, and their relation to information seeking and
HCI.
Slides: [ .htm ] [ .pdf ]
Readings:
|
|
Week 9: (17 Oct)
| Computational Analysis of Genre, Authorship and Duplication
Authorship attribution, Plagiarism detection.
Self-Study on Naive Bayes: [ .htm ] [ .pdf ]
Slides: [ .htm ] [ .pdf ]
Readings:
- Khmelev and Teahan (04) A repetition based measure for
verification of text collections and for text categorization, Proc. of
WWW 2004. [ CiteSeer Link ]
- Karlgren & Cutting (94) Recognizing Text
Genres with Simple Metrics Using Discriminant Analysis, Proc. of
COLING-94. [ CiteSeer Link ]
- *Mosteller & Wallace (63) Inference in an authorship problem, J American Statistical Association 58(3)
- *de Vel, Anderson, Corney & Mohay (01) Mining Email Content for Author Identification Forensics, SIGMOD Record
- *Foster (00) Author Unknown. Owl Books PE1421 Fos
- *Biber (89) A typology of English texts, Linguistics, 27(3)
- *Lee and Myaeng (02) Text genre classification with genre-revealing and subject-revealing features, SIGIR 02
- *Shivakumar & Garcia-Molina (95) SCAM: A copy
detection mechanism for digital documents, Proc. of DL 95
- *Belkouche et al. (04) Plagiarism Detection in Software Designs, ACM Southeast Conference
- *Bilenko and Mooney (03) Adaptive duplicate detection using learnable string similarity measures, Proc. of KDD 03.
- *Ramaswamy et al. (04) Automatic detection of fragments in dynamically generated web pages, Proc. WWW 04.
|
- Survey papers returned.
- Homework #2 out - (Authorship attribution of Amazon.com reviews)
- SVMlight tutorial (immediately following class)
|
Week 10: (31 Oct)
| Social Navigation: Collaborative Filtering
Readings:
- Breese, Heckerman, Kadie (98) Empirical Analysis of Predictive
Algorithms for Collaborative Filtering, Proc. of Uncertainty in AI [ CiteSeer link ]
- *Lam and Riedl (04) Shilling recommender systems for fun and
profit. In Proc. of WWW 2004. [ ACM Portal link ]
- *Wexelblat and Maes (99) Footprints: History-rich tools for information foraging. In Proc. of CHI 1999. [ CiteSeer link ]
- *Resnick et al. (94) GroupLens: An Open Architecture for Collaborative Filtering of Netnews, Internal Research Report, MIT Center for Coordination Science. [ CiteSeer link ]
- *Sarwar et al. (01) Item-based collaborative filtering recommendation algorithms. In Proc. of WWW '01 [ CiteSeer link ]
- *Shardanand and Maes (95) Social Information Filtering: Algorithms for Automating Word of Mouth. In Proc. of CHI '95
- *Smyth et al. (04) Exploiting Query Repetition and Regularity in an Adaptive Community-Based Web Search Engine. User Modeling and User-Adapted Interaction. 14(5)
|
|
Week 11: (7 Nov)
| Library 2.0: New Media
Characteristics and their use, tracking knowledge development and
dissemination in new media: Email, Instant Messaging, Weblogs, Wikis
and Folksonomies.
Slides: [ .htm ] [ .pdf ]
Readings:
- Maness (2006) Library 2.0 Theory: Web 2.0 and Its Implications for Libraries. Webology 3(2) 2006.
- Bellotti et al. (2003) Integrating tools and tasks: Taking email to task: the design and evaluation of a task management centered email tool, Proc. CHI 2003
- Gruhl et al. (2004) Information diffusion through blogspace, Proc. WWW 2004.
- Sen et al. (2006) tagging, communities, vocabulary, evolution. Best paper at CHI 2006.
- *Kleinberg (2003) Bursty and Hierarchical Structure in Streams, Data Mining and Knowledge Discovery, 7(4)
- *Jackson et al. (2003) Understanding email interaction increases organizational productivity, CACM
- *Campbell et al. (2003) Expertise identification using email communications, Proc. CIKM 2003.
- *Golder and Huberman (2006) Usage patterns of collaborative tagging systems. Journal of Information Science, 32(2) 198-203.
|
- Homework #2 due in IVLE workbin by 7 Nov 11:59 pm SGT
|
Week 12: (14 Nov)
| No Class
| Poster presentations in lieu of class. See below.
| |
Reading Week (Fri 12 Nov - Thu 24 Nov 2006)
|
Mon 20 Nov 6-9 pm
| Project Presentations Poster presentations in TR 4, SR 1, TR 5. You will be presenting your poster to me during the SoC Graduate Course Project Poster Session. See http://www.comp.nus.edu.sg/~kanmy/courses/poster_session_sem1_2006/.
|
|
Tue 21 Nov 6:30-8:30 pm @ SR4 (SoC 1 06-12)
| Course Revision
|
|
Final Exam (Tue 28 Nov, evening [SR1 S16 3/F, 7:30-9:30pm])
|