(Last updated on: Sat Nov 20 13:39:25 GMT-8 2004)
Current Project Status
N.B.: This course is finished. I am maintaining
this website for the benefit of visitors. The projects below were done by
the students in Semester I of 2004. Students were required to make a
poster presentation; the slides listed below are the ones they used.
Their final project submission was a paper in the form of a normal
conference submission (8-10 page limit). If you have any questions about a
project or would like to get hold of its final report, please
email the appropriate student(s). You can also find projects from the earlier version of this course, run in Semester I of 2003.
- Lee Sue-Yin, Melvin Yap, Tan Hoon Hoon: Music retrieval using lyrics
- Neo Shi Yong, Low Jin Kiat: Multi-lingual Retrieval in Digital Video Library Using ASR with Speaker Verification
- Ho Van Phong: Analyzing Structural Similarity in XML Documents
- Eileen Khoo: Digital copyright in mobile phones
- Chia Hoo Hon: Conceptual Design of Electronic Internet News Automatic Capturing System for Digital Libraries
- Chen Chao, Cheng Weiwei, Jiang Zheng Ping: Applying Semantic Parsing to Question Answering System
- Feng Chun, Zhang Li: Warped Image Restoration with Applications to Digital Libraries
- Woo Wei Leng, Lee Chie Ping: InfoGate: Critique and improvements of the User Interface
- Kwan Weng Wah: Mobile Platform Digital Library Limitation without Today's Hardware and Software Constraint
- Li Qiang: Summarizing Discussion Threads
- Jon Tan Hiang Chuan, Wong Kok Hoong, Zhang Xia: Agent-based Recommender System on the Semantic Web
- Chen Ding, He Cong, Zhan Jiaming: Spelling Correction for Online Public Access Catalog Records in Library Integrated Catalogue (LINC)
- Fan Peck Ling, Ong Yi Jie Paulynn: Digital Library Social Policy: Open Access A New Proposed Business Model for Open Access
- Wang Jiren: Bioinformatics Data Integration Using Web Services
- Steven Halim: MSNMine: A Tool for Supporting Instant Messaging Research via Chat Logs
- Chong Kian Ming, Lim Choon Wu Gary, Ong Yong Wah Noel: Digital Divide in Singapore: Current State and Future Steps
Information about the joint poster presentation and a complete
listing of projects can be found
here.
The project write-up, including the slides that you used for your
presentation, should be uploaded to the IVLE workbin by 11:59:59 pm
(IVLE time) on Friday, 19 November 2004. Please send me the updated
title of your project as you finalize your project area.
What is the project?
Projects will be done individually or in groups of two or three.
Note that the grading criteria will be the same regardless of team
size; individuals and teams of two are often
better coordinated than teams of three, especially in short projects.
A good research project must (i) define a problem, (ii) propose a
solution, (iii) implement the solution (simulated or real) and (iv)
evaluate it against any applicable existing solutions or related work.
Your research project can take one of the following
manifestations:
- New research problem/solution - You define a new, interesting
problem and propose a solution. Your solution does not have to be very
good, since you are pioneering a new area of research.
- Existing research problem/new solution - You look at an
existing, interesting problem and propose a novel solution that
is better than existing solutions, which can lead to new ways of
looking at or understanding the problem. Your solution doesn't have to
outperform existing methods in all categories, but it should in at
least some particular domain. For example, we are concerned with digital
libraries in this course. It will suffice if your solution is
statistically significantly better on typical documents in digital
libraries, even if it is not better in the more general case.
- Existing research problem/compare existing solutions - You look
at an existing problem and its solutions. Implement the solutions,
compare them and provide new insights into why one solution is better
than another. Provide public-domain software so that others can share
and use your work.
- Build an innovative system - Build a novel application that no
one, or few, have built before. But most importantly, identify new
issues in your system that no existing solutions can adequately
solve.
- Empirical analysis of some collected data - Researchers often
need to build systems that actually solve or improve on real problems.
Papers that analyze the usability of systems or characterize the data
in some way help others understand the problem or the clientele
(our users) for a particular problem.
Remember, good research always teaches other researchers something
new.
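As a rough illustration of the "statistically significantly better" criterion mentioned above, the following is a minimal sketch of a paired randomization (sign-flip) test over per-query scores. It is only one of several reasonable tests, not a required method for this course. The function name, the number of trials and the toy scores are all assumptions made up for the example.

```python
import random

def paired_randomization_test(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided paired randomization (sign-flip) test.

    scores_a and scores_b hold per-query scores of two systems on the
    same queries. Returns an estimated p-value for the null hypothesis
    that the two systems perform equally well.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be paired")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    extreme = 0
    for _ in range(trials):
        # Under the null hypothesis the sign of each difference is arbitrary.
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total) >= observed:
            extreme += 1
    # Add-one smoothing keeps the estimate away from exactly zero.
    return (extreme + 1) / (trials + 1)

# Toy, made-up per-query scores (hypothetical, for illustration only).
new_system = [0.75] * 12
baseline = [0.25] * 12
p_value = paired_randomization_test(new_system, baseline)
```

A small p-value (conventionally below 0.05) suggests the difference is unlikely to be due to chance on this set of queries. It says nothing about queries or collections outside the sample.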
I do not expect you to write any code from scratch. In fact, if
you have an account on sf3/sunfire, you can access a host of related
software that I use in my research, in the NLP/IR software
repository. Feel free to suggest to me other resources that you
feel would be useful to have installed and available to the class.
Also, please contact me if your quota of disk space is not sufficient
for you to do the scale of research that you need.
A few highlighted resources in the NLP/IR software repository that
can help you do research for this course are:
- NUS SMS corpus - a corpus of SMS messages collected from
students here.
- NUS / Excite query logs - one day of logs from (the old) Excite
search engine, and ten months of queries for the NUS LINC system
(parsed and grouped by sessions).
- WT10G - a 10 gigabyte collection of web documents, used for
standard experiments.
- WEKA / SVMlight / Boostexter / etc. - easy-to-use machine
learning utilities. Probably the easiest to use is Boostexter,
and the most complete one is WEKA. WEKA includes a number of
different machine learners so that you can do a comparative
analysis of different machine learning algorithms on your data.
- Open directory project RDF dump - the data structure and
content of the ODP, a Yahoo!-like repository of categorized
websites.
- WebBase statistics - document frequencies for a large portion
of tokens that appear on the web. Can be used for TF*IDF
calculations among other things.
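As a minimal sketch of how document frequencies such as the WebBase statistics could feed a TF*IDF calculation: the code below uses one common variant of the weighting (raw term frequency times the natural log of N/df). The dictionary, the collection size and the function name are hypothetical, and nothing here assumes the actual file format of the WebBase statistics in the repository.

```python
import math
from collections import Counter

def tf_idf(tokens, doc_freq, num_docs):
    """Return {term: TF*IDF weight} for one tokenized document.

    doc_freq maps a term to the number of documents that contain it.
    num_docs is the size of the collection the frequencies came from.
    """
    counts = Counter(tokens)
    weights = {}
    for term, tf in counts.items():
        # Assumption: terms missing from the table are treated as
        # appearing in a single document.
        df = doc_freq.get(term, 1)
        weights[term] = tf * math.log(num_docs / df)
    return weights

# Toy, made-up statistics purely for illustration.
toy_doc_freq = {"the": 1000, "digital": 100, "library": 50}
toy_num_docs = 1000

weights = tf_idf("the digital library the library".split(),
                 toy_doc_freq, toy_num_docs)
```

With these toy numbers, "the" appears in every document and so gets a weight of zero, while the rarer "library" outweighs "digital".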
Choosing a project
Below you will find a list of possible final projects. As this is
a seminar-style research course, you will be primarily assessed on the work
you do on the final project. As such, I expect and demand that each
student/team of students achieve some novel research development or
finding that is not a rehashing of the existing literature. The
midterm survey paper is intended to foster this understanding and
encourage you to poke into new territories.
You are welcome and encouraged to propose alternate
projects. Your topic should blend together your strengths from your
background, experience and current coursework, yet be applicable to
digital libraries research. I have listed some ideas for projects in
certain areas. Teams that have taken projects that interest them
and/or have relevance to their research or jobs seem to always do
best. Some of the possible projects include (but are
not limited to):
- Social Network Analysis
- Building a better citation parser
- Web hyperlink classification
- Exploring the relationships between prestige, authorities and hubs
- Centrality and density of different genres of websites
- Automatic computation of an area's journal and conference reputations
- Access and Usability Issues
- Multi-object summarization
- The use of VR and immersive environments in the DL
- Efficient social network visualization
- Critique of current approaches in crosswalking of metadata
- Novel querying tools for E-mail, blogs, and IM
- Organizing photo and video content
- User modeling
- Classifying browsing and searching strategies based on
information trails
- Differences in retrieval effectiveness in speech queries
as opposed to text/typed queries
- Conceptual Search / Polysemy and synonymy
- Query expansion and restriction from user query logs
- Characterizing known item queries
- Automatic jargon and terminology canonicalization
- Classification and Filtering
- Automatic ACM classification for theses and technical reports
- Home page interest networking
- Automatic ODP categorization for web sites
- Threading and summarizing blog, email or IM searches
- Digital Library Creation
- GIS: Integration of maps at different scales
- Inferring useful metadata for genres of web documents
- Dateline and timeline history collection and canonicalization
- Digital Library Cataloging and Indexing
- Multimedia Metadata Features
- Digital Library Policy
- Exploring the integrity of skywriting/skyreading and its
effect on scholarship.
- Cost models for the digital library in specialized domains/forms of media
- Convenience, user rights and usability of linkages in
the digital library
- Authorship Analysis
- Styles and Genres for authorship identification in web pages
- Linkage styles and classification for webpage creators
- Linking SMS and chat log short forms to long forms
I have listed some starting references for some of these
topics. You may find it helpful to view past projects by previous
students of mine in a similar course, Special Topics in
Computer Science.
Project write-up, presentation and grading
Here are some slides on how to do your project proposal.
[ .pdf ] [ .htm ]
One of the skills that you should practice in a project-based
graduate class is how to report your work. Expert researchers will
tell you that half (if not most) of your time on a project will
involve polishing your paper so it is easy to read and
straightforward. Generally, filling up the page limit is easy, but
deciding what to omit and how to succinctly express your idea is
difficult.
Your team's write-up will take the form of a research paper intended
for a conference submission with a 10-page limit. You should use an
ACM proceedings style (you can follow the instructions for WWW 2004,
for example). You may supplement this with a reference to your
project's website / blog (if one was created) and any number of
appendices that you feel will help determine a grade. Authors of
selected final projects will be asked to submit their work to a
relevant conference or journal, such as the ones listed on the
miscellaneous page of this site.
We will not have class during the last class session. In lieu of
class, we will meet for three hours on the following Saturday. This
meeting will be part of a joint poster evaluation session (for this
course and a few other graduate courses that have a research focus).
The session will be broken up into three 55-minute periods. Class
projects will be assigned to Group 1, 2 or 3 (presenting in the first,
second or third 55-minute period, respectively). The presentation will
be in a poster style (with demos, as appropriate). If you are not
presenting in a period, walk around the class and learn about your
peers' projects.
Students from other classes will be coming in to review and learn
about your project, so be ready to answer their questions as well.
Similarly, you should take this opportunity to network and get to know
your fellow students in other classes and their projects. SoC is a
friendly place after all, so please do yourself and others a favor and
discuss your projects.
Grading for the project's final report and presentation is likely
to follow weights similar to those used in
the previous version of this course.
Final Workload Disclaimer
The project is the primary method by which you will be assessed
in this course. The workload throughout the rest of the course is
purposely light to ensure that you have enough time to produce
high-quality research in the project. As such you need to budget your
team's time wisely and ensure that you have appropriately scoped your
project and covered the topic with enough detail and with appropriate
evaluation. Part-time students with other commitments need to be
particularly aware of this, as past cases have shown this problem
crops up with part-time students most often.
Some students inevitably start the project too late or mismanage
their time and neglect such open-ended courses, in order to advance in
classes that have more concrete assessment milestones. I warn you now
to budget your time between classes wisely. As this is a four-MC
module, a student should allot ten hours per week to
this course. Eight of these are preparation time, and for this course
the bulk of this time is intended for your project. Roughly speaking,
you should invest about 7 weeks * 6 hours/week = 42 hours on your
project.
Min-Yen Kan <kanmy@comp.nus.edu.sg>
Created on: Mon Dec 1 19:36:22 2003 | Version: 1.0 | Last modified: Mon Nov 29 14:22:48 2004