Homework #2 - Factual List Question Answering
In this assignment, you will be developing a question answering system
for list questions. A list question differs from the traditional
factoid question in that there are multiple correct answers. As such,
a question answering system that answers list questions is assessed on
the completeness of the list returned to the user. Like factoid
question answering, the answer returned should be an exact answer --
additional verbiage beyond the exact answer will be penalized.
Your system will be fed correctly spelled, well specified questions
in English. They will be in the format of a question asked on a
single line of input, provided to your program by standard input.
Your program should return a list of relevant answers on standard
output. No difference in score will be assessed to answers on
different lines; all answers are judged equally important.
For example, given a list question such as "List the public
universities in Singapore", the (current) answers should be "National
University of Singapore and Nanyang Technological University" (SMU is
a private university, but some of its funding comes from public
coffers). Each result should be on a separate line, written in UTF-8,
and separated by a carriage return. If there are no valid answers to
the question, a single line response with the word "nil" should be
returned. No question will have more than fifty correct answers, and
all questions that do not have a specific timeframe indicated will
refer to the present answer (vs. historical).
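The input/output contract above can be sketched as follows. This is only a minimal illustration, not a reference implementation: `find_answers` is a hypothetical placeholder for your own retrieval logic, the "carriage return" separator is assumed to mean an ordinary newline, and the cap of fifty answers is a design choice derived from the stated upper bound.

```python
MAX_ANSWERS = 50  # the spec says no question has more than fifty correct answers


def find_answers(question):
    """Hypothetical placeholder: a real system would retrieve and extract answers."""
    return []


def format_answers(answers):
    """Return the output text: one answer per line, or "nil" if there are none."""
    seen = set()
    unique = []
    for answer in answers:
        answer = answer.strip()
        # Skip blank entries and exact duplicates, keeping first-seen order.
        if answer and answer not in seen:
            seen.add(answer)
            unique.append(answer)
    if not unique:
        return "nil\n"
    return "".join(answer + "\n" for answer in unique[:MAX_ANSWERS])


def run(in_stream, out_stream, answer_fn=find_answers):
    """Read one question (a single line) and write the formatted answers."""
    question = in_stream.readline().strip()
    out_stream.write(format_answers(answer_fn(question)))
```

A real entry point would call `run(sys.stdin, sys.stdout)`. To make sure the output is UTF-8 regardless of the locale, one option (Python 3.7 or later) is to call `sys.stdout.reconfigure(encoding="utf-8")` first.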
To assess your list QA system, we'll again be testing it with some
training and test questions. Below are five list questions for which
your system should be able to return a correct result; they are
provided for training and development of your system.
- List the lines in the Hong Kong MTR system: Tsuen Wan, Kwun
Tong, Island, Tung Chung, Airport Express, Tseung Kwan O, Disneyland
Resort.
- List the member countries of the Association of Southeast Asian
Nations (ASEAN): Brunei Darussalam, Cambodia, Indonesia, Laos, Malaysia,
Myanmar, Philippines, Singapore, Thailand, Vietnam
- List the classic ice cream flavors that are produced by
Häagen-Dazs: Baileys® Irish cream, Banana split, Black walnut,
Butter pecan, Caramel cone, Cherry vanilla, Chocolate, Chocolate
chip cookie dough, Chocolate chocolate chip, Chocolate peanut
butter, Cinnamon dulce de leche, Coffee, Cookies & cream,
Crème Brulée, Dulce de leche, Mango, Mint chip, Mocha
chip, Pineapple coconut, Pistachio, Rocky road, Rum raisin, Sticky
toffee pudding, Strawberry, Vanilla, Vanilla bean, Vanilla chocolate
chip, Vanilla swiss almond, White chocolate raspberry truffle
- List the world's rivers that are over 8000 kilometers (km)
long: nil
- List the Exchange Traded Funds (ETFs) that are listed on the
Singapore Exchange (SGX) that are not U.S. cross-listed: ABF
Singapore Bond Index Fund, CIMB FTSE ASEAN40 ETF, Daiwa FTSE Shariah
Japan 100, iShares MSCI India ETF, Lyxor ETF China Enterprise
(HSCEI), Lyxor ETF Commodities CRB, Lyxor ETF Hong Kong (HSI), Lyxor
ETF India (S&P CNX Nifty), Lyxor ETF Japan (TOPIX©), Lyxor
ETF MSCI AC Asia-Pacific Ex Japan, Lyxor ETF Korea, Lyxor ETF
Taiwan, SPDR® GOLD SHARES, streetTRACKS® Strait Times Index
Fund
Aside from these five questions, there will be two additional
sources of questions. Each homework submission (individual or group)
will need to come up with a single list question and its correct
answer by Week 9 (15 Oct) of the assignment and will post it to the
forum. This is counted as a deliverable for your assignment, and will
be graded. Together with the above five questions, the answers to
these questions will form the known set of list questions that your
system will be graded on. (Update 4 Nov: An updated zip file of the queries is now available: hw2-needs-v2.0.zip)
An additional five hidden (test) questions will be revealed after
the assignment is due. Your system will also be assessed on these test
questions. The hidden questions will be slightly higher in assessment
weight than the training questions. Minor typographical differences
(capitalization, diacritics) as well as misspellings and variations on
names will be counted as correct. All answers should be in English
where possible (e.g., Question 1 above also has corresponding Chinese
answers).
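If you want to compare your own output against the known answers during development, a normalization step along the following lines may help. It is only a sketch under stated assumptions: the function name `normalize_answer` is made up for illustration, it handles capitalization, diacritics and spacing only (not misspellings or name variants), and it is not the procedure used for the actual grading.

```python
import unicodedata


def normalize_answer(text):
    """Casefold, strip diacritics and collapse whitespace for loose comparison."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, which leaves the unaccented base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Casefold for case-insensitive matching and collapse runs of whitespace.
    return " ".join(stripped.casefold().split())
```

With this, `normalize_answer("Crème Brulée")` and `normalize_answer("creme brulee")` compare equal.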
You can again work in teams of two or individually for this
assignment. There will be no adjustment to scores in factoring for
whether the assignment is done in a team or individually. However, if
you worked on Homework #1 as a group, you will not be allowed to do
this assignment in a group and must do it individually; the group
option is only open to those who did Homework #1 as individuals.
Note that since this is an assignment that comprises at least 25%
of your grade, I expect the level of effort for this assignment to be
similar. You have five weeks to do this assignment. The list
questions will all be numbered and be made available as a zip archive,
following the submission and verification of all list questions by
Week 10.
Restrictions: You are allowed to use any
resources on the web that are not themselves list or factoid question
answering systems. For example, submitting the questions to Ask.com's
question answering service is not allowed. Also, retrieving and
analyzing this specific homework page (which has the answers to the
five training questions), is not allowed -- you may have to write code
specifically to discard URLs / web resources that come from
http://www.comp.nus.edu.sg.
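One possible way to discard such URLs is sketched below. It assumes absolute URLs (with a scheme such as http://), and it blocks the whole comp.nus.edu.sg domain including subdomains, which is stricter than the single host named above; the names `BLOCKED_DOMAIN`, `is_allowed` and `filter_urls` are illustrative only.

```python
from urllib.parse import urlsplit

# Assumption: block the host named in the assignment and any of its subdomains.
BLOCKED_DOMAIN = "comp.nus.edu.sg"


def is_allowed(url):
    """Return False for URLs hosted on the blocked domain or its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return not (host == BLOCKED_DOMAIN or host.endswith("." + BLOCKED_DOMAIN))


def filter_urls(urls):
    """Keep only the URLs that may be used as answer sources."""
    return [url for url in urls if is_allowed(url)]
```

Applying `filter_urls` to the result list of a search engine before downloading anything keeps the restricted pages out of the rest of the pipeline.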
What to turn in
You will upload an HT0000000.zip (where HT0000000 is your matric
ID, where all letters are in uppercase) archive by the due date,
consisting of the following four sets of items. Please use a ZIP (not
RAR, BZ2 or TAR) utility to construct your submission. Do not include
any subdirectories in the submission to extract to (e.g., unzipping
X.zip should give files like X.sum, not X/X.sum or
submission/X.sum). Please use all capital letters when writing your
matric number (matric numbers should start with U, NT, HT or HD for
all students in this class). Your cooperation with the submission
format will allow me to grade the assignment in a timely manner. Note
that I do not want to know who you are, with respect to grading
assignments, so it is important that you try not to reveal your
identity in your submission. Please follow the below instructions to
the letter.
- A summary file in plain text (not MS Word, not OpenOffice),
that describes your submission and the architecture for retrieval.
You should include your matric number and your NUS (u|g) prefixed
email address as the only form of ID. In this file you also need to
describe how your source code can be built and executed on
sf3/sunfire. If your submission cannot be run on sunfire, you'll
need to demonstrate it to me, sometime soon after the submission
date (by downloading your submission file and running it on your
system). You should include notes about the development of your
submission, and special features that you developed to handle the
structure of the queries and documents (filename:
ReadmeHT0000000.txt, where HT0000000 is your matric ID).
Warning! If you use any lexicons, resources, code or
algorithmic description that are beyond the references on this page,
you need to give proper credit and acknowledge the contribution of
others. Please cite or acknowledge work that helped you that you
did not do on your own. I will deduct the credit accordingly, if
applicable. Failure to acknowledge your sources constitutes
plagiarism and will be punished accordingly.
- The files for the retrieval results for all public queries.
These should be in a similar form to the gold-standard files; the
list question ID on the first line and the answers on the following
lines. Each answer line should have the exact answer, followed by a
tab (\t) character, followed by a URL where the answer was extracted
from. These files should be named nX.txt, where X should be
replaced by the list question ID. A sample file is here.
- Your source code tree. This should be relatively well
documented so that I can follow the logic of your code, with the
help of the ReadmeHT0000000.txt file. Typing in "make"
or "ant" should build the appropriate code, such as an executable,
if needed. In your assignment submission, please do not assume that
any environment variables (e.g., PATH and CLASSPATH) are necessarily
correctly set. The executable file to run your system should be
named runHT0000000 (where HT0000000 is to be replaced
by your matric number, as above) and be set as executable (by you or
by your buildfile if it is compiled).
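The result file layout described in the second item above (question ID on the first line, then one "answer, tab, URL" line per answer) could be produced roughly as follows. This is a sketch based only on the prose description, since the sample file is not reproduced here; the helper names are made up, and how a "nil" result should appear in these files is not specified, so it is not handled.

```python
from pathlib import Path


def result_filename(question_id):
    """Build the nX.txt file name for a list question ID."""
    return f"n{question_id}.txt"


def format_result_file(question_id, answers):
    """answers is an iterable of (exact_answer, source_url) pairs."""
    lines = [str(question_id)]
    for answer, url in answers:
        # Exact answer, a tab character, then the URL it was extracted from.
        lines.append(f"{answer}\t{url}")
    return "\n".join(lines) + "\n"


def write_result_file(directory, question_id, answers):
    """Write the result file as UTF-8 and return its path."""
    path = Path(directory) / result_filename(question_id)
    path.write_text(format_result_file(question_id, answers), encoding="utf-8")
    return path
```

Keeping the formatting separate from the file writing makes it easy to check the layout without touching the disk.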
Grading scheme
Your grade will take into account 1) features used, 2) retrieval
accuracy, 3) peer annotation, 4) documentation and 5) time efficiency.
These factors are listed in order of importance/weighting to your
final grade for the assignment. Warning -- I will be reading your
code, so please make sure it is tidy and well documented.
- [41 percent] Features used. This will be judged on the basis
of your code and your summary file: which features you use,
whether you take advantage of the semi-structure in the input,
and how you modify the ranking score to get the final results.
- [37 percent] Retrieval accuracy. This will be judged based on
the pooled list judgments that all students turn in (the
nX-answers.txt files in your submission). I will
also include some additional test queries that you will not
know ahead of time. I'll be using the average instance
precision and instance recall metrics, as used in the TREC 2004
list question evaluation, except that a "nil" answer will be
scored with an IP/IR score of 1 if and only if the system
returns "nil" as its only answer. Note that minor changes
to the answer (case differences, missing diacritics, etc.) that
do not change the semantics of the answer will still be
counted as correct.
- [7 percent] List Question and Answer. To judge #2 (retrieval
accuracy) I will be looking at your submitted question and its
answer, for completeness, lack of ambiguity and possible
judgment. As discussed in class, questions such as "Where is
the Taj Mahal?" would be considered a poor question (since
several answers are possible). Your list question should
include any expansions of acronyms and should have at most one
scoping clause (e.g., in question 5, "that are not
U.S. cross-listed"). Note: You may decide to ask a question
that generates a nil answer -- you are not obligated to have a
question that has answers.
- [13 percent] Documentation. How well the summary file and
source code are documented. This will include how easy it is
for me to run your software and the state of your code (is it
readable, and is the workflow well partitioned?).
- [2 percent] Time efficiency of the system. As long as the
system takes no longer than 5 minutes to produce the results
for a question, it will be considered satisfactory. Again, the
purpose of this is to ensure that your system can be run and
graded within three weeks.
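For self-evaluation on the training questions, the retrieval accuracy metric above can be approximated as in the sketch below. It assumes the usual TREC 2004 definitions (instance precision = distinct correct answers returned / total answers returned; instance recall = distinct correct answers returned / number of known answers), matches answers case-insensitively only, and assumes that a failed "nil" question scores 0 on both measures. It is not the grading script.

```python
def instance_scores(returned, gold):
    """Return (instance precision, instance recall) for one list question."""
    returned_keys = [a.strip().casefold() for a in returned if a.strip()]
    gold_keys = {a.strip().casefold() for a in gold if a.strip()}

    # A question with no valid answers: the gold list is empty or just "nil".
    if not gold_keys or gold_keys == {"nil"}:
        # Full credit only when "nil" is the one and only returned answer.
        return (1.0, 1.0) if returned_keys == ["nil"] else (0.0, 0.0)

    if not returned_keys:
        return (0.0, 0.0)

    distinct_correct = len(set(returned_keys) & gold_keys)
    precision = distinct_correct / len(returned_keys)
    recall = distinct_correct / len(gold_keys)
    return (precision, recall)


def average_scores(per_question):
    """Average a list of (precision, recall) pairs over all questions."""
    if not per_question:
        return (0.0, 0.0)
    count = len(per_question)
    return (sum(p for p, _ in per_question) / count,
            sum(r for _, r in per_question) / count)
```

For example, returning "Tsuen Wan", "Island" and one wrong line for the MTR question gives a precision of 2/3 and a recall of 2/7 under these assumptions.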
Due date and late policy
According to the syllabus, this homework is due on 5 Nov at 11:59
pm SGT. Submit your zip file to the IVLE workbin by
this time. The late policy for submissions applies as per the policy
set forth on the "Grading" page.
References
- The BOSS homepage. Probably not as useful as the forum or the PDF documentation.
- wget - an open-source command-line URL fetching utility. Also already installed on sunfire. Recommended for interacting with BOSS.
- A slightly outdated list of QA system components and papers by
Nimar S. Arora, of the UC Berkeley TREC group.
- You may want to read Ellen Voorhees' paper, Implementing
a Question Answering Evaluation which touches on question
formulation, before deciding on your training question for Week
9.
- Hui Yang, a former MS student here, worked quite extensively on
list questions. You may want to read her techniques in finding
list answers from her publications.
- You'll find that a number of sites on the web contain lots of
factual information that can be mined (see the note below about
research systems too). Some of these sites may be useful to
you. If you find others, please list them in the forum.
- General encyclopedia, events - Wikipedia
- Geographic facts - CIA Factbook
- Famous People - Biography.com
- FAQs - Yahoo! Answers, Google Knol, phpBB sites
- Song lyrics, Famous quotations, etc.
Hints
- You can partially leverage the knowledge and the system that
you built previously in Homework #1. In particular, you may
harness Yahoo! BOSS again, as was done in Homework #1.
- You can use external sources in RPNLPIR (such as lexica like
WordNet or statistics like IDF statistics over the WebBase
corpus) to assist your programs. If you do plan to use
external resources, please be aware that they take time to
compile and preprocess into a useable form for you to take
advantage of.
- You may note many research systems (including ones created here
by Prof. Chua Tat-Seng's group) mine and use resources on the
web. You may want to look into integrating these with your
homework submission. This is at your discretion however, as
for a short homework assignment such as this one, you may find
it better to concentrate on algorithm design, rather than
compiling resources. To create a simple version of mining
resources, consider using the "site:" query restrictor in
search engine queries.
- You may find that downloading the documents yourself and
processing them is helpful. If you do download documents, note
that, given the five-minute limit for each query, you should make
sure that your program doesn't hang when faced with a
recalcitrant page download.
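The last two hints can be sketched together as follows: building a "site:"-restricted query string, and downloading pages with a per-request timeout inside an overall time budget. All function names and the default timeout are illustrative assumptions. Note that the `timeout` passed to `urlopen` bounds individual blocking socket operations rather than the whole download, and the budget is only checked between downloads, so treat this as a starting point rather than a guarantee.

```python
import http.client
import time
import urllib.request


def site_query(question, site):
    """Restrict a search engine query to one site with the "site:" operator."""
    return f"{question} site:{site}"


def fetch_page(url, timeout_seconds=10.0):
    """Download a page, returning None instead of raising or hanging on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.read()
    except (OSError, ValueError, http.client.HTTPException):
        # Covers timeouts, DNS/connection/HTTP errors and malformed URLs.
        return None


def fetch_within_budget(urls, budget_seconds, fetch=fetch_page, clock=time.monotonic):
    """Fetch pages in order until the overall time budget is used up."""
    deadline = clock() + budget_seconds
    pages = {}
    for url in urls:
        if clock() >= deadline:
            break  # out of time: skip the remaining downloads
        page = fetch(url)
        if page is not None:
            pages[url] = page
    return pages
```

Choosing a budget well below five minutes leaves time for answer extraction after the downloads finish.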
Min-Yen Kan <kanmy@comp.nus.edu.sg>
Created on: Mon Sep 29 22:58:43 SGT 2008 | Version: 1.0 | Last modified: Tue Nov 4 09:26:31 2008