In Homework 5, you will implement two simple programs to evaluate ranked retrieval and classification systems. Specifically, you will create software that can also evaluate your own code (see the Essay section below).
In Homework 1, you completed a URL classifier, which assigns one of three categories to a given URL. We can calculate precision, recall and F1 for performance on these categories individually, and also compile an average precision, recall and F1 over all n categories (in the case of Homework 1, n = 3). You will write code to compute these values given input from three separate files:
urls.correct.txt: the gold standard file, usually created manually using human judgments. This file should contain one line per classification task, giving the correct class (e.g., Arts, Sports, News), followed by the text of the classification task (e.g., the URL).
urls.predict.txt: the predictions file, usually created by an automated system. This file is in exactly the same format as the gold standard file, but will likely contain different classes on some lines, where the system incorrectly predicts a class other than the gold standard.
classes.txt: the classes file, which gives the possible prediction classes. The file should have one class per line (for our URL prediction task, this file should have three lines, each containing one of "Arts", "Sports", and "News"; order doesn't matter).
Your code, eval-c.py, should be invoked as follows:
eval-c.py -g urls.correct.txt -p urls.predict.txt -c classes.txt -o output-statistics.txt
where -g is for the gold standard answers, -p for the predicted answers, -c for the available classes, and -o for the output file. As output, calculate each of the three metrics -- precision, recall, and F1 -- for each individual class, with each metric on a separate line. Output these sets of three lines for each class, in ascending lexicographical order of class name, followed by the average over all n classes. Figures should be given to two decimal places. For our URL classification task, your output file should resemble the following:
Precision of Arts: WW.WW
Recall of Arts: XX.XX
F1 of Arts: XX.XX
Precision of News: YY.YY
Recall of News: XX.XX
F1 of News: XX.XX
Precision of Sports: ZZ.ZZ
Recall of Sports: XX.XX
F1 of Sports: XX.XX
Average Precision: AA.AA
Average Recall: XX.XX
Average F1: XX.XX
Note that there are two possible ways to calculate averages. For this assignment, calculate each average by averaging the individual per-class metric values (e.g., to calculate Average Precision in the above, add together WW.WW + YY.YY + ZZ.ZZ, and divide by three).
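As an illustration of the per-class metrics and the averaging rule described above, here is a minimal Python sketch. It is not a reference solution. The function names (per_class_metrics, macro_average, format_report) are hypothetical, scores are reported as fractions between 0 and 1 (the placeholders above may instead imply percentages), and a metric whose denominator is zero is set to 0.0; all of these are assumptions rather than requirements stated in the assignment.

```python
def per_class_metrics(gold, predicted, classes):
    """Return {class: (precision, recall, f1)} from two parallel label lists."""
    results = {}
    for cls in sorted(classes):
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == cls and p == cls)
        fp = sum(1 for g, p in pairs if g != cls and p == cls)
        fn = sum(1 for g, p in pairs if g == cls and p != cls)
        # Assumption: a metric with a zero denominator is reported as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        results[cls] = (precision, recall, f1)
    return results


def macro_average(results):
    """Average each metric over the classes (sum of per-class values / n)."""
    n = len(results)
    return tuple(sum(m[i] for m in results.values()) / n for i in range(3))


def format_report(results):
    """Build the output lines: three per class, then the three averages."""
    names = ("Precision", "Recall", "F1")
    lines = []
    for cls in sorted(results):
        for name, value in zip(names, results[cls]):
            lines.append(f"{name} of {cls}: {value:.2f}")
    for name, value in zip(names, macro_average(results)):
        lines.append(f"Average {name}: {value:.2f}")
    return lines
```

With gold labels Arts, Arts, News, Sports and predictions Arts, News, News, Sports, this sketch gives Arts a precision of 1.00 and a recall of 0.50, and the Average Precision is (1.00 + 0.50 + 1.00) / 3.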
In a ranked retrieval system, documents are returned in descending order of relevance, whether relevance is computed using a probabilistic, vector space or other model. You will also compute the interpolated precision/recall curve for the documents returned by a retrieval system, given input from two files:
A gold standard file, usually created manually using human judgments. This file contains a set of lines, one for each retrieval task (e.g., a query), and each line lists the document IDs that are relevant, separated by spaces. There will be no extra space at the end of the line. Irrelevant document IDs will not be listed; they are all of the remaining documents that are not listed as relevant.
A predicted results file (for example, your output from HW4), usually created by an automated system. This file is in exactly the same format as the gold standard file, but will likely contain a different set of documents for lines (i.e., queries) where the system incorrectly predicts some false positives or false negatives.
Your code, eval-ir.py, should be invoked as follows:
eval-ir.py -l 1 -g correct-file-of-results -p output-file-of-results -o output-statistics.txt
where -g is for the gold standard answers, -p for the predicted answers, -l for the line number of the query to evaluate, and -o for the output file. As output, calculate each of the three metrics -- interpolated precision, recall, and F1 -- at each rank, each on a separate line, for the retrieval task on line l. Output one set of three lines per rank, starting from rank 1. Figures should be given to two decimal places.
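The command-line handling described above could be sketched with Python's standard argparse module, as below. This is only one possible arrangement. The helper names (parse_args, read_doc_ids) are hypothetical, and the sketch assumes that -l counts lines from 1, which matches the "-l 1" example but is not stated explicitly.

```python
import argparse


def parse_args(argv=None):
    """Parse the -l, -g, -p and -o flags used to invoke eval-ir.py."""
    parser = argparse.ArgumentParser(description="Evaluate one ranked retrieval query.")
    parser.add_argument("-l", dest="line", type=int, required=True,
                        help="line number of the query (assumed to start at 1)")
    parser.add_argument("-g", dest="gold", required=True,
                        help="gold standard file")
    parser.add_argument("-p", dest="predicted", required=True,
                        help="predicted results file")
    parser.add_argument("-o", dest="output", required=True,
                        help="file to write the statistics to")
    return parser.parse_args(argv)


def read_doc_ids(path, line_number):
    """Return the space-separated document IDs found on the given line."""
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if number == line_number:
                return line.split()
    raise ValueError(f"{path} has no line {line_number}")
```

A main routine would call parse_args(), then read_doc_ids() once for the gold standard file and once for the predicted file, using the same line number for both.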
Suppose that, for the first query, our vector space model retrieval system returns five documents. Then your output file should resemble the following five sets of three lines (one set of precision, recall, and F1 for each of the five ranks):
Precision at Rank 1: XX.XX
Recall at Rank 1: XX.XX
F1 at Rank 1: XX.XX
Precision at Rank 2: XX.XX
Recall at Rank 2: XX.XX
F1 at Rank 2: XX.XX
Precision at Rank 3: XX.XX
Recall at Rank 3: XX.XX
F1 at Rank 3: XX.XX
Precision at Rank 4: XX.XX
Recall at Rank 4: XX.XX
F1 at Rank 4: XX.XX
Precision at Rank 5: XX.XX
Recall at Rank 5: XX.XX
F1 at Rank 5: XX.XX
Please note that eval-ir.py evaluates an individual query (the one on the line given by -l), not all of the queries in the file.
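To make the per-rank computation concrete, here is a minimal Python sketch. It uses the common textbook definition of interpolated precision: the interpolated precision at a rank is the highest actual precision found at any rank whose recall is at least as high. Whether this is exactly the definition intended in lecture is an assumption, as is computing F1 from the interpolated precision and reporting scores as fractions between 0 and 1. The function names (ranked_metrics, format_ranked) are hypothetical.

```python
def ranked_metrics(ranked, relevant):
    """Return one (interpolated precision, recall, f1) tuple per rank."""
    relevant = set(relevant)
    precisions, recalls = [], []
    hits = 0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
        precisions.append(hits / rank)
        recalls.append(hits / len(relevant) if relevant else 0.0)

    rows = []
    for recall in recalls:
        # Interpolation: best actual precision at any recall level >= this one.
        interpolated = max(p for p, r in zip(precisions, recalls) if r >= recall)
        total = interpolated + recall
        # Assumption: F1 combines the interpolated precision with recall.
        f1 = 2 * interpolated * recall / total if total else 0.0
        rows.append((interpolated, recall, f1))
    return rows


def format_ranked(rows):
    """Build three output lines per rank, starting from rank 1."""
    lines = []
    for rank, (precision, recall, f1) in enumerate(rows, start=1):
        lines.append(f"Precision at Rank {rank}: {precision:.2f}")
        lines.append(f"Recall at Rank {rank}: {recall:.2f}")
        lines.append(f"F1 at Rank {rank}: {f1:.2f}")
    return lines
```

For a system that returns d1 d2 d3 d4 d5 when d1, d3 and d6 are relevant, the actual precision at rank 2 is 0.50, but the interpolated precision is 1.00, because rank 1 reaches the same recall with a precision of 1.00.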
You are required to submit eval-c.py and eval-ir.py.
You are also asked to answer the following essay questions. These are to test your understanding of the lecture materials. Note that these questions may not have gold standard answers. A short paragraph or two is usually sufficient for each question.
In the eval-c task, we implicitly assume that each class is equally important in calculating the average metrics. If we knew that certain classes are more important than others (e.g., News URLs are very important to not miss), suggest how that could be best reflected in the averaging.
In the eval-ir task, we asked you to calculate interpolated precision, as opposed to actual precision. If we used average actual precision, explain whether your results would change.
The instructions below are repeated for clarity's sake. Instructions different from the previous Homework 4 are highlighted in red.
You are only allowed to do this assignment individually. For us to grade this assignment in a timely manner, we need you to adhere strictly to the following submission guidelines. They will help us grade the assignment in an appropriate manner. You will be penalized if you do not follow these instructions. Your matric number in all of the following statements should not have any spaces and any letters should be in CAPITALS. You are to turn in the following files:
These files will need to be suitably zipped in a single file called submission-<matric number>.zip. Please use a zip archive and not tar.gz, bzip, rar or cab files. Make sure when the archive unzips that all of the necessary files are found in a directory called submission-<matric number>. Upload the resulting zip file to the IVLE workbin by the due date: Monday 9 April 11:59:59 pm SGT. There absolutely will be no extensions to the deadline of this assignment. Read the late policy if you're not sure about grade penalties for lateness.
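The packaging rules above could be automated with Python's standard zipfile module, as sketched below. The function name build_submission and the matric number used in the usage note are hypothetical, and the list of files to include must come from the submission list in this assignment rather than from this sketch.

```python
import zipfile
from pathlib import Path


def build_submission(matric_number, files, out_dir="."):
    """Zip the given files under a submission-<matric number> directory."""
    # Matric number: no spaces, letters in capitals.
    matric = matric_number.replace(" ", "").upper()
    folder = f"submission-{matric}"
    archive = Path(out_dir) / f"{folder}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for file_path in files:
            file_path = Path(file_path)
            # Store each file inside the folder so the archive unzips into it.
            zf.write(file_path, arcname=f"{folder}/{file_path.name}")
    return archive
```

For example, build_submission("a0123456x", ["eval-c.py"]) would write submission-A0123456X.zip, which unzips into a directory called submission-A0123456X.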
The grading criteria for the assignment are tentatively:
Disclaimer: percentage weights may vary without prior notice.
Min-Yen Kan <kanmy@comp.nus.edu.sg> Thu Mar 31 13:37:49 2011 | Version: 1.0 | Last modified: Sat Mar 24 12:39:42 2012