Redundant Email Elimination
Participants: LAN Jiang, NGUYEN The Huy, VO Phan Chuong, WU Xiandan,
Raymond KWOK, Limsoon WONG.
Objective
With the advent of the Internet, more and more people are communicating
in electronic form, such as via email messages, bulletin board systems,
and USENET groups. With this increased use of electronic communication,
the amount of redundant messaging has also grown.
Not only is it irritating for a person to find multiple repeated
messages in their email folder, but it is also time-consuming for
the person to read through all the messages and separate the relevant
information from the redundant messages.
Thus we would like to develop an efficient and effective
method and system to identify and eliminate such redundant messages.
History
The idea for this project was described by Raymond Kwok to Limsoon
Wong in January 2000. Several possible embodiments of this idea
were then sketched by Limsoon in April 2000. Patents were then applied
for in October 2000, and granted in April 2005 in Europe and May 2005 in
Singapore.
Four different prototypes were then built over two semesters in
2006/7 by Lan Jiang, Nguyen The Huy, Vo Phan Chuong, and Wu Xiandan
as their honours-year projects in NUS School of Computing. Comments
on these prototypes are found below.
The "Best Redundant Email Elimination Project" award was won by Wu
Xiandan, for having built the prototype with (a) the highest degree of
robustness, sensitivity, and precision; and (b) a very decent GUI; as
well as demonstrating the best professional conduct during the project.
Publications
- Chong-See Kwok, Limsoon Wong.
A method for eliminating redundant email messages.
European Patent No. 1327192, 20 April 2005.
PDF
- Chong-See Kwok, Limsoon Wong.
A method for eliminating redundant email messages.
Singapore Patent No. 95931 [WO 00/33981], 31 May 2005.
PDF
Prototypes
- Wu Xiandan's prototype and report
Wu's approach is closest to that described in the
patents, which is based on sequence alignment.
But he has added a number of practical innovations,
including robust interfaces to email systems, an excellent GUI,
handling of attachments, as well as algorithmic improvements
for efficiency and for increased sensitivity and precision.
However, his prototype runs somewhat slower than the others.
- Vo Phan Chuong's prototype and report
Vo's approach is also very close to that described
in the patents. However, the prototype and testing are not
as extensive as Wu's.
- Nguyen The Huy's prototype and report
Nguyen's approach is quite different from that described
in the patents. He uses a "fingerprint" method to identify
related emails that can make each other redundant. His prototype
runs very fast and works well on many email messages. However,
it produces more false negatives on short messages and more
false positives on long messages. His prototype is text-based.
- Lan Jiang's prototype, poster, and report
Lan's approach is also quite different from that described
in the patents. He uses subsequence matching, which is a lot
faster than the string-alignment-based algorithm in the patents.
This works well in typical cases, but has more false positives
on short messages. Lan has also built a nice GUI and a fully
functional email system.
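
To make the "fingerprint" idea mentioned above more concrete, here is a minimal Python sketch. It is not taken from any of the prototypes or from the patents. It assumes that a message counts as redundant when almost all of its text reappears in another message (for example, quoted in a reply), and it uses overlapping three-word chunks as fingerprints. The function names, the chunk size k, and the 0.9 threshold are all hypothetical choices made for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word chunks ("fingerprints") of a message.

    Assumption: words are lowercased and punctuation such as the ">"
    quote marker is ignored, so quoted text matches the original.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return set()
    if len(words) < k:
        # Very short message: treat the whole text as one fingerprint.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(inner, outer, k=3):
    """Fraction of inner's fingerprints that also occur in outer."""
    a = shingles(inner, k)
    b = shingles(outer, k)
    if not a:
        return 0.0
    return len(a & b) / len(a)


def redundant_indices(messages, threshold=0.9, k=3):
    """Return sorted indices of messages largely contained in another one.

    If two messages contain each other (near duplicates), the earlier
    one is kept and the later one is reported as redundant.
    """
    redundant = set()
    for i, msg in enumerate(messages):
        for j, other in enumerate(messages):
            if i == j:
                continue
            if containment(msg, other, k) < threshold:
                continue
            mutual = containment(other, msg, k) >= threshold
            if not mutual or j < i:
                redundant.add(i)
                break
    return sorted(redundant)


if __name__ == "__main__":
    inbox = [
        "The meeting is moved to Friday at 3pm in room 204.",
        "Thanks, noted.\n> The meeting is moved to Friday at 3pm in room 204.",
        "Lunch order is due by noon tomorrow.",
    ]
    print(redundant_indices(inbox))
```

In this sketch the first message is reported as redundant because the reply quotes it in full, while the reply itself is kept. A message with only a few words yields very few fingerprints, so its score can swing sharply on a single differing word; the threshold and chunk size would need tuning on real mail.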
Last updated: 3/5/07, Limsoon Wong.