My group does double duty: in addition to performing research and publishing, we also, as mentioned earlier, release practical implementations of our systems that solve problems faced by many researchers. I include emails below that testify to the importance of our research, both for academics (in Computer Science and other disciplines) and for industry.
This is a recent email that I received about our keyphrase dataset, which had been released only the year before (2007). I get emails like these periodically; this one was still in my inbox as I had not yet finished with the deliverables that the writer asked for.
from Torsten Zesch <zesch@tk.informatik.tu-darmstadt.de>
to kanmy@comp.nus.edu.sg
date Thu, Aug 7, 2008 at 7:43 PM
subject Keyphrase extraction dataset
Dear Min-Yen Kan,
I have read your excellent paper on "Keyphrase Extraction in Scientific Publications", and I really like your approach. Your keyphrase extraction dataset could be of great benefit to my experiments.

I would like to ask whether the dataset is also available as a single compressed file, since downloading every single file via the web interface would take a while.
Thanks in advance,
Torsten Zesch
--
-------------------------------------------------------------------
Torsten Zesch
Doctoral Researcher
Ubiquitous Knowledge Processing Lab
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone [+49] (0)6151 16-7433, fax -5455, room S2/02/E226
zesch@tk.informatik.tu-darmstadt.de
Here’s another example, in which the discussion focuses on another tool, ParsCit, which was released just earlier this year as joint work with the IST folks at PSU.
from Matteo Romanello <matteo.romanello@gmail.com>
to kanmy@comp.nus.edu.sg
date Wed, Jul 9, 2008 at 9:17 PM
subject [ParsCit]
Dear Min-Yen Kan,
I discovered today on the Web your appreciable work, ParsCit. It is exactly what I was looking for. Indeed, what I wanted to avoid was writing new templates for a tool such as ParaCite. Another reason is that I have started believing in NLP...

My plan is to use your software to parse bibliographies in the field of the Humanities (in particular Greek and Latin literature and philology), in order to do some automatic semantic tagging on the parsed bibliographical records. Obviously I will have to train ParsCit to do this.

And here come the two questions that are the main reason for my email: Would a set of 200-300 marked references be enough to obtain good results? And is it the case that the bigger the training data set, the better the results?

Thank you very much for your attention! If you are interested, as soon as I have some results I could send you data about the performance obtained (such as log files...).
Matteo Romanello
Digital Philologist
University Ca' Foscari of Venice
We even have folks entirely unrelated to our project defending our research output as a production-level solution to a real-world problem.
from Nate Vack <njvack@wisc.edu>
reply-to Code for Libraries <CODE4LIB@listserv.nd.edu>
to CODE4LIB@listserv.nd.edu
date Sat, Jul 12, 2008 at 5:18 AM
subject Re: [CODE4LIB] anyone know about Inera?
On Fri, Jul 11, 2008 at 3:57 PM, Steve Oberg <steve@obergs.net> wrote:
> I fully realize how much of a risk that is in terms of reliability and
> maintenance. But right now I just want a way to do this in bulk with a
> high level of accuracy.

How bad is it, really, if you get some (5%?) bad requests into your document delivery system? Customers submit poor quality requests by hand with some frequency, last I checked...

Especially if you can hack your system to deliver the original citation all the way into your doc delivery system, you may be able to make the case that 'this is a good service to offer; let's just deal with the bad parses manually.'

Trying to solve this via pure technology is gonna get into a world of diminishing returns. A surprising number of citations in references sections are wrong. Some correct citations are really hard to parse, even by humans who look at a lot of citations.

ParsCit has, in my limited testing, worked as well as anything I've seen (commercial or OSS), and much better than most.
My $0.02,
-Nate
Below, I show emails that highlight the commercial interest that corporations have had in the research we developed in-house. The first email concerns a baseline implementation of an anaphora resolution algorithm we developed (publication #48), the second concerns an image classifier we developed (publication #38), and the third concerns a URL classifier (publication #47).
Date: Wed, 26 Jan 2005 11:15:12 +0000
From: James Hammerton <james.hammerton@gtnet.com>
To: Qiu Long <qiul@comp.nus.edu.sg>
Cc: Iain Mckay <iain.mckay@gtnet.com>
Subject: Using JavaRAP commercially.
Qiu,
I work for a company called Graham Technology. I'm interested in evaluating your JavaRAP anaphora resolution software for use in a product we're developing. What are the terms for evaluating JavaRAP? And what would be the terms for using it if we decide to do so? Could we get access to the source code in the latter case, to help us adapt the code for our purposes?
Yours Sincerely,
James Hammerton
from Yves Dassas <yves.dassas@hightrack.co.uk>
to kanmy@comp.nus.edu.sg
date Fri, May 19, 2006 at 11:19 PM
subject Image categorization - NPIC
Dear Dr Min-Yen Kan,

I read a project report entitled "Synthetic Image Categorization" from one of your students (Wang Fei).

I am currently working on a project that would require, among other components, an image classification tool similar to the one designed by your student.

Could you tell me whether NPIC can be tested and/or whether there is any license associated with it?
Regards
Dr Yves Dassas
Tel.: 44 (20) 7454 12 44
DDI: 44 (20) 7354 63 36
Fax: 44 (20) 7454 12 40
Email: Yves.Dassas@hightrack.co.uk
from Danny818@aol.com
to kanmy@comp.nus.edu.sg
date Mon, Dec 13, 2004 at 1:27 PM
subject Re: MEurlin
Hello Min,
We operate abcsearch.com, and what we would like to do is recognize a domain name, automatically identify a keyword that matches the domain name, and then show search results for that domain. Let me know when your demo is back online. Also, what is the cost for the source code?
Thanks,
Dan.