|
|
|
Firstly, I would
like to introduce the baseline system and its performance of the task.
|
|
Vector-space model
is often used in conventional text categorization tasks. The basic idea is to
tokenize text data into a bag of words, and this bag of words will be a
feature vector representing the original text data.
|
|
In the baseline
system, we simply treat the JavaScript codes as plain texts and tokenize them
using blanks and punctuation symbols as delimiters. Then we pass the tokens
to the Weka SMO classifier to do classification.
|
|
As you can see, the
baseline is relatively high compared to many other classification tasks.
However, there is still a gap to improve. Given such a high baseline, we also
measured the error reduction rate in our evaluations to assess the
effectiveness of our techniques.
|