First, I will introduce the baseline system and its performance on the task.
The vector-space model is widely used in conventional text categorization. The basic idea is to tokenize the text into a bag of words, which then serves as a feature vector representing the original document.
In the baseline system, we simply treat the JavaScript code as plain text and tokenize it using whitespace and punctuation symbols as delimiters. The resulting tokens are then passed to the Weka SMO classifier for classification.
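The tokenization step can be sketched as follows. This is a minimal illustration, not the actual implementation: the delimiter pattern and the sample snippet are assumptions, and in practice the token counts would be converted into Weka's ARFF feature-vector format before classification.

```python
import re
from collections import Counter

def tokenize(code):
    """Split source text on whitespace and punctuation, keeping non-empty tokens."""
    return [t for t in re.split(r"[\s\W]+", code) if t]

def bag_of_words(code):
    """Map a piece of code to a bag-of-words feature vector (token -> count)."""
    return Counter(tokenize(code))

# Hypothetical JavaScript snippet, treated purely as plain text
snippet = "var x = eval(unescape('%61%6c'));"
print(bag_of_words(snippet))
```

Note that treating code as plain text discards all syntactic structure; identifiers, keywords, and string contents all end up as undifferentiated tokens.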
As you can see, the baseline accuracy is relatively high compared to many other classification tasks. However, there is still room for improvement. Given such a high baseline, we also measured the error reduction rate in our evaluations to assess the effectiveness of our techniques.
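The error reduction rate measures what fraction of the baseline's errors a new system eliminates, which is more informative than raw accuracy gains when the baseline is already high. A small sketch, with illustrative numbers that are not the paper's actual results:

```python
def error_reduction_rate(baseline_acc, new_acc):
    """Fraction of the baseline's errors eliminated by the new system."""
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    return (baseline_err - new_err) / baseline_err

# Hypothetical example: raising accuracy from 95% to 97% looks like only a
# 2-point gain, but it removes 40% of the remaining errors.
print(error_reduction_rate(0.95, 0.97))
```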