Compared to the baseline, using lexical analysis alone, we can reduce 17% of the errors. This indicates that a careful study of language features helps categorization.
As we have found earlier, using metrics or context-sensitive features alone does not give us a good performance, but when they are coupled with lexical features the overall performance can be increased. Using the final feature set we can perform the categorization task with an accuracy 93.95%, resulting in a 52% error reduction.
Our experiment also showed that syntax features are noisy when coupled with other features, we therefore excluded this feature set in subsequent evaluations.
As we can see, we have many weak classifiers, which can not perform well alone, but when they are combined together, a good classifier is built. This is considered ensemble method.