Next, we investigate whether syntax features can improve categorization performance.
We parse each JavaScript source file into a tree structure, from which syntax features can then be extracted.
In this example, a JavaScript function is parsed into the tree shown below, and the syntax (structure) features are extracted from that parse tree.
For example, one syntax feature can be extracted as a level-2 subtree like this.
Another syntax feature can be extracted as a level-3 subtree like this.
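To make level-2 and level-3 subtrees concrete, here is a minimal Python sketch. It assumes a hand-written, simplified ESTree-style tree for `function add(a, b) { return a + b; }` rather than real parser output, and it assumes that a level-k subtree means a node plus at most k - 1 levels of its descendants, taken at every internal node. The names `node`, `truncate`, `iter_nodes`, and `extract_subtrees` are hypothetical illustrations, not the actual implementation.

```python
# Hypothetical representation: a node is an immutable (label, children) pair.
def node(label, *children):
    """Build a tree node as a (label, tuple_of_children) pair."""
    return (label, tuple(children))


# Hand-written, simplified ESTree-style tree for
#   function add(a, b) { return a + b; }
# (an assumption for illustration, not actual parser output)
ADD_FN = node(
    "FunctionDeclaration",
    node("Identifier"),  # function name: add
    node("Identifier"),  # parameter: a
    node("Identifier"),  # parameter: b
    node(
        "BlockStatement",
        node(
            "ReturnStatement",
            node("BinaryExpression", node("Identifier"), node("Identifier")),  # a + b
        ),
    ),
)


def truncate(tree, depth):
    """Keep the root plus at most depth - 1 levels of descendants."""
    label, children = tree
    if depth <= 1:
        return (label, ())
    return (label, tuple(truncate(child, depth - 1) for child in children))


def iter_nodes(tree):
    """Yield every node of the tree in pre-order."""
    yield tree
    for child in tree[1]:
        yield from iter_nodes(child)


def extract_subtrees(tree, level):
    """Assumed definition: the level-`level` subtree rooted at each internal node."""
    return [truncate(n, level) for n in iter_nodes(tree) if n[1]]


for subtree in extract_subtrees(ADD_FN, 2):
    print(subtree)
```

With `level=3`, the subtree rooted at `BlockStatement` also keeps its `BinaryExpression` grandchild, so deeper levels capture longer structural patterns, though such patterns tend to repeat less often across files.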
These syntax features are then serialized into text tokens and passed to the classifier.
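One possible way to serialize subtrees as text tokens is sketched below in Python. The bracket notation `Label(child,child,...)`, the helper names `serialize` and `to_feature_text`, and the sample subtrees are assumptions for illustration; the exact token format used here is not specified. Each subtree becomes a single whitespace-free token, so a plain whitespace tokenizer recovers it intact, and a structure that occurs several times in a file simply shows up as a repeated token.

```python
from collections import Counter


def serialize(tree):
    """Render a (label, children) subtree as one whitespace-free token.

    Assumed format: Label(child,child,...); leaves are just their label.
    """
    label, children = tree
    if not children:
        return label
    return label + "(" + ",".join(serialize(c) for c in children) + ")"


def to_feature_text(subtrees):
    """Join serialized subtrees into one space-separated 'document'."""
    return " ".join(serialize(t) for t in subtrees)


# Hypothetical level-2 subtrees from one source file (one pattern repeated).
SUBTREES = [
    ("ReturnStatement", (("BinaryExpression", ()),)),
    ("BinaryExpression", (("Identifier", ()), ("Identifier", ()))),
    ("ReturnStatement", (("BinaryExpression", ()),)),
]

text = to_feature_text(SUBTREES)
counts = Counter(text.split())  # bag-of-tokens view of the syntax features
print(text)
print(counts.most_common(1))
```

If these strings are fed to a bag-of-words vectorizer such as scikit-learn's `CountVectorizer`, its default `token_pattern` would split them at the parentheses and commas, so a whitespace pattern such as `token_pattern=r"\S+"` is needed to keep each subtree as one token.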