Supervised Categorization of JavaScript using Program Analysis Features

Wei Lu and Min-Yen Kan

AIRS 2005 (Jeju Island, Korea)


Syntax Analysis

function play() {
    s = window.location;
    window.location = "media/arbrav2.wav";
}

JavaScript source code

FUNCTION [FUNNAME:play PARAMS:]
    BLOCK
        STMT
            SETNAME
                BINDNAME::s
                GETPROP
                    NAME::window
                    STRING::location
        STMT
            SETPROP
                NAME::window
                STRING::location
                STRING::"media/arbrav2.wav"
        RETURN

Syntax (structure) features are extracted from the parse tree

Syntax feature (level = 2): SETNAME[BINDNAME→GETPROP]

Syntax feature (level = 3): STMT[SETNAME→[BINDNAME→GETPROP]]


	Next, we investigate whether syntax features can improve categorization performance.
	We parse each JavaScript source file into a tree structure and then extract syntax features from that tree.
	In this example, a JavaScript function is parsed into the tree shown on the slide, and the syntax (structure) features are extracted from that parse tree.
	For example, one syntax feature is the level-2 subtree rooted at SETNAME.
	Another syntax feature is the level-3 subtree rooted at STMT.
	These syntax features are then serialized as text tokens and passed to the classifier.
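
	The extraction and serialization steps described above could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the node shape ({ type, children }), the helper names (node, serialize, height, extractFeatures), and the exact token format are assumptions. The tree is built by hand rather than produced by a parser, node values such as "::s" are omitted, and "->" is used as an ASCII stand-in for the arrow on the slide.

```javascript
// Hypothetical node shape (an assumption): { type: "SETNAME", children: [...] }.
function node(type, children = []) {
  return { type, children };
}

// Hand-built parse tree for play(), mirroring the tree on the slide.
const tree = node("FUNCTION", [
  node("BLOCK", [
    node("STMT", [
      node("SETNAME", [
        node("BINDNAME"),
        node("GETPROP", [node("NAME"), node("STRING")]),
      ]),
    ]),
    node("STMT", [
      node("SETPROP", [node("NAME"), node("STRING"), node("STRING")]),
    ]),
    node("RETURN"),
  ]),
]);

// Serialize the subtree rooted at n, truncated to `level` levels of nodes.
function serialize(n, level) {
  if (level <= 1 || n.children.length === 0) return n.type;
  const kids = n.children.map((c) => serialize(c, level - 1));
  return n.type + "[" + kids.join("->") + "]";
}

// Height of a subtree counted in levels (a leaf has height 1).
function height(n) {
  return 1 + Math.max(0, ...n.children.map(height));
}

// Emit one text token for every node whose subtree is at least `level` deep.
function extractFeatures(root, level) {
  const features = [];
  (function visit(n) {
    if (height(n) >= level) features.push(serialize(n, level));
    n.children.forEach(visit);
  })(root);
  return features;
}

// Tokens joined by spaces form a plain-text "document" for a text classifier.
console.log(extractFeatures(tree, 2).join(" "));
console.log(extractFeatures(tree, 3).join(" "));
```

	Under these assumptions, the level-2 tokens include SETNAME[BINDNAME->GETPROP], and the level-3 tokens include STMT[SETNAME[BINDNAME->GETPROP]], which correspond to the two features on the slide up to minor differences in bracket notation. Emitting the features as whitespace-separated tokens lets a standard text classifier treat each script as a bag of words.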