Our work has several limitations.
Firstly, annotation agreement. Several instances are ambiguous in their functionality, which makes it hard to assign them an appropriate category.
Secondly, the dynamic analysis is incomplete, because only one execution path is followed in each run. As a result, some important runtime features may not be extracted from a single run. We plan to look into this issue in the future.
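The single-path limitation can be illustrated with a minimal Python sketch. It is not the analysis used in this work: the function `sample` and the helper `executed_lines` are hypothetical names introduced only for illustration, and the sketch assumes that line-level tracing through the standard `sys.settrace` hook is an acceptable stand-in for runtime feature extraction.

```python
import sys


def sample(flag):
    # Hypothetical code under analysis: two branches, one per input value.
    if flag:
        return "network"
    return "storage"


def executed_lines(func, *args):
    """Run func once and return the line offsets (relative to its def line) that executed."""
    code = func.__code__
    seen = set()

    def tracer(frame, event, arg):
        # Record only 'line' events that belong to the function under analysis.
        if frame.f_code is code and event == "line":
            seen.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        # Restore whatever trace function was active before.
        sys.settrace(previous)
    return seen


run_true = executed_lines(sample, True)
run_false = executed_lines(sample, False)

# Each run observes only one branch; the union covers more than either run alone.
print(sorted(run_true), sorted(run_false), sorted(run_true | run_false))
```

Under these assumptions, each run records a different subset of lines, so any feature that depends on the untaken branch is invisible to that run. Combining several runs with different inputs is one possible way to widen coverage.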
Thirdly, the choice of classifier. We mostly used the SMO classifier provided by the Weka machine learning toolkit, which we believe may not be the best choice for some of our feature sets.
In the future, we plan to extend source code classification to other languages, and we hope to turn our system into a Web browser plug-in.
Interested readers can find our experimental dataset and the system prototype online.