Specifically, in this paper we present a case study on JavaScript categorization.
Why categorize JavaScript rather than some other language?
There are a few reasons.
JavaScript is widely used in Web pages. The current generation of Web pages relies heavily on JavaScript for functionality such as form processing, pop-up advertisements, page generation, and page redirection. This information is sometimes important to end users of the Web, so JavaScript often conveys crucial information. However, most crawlers and indexers ignore it. For example, when you issue a query to Google, it returns a list of Web pages, each with a brief summary, but the summary does not cover the JavaScript on the page. If search engines also summarized JavaScript information, users could better predict the usefulness of a page. People could also build applications that block unwanted JavaScript on a page.
So the question becomes: can this information be summarized automatically?
One way to do so is to categorize JavaScript code into a set of predefined categories.
But what should the categories be?