GS5002: Academic Professional Skills and Techniques, "Journal Club on Big Data and a Bewildered Lay Analyst", 2021 (AY21/22 Sem 1)

Session #2, 24 August 2021: Ginsberg et al. "Detecting influenza epidemics using search engine query data"

A decade ago, Ginsberg et al. made a remarkable claim: influenza epidemics could be detected well ahead of traditional health surveillance systems by mining trends in the search terms submitted to the Google search engine. The intuition that what people search for correlates with influenza epidemics, consumer behavior, etc. has an enduring appeal, and could have tremendous impact. But does it really work? Why? What have we learned after a whole decade?

Organization of the session

Part I, Background information:

This part covers the background knowledge of influenza surveillance and statistics that is needed to understand the paper. Some keywords are highlighted below to help you look up background literature, Wikipedia, etc.

Part II, The paper by Ginsberg et al.

This part presents the paper itself. We want to know the key technical details and the key messages.

Part III, Possible points for discussion

This part discusses the Ginsberg et al. paper, hopefully in depth. We want to know whether there are any methodological issues, any doubts about the conclusions or key messages, and any suggestions for improving the paper. Some pointers for discussion include:

Instructions

The journal club has 4 sessions. We will discuss only 1 paper in each session. I will pick the paper for the 1st session, to set the scene. Hopefully, you will suggest the papers for the subsequent 3 sessions (we will choose by a simple vote from among the suitable papers you suggest). Any paper can be suggested, so long as it (i) concerns data analysis (esp. big data) and (ii) contains “controversial” analysis or methodological issues that you think are worth your classmates' attention.

For each paper, the presentation is divided into 3 parts: (i) background of the topic/paper, to help students who lack domain knowledge, (ii) the paper itself, focusing on technical details, and (iii) discussion of the paper. You will be divided into 3 teams, and each team presents one part. The teams will rotate through the 3 roles over the 4 sessions. This also means everyone has to read every paper (plus any related papers/webpages that help in understanding the paper being discussed).

Grading will be based on the presentation (50%) and on asking and answering questions during the discussion (50%).



Wong Limsoon
20 Aug 2021