GS5002: Academic Professional Skills and Techniques, "Journal Club on Big Data and a Bewildered Lay Analyst", 2021 (AY21/22 Sem 1)

Session #1, 17 August 2021: Venet et al. "Most random gene expression signatures are significantly associated with breast cancer outcome"

The session will discuss the paper by Venet et al. on the observation that most large random signatures appear to be as predictive as any reported breast cancer prognostic signature. This is an interesting paper because it calls into question whether any of the reported signatures is more meaningful or useful than a random one. A corollary of this observation is that if you write a paper that reports a breast cancer prognostic signature (or presents a method for deriving one) and evaluates it purely on prediction performance, the journal/reviewer should reject the paper without review.
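To make the corollary concrete, here is a minimal sketch of what "evaluating against random signatures" could look like. It is not the procedure used by Venet et al.: the scoring function (absolute Pearson correlation between the mean expression of the signature genes and a continuous outcome) is a simplified stand-in for a survival analysis, the data are synthetic, and the names signature_score, empirical_p_value, hidden and reported are hypothetical and chosen only for illustration.

```python
import numpy as np


def signature_score(expr, outcome, genes):
    """Absolute Pearson correlation between the mean expression of
    `genes` (column indices into `expr`) and `outcome`.
    Assumption: a simplified stand-in for a survival-based score."""
    summary = expr[:, genes].mean(axis=1)
    return abs(np.corrcoef(summary, outcome)[0, 1])


def empirical_p_value(expr, outcome, signature, n_random=1000, seed=0):
    """Fraction of same-size random signatures scoring at least as well
    as `signature` (with an add-one correction so the estimate is > 0)."""
    rng = np.random.default_rng(seed)
    observed = signature_score(expr, outcome, signature)
    n_genes = expr.shape[1]
    null = np.array([
        signature_score(
            expr, outcome,
            rng.choice(n_genes, size=len(signature), replace=False),
        )
        for _ in range(n_random)
    ])
    return (1 + np.sum(null >= observed)) / (1 + n_random)


# Synthetic data (an assumption, not real expression data): one hidden
# factor drives both the outcome and a little of every gene, so almost
# any large gene set ends up correlated with the outcome.
rng = np.random.default_rng(42)
n_samples, n_genes = 200, 500
hidden = rng.normal(size=n_samples)
expr = 0.7 * hidden[:, None] + rng.normal(size=(n_samples, n_genes))
outcome = hidden + rng.normal(size=n_samples)

# Hypothetical "reported" signature: an arbitrary set of 50 genes.
reported = np.arange(50)
score = signature_score(expr, outcome, reported)
p = empirical_p_value(expr, outcome, reported)
print(f"score of reported signature: {score:.3f}")
print(f"empirical p-value vs. random signatures: {p:.3f}")
```

The point of the comparison is that a signature's score on its own says little; what matters is where that score falls within the distribution of scores obtained by random signatures of the same size on the same data.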

Organization of the session

Part I, Background information:

This part covers the background knowledge of cancer biology and statistics that is needed to understand the paper. Some keywords are highlighted below to help you look up background literature, Wikipedia, etc.

Part II, The paper by Venet et al.

This part presents the Venet et al. paper itself. We want to know the key technical details and the key messages.

Part III, Possible points for discussion

This part discusses the Venet et al. paper, hopefully in depth. We want to know whether there are any methodological issues, any doubts about the conclusions/key messages, and any suggestions for improving the paper. Some pointers for discussion include:

Instructions

The journal club has 4 sessions. We will discuss only 1 paper in each session. I will pick the paper for the 1st session, to set the scene. Hopefully, you will suggest the papers for the subsequent 3 sessions (we will choose by a simple vote from among the suitable papers you suggest). Any paper can be suggested, so long as it (i) concerns data analysis (esp. big data) and (ii) contains “controversial” analysis or methodological issues that you think are worth your classmates' attention.

For each paper, the presentation is divided into 3 parts: (i) background of the topic/paper, to help students who lack domain knowledge, (ii) the paper itself, focusing on technical details, and (iii) discussion of the paper. You will be divided into 3 teams, and each team presents one part. The teams will rotate through the 3 roles over the 4 sessions. This also means everyone has to read every paper (plus some related papers/webpages that are helpful for understanding the paper being discussed).

Grading will be based on presentation (50%) and on asking and answering questions during the discussion (50%).



Wong Limsoon
4 Aug 2021