Gene Ontology analysis of continuous measures based on
Mann-Whitney U test with adaptive clustering of GO categories
Mikhail Matz
University of Texas at Austin, USA
Comprehensive and visually clear functional summaries of genome-wide
information remain a challenge in genome biology. Gene Ontology (GO)
annotations have been used for this purpose for more than 10 years,
but there is still no consensus as to how to best analyze the data
and present the results. Here I describe the GOMWU method that in my
experience generates the most statistically powerful, informative, and
visually understandable functional summaries based on GO annotations.
The advantages of my method over a typical "GO enrichment" analysis
(e.g., GeneMerge; Castillo-Davis and Hartl, 2003) are as follows. First,
the experimenter does not have to impose an arbitrary cutoff for initial
candidate gene selection, and thus the whole dataset can be used to gain
information. No statistical test is required prior to
the analysis. The method is best suited for analyzing the distribution
of continuous measures, such as dN/dS values, fold-changes of gene
expression, or raw p-values (unadjusted for multiple comparisons).
It works particularly well for kME values obtained by the weighted
gene coexpression network analysis (WGCNA). The second advantage is
that the method pre-summarizes the GO hierarchy by clustering GO
categories based on gene sharing within the dataset being analyzed.
This generates a biologically meaningful grouping of GO categories
tailored for the particular dataset and allows the analysis to be
more specific (i.e., involve lower GO hierarchy levels) than in most
other GO analysis methods. Third, the visual representation of the
results is compact and intuitive.
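
To make the two core ideas above concrete, here is a minimal Python sketch. It is an illustration only, not the GOMWU implementation. It assumes a plain normal approximation for the Mann-Whitney U test (no tie or continuity correction) and, for the clustering step, assumes that the distance between two categories is one minus the fraction of the smaller category's genes shared with the other; the text above says only that clustering is based on gene sharing. All gene names, category names, and values are hypothetical.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def mann_whitney_u(inside, outside):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Simplification (assumption): no tie or continuity correction is applied.
    Returns (U for the 'inside' group, z-score, two-sided p-value).
    """
    n1, n2 = len(inside), len(outside)
    ranks = average_ranks(list(inside) + list(outside))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p

def shared_gene_distance(genes_a, genes_b):
    """Assumed distance: 1 - (shared genes / size of the smaller category)."""
    a, b = set(genes_a), set(genes_b)
    if not a or not b:
        raise ValueError("both categories must contain at least one gene")
    return 1 - len(a & b) / min(len(a), len(b))

# Hypothetical continuous measure per gene (e.g., a log fold-change).
measures = {"g1": 0.1, "g2": 0.4, "g3": 2.1, "g4": 1.8,
            "g5": 2.5, "g6": 0.3, "g7": 0.2}

# Hypothetical gene-to-category annotations.
categories = {
    "category_A": {"g3", "g4", "g5"},
    "category_B": {"g1", "g2", "g4", "g5"},
}

# Test each category against all remaining genes; no cutoff is applied.
results = {}
for name, genes in categories.items():
    inside = [v for g, v in measures.items() if g in genes]
    outside = [v for g, v in measures.items() if g not in genes]
    results[name] = mann_whitney_u(inside, outside)

for name, (u, z, p) in results.items():
    print(f"{name}: U={u:.1f} z={z:.2f} p={p:.4f}")

print("distance A-B:", shared_gene_distance(categories["category_A"],
                                            categories["category_B"]))
```

Every gene contributes its measured value to the rank test, so no candidate list has to be chosen beforehand. The pairwise distances could then be passed to any hierarchical clustering routine to group categories that largely share the same genes.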