Methods: Creation of a theoretical transcriptional network: i) Position weight matrix data from the JASPAR database and promoter-region sequences of the RefSeq genes in hg19 (Human Genome version 19) were obtained. ii) Target genes of each transcription factor were identified exhaustively using the FIMO tool of the MEME Suite. iii) All relation data (transcription factor -> target gene) were connected to create the entire transcriptional network. iv) Nodes not represented by microarray probes were removed.
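Steps iii) and iv) can be illustrated with a minimal sketch. It assumes the motif scan has already been reduced to (transcription factor, target gene) pairs; the function name `build_network`, the input layout, and the placeholder identifiers such as `TF_A` and `GENE_1` are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def build_network(hits, probe_genes):
    """Build a TF -> target-gene network from (tf, target) pairs.

    hits: iterable of (tf, target) tuples (assumed pre-parsed scan results).
    probe_genes: set of gene identifiers represented on the microarray.
    """
    network = defaultdict(set)
    for tf, target in hits:
        # Step iv: drop any node that has no probe on the array.
        if tf in probe_genes and target in probe_genes:
            # Step iii: connect the transcription factor to its target.
            network[tf].add(target)
    return dict(network)


# Toy input with placeholder names (not real results).
hits = [
    ("TF_A", "GENE_1"),
    ("TF_A", "GENE_2"),
    ("TF_B", "GENE_3"),
    ("TF_B", "GENE_9"),  # GENE_9 is not on the array
    ("TF_C", "GENE_1"),  # TF_C is not on the array
]
probe_genes = {"TF_A", "TF_B", "GENE_1", "GENE_2", "GENE_3"}

network = build_network(hits, probe_genes)
n_targets = len(set().union(*network.values()))
print(len(network), "transcription factors,", n_targets, "target genes")
```

Storing the targets of each factor as a set keeps duplicate motif hits in the same promoter from creating duplicate edges.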
Selection of target samples for analysis: GEO contains more than 80,000 samples measured with the Affymetrix Human Genome U133 Plus 2.0 Array. We computed self-organizing maps (SOM) based on differences in the expression status of the probes to examine how these samples are classified by cellular phenotype. In addition, the SOM was used to select the target samples for the subsequent analyses with the theoretical network.
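The SOM step can be sketched as follows. This is a minimal NumPy implementation run on synthetic data, not the software or the settings used in the study; the grid size, learning rate, neighbourhood width, and the function names `train_som` and `best_matching_unit` are all assumptions chosen for illustration.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Return the (row, col) of the unit whose weight vector is closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)


def train_som(data, rows=4, cols=4, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a rectangular SOM with linearly decaying rate and radius."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every unit, shape (rows, cols, 2).
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 1e-3
            bmu = best_matching_unit(weights, x)
            # Gaussian neighbourhood around the best matching unit.
            dist2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-dist2 / (2.0 * sigma**2))
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights


# Synthetic stand-in for expression profiles: two well-separated groups.
rng = np.random.default_rng(1)
group_a = rng.normal(0.0, 0.05, size=(20, 5))
group_b = rng.normal(1.0, 0.05, size=(20, 5))
data = np.vstack([group_a, group_b])

weights = train_som(data)
units_a = {best_matching_unit(weights, x) for x in group_a}
units_b = {best_matching_unit(weights, x) for x in group_b}
print("units shared by both groups:", len(units_a & units_b))
```

Samples with similar profiles map to the same or neighbouring units, so comparing which units each group occupies gives a simple view of how the groups are distributed on the map.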
Results: A theoretical transcriptional network with 41 transcription factors and 5608 target genes (limited to genes represented on the Affymetrix Human Genome U133 Plus 2.0 Array) was created. The SOM clearly separated the distributions of cancer and non-cancer cell samples.