Reading material
1. Some biology background
- Lecture 1 of CS5238
- Chapter 1 of "Introduction to Computational Molecular Biology"
by Setubal/Meidanis
2. How to do experiments and feature generation
- Section 1 of Chapter 3 in
The Practical Bioinformatician
- John A. Swets,
"Measuring the accuracy of diagnostic systems", Science 240:1285--1293,
June 1988.
- Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical
Learning: Data Mining, Inference, and Prediction, Springer, 2001. Chapters 1 and 7.
- Lance D. Miller et al.,
"Optimal gene expression analysis by microarrays",
Cancer Cell, 2:353--361, November 2002.
3. Data mining techniques
4. DNA Feature Recognition
- Chapters 4 and 7 in
The Practical Bioinformatician
- A. G. Pedersen, H. Nielsen,
"Neural network prediction of translation initiation sites in eukaryotes",
ISMB 5:226--233, 1997
- L. Wong et al.,
"Using feature generation and feature selection for
accurate prediction of translation initiation sites",
GIW 13:192--200, 2002.
- A. Zien et al.,
"Engineering support vector machine kernels that
recognize translation initiation sites", Bioinformatics 16:799--807, 2000.
- A. G. Hatzigeorgiou,
"Translation initiation start prediction in human cDNAs with high accuracy",
Bioinformatics 18:343--350, 2002
- V.B.Bajic et al.,
"Computer model for recognition of functional transcription start sites
in RNA polymerase II promoters of vertebrates",
J. Mol. Graph. & Mod., 2003, in press.
- J.W.Fickett, A.G.Hatzigeorgiou, "Eukaryotic promoter recognition",
Gen. Res. 7:861--878, 1997.
- A.G.Pedersen et al.,
"The biology of eukaryotic promoter prediction---a review",
Computers & Chemistry 23:191--207, 1999.
- M.Scherf et al.,
"Highly specific localisation of promoter regions in
large genome sequences by PromoterInspector",
JMB 297:599--606, 2000.
- M. A. Hall,
"Correlation-based feature selection for machine learning",
PhD thesis, Dept of Comp. Sci., Univ. of Waikato, New Zealand, 1998.
- U. M. Fayyad, K. B. Irani,
"Multi-interval discretization of continuous-valued attributes",
IJCAI 13:1022--1027, 1993.
- H. Liu, R. Setiono,
"Chi2: Feature selection and discretization of numeric attributes",
IEEE Intl. Conf. Tools with Artificial Intelligence 7:388--391, 1995.
- C. P. Joshi et al.,
"Context sequences of translation initiation codon in plants",
PMB 35:993--1001, 1997.
- D. J. States, W. Gish,
"Combined use of sequence similarity and
codon bias for coding region identification", JCB 1:39--50, 1994.
- G. D. Stormo et al.,
"Use of Perceptron algorithm to distinguish
translational initiation sites in E. coli", NAR 10:2997--3011, 1982.
- J. E. Tabaska, M. Q. Zhang,
"Detection of polyadenylation signals in human DNA sequences",
Gene 231:77--86, 1999.
5. Sequence Homology Interpretation
- Chapters 10 and 19 in
The Practical Bioinformatician
Function Assignment
- S.E.Brenner. "Errors in genome annotation", TIG, 15:132--133, 1999.
- T.F.Smith & X.Zhang. "The challenges of genome sequence annotation or `The devil is in the details'", Nature Biotech, 15:1222--1223, 1997.
- D. Devos & A.Valencia. "Intrinsic errors in genome annotation", TIG, 17:429--431, 2001.
- K.L.Lim et al. "Interconversion of kinetic identities of the tandem catalytic domains of receptor-like protein tyrosine phosphatase PTP-alpha by two point mutations is synergistic and substrate dependent", JBC, 273:28986--28993, 1998.
Alignment Applications
- J. Park et al. "Sequence comparisons using multiple sequences detect three times as many remote homologs as pairwise methods", JMB, 284(4):1201--1210, 1998
- J. Park et al. "Intermediate sequences increase the detection of homology between sequences", JMB, 273:349--354, 1997
- Z. Zhang et al. "Protein sequence similarity searches using patterns as seeds", NAR, 26(17):3986--3990, 1998
- M.S.Gelfand et al. "Gene recognition via spliced sequence alignment", PNAS, 93:9061--9066, 1996
- S.F.Altschul et al. "Gapped BLAST and PSI-BLAST: A new generation of protein database search programs", NAR, 25(17):3389--3402, 1997.
Phylogeny Analysis
- B. Sykes. "The seven daughters of Eve", Corgi Books, 2002
6. Microarray Analysis
- Chapter 14 in
The Practical Bioinformatician
- E.-J. Yeoh et al., "Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling", Cancer Cell, 1:133--143, 2002
- E.F. Petricoin et al., "Use of proteomic patterns in serum to identify ovarian cancer", Lancet, 359:572--577, 2002
- U.Alon et al., "Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays", PNAS 96:6745--6750, 1999
- J.Li, L. Wong, "Geography of differences between two classes of data", Proc. 6th European Conf. on Principles of Data Mining and Knowledge Discovery, pp. 325--337, 2002
- J.Li, L. Wong, "Identifying good diagnostic genes or gene groups from gene expression data by using the concept of emerging patterns", Bioinformatics, 18:725--734, 2002
- J.Li et al., "A comparative study on feature selection and classification methods using a large set of gene expression profiles", GIW, 13:51--60, 2002
- M. A. Hall, "Correlation-based feature selection for machine learning", PhD thesis, Dept of Comp. Sci., Univ. of Waikato, New Zealand, 1998
- U. M. Fayyad, K. B. Irani, "Multi-interval discretization of continuous-valued attributes", IJCAI 13:1022--1027, 1993
- H. Liu, R. Setiono, "Chi2: Feature selection and discretization of numeric attributes", IEEE Intl. Conf. Tools with Artificial Intelligence 7:388--391, 1995
- L.D. Miller et al., "Optimal gene expression analysis by microarrays", Cancer Cell 2:353--361, 2002
7. Machine Learning
8. Microarray and Proteomic data analysis
9. Sequence Similarity
10. Suffix Tree and its applications
11. Physical Mapping and Genome Sequencing
12. Structure Prediction and Comparison
13. Phylogeny Tree
- Please read the three chapters on phylogeny trees in
CS5238