The emergence of large-scale profiling of patient samples using RNA sequencing (RNAseq) provides an opportunity to study complex diseases such as cancers and Parkinson’s disease at a resolution that was previously unavailable. Molecular biomarkers have attracted significant interest: RNAseq data have been used to discover new molecular subtypes, and we were able to use molecular biomarkers to classify patients into those subtypes or to predict patient outcomes in response to a treatment. However, large, high-dimensional data such as RNAseq also pose significant computational and statistical challenges. A typical workflow for biomarker discovery includes data preprocessing, feature selection, classifier design, performance evaluation, and validation (internal and external). An additional challenge with biomarker discovery based on RNAseq data is that samples are often collected from multiple sites, because the large number of patient samples required to provide adequate statistical power for discovery is hard to come by, especially from a single site. I will discuss some of the best practices and pitfalls of biomarker discovery, with examples from our ongoing projects on bladder cancer, pancreatic cancer, and Parkinson’s disease.
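As a rough illustration of the workflow listed above, the following Python sketch runs feature selection, classifier fitting, and evaluation with one site held out at a time, so that the held-out site plays the role of an external validation set. It is a minimal sketch under stated assumptions, not the method used in the projects mentioned: the data are synthetic, expression values are assumed to be already normalized, the nearest-centroid classifier and the simple mean-difference score are stand-ins chosen for brevity, and every function name is hypothetical.

```python
import random
import statistics


def select_features(X, y, k):
    """Return indices of the k features with the largest scaled class-mean difference."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-9
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / spread)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def fit_centroids(X, y, features):
    """Compute one centroid per class, restricted to the selected features."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.mean(r[j] for r in rows) for j in features]
    return centroids


def predict(centroids, features, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(center):
        return sum((row[j] - c) ** 2 for j, c in zip(features, center))
    return min(centroids, key=lambda label: dist(centroids[label]))


def leave_one_site_out(X, y, sites, k=5):
    """Hold out each site in turn and report accuracy on the held-out site.

    Feature selection is redone on the training sites only, so the held-out
    samples never influence which features are chosen.
    """
    accuracy = {}
    for held_out in sorted(set(sites)):
        train = [i for i, s in enumerate(sites) if s != held_out]
        test = [i for i, s in enumerate(sites) if s == held_out]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        features = select_features(X_train, y_train, k)
        centroids = fit_centroids(X_train, y_train, features)
        correct = sum(predict(centroids, features, X[i]) == y[i] for i in test)
        accuracy[held_out] = correct / len(test)
    return accuracy


def make_toy_data(n_per_site=20, n_genes=200, n_informative=5, n_sites=3, seed=0):
    """Synthetic stand-in for normalized expression data (an assumption, not real data)."""
    rng = random.Random(seed)
    X, y, sites = [], [], []
    for site in range(n_sites):
        for i in range(n_per_site):
            label = i % 2
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for j in range(n_informative):
                row[j] += 3.0 * label  # shift the informative genes in class 1
            X.append(row)
            y.append(label)
            sites.append(site)
    return X, y, sites


if __name__ == "__main__":
    X, y, sites = make_toy_data()
    for site, acc in leave_one_site_out(X, y, sites).items():
        print(f"held-out site {site}: accuracy {acc:.2f}")
```

The key design choice is that `select_features` is called inside the loop over held-out sites. Selecting features on all samples first and then cross-validating lets information from the test samples leak into the model and tends to inflate the reported accuracy.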
Dr. Seungchan Kim is a Chief Scientist and Executive Professor in the Department of Electrical and Computer Engineering and Director of the CRI Center for Computational Systems Biology at Prairie View A&M University (PVAMU). Prior to this appointment, he was the Head of the Biocomputing Unit and an Associate Professor in the Integrated Cancer Genomics Division of the Translational Genomics Research Institute (TGen). He was one of the founding faculty members of TGen, which was founded in 2002 by Dr. Trent, then Scientific Director of the National Human Genome Research Institute at the National Institutes of Health. He had led computational systems biology research at the institute since 2003. He was also an Assistant Professor in the School of Computing, Informatics, and Decision Systems Engineering (CIDSE) at Arizona State University from 2004 to 2011. Dr. Kim received B.S. and M.S. degrees in Agricultural Engineering from Seoul National University and a Ph.D. in Electrical Engineering from Texas A&M University. He also received postdoctoral training at the Cancer Genetics Branch of the National Human Genome Research Institute.
Dr. Kim’s research interests include: 1) mathematical modeling of genetic regulatory networks, 2) development of computational methods to analyze a multitude of high-throughput multi-omics data to identify disease biomarkers, and 3) computational models to diagnose patients or predict patient outcomes, for example, disease subtypes or drug response. His studies have had a large influence on the development of computational tools for studying the mechanisms underlying cancer development and for better understanding the molecular mechanisms behind cancer biology and biological systems.