Gaurav Kandoi - Final Oral Exam

Wednesday, June 5, 2019 - 10:00am to 12:00pm

Major Professors:  Julie Dickerson and Carolyn Lawrence-Dill

Machine learning tools for mRNA isoform function prediction

This dissertation focuses on improving mRNA isoform characterization in terms of functional networks, function prediction, and tissue specificity. Three major challenges stand in the way. The first is the unavailability of mRNA isoform-level functional data, which are required to develop machine learning tools; even the available gene-level data do not cover all genes, further complicating the matter. The second challenge is the lack of information about tissue specificity in functional databases such as Gene Ontology, Kyoto Encyclopedia of Genes and Genomes, and UniProt. The third challenge is the lack of mRNA isoform-level “ground truth” functional annotation data for evaluating prediction methods.

To address these challenges, this dissertation develops and describes two computational tools. The first is a supervised machine learning framework for Tissue-spEcific mrNa iSoform functIonal Networks (TENSION). The second is the mRNA Function Recommendation System (mFRecSys), a recommendation system that makes tissue-specific function recommendations for mRNA isoforms. By using explicit contexts for mRNA isoforms, Gene Ontology biological process terms, and tissue-specific mRNA isoform expression, mFRecSys makes tissue-specific mRNA isoform function recommendations, while TENSION predicts tissue-specific functional networks at the mRNA isoform level.

This work emphasizes the significance of incorporating diverse biological context to develop better machine learning tools for biology. It also highlights the use of simplified supervised learning methods for biological network prediction. The machine learning models and recommendation systems developed as part of this work also draw attention to the power of simple mRNA isoform sequence-based predictors to improve mRNA isoform function prediction. The methods have potential practical applications, for instance as predictive models for distinguishing the functions of different mRNA isoforms of the same gene or for identifying tissue-specific functions of mRNA isoforms.