MicroRNA Prediction for Unannotated Genome-Wide and Transcriptomic Experiments

Peace, Robert

Resource Type

Thesis

Creator

Peace, Robert

Abstract

MicroRNAs (miRNAs) are short (18–23 nt), non-coding RNAs that play central roles in cellular regulation by modulating the post-transcriptional expression of messenger RNA (mRNA) transcripts. It has previously been estimated that 60–90% of all mammalian mRNAs may be targeted by miRNAs. Given this biological importance, the ability to accurately predict miRNA sequences is highly valuable. Computational miRNA prediction methods are either genomic sequence-based (de novo) or analyze transcriptomic data arising from next-generation sequencing (NGS) experiments. Unfortunately, existing methods of de novo miRNA prediction often fail when applied to non-model species and are not well suited to genome-scale data sets. Furthermore, existing methods of NGS-based miRNA prediction do not incorporate all known lines of evidence for miRNA prediction, instead focussing on either sequence-based or expression-based features of putative miRNAs.
This thesis makes contributions to the state of the art of miRNA prediction that directly address the issues highlighted above. First, we develop a framework for the generation of species-specific training data sets. Three different forms of classifiers using diverse feature sets are trained and evaluated using the framework. Significant gains in precision and recall are achieved over existing methods, as measured using four diverse species from different phyla. Subsequently, the framework was applied to develop miRNA predictors in two successful genome-wide miRNA prediction studies, resulting in the discovery of 155 novel miRNAs, thus verifying the real-world applicability of this work. Second, we introduce a genome-scanning miRNA prediction model that optimizes miRNA prediction for realistic experimental conditions. This model quantifies the performance of elements of the miRNA prediction pipeline, including pre-filtering stages, whose impact was previously ignored. This comprehensive evaluation framework has enabled significant increases in prediction performance over the state of the art through the use of updated RNA secondary structure parameters. Finally, we develop an NGS-based miRNA prediction method that improves on state-of-the-art performance through the integration of all known lines of evidence that discriminate miRNA from non-miRNA. This prediction method substantially outperforms two existing leading methods on data sets from five NGS experiments across three species, and is shown to generalize to hold-out data sets.

Language

English

Publisher

Carleton University

Thesis Degree Level

Doctoral

Thesis Degree Name

Doctor of Philosophy (Ph.D.)

Thesis Degree Discipline

Engineering, Electrical and Computer

Identifier

DOI: https://doi.org/10.22215/etd/2016-11462

Rights Notes

Copyright © 2016 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.

Date Created

2016

Relations

In Collection:

Theses and Dissertations

Items

Title	Date Uploaded	Visibility
peace-micrornapredictionforunannotatedgenomewide.pdf	2023-05-05	Public
