Clustering of Short Read Sequences for de novo Transcriptome Assembly
Author(s)
Saadat, Samaneh
Safikhani, Zhaleh
Badie, Kambiz
Sadeghi, Mehdi
Document type
Text
Original Research Papers
Document language
English
Abstract
Given the importance of transcriptome analysis in various biological studies, and considering the vast amount of whole-transcriptome sequencing data, it is necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, contiguous sequences (contigs) are generated using a de Bruijn graph with different k-mer lengths. Then, the resulting mixture of sequences is merged to form the final sequences. Lastly, the contigs are clustered and the isoform groups are reported. The proposed algorithm is capable of generating long contigs and accurately clustering them into isoform groups. To evaluate our algorithm, we applied it to a simulated RNA-seq dataset of the rat transcriptome and to a real RNA-seq experiment on the Loricaria gr. cataphracta transcriptome. The correctness of the assembled contigs was more than 95%, and our algorithm was able to reconstruct over 70% of the transcripts at more than 80% of their lengths. This study demonstrates that applying a sophisticated merging method improves transcriptome assembly. The source code is available upon request by emailing the corresponding author.
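The first two steps of the abstract (contig generation from a de Bruijn graph at several k-mer lengths, followed by merging) can be illustrated with a minimal sketch. This is not the authors' implementation, which is available only on request. The function names `build_de_bruijn`, `extract_contigs`, and `merge_contigs` are hypothetical, the toy reads are invented for illustration, and the merging step here is a deliberately naive stand-in (dropping contigs contained in a longer one) rather than the merging method the paper describes. Sequencing errors, reverse complements, and isoform clustering are not handled.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extract_contigs(graph):
    """Spell out maximal non-branching paths as contigs.

    Isolated cycles (every node has in-degree 1 and out-degree 1) are skipped
    in this sketch.
    """
    in_deg = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            in_deg[s] += 1
    nodes = set(graph) | set(in_deg)

    def is_simple(n):
        # A simple node has exactly one predecessor and one successor.
        return in_deg[n] == 1 and len(graph.get(n, ())) == 1

    contigs = []
    for node in sorted(nodes):
        if is_simple(node):
            continue  # paths start only at branching or terminal nodes
        for succ in sorted(graph.get(node, ())):
            contig = node + succ[-1]
            cur = succ
            while is_simple(cur):
                cur = next(iter(graph[cur]))
                contig += cur[-1]
            contigs.append(contig)
    return contigs


def merge_contigs(contig_sets):
    """Naive merge (an assumption, not the paper's method): pool contigs from
    all k values and drop any contig contained in a longer kept contig."""
    pooled = sorted({c for contigs in contig_sets for c in contigs},
                    key=lambda c: (-len(c), c))
    kept = []
    for c in pooled:
        if not any(c in longer for longer in kept):
            kept.append(c)
    return kept


# Toy, error-free reads that overlap along a single transcript.
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
per_k = [extract_contigs(build_de_bruijn(reads, k)) for k in (4, 5)]
print(merge_contigs(per_k))  # prints ['ATGGCGTGCAA']
```

Running the graph at more than one k matters on real data because small k values bridge low-coverage regions while large k values resolve repeats; the toy reads above are too clean to show that trade-off, so both k values yield the same contig.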
Keywords
De novo
Next generation sequencing
RNA-Seq
transcriptome assembly
biotechnology
Molecular Biology
Issue number
1
Publication date
2014-05-01 (Solar Hijri: 1393-02-11)
Publisher
University of Tehran Press
Author affiliation
Department of Algorithms and Computation, University of Tehran, Tehran, Iran
Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
National Telecom Research Center, Tehran, Iran
National Institute of Genetic Engineering and Biotechnology (NIGEB), Tehran 14155-6346, Iran
ISSN
1016-1058
2228-7833




