Clustering of Short Read Sequences for de novo Transcriptome Assembly

Saadat, Samaneh; Safikhani, Zhaleh; Badie, Kambiz; Sadeghi, Mehdi

doi:https://dx.doi.org/10.22059/pbs.2014.50305

Author(s)

Saadat, Samaneh; Safikhani, Zhaleh; Badie, Kambiz; Sadeghi, Mehdi

Download document

FullText

File size:

596.2 KB

File type (MIME):

PDF

Document type

Text
Original Research Papers

Document language

English

Abstract

Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using a de Bruijn graph with different k-mer lengths. Then, the eclectic mixtures of sequences are gathered in order to form the final sequences. Lastly, the contiguous sequences are clustered and the isoform groups are provided. The proposed algorithm is capable of generating long contiguous sequences and accurately clustering them into isoform groups. To evaluate our algorithm, we applied it to a simulated RNA-seq dataset of the rat transcriptome and a real RNA-seq experiment of the Loricaria gr. cataphracta transcriptome. The correctness of the assembled contigs was more than 95%, and our algorithm was able to reconstruct over 70% of the transcripts at more than 80% of the transcripts' lengths. This study demonstrates that applying a sophisticated merging method improves transcriptome assembly. The source code is available upon request by contacting the corresponding author by email.
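
The first step named in the abstract, generating contiguous sequences from a de Bruijn graph at several k-mer lengths, can be illustrated with a minimal sketch. This is not the authors' implementation, which is available only on request. The function names `build_de_bruijn`, `unitigs`, and `contigs_for_ks` are hypothetical, the toy reads are invented for illustration, and the sketch assumes error-free, single-strand reads. The merging and clustering steps are not sketched because the abstract gives no details about them.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer successors seen in the reads."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph):
    """Spell out maximal non-branching paths as contigs.

    Simplification: isolated cycles made only of 1-in/1-out nodes are skipped.
    """
    indeg = defaultdict(int)
    for dsts in graph.values():
        for dst in dsts:
            indeg[dst] += 1
    nodes = set(graph) | set(indeg)

    def is_simple(node):
        # A simple node has exactly one incoming and one outgoing edge.
        return indeg[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in sorted(nodes):
        if is_simple(node):
            continue  # paths start only at branching or terminal nodes
        for nxt in sorted(graph.get(node, ())):
            path = node + nxt[-1]
            while is_simple(nxt):
                nxt = next(iter(graph[nxt]))
                path += nxt[-1]
            contigs.append(path)
    return contigs


def contigs_for_ks(reads, ks):
    """Assemble the same reads once per k-mer length (hypothetical helper)."""
    return {k: unitigs(build_de_bruijn(reads, k)) for k in ks}


if __name__ == "__main__":
    toy_reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]  # invented overlapping reads
    for k, contigs in contigs_for_ks(toy_reads, [4, 5]).items():
        print(k, contigs)
```

Smaller k values tolerate shorter overlaps between reads but create more branching at repeats, while larger k values resolve repeats at the cost of needing longer overlaps. That trade-off is the usual motivation for combining assemblies from several k-mer lengths.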

Keywords

De novo
Next generation sequencing
RNA-Seq
transcriptome assembly
biotechnology
Molecular Biology

Issue number

Publication date

2014-05-01 (Gregorian)
1393-02-11 (Solar Hijri)

Publisher

University of Tehran Press

Author affiliations

Department of Algorithms and Computation, University of Tehran, Tehran, Iran
Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
National Telecom Research Center, Tehran, Iran
National Institute of Genetic Engineering and Biotechnology (NIGEB), Tehran 14155-6346, Iran

ISSN

1016-1058
2228-7833

URI

https://dx.doi.org/10.22059/pbs.2014.50305
https://pbiosci.ut.ac.ir/article_50305.html
https://iranjournals.nlai.ir/handle/123456789/148056