Publications

This page shows all publications that appeared in the IASI annual research reports. Authors currently affiliated with the Institute are always listed with their full names.


IASI Research Report n. 13-17

Weitschek E., Daniele Santoni, De Cola M C, Giovanni Felici

About similarity of DNA reads

ABSTRACT
The DNA assembly process consists of reconstructing a complete DNA sequence from a large number of reads, i.e., fragments of the complete original DNA sequence (the genome), extracted in a sequencing procedure. The need for fast assembly methods has increased with next generation sequencing (NGS) machines, which extract a large number of short reads from a genomic source. A large class of DNA assembly methods relies on a read comparison step, where promising read pairs are separated from non-promising ones in order to reduce the computational burden of the main assembly algorithm and to speed up the reconstruction of the sequenced DNA. A read comparison method based on an alignment-free distance is proposed, where the similarity of two reads is computed from the frequencies of their fixed-length substrings (k-mers). The alignment-free distance is compared with the quality of the BLAST alignment and the Needleman-Wunsch edit distance. Additionally, an ideal distance is defined and considered for the comparisons: this distance is computed by knowing in advance the mapping of the reads on the reference genome and by extracting the positions where the reads overlap. The distances are evaluated on three organisms, Saccharomyces cerevisiae (Yeast), Escherichia coli (E. coli), and Homo sapiens (Human), and the prediction power of the distances is assessed by analyzing training and test sets composed of different read pairs: the alignment-free distance is able to compete with the more computationally demanding alignment-based distances. The effectiveness of the alignment-free distance for computing a sound read similarity is proven. The alignment-free read pair comparison is successfully adopted for DNA read classification.
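
The k-mer based comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not state which distance is applied to the k-mer frequencies, so the Euclidean distance used here is an assumption, as are the function names kmer_frequencies and alignment_free_distance and the default k = 4; this is not the report's implementation.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(read: str, k: int) -> dict:
    """Return the relative frequency of each k-mer occurring in `read`."""
    total = len(read) - k + 1  # number of k-length windows in the read
    if total <= 0:
        return {}
    counts = Counter(read[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


def alignment_free_distance(read_a: str, read_b: str, k: int = 4) -> float:
    """Distance between two reads based on their k-mer frequency profiles.

    Assumption: Euclidean distance between the frequency vectors; the
    abstract only says the similarity is computed from k-mer frequencies.
    """
    freq_a = kmer_frequencies(read_a, k)
    freq_b = kmer_frequencies(read_b, k)
    all_kmers = freq_a.keys() | freq_b.keys()
    return sqrt(sum((freq_a.get(m, 0.0) - freq_b.get(m, 0.0)) ** 2
                    for m in all_kmers))
```

In a read comparison step, pairs whose distance falls below some chosen threshold (a hypothetical parameter, not taken from the report) would be passed on as promising candidates, without computing any alignment.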