A novel alignment-free DNA sequence similarity analysis approach based on top-k n-gram match-up

Delibas, Emre; Arslan, Ahmet; Seker, Abdulkadir; Diri, Banu

Date

2020

Publisher

Elsevier Science Inc

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

DNA sequence similarity analysis is an essential task in computational biology and bioinformatics. Nearly all research on evolutionary relationships, gene function analysis, protein structure prediction and sequence retrieval requires similarity calculations. As an alternative to alignment-based sequence comparison methods, which incur high computational cost, alignment-free methods have emerged that calculate similarity by digitizing the sequence in a different space. In this paper, we propose an alignment-free DNA sequence similarity analysis method based on top-k n-gram matches, on the hypothesis that common repeating DNA subsections indicate high similarity between DNA sequences. In our method, we determine DNA sequence similarities by measuring similarity among feature vectors created according to top-k n-gram match-up scores, without the use of similarity functions. We applied the similarity calculation to three DNA data sets of different lengths. The phylogenetic trees produced by our method coincide almost completely with the results of the MEGA software, which is based on sequence alignment. Our findings show that a certain number of frequently recurring common sequence patterns have the power to characterize DNA sequences. (C) 2020 Elsevier Inc. All rights reserved.
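
The general idea of comparing sequences through their most frequent n-grams can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the scoring scheme, so the function names (`top_k_ngrams`, `match_up_score`, `feature_vector`), the tie-breaking rule, and the definition of the score as the number of shared top-k n-grams are all assumptions made here for illustration.

```python
from collections import Counter


def top_k_ngrams(seq: str, n: int, k: int) -> list[str]:
    """Return the k most frequent overlapping n-grams of a DNA sequence.

    Ties are broken alphabetically so the result is deterministic
    (an assumption of this sketch, not taken from the paper).
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:k]]


def match_up_score(seq_a: str, seq_b: str, n: int = 3, k: int = 5) -> int:
    """Count the n-grams that appear in the top-k lists of both sequences.

    Hypothetical scoring: the paper's actual match-up score may differ.
    """
    return len(set(top_k_ngrams(seq_a, n, k)) & set(top_k_ngrams(seq_b, n, k)))


def feature_vector(seq: str, references: list[str], n: int = 3, k: int = 5) -> list[int]:
    """Describe one sequence by its match-up score against each reference."""
    return [match_up_score(seq, ref, n, k) for ref in references]


if __name__ == "__main__":
    species = ["ACGTACGTACGT", "ACGTACGAACGT", "TTTTGGGGTTTT"]
    for s in species:
        print(s, feature_vector(s, species, n=3, k=2))
```

In this sketch each sequence is represented by a vector of scores against every sequence in the data set; how such vectors are then turned into a phylogenetic tree is outside what the abstract describes.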

Keywords

DNA sequence similarity, Top-k n-gram, Alignment-free comparison

Source

Journal of Molecular Graphics & Modelling

WoS Q Value

Q2

Scopus Q Value

Q2

Volume

100

Link

https://doi.org/10.1016/j.jmgm.2020.107693
https://hdl.handle.net/20.500.12418/29691

Collection

WoS Indexed Publications Collection
PubMed Indexed Publications Collection
Scopus Indexed Publications Collection
