A novel alignment-free DNA sequence similarity analysis approach based on top-k n-gram match-up

dc.authorid: Delibas, Emre/0000-0001-7564-5020
dc.authorid: Seker, Abdulkadir/0000-0002-4552-2676
dc.contributor.author: Delibas, Emre
dc.contributor.author: Arslan, Ahmet
dc.contributor.author: Seker, Abdulkadir
dc.contributor.author: Diri, Banu
dc.date.accessioned: 2024-10-26T18:07:49Z
dc.date.available: 2024-10-26T18:07:49Z
dc.date.issued: 2020
dc.department: Sivas Cumhuriyet Üniversitesi
dc.description.abstract: DNA sequence similarity analysis is an essential task in computational biology and bioinformatics. Nearly all research that explores evolutionary relationships, gene function analysis, protein structure prediction and sequence retrieval requires similarity calculations. As an alternative to alignment-based sequence comparison methods, which incur high computational cost, alignment-free methods have emerged that calculate similarity by digitizing the sequence in a different space. In this paper, we propose an alignment-free DNA sequence similarity analysis method based on top-k n-gram matches, on the premise that common repeating DNA subsections indicate high similarity between DNA sequences. In our method, we determine DNA sequence similarities by measuring similarity among feature vectors created according to top-k n-gram match-up scores, without the use of similarity functions. We applied the similarity calculation to three DNA data sets of different lengths. The phylogenetic trees produced by our method coincide almost completely with the results of the MEGA software, which is based on sequence alignment. Our findings show that a certain number of frequently recurring common sequence patterns have the power to characterize DNA sequences. © 2020 Elsevier Inc. All rights reserved.
dc.identifier.doi: 10.1016/j.jmgm.2020.107693
dc.identifier.issn: 1093-3263
dc.identifier.issn: 1873-4243
dc.identifier.pmid: 32805559
dc.identifier.scopus: 2-s2.0-85089343634
dc.identifier.scopusquality: Q2
dc.identifier.uri: https://doi.org/10.1016/j.jmgm.2020.107693
dc.identifier.uri: https://hdl.handle.net/20.500.12418/29691
dc.identifier.volume: 100
dc.identifier.wos: WOS:000569803400005
dc.identifier.wosquality: Q2
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.indekslendigikaynak: PubMed
dc.language.iso: en
dc.publisher: Elsevier Science Inc
dc.relation.ispartof: Journal of Molecular Graphics & Modelling
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.subject: DNA sequence similarity
dc.subject: Top-k n-gram
dc.subject: Alignment-free comparison
dc.title: A novel alignment-free DNA sequence similarity analysis approach based on top-k n-gram match-up
dc.type: Article
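
The abstract describes comparing sequences through their most frequent n-grams. The record does not give the scoring details, so the following is only a minimal sketch of the general idea, not the authors' method. The function names `top_k_ngrams` and `match_up_score`, the defaults `n=3` and `k=5`, and the overlap-based score are all assumptions made for illustration.

```python
from collections import Counter


def top_k_ngrams(seq, n, k):
    """Return the k most frequent overlapping n-grams of seq (hypothetical helper)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:k]]


def match_up_score(seq_a, seq_b, n=3, k=5):
    """Share of top-k n-grams common to both sequences.

    Assumed, illustrative score; the paper's actual match-up scoring
    is not specified in this record.
    """
    top_a = set(top_k_ngrams(seq_a, n, k))
    top_b = set(top_k_ngrams(seq_b, n, k))
    if not top_a or not top_b:
        return 0.0  # a sequence shorter than n has no n-grams
    return len(top_a & top_b) / min(len(top_a), len(top_b))
```

Under these assumptions, two identical sequences score 1.0 and two sequences with no shared frequent n-grams score 0.0. A pairwise matrix of such scores could then feed a tree-building step, which is outside the scope of this sketch.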
