Efficient TF-IDF method for alignment-free DNA sequence similarity analysis

dc.authorid: Delibas, Emre/0000-0001-7564-5020
dc.contributor.author: Delibas, Emre
dc.date.accessioned: 2025-05-04T16:47:09Z
dc.date.available: 2025-05-04T16:47:09Z
dc.date.issued: 2025
dc.department: Sivas Cumhuriyet Üniversitesi
dc.description.abstract: This study proposes an alignment-free approach to DNA sequence similarity analysis. DNA sequences are represented as n-grams, and the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm is adapted to genomic data to weight them. By identifying the most informative n-grams, the approach aims to improve accuracy while reducing computational cost, and thereby to avoid limitations of both traditional alignment-based methods and existing alignment-free methods. The proposed method was tested on three datasets and achieved high agreement with reference phylogenetic trees in the AFProject benchmark system. The results show that TF-IDF-based similarity matrices effectively capture phylogenetic relationships and significantly reduce processing time. The high accuracy obtained indicates that the method offers a scalable, robust, and computationally inexpensive alternative for similarity analysis of large genomic datasets.
dc.identifier.doi: 10.1016/j.jmgm.2025.109011
dc.identifier.issn: 1093-3263
dc.identifier.issn: 1873-4243
dc.identifier.pmid: 40107030
dc.identifier.scopus: 2-s2.0-105000152554
dc.identifier.scopusquality: Q2
dc.identifier.uri: https://doi.org/10.1016/j.jmgm.2025.109011
dc.identifier.uri: https://hdl.handle.net/20.500.12418/35506
dc.identifier.volume: 137
dc.identifier.wos: WOS:001453223200001
dc.identifier.wosquality: Q1
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.indekslendigikaynak: PubMed
dc.institutionauthor: Delibas, Emre
dc.language.iso: en
dc.publisher: Elsevier Science Inc
dc.relation.ispartof: Journal of Molecular Graphics & Modelling
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.snmz: KA_WOS_20250504
dc.subject: DNA sequence analysis
dc.subject: TF-IDF
dc.subject: Alignment-free method
dc.subject: Genomic data
dc.subject: Phylogenetic analysis
dc.title: Efficient TF-IDF method for alignment-free DNA sequence similarity analysis
dc.type: Article
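
The abstract describes the pipeline only at a high level: n-gram representation, TF-IDF weighting, and a similarity matrix. The Python sketch below illustrates that general idea and is not the author's implementation. The function names, the n-gram length `k=3`, the smoothed IDF formula, the use of cosine similarity, and the toy sequences are all assumptions made for illustration. The record does not specify how the most informative n-grams are selected, so that step is omitted.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Return all overlapping n-grams (k-mers) of a DNA sequence."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf_vectors(seqs, k=3):
    """Build one sparse TF-IDF vector (a dict) per sequence.

    Assumed weighting (not taken from the paper):
      tf  = n-gram count / total n-grams in the sequence
      idf = ln((1 + N) / (1 + df)) + 1, a common smoothed variant
    """
    counts = [Counter(kmers(s, k)) for s in seqs]
    n_docs = len(seqs)
    # Document frequency: in how many sequences each n-gram occurs.
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({
            gram: (cnt / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[gram])) + 1.0)
            for gram, cnt in c.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(gram, 0.0) for gram, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(seqs, k=3):
    """Pairwise cosine similarity of the TF-IDF vectors of all sequences."""
    vecs = tfidf_vectors(seqs, k)
    return [[cosine(a, b) for b in vecs] for a in vecs]


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    toy = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC"]
    for row in similarity_matrix(toy, k=3):
        print(["%.3f" % x for x in row])
```

A distance matrix for tree building could be derived from this as `1 - similarity`, although the record does not say which distance the study uses.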

Files