Best practices of feature selection in multi-omics data

Date

2024

Publisher

IGI Global

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

Recent advances in molecular biology techniques such as next-generation sequencing and mass spectrometry produce large volumes of omics data. With such data, the expression levels of thousands of molecular features (genes, proteins, metabolites, etc.) can be quantified and associated with diseases. Because multi-omics data combine different data types and a large number of analyzed variables, machine learning models built on them become more complex. In addition, with so many variables, investigating the molecular variables associated with diseases is very costly. Therefore, informative, disease-related molecular features should be selected before model training and evaluation. This feature selection step is essential for obtaining accurate and generalizable models in minimum time and at minimum cost. Current feature selection methods include recursive feature elimination, information gain, minimum redundancy maximum relevance (mRMR), Boruta, Altmann, and lasso. © 2024, IGI Global. All rights reserved.
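
As an illustration of one of the filter methods named in the abstract, the following is a minimal sketch of information-gain ranking, not the chapter's own implementation. It assumes features have already been discretized (for example into "high" and "low" expression); the feature names, the toy values, and the helper `rank_features` are hypothetical and exist only for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after grouping samples by a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(table, labels, top_k):
    """Hypothetical helper: score every feature and keep the top_k by information gain."""
    scores = {name: information_gain(col, labels) for name, col in table.items()}
    selected = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return selected, scores


# Made-up toy data: six samples, three discretized features from different omics layers.
labels = ["disease", "disease", "disease", "healthy", "healthy", "healthy"]
table = {
    "gene_A": ["high", "high", "high", "low", "low", "low"],         # separates the classes
    "protein_B": ["high", "high", "low", "low", "low", "high"],      # weakly informative
    "metabolite_C": ["high", "high", "low", "high", "high", "low"],  # uninformative
}

selected, scores = rank_features(table, labels, top_k=2)
for name in selected:
    print(f"{name}: {scores[name]:.3f} bits")
```

In this toy setup the feature that perfectly separates the two classes receives the highest score and the uninformative one receives a score of zero. Real omics measurements are continuous, so a binning step or a mutual-information estimator for continuous variables would be needed before applying this kind of ranking.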

Source

Research Anthology on Bioinformatics, Genomics, and Computational Biology
