The Effect of Document Length on Machine Learning Success in Text-Based Data

dc.contributor.author: Polatgil, Mesut
dc.contributor.author: Kekul, Hakan
dc.date.accessioned: 2024-10-26T17:51:06Z
dc.date.available: 2024-10-26T17:51:06Z
dc.date.issued: 2023
dc.department: Sivas Cumhuriyet Üniversitesi
dc.description: 2023 Innovations in Intelligent Systems and Applications Conference, ASYU 2023 -- 11 October 2023 through 13 October 2023 -- Sivas -- 194153
dc.description.abstract: Natural Language Processing (NLP) is an important research area in artificial intelligence. When processing textual data, feature extraction and the creation of word-document vectors are critical steps, because machine learning algorithms build their models from these numerical vectors. Textual data must be preprocessed before the vectors are generated. Common preprocessing methods include removing stopwords, converting text to lowercase, and stripping punctuation marks, and the effects of these methods on the resulting model have been investigated in the literature. However, how the length of the text affects the resulting model has not been investigated. For example, how does a document or text with fewer than 10 or 20 characters affect a machine learning model? This study was carried out to address this question and fill the gap in the literature. The effect of text length on text classification models was tested with different feature extraction methods. © 2023 IEEE.
dc.identifier.doi: 10.1109/ASYU58738.2023.10296594
dc.identifier.isbn: 979-835030659-0
dc.identifier.scopus: 2-s2.0-85178317821
dc.identifier.uri: https://doi.org/10.1109/ASYU58738.2023.10296594
dc.identifier.uri: https://hdl.handle.net/20.500.12418/26017
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.relation.ispartof: 2023 Innovations in Intelligent Systems and Applications Conference, ASYU 2023
dc.relation.publicationcategory: Conference Item - International - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.subject: feature extraction; machine learning; text analysis; text classification; Text length
dc.title: The Effect of Document Length on Machine Learning Success in Text-Based Data
dc.type: Conference Object
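
The abstract names three common preprocessing steps (stopword removal, lowercasing, punctuation cleaning) and asks how very short texts affect a model. The sketch below is one possible way to apply those steps and then drop documents under a character threshold before feature extraction. It is not the authors' pipeline: the stopword list, the function names `preprocess` and `filter_by_length`, the sample texts, and the choice to measure length after cleaning are all assumptions made for illustration. Only the threshold of 10 characters echoes the abstract.

```python
import string

# Hypothetical stopword list for illustration only; not taken from the paper.
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def preprocess(text):
    """Lowercase the text, strip ASCII punctuation, and drop stopwords."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return " ".join(w for w in no_punct.split() if w not in STOPWORDS)


def filter_by_length(docs, labels, min_chars):
    """Keep only (document, label) pairs whose text has at least min_chars characters."""
    kept = [(d, y) for d, y in zip(docs, labels) if len(d) >= min_chars]
    return [d for d, _ in kept], [y for _, y in kept]


# Made-up sample data: two very short texts and one longer one.
docs = ["Great!", "The delivery was late and the box was damaged.", "ok"]
labels = [1, 0, 1]

cleaned = [preprocess(d) for d in docs]
# Assumption: length is measured on the cleaned text, in characters.
kept_docs, kept_labels = filter_by_length(cleaned, labels, min_chars=10)
```

Training the same classifier on the filtered and unfiltered sets, with the same feature extraction method, would give one way to compare the effect of the length threshold.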
