The Effect of Document Length on Machine Learning Success in Text-Based Data

Date

2023

Publisher

Institute of Electrical and Electronics Engineers Inc.

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

Natural Language Processing (NLP) is an important research area within artificial intelligence. When working with textual data, feature extraction and the construction of word-document vectors are critical steps, because machine learning algorithms build their models from these numerical vectors. Textual data must be preprocessed before the vectors can be generated; common methods include removing stopwords, converting text to lowercase, and stripping punctuation marks. The effects of these methods on the resulting model have been investigated in the literature. However, how the length of a text affects the resulting model has not been investigated. For example, how does a document shorter than 10 or 20 characters affect a machine learning model? This study was carried out to answer that question and fill the gap in the literature. The effect of text length on text classification models was tested with different feature extraction methods. © 2023 IEEE.
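
The preprocessing steps and length thresholds named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stopword list, the helper names (`preprocess`, `filter_by_length`, `bag_of_words`), the sample documents, and the choice to measure length in characters on the raw text before preprocessing are all assumptions made for illustration. A plain term-count vector stands in for whichever feature extraction methods the study actually compared.

```python
import string
from collections import Counter

# Illustrative stopword list (an assumption; the abstract does not specify one).
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def preprocess(text):
    """Lowercase, strip punctuation, and drop stopwords."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOPWORDS]


def filter_by_length(docs, min_chars):
    """Keep only documents with at least `min_chars` raw characters."""
    return [d for d in docs if len(d) >= min_chars]


def bag_of_words(docs):
    """Build a sorted vocabulary and one term-count vector per document."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


# Hypothetical sample documents; thresholds 10 and 20 follow the abstract.
docs = ["Great!", "Not good.", "The battery life is excellent and long."]
for min_chars in (0, 10, 20):
    kept = filter_by_length(docs, min_chars)
    vocab, vectors = bag_of_words(kept)
    print(min_chars, len(kept), len(vocab))
```

Raising the minimum length removes the shortest documents and, with them, part of the vocabulary, so both the number of training rows and the width of each vector change. That is one plausible route by which text length could influence a downstream classifier.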

Description

2023 Innovations in Intelligent Systems and Applications Conference, ASYU 2023 -- 11 October 2023 through 13 October 2023 -- Sivas -- 194153

Keywords

feature extraction; machine learning; text analysis; text classification; text length

Source

2023 Innovations in Intelligent Systems and Applications Conference, ASYU 2023
