Department of Computer Engineering Article Collection

Recent Submissions

Showing 1 - 20 of 27
  • Item
    A deep learning-based solution for digitization of invoice images with automatic invoice generation and labelling
    (Springer, 2024) Arslan, Halil; Işık, Yunus Emre
    Nowadays, the level of invoice traffic between companies has reached enormous levels. Invoices are crucial financial documents for companies, and they need to extract this information from these documents to access and control them quickly when necessary. While electronic invoices can be easily transferred to the company’s ERP system with the help of integrators, information from printed invoices must be entered into the ERP system. Information entry is generally performed manually by company employees, so the probability of error is high. The automatic recognition of information in printed invoices will reduce the possibility of error. It will also save time and money by reducing workforce requirements. This study proposes a deep learning-based solution for detecting fields in image invoices that are in high demand among businesses. The system offers an end-to-end solution, which includes a novel method for generating synthetic invoices and automatic labeling. Three invoice templates were used to evaluate the usability of the system and an adaptive fine-tuning-based solution is proposed for newly coming invoice templates. Furthermore, 6 different object detection models were compared to find the most suitable one for our problem. The system was also tested with 1022 real invoice images that were manually labeled to test real-world usage. The results indicated that the fine-tuned model achieved an accuracy that was 8.4% higher than the baseline models. In tests performed on CPU, TOOD and Cascade-RCNN models were the most successful algorithms, while YOLOv5 was the fastest running algorithm. Depending on the priority of the needs, both algorithms can be preferred for real-time usage in the detection of invoice fields. The synthetic invoice generation code is available at https://github.com/SCU-CENG/Invoice-Generation.
  • Item
    Estimating vulnerability metrics with word embedding and multiclass classification methods
    (Springer, 2024) Kekül, Hakan; Ergen, Burhan; Arslan, Halil
    Cyber security has grown in importance since information technologies became an integral part of modern human life. One of the fundamental areas of cyber security is software security. Security vulnerabilities in software are one of the main reasons for the exploitation of information systems. For this reason, they have long been systematically reported, analyzed and classified under a protocol established between states and the stakeholders of the issue. All these processes are carried out manually by humans today, which causes errors and delays. Therefore, the current study aims to help experts and increase the accuracy of the analysis results by speeding up the processes. To achieve this goal, a model is proposed that uses the technical descriptions of security reports written in natural language. The model combines word embedding approaches with multi-class classification algorithms from natural language processing. To evaluate the proposed model, the NVD database, which is open to everyone and accepted as a reference, was chosen. In addition, the proposed model was compared with previous studies in the literature. So that the results of the compared models could be analyzed more accurately, our model was trained with the data sets of the studies it was compared with, and the results are presented clearly. The proposed method showed estimation success in the range of 87.34–96.25% for CVSS 2.0 metrics, and in the range of 84–90% for CVSS 3.1. This study, in which different word embedding and classification algorithms are used together, is one of the limited studies on the latest version of the official scoring system used for classification of software security vulnerabilities. Moreover, it is the most comprehensive and original study in its field due to the size of the dataset it uses and the number of databases evaluated.
  • Item
    Machine Learning and Text Mining based Real-Time Semi-Autonomous Staff Assignment System
    (2024) Arslan, Halil; Işık, Yunus Emre; Görmez, Yasin; Temiz, Mustafa
    The growing demand for information systems has significantly increased the workload of consulting and software development firms, requiring them to manage multiple projects simultaneously. Usually, these firms rely on a shared pool of staff to carry out multiple projects that require different skills and expertise. However, since the number of employees is limited, the assignment of staff to projects should be carefully decided to increase the efficiency of job-sharing. Therefore, assigning tasks to the most appropriate personnel is one of the challenges of multiproject management. Manual assignment of staff to a project by team leaders is a very demanding process. For this reason, researchers are working on automatic assignment, but most of these studies use historical data. It is of great importance for companies that personnel assignment systems work with real-time data, and a model designed with historical data risks performing poorly on real-time data. In this study, unlike the literature, a machine learning-based decision support system that works with real-time data is proposed. The proposed system analyses the description of newly requested tasks using text-mining and machine-learning approaches and then predicts the optimal available staff that meets the needs of the project task. Moreover, personnel qualifications are iteratively updated after each completed task, ensuring up-to-date information on staff capabilities. In addition, because our system was developed as a microservice architecture, it can be easily integrated into companies’ existing enterprise resource planning (ERP) or portal systems. In a real-world implementation at Detaysoft, the system achieved up to 80% accuracy in matching tasks with appropriate personnel.
  • Item
    A novel target item-based similarity function in privacy-preserving collaborative filtering
    (Springer, 27.05.2024) Bilge, Alper
    Memory-based collaborative filtering schemes are among the most effective recommendation technologies in terms of prediction quality, despite commonly facing issues related to accuracy, scalability, and privacy. A prominent approach suggests an intuitively reasonable modification to the similarity function, which has been proven to provide more accurate recommendations than those generated by state-of-the-art memory-based collaborative filtering methods. However, this scheme exacerbates the scalability problem due to additional computational costs and fails to protect individual privacy. In this study, we recommend using a preprocessing method to eliminate relatively dissimilar items from the prediction estimation process, thereby enhancing the scalability of the proposed approach. We explore how to provide recommendations based on the previously proposed similarity function while preserving privacy and propose privacy-preserving schemes to accomplish this task. Additionally, we apply our preprocessing approach to our proposed privacy-preserving schemes to improve both scalability and accuracy. After analyzing our schemes with respect to privacy and additional costs, we conduct experiments with real data to examine the impact of our schemes on scalability and accuracy. The empirical outcomes indicate that our preprocessing scheme significantly alleviates scalability issues in both conventional and privacy-preserving environments and enhances accuracy within privacy-preserving frameworks.
  • Item
    Empowered chaotic local search-based differential evolution algorithm with entropy-based hybrid objective function for brain tumor segmentation
    (Elsevier, November 2024) Aydemir, Salih Berkan
    In neuro-oncology, the precise segmentation of brain tumors from Magnetic Resonance Images is crucial for diagnosis, treatment planning, and monitoring disease progression. Accurate segmentation helps determine the tumor’s size, location, and growth potential, which is essential for formulating effective treatment strategies. In response to this challenge, we developed a novel approach using Chaotic Local Search-Enhanced Differential Evolution (CJADE). CJADE, particularly its variant CJADE-M, which employs chaotic maps selected through a probability-based approach, has proven effective in optimizing brain tumor segmentation. Our study shows that CJADE-M outperforms traditional metaheuristic algorithms on various evaluation metrics. We further enhanced CJADE-M with an entropy-based hybrid objective function, which improved accuracy and reduced computational time in tumor segmentation compared to conventional methods like Minimum Cross-Entropy and Kapur. This makes our method suitable for real-time medical imaging analysis. Our findings indicate that CJADE-M, equipped with the hybrid objective function, achieves superior segmentation performance for both benign lobulated and malignant irregular tumors across metrics such as PSNR, FSIM, QILV, and HPSI. By providing a more accurate and efficient tool, our approach can significantly enhance the outcomes of brain tumor diagnosis and treatment, improving patient care in neuro-oncology.
  • Item
    Effects of Neighborhood-based Collaborative Filtering Parameters on Their Blockbuster Bias Performances
    (August 2022) Yalcin, Emre
    Collaborative filtering algorithms are efficient tools for providing recommendations with reasonable accuracy to individuals. However, previous research has found that these algorithms are undesirably biased towards blockbuster items, i.e., both popular and highly-liked items, resulting in recommendation lists dominated by such blockbuster items. As one of the most prominent types of collaborative filtering approaches, neighborhood-based algorithms produce recommendations from neighborhoods constructed using similarities between users or items. Therefore, the utilized similarity function and the size of the neighborhoods are critical parameters for their recommendation performance. This study considers three well-known similarity functions, i.e., Pearson, Cosine, and Mean Squared Difference, and varying neighborhood sizes and observes how they affect the algorithms’ blockbuster bias and accuracy performances. The extensive experiments conducted on two benchmark data collections conclude that as the size of neighborhoods decreases, these algorithms generally become more vulnerable to blockbuster bias while their accuracy increases. The experiments also show that the Cosine metric is superior to the other similarity functions in producing recommendations in which blockbuster bias is better mitigated; however, it yields less accurate recommendations, as these are usually conflicting goals.
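As a rough illustration of the three similarity functions named in the abstract above, the following sketch computes Pearson, cosine, and mean-squared-difference similarity between two users over their co-rated items. The function names, the dict-based rating representation, and the 1/(1 + MSD) conversion from a difference to a similarity are assumptions made for illustration; the abstract does not describe the paper's actual implementation.

```python
import math


def co_rated(u, v):
    """Return the paired ratings of the items rated by both users."""
    common = sorted(set(u) & set(v))
    return [u[i] for i in common], [v[i] for i in common]


def pearson(u, v):
    """Pearson correlation over co-rated items (0.0 if undefined)."""
    a, b = co_rated(u, v)
    if len(a) < 2:
        return 0.0
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)) * math.sqrt(
        sum((y - mean_b) ** 2 for y in b)
    )
    return num / den if den else 0.0


def cosine(u, v):
    """Cosine of the angle between the co-rated rating vectors."""
    a, b = co_rated(u, v)
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def msd_similarity(u, v):
    """Mean squared difference mapped to a similarity (assumed form: 1 / (1 + MSD))."""
    a, b = co_rated(u, v)
    if not a:
        return 0.0
    msd = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return 1.0 / (1.0 + msd)


# Hypothetical toy profiles: item id -> rating
alice = {"i1": 5, "i2": 3, "i3": 4}
bob = {"i1": 4, "i2": 2, "i3": 3, "i4": 5}
print(pearson(alice, bob), cosine(alice, bob), msd_similarity(alice, bob))
```

In a neighborhood-based recommender, one of these scores would rank candidate neighbors, and the neighborhood size would determine how many of the top-ranked neighbors contribute to a prediction.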
  • Item
    Determination of Photonuclear Reaction Cross-Sections on Stable P-shell Nuclei by Using Deep Neural Networks
    (Springer, 13 May 2023) Akkoyun, Serkan; Kaya, Hüseyin; Şeker, Abdulkadir; Yeşilyurt, Saliha
    Photonuclear reactions are widely used in investigations of nuclear structure. Thus, the determination of the cross-sections is essential for experimental studies. In the present work, (γ, n) photonuclear reaction cross-sections for stable p-shell nuclei have been estimated by using the neural network method. The main purpose of this study is to find neural network structures that give the best estimations for the cross-sections, and to compare them with the available data. These comparisons indicate the deep neural network structures that are convenient for this task. Through this procedure, we have found that, in shallow NN models, the tanh activation function performs better than ReLU. However, as our models become deeper, the difference between tanh and ReLU decreases considerably. In this context, we think that the crucial hyperparameters are the number of hidden layers and the number of neurons in each layer.
  • Item
    Zero-shot learning via self-organizing maps
    (Springer, 25.01.2023) İsmailoğlu, Fırat
    Collecting labeled images from all possible classes related to the task at hand is highly impractical and may even be impossible. At this point, Zero-Shot Learning (ZSL) can enable the classification of new test classes for which there are no labeled images for training. The vast majority of existing ZSL methods aim to learn a projection from the feature space into the semantic space, where all classes are represented by a list of semantic attributes. To this end, they usually try to solve a complex optimization problem. Nevertheless, the semantic features (attributes) may not be suitable to represent the images because they are derived based on human knowledge and are, therefore, abstract. Alternatively, in this study, we introduce a novel ZSL method called SOMZSL, which has its roots in Self-Organizing Maps (SOM), a famous data visualization method. In particular, SOMZSL builds two SOMs of the same size and shape, one for the feature space and one for the attribute space, and then establishes a correspondence between them. Instead of considering a direct projection between the feature space and the attribute space, which are inherently different, SOMZSL connects them through comparable intermediate layers, i.e., SOMs. In terms of performance, SOMZSL can classify novel test classes as well as or even better than existing ZSL methods without dealing with a complex optimization problem, thanks to the heuristic nature of SOM on which it is based. Finally, SOMZSL uses unlabeled test images in the construction of SOMs and can thus mitigate the domain shift problem inherent in ZSL.
  • Item
    LVQ Treatment for Zero-Shot Learning
    (Tubitak Academic Journals, 23.01.2023) İsmailoğlu, Fırat
    In image classification, there are no labeled training instances for some classes, which are therefore called unseen classes or test classes. To classify these classes, zero-shot learning (ZSL) was developed, which typically attempts to learn a mapping from the (visual) feature space to the semantic space in which the classes are represented by a list of semantically meaningful attributes. However, the fact that this mapping is learned without using instances of the test classes affects the performance of ZSL, which is known as the domain shift problem. In this study, we propose to apply the learning vector quantization (LVQ) algorithm in the semantic space once the mapping is determined. First and foremost, this allows us to refine the prototypes of the test classes with respect to the learned mapping, which reduces the effects of the domain shift problem. Secondly, the LVQ algorithm increases the margin of the 1-NN classifier used in ZSL, resulting in better classification. Moreover, for this work, we considered a range of LVQ algorithms, from initial to advanced variants, and applied them to a number of state-of-the-art ZSL methods to obtain their LVQ extensions. The experiments based on five ZSL benchmark datasets showed that the LVQ-empowered extensions of the ZSL methods are superior to their original counterparts in almost all settings.
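To make the prototype refinement mentioned in the abstract above more concrete, here is a minimal sketch of the basic LVQ1 update rule: the prototype nearest to a sample moves toward the sample when their labels agree and away from it otherwise. This is the generic textbook rule, not the paper's method; the function names, the learning rate, and the use of labeled samples are assumptions (the abstract does not say how labels for test-class instances are obtained).

```python
def squared_distance(p, x):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((pi - xi) ** 2 for pi, xi in zip(p, x))


def nearest(prototypes, x):
    """Index of the prototype closest to x; prototypes is a list of (vector, label)."""
    return min(range(len(prototypes)), key=lambda k: squared_distance(prototypes[k][0], x))


def lvq1_step(prototypes, x, label, lr=0.1):
    """Apply one LVQ1 update and return a new prototype list (input is not mutated)."""
    k = nearest(prototypes, x)
    vec, cls = prototypes[k]
    # Attract the winner if it has the right class, repel it otherwise.
    sign = 1.0 if cls == label else -1.0
    new_vec = [p + sign * lr * (xi - p) for p, xi in zip(vec, x)]
    updated = list(prototypes)
    updated[k] = (new_vec, cls)
    return updated


def classify(prototypes, x):
    """1-NN classification against the class prototypes."""
    return prototypes[nearest(prototypes, x)][1]


# Hypothetical two-class example in a 2-D semantic space
protos = [([0.0, 0.0], "a"), ([1.0, 1.0], "b")]
protos = lvq1_step(protos, [0.2, 0.0], "a")
print(protos[0][0], classify(protos, [0.1, 0.1]))
```

Repeating such steps over many samples shifts each class prototype toward the region where its instances actually fall, which is the intuition behind using LVQ to counter domain shift.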
  • Item
    A Novel Contour Tracing Algorithm for Object Shape Reconstruction Using Parametric Curves
    (Tech Science Press, 2023) Gürkahraman, Kali
    Parametric curves such as Bézier and B-splines, originally developed for the design of automobile bodies, are now also used in image processing and computer vision. For example, reconstructing an object shape in an image, including different translations, scales, and orientations, can be performed using these parametric curves. For this, Bézier and B-spline curves can be generated using a point set that belongs to the outer boundary of the object. The resulting object shape can be used in computer vision fields, such as searching and segmentation methods and training machine learning algorithms. The prerequisite for reconstructing the shape with parametric curves is to obtain sequentially the points in the point set. In this study, a novel algorithm has been developed that sequentially obtains the pixel locations constituting the outer boundary of the object. The proposed algorithm, unlike the methods in the literature, is implemented using a filter containing weights and an outer circle surrounding the object. In a binary format image, the starting point of the tracing is determined using the outer circle, and the next tracing movement and the pixel to be labeled as the boundary point is found by the filter weights. Then, control points that define the curve shape are selected by reducing the number of sequential points. Thus, the Bézier and B-spline curve equations describing the shape are obtained using these points. In addition, different translations, scales, and rotations of the object shape are easily provided by changing the positions of the control points. It has also been shown that the missing part of the object can be completed thanks to the parametric curves.
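The role of control points described in the abstract above can be sketched with De Casteljau's algorithm, a standard way to evaluate a Bézier curve. The sketch assumes 2-D control points and uses hypothetical helper names; it does not reproduce the paper's contour-tracing filter or its control-point selection. It also shows that translating the control points translates the curve, which is why repositioning control points is enough to move the reconstructed shape.

```python
def bezier_point(control_points, t):
    """Evaluate a 2-D Bezier curve at parameter t in [0, 1] via De Casteljau."""
    pts = [(float(x), float(y)) for x, y in control_points]
    # Repeatedly interpolate between neighbors until one point remains.
    while len(pts) > 1:
        pts = [
            ((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
            for (x0, y0), (x1, y1) in zip(pts, pts[1:])
        ]
    return pts[0]


def sample_curve(control_points, n=50):
    """Return n evenly spaced curve samples (n must be at least 2)."""
    return [bezier_point(control_points, i / (n - 1)) for i in range(n)]


def translate(points, dx, dy):
    """Shift every point by (dx, dy)."""
    return [(x + dx, y + dy) for x, y in points]


# Hypothetical cubic control polygon
ctrl = [(0, 0), (1, 2), (3, 2), (4, 0)]
print(bezier_point(ctrl, 0.5))
print(bezier_point(translate(ctrl, 10, -5), 0.5))
```

Scaling and rotation behave the same way, because a Bézier curve is invariant under affine transformations of its control points.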
  • Item
    A survey of smart home energy conservation techniques
    (Elsevier, March 2023) Fakhar, Muhemmed Zaman; Yalcin, Emre; Bilge, Alper
    Smart homes are equipped with easy-to-interact interfaces, providing a more comfortable living environment and less energy consumption. There are currently satisfactory approaches proposed to deliver adequate comfort and ease to smart home inhabitants through infrared sensors, motion sensors, and other similar technologies. However, the goal of reducing energy consumption is always a significant concern for smart home stakeholders. A detailed discussion about energy management techniques might open new leads for advanced research and even introduce more ways to improve existing methods, since a summary of effective energy conservation techniques is helpful for getting a quick overview of the state-of-the-art techniques. This review study aims to provide an overview of previously proposed techniques for energy conservation and energy-saving recommendations. We identify various critical features in energy conservation techniques, i.e., user energy profiling, appliance energy profiling, and off-peak load scheduling, to perform a comparative analysis among different techniques. Then, we explain various energy conservation techniques, describe common and rare evaluation metrics, identify several techniques for realizing synthetic smart home energy consumption datasets, and provide a statistical analysis of the existing literature. The survey finally points out possible research directions which might lead to new inquiries in energy conservation research.
  • Item
    Popularity bias in personality perspective: An analysis of how personality traits expose individuals to the unfair recommendation
    (Wiley, February 2023) Yalcin, Emre; Bilge, Alper
    Recommender systems are subject to well-known popularity bias issues, that is, they expose frequently rated items more in recommendation lists than less-rated ones. Such a problem could also have varying effects on users with different gender, age, or rating behavior, which significantly diminishes the users' overall satisfaction with recommendations. In this paper, we approach the problem from the view of user personalities for the first time and discover how users are inclined toward popular items based on their personality traits. More importantly, we analyze the potential unfairness concerns for users with different personalities, which the popularity bias of the recommenders might cause. To this end, we split users into groups of high, moderate, and low clusters in terms of each personality trait in the big-five factor model and investigate how the popularity bias impacts such groups differently by considering several criteria. The experiments conducted with 10 well-known algorithms of different kinds have concluded that less-extroverted people and users avoiding new experiences are exposed to more unfair recommendations regarding popularity, despite being the most significant contributors to the system. However, discrepancies in other qualities of the recommendations for these user characteristics, such as accuracy, diversity, and novelty, vary depending on the utilized algorithm.
  • Item
    A novel classification-based shilling attack detection approach for multi-criteria recommender systems
    (Wiley, May 2023) Turkoglu Kaya, Tugba; Yalcin, Emre; Kaleli, Cihan
    Recommender systems are emerging techniques guiding individuals with provided referrals by considering their past rating behaviors. By collecting multi-criteria preferences concentrating on distinguishing perspectives of the items, multi-criteria recommender systems, a new extension of traditional recommenders, reveal how much a user likes an item and why the user likes it; thus, they can improve predictive accuracy. However, these systems might be more vulnerable to malicious attacks than traditional ones, as they expose multiple dimensions of user opinions on items. Attackers might try to inject fake profiles into these systems to skew the recommendation results in favor of some particular items or to bring the system into discredit. Although several methods exist to defend systems against such attacks for traditional recommenders, achieving robust systems by capturing shill profiles remains elusive for multi-criteria rating-based ones. Therefore, in this study, we first consider a prominent and novel attack type, that is, the power-item attack model, and introduce its four distinct variants adapted for multi-criteria data collections. Then, we propose a classification method detecting shill profiles based on various generic and model-based user attributes, most of which are new features usually related to item popularity and distribution of rating values. The experiments conducted on three benchmark datasets conclude that the proposed method successfully distinguishes attack profiles from genuine users even with a small selected size and attack size. The empirical outcomes also demonstrate that item popularity and user characteristics based on their rating profiles are highly beneficial features in capturing shilling attack profiles.
  • Item
    Robustness of privacy-preserving collaborative recommenders against popularity bias problem
    (PeerJ, July 2023) Gulsoy, Mert; Yalcin, Emre; Bilge, Alper
    Recommender systems have become increasingly important in today’s digital age, but they are not without their challenges. One of the most significant challenges is that users are not always willing to share their preferences due to privacy concerns, yet they still require decent recommendations. Privacy-preserving collaborative recommenders remedy such concerns by letting users set their privacy preferences before submitting to the recommendation provider. Another recently discussed challenge is the problem of popularity bias, where the system tends to recommend popular items more often than less popular ones, limiting the diversity of recommendations and preventing users from discovering new and interesting items. In this article, we comprehensively analyze the randomized perturbation-based data disguising procedure of privacy-preserving collaborative recommender algorithms against the popularity bias problem. For this purpose, we construct user personas of varying privacy protection levels and scrutinize the performance of ten recommendation algorithms on these user personas regarding the accuracy and beyond-accuracy perspectives. We also investigate how well-known popularity-debiasing strategies combat the issue in privacy-preserving environments. In experiments, we employ three well-known real-world datasets. The key findings of our analysis reveal that privacy-sensitive users receive unbiased and fairer recommendations that are qualified in diversity, novelty, and catalogue coverage perspectives in exchange for tolerable sacrifice from accuracy. Also, prominent popularity-debiasing strategies fall considerably short as provided privacy level improves.
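The randomized perturbation-based data disguising mentioned in the abstract above can be illustrated with a small sketch: each rating is masked with zero-mean random noise before it is shared, so individual values are hidden while aggregates such as the mean stay approximately correct. The uniform noise distribution, the `alpha` range parameter, and the function names are assumptions for illustration; the abstract does not specify the paper's exact procedure or how privacy levels map to noise parameters.

```python
import random


def disguise(ratings, alpha, rng):
    """Add independent uniform noise from [-alpha, alpha] to each rating."""
    return [r + rng.uniform(-alpha, alpha) for r in ratings]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical data: 10,000 ratings on a 1-5 scale, with a fixed seed
rng = random.Random(42)
true_ratings = [float(rng.randint(1, 5)) for _ in range(10_000)]
masked = disguise(true_ratings, alpha=1.0, rng=rng)

# Individual values change, but the aggregate stays close to the original.
print(mean(true_ratings), mean(masked))
```

A larger `alpha` hides individual ratings more strongly at the cost of noisier aggregates, which mirrors the trade-off between privacy level and accuracy that the abstract discusses.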
  • Item
    Aggregating user preferences in group recommender systems: A crowdsourcing approach
    (Elsevier, 2022) İsmailoğlu, Fırat
    We show that group recommendations are similar to crowdsourcing, where the responses of different crowd workers are aggregated in the absence of ground truth. With this in mind, we mimic the use of the EM algorithm as in crowdsourcing to aggregate the preferences of group members to estimate group ratings and the expertise levels of the group members. Moreover, for the first time in the literature, we cast the problem of estimating group rating as an ordinal classification problem relying on the natural ordering between the ratings, which allows us to define the expertise levels of the members in terms of sensitivity and specificity. In fact, we impose priors on the sensitivity and the specificity scores corresponding to the members, taking a Bayesian approach. We validate the effectiveness of the proposed aggregation method using the CAMRa2011 dataset, which consists of small and established groups, and the MovieLens dataset, which consists of large and random groups.
  • Item
    End to End Invoice Processing Application Based on Key Fields Extraction
    (IEEE, 2022) Arslan, Halil
    In this paper, an automatic invoice processing system, which is in great demand among private and public companies, is proposed. The proposed system supports all invoice file types that can be submitted by companies. Companies can easily submit invoices to the system via the web interface or email, and all invoices submitted to the system are queued and processed sequentially. If the invoice is a text file, the invoice information is extracted from the text by using template matching. If the invoice is an image, the text and table areas are detected and extracted. For table detection, we used both an image-processing-based method and a YOLOv5-based deep learning method. Cell extraction was then performed on the extracted table images. As a result of these processes, all text and table cells were obtained as images and these images were converted into machine-readable text using the open-source software Tesseract OCR. Tesseract already provides trained models for English and Turkish. However, these models do not provide successful results for invoices submitted by companies in Turkish. Therefore, a new fine-tuned model trained with invoices in Turkish was used for OCR. The experimental results showed that the trained Turkish model was more accurate than the Turkish and English models provided by Tesseract. In addition, the YOLOv5-based table detection model was more accurate than the image-processing-based table detection method.
  • Item
    Exploring potential biases towards blockbuster items in ranking-based recommendations
    (Springer, 2022) Yalcin, Emre
    Popularity bias is defined as the intrinsic tendency of recommendation algorithms to feature popular items more than unpopular ones in the ranked lists they produce. When investigating the adverse effects of popularity bias, the literature has usually focused on the most frequently rated items only. However, an item's popularity does not always indicate that it is highly-liked by individuals; in fact, the degree of liking may even introduce biases that are more extreme than the famous popularity bias in terms of beyond-accuracy evaluations. Therefore, in the present study, we attempt to consider items that are both popular and highly-liked, which we refer to as blockbuster items, and to investigate whether the recommendation algorithms impose a considerable bias in favor of the blockbuster items in their ranking-based recommendations. To this end, we first present a practical formulation that measures the degree of the blockbuster level of the items by combining their liking-degree and popularity effectively. Then, based on this formulation, we perform a comprehensive set of experiments with ten different algorithms on five datasets with different characteristics to explore the potential biases towards blockbuster items in recommendations. The experimental outcomes demonstrate that most recommenders propagate an undesirable bias in their recommendations towards the blockbuster items, and such a bias is, in fact, not caused by the item popularity. Moreover, the observed biases to blockbuster items are more harmful and persistent than those to popular ones in terms of beyond-accuracy aspects such as diversity, catalog coverage, and novelty. The obtained results also suggest that conventional popularity-debiasing strategies are not very effective in treating the adverse effects of the observed blockbuster bias in recommendations.
  • Item
    Evaluating unfairness of popularity bias in recommender systems: A comprehensive user-centric analysis
    (Elsevier, 2022) Yalcin, Emre; Bilge, Alper
    The popularity bias problem is one of the most prominent challenges of recommender systems, i.e., while a few heavily rated items receive much attention in presented recommendation lists, less popular ones are underrepresented even if they would be of close interest to the user. This structural tendency of recommendation algorithms causes several unfairness issues for most of the items in the catalog, as they have trouble finding a place in the top-N lists. In this study, we evaluate the popularity bias problem from users’ viewpoint and discuss how to alleviate it by considering users as one of the major stakeholders. We derive five critical discriminative features based on the following five essential attributes related to users’ rating behavior: (i) the interaction level of users with the system, (ii) the overall liking degree of users, (iii) the degree of anomalous rating behavior of users, (iv) the consistency of users, and (v) the informative level of the user profiles, and analyze their relationships to the original inclinations of users toward item popularity. More importantly, we investigate their associations with possible unfairness concerns for users, which the popularity bias in recommendations might induce. The analysis using ten well-known recommendation algorithms from different families on four real-world preference collections from different domains reveals that the popularity propensities of individuals are significantly correlated with almost all of the investigated features with varying trends, and algorithms are strongly biased towards popular items. In particular, highly interacting, selective, and hard-to-predict users face highly unfair, relatively inaccurate, and primarily unqualified recommendations in terms of beyond-accuracy aspects, although they are major stakeholders of the system. We also analyze how state-of-the-art popularity debiasing strategies act to remedy these problems. Although they are more effective for mistreated groups in alleviating unfairness and improving beyond-accuracy quality, they mostly fail to preserve ranking accuracy.
  • Item
    IESR: Instant Energy Scheduling Recommendations for Cost Saving in Smart Homes
    (IEEE, 10.05.2022) Fakhar, Muhammad Zaman; Yalçın, Emre; Bilge, Alper
    The exponential increase in energy demands continuously causes high-price energy tariffs for domestic and commercial consumers. To overcome this problem, researchers strive to discover effective ways to reduce peak-hour energy demand through off-peak scheduling, yielding low-price energy tariffs. Efficient off-peak scheduling requires precise appliance profiling to identify a scheduling recommendation for peak load management. We propose a novel off-peak scheduling technique that provides instant energy scheduling recommendations by monitoring appliances in real-time following user-devised criteria. Once an appliance operates during a peak hour and fulfills the user criteria, a real-time scheduling recommendation is presented for users' approval. The proposed technique utilizes appliance energy consumption data, user-devised criteria, and energy price signals to identify the recommendation points. The energy cost-saving performance of the proposed technique is evaluated using two publicly available real-world energy consumption datasets with four price signals. Simulation results show a significant cost-saving performance of up to 84% for the experimented datasets. Moreover, we formulate a novel evaluation metric to compare the performance of various off-peak scheduling techniques on similar criteria. Comparative analysis indicates that the proposed technique outperforms the existing methods.
  • Item
    A multiclass hybrid approach to estimating software vulnerability vectors and severity score
    (Elsevier, 2021) Kekül, Hakan; Ergen, Burhan; Arslan, Halil
    Classifying detected software vulnerabilities is an important process. However, the metric values of security vectors are manually determined by humans, which takes time and may introduce errors stemming from human nature. These metrics are important because of their role in the calculation of vulnerability severity. It is necessary to use machine learning algorithms and data mining techniques to improve the quality and speed of vulnerability analysis and discovery processes. However, studies in this area are still limited. In this study, vulnerability vectors were estimated using the natural language processing techniques bag of words, term frequency–inverse document frequency, and n-gram for feature extraction together with various multiclass classification algorithms, namely Naïve Bayes, decision tree, k-nearest neighbors, multilayer perceptron, and random forest. Our experiments using a large public dataset facilitate assessment and provide a standard-compliant prediction model for classifying software vulnerability vectors. The results show that the joint use of different techniques and classification algorithms is a promising solution to a multi-probability and difficult-to-predict problem. In addition, our study fills an important gap in its field in terms of the size of the dataset used and because it covers a vulnerability scoring system version that has not yet been extensively studied.