Imbalanced Data Oversampling Technique Based on Convex Combination Method

Elnahas, Mohammed and Hussein, Mahmoud and Keshk, Arabi (2021) Imbalanced Data Oversampling Technique Based on Convex Combination Method. IJCI. International Journal of Computers and Information, 9 (1). pp. 15-28. ISSN 2735-3257

Abstract

Classification is the task of predicting a label for a given set of inputs. This task becomes difficult when the dataset is imbalanced. Most existing machine learning classifiers perform poorly on imbalanced data because it biases them strongly towards the majority class, which can reduce prediction accuracy for the minority class. Data oversampling is one of the most important techniques for balancing data, particularly when the dataset is small, imbalanced, or both. Synthetic Minority Over-sampling Technique (SMOTE), Borderline-SMOTE, Adaptive Synthetic (ADASYN), and Weighted SMOTE (W-SMOTE) are the most popular techniques used for data oversampling. However, the main drawback of SMOTE and ADASYN is that they increase the overlap between classes, so the generated samples are not representative of the original data distribution. Borderline-SMOTE may neglect some important samples when producing new samples. To overcome these problems in existing oversampling techniques, we propose in this paper a new data oversampling method that uses convex combinations to generate new samples of the minority class. The convex combination allows us to produce new samples that follow the original data distribution. We evaluated our approach on four standard imbalanced datasets: Yeast, Glass Identification, Paw, and Wisconsin Prognosis Breast Cancer (WPBC). The experimental results show that our proposed method gives better performance in terms of accuracy, precision, recall, F1-measure, and area under the curve (AUC).
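
The abstract does not spell out how the convex combinations are formed, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that each synthetic sample is a weighted sum of k randomly chosen minority samples, with non-negative random weights normalised to sum to 1. The function name `convex_combination_oversample`, the parameter `k`, and the toy data are hypothetical and chosen for illustration.

```python
import random


def convex_combination_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each a convex combination of k minority samples.

    Assumption: vertices and weights are drawn uniformly at random; the paper
    may select and weight samples differently.
    """
    rng = random.Random(seed)
    k = min(k, len(minority))
    dim = len(minority[0])
    synthetic = []
    for _ in range(n_new):
        # Pick k distinct minority samples to combine.
        points = rng.sample(minority, k)
        # Draw non-negative weights and normalise them so they sum to 1.
        raw = [rng.random() + 1e-12 for _ in range(k)]
        total = sum(raw)
        weights = [r / total for r in raw]
        # Weighted sum of the chosen samples, one coordinate at a time.
        new_point = [
            sum(w * p[d] for w, p in zip(weights, points)) for d in range(dim)
        ]
        synthetic.append(new_point)
    return synthetic


# Toy minority class with two features (illustrative values only).
minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [2.5, 2.5]]
synthetic = convex_combination_oversample(minority, n_new=5)
```

Because the weights are non-negative and sum to 1, every generated point lies inside the convex hull of the minority samples it was built from, which is the property that keeps new samples within the region already occupied by the minority class. With k = 2 the sketch reduces to interpolation along the segment between two minority samples.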

Item Type: Article
Subjects: Archive Science > Computer Science
Depositing User: Managing Editor
Date Deposited: 22 Jun 2024 09:28
Last Modified: 22 Jun 2024 09:28
URI: http://editor.pacificarchive.com/id/eprint/1396