Abstract
Coronary Heart Disease (CHD) is one of the leading causes of death today. Predicting the disease at an early stage is crucial for health care providers, since it helps them protect patients, save lives, and avoid costly hospitalization. In recent years, machine learning has been used successfully to predict serious disease events from routine medical records. This paper presents a comparative analysis of machine learning techniques for predicting the occurrence of CHD events from clinical data. Four classifiers, namely Logistic Regression, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Multi-Layer Perceptron (MLP) neural networks, were applied to the South African Heart Disease dataset retrieved from the KEEL repository, which contains 462 instances described by 9 features plus the class label. The dataset consists of 302 records of healthy patients and 160 records of patients who suffer from CHD. To handle this class imbalance, the K-means algorithm was used together with the Synthetic Minority Oversampling Technique (SMOTE). The empirical results of applying the four classifiers to the oversampled dataset are promising. Across the evaluation metrics reported, SVM achieved the highest overall prediction performance.
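To make the oversampling step concrete, the following is a minimal sketch of the core SMOTE idea: each synthetic minority record is a random interpolation between a minority record and one of its nearest minority neighbours. It is an illustration under stated assumptions, not the procedure used in this study. The function name `smote` and the parameters `k` and `seed` are hypothetical, the feature values are random numeric placeholders rather than the KEEL data, and the K-means step is omitted because the abstract does not specify how it is combined with SMOTE. Only the class counts (302 and 160) and the feature count (9) come from the text.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (basic SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + u * (o - b) for b, o in zip(base, other)])
    return synthetic


# Placeholder data (assumption): random values with the class counts and
# feature count stated in the abstract, NOT the real patient records.
data_rng = random.Random(42)
healthy = [[data_rng.gauss(0, 1) for _ in range(9)] for _ in range(302)]
chd = [[data_rng.gauss(1, 1) for _ in range(9)] for _ in range(160)]

# Number of synthetic CHD records needed to match the healthy class
n_needed = len(healthy) - len(chd)
synthetic_chd = smote(chd, n_needed)
balanced_chd = chd + synthetic_chd
```

With the counts given above, balancing the two classes requires 302 - 160 = 142 synthetic CHD records. Because every synthetic point lies on a line segment between two existing minority records, the oversampled class stays inside the region already covered by the minority data.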