Abstract
Some types of human papillomavirus (HPV) can cause changes on a woman's cervix that may lead
to cervical cancer over time, while other types can cause genital or skin warts.
Cervical cancer is a leading cause of morbidity and mortality among women in low- and
middle-income countries. Predicting cervical cancer remains an open challenge because many
risk factors affect the cervix. With this in mind, the cervical cancer risk factors dataset
from the Kaggle repository is used to predict
the cervical cancer risk classes. First, the dataset is preprocessed by handling missing
data and applying feature scaling. Second, exploratory data analysis is carried out, and the
distribution of the target feature (cervical cancer risk) is visualised. Third, several classifiers
are fitted to the unprocessed dataset, and their performance is measured before and after
feature scaling. Fourth, oversampling methods are applied to the preprocessed dataset.
Fifth, the datasets produced by the different oversampling methods are fed to all the
classifiers, and performance is again compared before and after feature scaling. Sixth,
performance is evaluated using precision, recall, F-score, accuracy, and running time. The
code is written in Python and executed in the Spyder IDE, launched from Anaconda Navigator.
The experimental results show that the Random Forest classifier maintains 96% accuracy on
the unprocessed dataset both before and after scaling. The same classifier reaches 98%
accuracy with every oversampling technique.
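The scaling, oversampling, and evaluation steps summarised above can be illustrated with a minimal, standard-library-only sketch. It is not the study's implementation: it assumes min-max scaling (the abstract does not say which scaler was used) and simple random oversampling (duplicating minority-class rows until the classes are balanced), which is only one of several possible oversampling methods. The function names `min_max_scale`, `random_oversample`, and `binary_metrics` and the toy data are hypothetical.

```python
import random
from collections import Counter


def min_max_scale(rows):
    """Assumed scaler: rescale each column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]


def random_oversample(rows, labels, seed=0):
    """Random oversampling (one assumed method): duplicate randomly chosen
    minority-class rows until every class matches the majority-class count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        members = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(members))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-score for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


# Hypothetical toy data: six low-risk (0) rows and two high-risk (1) rows.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [9.0], [9.5]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_scaled = min_max_scale(X)
X_bal, y_bal = random_oversample(X_scaled, y)
print(Counter(y_bal))  # Counter({0: 6, 1: 6})

# Hypothetical predictions, only to exercise the metric calculations.
y_pred = [0, 0, 0, 0, 0, 1, 1, 0]
print(binary_metrics(y, y_pred))
# {'accuracy': 0.75, 'precision': 0.5, 'recall': 0.5, 'f_score': 0.5}
```

In practice, oversampling is usually applied only to the training split, so that duplicated minority rows do not also appear in the test set and inflate the reported scores.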