Objective: As medical devices proliferate and data collection advances, medical databases continue to grow in both scale and number of features. Most cancer diagnosis, however, still relies on doctors examining and testing cell tissue samples, and does not appear to use these databases to their full potential. This thesis therefore explores whether artificial intelligence can recognise patterns in medical databases that are predictive of cancer. To that end, several machine learning (ML) methods were compared on their predictive performance, with the goal of understanding how they can assist doctors in diagnosing cancer.
Methods: This study uses 154,899 screening mammogram records from the Breast
Cancer Surveillance Consortium (BCSC) dataset. This dataset comprises 12
independent variables and 1 dependent variable in a labelled data format. Using
well-established machine learning algorithms, we developed four prediction
models: Logistic Regression, Naïve Bayes, Support Vector Machine (SVM), and
Bayesian Network, by training them on 80% of the data and testing their
prediction output on the remaining 20% of data. We then compared the
results of the four models.
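
The 80/20 train/test procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the data are synthetic (the BCSC records are not reproduced here), the helper names `train_test_split`, `CategoricalNaiveBayes`, and `make_synthetic_rows` are hypothetical, and only Naïve Bayes is shown, under the assumption that all 12 independent variables are categorical.

```python
import math
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and hold out `test_fraction` of them for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with Laplace smoothing."""

    def fit(self, X, y, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter(y)
        self.n = len(y)
        n_features = len(X[0])
        # counts[c][j][v] = number of class-c rows where feature j equals v
        self.counts = {
            c: [Counter() for _ in range(n_features)] for c in self.class_counts
        }
        self.values = [set() for _ in range(n_features)]
        for xi, yi in zip(X, y):
            for j, v in enumerate(xi):
                self.counts[yi][j][v] += 1
                self.values[j].add(v)
        return self

    def predict_one(self, x):
        best, best_lp = None, -math.inf
        for c, nc in self.class_counts.items():
            # log prior plus the sum of smoothed log likelihoods
            lp = math.log(nc / self.n)
            for j, v in enumerate(x):
                num = self.counts[c][j][v] + self.alpha
                den = nc + self.alpha * len(self.values[j])
                lp += math.log(num / den)
            if lp > best_lp:
                best, best_lp = c, lp
        return best

    def predict(self, X):
        return [self.predict_one(x) for x in X]


def make_synthetic_rows(n=1000, n_features=12, seed=1):
    """Synthetic stand-in data: 12 categorical features, 1 binary label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0
        # The first feature correlates with the label; the rest are noise.
        features = [label if rng.random() < 0.9 else 1 - label]
        features += [rng.randrange(3) for _ in range(n_features - 1)]
        rows.append((features, label))
    return rows


rows = make_synthetic_rows()
train, test = train_test_split(rows)  # 80% train, 20% test
model = CategoricalNaiveBayes().fit([x for x, _ in train], [y for _, y in train])
predictions = model.predict([x for x, _ in test])
accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
print(f"train={len(train)} test={len(test)} accuracy={accuracy:.3f}")
```

The other three models would be trained and scored on the same split so that their results are directly comparable.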
Results: Among the four models, Naïve Bayes showed the best prediction
accuracy for malignant samples, that is, for the cases that indicate a risk
of cancer. The Bayesian Network model had the second-best results. Both
Logistic Regression and SVM yielded poor results when predicting malignant
cases and thus the risk of breast cancer.
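
The abstract does not define "prediction accuracy for malignant samples". One plausible reading, assumed here, is the fraction of truly malignant cases that a model labels as malignant (per-class recall). The sketch below uses made-up labels, not BCSC results, to show how this figure can differ from overall accuracy; `per_class_recall` is a hypothetical helper name.

```python
def per_class_recall(y_true, y_pred):
    """Return {class: fraction of that class's samples predicted correctly}."""
    totals, hits = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        if t == p:
            hits[t] = hits.get(t, 0) + 1
    return {c: hits.get(c, 0) / totals[c] for c in totals}


# Hypothetical labels: 8 benign (0) and 2 malignant (1) cases.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)
print(overall)    # 0.9
print(recall[1])  # 0.5 (only one of the two malignant cases is found)
```

In this toy case a model scores 90% overall while finding only half of the malignant cases, which is why a comparison focused on malignant samples can rank models differently from overall accuracy.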
Author(s) Details
Mochen Li
School of Engineering Technology,
Purdue University, West Lafayette, IN 47907, USA.
Dr. Gaurav Nanda
School of Engineering Technology,
Purdue University, West Lafayette, IN 47907, USA.
Raji Sundararajan
School of Engineering Technology,
Purdue University, West Lafayette, IN 47907, USA.