Mayuri Vaish

Using Naïve Bayes to Predict Diabetes


Introduction: Diabetes affects approximately 1.25 million American adults and children [1]. Type II diabetes is caused by a rise in insulin resistance[2], which results in hyperglycemia (high blood sugar) and produces symptoms such as excessive thirst, frequent urination, fatigue, dizziness, headache and nausea[3]. The risk of type II diabetes is highly correlated with obesity[4], and it has been hypothesized that specific markers such as age[5] and even gender[6] also influence that risk. Naïve Bayes has been applied successfully to other medical diagnoses with high accuracy rates[7], but not to diabetes. Given these findings, it seemed possible that a probabilistic classification model such as Naïve Bayes could be used to 'predict' diabetes from patient characteristics. This research aims to achieve just that.


Aim: To determine whether the Naïve Bayes mathematical model serves as a suitable predictor for diabetes, given specific patient characteristics.


Method: The Naïve Bayes classifier was used to classify a patient from three attributes (age, gender, and frame), drawn from an existing StatCrunch[8] diabetes database. Code was then developed (see bit.ly/NaiveBayes) and run using (1) 10 randomly selected training records and (2) 20 training records.
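To make the method concrete, the following is a minimal sketch of a categorical Naïve Bayes classifier over the three attributes. It is not the author's code (that is at the link above); the function names, the category values ("old", "large", and so on), the toy records, and the use of Laplace smoothing are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train(records, labels, alpha=1.0):
    """Count class frequencies and per-class attribute-value frequencies.

    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    class_counts = Counter(labels)
    # value_counts[(attribute_index, label)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    # distinct values seen for each attribute, needed for smoothing
    attribute_values = defaultdict(set)
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            value_counts[(i, label)][value] += 1
            attribute_values[i].add(value)
    return class_counts, value_counts, attribute_values, alpha


def predict(model, record):
    """Return the label maximising P(label) * product of P(value | label)."""
    class_counts, value_counts, attribute_values, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(label)
        for i, value in enumerate(record):
            # smoothed estimate of P(attribute i = value | label)
            numerator = value_counts[(i, label)][value] + alpha
            denominator = count + alpha * len(attribute_values[i])
            score *= numerator / denominator
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training records: (age group, gender, frame)
records = [
    ("old", "female", "large"),
    ("old", "male", "large"),
    ("old", "male", "medium"),
    ("young", "female", "small"),
    ("young", "male", "small"),
    ("young", "female", "medium"),
]
labels = ["diabetic", "diabetic", "diabetic",
          "non-diabetic", "non-diabetic", "non-diabetic"]

model = train(records, labels)
print(predict(model, ("old", "female", "large")))
print(predict(model, ("young", "male", "small")))
```

The "naïve" part is the product inside predict: each attribute is treated as independent of the others once the class is known, which is what lets so few training records produce usable probability estimates.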


Results: Using only three of the 16 available patient attributes, initial trials with 10 training records showed a modest accuracy of 40%. However, upon increasing the training set to 20 records, the accuracy of the model's predictions increased to 80%.
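The accuracy figures are presumably the fraction of test patients whose predicted label matched the recorded one. A minimal sketch of that calculation follows; the labels are made up purely to show the arithmetic and are not the study's data, and the abstract does not state how many patients were used for testing.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the recorded labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one label")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels for five test patients (illustration only)
actual = ["yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "yes", "no", "no"]

print(f"{accuracy(predicted, actual):.0%}")  # 3 of 5 correct
```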


Conclusion: If only three highly simplified categories can yield an 80% accuracy rate using only 20 training records, the Naïve Bayes model shows considerable promise for future development. Although this machine-learning algorithm can by no means replace physicians and tests, it can serve as a guide for them and possibly uncover previously unnoticed trends. Above all, these results show the power of mathematical computation across many fields, including healthcare.