DOI: 10.30621/jbachs.1284274 ISSN: 2458-8938

Decision Tree-Based Classification Approach to Discover Factors Affecting Vitamin D Level with Machine Learning

Ceyda Ünal, Süleyman Albaş, Cihan Çılgın, Esra Meltem Koç
Purpose: Vitamin D level is regarded as an important biomarker for determining risk factors for various diseases. Vitamin D is essential for human health, and its deficiency is associated with serious health problems. Because vitamin D deficiency can be easily prevented and treated, detecting it is of great importance. The possible relationship between vitamin D deficiency and musculoskeletal pain, osteoporosis, diabetes mellitus, and hypertension is frequently discussed in the literature. This study aims to analyze the factors that determine vitamin D level and the decision rules related to them.

Methods: A descriptive framework based on the decision tree, a machine learning technique, was followed. The data used to create the decision rules were obtained from volunteers aged 18-85 who presented to the Izmir Katip Çelebi University Atatürk Training and Research Hospital Infectious Diseases and Family Medicine Polyclinics between 01.03.2017 and 01.09.2017 and agreed to participate in the study.

Results: Age, gender, and laboratory test values were observed to be strong predictors of vitamin D level. Two CART (Classification and Regression Trees) models achieved predictive accuracies of 90.47% and 95%, respectively. The three most important factors affecting vitamin D level were uric acid, age, and creatine in the first model, and TSH, ALP, and smoking (yes) in the second.

Discussion: The collected features give a comprehensive list of variables that affect vitamin D level in the dataset under consideration. Important findings of the study include not only the identification of these variables but also the procedures used to classify vitamin D level effectively. In contrast to previous research, the Age variable is the most influential factor within the scope of this dataset, which includes demographic information on patients and their existing disorders.
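As a rough illustration of how a CART-style tree derives a decision rule from a single numeric variable, the sketch below picks the threshold that minimizes weighted Gini impurity, the criterion commonly associated with CART classification. It is not the authors' implementation, and the abstract does not state which splitting criterion or software was used. The function names, the age values, and the "low"/"normal" labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best binary split on one numeric feature.

    Candidate thresholds are midpoints between consecutive distinct sorted values.
    If no split improves on the unsplit impurity, the threshold is None.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold can separate identical values
        threshold = (low + high) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: ages and made-up vitamin D classes (not from the study).
ages = [25, 30, 45, 60]
classes = ["low", "low", "normal", "normal"]
threshold, impurity = best_split(ages, classes)
print(f"rule: age <= {threshold} (weighted Gini {impurity:.2f})")
```

A full tree repeats this search over every variable at each node and recurses on the two resulting subsets, which is how rules combining several variables, such as laboratory values and age, would arise.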
