A new equation for estimating low-density lipoprotein cholesterol concentration based on machine learning
Lei Chen, Chen Rong, Peidu Ma, Yiyang Li, Xue Deng, Muxing Hua - General Medicine
Low-density lipoprotein cholesterol (LDL-C) is a crucial marker of cardiovascular system damage. In the Chinese population, the Friedewald, Martin/Hopkins, and Sampson equations do not estimate LDL-C concentration accurately. The aim of this study was to develop a group of new equations for calculating LDL-C concentration using machine learning techniques and to evaluate their efficacy. A total of 182,901 patient samples with standard lipid panel measurements were collected. These samples were collated and randomly divided into a training set and a test set. A new equation was constructed on the training set using polynomial ridge regression and compared on the test set with the Friedewald, Martin/Hopkins, extended Martin/Hopkins, and Sampson equations. Subsequently, an additional set of 17,285 patient samples was collected to evaluate the performance of the new equation in clinical practice. The new equation, a ternary cubic equation, was accurate and easy to use, with a goodness-of-fit R² of 0.9815 and a mean squared error (MSE) of 37.4250 on the test set. The difference between the value calculated by the new equation and the measured LDL-C value was small (0.0424 ± 5.1161 vs Friedewald equation: −13.3647 ± 17.9198, vs Martin/Hopkins and extended Martin/Hopkins equation: −6.4737 ± 8.1036, vs Sampson equation: −8.9252 ± 12.6522,