DOI: 10.1097/md.0000000000037766 ISSN: 0025-7974

A new equation for estimating low-density lipoprotein cholesterol concentration based on machine learning

Lei Chen, Chen Rong, Peidu Ma, Yiyang Li, Xue Deng, Muxing Hua
  • General Medicine

Low-density lipoprotein cholesterol (LDL-C) is a crucial marker of cardiovascular system damage. In the Chinese population, LDL-C concentrations estimated by the Friedewald, Martin/Hopkins, or Sampson equations are not accurate. The aim of this study was to develop a group of new equations for calculating LDL-C concentration using machine learning techniques and to evaluate their efficacy. A total of 182,901 patient samples with standard lipid panel measurements were collected. These samples were collated and randomly divided into a training set and a test set. A new equation was constructed on the training set using polynomial ridge regression and compared on the test set with the Friedewald, Martin/Hopkins and extended Martin/Hopkins, and Sampson equations. Subsequently, an additional set of 17,285 patient samples was collected to evaluate the performance of the new equation in clinical practice. The new equation, a ternary cubic equation, was accurate and easy to use, with a goodness-of-fit R² of 0.9815 and a mean squared error (MSE) of 37.4250 on the test set. The difference between the value calculated by the new equation and the measured LDL-C value was small (0.0424 ± 5.1161 vs Friedewald equation: −13.3647 ± 17.9198, vs Martin/Hopkins and extended Martin/Hopkins equation: −6.4737 ± 8.1036, vs Sampson equation: −8.9252 ± 12.6522, P < .001). It remained accurate even at high triglyceride and low LDL-C concentrations. Furthermore, the new equation also calculated LDL-C concentration precisely in actual clinical use (R² = 0.9780, MSE = 24.8482).
Compared with the Friedewald, Martin/Hopkins and extended Martin/Hopkins, and Sampson equations, the new equation developed in this study calculates LDL-C concentration accurately across the full concentration range of triglyceride and LDL-C. It can serve as a supplement to the direct determination of LDL-C concentration for the prevention, treatment, evaluation, and monitoring of atherosclerotic diseases.
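To make the method concrete, the following is a minimal sketch of a ternary cubic ridge regression compared against the Friedewald equation (LDL-C = TC − HDL-C − TG/5, in mg/dL). It is not the published equation and does not use the study's data. The choice of total cholesterol, HDL-C, and triglyceride as the three inputs, the penalty `alpha`, the helper names, and the synthetic data generator are all assumptions made for illustration.

```python
import itertools
import numpy as np

def friedewald(tc, hdl, tg):
    """Friedewald estimate of LDL-C; all values in mg/dL."""
    return tc - hdl - tg / 5.0

def cubic_features(X):
    """Expand the 3 input columns into every monomial of total degree 1 to 3."""
    cols = []
    for degree in (1, 2, 3):
        for idx in itertools.combinations_with_replacement(range(X.shape[1]), degree):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)  # 3 + 6 + 10 = 19 columns for 3 inputs

def fit_cubic_ridge(X, y, alpha=1.0):
    """Ridge regression on standardized cubic features (alpha is an assumed value)."""
    P = cubic_features(X)
    mu, sigma = P.mean(axis=0), P.std(axis=0)
    Z = (P - mu) / sigma
    y_mean = y.mean()
    # Closed-form ridge solution: (Z'Z + alpha*I) w = Z'(y - mean(y))
    w = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ (y - y_mean))

    def predict(X_new):
        return ((cubic_features(X_new) - mu) / sigma) @ w + y_mean

    return predict

def summarize(pred, measured):
    """Return mean difference, SD of difference, MSE, and R^2."""
    diff = pred - measured
    mse = np.mean(diff ** 2)
    r2 = 1.0 - np.sum(diff ** 2) / np.sum((measured - measured.mean()) ** 2)
    return diff.mean(), diff.std(ddof=1), mse, r2

# Synthetic toy data (NOT the study's samples): the "measured" LDL-C is generated
# so that the fixed TG/5 term overshoots at high triglyceride.
rng = np.random.default_rng(0)
n = 2000
tc = rng.uniform(150, 300, n)
hdl = rng.uniform(30, 80, n)
tg = rng.uniform(40, 600, n)
ldl = tc - hdl - 100 * tg / (400 + tg) + rng.normal(0, 3, n)

X = np.column_stack([tc, hdl, tg])
train, test = slice(0, 1500), slice(1500, None)

predict = fit_cubic_ridge(X[train], ldl[train])
bias_new, sd_new, mse_new, r2_new = summarize(predict(X[test]), ldl[test])
bias_fried, sd_fried, mse_fried, r2_fried = summarize(
    friedewald(tc[test], hdl[test], tg[test]), ldl[test]
)

print(f"cubic ridge: diff = {bias_new:.2f} ± {sd_new:.2f}, MSE = {mse_new:.2f}, R2 = {r2_new:.4f}")
print(f"Friedewald : diff = {bias_fried:.2f} ± {sd_fried:.2f}, MSE = {mse_fried:.2f}, R2 = {r2_fried:.4f}")
```

The summary mirrors the quantities reported in the abstract (mean difference ± SD, MSE, and R²), computed here on a held-out portion of the toy data. Standardizing the polynomial terms before applying the penalty is a design choice of this sketch, made because raw cubic terms differ in scale by several orders of magnitude.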
