Guzhong Chen, Zhen Song, Zhiwen Qi, Kai Sundmacher

A scalable and integrated machine learning framework for molecular properties prediction

  • General Chemical Engineering
  • Environmental Engineering
  • Biotechnology

Abstract

This work introduces a scalable and integrated machine learning (ML) framework that facilitates the key steps of building quantitative structure–property relationship (QSPR) models for molecular property prediction. Specifically, molecular descriptor generation, feature engineering, ML model training, model selection and ensembling, and model validation and timing are integrated into a single workflow within the proposed framework. Unlike existing modeling methods, which rely on human experts and focus primarily on model and hyperparameter selection, the proposed framework succeeds by ensembling multiple models and stacking them in multiple layers. The efficiency and effectiveness of the proposed framework are demonstrated through comparisons with literature‐reported QSPR models on identical datasets in three property modeling case studies: the flash point temperature, the melting temperature, and the octanol–water partition coefficient. While requiring much less modeling time, the models produced by the proposed framework show better predictive performance than the reference models in all three case studies.
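The abstract names model ensembling and stacking as the core idea without giving details, so the following is only a minimal sketch of single-layer stacking, not the authors' implementation. The two base learners (`fit_linear`, `fit_nearest`), the clipped blending weight used as the meta-learner, and the synthetic two-descriptor dataset are all illustrative assumptions; the framework described above stacks more models in multiple layers.

```python
import random
from statistics import mean

def fit_linear(X, y):
    """Hypothetical base learner: least-squares line on the first descriptor."""
    xs = [row[0] for row in X]
    mx, my = mean(xs), mean(y)
    var = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, y)) / var if var else 0.0
    return lambda x: my + slope * (x[0] - mx)

def fit_nearest(X, y):
    """Hypothetical base learner: 1-nearest-neighbour over all descriptors."""
    data = list(zip(X, y))
    def predict(x):
        return min(data, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]
    return predict

def out_of_fold(fit, X, y, k=5):
    """Predict each sample with a model trained on the other k-1 folds."""
    preds = [0.0] * len(X)
    for f in range(k):
        train = [i for i in range(len(X)) if i % k != f]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in range(f, len(X), k):
            preds[i] = model(X[i])
    return preds

def fit_stack(X, y, k=5):
    """Stack two base learners with a one-parameter meta-learner.

    The meta-learner is the weight w in [0, 1] that minimises the squared
    error of w * p1 + (1 - w) * p2 on the out-of-fold predictions.
    """
    p1 = out_of_fold(fit_linear, X, y, k)
    p2 = out_of_fold(fit_nearest, X, y, k)
    num = sum((a - b) * (t - b) for a, b, t in zip(p1, p2, y))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    w = min(1.0, max(0.0, num / den)) if den else 0.5
    # Refit the base learners on all data for final predictions.
    m1, m2 = fit_linear(X, y), fit_nearest(X, y)
    return (lambda x: w * m1(x) + (1 - w) * m2(x)), w

# Synthetic data standing in for (descriptor vector, property value) pairs.
random.seed(0)
X = [[random.uniform(0, 10), random.uniform(0, 1)] for _ in range(60)]
y = [2.0 * a + 3.0 * b + random.gauss(0, 0.1) for a, b in X]

stack, w = fit_stack(X, y)
print("blend weight on linear model:", round(w, 3))
print("prediction for [5.0, 0.5]:", round(stack([5.0, 0.5]), 3))
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for samples the base models have already seen, keeps the blend weight from simply rewarding whichever base model overfits most.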
