AbLEF: Antibody Language Ensemble Fusion for thermodynamically empowered property predictions
Zachary A Rollins, Talal Widatalla, Andrew Waight, Alan C Cheng, Essam Metwally
Abstract
Motivation
Pre-trained protein language and/or structural models are often fine-tuned on drug development properties (i.e., developability properties) to accelerate drug discovery initiatives. However, these models generally rely on a single structural conformation and/or a single sequence as a molecular representation. We present a physics-based model whereby 3D conformational ensemble representations are fused by a transformer-based architecture and concatenated to a language representation to predict antibody protein properties. AbLEF enables the direct infusion of thermodynamic information into latent space, which enhances property prediction by explicitly incorporating the dynamic molecular behavior that occurs during experimental measurement.
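The fusion step described above can be illustrated with a minimal sketch in plain Python. It is not the AbLEF implementation. It assumes each conformer in the ensemble has already been reduced to a fixed-length embedding, replaces the transformer with a single self-attention layer using identity projections, and pools by a simple mean. All function names and the toy numbers are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(frames):
    """Single-head scaled dot-product self-attention.

    Assumption: identity query/key/value projections and no positional
    encoding, so the ensemble is treated as an unordered set of conformers.
    """
    dim = len(frames[0])
    attended = []
    for query in frames:
        weights = softmax([dot(query, key) / math.sqrt(dim) for key in frames])
        attended.append(
            [sum(w * value[j] for w, value in zip(weights, frames)) for j in range(dim)]
        )
    return attended

def fuse_ensemble(frames):
    """Fuse K conformer embeddings into one vector: attention, then mean pooling."""
    attended = self_attention(frames)
    count = len(attended)
    return [sum(frame[j] for frame in attended) / count for j in range(len(attended[0]))]

def fused_features(frames, language_embedding):
    """Concatenate the fused ensemble vector with a sequence (language) embedding."""
    return fuse_ensemble(frames) + list(language_embedding)

def predict(features, weights, bias=0.0):
    """Hypothetical linear regression head on the concatenated features."""
    return dot(features, weights) + bias

# Toy data (made up): three conformer embeddings and one language embedding.
frames = [[0.1, 0.4, -0.2], [0.0, 0.5, -0.1], [0.2, 0.3, -0.3]]
language_embedding = [0.7, -0.5]
features = fused_features(frames, language_embedding)
```

Because this sketch uses no positional encoding, the fused vector does not depend on the order in which conformers are listed. That property is a design choice of the sketch, not a statement about the published model.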
Results
We showcase the AbLEF model on two developability properties: hydrophobic interaction chromatography retention time (HIC-RT) and temperature of aggregation (Tagg). We find that (1) 3D conformational ensembles generated from molecular simulation can further improve antibody property prediction for small datasets, (2) the performance gained from 3D conformational ensembles matches that of shallow machine learning methods in the small data regime, and (3) fine-tuned large protein language models can match smaller antibody-specific language models at predicting antibody properties.
Availability and implementation
AbLEF codebase is available at https://github.com/merck/AbLEF.
Supplementary information
Supplementary data are available at Bioinformatics online.