DOI: 10.1093/bioinformatics/btae268 ISSN: 1367-4811

AbLEF: Antibody Language Ensemble Fusion for thermodynamically empowered property predictions

Zachary A Rollins, Talal Widatalla, Andrew Waight, Alan C Cheng, Essam Metwally
  • Computational Mathematics
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Molecular Biology
  • Biochemistry
  • Statistics and Probability

Abstract

Motivation

Pre-trained protein language and/or structural models are often fine-tuned on drug development properties (ie, developability properties) to accelerate drug discovery initiatives. However, these models generally rely on a single structural conformation and/or a single sequence as the molecular representation. We present a physics-based model in which 3D conformational ensemble representations are fused by a transformer-based architecture and concatenated to a language representation to predict antibody protein properties. AbLEF thereby infuses thermodynamic information directly into latent space, which enhances property prediction by explicitly representing the dynamic molecular behavior that occurs during experimental measurement.
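
The fuse-then-concatenate idea can be illustrated with a minimal sketch. This is not the AbLEF implementation: it assumes each conformer in the ensemble has already been reduced to a fixed-length embedding, replaces the transformer with a single self-attention step using identity projections (no learned weights), and mean-pools the result before concatenating it to a sequence embedding. All function names, variable names, and the toy numbers are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def self_attention(frames: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention across conformers.

    Assumption: queries, keys, and values are the raw embeddings
    (identity projections); a real model would learn these projections.
    """
    scale = math.sqrt(len(frames[0]))
    attended = []
    for query in frames:
        weights = softmax([dot(query, key) / scale for key in frames])
        attended.append(
            [sum(w * value[i] for w, value in zip(weights, frames))
             for i in range(len(query))]
        )
    return attended


def fuse_ensemble(frames: List[Vector]) -> Vector:
    """Attend across the ensemble, then mean-pool to one vector."""
    attended = self_attention(frames)
    count = len(attended)
    return [sum(frame[i] for frame in attended) / count
            for i in range(len(attended[0]))]


def fuse_with_language(frames: List[Vector], language_embedding: Vector) -> Vector:
    """Concatenate the fused ensemble vector with the language embedding."""
    return fuse_ensemble(frames) + list(language_embedding)


# Toy inputs (hypothetical): three conformer embeddings of width 3
# and one sequence embedding of width 2.
conformers = [[0.1, 0.4, -0.2], [0.0, 0.5, -0.1], [0.2, 0.3, -0.3]]
sequence_embedding = [1.0, -1.0]
joint = fuse_with_language(conformers, sequence_embedding)
```

In this sketch `joint` has length 5: the first three entries summarize the ensemble and the last two are the unchanged sequence embedding. A downstream regression head for a property such as HIC-RT or Tagg would take this joint vector as input.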

Results

We showcase the AbLEF model on two developability properties: hydrophobic interaction chromatography retention time (HIC-RT) and temperature of aggregation (Tagg). We find that (1) 3D conformational ensembles that are generated from molecular simulation can further improve antibody property prediction for small datasets, (2) the performance benefit from 3D conformational ensembles matches shallow machine learning methods in the small data regime, and (3) fine-tuned large protein language models can match smaller antibody-specific language models at predicting antibody properties.

Availability and implementation

AbLEF codebase is available at https://github.com/merck/AbLEF.

Supplementary information

Supplementary data are available at Bioinformatics online.
