Chapter 19: Deep Learning in Therapeutic Antibody Development

Reading notes on《Artificial Intelligence in Drug Design》


1.Introduction

  • Within the scope of developing therapeutic monoclonal antibodies (mAbs), there are many steps that all contribute to the overall cost and time required to bring a biotherapeutic drug to a patient.
  • Antibody library-based discovery is frequently done in phage or yeast cell platforms. In vivo discovery platforms, such as animal immunizations or even human B-cell panning, utilize transient production methods to create material for testing, which normally does not replicate the therapeutic development process.
  • The bulk of commercial therapeutic antibody production utilizes Chinese hamster ovary (CHO) cells, and there are significant differences between these production cells and methods and those used in the discovery platforms above.
  • While expression levels can be estimated from large yeast datasets, these estimations are only vaguely “directional” for expected expression in mammalian cell lines. The individual cellular mechanisms of a mammalian cell and a yeast cell are too different.
  • Even if one can train reliable machine-learned predictors for a behavior like solubility in a specific solvent or extent of molecule–molecule interactions, the bigger challenge is mapping these behaviors to in vitro process development tasks for an antibody or its in vivo behaviors.
  • This final challenge is where deep learning may provide the most benefit: the design of data that spans the complicated spaces of the following.
      1. Germ lines.
      2. CDR diversity.
      3. Antibody formats (e.g., scFv, full length, Fab, Fc-fusion, multispecifics).
      4. Specific sequence liabilities (e.g., deamidation, isomerization, glycosylation sites); see the liability-scan sketch after this list.
      5. In vivo immunogenicity and clearance likelihood.
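Of these spaces, specific sequence liabilities are the most directly codifiable. As a minimal sketch (not from the chapter), the snippet below flags a few rule-of-thumb liability motifs in an amino acid sequence: Asn deamidation (N followed by G/S/T), Asp isomerization (D followed by G/S/T), and the N-linked glycosylation sequon N-X-S/T with X ≠ P. The pattern set is a simplification for illustration, not a validated liability model.

```python
import re

# Simplified liability motifs (rule-of-thumb patterns, not an exhaustive or
# validated set): Asn deamidation (NG/NS/NT), Asp isomerization (DG/DS/DT),
# and the N-linked glycosylation sequon N-X-S/T with X != P.
LIABILITY_PATTERNS = {
    "deamidation":   r"N[GST]",
    "isomerization": r"D[GST]",
    "glycosylation": r"N[^P][ST]",
}

def scan_liabilities(sequence: str):
    """Return (liability_type, 0-based position, motif) for each hit."""
    hits = []
    for name, pattern in LIABILITY_PATTERNS.items():
        for m in re.finditer(pattern, sequence):
            hits.append((name, m.start(), m.group()))
    return sorted(hits, key=lambda h: h[1])

if __name__ == "__main__":
    # Made-up CDR-like fragment, purely for demonstration.
    for hit in scan_liabilities("ARDGSNGYYNSTWFDY"):
        print(hit)
```

A real liability assessment would also weigh structural context such as solvent exposure, which a sequence-only scan cannot capture.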

2.Supervised Learning in Antibody Development

  • There are two predominant pathways to prediction of behavior from molecular features:

    • The most frequently attempted approach is to generate an intermediate representation of the antibody structure via molecular modeling and then use a set of features, frequently hand-picked, derived from that representation as inputs.
    • The second approach, which is gaining more traction from deep learning efforts, is prediction straight from amino acid sequence, frequently encoded in a one-hot-encoded (OHE) form for each residue (see the encoding sketch after this list).
  • The real key to antibody behavior predictions is more likely buried in small-scale distances and interactions—a domain in which the AlphaFold models simply cannot yet contribute.

  • On the other hand, unlike the generalized protein problem being handled by the AlphaFold models, a significant portion of the antibody sequence and structure is so conserved that homology modeling (using preexisting known sequences and structures as a starting point) provides a very reasonable estimate of base structure. This high level of conservation also permits the use of structure-based residue alignment methods which greatly reduce the complexity of the latent space that must be inferred from sequence.
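As referenced in the list above, a common sequence-only input is a one-hot encoding over a fixed set of aligned positions. The sketch below is illustrative only; it assumes the sequences have already been padded and aligned to a common numbering scheme (such as the structure-based residue alignments mentioned above) so that each position means the same thing across antibodies.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"   # 20 residues plus a gap/pad symbol
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(aligned_seq: str) -> np.ndarray:
    """Encode an aligned antibody sequence as a (length, 21) one-hot matrix.

    Assumes the sequence is already padded/aligned to a fixed scheme, so
    position i refers to the same structural position for every antibody.
    """
    encoding = np.zeros((len(aligned_seq), len(AMINO_ACIDS)), dtype=np.float32)
    for i, aa in enumerate(aligned_seq):
        encoding[i, AA_INDEX[aa]] = 1.0
    return encoding

# Toy aligned fragment with a gap character:
x = one_hot_encode("EVQLV-ESGGGLVQPG")
print(x.shape)   # (16, 21)
```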

2.1.Biophysical Properties

  • The availability of large protein solubility datasets (over 100k protein sequences) has recently opened the door for deep learning solubility prediction.
  • The DeepSol algorithm uses a convolutional neural network (CNN) which takes an amino acid sequence as input and outputs a probability that the associated protein is soluble. The SKADE algorithm uses an attention-based deep learning model on the same task (a minimal CNN sketch follows this list).
  • While these models are not immediately applicable to antibody engineering—the soluble vs. insoluble classification dataset does not likely encode the subtler patterns associated with small solubility changes for a small number of mutations—they demonstrate that solubility is predictable from primary sequence.
  • A predictive model has also been reported for antibody hydrophobicity, trained on hydrophobic interaction chromatography retention time (HIC RT) measurements for over 5000 antibody antigen-binding fragments (Fabs). Jain et al. used this dataset to create two traditional machine learning predictors: (1) solvent accessible surface area (SASA) predicted from engineered sequence features and (2) HIC RT class predicted from SASA.
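For illustration only, the PyTorch sketch below shows the general shape of a sequence-to-solubility CNN classifier of the kind described above. It is not the published DeepSol or SKADE architecture; the layer sizes, kernel widths, and maximum length are arbitrary placeholders.

```python
import torch
import torch.nn as nn

class SolubilityCNN(nn.Module):
    """Minimal 1D-CNN solubility classifier over one-hot encoded sequences.

    Illustrative only; not the published DeepSol architecture.
    Input shape: (batch, 21, max_len), where channels are amino acid types.
    """
    def __init__(self, n_channels: int = 21):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),   # global max pool over sequence length
        )
        self.classifier = nn.Sequential(nn.Flatten(), nn.Linear(128, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Returns the probability that the protein is soluble.
        return torch.sigmoid(self.classifier(self.features(x)))

model = SolubilityCNN()
dummy = torch.zeros(4, 21, 512)   # batch of 4 padded one-hot sequences
print(model(dummy).shape)         # torch.Size([4, 1])
```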

2.2.Product Quality Attributes

  • Product quality attributes (PQAs)—especially posttranslational modifications such as deamidation, isomerization, and glycosylation—are intriguing targets for predictive modeling.
  • A recent publication on machine learning for deamidation prediction provides an illustrative example for the current state of supervised learning in PQAs.
  • There have also been reported examples of using machine learning to predict mAb glycoform distributions in CHO cells, most recently using artificial neural networks (an illustrative sketch follows this list).
  • While promising, the results also need to be aligned to the type of cells, transfection method, media composition, and even production mode (batch vs. continuous perfusion) as these all can have an impact on PQA.
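As a hedged illustration of the glycoform modeling idea referenced above (not a reproduction of any published model), the sketch below maps a hypothetical vector of process descriptors (e.g., temperature, pH, feed rate) to a predicted glycoform distribution, using a softmax output so that the predicted fractions sum to one. The feature and output dimensions are placeholders.

```python
import torch
import torch.nn as nn

class GlycoformMLP(nn.Module):
    """Toy MLP mapping process/culture descriptors to glycoform fractions.

    Hypothetical sketch: published models differ in inputs and architecture.
    The softmax output forces the predicted distribution to sum to 1.
    """
    def __init__(self, n_features: int = 8, n_glycoforms: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 32),
            nn.ReLU(),
            nn.Linear(32, n_glycoforms),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.net(x), dim=-1)

# e.g., features = [temperature, pH, feed rate, culture day, ...] (illustrative)
model = GlycoformMLP()
print(model(torch.rand(2, 8)).sum(dim=-1))   # each row sums to ~1.0
```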

2.3.Process Behavior

  • Creative use of high-throughput “scale-down models”—lab methods that run dozens, hundreds, or even thousands of conditions in multiwell plates—and hybrid in silico modeling approaches offer a glimmer of hope for process behavior predictive modeling.
  • Two promising corners of process behavior prediction are mL-scale bioreactors for collecting productivity data and small-scale purification experiments.

3.Unsupervised Learning in Antibody Development

  • The goal of these generative models is to create diverse, hyperrealistic synthetic candidates given an example dataset of true samples. Curated human repertoire datasets, like the Observed Antibody Space (OAS), provide a rich data source of true human antibody sequences.
  • There have also been applications of models to generate libraries of binders to a particular target/antigen. Variational Autoencoders have been used in coordination with Gaussian Mixture Models to allow for latent space clustering of antibody CDRs for specific targets. The model allows users to navigate within the clusters of the latent space to generate novel binders to a given target. This approach can be seen as a means of performing CDR affinity maturation in silico, given a set of hits to an antigen after library screening (a VAE-plus-GMM sketch follows this list).
  • Masked Language Models (MLM) may be particularly useful in the antibody space due to the antibody’s comparatively long protein sequence and complex structure where long-range context matters significantly.
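The following sketch illustrates the VAE-plus-GMM pattern referenced above: encode CDR representations into a latent space, fit a Gaussian Mixture Model over the latent codes, and sample new latent points to decode into candidate representations. All dimensions and the random inputs are placeholders; a real workflow would first train the VAE on encoded screening hits and then decode only from the cluster of interest.

```python
import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture

class CDRVAE(nn.Module):
    """Tiny VAE over flattened one-hot CDR encodings (illustrative only)."""
    def __init__(self, input_dim: int, latent_dim: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, input_dim)
        )

    def encode(self, x):
        h = self.encoder(x)
        return self.mu(h), self.logvar(h)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.decoder(z), mu, logvar

# After training, fit a GMM in latent space and sample new candidates.
# Here the weights are untrained and the inputs random, purely for shape-checking.
vae = CDRVAE(input_dim=16 * 21)                   # e.g., 16 aligned CDR positions
with torch.no_grad():
    mu, _ = vae.encode(torch.rand(100, 16 * 21))  # stand-in for encoded hits
gmm = GaussianMixture(n_components=3).fit(mu.numpy())
z_new, _ = gmm.sample(5)                          # draw 5 latent points
candidates = vae.decoder(torch.tensor(z_new, dtype=torch.float32))
print(candidates.shape)                           # torch.Size([5, 336])
```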

3.1.Transfer Learning of Unsupervised and Self-Supervised Models

  • While GAN and MLM models are powerful generative and qualitative assessment tools, the ability to use transfer learning to further adapt these models may be the true transformative power of these approaches. With a trained model that has captured the larger domain of antibody sequence relationships, we can apply transfer learning to focus these models down to subsets of antibody types (a minimal fine-tuning sketch follows this list).
  • The path of transfer learning these models also opens the door to generating highly diverse training data for supervised learning applications and thereby further refining the models’ predictive abilities and our understanding of the underlying biophysical behaviors.
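As a minimal sketch of the transfer learning pattern described above, the snippet below freezes a stand-in “pretrained” encoder and trains only a small task head on labelled data. The encoder, feature sizes, and data are hypothetical placeholders for a real pretrained antibody language model (e.g., one trained on OAS) and a real labelled subset.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained antibody sequence encoder (hypothetical):
# in practice this would be loaded from a checkpoint trained on a large
# repertoire dataset such as OAS.
pretrained_encoder = nn.Sequential(nn.Linear(336, 256), nn.ReLU())

# Freeze the pretrained weights so only the new task head is updated.
for param in pretrained_encoder.parameters():
    param.requires_grad = False

task_head = nn.Linear(256, 1)   # e.g., predicts a biophysical property
model = nn.Sequential(pretrained_encoder, task_head)

optimizer = torch.optim.Adam(task_head.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One illustrative training step on dummy data:
x, y = torch.rand(32, 336), torch.rand(32, 1)
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
print(float(loss))
```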

4.Conclusion

  • Each of these intermediate successes in deep learning is useful, but the path is long.