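Molecular docking is commonly used to (1) predict binding modes (poses), (2) predict binding affinity, and (3) perform virtual screening. A common complaint goes: "I used molecular docking to predict the activity of my compounds, and the resulting activity ranking was very inaccurate." This complaint concerns the use of docking to predict binding affinity, so the main purpose of this article is to examine whether molecular docking can predict binding affinity accurately. Binding mode prediction and virtual screening are discussed elsewhere; see the related topics section of this article.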
GOLD has been optimised for the prediction of ligand binding positions rather than the prediction of binding affinities. It is not recommended to use scoring function values to suggest accurate binding affinities because many factors are ignored or approximated when calculating a docking score (protein and ligand motion, and accurate treatment of waters to name two).
Sometimes a correlation between fitness score and binding affinity can be observed. If so, care must be taken to ensure this is not artifactual. This usually occurs when an equally good correlation is found between binding affinity and molecular weight. Both fitness score and binding affinity tend to correlate with molecular weight to some extent.
It is possible to develop Quantitative Structure Activity Relationships (QSAR) between docking data and binding affinity using GoldMine. These can be used to predict binding affinity for new members in the same series.
For further information please refer to the GOLD and GoldMine user manuals.
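The molecular-weight caveat above can be checked numerically. The following is a minimal sketch, not part of GOLD or GoldMine: it compares the raw Pearson correlation between score and affinity with the partial correlation that remains after the linear effect of molecular weight is removed. The function names and all data values are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical data for six ligands (illustration only, not real measurements).
mol_weight = [250, 300, 350, 400, 450, 500]        # g/mol
fitness = [52.0, 58.5, 63.0, 70.5, 74.0, 81.5]     # docking fitness score
p_affinity = [5.1, 5.4, 6.2, 6.3, 7.0, 7.2]        # e.g. pKd or pIC50

print("score vs affinity:       ", round(pearson(fitness, p_affinity), 3))
print("MW vs affinity:          ", round(pearson(mol_weight, p_affinity), 3))
print("score vs affinity | MW:  ", round(partial_corr(fitness, p_affinity, mol_weight), 3))
```

If the partial correlation is much smaller than the raw correlation, the apparent agreement between score and affinity is probably driven mostly by molecular size.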
The Schrödinger knowledge base has two answers on how to predict binding affinity with Glide. The first addresses how to obtain a binding affinity from the GlideScore:
Article ID: 572 – Last Modified: May 19, 2016 – 12:00am
How do I find the binding affinity from the GlideScore?
The GlideScore is an estimate of the binding energy, but it is only an estimate. Computing accurate absolute, or even relative, binding energies is an extremely challenging task. While the GlideScore has been fit to experimental binding energies (but for native redocking only), and therefore is on the scale of binding energies (in kcal/mol), it isn’t possible to use the GlideScore for rank-ordering actives. The GlideScore generally can distinguish actives from inactives (e.g., nanomolar vs micromolar) in a virtual screening application, but the use of a rigid receptor structure means that it is not possible to distinguish fine levels of activity, because the particular receptor conformation used for docking might not be optimal for binding all actives. GlideScore differences of a few kcal/mol really can’t be considered significant, which makes it problematic to compare actives.
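To put "a few kcal/mol" in perspective, the textbook relation ΔG = RT ln Kd converts between binding free energy and dissociation constant. The sketch below implements only that conversion; it is not how GlideScore is computed. The function names are hypothetical, and a temperature of 298.15 K and a 1 M standard state are assumed.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def delta_g_from_kd(kd_molar, temp_k=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    return R_KCAL * temp_k * math.log(kd_molar)

def fold_change_from_ddg(ddg_kcal, temp_k=298.15):
    """Fold change in Kd that corresponds to a free energy difference (kcal/mol)."""
    return math.exp(ddg_kcal / (R_KCAL * temp_k))

# Free energy gap between a nanomolar and a micromolar binder.
gap = delta_g_from_kd(1e-6) - delta_g_from_kd(1e-9)
print("nM vs uM gap (kcal/mol):", round(gap, 2))

# Affinity range covered by score uncertainties of 1, 2 and 3 kcal/mol.
for ddg in (1.0, 2.0, 3.0):
    print(ddg, "kcal/mol ~", round(fold_change_from_ddg(ddg)), "fold in Kd")
```

At room temperature a tenfold change in Kd corresponds to roughly 1.4 kcal/mol, so a score uncertainty of a few kcal/mol can span orders of magnitude in affinity.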
Article ID: 144 – Last Modified: May 19, 2016 – 12:00am
GlideScore/Docking Score doesn’t correlate with my known activities. What is wrong?
Glide is primarily concerned with generating an accurate pose for each protein-ligand complex and separating ligands with appreciable binding affinity (generally < 10uM) from those that don't bind, in a ranked list. Extensive testing, both in-house and by third parties, has shown that Glide is very effective at pose prediction and enriching hit lists in active compounds.
The task of accurately estimating protein-ligand binding affinities remains beyond the capabilities of docking scoring functions. The rigid receptor approximation, limited estimate of the entropy gain or loss upon binding, and other approximations in GlideScore and all other empirical scoring functions omit essential thermodynamics of the free energy of binding. Thus, approaches other than correlation with GlideScore should be applied to predict relative or absolute binding affinities.
- For congeneric series, where a set of ligands share a common structure, free energy perturbation with FEP+ has been demonstrated to reliably produce binding affinity predictions with approximately 1 kcal/mol accuracy.
- End-point approaches have demonstrated in some cases the ability to yield reasonable correlations with experimental binding affinities. Prime MM-GBSA was designed to process output from Glide and returns relative free energy estimates of binding using a Generalized Born solvation model.
- QSAR-based approaches using 3D information from aligned ligand poses in the binding pocket (such as Field-Based QSAR) are sometimes able to identify and explain correlations with experimental binding affinities.
- Finally, with a set of actives and compounds that don’t bind it may be possible to create your own scoring function by fitting a model to experimental binding affinities using AutoQSAR.
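Overall, Schrödinger's answer on using molecular docking to predict and rank the binding affinities of compounds is essentially the same as GOLD's: docking scoring functions are designed to predict binding modes, not to predict binding affinity accurately. To predict binding affinity accurately, other methods are needed. FEP calculations starting from docked poses are the most accurate; MM-GBSA sometimes correlates with experiment; a QSAR model built on the aligned docked poses is another option; and AutoQSAR can be used to fit a custom scoring function.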
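Enyedy & Egan (2008) [1] studied the relationship between docking scores and IC50. Their results show no correlation between GLIDE docking scores and IC50 (Figure 1), which is broadly consistent with the GOLD and Glide statements above.

Figure 1. GLIDE SP and GLIDE XP scores versus IC50

Figure 2. Virtual screening performance of the GLIDE XP and Emodel scoring functions compared with ClogP and MW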
The UCSF DOCK6 user manual on the docking score
The UCSF DOCK6 FAQ also addresses the relationship between the scoring function and binding affinity:
What is a DOCK Score?
DOCK and other docking algorithms are designed to rank libraries of molecules with respect to each other. If this is the level of accuracy you need, DOCK should work fine. If you would like to calculate the absolute binding free energy, you will need to move to much more sophisticated (and time intensive) calculations. Some of the more commonly used of these types of calculations include MM/PBSA, Free Energy Perturbation (FEP), and Thermodynamic Integration (TI). You should be able to find some literature reviews that cover the strengths and weaknesses of each technique.
John Irwin, one of the DOCK authors, replied as follows to a post about correlation between predicted and experimental values:
Thanks for your contribution to the DOCK developers’ discussion group. We welcome all comments and opinions! Arguably this thread fits more in with the dock-fans mailing list, so I’ve copied them on this.
I wonder whether there is a misunderstanding about molecular docking and virtual screening lurking behind what you’ve written. In our experience, molecular docking (virtual screening in high throughput) is considered to be doing well retrospectively if it can a) enrich known binders 20 fold over random from a database of drug-like decoys and b) reproduce qualitatively the experimental binding geometries (McGovern & Shoichet, J Med Chem. 2003 Jul 3;46(14):2895-907.)
Prospectively, we consider docking a success if we purchase and test 50 compounds from among the top 500 of a database of purchasable, drug-like compounds (e.g. ZINC http://zinc.docking.org/ ) and find 3 previously unknown binders. That’s a pretty low bar, but it is considered the state of the art in this field. If someone shows me a quantitative comparison between docking energies and experimental binding affinities, unless it is within a narrow SAR series (and therefore not very interesting), my instinct is to believe it is an accidental correlation, and that people are fooling themselves into believing the correlation is significant.
You can list a dozen reasons why docking shouldn’t even work, much less provide good correlations with experimental binding affinities. Indeed, in our experience, 90+% of top docking hits are not actual binders. Correlate that! Hardly worth repeating to this audience, the reasons docking shouldn’t work include but are not limited to the approximations of the scoring function, the inadequate treatment of desolvation and entropy, and the rigid or incomplete sampling of receptor structure.
We think of docking as a screen, that sorts a database into “more likely” (top scorers) and “less likely” (the rest) to actually bind experimentally. Of course, we are actively working to improve docking, and there is reason to hope that docking can be improved. One way to do this is to focus on the decoys, and ask what makes molecules score well in the computer when they do not bind experimentally. This is one area of research in the lab, and the subject of a paper that will appear shortly from Graves and Shoichet 2005.
You are right to be cautious, and I encourage you to perform due diligence on DOCK5 or any other docking program you choose to use. We certainly do (see McGovern 2003 as above). But I think you also need to have realistic expectations of docking technology. As you point out, getting free energy perturbation calculations to correlate with experiment has been difficult enough. What do you expect with docking calculations that spend a few seconds or even a few minutes per molecule?
Best wishes,
John Irwin http://johnirwin.compbio.ucsf.edu
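The two success criteria in the reply above, enrichment over random and prospective hit rate, are simple ratios. A minimal sketch follows; the function names and the toy library (1000 compounds, 10 known binders) are assumptions made for illustration and do not come from DOCK.

```python
def enrichment_factor(ranked_labels, top_n):
    """Enrichment of known binders in the top_n of a ranked list over random picking.

    ranked_labels: booleans ordered from best to worst score; True marks a known binder.
    """
    total = len(ranked_labels)
    total_actives = sum(ranked_labels)
    if total_actives == 0 or top_n <= 0:
        raise ValueError("need at least one known binder and a positive top_n")
    top_n = min(top_n, total)
    hits_in_top = sum(ranked_labels[:top_n])
    return (hits_in_top / top_n) / (total_actives / total)

def hit_rate(confirmed_binders, compounds_tested):
    """Fraction of experimentally tested compounds that turn out to bind."""
    return confirmed_binders / compounds_tested

# Toy retrospective screen: 4 of the 10 known binders land in the top 20 of 1000.
ranking = [True] * 4 + [False] * 16 + [True] * 6 + [False] * 974
print("enrichment factor, top 20:", round(enrichment_factor(ranking, 20), 1))

# Prospective scenario from the reply: 3 new binders among 50 purchased compounds.
print("prospective hit rate:", hit_rate(3, 50))
```

Neither quantity says anything about how well the score tracks affinity within the hits; both only measure how well the ranking separates binders from non-binders.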
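Zhe Wang et al. [2] of Tingjun Hou's group also evaluated ten molecular docking programs and found that the correlation between docking scores and experimentally measured binding affinities is still rather weak (Pearson correlation coefficients below 0.5, with some even negative), which indicates that scoring functions are neither very reliable nor sufficiently general. For details, see the blog post "Performance Evaluation of Ten Molecular Docking Programs" [3].
The results of D3R Grand Challenge 2 [4] show that, across all submissions, the Kendall's tau between predicted and experimental binding affinities did not exceed 0.45 (stage 1) or 0.46 (stage 2), which indicates that predicting binding affinity, or ranking compounds by activity, leaves much room for improvement. Moreover, the calculated octanol-water partition coefficient cLogP, used as a null model (that is, no prediction method at all), performed on par with the participants, with a Kendall's tau of 0.45.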
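A comparison of this kind, a prediction method against a null model such as cLogP, can be sketched with a small Kendall's tau implementation. The code below computes the tau-b variant in pure Python; whether D3R used exactly this variant is an assumption, and the function name and all data values are made up for illustration.

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equal-length sequences."""
    concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            tied_x += 1
        if dy == 0:
            tied_y += 1
        if dx != 0 and dy != 0:
            if dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = math.sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    if denom == 0:
        raise ValueError("tau is undefined when one variable is constant")
    return (concordant - discordant) / denom

# Hypothetical data for six ligands (illustration only).
exp_pic50 = [8.1, 7.4, 6.9, 6.5, 6.0, 5.2]
dock_score = [-9.2, -8.1, -9.5, -7.0, -7.8, -6.9]   # more negative = better
clogp = [4.1, 3.0, 3.8, 2.9, 3.2, 2.0]              # null model

# Negate the docking score so that larger values mean stronger predicted binding.
print("tau, docking score:", round(kendall_tau_b([-s for s in dock_score], exp_pic50), 2))
print("tau, cLogP null:   ", round(kendall_tau_b(clogp, exp_pic50), 2))
```

A prediction method is only informative for ranking if its tau is clearly higher than that of the null model on the same compounds.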
1. Enyedy, I. J., & Egan, W. J. (2008). Can we use docking and scoring for hit-to-lead optimization? Journal of Computer-Aided Molecular Design, 22(3–4), 161–168. https://doi.org/10.1007/s10822-007-9165-4
2. Wang, Z.; Sun, H.; Yao, X.; Li, D.; Xu, L.; Li, Y.; Tian, S.; Hou, T. Comprehensive Evaluation of Ten Docking Programs on a Diverse Set of Protein–Ligand Complexes: The Prediction Accuracy of Sampling Power and Scoring Power. Phys. Chem. Chem. Phys. 2016, 18 (18), 12964–12975. https://doi.org/10.1039/C6CP01555G.
3. Xiao, G. Performance Evaluation of Ten Molecular Docking Programs (in Chinese). http://blog.molcalx.com.cn/2017/02/08/evaluation-ten-docking-program.html
4. Gaieb, Z.; Liu, S.; Gathiaka, S.; Chiu, M.; Yang, H.; Shao, C.; Feher, V. A.; Walters, W. P.; Kuhn, B.; Rudolph, M. G.; et al. D3R Grand Challenge 2: Blind Prediction of Protein–Ligand Poses, Affinity Rankings, and Relative Binding Free Energies. J. Comput. Aided. Mol. Des. 2018, 32 (1), 1–20. https://doi.org/10.1007/s10822-017-0088-4.
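- Causes of false positives and false negatives in molecular docking
- Performance evaluation of ten molecular docking programs
- FLARE case study | Computing the binding free energy of BRD4 inhibitors with WaterSwap
- Recognizing the pitfalls in virtual screening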
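This article compares high-throughput screening (HTS) with virtual screening (VS) and discusses why docking-based virtual screening produces false positives and false negatives. Unsuitable intramolecular energy is found to be a major cause of false positives. A comparison of the hits from HTS and VS shows that each finds compounds the other does not, so the two are complementary. Analysis of the complex structures of the compounds that virtual screening missed shows that an unsuitable protein binding site is an important cause of false negatives. The article also compares docking with 2D-similarity-based virtual screening: the 2D similarity methods failed to find competitive inhibitors that the structure-based method could find, which shows that docking-based virtual screening, unlike 2D structural similarity methods, can find structurally novel, reversible competitive inhibitors.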