Spacial Scores: new metrics for measuring molecular complexity
Molecular complexity is one of the
theoretical underpinnings for fragment-based drug discovery. Mike Hann and
colleagues proposed two decades ago that very simple molecules may not have
enough features to bind tightly to any proteins, whereas highly functionalized
molecules may have extraneous spinach that keeps them from binding to any
proteins. Fragments, being small and thus less complex, are in a sweet spot:
just complex enough.
But what does it mean for one
molecule to be more complex than another? Most chemists would agree that pyridine
is more complex than methane, but is it more complex than benzene? To decide,
you need a numerical metric, and there are plenty to choose from. The problem,
as we discussed in 2017, is that they don’t correlate with one another, so it
is not clear which one(s) to choose. In a new (open access) J. Med. Chem.
paper, Adrian Krzyzanowski, Herbert Waldmann and colleagues at the Max Planck Institute
Dortmund have provided another. (Derek Lowe also recently covered this paper.)
The researchers propose the Spacial
Score, or SPS. This is calculated based on four molecular parameters for each
atom in a given molecule. The term h is dependent on atom hybridization:
1 for sp-, 2 for sp2-, 3 for sp3-hybridized atoms, and 4
for all others. Stereogenic centers are assigned an s value of 2, while
all other atoms are assigned a value of 1. Atoms that are part of non-aromatic
rings are also assigned an r value of 2; those that are part of an aromatic
ring or linear chain are set to 1. Finally, the n score is set to the
number of heavy-atom neighbors.
For each atom in a molecule, h
is multiplied by s, r, and n². The SPS is calculated
by summing the individual scores for all the atoms in a molecule. Because there
is no upper limit, and because it is nice to be able to compare molecules of
the same size, the researchers also define the nSPS, or normalized SPS, which
is simply the SPS divided by the number of non-hydrogen atoms in the molecule. Although SPS can
be calculated manually, the process is tedious and the researchers have kindly
provided code to automate the process. Having defined SPS, the researchers
compare it to other molecular complexity metrics, including the simple fraction
of sp3 carbons in a molecule,
Fsp3, which we wrote about in 2009.
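To make the arithmetic concrete, here is a minimal Python sketch of the scoring rule as described above. It is not the researchers' code: the `Atom` class and the function names are hypothetical, and the h, s, r, and n values are assigned by hand rather than perceived from a structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    h: int  # hybridization term: 1 (sp), 2 (sp2), 3 (sp3), 4 (anything else)
    s: int  # 2 for a stereogenic center, otherwise 1
    r: int  # 2 for an atom in a non-aromatic ring, otherwise 1
    n: int  # number of heavy-atom neighbors


def atom_score(atom: Atom) -> int:
    """Per-atom contribution: h * s * r * n squared."""
    return atom.h * atom.s * atom.r * atom.n ** 2


def sps(atoms: list[Atom]) -> int:
    """Sum the per-atom contributions over all heavy atoms."""
    return sum(atom_score(a) for a in atoms)


def nsps(atoms: list[Atom]) -> float:
    """SPS divided by the number of heavy atoms."""
    return sps(atoms) / len(atoms)


# Hand-assigned parameters (illustrative inputs, not taken from the paper).
benzene = [Atom(h=2, s=1, r=1, n=2)] * 6      # aromatic sp2 carbons
cyclohexane = [Atom(h=3, s=1, r=2, n=2)] * 6  # sp3 carbons in a non-aromatic ring

print(sps(benzene), nsps(benzene))          # 48 8.0
print(sps(cyclohexane), nsps(cyclohexane))  # 144 24.0
```

Under these rules cyclohexane scores three times higher than benzene, which gives a feel for how heavily the metric rewards saturated ring atoms.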
The researchers next calculated nSPS for four sets of molecules: drugs, a screening library from Enamine, natural
products, and so-called “dark chemical matter,” library compounds that have not
hit in numerous screens. The results are equivocal. For
example, the nSPS for dark chemical matter is very similar to that for drugs.
On the other hand, natural products tend to have higher nSPS scores than drugs,
as expected. Interestingly, the average nSPS score for compounds in the GDB-17
database, consisting of theoretical molecules having up to 17 atoms, is also quite
high.
The researchers assessed whether nSPS
correlated with biological properties, and found that compounds with higher nSPS
tended to have higher potencies against fewer proteins, as predicted by theory.
That said, this analysis was based on binning compounds into a small number of
categories, and as Pete Kenny has repeatedly warned, this can lead to spurious
trends.
The same issue of J. Med.
Chem. carries an analysis of the paper by Tudor Oprea and Cristian Bologa,
both at the University of New Mexico. This contextualizes the work and confirms
that drugs do not seem to be getting more complex over time, as measured by nSPS.
This may seem odd, though Oprea and Bologa note that by “normalizing” for
size, nSPS misses the increasing molecular weight of drugs.
This observation raises other
questions too. For one, SPS explicitly excludes element identity.
Coming back to benzene and pyridine, both have identical SPS and nSPS, which
does not seem chemically intuitive. One could quibble more: why square the
value of n in the calculation of SPS? Why allow s to be only 1
and 2, as opposed to 1 and 5?
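A quick sketch shows both points. The per-atom parameters are assigned by hand, the function names are hypothetical, and the `n_exponent` argument is my own addition for illustration, not something from the paper.

```python
def atom_score(h, s, r, n, n_exponent=2):
    """h * s * r * n**n_exponent; n_exponent=2 is the rule described in the post."""
    return h * s * r * n ** n_exponent


def sps(atoms, n_exponent=2):
    """Sum per-atom scores; the element label is unpacked but deliberately unused."""
    return sum(atom_score(h, s, r, n, n_exponent) for _, h, s, r, n in atoms)


# Each atom is (element, h, s, r, n). Every ring atom here is sp2 (h=2),
# not a stereocenter (s=1), aromatic (r=1), with two heavy-atom neighbors.
benzene = [("C", 2, 1, 1, 2)] * 6
pyridine = [("N", 2, 1, 1, 2)] + [("C", 2, 1, 1, 2)] * 5

print(sps(benzene), sps(pyridine))  # 48 48

# Hypothetical variant: drop the square on n.
print(sps(benzene, n_exponent=1))   # 24

# A ring stereocenter with three heavy-atom neighbors (sp3, non-aromatic ring),
# scored with s = 2 as in the post and with a made-up s = 5.
print(atom_score(3, 2, 2, 3))  # 108
print(atom_score(3, 5, 2, 3))  # 270
```

Swapping carbon for nitrogen changes nothing, while the choice of exponent or stereocenter weight rescales the score considerably.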
In the end I did enjoy reading
this paper, and I do think having some metric of molecular complexity might be
valuable. I’m just not sure where SPS will fit in with all the existing and conflicting
metrics, and how such metrics can lead to practical applications.