Automated design of potent antimalarials

The OpenSource Malaria project team used a tool called Frobenius to design and synthesise novel anti-malarial compounds.

This blog post is a follow-up to previous work and an update on Evariste’s contribution to the OpenSource Malaria project. Everything described below summarises this GitHub thread, which was used for communication between Evariste and the project team.

To summarise the previous blog post, we used our in-house platform Frobenius to design novel analogues of a set of pre-clinical anti-malarial compounds. After further discussion with the chemists working on this project, we decided to refine this list and focus on a different chemotype. To do this, we used a compound designer built into Frobenius that automates the construction of virtual libraries.

Library chemistry is an effective tool for rapidly interrogating structure-activity relationships (SAR). With some synthetic chemistry know-how or a half-decent retrosynthesis predictor, you can break any compound into a set of building blocks and the reactions required to combine them correctly. By linking this to commercial (or in-house) databases of building blocks, you can then reconstitute a large library of novel molecules accessible by the same route. This approach ‘guarantees’ synthetic accessibility and allows direct cost comparison of each compound, provided that the building block libraries have this information attached. An example based on the chemistry used in this project is given below.

Simple retrosynthetic analysis of starting compound
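The enumeration idea described above can be sketched in a few lines. This is only a toy illustration, not how Frobenius works internally: the building block names, prices and the `enumerate_library` helper are all hypothetical, and a real implementation would operate on chemical structures and reaction templates rather than labels.

```python
from itertools import product

# Hypothetical building-block catalogues: name -> price (arbitrary units).
# Real catalogues would carry structures (e.g. SMILES) and supplier metadata.
CORES = {"core_A": 120.0}
BORONIC_ACIDS = {
    "boronic_acid_1": 35.0,
    "boronic_acid_2": 80.0,
    "boronic_acid_3": 15.0,
}


def enumerate_library(cores, partners):
    """Pair every core with every coupling partner and sum the building-block costs."""
    library = []
    for (core, core_cost), (partner, partner_cost) in product(
        cores.items(), partners.items()
    ):
        library.append(
            {"product": f"{core}+{partner}", "cost": core_cost + partner_cost}
        )
    # Cheapest virtual products first, so costs can be compared directly.
    return sorted(library, key=lambda entry: entry["cost"])


library = enumerate_library(CORES, BORONIC_ACIDS)
for entry in library:
    print(entry["product"], entry["cost"])
```

Because every virtual product is assembled from catalogued building blocks by the same route, each one carries a cost estimate for free, which is the property the paragraph above relies on.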


Once we’d encoded the route used to construct the scaffold of interest, Frobenius took a stored building block collection and generated the virtual library. In this case we used the boronic acid equivalents available in Enamine’s in-stock catalogue to generate 2,828 novel molecules. These were then scored by our modelling and, in discussion with the project team, a set of molecules was selected for synthesis. At Evariste our quants model the correlation of properties of similar compounds. This modelling supports compound selection that aims to maximise diversity, probability of success and the information gained from the synthesis of each compound.

The figure below shows these compounds (plus a bonus impurity generated in one of the reactions) along with the predicted potencies and their 50% confidence intervals. These numbers may differ from those in the GitHub thread, as our modelling has improved in the time since these were initially picked. We were glad to see that 50% of the measured potencies fell within the predicted 50% confidence interval, which is what you would expect on average if the modelling were functioning properly.

Round 1 molecules with measured pIC50 data and predicted 50% confidence interval
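The calibration check described above (counting how many measurements land inside their predicted 50% interval) can be sketched as follows. The pIC50 values and intervals here are made up for illustration and are not the project data; the function names are ours, and the binomial calculation assumes the compounds are independent.

```python
from math import comb


def coverage(measured, intervals):
    """Count measurements that fall inside their predicted (low, high) interval."""
    hits = sum(
        low <= value <= high for value, (low, high) in zip(measured, intervals)
    )
    return hits, len(measured)


def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k hits in n independent trials with hit probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Made-up pIC50 values and 50% intervals, for illustration only.
measured = [5.9, 6.4, 5.2, 6.1]
intervals = [(5.7, 6.3), (5.5, 6.0), (5.0, 5.6), (6.3, 6.9)]

hits, total = coverage(measured, intervals)
print(f"{hits}/{total} inside the 50% interval")
print(f"P(exactly {hits} of {total}) = {binomial_pmf(hits, total):.3f}")
```

For a well-calibrated model the hit count should scatter around half the number of compounds, and the binomial probability gives a rough sense of how surprising any particular count is.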


Based on this, we designed a second round of suggestions, taking advantage of both the new data and the knowledge that including aryl chlorides in the starting material may lead to ‘bonus’ compounds. Thanks to some excellent purification from Edwin (a postdoc in Mat’s lab), we ended up with 10 new molecules and one repeat already present in the dataset. The set below shows the new compounds that fall within the model’s domain of applicability.

Round 2 molecules with measured pIC50 data and predicted 50% confidence interval


When tested, these molecules showed some really promising activity, with the best compound (12) having a pIC50 of 6.8. Pleasingly, this is also one of the least lipophilic compounds in the set. More than half of our potencies (6/8) fall within the modelled 50% CI.

It's worth noting that some of these compounds are quite similar to each other (due to the multiple by-products purified and tested) and thus their potencies are highly correlated. As a result, the compounds are not truly independent draws from their modelled distributions and we are more likely to see a deviation from the expected number of compounds that fall within their respective confidence intervals.
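The effect of correlation on the hit count can be illustrated with a small simulation. This is a sketch under strong assumptions that are ours, not the project's: prediction errors are standard normal, and all compounds share a single common factor with an arbitrary pairwise correlation of 0.8.

```python
import random
from statistics import NormalDist, pvariance

# Half-width of the central 50% interval of a standard normal (about 0.674).
HALF_WIDTH = NormalDist().inv_cdf(0.75)


def hit_counts(n_compounds, rho, n_trials, rng):
    """Simulate how many of n_compounds land inside their 50% interval per trial.

    Errors share one common factor, giving pairwise correlation rho between
    any two compounds (an assumed, deliberately simple correlation structure).
    """
    counts = []
    for _ in range(n_trials):
        shared = rng.gauss(0.0, 1.0)
        hits = 0
        for _ in range(n_compounds):
            own = rng.gauss(0.0, 1.0)
            error = rho**0.5 * shared + (1.0 - rho) ** 0.5 * own
            hits += abs(error) < HALF_WIDTH
        counts.append(hits)
    return counts


rng = random.Random(0)
independent = hit_counts(8, rho=0.0, n_trials=20000, rng=rng)
correlated = hit_counts(8, rho=0.8, n_trials=20000, rng=rng)

print("mean hits:", sum(independent) / 20000, sum(correlated) / 20000)
print("variance :", pvariance(independent), pvariance(correlated))
```

In both cases each compound still has a 50% chance of landing inside its interval, so the average hit count is unchanged. What changes is the spread: correlated compounds tend to hit or miss together, so counts far from half become more likely.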

You can also see from the confidence intervals how Frobenius balances exploration and exploitation when scoring new molecules. Compound 11 has the highest overall mean prediction but a narrow confidence interval, essentially exploiting a well-understood area of chemical space. Compound 12 actually has a relatively low mean prediction but a wide confidence interval. The fact that this ended up being the best compound illustrates the value of making low-mean, high-variance compounds, something great medicinal chemists do intuitively and that Frobenius replicates.
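One simple way to see why a low-mean, high-variance compound can be worth making is to ask how likely each prediction is to beat an ambitious potency target. The sketch below is not the Frobenius scoring function; it assumes normally distributed predictions, and the means, intervals and threshold are hypothetical numbers chosen for illustration.

```python
from statistics import NormalDist

# z-score bounding the central 50% of a normal distribution (about 0.674).
Z_50 = NormalDist().inv_cdf(0.75)


def prob_above(mean, ci_low, ci_high, threshold):
    """P(pIC50 > threshold) for a normal prediction described by its 50% interval."""
    # Convert the width of the 50% interval into a standard deviation.
    sigma = (ci_high - ci_low) / (2.0 * Z_50)
    return 1.0 - NormalDist(mean, sigma).cdf(threshold)


# Hypothetical predictions: (mean, 50% CI low, 50% CI high).
candidates = {
    "narrow, high mean": (6.2, 6.05, 6.35),
    "wide, lower mean": (5.8, 5.2, 6.4),
}

for name, (mean, low, high) in candidates.items():
    print(name, round(prob_above(mean, low, high, threshold=6.6), 3))
```

With these made-up numbers the wide, lower-mean prediction has the better chance of clearing the threshold, even though its expected potency is lower. Scoring on the upper tail rather than the mean is one common way to reward exploration.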

Overall, we’re delighted with the latest set of results and very grateful to Edwin, Mat, and the team involved in making and testing the compounds. We also want to highlight the contribution of building block supplier Enamine, who were kind enough to waive all shipping and handling fees as it was an anti-malarial project. Enamine are a wonderful company to work with, and our thoughts are very much with them at the present time given the situation in Ukraine, and particularly the Eastern regions where most of the Enamine chemists are based.



News & updates

Stay updated with Evariste's blogposts
