Machine learning, AI & drug development

Organic life depends on proteins, complex molecules that perform many functions. Proteins are made up of combinations of amino acids: simple molecules linked into chains that fold and wrap around each other in three-dimensional configurations. Misfolded proteins are linked to horrible conditions such as Alzheimer's disease, Parkinson's disease and Huntington's disease.

Developing new drugs and tackling such conditions would receive a huge boost if we understood how proteins fold. This is among the biggest challenges in research. Protein folding seems to follow simple rules, and we can list the amino acid sequence of a given protein. But the number of folding possibilities is so large that it is impossible to enumerate every 3D configuration, even with supercomputers. There are about 20,000 genes, each of which can malfunction in multiple ways, and therefore a huge number of possible interactions between the resulting proteins.
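To get a feel for why listing every configuration is hopeless, here is a rough back-of-the-envelope sketch in Python. The numbers are illustrative assumptions, not measurements: it supposes each amino acid in a chain can take one of three local arrangements, and that a hypothetical machine checks a million billion configurations per second.

```python
def count_conformations(n_residues, states_per_residue=3):
    """Count configurations if every residue independently takes one of
    `states_per_residue` local arrangements (an illustrative assumption)."""
    return states_per_residue ** n_residues


def years_to_enumerate(n_conformations, checks_per_second=1e15):
    """Years needed to visit every configuration once at the given rate.
    The default rate is a hypothetical figure chosen for illustration."""
    seconds_per_year = 365.25 * 24 * 3600
    return n_conformations / checks_per_second / seconds_per_year


# A modest protein of 100 amino acids.
total = count_conformations(100)
print(f"{total:.3e} possible configurations")
print(f"{years_to_enumerate(total):.3e} years to list them all")
```

Even under these generous assumptions the count grows exponentially with the length of the chain, which is why brute-force listing is not an option.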

Researchers use algorithms to compute the likely 3D structure of a protein from its amino acid sequence. They also use methods such as X-ray crystallography and nuclear magnetic resonance to image protein structures. But this is an expensive, hit-or-miss process, and we still cannot reliably say what structure a new sequence will adopt.

In many ways, protein folding resembles chess, Shogi (a Japanese version of chess) and Go. These games have simple rules, which a child can learn in a few minutes. Each is a game of complete information. Yet chess has more possible games than there are atoms in the observable universe; Shogi has even more possibilities than chess; and Go has orders of magnitude more than Shogi. Programs have to find heuristics (strategic rules of thumb) to play these games well.

A program that’s good at such games can be adapted for protein folding. We’re seeing demonstrations of this with DeepMind’s AlphaFold. The UK-based artificial intelligence company DeepMind created a sensation when its “Alpha” algorithms learnt chess, Go and Shogi. In 2016, the first iteration, AlphaGo, beat the Go world champion Lee Sedol. A later iteration, AlphaZero, was given only the basic rules and played millions of games against itself to develop its heuristics. It beat AlphaGo, and thrashed other top chess-playing and Shogi programs.

The self-learning methods DeepMind developed work for other systems of complete information. In December 2018, its program AlphaFold was the top performer at a prestigious competition: the 13th edition of the Critical Assessment of Structure Prediction (CASP).

CASP is a biennial competition in which teams try to predict 3D protein structures. Competing teams were given the linear sequences of amino acids for 90 proteins whose 3D shapes are known but have not yet been published. The physical properties of a protein molecule include which pairs of amino acids are linked, the distances between linked pairs, and the angles of the chemical bonds connecting them. If you know these, you can work out the 3D shape.
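The idea that distances and angles pin down a shape can be sketched with a toy example. The Python below is a simplified illustration, not how any CASP entrant works: it places the points of a flat, two-dimensional chain given only the length of each link and the angle by which the chain turns before it. A real protein needs three dimensions and twist angles as well; the function name and inputs here are hypothetical.

```python
import math


def chain_coordinates(bond_lengths, turn_angles_deg):
    """Return the (x, y) points of a planar chain.

    bond_lengths[i] is the length of link i; turn_angles_deg[i] is the
    change of direction (in degrees, anticlockwise) applied before link i.
    The chain starts at the origin heading along the x-axis.
    """
    if len(bond_lengths) != len(turn_angles_deg):
        raise ValueError("need one turning angle per bond")
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for length, turn in zip(bond_lengths, turn_angles_deg):
        heading += math.radians(turn)      # rotate the current direction
        x += length * math.cos(heading)    # step along the new direction
        y += length * math.sin(heading)
        points.append((x, y))
    return points


# Two unit-length links with a right-angle turn between them.
for point in chain_coordinates([1.0, 1.0], [0.0, 90.0]):
    print(point)
```

Given the same lengths and angles, the function always produces the same shape, which is the sense in which knowing them lets you work out the structure.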

The CASP competitors compute how those 90 sequences “should” fold, and their predictions are matched against the known structures. The debutant, AlphaFold, far outperformed the other 97 entrants. For 43 sequences where nothing beyond the linear sequence was known, AlphaFold made the most accurate prediction 25 times. The second-place finisher, Zhang Group, did so three times out of 43.

Dr John Moult, CASP’s lead organiser and a computational biologist at the University of Maryland in Rockville, says AlphaFold was, on average, 15 per cent more accurate than the others. DeepMind’s team refined two algorithms pioneered by others.

DeepMind uses “deep neural networks” to learn correlations between the shape of a protein molecule and its amino acid sequence. The AlphaFold model uses two algorithms. First, it computes a score that estimates the accuracy of a proposed structure. Then it uses “gradient descent” (an algorithm for finding a minimum of a function) to optimise that score.
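Gradient descent itself is simple enough to show in a few lines. The Python sketch below is a toy under stated assumptions, not AlphaFold's actual score or code: it takes three points on a flat plane, a made-up score that measures how far the gaps between them are from target distances, and repeatedly nudges the points downhill on that score. The function names, the step size and the targets are all hypothetical.

```python
import math


def score(coords, targets):
    """Sum of squared gaps between actual and target pair distances.
    Lower is better; zero means every target distance is met."""
    total = 0.0
    for (i, j), wanted in targets.items():
        actual = math.hypot(coords[i][0] - coords[j][0],
                            coords[i][1] - coords[j][1])
        total += (actual - wanted) ** 2
    return total


def gradient_descent(coords, targets, learning_rate=0.05, steps=2000):
    """Move the points a small step against the gradient of `score`, repeatedly."""
    coords = [list(p) for p in coords]
    for _ in range(steps):
        grads = [[0.0, 0.0] for _ in coords]
        for (i, j), wanted in targets.items():
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            actual = math.hypot(dx, dy) or 1e-12   # avoid dividing by zero
            factor = 2.0 * (actual - wanted) / actual
            grads[i][0] += factor * dx
            grads[i][1] += factor * dy
            grads[j][0] -= factor * dx
            grads[j][1] -= factor * dy
        for point, grad in zip(coords, grads):
            point[0] -= learning_rate * grad[0]
            point[1] -= learning_rate * grad[1]
    return coords


# Hypothetical targets: every pair of points should end up 1.0 apart.
targets = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0}
start = [(0.0, 0.0), (0.5, 0.1), (0.2, 0.7)]
result = gradient_descent(start, targets)
print("score before:", score(start, targets))
print("score after: ", score(result, targets))
```

The points begin bunched together and settle into a triangle whose sides match the targets. The real problem involves thousands of atoms and a score learnt by a neural network, but the downhill search follows the same principle.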

AlphaFold compared genomic data on other proteins to derive probabilities for which pairs of amino acids would end up close to each other after folding. It also worked out the probable distances between neighbouring pairs and the likely angles at which they joined. This approach combines clever engineering design and vast computing resources to make a contribution to fundamental science.

While DeepMind CEO Demis Hassabis says this demonstration is just a beginning, it does mark a shift in research into this key area. Instead of academics with biochemistry backgrounds or conventional pharma companies, more IT companies could move into studying protein folding using artificial intelligence (AI). Facebook has also started research in this area, and its R&D group recently published a paper that hasn’t yet been peer-reviewed.

Academics who were tracking CASP seemed stunned by AlphaFold’s results, which suggest that research into protein folding could be taken over by AI. Machine learning and AI could make a big contribution to accelerating drug development and research into many diseases.