From drugs to galaxy hunting, AI is elbowing its way into boffins' labs
Machine learning is cropping up more and more in research papers – does it work?
Feature Powerful artificially intelligent algorithms and models are all the rage. They're knocking it out of the park in language translation and image recognition, but autonomous cars and chatbots? Not so much.
One area machine learning could do surprisingly well in is science research. As AI advances, its potential is being seized by academics. The number of natural science studies that use machine learning is steadily rising.
Two separate papers that show how neural networks can be trained to pinpoint when the precise shuffle of particles leads to a physical phase transition – something that could help scientists understand phenomena like superconductivity – were published on the same day earlier this month in Nature Physics.
Chemistry is a game for AI to play
Science has had an affair with AI for a while, said Marwin Segler, a PhD student studying chemistry under Professor Mark Waller at the University of Münster, Germany. However, until now, the relationship hasn’t been terribly fruitful.
Segler is interested in retrosynthesis, a technique that reveals how a desired molecule can be broken down into simpler chemical building blocks. Chemists can then carry out the necessary reaction steps to craft the required molecule from these building blocks. These molecules can then be used in drugs and other products.
A good analogy would be something like a “cooking recipe,” Segler told The Register. “Imagine you’re trying to make a complicated cake. Retrosynthesis will show you how to make the cake, and the ingredients you need.”
In the 1990s, before the deep learning hype kicked off, expert systems were used to perform retrosynthesis. Rules for reactions had to be manually programmed in: this is tedious work, and it never delivered any convincing results.
Now things are starting to look more promising with modern AI techniques. Retrosynthesis has strong analogies to puzzle games, particularly Go. Software can attempt to solve retrosynthesis problems in the same way it solves Go challenges: splintering the problem ahead into component parts and finding the best route to the solution.
All the viable moves in a Go match can be fanned out into a large search tree and the winning moves are identified using a Monte Carlo Tree Search – an algorithm used by AlphaGo to defeat Lee Sedol, a Korean Go champion.
Just like how AlphaGo was trained to triumph in Go games, Segler’s AlphaChem program is trained to determine the best move to find the puzzle pieces that fit together to build the desired molecule. The code is fed a library containing millions of chemical reactions to obtain the necessary bank of knowledge to ultimately break down molecules into building blocks.
“Chemists rely on their intuition, which they master during long years of work and study, to prioritize which rules to apply when retroanalyzing molecules. Analogous to master move prediction in Go, we showed recently that, instead of hand-coding, neural networks can learn the ‘master chemist moves’,” the AlphaChem paper [PDF], submitted to AI conference ICLR 2017 in January, reads.
There are thousands of possible moves per position to play on the Go board, just as there are multiple pathways to consider when trying to break down a molecule into simpler components.
AlphaGo and AlphaChem both cut down on computational costs by pruning the search tree, so there are fewer branches to consider. Only the top 50 most-promising moves are played out, so it doesn’t take a fancy supercomputer packing tons of CPU cores and accelerators to perform the retrosynthesis – an Apple MacBook Pro will do.
During the testing phase, AlphaChem was pitted against two other more-traditional search algorithms to find the best reactions for 40 molecules. Although AlphaChem proved slower than the best-first search algorithm, it was more accurate solving the problem up to 95 per cent of the time.
Segler hopes AlphaChem will one day be used to find new ways of making drugs more cheaply or to help chemists manufacture new molecules. It is possible the software will, in future revisions, reveal reactions and techniques humans had not considered.
It’s true that using AI is fashionable right now, and interest has piqued in science because of the hype, he said. “But on the other hand, it’s getting used more because it’s producing better results.”
“I coded the Monte Carlo Tree Search algorithm myself, but for the neural network stuff I used Keras,” Segler told us.
Although AI has been used in chemistry for over 40 years, it’s more challenging to apply it in chemistry compared to other subjects, Segler said. “Gathering training data is very expensive in chemistry, because every data point is a laboratory experiment. We cannot simply annotate photos or gather lots of text from the internet, as in computer vision or natural language processing.”
For one thing, a lot of medical-related data is kept confidential, and companies don’t generally share this information to chemists and biochemists for training systems.
From chemical labs to galaxies
Data is fuel for AI: the more information you can throw at a machine-learning package, the better it will perform. Unlikely the tightly regulated health world, astrophysics is more open with its data. The biggest space agencies on Earth, such as NASA or ESA, happily share details of discoveries made on their missions to propel space exploration and understanding of the cosmos.
This is all good news for engineers and scientists using AI software to tackle astrophysics challenges.
Space instruments take in massive amounts of data, too. The Large Synoptic Survey Telescope under construction in Chile is expected to haul in 3TB of data per night – a small amount compared to the Square Kilometre Array project at 15TB per second, Kevin Schawinski, professor at the department of physics at ETH Zurich, Switzerland, said.
“Astrophysics is a very fertile place as the large datasets are well suited for machine learning,” he told The Register.
Schawinski has long been fascinated with galaxies. He founded the Galaxy Zoo project, a crowdsourced project that invites people to help classify the shape of galaxies, with the help of Chris Lintott, presenter of the BBC’s The Sky at Night TV show.
The sight of millions of bright stars littered among swirls of interstellar dust, all set against a backdrop of obsidian nothingness, is beautiful to look at. We're used to seeing stunning photographs from the heavens – these are often digitally remastered from grainy, noisy originals snapped by telescopes. It is difficult to capture clear images, and not every snap from a telescope is a Kodak moment.
“It’s like having two worlds. One where the universe is perfect and you can see galaxies with infinite resolution, and the other where it’s imperfect to look at due to the noise and distortion,” Schawinski told The Register.
If only there were some software to intelligently clean up images from space, and pick out the features – such as star systems and galaxies – for boffins to study.
Schawinski stumbled across a paper about general adversarial networks (GANs) written by Ian Goodfellow, a researcher at OpenAI, who is credited with inventing the model along with his colleagues at the University of Montreal.
GANs work by using two neural networks – a generator and discriminator – that compete against each other. The system is fed an input of images and the generator tries to recreate the pictures, while the discriminator tries to distinguish if the computer-made images are real or generated.
It allows the researchers to tweak the generator to get images past the discriminator. Eventually, the real and fake images are very similar.
Schawinski realized GANs could get him closer to the idea of the perfect universe. “It clicked almost right away, that it could be used to bridge the gap between both worlds.” Working together with colleagues from his university's computer science department, Schawinski and his team began training his GAN-based GalaxyGan software on images of galaxies from the Sloan Digital Sky Survey. A paper on GalaxyGan [PDF] has also been submitted to ICLR.
Pairs of galaxy images are used during training. One image is high quality, and the other is degraded by blurring the image and adding noise to it. The GAN then tries to learn to recover the botched image by minimizing the difference between the recovered and non-degraded image. The idea is to train the software to fix up photographs distorted by physical effects and other interference.
A demonstration of GalaxyGan recovering a blurry image and a comparison with a deconvolution technique to restore the blurry picture ... Photo credit: Schawinski et al
“The GAN-recovered picture will never be better than or equal to the original unspoiled image,” Schawinski warned. “On one hand it’s great that you can make out some of the small features that were lost, but on the other it’s just a statistical reconstruction.”
It fails on rare features it hasn’t encountered much before, as it cannot reconstruct objects it hasn’t learned to recognize. Although using a GAN provides better results than using traditional deconvolution methods to correct distortion and noise, the network's performance is limited by its training set.
If it’s trained on spiral galaxies, then it won’t do so well on elliptical or irregular galaxies. “It’s a new method, and we need to calibrate it and see how it it behaves under different circumstances. It has the potential to be transformational, but we need to do more research,” Schawinski said.
The trend of using AI in science is growing, Schawinski told us. He has noticed interest growing among physics students who are keen to learn how to use the powerful algorithms. He plans to continue using machine learning, and has even set up a website that includes the code for the GalaxyGan and will list future projects for other researchers who are interested.
“AI provides another set of tools that we need to handle to make new discoveries. I think it’ll stick around, but it’ll disappear below the surface. Right now it’s still shiny and new, but after a while it’ll just be another tool people use,” he said. ®