Automated Hypothesis Generation
Posted: November 12, 2014
Hypothesis-driven analysis and improvement lie at the very heart of process excellence. Advances in computing are enabling software that digests huge quantities of data (studies, trials, even recipes) and identifies paths of inquiry and tests that humans might never have found because of the sheer quantity of information available. The Economist recently described these developments:
Apples, mushrooms and pork sounds a promising recipe for a kebab, but the average barbecuer might balk at adding strawberries. According to John Gordon of IBM, however, the result is delicious. Dr Gordon is one of the leaders of that firm’s cognitive-computing team, responsible for a machine called Watson which is able to digest and analyse large amounts of English text and then draw inferences from it. When, in March, Watson was fed reams of recipes and texts about food, it reasoned that these four ingredients would complement each other, based on their sharing a number of flavoursome chemical compounds. And Dr Gordon, at least, thinks Watson’s suggestion is a winner.
Devising new recipes sounds a trivial use for a multimillion-dollar piece of kit. But Dr Gordon’s culinary experiment neatly demonstrates the idea of automated hypothesis generation—and the possible uses of that are certainly not trivial. More than 90 groups of scientists are now developing hypothesis-generation software. They hope to use it not on recipe books but on the vast corpus of scientific literature (by one tally at least 50m scientific papers) that has piled up in public databases.
The power of the technique was demonstrated by research published in August by Olivier Lichtarge of Baylor College of Medicine, in Houston, Texas, and his colleagues. In collaboration with Dr Gordon’s group, they employed it to hunt for proteins called kinases that activate another protein, p53, which curbs the growth of cancers. They used the software to read the abstracts of 186,879 papers and produced a list of the most promising kinases for experiments. The twist was that the papers in question were all published before 2003. That meant Dr Lichtarge could check to see if the Watson-based approach came to the same conclusions as those arrived at by human researchers over the subsequent ten years. And it did. Of the top nine kinases the software picked, seven have subsequently been shown to activate p53.
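The retrospective check described in the quoted passage amounts to a simple calculation: take the software's ranked list, count how many of the top picks were later confirmed, and divide. The sketch below illustrates that arithmetic only. The function name `precision_at_k` and the placeholder labels `kinase_1` through `kinase_9` are assumptions made for illustration; the article does not name the nine kinases or describe how the researchers scored their results.

```python
def precision_at_k(ranked, confirmed, k):
    """Return the fraction of the top-k ranked candidates found in `confirmed`."""
    top_k = ranked[:k]
    if not top_k:
        raise ValueError("need at least one ranked candidate and k >= 1")
    hits = sum(1 for candidate in top_k if candidate in confirmed)
    return hits / len(top_k)


# Hypothetical placeholder labels; the article does not list the actual kinases.
ranked = [f"kinase_{i}" for i in range(1, 10)]

# Assume seven of the nine were later confirmed, matching the count quoted above.
confirmed = {"kinase_1", "kinase_2", "kinase_3", "kinase_4",
             "kinase_5", "kinase_7", "kinase_8"}

print(f"{precision_at_k(ranked, confirmed, 9):.2f}")  # prints 0.78
```

Seven confirmed out of nine picks is a hit rate of roughly 78 percent, which is the figure the retrospective test reports in ratio form.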