Wednesday, September 28, 2011

Proteins, Puzzles, and Perjury

There was a bit of news last week that generated headlines such as “Gamers Solve Problem that Stumped Scientists.” As always with science by press release, the reality is cool but not that cool.

The “Protein Folding Problem” is one of the most damnable problems facing biology. It would really be nice to reliably predict protein structures. Knowing the structure of a protein allows us to understand how the protein works, so we can do useful things like design effective drugs. However, precisely determining the 3-D structure of a protein is extremely time-consuming, fiddly work that has a low probability of success. So, there’s a lot of interest in using computers to predict the 3-D structure of a protein.

The problem is this: genes encode proteins, and we can easily “read” a gene to predict the linear sequence of amino acids in a protein. However, a linear sequence of amino acids is useless on its own: it must fold on itself, often into an incredibly complicated structure, to make a functional protein. Starting with a linear sequence (basically a string), there’s an astronomically large number of three-dimensional structures that are possible. Some possible shapes can be eliminated, since certain amino acids in the string don’t want to be near each other or near water. Some other possible shapes are more likely, since certain amino acids in the string want to be near each other, or near water.

In principle, those simple rules should make it possible to predict how a linear sequence of amino acids will fold to make a protein. However, a typical protein is made of several hundred amino acids. So, while computers are OK at predicting structures of very short fragments of proteins, predicting the structure of a whole protein requires more power. Lots of power: the number of possible ways a typical protein can fold far exceeds the number of possible moves in a game of chess (about 10^46), so IBM built a successor to the chess-playing “Deep Blue” supercomputer and called it “Blue Gene,” intending it to work on this problem. Blue Gene has been among the most powerful supercomputers for several years, but it still is far from efficient at predicting protein structures.
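To get a feel for why brute force struggles here, a back-of-the-envelope count helps. The sketch below assumes, purely for illustration, that each amino acid can sit in one of three local shapes independently of its neighbors; that number and the function names are my assumptions, not anything taken from the Blue Gene paper. The chess figure quoted above is taken as 10^46.

```python
# Toy estimate of how fast the number of possible folds grows with chain length.
# ASSUMPTION: each residue independently picks one of `states_per_residue`
# local shapes. Real proteins are not this simple; this is only a counting sketch.

CHESS_FIGURE = 10 ** 46  # the chess number quoted in the post


def conformation_count(n_residues, states_per_residue=3):
    """Number of distinct chains under the independent-states assumption."""
    return states_per_residue ** n_residues


def residues_to_exceed(target, states_per_residue=3):
    """Smallest chain length whose conformation count is greater than `target`."""
    if states_per_residue < 2:
        raise ValueError("need at least two states per residue")
    n, count = 0, 1
    while count <= target:
        count *= states_per_residue
        n += 1
    return n


if __name__ == "__main__":
    n = residues_to_exceed(CHESS_FIGURE)
    print(f"A chain of {n} residues already beats the chess figure.")
    digits = len(str(conformation_count(300)))
    print(f"A 300-residue protein has a count with {digits} digits.")
```

Under these toy assumptions, a chain shorter than a hundred amino acids already passes the chess number, and a protein of several hundred is dozens of orders of magnitude beyond it.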

A somewhat more effective approach to “the protein problem” has been to use distributed computing—borrowing time on hundreds or thousands of networked PCs when their owners are not using them. SETI@home, which screens huge amounts of radio telescope data for potential signals of extraterrestrial life, is a famous example of this. Biochemists have Rosetta@home, which uses the same approach to predict protein structure. This venture has actually produced some predictions which jibed pretty well with the actual structures. But Rosetta is still limited; being a computer program, it relies on brute force and wastes resources looking at possibilities that are “stupid.”

One way to get around this problem is to borrow from humans something that computers lack: intuition. This has been the approach of the creators of “Foldit,” a program that turns the protein folding problem into a game. Players are given a snippet of a protein and, without needing to understand anything about van der Waals forces or acid-base interactions, jiggle it around until it reaches a very stable conformation, which corresponds to a high score. As the authors of the paper that made the headlines say, this program uses the power of games…
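The idea of “stable shape equals high score” can be shown with a classic classroom toy, the two-dimensional HP lattice model. This is not Foldit's scoring function, which is far more detailed; it is a minimal sketch in which every amino acid is either water-avoiding (H) or water-loving (P), the chain lives on a square grid, and the score is the number of H-H pairs that end up side by side without being neighbors in the string. The function name and the example chain are hypothetical.

```python
# Toy 2-D HP lattice score: a stand-in for "more stable fold = higher score".
# ASSUMPTION: this is the textbook HP model, NOT the scoring used by Foldit.


def hp_score(sequence, coords):
    """Count H-H contacts between residues that touch on the grid
    but are not consecutive in the chain."""
    if len(sequence) != len(coords):
        raise ValueError("one coordinate is needed per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("the chain overlaps itself")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be grid neighbors")

    index_at = {xy: i for i, xy in enumerate(coords)}
    score = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each touching pair is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbor)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                score += 1
    return score


if __name__ == "__main__":
    chain = "HPPH"
    stretched = [(0, 0), (1, 0), (2, 0), (3, 0)]
    folded = [(0, 0), (1, 0), (1, 1), (0, 1)]  # U-shape brings the two H's together
    print("stretched:", hp_score(chain, stretched))
    print("folded:   ", hp_score(chain, folded))
```

A player (or a program) “jiggling” the chain is just trying different coordinate lists and keeping the ones that score higher.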

“to channel human intuition and three-dimensional pattern-matching skills to solve challenging scientific problems. Although much attention has recently been given to the potential of crowdsourcing and game playing, this is the first instance that we are aware of in which online gamers solved a longstanding scientific problem. These results indicate the potential for integrating video games into the real-world scientific process: the ingenuity of game players is a formidable force that, if properly directed, can be used to solve a wide range of scientific problems.”

So what did the gamers actually do? They started with a bunch of predicted structures for one protein, generated by Rosetta@home, and tweaked them. Once the actual protein structures were experimentally determined (again, a terribly painful and difficult task), the gamers’ predictions were noticeably better than Rosetta’s. Here’s a picture comparing their results with the actual structure—the linear string of amino acids is sometimes presented as a flat ribbon, sometimes as a noodle; it can curl up like a telephone cord, or lie flat in a sheet, but this picture shows one string.

The red ribbons represent the predictions of Rosetta; the yellow represent the predictions of the gamers; and the blue is the real structure of the protein. All three are superimposed. In almost all parts of the protein, the yellow, gamers’ structure is closer to the real, blue structure than the red, Rosetta structure. Bravo gamers! However, it is worth noting that the gamers started from structural predictions by Rosetta, and there are still places where neither Rosetta nor the gamers predicted reality very well.

This result leaves the protein structure problem in an interesting place. On the one hand, progress could be made by using more of that intangible, unquantifiable whatzit, human intuition. However, this is not intellectually satisfying; it would be nice to say that we really understood the rules of protein folding, and if we could understand them, we could teach these rules to a sufficiently powerful computer. After all, a computer has no intuition, but then again, nor does a string of amino acids, which just follows the rules of physical law. So, clearly, we need bigger, more powerful computers which can more closely simulate reality.

This seemed like an insurmountable challenge: only so many people will join a distributed network such as Rosetta@home, and machines much bigger than Blue Gene are prohibitively expensive. However, Felix Balatro and his coworkers at Miskatonic University and in Ukraine arrived at a devious solution to the problem. In a series of stunning papers starting in the December 2011 issue of the (admittedly rather obscure) Ukrainskii Zhurnal Tsilkovita Durnitsya, Balatro predicted the structures of a half-dozen difficult proteins with unprecedented accuracy.

These results were not widely reported in the popular news, but they raised a lot of questions in academia. After all, Miskatonic was not known as a computer science powerhouse, and the Ukrainian group seemed suspiciously difficult to contact for discussion about methods. Nonetheless, the results kept coming in the early part of 2012, and the predictions only gained in sophistication. In fact, one of the predictions was actually used to develop an anti-retroviral drug.

The curtain was finally lifted on the mystery by the German weekly der Zwiebel. The elusive Ukrainians were a front group for an organized crime syndicate that rented out time on a botnet of more than seven million computers infected with the “Conficker” worm. Balatro realized that this botnet was by far the world’s largest distributed computing network, and that its masters (although very punctilious about their payment schedule) were essentially in the business of renting computing power. Granted, nearly all of their other customers were criminals, and the power was typically used for card-hacking and DDoS attacks, but the rates were very cheap and the programmers very clever. Balatro concluded that this was the best way he could use his insubstantial research funding.

This disclosure left the scientific community, and society as a whole, in a quandary. Some demanded that Balatro’s papers be retracted, but they couldn’t say exactly why, since the results were valid and there were no obvious conflicts of interest. Some prosecutors wanted to bring charges, but there really weren’t any injured parties, and no US laws were broken. An intriguing new avenue for drug design had been suggested by some of his results, but would such a drug be ethically tainted?

Although the scientific worth of Balatro’s results remains unchallenged, the ethical clouds surrounding the results continue to gather. An anonymous whistleblower recently revealed to der Zwiebel that DARPA actually considered and partially developed a worm that would allow it to run simulations of atomic weapon tests at low cost. Balatro himself provides the most recent puzzle; he was unexpectedly absent for the first day of his own class in the summer 2012 session at Miskatonic University, and the university administration has not been able to get in contact with him for over a month. There is concern that the Ukrainians did not appreciate the attention he drew to them, or worse—that he failed to make a payment.

Allen, F., et al. (2001). Blue Gene: A vision for protein science using a petaflop supercomputer. IBM Systems Journal 40: 310-327.

Firas Khatib, Frank DiMaio, Foldit Contenders Group, Foldit Void Crushers Group, Seth Cooper, Maciej Kazmierczyk, Miroslaw Gilski, Szymon Krzywda, Helena Zabranska, Iva Pichova, James Thompson, Zoran Popović, Mariusz Jaskolski, David Baker (2011). Crystal structure of a monomeric retroviral protease solved by protein folding game players. Nature Structural and Molecular Biology. Published online 18 September 2011; doi:10.1038/nsmb.2119.

Balatro, Felix, and Українська асоціація обманщики (2012). You shouldn’t believe everything you read. український журнал цілковита дурниця 22: 18-41.

Balatro, Felix, and Українська асоціація обманщики (2012). It’s probably a good idea to run these author names through Google translate. український журнал цілковита дурниця 22: 138-141.

Balatro, Felix, and Українська асоціація обманщики (2012). Miskatonic University may ring a bell for sci-fi fans. український журнал цілковита дурниця 23: 77-91.
