Saturday, August 21, 2010


Every couple of weeks I get a phone call from pollsters who say they want to know my views about radio stations or stores or political candidates. Hard though it may be on my ego, the pollster really doesn’t care about my personal views about these things. The pollster is interested in the average views of the population, and my views are a fraction of a percent of that bigger picture. A minimal knowledge of statistics will tell you that such census data can lead to error if you try to predict individual characteristics from the averages of a population. Very few families actually have 2.2 children, or 1.7 cars.

Despite this, microbiologists have relied on averages from census data for decades. Really, most of our understanding of the cell is based upon taking the output of trillions of cells, and making a picture of an imaginary “average” cell. There are practical reasons for this, mainly the difficulty of measuring femtograms (10⁻¹⁵ grams) of product or proteins that are present in only a couple of copies per cell. As we'll see, there are good scientific reasons as well.

I am interested in studying gene expression—how genes in DNA get transcribed to make a messenger RNA, and how that RNA gets translated to make a functional protein that does stuff for the cell. I have never looked at how an individual bacterial cell does this. I have always studied trillions of cells. If a gene is being highly expressed, this isn’t a problem. A gene could be transcribed on average a hundred times per minute per cell—a cell-to-cell variation of five transcripts per minute is negligible. But many genes are weakly expressed, transcribed on average once every half hour. Now cell-to-cell variation becomes an enormous factor: one cell could transcribe a gene four times in one hour, another cell not at all in the same time.
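To put rough numbers on that last point, here is a small Python sketch. It assumes that transcription events follow a Poisson distribution, which is a simplifying assumption of mine rather than anything measured here, and the function names are made up for illustration. A gene transcribed once every half hour averages two transcripts per hour.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of seeing exactly k events when the average is `mean`."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def relative_spread(mean: float) -> float:
    """Standard deviation divided by the mean for a Poisson count."""
    return 1.0 / math.sqrt(mean)


# Weakly expressed gene: one transcript per half hour, so two per hour.
weak_mean_per_hour = 2.0
p_zero = poisson_pmf(0, weak_mean_per_hour)
p_four_or_more = 1.0 - sum(poisson_pmf(k, weak_mean_per_hour) for k in range(4))

print(f"P(no transcripts in an hour)    = {p_zero:.3f}")
print(f"P(four or more in an hour)      = {p_four_or_more:.3f}")

# Highly expressed gene: one hundred transcripts per minute.
print(f"relative spread at mean 100     = {relative_spread(100.0):.2f}")
print(f"relative spread at mean 2       = {relative_spread(weak_mean_per_hour):.2f}")
```

Under this assumption, roughly one cell in seven makes no transcript at all in an hour, and about as many make four or more. The relative spread shrinks as the average count grows, which is why cell-to-cell variation matters so much less for a highly expressed gene.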

These differences become more pronounced as we follow the process of gene expression. The product of transcription, messenger RNA, persists in the cell for only a few minutes, during which it can be translated a dozen times. The protein that is produced by translation is generally stable, and persists for over an hour. So, in the previous example where one cell has transcribed a gene four times in an hour while another hasn’t transcribed it at all, the first cell could have hundreds of copies of the resulting protein, while the second cell will have none. The average, in this case, is completely unrepresentative of either real cell.

Recently, thanks to innovations in microscopy and biotechnology, it’s become possible to examine transcription and translation in single bacterial cells. We can now get past the distortions of the “average” cell, and see how real cells express genes.

First off, expressing a gene requires transcription: making an RNA copy of the gene’s DNA. The presence of a specific RNA within a cell can now be detected by using a fluorescent dye attached to a short snippet of DNA that will specifically attach to the RNA in question. If the RNA is present—if the specific gene has been transcribed—the cell containing it will glow, and if the RNA is not present—the specific gene has not been transcribed, or the RNA has decayed—then the cell will not glow. This method is sensitive enough that we can tell the difference between cells with one or two or ten copies of the RNA by measuring differences in how brightly the cells glow.

Second, expressing a gene requires translation: making a protein using the instructions in the RNA. To see when protein was made, researchers use a clever trick (a trick whose invention won the Nobel Prize). As the protein of interest is made in the cell, it gets combined with a molecule of a protein called “Yellow Fluorescent Protein”, or YFP. True to its name, YFP glows yellow, so more of the specific protein means a brighter, yellower cell. Just like with RNA, this yellow glow can be measured, indicating anywhere from a few copies of the specific protein to thousands of copies.

These techniques had already been used to examine expression of single genes in a cell; in a new study, researchers examined expression of over a thousand genes in single cells of the common bacterium E. coli. To do this, they used over a thousand different lineages of E. coli, each with one gene modified so its protein would be attached to YFP. They then built a special microscope slide that would send each of these bacteria past a detector, one by one, over and over again. This is amazing. It was all done robotically (moving the cells, counting the cells, measuring fluorescence, calculating the number of fluorescent RNAs or proteins per cell, everything) at about 8,000 cells per minute.

Their data is more reassuring than it is surprising; this paper is more of a technological tour de force than a leap into the unknown. They confirm that a completely “average” cell is no more a reality than a completely average human. Instead, the number of copies of each protein per cell varied according to a mathematical distribution. Some proteins are well represented, with thousands of copies per cell; others are scarcely found in any cells at all:

The “average” cell has 500 copies of the Adk protein, 300 copies of the AtpD protein, and 1 copy of the YjiE protein. But fewer than one cell in 10 is average for the Adk or AtpD proteins, and only a third of cells are average for the YjiE protein.

It’s unlikely for a cell to be “average” for even a single trait, much less for every trait. The tendency to be different from average is technically called “noise,” and what the researchers found was that for relatively scarce proteins (fewer than about 10 copies per cell, such as YjiE), less expression meant more noise. For very rare proteins, there was enough noise that the idea of an “average” cell became meaningless. Interestingly, with more common proteins, noise didn’t diminish as proteins became much more common. Rather, the noise became a constant:

These results provide a useful reminder for microbiologists assaying trillions of cells (as well as for telephone pollsters trying to tap the pulse of the populace). The large population of bacteria I experiment upon may be genetically identical. However, their protein composition is extremely diverse. This means that the individuals in a large, genetically identical population will respond differently to their environment, and could conceivably start down completely different evolutionary paths as a result. This is all due to noise in gene expression, which is ultimately due to the chance events of molecules bumping into each other.
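The way noise depends on abundance can be pictured with a toy formula. In the sketch below, noise is taken to mean the variance divided by the square of the mean, and it is modeled as a chance term that shrinks as one over the mean plus a constant floor. The additive form, the function name, and the floor value of 0.1 are all illustrative assumptions, not numbers taken from the discussion above.

```python
def modeled_noise(mean_copies: float, noise_floor: float = 0.1) -> float:
    """Toy model of noise (variance / mean^2) as a function of mean copy number.

    The 1/mean term stands for chance events in making a scarce protein;
    `noise_floor` is a hypothetical constant that remains for abundant proteins.
    """
    if mean_copies <= 0:
        raise ValueError("mean_copies must be positive")
    return 1.0 / mean_copies + noise_floor


# Scarce proteins are dominated by the 1/mean term, abundant ones by the floor.
for mean in (1, 10, 100, 1000, 10000):
    print(f"mean copies = {mean:>5}  modeled noise = {modeled_noise(mean):.4f}")
```

In this toy model the noise falls steeply between one and ten copies per cell, then barely changes between a thousand and ten thousand copies, which is the shape of the trend described above.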

There is another useful reminder in these results, and paradoxically it has to do with the great value of that mythical average. Remember that expressing a gene requires transcription, to make a molecule of RNA. For a specific gene, this can happen frequently or rarely. Then, the RNA must be translated to make a protein, which will last for hours. A single RNA can be translated dozens of times, but most RNAs spontaneously decay within a couple of minutes. Thus, it’s quite possible to find a cell that has transcribed a gene to make RNA, and that RNA has been translated to make protein—and after a few minutes, the RNA has all decayed, but the protein is still around. This study found plenty of such cases; the data for one is shown here:

Each dot in this picture represents the RNA and protein for a single gene in a single cell; the cell represented by the dot with a square around it had 10 copies of the RNA from this gene, and two thousand copies of the protein. The cell represented by a dot with a circle around it had no copies of the RNA from this gene—they had all decayed—but nearly ten thousand copies of the protein. In individual cells, there is no—zero—correlation between RNA and protein.
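A quick simulation shows how short-lived RNA and long-lived protein can drift apart inside single cells. This is only a sketch under stated assumptions: it uses a standard stochastic simulation (the Gillespie algorithm) of a two-step model, and every rate below is a value I picked for illustration (RNA lasting about two minutes, protein about an hour), not a measurement from the study. The function names are hypothetical.

```python
import math
import random


def simulate_cell(rng, k_m, g_m, k_p, g_p, t_end):
    """Gillespie simulation of one cell; returns (rna, protein) counts at t_end.

    k_m: transcription rate, g_m: RNA decay rate per RNA,
    k_p: translation rate per RNA, g_p: protein loss rate per protein
    (all per minute, all illustrative).
    """
    t, rna, protein = 0.0, 0, 0
    while True:
        rates = (k_m, g_m * rna, k_p * rna, g_p * protein)
        total = sum(rates)
        t += rng.expovariate(total)  # waiting time until the next event
        if t > t_end:
            return rna, protein
        pick = rng.random() * total  # choose which event happens
        if pick < rates[0]:
            rna += 1
        elif pick < rates[0] + rates[1]:
            rna -= 1
        elif pick < rates[0] + rates[1] + rates[2]:
            protein += 1
        elif protein > 0:
            protein -= 1


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable
cells = [
    simulate_cell(rng, k_m=0.25, g_m=0.5, k_p=4.0, g_p=1 / 60, t_end=300.0)
    for _ in range(400)
]
rna_counts = [rna for rna, _ in cells]
protein_counts = [protein for _, protein in cells]

r = pearson(rna_counts, protein_counts)
cells_without_rna = sum(1 for rna, protein in cells if rna == 0 and protein > 0)

print(f"single-cell RNA/protein correlation: {r:.2f}")
print(f"cells with protein but no RNA: {cells_without_rna} of {len(cells)}")
```

With these made-up rates, most simulated cells hold plenty of protein while containing no RNA at all, and the single-cell correlation is weak. The protein count reflects roughly the last hour of transcription, while the RNA count reflects only the last couple of minutes.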

The concept of gene expression requiring transcription and then translation to go from information in DNA to functional protein is fundamental to our understanding of biology—so much so that it is referred to as the “Central Dogma of Biology.” It is also a concept that would be completely unattainable if one only had this sort of data from individual cells. It’s only by looking at the averages that we can see these general themes. Indeed, for someone who makes a living teaching the central dogma, it’s reassuring to see this graph, comparing average levels of specific RNAs and their corresponding proteins over thousands of genes in thousands of cells:

The central dogma works! There is a correlation between RNA and protein!

So census data that averages out the characteristics of millions is useless for understanding individuals, but essential for understanding underlying truths. Data squeezed out of a single individual tells you the state of that individual, but can be misleading when used to derive underlying truths. This applies to bacterial cells and voters and consumers alike.

This notion has another application, one close enough to home to give me pause. The bacteria used in this study were genetically identical. Noise, ultimately traceable to the random jiggling of molecules in the cell, took this genetically identical population of bacterial cells and made them into a widely diverse population, each behaving in a unique, idiosyncratic manner. The same thing happens in your brain. A thought is no more than the ebb and flow of chemicals between billions of genetically identical neurons. We can take the average behavior of all neurons, and predict that a certain stimulus will lead to a certain response in an individual cell. But noise has veto power over this program. What I do, what I think, what I think of as my personality: the noise of random collisions between molecules leaves its mark on all of these.

Yuichi Taniguchi, Paul J. Choi, Gene-Wei Li, Huiyi Chen, Mohan Babu, Jeremy Hearn, Andrew Emili, X. Sunney Xie (2010). Quantifying E. coli Proteome and Transcriptome with Single-Molecule Sensitivity in Single Cells. Science 329: 533-538.
