Selection in human and primate genomes

Several years ago, I was discussing possible research projects with a creationist who happened to be quite taken with the issue of natural selection. I thought the evidence of selection in the human and chimp genomes would make an excellent topic to evaluate, specifically by expanding the sample of genomes (to gorilla or orang) to see if the number of genes exhibiting evidence of selection holds up. My suggestion was not well received, but maybe I didn't communicate it very well. With a few recent papers touching on this issue, it seemed like a good subject for a post. Maybe I can explain why these types of studies have value.

How do we detect natural selection in genome sequences? One method (by no means the only one) is to look for genes that have an unusual number of changes to the coding sequence (estimated by the dN/dS ratio, which compares the rate of protein-changing substitutions to the rate of silent ones). Basically, any mutation in a protein-coding gene can change the protein sequence (a nonsynonymous mutation) or not (a synonymous mutation). This happens because just about all amino acids can be coded by more than one codon (a triplet of nucleotides that specifies an amino acid, the building block of proteins). For example, if you have a mutation that changes a codon from CAA to CAG, the protein sequence will not change since CAA and CAG both specify the amino acid glutamine. That's a synonymous mutation. Mutate CAA to CAT, and the amino acid changes from glutamine to histidine, a nonsynonymous mutation. Now let's say you're comparing human and chimp genes and find two that have very similar DNA sequences (say 95% identical) but surprisingly different protein sequences (say only 80% identical). That means that the nucleotide differences are predominantly nonsynonymous, which is surprising. This is interpreted as evidence that either the human or chimp has experienced a kind of positive selection for mutations that actually change the protein sequence.
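For readers who like to see the bookkeeping, here is a minimal Python sketch of the synonymous/nonsynonymous distinction described above. The function names (classify_change, count_changes) are my own, made up for illustration, and the sketch only tallies raw codon differences between two aligned coding sequences. It is not a real dN/dS estimate, which also has to account for how many synonymous and nonsynonymous sites are available and for multiple mutations at the same site.

```python
# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # e.g. "CAA" -> "Q" (glutamine)


def classify_change(codon_before, codon_after):
    """Label a codon change as identical, synonymous, or nonsynonymous."""
    before, after = codon_before.upper(), codon_after.upper()
    if before == after:
        return "identical"
    if CODON_TABLE[before] == CODON_TABLE[after]:
        return "synonymous"
    return "nonsynonymous"


def count_changes(seq_a, seq_b):
    """Count differing codons in two aligned, gap-free coding sequences.

    Returns (synonymous, nonsynonymous). A codon that differs at more than
    one position is still counted once, which is a simplification.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        kind = classify_change(seq_a[i:i + 3], seq_b[i:i + 3])
        if kind == "synonymous":
            synonymous += 1
        elif kind == "nonsynonymous":
            nonsynonymous += 1
    return synonymous, nonsynonymous


if __name__ == "__main__":
    print(classify_change("CAA", "CAG"))  # synonymous (both glutamine)
    print(classify_change("CAA", "CAT"))  # nonsynonymous (glutamine to histidine)
```

A gene where the nonsynonymous count is surprisingly high relative to the synonymous count is the kind of gene that gets flagged as a candidate for positive selection.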

There have been a number of studies of this since the chimp genome became available, and lots of genes have been found that are very similar in the DNA but produce proteins that are much less similar. Why would this be important? Well, if selection of genes is the mechanism of evolution, and if humans are supposed to be the product of evolution, then we should not be surprised to find evidence of selection in the human genome. If humans are a special creation unrelated to primates, how would a creationist explain this evidence of positive selection? And don't say, "God made it that way," because that's a cop-out. God could have made humans and chimps some other way, but obviously He chose to make us with these genes. This is a corollary to the general problem of why the human and chimp genomes are so similar, which is itself a corollary of the even more general problem of biological similarity (AKA homology). It's unlikely we're going to get any shocking breakthroughs on biological similarity by studying selection in humans and chimps, but perhaps we can clarify the magnitude of the problem of selection evidence in these two genomes.

Two recent papers have done just that, and the outcome is kind of what I expected. Since these kinds of genome surveys are still relatively new, I expected that some of the evidence would turn out to be false positives: genes that met the statistical criteria for positive selection but did so for reasons other than positive selection. That's not to say that I expected all of these genes to turn out to be bogus, but I figured the problem might get smaller as more analyses and better data became available. In a paper in the January 2009 issue of PLoS Biology, Berglund et al. reported that some genes that look like they underwent positive selection in the human genome display a "biased substitution pattern" that they argue was caused by some kind of recombination. They conclude, "This process [biased substitution] may have led to the increased fixation of replacement amino acid changes on the human lineage, and may bias tests of positive selection."

In the May 2009 issue of Genome Research, Mallick et al. looked at claims that the chimp genome has experienced more positive selection than the human genome. They especially focused on genome sequence quality, which is a pretty important topic. When dealing with the human and chimp genomes, the similarity is so high that even relatively minor sequencing errors can really mess up your statistical tests. As Mallick et al. put it,
"Genome-wide scans search tens of thousands of genes for unusual patterns, and then focus on the extreme tail of outlying genes as candidates for selection. However, this strategy can be confounded by even a tiny rate of error in the underlying data, if the error can masquerade as the signal that is being sought. A small error rate can produce enough genes with apparently unusual signals, to outnumber true signals."
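
To get a feel for the arithmetic behind that warning, here is a rough Python sketch. Every number in it is hypothetical and chosen by me purely for illustration (none of them come from Mallick et al.): a scan of 20,000 genes, 0.5% of which are truly under positive selection, a test that catches 80% of the real cases, and data errors that make 1% of the remaining genes look selected.

```python
def expected_candidates(n_genes, frac_selected, power, error_rate):
    """Expected true and false candidates from a genome-wide scan.

    n_genes       -- number of genes scanned
    frac_selected -- fraction of genes truly under positive selection
    power         -- fraction of truly selected genes the test detects
    error_rate    -- fraction of non-selected genes that data errors
                     cause to mimic the selection signal
    """
    truly_selected = n_genes * frac_selected
    true_hits = truly_selected * power
    false_hits = (n_genes - truly_selected) * error_rate
    return true_hits, false_hits


if __name__ == "__main__":
    # All four inputs below are made-up illustrative values.
    true_hits, false_hits = expected_candidates(
        n_genes=20_000, frac_selected=0.005, power=0.8, error_rate=0.01
    )
    print(f"true candidates:  about {true_hits:.0f}")
    print(f"false candidates: about {false_hits:.0f}")
```

With those made-up inputs, the false candidates outnumber the true ones by more than two to one, even though the error rate is only 1%. That is the sense in which a small error rate can swamp the real signal.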

Mallick et al. used a stringent method to create new sequence alignments that excluded regions of dubious alignment or sequence quality, and they found that only a few genes previously identified as positively selected in the chimp genome passed their test for positive selection (1 of 49 in one sample and 5 of 10 in another sample).

Please understand that I'm NOT saying that all evidence of positive selection in the human genome is just error. (I write that sentence knowing full well that some slacker will read this and say just the opposite anyway. Sigh.) What studies like these do is refine the evidence for positive selection by eliminating false positives. Specifying just which genes have the best evidence of positive selection will help creationists devise alternative explanations. We can avoid running down rabbit trails, trying to explain positive selection in genes for which the evidence of positive selection is dubious.

Doesn't that seem important to you? Me too.

Berglund et al. 2009. Hotspots of biased nucleotide substitutions in human genes. PLoS Biology 7(1):e1000026. doi:10.1371/journal.pbio.1000026.

Mallick et al. 2009. The difficulty of avoiding false positives in genome scans for natural selection. Genome Research 19:922-933.