Tuesday, November 23, 2010

Testing universal common ancestry?

There's a new paper in Biology Direct from Koonin and Wolf that aims to refute Theobald's assertion that he's formally tested the hypothesis of common ancestry (see his Nature paper for details). Here's my original assessment of Theobald's paper:
In the end, I think Theobald has actually shown that different proteins (that are not significantly similar) are more likely to have different origins than to have a common origin. He has tested the hypothesis of universal common ancestry of all proteins. ...we may infer that protein similarity is the result of common ancestry because protein similarities form a single, sensible tree. Just like Theobald showed that protein evolution models that include all similar proteins on a single tree are preferred to those that put them on separate trees.
Now Koonin and Wolf say basically what I said: Theobald did not test common ancestry independently of significant sequence similarity. Instead,
Alignments of statistically similar but phylogenetically unrelated sequences successfully mimic the purported effect of common origin. Thus, the nature and origin of the similarity between the aligned sequences are irrelevant for the prediction of "common ancestry" of proteins under Theobald’s approach. Accordingly, common ancestry (or homology, in the modern, post-Darwinian sense) of the compared proteins remains an inference from sequence similarity rather than an independent property demonstrated by the likelihood analysis.
How did they reach this conclusion? Here's a nutshell version as I understand it. They began with an alignment of universal protein sequences (i.e., sequences found in all organisms). From each column of the alignment, representing a conserved amino acid position, they calculated amino acid frequencies. They then generated a random column for a new alignment based on those amino acid frequencies. The result, they say, is a set of sequences that are significantly similar but carry no true phylogenetic signal. In principle, this method should indeed generate such sequences, but they don't actually show that their random sequences are significantly similar and without phylogenetic signal. So I have that reservation at least.
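
To make the column-randomization idea concrete, here is a minimal sketch in Python of how I understand it. This is my own illustration, not Koonin and Wolf's code: the function names, the toy alignment, and the choice to draw every residue independently from its column's observed frequencies are all assumptions on my part, and their actual procedure may differ in its details.

```python
import random
from collections import Counter


def column_frequencies(alignment):
    """Return one Counter of residue counts per alignment column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [Counter(seq[i] for seq in alignment) for i in range(length)]


def randomized_alignment(alignment, n_seqs=None, seed=None):
    """Build a new alignment by sampling each column independently.

    Every residue is drawn from the observed frequencies of its column,
    so per-column composition is kept on average while any correlation
    between columns (the tree-like signal) is thrown away.
    """
    rng = random.Random(seed)
    n = len(alignment) if n_seqs is None else n_seqs
    columns = []
    for counts in column_frequencies(alignment):
        residues = list(counts.keys())
        weights = list(counts.values())
        columns.append(rng.choices(residues, weights=weights, k=n))
    # Transpose the sampled columns back into sequences (rows).
    return ["".join(col[j] for col in columns) for j in range(n)]


# Hypothetical toy alignment, made up purely for illustration.
toy = ["MKVLA", "MKILA", "MRVLG", "MKVLA"]
for seq in randomized_alignment(toy, seed=1):
    print(seq)
```

Because each column is sampled on its own, an invariant column stays invariant and a variable column keeps roughly the same mix of residues, which is why the output sequences should still look similar to one another even though no tree generated them.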

Using these randomized sequence alignments, they then built trees with PhyML and compared them with a likelihood ratio test. This led to the extremely important finding that significantly similar protein sequences are much more likely to "share a common ancestor" (i.e., fit a single tree) than not (i.e., fit better on separate trees). The log-likelihood differences are on the order of 500, which is enormously significant.
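
For a sense of scale, here is a small Python sketch of the arithmetic behind a log-likelihood difference of that size. The function names and the input log-likelihoods are hypothetical, chosen only so that the difference comes out to 500; they are not values from either paper, and this is not a reconstruction of the authors' actual test.

```python
import math


def log_likelihood_difference(lnl_single_tree, lnl_separate_trees):
    """lnL of the single-tree model minus the summed lnL of the separate trees."""
    return lnl_single_tree - sum(lnl_separate_trees)


def log10_likelihood_ratio(delta_lnl):
    """Convert a natural-log likelihood difference to powers of ten."""
    return delta_lnl / math.log(10)


# Hypothetical log-likelihoods, invented for illustration only.
delta = log_likelihood_difference(-12000.0, [-6200.0, -6300.0])
print(delta, log10_likelihood_ratio(delta))
```

A difference of 500 log-likelihood units corresponds to a likelihood ratio of e^500, or roughly 10^217 in favor of the single tree.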

Koonin and Wolf close their paper with some interesting qualitative evidence for common ancestry, and they conclude that "formal demonstration of UCA [Universal Common Ancestry], independent of the assumption that universally conserved orthologous proteins with highly similar sequences actually originate from common ancestral forms, remains elusive and might not be feasible in principle."

I can't emphasize enough how enormously important this result is. I'm actually going to test this for myself to confirm their result. It seems to me that the question for creationists really is this: what is our alternative model to explain significant sequence similarity? Since the test of one tree vs. separate trees favors a single tree even for proteins that don't belong on a tree at all, Theobald's test really does not test common ancestry. That opens the field again for creationists to devise an alternative hypothesis.

But then again, seems like I said that already...

Koonin and Wolf. 2010. The common ancestry of life. Biology Direct 5:64.

Theobald. 2010. A formal test of the theory of universal common ancestry. Nature 465:219-222.

Feedback? Email me at toddcharleswood [at] gmail [dot] com.