Thursday, May 13, 2010

Testing Woese's hypothesis

John Wilkins took exception to my characterization of Theobald's paper as a test of creationist claims. He wrote,
It might be thought that the target here is creationism, and so it is taken by at least one “baraminologist” (a made-up term for creationist “taxonomy”), but actually it is a test of competing hypotheses in actual science, such as the claim made a lot lately, for example by Carl Woese and Mark Ragan among others, that the treelike structure of evolution is broken by lateral genetic transfer.
As I see it, though, Theobald hasn't tested Woese's hypothesis either. As I understand Woese's genetic annealing hypothesis, early cellular evolution was characterized as much by horizontal transfer of genes as by vertical transmission. As time went on, vertical transmission came to dominate over horizontal transfer, and what we recognize as phylogenetic lineages emerged. As a result, the very concept of "ancestor," as defined in a world of largely vertical transmission, becomes dubious in a world of horizontal transfer.

In Woese's model, it's not the proteins that independently evolve de novo; it's the organismal lineages that evolve independently by putting together a metabolism out of modular units (genes and sets of genes) that have independent phylogenetic histories. I don't think Woese would deny the idea that significant sequence similarity indicates common ancestry. Quite the contrary, the fact that these proteins exist across all domains of life and that they give contradictory phylogenies would seem to be essential to his case. Theobald explicitly tested models where different proteins are allowed to have different phylogenies, and he found that these models were "greatly preferred" to models that constrained all proteins to have the same phylogeny. That result would seem to support Woese's model.

In this comment at Panda's Thumb, Nick Matzke claims, "Theobald’s main research is in fact exactly on detecting remote homologs, distinguishing convergence from homology in these kinds of remote cases, etc." That can't be the case either, since Theobald's test set is a group of proteins that share significant sequence similarity; they were identified by BLAST searches. They aren't remote homologues (proteins so divergent that they no longer share significant similarity). In fact, given that Theobald tests scrambled sequences (sequences that do not share significant sequence similarity) and finds that they are best described by independent trees, his results seem to be the opposite of Matzke's claim.
CORRECTION: Nick Matzke informs me that he was talking about Theobald's career, not this paper specifically. I apologize for the mistake.

As I understand his work, Theobald has shown that proteins that share significant sequence similarity are better described by a single tree than by multiple trees. He has also tested the notion that all the proteins of his test set are best described by the same tree (which is rejected) and the idea that sequences that are not significantly similar are best described by a single tree (which is also rejected). In doing so, he has given a formal likelihood treatment to Fitch's work from 40 years ago. Those are important results, and I expect them to be influential. This paper is likely to replace Fitch's paper as the go-to citation to support the inference of homology (common ancestry) from significant sequence similarity.

But I do not believe he has tested the idea of universal organismal common ancestry, simply because the model of independent origin that he tests is not a model that anyone seriously advocates.

If I am wrong, I'm sure I'll be corrected.

Feedback? Email me at toddcharleswood [at] gmail [dot] com.