Chimp genome again
Over at UD, a poster called niwrad thinks he's demonstrated that the human and chimp genomes are not as similar as they really are. I briefly contemplated getting his code and showing where he went wrong, but that's not a profitable use of my time given my current schedule. So I'll just say that his description never indicates that he actually aligned the chromosomes he's testing (maybe he did, but I didn't see it in the description), and comparing unaligned sequences will inevitably produce rubbish results. Further, the distance metric he uses doesn't appear to model indels, which will further degrade the measured similarity. If he used a real sequence alignment algorithm (like Smith-Waterman), he'd find the same thing the rest of us have found: the chimp and human genomes are >97% identical.
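To see why a missing alignment step matters, here is a minimal sketch and not a reconstruction of niwrad's code. It compares a toy sequence with a copy carrying a single one-base insertion, first position by position and then after a simple global alignment. Everything in it is an assumption made for illustration: the periodic toy sequence, the function names, the scoring values, and the use of Needleman-Wunsch (global alignment) in place of Smith-Waterman (local alignment) to keep the code short.

```python
def ungapped_identity(a, b):
    """Fraction of matching bases when comparing position by position, no gaps."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n


def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped strings.

    The scoring values are arbitrary illustrative defaults.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            if score[i][j] == diag:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def aligned_identity(gapped_a, gapped_b):
    """Fraction of matching bases over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(gapped_a, gapped_b) if x != "-" and y != "-"]
    return sum(x == y for x, y in pairs) / len(pairs)


# Toy data (assumption): a periodic 40-base sequence and a copy with one inserted base.
ref = "ACGT" * 10
query = ref[:20] + "T" + ref[20:]

print(ungapped_identity(ref, query))           # 0.5
gapped_ref, gapped_query = global_align(ref, query)
print(aligned_identity(gapped_ref, gapped_query))  # 1.0
print(gapped_ref.count("-"))                   # 1
```

The toy sequence is deliberately periodic so that a one-base shift puts every downstream position out of register. Real chromosomes are not periodic, but the same effect applies: without an alignment, everything after the first indel is compared against the wrong neighbor.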
But what about that paper by Britten? The one where he showed that the 99% identity figure had ignored important indel data and therefore should be revised down to 95% identity? Britten was wrong. His strategy of counting indels doesn't make any sense. Consider a simple example. Say you have two sequences, one 50,000 nucleotides long and the other 55,000 nucleotides long. The only difference between them is a single insertion of 5,000 nucleotides. Otherwise, the sequences are identical. What then should the percent identity be? Should it be 90%, counting the 5,000-nucleotide difference as 10% of the smaller sequence? Or should it be 91%, counting the 5,000-nucleotide difference as 9% of the longer sequence in the comparison (55,000)? Neither one makes any sense, since the reality is that there is only one difference between the sequences. It's a single insertion or deletion, representing one mutation. Why should we count that as 5,000 differences when there's only one mutation?
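The arithmetic in that example can be written out as a short sketch. The variable names are illustrative only, and the numbers are the toy figures from the paragraph above, not real genome data.

```python
# Toy numbers from the example: two sequences that differ by one insertion.
shorter = 50_000
longer = 55_000
indel_length = longer - shorter  # 5,000 inserted nucleotides

# Counting every inserted nucleotide as a separate difference:
identity_vs_shorter = 1 - indel_length / shorter
identity_vs_longer = 1 - indel_length / longer

# Counting mutational events instead: one insertion (or deletion), however long.
mutation_events = 1

print(f"{identity_vs_shorter:.1%}")  # 90.0%
print(f"{identity_vs_longer:.1%}")   # 90.9%
print(mutation_events)               # 1
```

The two percentages differ only in the choice of denominator, and neither one reflects the fact that a single event produced the whole difference.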
Once you realize that we're only talking about one difference, then the problem becomes essentially intractable. If there's only one difference between the sequences, what is it a percent of? The total number of nucleotides? That doesn't make sense. It's apples and oranges. That's why you don't see actual genome researchers getting that excited about boiling down the similarity between two genomes to a single number. It's just not possible.
On the other hand, if you specify precisely what you mean, you can talk about the number of nucleotide mismatches between two genome sequences at some kind of optimal alignment (though how you arrive at that optimal alignment is, of course, debatable). When you do that with the human and chimp genomes, the percent identity is well north of 95%. When you realize that there is no single human genome and start discounting polymorphisms from your counts, the fixed nucleotide mismatches between humans and chimps are probably less than 1%, which makes the percent identity >99%.
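As a rough illustration of what discounting polymorphisms means, the sketch below counts mismatches in a toy, already-aligned set of sequences two ways: raw mismatches against one human sequence, and fixed differences where every human sequence agrees and the chimp differs. The sequences, the function names, and the use of a single chimp sequence (chimps are polymorphic too) are all simplifying assumptions, not real data or an established pipeline.

```python
def raw_mismatches(a, b):
    """Count mismatched positions between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def fixed_differences(humans, chimp):
    """Count sites where all human sequences share one base and the chimp differs."""
    count = 0
    for column in zip(chimp, *humans):
        chimp_base = column[0]
        human_bases = set(column[1:])
        if len(human_bases) == 1 and chimp_base not in human_bases:
            count += 1
    return count


# Toy aligned sequences (assumption): position 7 is polymorphic among the humans.
humans = [
    "ACGTACGTAC",
    "ACGTACGAAC",
    "ACGTACGTAC",
]
chimp = "ACGTTCGAAC"

print(raw_mismatches(humans[0], chimp))    # 2
print(fixed_differences(humans, chimp))    # 1
```

In this toy case, one of the two raw mismatches disappears once the polymorphic site is discounted, which is the direction of the adjustment described above.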
Why all the hubbub? Got me. There seems to be this powerful myth among antievolutionists that somehow the 99% identity figure is just propaganda. As niwrad said,
"Supporters of the neo-Darwinian theory of evolution have a strong ideological motivation for minimizing the differences between humans and chimps, as they claim that these two species evolved from a common ancestor, as a result of random mutations filtered by natural selection."
In reality, that also makes no sense. There is no motivation to exaggerate the similarity of chimps and humans, because the actual percent identity (whatever that is) makes no difference at all. The only reason chimps get the position as humanity's sister taxon among the apes is the repeated observation that they are the most similar of the apes to humans. It doesn't matter what the particular similarity actually is; chimps are the most similar to humans. Furthermore, it doesn't even matter that chimps are the most similar. If gorillas were the most similar, everyone would be talking about human-gorilla genome comparisons.
Finally, that claim about "strong ideological motivation for minimizing the differences between humans and chimps" is not true in my case since I'm a creationist. How do you explain my confidence, as an expert in comparative genomics, that humans and chimps really do have nearly identical genomes? I'll tell you how I explain it: It's a creationist coming to grips with reality.
Don't even get me started on the strong ideological motivation of ID advocates for denying the demonstrated similarity between humans and chimps...
Feedback? Email me at toddcharleswood [at] gmail [dot] com.