Ape Genome Phylogeny Figure
To estimate what percentage of the genome is concordant with the established (((human, chimp), gorilla), orangutan) tree, I needed to infer the phylogeny at many sites spread throughout the genome. I downloaded the multiple alignment of vertebrate genomes from the UCSC Genome Browser, wrote a Perl script to grab regions of the chromosomes longer than 8000 bp (a previously determined cutoff that should yield about 27 million base pairs analyzed in total), and aligned the sequences of human (hg19), chimp (panTro2), gorilla (gorGor1), orangutan (ponAbe2), and a macaque outgroup (rheMac2). I then used RAxML to infer the phylogeny for each region. Another Perl script grabbed the best tree from each RAxML output and sorted the trees into topology categories. Finally, an R script drew the corresponding color-coded bands on a chromosome ideogram, seen below. The R script uses the GenomeGraphs package from Bioconductor (user guide [PDF], academic paper).
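The region-selection step can be sketched roughly as follows. This is a Python stand-in, not the original Perl script. It assumes the input is in UCSC MAF format (blocks start with an `a` line, sequences are on `s` lines), that "longer than 8000 bp" refers to the ungapped human sequence in a block, and that a block is kept only if all five species are present. The function name `long_blocks` is hypothetical.

```python
from itertools import chain

# Assembly names as given in the text.
SPECIES = {"hg19", "panTro2", "gorGor1", "ponAbe2", "rheMac2"}
MIN_LEN = 8000  # cutoff from the text; measuring it on the human row is an assumption


def long_blocks(maf_lines, min_len=MIN_LEN, species=SPECIES):
    """Yield (chrom, start, size) in human coordinates for each MAF block
    that contains every species and whose human row is longer than min_len."""
    block = {}
    # The trailing "a" is a sentinel so the last block is also checked.
    for line in chain(maf_lines, ["a"]):
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":
            # A new block begins: evaluate the block collected so far.
            if species <= block.keys():
                chrom, start, size = block["hg19"]
                if size > min_len:
                    yield chrom, start, size
            block = {}
        elif fields[0] == "s":
            # s <assembly.seqname> <start> <size> <strand> <srcSize> <text>
            assembly, _, seqname = fields[1].partition(".")
            block[assembly] = (seqname, int(fields[2]), int(fields[3]))
```

Any iterable of lines works as input, so an open file handle can be passed directly.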
2299 out of 2319 loci (about 99.1%) support some variation of African ape monophyly. The color legend is as follows:
- Red = (((Human, Chimp), Gorilla), Orangutan)
- Blue = (((Human, Gorilla), Chimp), Orangutan)
- Green = (((Gorilla, Chimp), Human), Orangutan)
- Black = Regions that do not support African ape monophyly
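As an illustration of how a best tree could be sorted into the color categories above, here is a minimal Python sketch (again, not the original Perl script). It assumes each tree is a Newick string with the assembly names as leaf labels, optional branch lengths, and no internal support labels. It compares the tree's splits, taking the side that excludes the macaque outgroup, so the result does not depend on where the unrooted tree happens to be written as rooted. The names `splits` and `classify` and the extra `"unresolved"` category are hypothetical.

```python
import re

ALL = frozenset({"hg19", "panTro2", "gorGor1", "ponAbe2", "rheMac2"})
OUTGROUP = "rheMac2"
AFRICAN_APES = frozenset({"hg19", "panTro2", "gorGor1"})

# Sister pair within the African apes -> legend color.
COLORS = {
    frozenset({"hg19", "panTro2"}): "red",
    frozenset({"hg19", "gorGor1"}): "blue",
    frozenset({"gorGor1", "panTro2"}): "green",
}


def splits(newick):
    """Return the non-trivial splits of the tree, each represented by
    the set of taxa on the side that does not contain the outgroup."""
    text = re.sub(r":[0-9.eE+-]+", "", newick.strip().rstrip(";"))
    stack, found = [], set()
    for token in re.findall(r"[(),]|[^(),\s]+", text):
        if token == "(":
            stack.append(set())
        elif token == ")":
            clade = frozenset(stack.pop())
            if stack:
                stack[-1] |= clade  # pass the leaves up to the parent
            side = ALL - clade if OUTGROUP in clade else clade
            if 2 <= len(side) <= len(ALL) - 2:
                found.add(side)
        elif token != ",":
            stack[-1].add(token)  # a leaf label
    return found


def classify(newick):
    """Map one tree to a legend color."""
    found = splits(newick)
    if AFRICAN_APES not in found:
        return "black"
    for pair, color in COLORS.items():
        if pair in found:
            return color
    return "unresolved"  # monophyly supported, but no sister pair resolved
```

For example, `classify("(rheMac2:0.03,ponAbe2:0.02,(gorGor1:0.01,(hg19:0.005,panTro2:0.006):0.002):0.004);")` falls in the red category, since both the African ape split and the human-chimp split are present.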