The beginning of a long quest
In 1856, a few limestone quarry workers near Düsseldorf, Germany, unearthed bones that resembled those of humans. Early analysts took them to be the remains of a deformed human, citing the oval skull with its low, receding forehead, distinct brow ridges, and unusually thick bones. Only later studies revealed that the remains belonged to a previously unknown species of hominid, an early human closely related to our own species, Homo sapiens. In 1864, the specimen was named Homo neanderthalensis, after the Neander Valley where the remains were found.
Neanderthals rose to prominence between 250,000 and 200,000 years ago and occupied the hills and grasslands of Europe until their extinction, long placed at around 30,000 years ago. The exact date has been disputed, and some earlier estimates had the last group surviving in southern Spain until 28,000 years ago. In 2014, a team led by Thomas Higham of the University of Oxford applied an improved radiocarbon dating technique to material from 40 archaeological sites and showed that Neanderthals died out in Europe between 41,000 and 39,000 years ago.
The resemblance between Neanderthals and Rhodesian Man (Homo rhodesiensis) led early investigators to infer that the two shared an ancestor. Comparison of Neanderthal and Homo sapiens DNA suggests that these two lineages diverged from a common ancestor between 400,000 and 350,000 years ago. Some argue that this ancestor might be Homo rhodesiensis, but that argument assumes H. rhodesiensis goes back to around 600,000 years ago. One also cannot rule out convergent evolution, in which the two hominids independently acquired features such as distinct brow ridges. Neanderthals settled across Eurasia but did not extend beyond modern-day Israel. No Neanderthal sites have been found on the African continent, and Homo sapiens appears to have been the only human type in the Nile River Valley because of the warmer climate of that period.
Are Neanderthals really extinct?
The sudden disappearance of Neanderthals from Europe coincides with the arrival of H. sapiens, which prompted many scientists to suspect that the two events are closely linked and that humans contributed to the demise of their close cousins, either by outcompeting them for resources or through open conflict. The hypothesis that early humans violently replaced Neanderthals was first proposed in 1912 by the French palaeontologist Marcellin Boule, the first person to publish an analysis of a Neanderthal. However, the 2014 study by Thomas Higham and colleagues, based on organic samples, suggests that the two human populations shared Europe for several thousand years. Outright violent extinction therefore seems less plausible, which leaves two scenarios for Neanderthal extinction.
Possible scenarios for the extinction of the Neanderthals are:
Ancient DNA to the rescue
DNA sequence analysis of fossils can open up an entirely new world of information, but recovering DNA from samples that are thousands of years old is a daunting task, and ancient DNA research is far from routine. The samples are prone to degradation and to contamination by DNA from other sources, and retrieving data from ancient material is costly, painstaking work. At a more fundamental level, it requires determining whether the necessary samples even exist and, if so, how to get access to them.
An international group of anthropologists from the Max Planck Institute for Evolutionary Anthropology, Cold Spring Harbor Laboratory, and Cornell University, using several different methods of DNA analysis, estimated that interbreeding happened less than 65,000 years ago, around the time that modern human populations spread across Eurasia from Africa. They reported evidence for a modern human contribution to the Neanderthal genome.
Martin Kuhlwilm, co-first author of the new paper, identified the regions of the Altai Neanderthal genome that share mutations with modern humans. The team found evidence of gene flow from modern humans into the Neanderthal genome in one specific sample of Neanderthal DNA, recovered from a cave in the Altai Mountains in southern Siberia, near the Russia-Mongolia border.
Earlier studies have found that the DNA of modern humans contains 2.5 to 4 percent Neanderthal DNA. However, Mendez et al. found no Neanderthal Y-chromosomal DNA in any human sample they tested. One explanation is that Neanderthal Y-chromosome genes drifted out of the human gene pool by chance over the millennia; another is that Neanderthal Y chromosomes carry genes that are incompatible with other human genes. Mendez and his colleagues found evidence supporting the second idea, and they think that the two groups may have been more reproductively isolated than previously thought. Their study identified protein-coding differences between Neanderthal and modern human Y chromosomes. The changes include potentially damaging mutations in PCDH11Y, TMSB4Y, USP9Y, and KDM5D, and three of these changes are missense mutations in genes that produce male-specific minor histocompatibility (H-Y) antigens. Antigens derived from KDM5D, for example, are thought to elicit a maternal immune response during gestation.
It is possible that incompatibilities at one or more of these genes played a role in the reproductive isolation of the two groups. The Y-chromosomal study also placed the divergence of the human and Neanderthal Y lineages at roughly 588,000 years ago, in line with previous estimates based on mitochondrial DNA, which put the divergence of the two lineages at between 400,000 and 800,000 years ago.
New data emerging from genome-wide association (GWA) studies could shed further light on the evolutionary history of the two hominids. In my opinion, the picture could become clearer if we look at the pathogen-associated and immune response genes that we might have inherited or acquired during our evolutionary journey.
It was Charles Darwin who, in 1859, first sketched an evolutionary tree in his book On the Origin of Species, and trees have remained a central metaphor in evolutionary biology to the present day. Today, phylogenetics (Greek: phylé, phylon = tribe, clan, race + genetikós = origin, source, birth) is the study of the evolutionary history of, and relationships among, individuals or groups of organisms. Evolutionary trees have spread within and increasingly beyond evolutionary biology, so the skills of reading and interpreting trees are a critical component of biological education. Conversely, misconceptions about evolutionary trees can be very detrimental to one’s understanding of the patterns and processes that have occurred in the history of life.
This article is meant as an aid for students and enthusiasts who want to read and interpret a phylogenetic tree; it does not intend to teach how to create one. We can discuss that in a separate article later.
So what is an Evolutionary Tree anyway?
In the simplest terms, an evolutionary tree, also known as a phylogenetic tree or cladogram, is a 2D graph or diagram depicting biological entities (sequences or species) that are connected through common descent, i.e. their evolutionary relationship. Evolutionary trees thus give us basic information about the historical pattern of ancestry, divergence, and descent by depicting a series of branches that merge at points representing common ancestors, which are themselves connected through more distant ancestors. Consider the tree shown below. You and your siblings share a common ancestor (your parents), and your parent and your aunt share one too (their parents, your grandparents). You and your cousins therefore share the same ancestry through your grandparents, but you descend from it along different branches.
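The family analogy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names in `PARENT` and the helper functions `lineage` and `most_recent_common_ancestor` are made up for this example, and each couple is treated as a single node for simplicity.

```python
# Hypothetical family tree: each key points to the node it descends from.
# Couples are collapsed into one node to keep the sketch small.
PARENT = {
    "you": "your_parents",
    "sibling": "your_parents",
    "cousin": "aunt_and_uncle",
    "your_parents": "grandparents",
    "aunt_and_uncle": "grandparents",
}


def lineage(node):
    """Return the path from a node back to the root, starting with the node."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def most_recent_common_ancestor(a, b):
    """Return the first node on a's lineage that also lies on b's lineage."""
    ancestors_of_b = set(lineage(b))
    for node in lineage(a):
        if node in ancestors_of_b:
            return node
    return None


print(most_recent_common_ancestor("you", "sibling"))  # your_parents
print(most_recent_common_ancestor("you", "cousin"))   # grandparents
```

The point where two lineages first meet is the node a tree diagram draws as their common ancestor: recent for siblings, one step deeper for cousins.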
Components of a tree
A typical phylogenetic tree as shown above consists of the following components
What's the difference between a dendrogram, a phylogenetic tree, and a cladogram?
For general purposes, not much, and many biologists use these terms interchangeably. Strictly speaking, though, any tree diagram is a “dendrogram” (after the Greek for tree). A cladogram represents only a branching pattern; its branch lengths do not represent time or the relative amount of character change. In contrast, trees known as phylograms or phylogenetic trees draw branch lengths proportional to some measure of divergence between species and typically include a scale bar to indicate the degree of divergence represented by a given branch length.
Homology Vs Similarity
Since closely related species share a common ancestor and often resemble each other, it might seem that the best way to uncover evolutionary relationships is to look at overall similarity. Surprisingly, the answer is no, and to understand why, we have to look more closely at the difference between similarity and homology.
Similarity can be misleading because, when unrelated species adopt a similar way of life, their body parts may take on similar functions and end up resembling one another through convergent evolution, producing analogous features. A classic example is the wings of birds and bats. When two species share a characteristic because both inherited it from a common ancestor, however, it is called a homologous feature (or homology). For example, the even-toed foot of deer, camels, cattle, pigs, and hippopotamuses is a homologous similarity because all of them inherited the feature from their common paleodont ancestor.
How to read a Phylogenetic tree?
Phylogenetic trees contain a lot of information, both qualitative and quantitative. Decoding them is not always straightforward and requires an understanding of the basic facts above. Consider the hypothetical tree of different viruses shown below:
Qualitatively, the horizontal length of a branch shows the amount of genetic change: the longer the branch, the larger the amount of change. The quantitative information is given by the bar at the bottom of the figure, which acts as a scale. In this case, the line segment labelled '0.07' shows the branch length that represents a genetic change of 0.07. The units of branch length are nucleotide substitutions per site, that is, the number of changes or 'substitutions' divided by the length of the sequence. The scale may also sometimes represent the % change, i.e., the number of changes per 100 nucleotide sites.
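To make the unit concrete, here is a minimal sketch that counts the differences between two aligned sequences and divides by the sequence length, as described above. The function name `substitutions_per_site` and the two sequences are invented for illustration. Real tree-building software usually also applies a substitution model to correct for multiple changes at the same site, which this sketch does not do.

```python
def substitutions_per_site(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return differences / len(seq_a)


# Two made-up aligned sequences, 100 sites long, differing at 7 sites.
seq_a = "A" * 100
seq_b = "G" * 7 + "A" * 93

distance = substitutions_per_site(seq_a, seq_b)
print(f"{distance:.2f} substitutions per site")   # 0.07 substitutions per site
print(f"{distance * 100:.0f}% change")            # 7% change
```

Seven changes over 100 sites gives 0.07 substitutions per site, the same quantity as a scale bar labelled '0.07', or 7 changes per 100 sites when expressed as a percentage.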
The vertical lines joining the nodes, however, have no meaning and are used simply to lay out the tree for better visual understanding.
Different presentation schemes of evolutionary trees
Unless indicated otherwise, a phylogenetic tree depicts only the branching history of common ancestry. The pattern of branching (i.e., the topology) is what matters here; branch lengths are irrelevant. Thus, the three trees shown here all contain the same information.
This might seem confusing at first, but remember that the lines of a tree represent evolutionary lineages, and evolutionary lineages do not have any true position or shape. It therefore doesn't matter whether branches are drawn as straight diagonal lines, kinked to make a rectangular tree, or curved to make a circular tree.
To simplify the concept further, think of branches as flexible pipes rather than rigid rods, and of nodes as swivel joints rather than fixed welds. The basic rule is that if you can change one tree into another simply by twisting, rotating, or bending branches, without having to cut and reattach them, then the two trees have the same topology and therefore depict the same evolutionary history.
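The "swivel joint" rule can be checked mechanically. The sketch below assumes a rooted tree written as nested tuples of leaf names, a representation chosen here only for illustration; the helper names `canonical` and `same_topology` are hypothetical. Rotating a node corresponds to reordering the children of a tuple, so two trees share a topology if they become identical once every node's children are put in a fixed order.

```python
def canonical(tree):
    """Return a form of a rooted tree that ignores the order of children.

    A tree is either a leaf name (str) or a tuple of subtrees.
    """
    if isinstance(tree, str):
        return tree
    # Sort the canonical children by their text form so that any rotation
    # of the same node produces the same result.
    return tuple(sorted((canonical(child) for child in tree), key=repr))


def same_topology(tree_a, tree_b):
    """True if one tree can be turned into the other by rotating nodes only."""
    return canonical(tree_a) == canonical(tree_b)


tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("D", "C"), ("B", "A"))   # tree_1 with every node rotated
tree_3 = (("A", "C"), ("B", "D"))   # branches cut and reattached

print(same_topology(tree_1, tree_2))  # True
print(same_topology(tree_1, tree_3))  # False
```

Drawing style (diagonal, rectangular, circular) never enters this representation at all, which is another way of seeing that it carries no evolutionary information.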
Most of us grow up hearing, reading, and learning that we human beings can boast of being the most evolved, "higher" organism. Before 1960, the image below would have seemed correct and sensible. However, our idea of supremacy in terms of genome size and number of genes took a hit once researchers started looking at genome size in detail: it was soon realized that large genomes are often composed of huge chunks of repetitive DNA, while only a few percent of the genome in these organisms is unique.
Now let's look at the past and try to figure out the origin of this concept.
Classically, biologists recognize that the living world comprises two types of organisms: prokaryotes and eukaryotes. Assuming that you already know what prokaryotes and eukaryotes are, I am not going to dive into the difference between the two.
So what is C-value?
The 'C-value' of an organism is defined as the total amount of DNA contained within its haploid chromosome set. Prokaryotic cells typically have genomes smaller than 10 megabases (Mb), while the genome of a single-celled eukaryote is typically less than 50 Mb. For simplicity's sake, we are not comparing genomes from the two classes of organisms with each other here.
Eukaryotes alone, however, show immense diversity in genome size, from the smallest at less than 10 Mb to the largest at over 100,000 Mb. These observations seem to coincide, to a certain extent, with the complexity of the organism: the simplest eukaryotes, such as fungi, have the smallest genomes, and higher eukaryotes, such as vertebrates and flowering plants, have the largest ones.
So it seems fair to think that the complexity of an organism is related to the number of genes in its genome: higher eukaryotes need larger genomes to accommodate the extra genes. In fact, this correlation is far from precise. If it were precise, the nuclear genome of the yeast S. cerevisiae, which at 12 Mb is 0.004 times the size of the human nuclear genome, would be expected to contain 0.004 × 35,000 genes, which is just 140. In fact, the S. cerevisiae genome contains about 5,800 genes! For many years this lack of a precise correlation between the complexity of an organism and the size of its genome was regarded as a bit of a puzzle, known as the C-value paradox or C-value enigma.
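The arithmetic behind the yeast example can be reproduced in a short sketch. All figures are the ones quoted in the paragraph above; the function name `expected_gene_count` is invented for this illustration.

```python
def expected_gene_count(size_ratio, reference_gene_count):
    """Gene count predicted if genes scaled strictly with genome size."""
    return size_ratio * reference_gene_count


size_ratio = 0.004            # yeast genome size / human genome size
human_gene_count = 35_000     # human gene count used in the text
yeast_genes_observed = 5_800  # approximate observed yeast gene count

yeast_genes_expected = expected_gene_count(size_ratio, human_gene_count)
fold_difference = yeast_genes_observed / yeast_genes_expected

print(round(yeast_genes_expected))   # 140
print(round(fold_difference, 1))     # 41.4
```

Yeast therefore carries roughly 41 times more genes than a strict size-to-gene-number scaling would predict, which is the mismatch the term "C-value paradox" refers to.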
Questions raised by C-value paradox
The C-value paradox represents not one question but three, as suggested by T. R. Gregory (2007): (1) the generation of large-scale variation in genome size, which may occur by continuous or quantum processes; (2) the non-random distribution of genome size variation, whereby some groups vary greatly and others appear constrained; and (3) the strong positive relationship between C-value and nuclear and cell sizes and the negative correlation with cell division rates. Any proposed solution must therefore address all three problems.
Now, we biologists are pretty good at dividing ourselves into different schools of thought (remember the RNA and DNA world!), and the C-value paradox was no exception. Two schools of thought emerged here too, one proposing mutation pressure theories and the other proposing optimal DNA theories.
The table shown below summarizes the proposed theories along with their proposed mechanisms. Each theory can be classified according to its explanation for the accumulation or maintenance of DNA (MP, mutation pressure theory; OD, optimal DNA theory) and according to its explanation for the observed cellular correlations (CN, coincidental; CE, coevolutionary; CA, causative). Note that these theories are not necessarily mutually exclusive in all respects, since the optimal DNA theories do not specify the mechanism(s) of DNA content change and can include those presented by the mutation pressure theories. (Ref: Gregory, T. R. (2001), p. 69)
So what is the most plausible explanation of C-value paradox?
In 1980, two landmark papers, by Orgel and Crick and by Doolittle and Sapienza, made a strong case for 'selfish DNA elements', which we know better as transposons. They proposed that such elements essentially act as molecular parasites, replicating and increasing their numbers at the (usually slight) expense of the host genome; that is, these elements function for themselves while providing little or no selective advantage to the host. Computational genomic studies have shown that transposable elements invade in waves over evolutionary time, sweeping into a genome in large numbers, then dying and decaying away, leaving 'junk DNA' in their trail. About 45% of the human genome is detectably derived from these transposable elements. We can therefore say that the C-value paradox is mostly (though not entirely) explained by different loads of leftovers from transposable elements: the larger the genome, the longer the trail left behind by transposons.
So, if the C-value paradox is explained and rested for good, why dig it up again?
Recent publications discussing the outcome of the ENCODE (Encyclopedia of DNA Elements) project suggest that 80% of the human genome is reproducibly transcribed, bound to proteins, or has its chromatin specifically modified. Moreover, DNA previously considered junk was found to be biochemically active, which appears to contradict the 'junk DNA' theory.
In the light of the ENCODE data, scientists now need an alternative hypothesis capable of explaining the C-value paradox, mutational load, and the fact that a large fraction of eukaryotic genomes is composed of neutrally drifting transposon-derived sequences.