Morning Overview

Ghost lineages: the ancient DNA hiding in our genes today?

Fragments of DNA from long-extinct human relatives still circulate in modern genomes, and in some cases they do more than linger. They actively shape how people survive in extreme environments. The clearest example comes from the Tibetan Plateau, where a gene variant inherited from Denisovans helps millions of people breathe at altitudes that would sicken most lowlanders. But the story of these “ghost lineages” extends well beyond a single gene or a single population, and new computational tools are now pulling additional archaic sequences out of our collective genetic record.

How Denisovan DNA Helps Tibetans Breathe

The strongest documented case of ancient DNA conferring a modern survival advantage involves the EPAS1 gene. This gene encodes a transcription factor that regulates the body’s response to low oxygen levels, essentially acting as a biological altitude gauge. In most human populations, EPAS1 variants can trigger overproduction of red blood cells at high elevation, a response that thickens blood and raises the risk of chronic mountain sickness. Tibetans, however, carry a distinctive haplotype in the EPAS1 region that blunts this dangerous overreaction, allowing hemoglobin concentrations to stay closer to sea-level norms even above 4,000 meters. This physiological profile lets highland residents work, reproduce, and raise children in thin air that leaves many visitors short of breath after just a few days.

What makes this finding remarkable is the origin of that haplotype. A study published in Nature demonstrated that the adaptive EPAS1 variant entered the modern Tibetan gene pool through introgression from Denisovans or a closely related archaic population. The haplotype is virtually absent in Han Chinese and other lowland East Asian groups, yet it reaches high frequency among Tibetans, a pattern consistent with strong positive selection over thousands of years. In practical terms, an extinct human cousin handed modern Tibetans a genetic tool that made permanent settlement of one of Earth’s harshest environments possible, illustrating how hybridization can leave a legacy that is still measurable in blood chemistry and oxygen saturation today.
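One common way to put a number on that kind of frequency contrast is a differentiation statistic such as F_ST. The sketch below uses Hudson's estimator for a single two-allele site. It is only an illustration: the function name and every count in it are invented for the example and are not figures from the Nature study.

```python
def hudson_fst(count1, n1, count2, n2):
    """Hudson's F_ST estimator for one biallelic site.

    count1, count2: copies of the variant observed in each population sample.
    n1, n2: number of chromosomes sampled in each population.
    """
    p1 = count1 / n1
    p2 = count2 / n2
    # Squared frequency difference, corrected for finite sample size.
    numerator = (
        (p1 - p2) ** 2
        - p1 * (1 - p1) / (n1 - 1)
        - p2 * (1 - p2) / (n2 - 1)
    )
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical counts, chosen only to mimic "high in one group, nearly absent in the other".
highland_like_fst = hudson_fst(count1=70, n1=80, count2=1, n2=80)

# Hypothetical counts for an ordinary site with similar frequencies in both groups.
ordinary_fst = hudson_fst(count1=40, n1=80, count2=38, n2=80)

print(f"differentiated site: {highland_like_fst:.3f}")
print(f"ordinary site:       {ordinary_fst:.3f}")
```

A value near 1 means the two samples differ almost completely at that site, while values near 0 (or slightly below it, a side effect of the sample-size correction) mean they are nearly indistinguishable. High differentiation on its own does not prove selection; it is one signal that researchers weigh alongside others.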

This is not a case of inert fossil DNA drifting through generations. The selection signature around EPAS1 is among the strongest detected anywhere in the human genome, which means individuals who carried the Denisovan-derived variant had a measurable reproductive advantage at altitude. Families whose members could avoid chronic mountain sickness would have been more likely to survive harsh winters, support pregnancies, and raise children to adulthood on the plateau. The finding reframes how scientists think about interbreeding between archaic and modern humans: rather than a genetic dead end, hybridization sometimes delivered ready-made adaptations that would have taken far longer to evolve from scratch.

At the same time, EPAS1 is not the only locus under selection in high-altitude populations, and focusing too narrowly on a single gene can obscure a broader polygenic landscape. Other genetic variants, some of them likely of purely modern human origin, also contribute to vascular regulation, lung capacity, and metabolism in low-oxygen settings. The Denisovan-derived haplotype appears to be one powerful component in a larger network of adaptations, a reminder that even dramatic examples of introgression operate within the complex architecture of the human genome.

Machine Learning Scans for Hidden Archaic Sequences

Identifying archaic introgression used to depend on having a reference genome from the donor species, the way Denisovan remains from Siberia’s Denisova Cave provided a comparison template for the EPAS1 discovery. The problem is that most archaic human lineages left no known fossils. Their DNA, if it persists at all, hides inside modern genomes with no external reference to match against. This is where computational methods have started to change the field. Deep learning algorithms trained on known patterns of archaic sequence divergence can now flag suspicious stretches of DNA that look older than expected, even when no fossil genome exists for comparison.

A review of molecular archaeology and machine learning methods, available through PMC, highlights how these algorithms help uncover ancient genes and clarify their evolutionary role. The review also notes that adaptive introgression takes on another dimension in microorganisms, bacteria in particular, where horizontal gene transfer can restore ancient adaptations that a lineage had lost over the course of evolution. In other words, the phenomenon of borrowing survival tools from distant relatives is not unique to humans. It operates across the tree of life, and machine learning is making it visible at a scale that manual comparison never could. By simulating demographic histories and training on synthetic genomes, researchers can test how well their models distinguish introgressed segments from ordinary variation, gradually refining a toolkit that is now being deployed on large human datasets.

These approaches are particularly powerful for detecting “ghost” introgression events, where the donor lineage is unknown. Instead of asking whether a sequence matches Neanderthal or Denisovan DNA, models look for segments that are unusually divergent from the rest of the genome yet shared among certain populations. If such segments cluster in functional regions—genes involved in immunity, for example—that pattern can hint at past selection. Still, the statistical signals are subtle, and distinguishing ancient introgression from other processes, such as long-term population structure within modern humans, remains a central challenge. As a result, computational predictions are increasingly paired with laboratory experiments that test how candidate archaic variants affect gene expression or cellular behavior.
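The core idea of looking for segments that are unusually divergent yet shared within a population can be shown with a toy scan. The sketch below is a deliberately simple stand-in, not the deep learning methods described above: it scores each genomic window with a z-score and keeps only windows that a meaningful share of the sampled population carries. The function name, both thresholds, and all of the numbers are assumptions made for illustration.

```python
from statistics import mean, pstdev


def flag_divergent_windows(divergence, carrier_fraction,
                           z_cutoff=2.0, min_carrier_fraction=0.2):
    """Return indices of windows that are both unusually divergent and shared.

    divergence: per-window sequence divergence (hypothetical values).
    carrier_fraction: per-window share of sampled individuals who carry
        the divergent haplotype (hypothetical values).
    """
    mu = mean(divergence)
    sigma = pstdev(divergence)
    flagged = []
    for index, (d, shared) in enumerate(zip(divergence, carrier_fraction)):
        # How far this window sits from the genome-wide average, in standard deviations.
        z = (d - mu) / sigma if sigma > 0 else 0.0
        if z >= z_cutoff and shared >= min_carrier_fraction:
            flagged.append(index)
    return flagged


# Invented toy data: ten windows, one of which is far more divergent than the rest.
divergence = [0.0010, 0.0011, 0.0009, 0.0010, 0.0042,
              0.0011, 0.0010, 0.0009, 0.0012, 0.0010]
carrier_fraction = [0.0, 0.0, 0.0, 0.0, 0.35,
                    0.0, 0.0, 0.0, 0.40, 0.0]

candidates = flag_divergent_windows(divergence, carrier_fraction)
print("candidate windows:", candidates)
```

Requiring both conditions matters. In this toy data the ninth window is widely shared but not unusually divergent, so it is not flagged. Real pipelines also have to rule out alternative explanations such as deep population structure, which a threshold this simple cannot do.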

Beyond humans, similar pipelines are being used to scan wildlife genomes for introgressed alleles that might underlie local adaptation, such as tolerance to extreme temperatures or unusual diets. In microbes, where DNA can move horizontally across distantly related species, machine learning helps track how ancient resistance genes flow through bacterial communities. The same conceptual framework—detecting segments that look older or more divergent than expected—applies across these systems, underscoring how methods developed to study human origins are feeding back into broader evolutionary biology and even public health.

Ghost Lineages Beyond Eurasia

The Denisovan–EPAS1 story is well established, but it sits within a Eurasian frame. Some of the most tantalizing hints of ghost lineages come from African populations, where modern humans spent the vast majority of their evolutionary history. Preliminary genomic analyses have suggested that certain sub-Saharan African groups carry DNA segments that appear to derive from unknown archaic hominins, species that diverged from the modern human lineage hundreds of thousands of years ago and have never been identified from fossil remains. These signals remain difficult to confirm precisely because no reference genome exists for the putative donor populations, and because Africa’s deep and complex population structure can mimic some of the same patterns.

This gap matters for a reason that goes beyond academic curiosity. If archaic introgression in Eurasia could produce something as consequential as high-altitude adaptation, then uncharacterized archaic DNA in African genomes may well influence traits related to immune function, metabolism, or disease susceptibility. The hypothesis that ghost lineages from unidentified archaic hominins could enhance modern resilience to infectious diseases is plausible on evolutionary grounds, but testing it will require targeted functional studies, potentially using gene-editing tools like CRISPR, combined with epidemiological data from genetically diverse communities. For now, the evidence is suggestive rather than definitive, and any strong claims about specific health benefits would outrun the available data. As more African populations are sequenced with high coverage and included in global reference panels, researchers expect to refine their models and either substantiate or revise current estimates of archaic contribution.

There is also a social dimension to this work. Genomic research has historically focused on European and some East Asian populations, which means the strongest statistical tools and the most detailed evolutionary reconstructions have been built around those groups. Expanding sampling in Africa and other underrepresented regions is not only a matter of equity; it is scientifically essential for understanding how introgression shaped our species. Without that breadth, the global map of ghost lineages will remain skewed, and the most informative signals—perhaps involving adaptations to tropical pathogens or local diets—may continue to be overlooked.

What Current Studies May Oversimplify

There is a tendency in popular coverage of archaic introgression to present it as a clean narrative: ancient humans mated, useful genes were passed along, natural selection did the rest. The reality is messier. Environmental context shapes whether an introgressed variant is beneficial, neutral, or harmful. The same Denisovan-derived EPAS1 haplotype that protects Tibetans at altitude has no obvious advantage at sea level, which likely explains why it is nearly absent in closely related lowland populations. Gene–environment interactions like these mean that labeling a stretch of archaic DNA as simply “adaptive” can flatten a more complicated picture. A variant that once helped fight local parasites, for example, might predispose carriers to autoimmune disorders in a modern urban setting with very different exposures.

Similarly, the machine learning tools now used to detect archaic sequences carry their own limitations. These algorithms are trained on existing reference datasets, which are still heavily biased toward European and East Asian genomes. Faint archaic signals in underrepresented populations may be harder to detect or easier to misclassify. The field is aware of this problem, but systematic documentation of how accurately these detection methods perform on diverse global genomes remains sparse. Until training datasets become more representative, some ghost lineages will stay hidden not because they do not exist, but because the tools designed to find them were calibrated on a narrow slice of humanity. Recognizing these blind spots is crucial as researchers move from identifying archaic segments to interpreting their medical and evolutionary significance.

Another source of oversimplification lies in how introgression is framed in relation to human identity. Stories that emphasize “percent Neanderthal” or “percent Denisovan” risk implying that individuals or populations are mosaics of distinct species, when in fact all living humans belong to a single, highly interbred lineage. Archaic contributions are best understood as threads woven into a shared tapestry rather than as separate blocks of ancestry. As genomic technologies and analytical methods continue to advance, the challenge will be to communicate the nuance: extinct relatives did leave functional traces in our DNA, those traces sometimes matter for health and adaptation, and yet they are part of a continuum of variation that links all people to one another and to a much deeper human past.

*This article was researched with the help of AI, with human editors creating the final content.