When linguists want to tell one vowel from another, they measure the peaks of acoustic energy that the human vocal tract sculpts into each sound. Now a peer-reviewed study published in May 2026 in Open Mind reports that sperm whale clicks contain the same kind of peaks, arranged in patterns that mirror the vowel and diphthong structures of human speech. The finding does not mean whales are talking, but it does reveal a layer of internal complexity in their communication that scientists had never formally documented.
Two “vowel” types hidden inside whale clicks
A research team led by Pratyusha Sharma at MIT’s Computer Science and Artificial Intelligence Laboratory, working with biologist Shane Gero of the Dominica Sperm Whale Project and marine scientist David Gruber of the City University of New York, applied a technique called linear predictive coding (LPC) to recordings of sperm whale codas. Codas are the short, patterned bursts of clicks that whales exchange during social encounters.
LPC is the same mathematical tool linguists use to measure human vowel formants. By ignoring click timing and focusing purely on spectral energy, the team found that codas consistently sort into two categories. One type, which they labeled “a-coda,” shows a broad, open spectral envelope resembling the formant shape of the vowel in “father.” The other, “i-coda,” displays a narrower, higher-frequency peak closer to the vowel in “see.”
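In outline, LPC works by fitting an all-pole filter to a stretch of signal and reading resonance frequencies off the filter's poles. The sketch below is a minimal, generic illustration of that idea in Python, applied to a synthetic two-resonance "click"; it is not the authors' pipeline, and the signal, sample rate, model order, and bandwidth heuristic are invented for the example.

```python
import numpy as np

def lpc_formants(x, fs, order=8, n_formants=2):
    """Estimate resonance (formant-like) frequencies via autocorrelation LPC."""
    x = x * np.hamming(len(x))                       # taper to reduce edge effects
    r = np.correlate(x, x, mode="full")[len(x) - 1:][:order + 1]
    # Autocorrelation normal equations: R a = -r[1:], with R Toeplitz in r
    idx = np.abs(np.arange(order)[:, None] - np.arange(order))
    a = np.linalg.solve(r[idx], -r[1:order + 1])
    roots = np.roots(np.concatenate(([1.0], a)))     # poles of A(z) = 1 + a1 z^-1 + ...
    roots = roots[np.imag(roots) > 0]                # one of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)       # pole angle -> frequency
    bws = -fs / np.pi * np.log(np.abs(roots))        # pole radius -> bandwidth
    narrowest = np.argsort(bws)[:n_formants]         # keep sharpest resonances
    return np.sort(freqs[narrowest])

# Synthetic "click": two damped resonances at 800 Hz and 2500 Hz (invented values)
fs, n = 16000, np.arange(1024)
x = (np.exp(-n / 200) * np.cos(2 * np.pi * 800 * n / fs)
     + np.exp(-n / 200) * np.cos(2 * np.pi * 2500 * n / fs))
f1, f2 = lpc_formants(x, fs)
```

Run on this synthetic signal, the two narrowest-bandwidth poles land near the planted 800 Hz and 2500 Hz resonances, which is the same logic by which LPC recovers the formant peaks that distinguish one vowel, or one coda type, from another.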
Some codas did something more surprising: their spectral profiles shifted from one state to the other within a single sequence, tracing trajectories the authors describe as diphthong-like patterns. In English, a diphthong is a vowel that glides between two qualities inside one syllable, as in the word “coin.”
A communication system built on combinable parts
The spectral discovery arrives alongside earlier work that mapped the broader architecture of coda exchanges. A 2024 study in Nature Communications, also involving Sharma and Gero, analyzed 8,719 codas recorded off the coast of Dominica and found that each coda can be broken into combinable features: tempo, rhythm, rubato (subtle timing variations), and ornamentation (extra clicks inserted into a sequence).
Those features varied systematically during real-time exchanges between whales, suggesting the animals adjust multiple acoustic parameters at once when communicating, much as human speakers combine phonemes into words. Individual whales and family units favored specific subsets of these features, reinforcing the idea that codas function as socially meaningful signals rather than random noise.
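As a toy illustration of how timing features can be factored apart, the sketch below (hypothetical code, not drawn from any of the studies) separates a coda's overall tempo from its duration-normalized rhythm, so that two codas clicked at different speeds can still share one rhythm pattern; rubato and ornamentation are omitted for simplicity.

```python
import numpy as np

def coda_features(click_times):
    """Split a coda's click timing into tempo and rhythm (simplified sketch;
    the published analysis assigns codas to categorical types)."""
    t = np.asarray(click_times, dtype=float)
    icis = np.diff(t)                 # inter-click intervals
    tempo = t[-1] - t[0]              # overall coda duration
    rhythm = icis / icis.sum()        # ICI pattern, normalized by duration
    return tempo, rhythm

# Two hypothetical codas: same rhythm, different tempos
slow = [0.0, 0.2, 0.4, 0.8]
fast = [0.0, 0.1, 0.2, 0.4]
tempo_slow, rhythm_slow = coda_features(slow)
tempo_fast, rhythm_fast = coda_features(fast)
```

Here the two codas differ in tempo (0.8 s versus 0.4 s) while their normalized rhythms are identical, mirroring the idea that the features are independent axes a whale could vary in combination.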
Foundational research published in Proceedings of the Royal Society B by Luke Rendell and Hal Whitehead had already shown that sperm whale populations organize into vocal clans defined by shared dialects. Whales within a clan share a common coda repertoire; clans in the same ocean basin can have entirely different sets. Because calves learn their clan’s codas from adults, the system runs on cultural transmission rather than genetic inheritance, paralleling the way human children acquire language from their community.
Machine learning flags the same features humans found
A third line of evidence comes from a preprint posted on arXiv (not yet peer-reviewed). Researchers developed an interpretability method called CDEV and applied it to an unsupervised deep-learning model trained on sperm whale codas. Without being told which aspects of the sounds might matter, the model independently flagged click count, timing regularity, and spectral features as candidate meaningful properties.
That convergence between human-guided analysis and data-driven discovery reduces the chance that the patterns researchers identified are artifacts of how scientists segment or label recordings. The model does not reveal what the patterns communicate, but it narrows the search space for future experiments.
The gap between structure and meaning
Identifying structure is not the same as decoding content. Human vowels carry meaning because they distinguish words: “bat” differs from “bit” because the vowel changes. No study has yet shown that the difference between an a-coda and an i-coda triggers a different behavioral response in a listening whale.
Controlled playback experiments, in which researchers broadcast modified codas to free-ranging groups and measure reactions, would be needed to test that link. As of the studies reviewed here, no such results have been published.
The relationship between spectral features and the timing-based features from the Nature Communications study also remains unclear. Tempo, rhythm, rubato, and ornamentation describe when clicks happen; the vowel-like patterns describe what frequencies they contain. Whether these two layers operate independently, interact, or encode different types of information is an open problem. One possibility is that timing marks social context or group identity while spectral shifts fine-tune meaning within that context, but that idea is speculative.
Cross-species comparisons need careful framing, too. The term “proto-language” sometimes surfaces in popular accounts, yet the primary papers use more cautious phrasing, describing “vowel-like” and “diphthong-like” patterns rather than claiming whales possess language in any technical sense. Without evidence that codas can be combined to express novel meanings, or that whales can refer to specific objects or events, the system does not meet established linguistic criteria for syntax, semantics, or compositionality.
There is also the question of how whales physically produce these spectral patterns. In humans, vowel formants arise from shaping airflow through the vocal tract. Sperm whales generate clicks using a complex set of nasal structures in the head, including the spermaceti organ and phonic lips. The Open Mind study infers that whales must exert some control over resonant properties to create consistent a-coda and i-coda types, but direct anatomical measurements linking specific structures to specific spectral outcomes do not yet exist.
Strength of each evidence layer for whale vocal structure
The spectral analysis in Open Mind is the most direct piece of the puzzle. It applies a well-validated acoustic method to whale recordings and reports a measurable, repeatable result: two distinct formant patterns and transitional trajectories between them. Readers can treat this as a solid empirical finding about the physical properties of whale clicks. It does not, by itself, prove anything about meaning, but it establishes that sperm whale codas have internal structure analogous in key respects to human vowels.
The combinatorial analysis in Nature Communications is also empirically grounded, drawn from a large dataset with clearly described methods and statistical tests. Together, the two papers show that sperm whale codas carry structured acoustic information at both the temporal and spectral levels, a necessary condition for any communication system that encodes meaning.
The machine-learning work on arXiv validates the analytical approach rather than the biological claim. It is best read as a tool for generating hypotheses, not as evidence that whales “speak” in a human-like way.
Taken together, the picture is this: sperm whales use a complex, culturally transmitted communication system whose building blocks show striking parallels to elements of human speech. The evidence that those building blocks function as language, carrying referential meaning or enabling open-ended expression, remains unproven. Closing that gap will require connecting acoustic categories to behavior, testing how whales respond to systematic manipulations of codas, and mapping how timing and spectral features interact during real social encounters beneath the surface.
*This article was researched with the help of AI, with human editors creating the final content.