New protein sequencing method offers clues to life’s early chemistry

A team of researchers has built a new protein sequencing workflow that pairs mirror proteases with deep learning software to read peptide sequences with far greater accuracy than previous methods. Published in Nature Communications, the tool, called DiNovo, combines an experimental enzyme strategy with computational algorithms to fill gaps that have long plagued efforts to decode proteins without a reference genome. The advance arrives alongside a separate reverse translation technique from Stanford bioengineers and growing interest in using protein chemistry to probe how life’s earliest molecular machinery took shape.

How Mirror Proteases Close Sequencing Gaps

Traditional protein sequencing relies heavily on trypsin, an enzyme that cuts peptide chains at specific amino acid sites. The problem is that trypsin leaves blind spots: regions of a protein where fragments are too short, too long, or chemically modified in ways that mass spectrometry struggles to interpret. The DiNovo workflow addresses this by adding a second enzyme, LysargiNase, which mirrors trypsin’s cleavage behavior: it recognizes the same amino acid residues but cuts on the opposite side of them. Where trypsin cuts on the C-terminal side of lysine and arginine, LysargiNase cuts on the N-terminal side, generating a complementary set of fragments from the same protein. The approach was first characterized in detail in a protease study that demonstrated its utility for mass spectrometry.
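To make the mirror relationship concrete, here is a minimal Python sketch of an in silico digest. It is an illustration only, not part of the DiNovo software: the function name `digest` and the example sequence are hypothetical, and the cleavage rule is deliberately simplified (it ignores missed cleavages and the common convention that trypsin does not cut before proline).

```python
def digest(protein: str, enzyme: str) -> list[str]:
    """Split a protein at every lysine (K) or arginine (R).

    Simplified rule (an assumption for illustration):
    - "trypsin" cuts after K/R, so peptides end with K or R
    - "lysarginase" cuts before K/R, so peptides start with K or R
    """
    if enzyme not in ("trypsin", "lysarginase"):
        raise ValueError(f"unknown enzyme: {enzyme}")

    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        if residue in "KR":
            # Cut position is just after the residue for trypsin,
            # just before it for LysargiNase.
            cut = i + 1 if enzyme == "trypsin" else i
            if cut > start:
                peptides.append(protein[start:cut])
                start = cut
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


# Hypothetical toy sequence, not a real protein.
toy = "AAKGGRLL"
print(digest(toy, "trypsin"))      # ['AAK', 'GGR', 'LL']
print(digest(toy, "lysarginase"))  # ['AA', 'KGG', 'RLL']
```

The two outputs cover the same sequence with differently placed boundaries, which is the overlap the paired digests rely on.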

By running both digestions in parallel, DiNovo captures overlapping peptide fragments that together cover stretches of sequence neither enzyme could resolve alone. The Nature Communications paper describes how the software then uses mirror-spectra recognition, deep learning, graph-theory sequencing, and a target-decoy confidence evaluation to assemble and score those fragments into full peptide reads. The result is a de novo method, meaning it does not depend on matching spectra to a known protein database, a feature that matters enormously when studying organisms, tumors, or ancient molecules for which no genome exists.
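The target-decoy evaluation mentioned above can be illustrated with a short sketch. This is the generic estimate used across proteomics (decoy hits divided by target hits above a score cutoff), not DiNovo’s specific scoring scheme, which the article does not detail. The function name `fdr_at_threshold` and the scores are hypothetical.

```python
def fdr_at_threshold(matches: list[tuple[float, bool]], threshold: float) -> float:
    """Estimate the false discovery rate among matches scoring >= threshold.

    Each match is (score, is_decoy). Decoy hits are known to be wrong,
    so their count approximates the number of wrong target hits.
    """
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so nothing to be wrong about
    return decoys / targets


# Made-up scores for illustration only.
matches = [(0.95, False), (0.90, False), (0.85, True),
           (0.80, False), (0.60, True), (0.55, False)]
print(fdr_at_threshold(matches, 0.90))  # 0.0
print(fdr_at_threshold(matches, 0.80))  # one decoy against three targets
```

Raising the threshold trades fewer accepted sequences for a lower estimated error rate.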

In practice, the mirror-protease strategy changes the information content of each spectrum. Instead of a single, sometimes ambiguous fragmentation pattern, the paired digests produce two complementary views of the same sequence. The DiNovo algorithms search for these mirrored signatures, then stitch them together into candidate peptide paths on a graph where nodes represent fragment ions and edges represent plausible amino acid steps. A target-decoy scheme estimates the false discovery rate, giving researchers statistical confidence in each reconstructed sequence.
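The graph formulation described above can be sketched in a few lines of Python. This is a toy under strong assumptions, not DiNovo’s algorithm: it takes clean prefix masses (cumulative residue masses starting at zero) instead of real fragment-ion m/z values, uses a reduced residue table that leaves out near-identical masses such as leucine/isoleucine and lysine/glutamine, and enumerates every path instead of scoring them. All function names are hypothetical.

```python
# Monoisotopic residue masses in daltons (reduced table, an assumption
# made to avoid ambiguous residues in this toy example).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "D": 115.02694,
    "K": 128.09496, "E": 129.04259, "R": 156.10111,
}


def prefix_masses(peptide: str) -> list[float]:
    """Return cumulative residue masses, starting from 0.0."""
    masses = [0.0]
    for aa in peptide:
        masses.append(masses[-1] + RESIDUE_MASS[aa])
    return masses


def sequences_from_masses(masses: list[float], tol: float = 0.01) -> list[str]:
    """Read sequences as paths from the lightest to the heaviest node.

    Nodes are masses; an edge joins two nodes when their mass
    difference matches one residue within `tol`.
    """
    nodes = sorted(masses)
    paths = {0: [""]}  # node index -> residue strings that reach it
    for j in range(1, len(nodes)):
        for i in range(j):
            if i not in paths:
                continue  # node i is unreachable (e.g. a noise peak)
            delta = nodes[j] - nodes[i]
            for aa, mass in RESIDUE_MASS.items():
                if abs(delta - mass) <= tol:
                    paths.setdefault(j, []).extend(p + aa for p in paths[i])
    return paths.get(len(nodes) - 1, [])


# A hypothetical peptide plus one noise peak that matches no residue step.
observed = prefix_masses("GASK") + [100.0]
print(sequences_from_masses(observed))  # ['GASK']
```

In real data, missing peaks break paths and noise peaks create false ones, which is where a second, mirrored spectrum of the same sequence adds useful constraints.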

Deep Learning’s Role in Reading Proteins

DiNovo did not emerge from a vacuum. The application of neural networks to peptide sequencing traces back to DeepNovo, a deep learning approach published in the Proceedings of the National Academy of Sciences that showed a model could learn to predict amino acid order directly from fragmentation patterns. That 2017 study established baseline capabilities and is frequently cited as pioneering in subsequent research, including the DiNovo paper itself.

Since then, the field has accelerated. A separate tool called InstaNovo, described in Nature Machine Intelligence, introduced an iterative refinement diffusion model called InstaNovo+ and reported gains in peptide-spectrum matches at fixed false discovery rates compared to earlier software like Casanovo. These advances reflect a broader trend: instead of treating each spectrum as an isolated puzzle, modern models learn the statistical structure of peptide fragmentation across millions of examples, allowing them to infer missing ions or resolve ambiguous mass differences.

What DiNovo adds to this lineage is not just a better algorithm but a tighter integration between wet-lab biochemistry and computation. The mirror-protease strategy generates richer input data, which gives the neural network more signal to work with, rather than asking software alone to compensate for incomplete experimental coverage. In effect, the chemistry and the model are co-designed: the enzymes produce spectra that are especially amenable to pattern recognition, and the software is tuned to exploit the mirrored relationships between them.

Why Protein Sequencing Needed a New Chemistry

For decades, the standard chemical approach to reading proteins was Edman degradation, a method that strips amino acids one at a time from a peptide chain. Eric Anslyn, Welch Regents Chair in Chemistry at UT Austin, has noted a central limitation: Edman chemistry uses strong acid that damages functional groups important for modern protein studies. In a university report, he explained that harsh conditions make many biomolecules vulnerable to destruction.

That incompatibility has been especially problematic for attempts to combine protein sequencing with DNA barcoding, where each amino acid would be tagged with a unique oligonucleotide for high-throughput readout. Researchers at UT Austin recently demonstrated a new sequencing chemistry that avoids those destructive conditions, opening the door to methods that keep proteins intact while still enabling base-by-base readout via attached DNA tags. Earlier work from the same institution showed that more sensitive protein sequencing was achievable by rethinking both the surface chemistry and detection strategies.

The recurring theme across these efforts is that chemistry, not just computation, has been the bottleneck. Deep learning can only interpret the signals that experiments generate; if key residues never produce detectable fragments, no amount of algorithmic sophistication can recover them. DiNovo’s mirror-protease approach and Stanford’s reverse translation technique both represent attempts to generate better raw data before algorithms ever touch it.

Reverse Translation Converts Proteins to DNA

Stanford bioengineers announced a separate advance in March 2026 that takes a fundamentally different approach. Their reverse translation technique converts protein sequences into DNA, effectively translating the language of amino acids back into nucleotides that can then be read using well-established DNA sequencing platforms. According to a Stanford release, the method enables detection at unprecedented scale and sensitivity.

The chemistry works by tagging individual amino acids within a peptide using DNA barcodes specific to each residue. This is the same conceptual strategy that the harsh acid of Edman degradation had previously ruled out, because the acid damages the attached DNA. By sidestepping that chemical conflict, the Stanford method can piggyback on the massive throughput and falling costs of DNA sequencers, potentially making single-molecule protein analysis routine rather than exceptional.

In a complementary report, researchers emphasized that the new chemistry tackles longstanding obstacles to analyzing proteins in complex samples and rare cell types. As summarized in a news article, the technique could enable detailed surveys of proteins in early embryos, tiny biopsies, and elusive immune cell subsets that previously yielded too little material for conventional mass spectrometry.

Peering Back Toward Life’s Origins

Beyond medical diagnostics and drug discovery, these advances are stirring interest among scientists who study how life began. Proteins sit at the heart of modern biology, but they are built from just 20 standard amino acids, and it remains an open question how that alphabet emerged from the broader chemical possibilities available on the early Earth. Biophysicist Stephen Fried has argued that the modern set may not be uniquely special, suggesting that alternative collections of amino acids might also support robust folding and catalysis. In one interview, he noted that other alphabets might not behave all that differently from today’s proteins.

Testing those ideas requires tools that can read and compare vast numbers of synthetic and natural proteins with minimal bias. De novo mass spectrometry workflows like DiNovo are well-suited to exploring unfamiliar sequence space, because they do not rely on matching spectra to existing databases rooted in modern biology. Reverse translation, meanwhile, could allow researchers to generate DNA libraries that encode entire “alternative biochemistry” proteomes and then track how those proteins evolve under laboratory selection.

Other scientists emphasize that evolution itself requires not just molecules but heritable information. As one origin-of-life researcher put it, to have evolution in the Darwinian sense, there must be a system that can store, copy, and vary instructions over time. In coverage of this work, experts stressed that informational polymers capable of replication are central to any plausible scenario for life’s emergence, whether those polymers are DNA, RNA, or some ancestral analog.

New protein sequencing chemistries intersect with these questions in two ways. First, they make it feasible to catalog the full diversity of peptides that arise in prebiotic simulation experiments, including short, heterogeneous chains that defy database-based identification. Second, by linking proteins to DNA barcodes, reverse translation blurs the line between sequence storage and function, allowing researchers to track how random peptide libraries give rise to catalytic activity across many generations of selection.

A Convergence of Chemistry and Computation

Taken together, DiNovo’s mirror-protease workflow and Stanford’s reverse translation strategy illustrate a broader convergence in molecular science. Rather than treating chemistry and computation as separate domains, researchers are designing experimental protocols with algorithms in mind, and vice versa. Enzymes are chosen not only for their biological relevance but also for the clarity and redundancy of the signals they produce; neural networks are trained not just to classify spectra but to exploit structural symmetries like mirrored cleavages.

As these techniques mature, they could transform how laboratories approach everything from clinical proteomics to astrobiology. High-confidence de novo sequencing may reveal unexpected protein variants in tumors or pathogens that standard database searches miss. DNA-barcoded proteins might enable massive screens for enzyme activity or drug binding, compressing years of trial-and-error into weeks of automated selection. And in the background, origin-of-life researchers will gain sharper tools for asking whether our particular protein alphabet is a historical accident or a near-inevitable outcome of chemistry and physics.

For now, DiNovo and reverse translation remain specialized methods, requiring careful optimization and sophisticated analysis pipelines. But they point toward a future in which reading proteins becomes as routine, and as information-rich, as reading DNA, closing long-standing gaps in our view of the molecular world.

*This article was researched with the help of AI, with human editors creating the final content.