Columbia study predicts RNA activity to guide new drug designs

Columbia University researchers have built a deep-learning model that predicts how CRISPR guide RNAs will behave when targeting messenger RNA inside human cells, a capability that could reshape how scientists design RNA-based drugs. The model, called TIGER, forecasts both intended and unintended effects of guide RNAs used with the Cas13d enzyme, giving drug developers a computational tool to screen for safety and precision before running costly lab experiments. Separately, a Columbia biochemistry lab is attacking a related problem from the structural side, modeling how RNA molecules shift shape in three dimensions to identify new druggable targets.

How TIGER Predicts Guide RNA Behavior

Most CRISPR research has focused on editing DNA. Cas13d works differently: it targets RNA, which means it can reduce a gene’s output without permanently altering the genome. The challenge is that guide RNAs, the short sequences that direct Cas13d to the right transcript, do not always hit their intended target. Some miss entirely. Others trigger collateral damage to nearby RNA molecules. Until recently, researchers lacked a reliable way to predict, at scale, which guides would work well and which would cause problems.

TIGER addresses this gap directly. The team generated and tested roughly 200,000 guides with systematic mismatches and insertions/deletions in human cells, then trained a deep-learning model on the resulting data. The model predicts on-target knockdown efficiency and, just as critically, off-target activity, meaning it can flag guides likely to damage RNA transcripts the researcher did not intend to affect.

One feature that sets TIGER apart from earlier prediction tools is its ability to forecast partial knockdown. Rather than simply switching a gene off, partial knockdown dials down a transcript’s output by a controlled amount. This matters for diseases where a gene is overactive but not entirely harmful. Reducing its expression by, say, 40 percent might relieve symptoms while preserving the gene’s necessary functions. Columbia Engineering has framed this as tuning transcript levels, a concept with direct therapeutic relevance for conditions where complete gene silencing would cause its own side effects.

The model itself is a deep neural network that ingests the guide sequence, key features of the target RNA, and the pattern of mismatches or small insertions and deletions between the two. From those inputs, TIGER outputs a quantitative prediction of knockdown efficiency and a risk estimate for off-target effects. By learning from both successful and failed guides, the system can distinguish subtle sequence patterns that favor productive binding from those that tend to produce noise or collateral activity.
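To make those inputs concrete, here is a minimal sketch of how a guide, its target site, and the mismatch pattern between them could be turned into numbers for a neural network. It is not TIGER's actual encoding: the function names, the nine-value feature layout, and the choice to count only Watson-Crick pairs as matches (ignoring G-U wobble and indels) are all assumptions made for illustration.

```python
# Illustrative only: a hypothetical encoding, not the one used by TIGER.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
BASES = "ACGU"


def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def one_hot(rna):
    """Encode each base as a 4-element 0/1 vector in A, C, G, U order."""
    return [[1 if base == b else 0 for b in BASES] for base in rna]


def mismatch_mask(guide, target_site):
    """Mark guide positions that do not pair with the target.

    Assumption: only Watson-Crick pairs count as matches, and guide and
    target site have equal length (no insertions or deletions).
    """
    if len(guide) != len(target_site):
        raise ValueError("guide and target site must have the same length")
    perfect_guide = reverse_complement(target_site)
    return [0 if g == p else 1 for g, p in zip(guide, perfect_guide)]


def encode_pair(guide, target_site):
    """Build one 9-value feature row per position: guide, target, mismatch."""
    mask = mismatch_mask(guide, target_site)
    # Reverse the target so that position i lines up with guide position i.
    target_aligned = target_site[::-1]
    return [
        g + t + [m]
        for g, t, m in zip(one_hot(guide), one_hot(target_aligned), mask)
    ]


target = "AUGGCUAGCUAGGCUAAUCGGAU"       # made-up 23-nt target site
guide = reverse_complement(target)       # perfectly matched guide
features = encode_pair(guide, target)    # 23 rows of 9 features each
```

A real model would add further target-context features and handle indels through an alignment step, both of which this sketch leaves out.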

The Data Behind the Model

The processed screen data from the TIGER study is publicly available through the NCBI Gene Expression Omnibus under accession GSE232228. The repository includes downloadable count tables, BioProject linkage, and raw-data routing through the SRA Run Selector, making it possible for independent researchers to verify the results or build on them. The experimental design covered on-target and off-target screens across multiple cell lines, with guides carrying deliberate mismatches and indels to map how sequence imperfections affect activity.
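For readers who want to work with count tables of this kind, the sketch below shows one common way pooled-screen counts are converted into a per-guide activity measure: normalize each sample to counts per million, then take a log2 ratio between two samples. This is a generic calculation and not necessarily the pipeline the TIGER authors used. The guide names, counts, and the pseudocount value are invented for the example.

```python
import math


def log2_fold_changes(pre_counts, post_counts, pseudocount=1.0):
    """Compute per-guide log2 fold change between two samples.

    Counts are scaled to counts per million within each sample, and a
    pseudocount is added so that guides with zero reads do not cause a
    division by zero or a log of zero.
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())
    if pre_total == 0 or post_total == 0:
        raise ValueError("each sample needs at least one read")

    result = {}
    for guide, pre in pre_counts.items():
        post = post_counts.get(guide, 0)  # guides missing later count as zero
        pre_cpm = pre / pre_total * 1e6
        post_cpm = post / post_total * 1e6
        result[guide] = math.log2(
            (post_cpm + pseudocount) / (pre_cpm + pseudocount)
        )
    return result


# Made-up counts for two hypothetical guides.
pre = {"guide_1": 100, "guide_2": 100}
post = {"guide_1": 50, "guide_2": 150}
lfc = log2_fold_changes(pre, post)
# guide_1 is depleted (negative value); guide_2 is enriched (positive value).
```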

Open data access is not a trivial detail here. RNA therapeutics are advancing rapidly, and the field has been held back partly by the difficulty of predicting how guide sequences will perform in different cellular environments. By depositing the full dataset in the Gene Expression Omnibus, which is hosted by the National Center for Biotechnology Information at the National Library of Medicine, the TIGER team created a shared resource that other groups can use to benchmark competing models or test new hypotheses about Cas13d behavior. The library is itself part of the National Institutes of Health, which funds and coordinates much of the country’s basic and translational biomedical research.

Accessibility of these resources also matters. The National Library of Medicine maintains formal accessibility guidance to ensure that web-based scientific repositories can be used by people with disabilities, helping to widen participation in computational biology and data-driven medicine. For a field as specialized as CRISPR guide design, lowering technical and accessibility barriers can influence who gets to build the next generation of tools.

Building on Earlier Cas13 Screening Work

TIGER did not emerge in isolation. A 2020 paper in Nature Biotechnology established the foundational principles for Cas13 guide RNA design at scale, demonstrating pooled screening strategies and measurable rules for guide efficacy. That earlier work gave the field its first systematic look at which sequence features make a Cas13 guide effective, but it stopped short of building a predictive model that could generalize to new targets.

A separate 2023 study published in Nature Communications tackled the same prediction problem, using machine learning to model Cas13d on-target and off-target effects and addressing screen design choices such as non-essential gene controls. That work confirmed that collateral RNA damage from Cas13d can be modeled computationally. TIGER’s training dataset of roughly 200,000 guides represents a significant jump in scale, which should in turn improve the model’s ability to generalize across different guide-target combinations.

Columbia’s contribution fits into a broader push to use AI to read out and predict cellular behavior. A recent Columbia report on cell-level prediction describes how machine-learning systems can infer complex intracellular states from high-dimensional data, underscoring how similar approaches are permeating everything from gene regulation to signaling networks. TIGER can be seen as one specialized instance of this trend, focused tightly on the sequence determinants of RNA targeting.

RNA Structure Prediction Opens a Second Front

While TIGER focuses on predicting guide RNA performance, a different Columbia lab is working on the structural side of the RNA drug discovery problem. AI tools can now predict three-dimensional protein structures from amino acid sequences with high accuracy. RNA has resisted similar treatment because it does not fold into a single stable shape. Instead, RNA molecules exist as dynamic ensembles, constantly shifting between multiple conformations.

The Al-Hashimi lab at Columbia’s medical center has experimentally determined the ensemble of TAR, an HIV RNA element, as a step toward building a predictive model for RNA activity. The lab is developing a dynamic-ensemble based virtual screening platform for RNA and DNA-targeted drug discovery, which accounts for the full range of shapes an RNA molecule adopts rather than treating it as a frozen snapshot. In practical terms, that means docking potential small-molecule drugs against many conformations at once and ranking compounds by how well they stabilize or disrupt functionally important states.
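The idea of scoring a compound against many conformations at once can be illustrated with a small sketch. It assumes each conformer in the ensemble has a known population (its share of the ensemble) and that docking scores follow the common convention where more negative means tighter predicted binding. The population-weighted average used here is one simple way to combine scores and is not a description of the lab's platform. All names and numbers are hypothetical.

```python
def ensemble_score(scores, populations):
    """Population-weighted mean docking score across conformers."""
    if len(scores) != len(populations):
        raise ValueError("need one score per conformer")
    total = sum(populations)
    if total <= 0:
        raise ValueError("populations must sum to a positive value")
    return sum(s * p for s, p in zip(scores, populations)) / total


def rank_compounds(docking_scores, populations):
    """Return compound names ordered from best to worst ensemble score.

    Assumption: lower (more negative) scores indicate better binding.
    """
    return sorted(
        docking_scores,
        key=lambda name: ensemble_score(docking_scores[name], populations),
    )


# Three hypothetical conformers and their shares of the ensemble.
populations = [0.6, 0.3, 0.1]
# Made-up docking scores for two hypothetical compounds, one per conformer.
docking_scores = {
    "compound_a": [-8.0, -6.0, -5.0],
    "compound_b": [-7.0, -7.0, -9.0],
}
ranking = rank_compounds(docking_scores, populations)
```

In this example compound_a binds the most populated conformer best, but compound_b scores better once the whole ensemble is weighed, which is the kind of difference a single frozen snapshot would miss.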

This structural approach and TIGER’s sequence-based predictions attack the same bottleneck from opposite directions. TIGER tells you which guide RNA to use; ensemble modeling tells you what the target RNA actually looks like when the guide arrives. No published study has yet combined the two systematically, but the conceptual synergy is clear. A future pipeline could first use structural ensembles to identify vulnerable regions in a disease-relevant RNA, then feed those regions into TIGER to design guides that selectively modulate expression with minimal off-target effects.

Therapeutic and Research Implications

In the near term, TIGER is likely to influence how labs design Cas13d experiments. Instead of testing dozens or hundreds of guides empirically, researchers can use the model to prioritize a small subset predicted to deliver strong, clean knockdown. That efficiency gain matters for basic research, where budgets and time are limited, and even more for preclinical programs that must meet stringent safety standards.
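As a rough illustration of that prioritization step, the sketch below filters a list of model predictions by an off-target cutoff and then keeps the guides whose predicted knockdown is closest to a desired level. Setting the desired level to 1.0 favors the strongest guides, while a value such as 0.4 selects for the partial knockdown discussed earlier. The data structure, the 0-to-1 scales, the threshold, and the example values are assumptions for illustration and do not reflect TIGER's actual output format.

```python
from typing import NamedTuple


class GuidePrediction(NamedTuple):
    guide_id: str
    knockdown: float   # predicted fraction of transcript removed, 0 to 1
    off_target: float  # predicted off-target risk, 0 to 1 (hypothetical scale)


def prioritize(predictions, max_off_target=0.2, top_k=3, target_knockdown=1.0):
    """Keep low-risk guides, ordered by closeness to the desired knockdown."""
    clean = [p for p in predictions if p.off_target <= max_off_target]
    clean.sort(key=lambda p: abs(p.knockdown - target_knockdown))
    return clean[:top_k]


# Made-up predictions for four hypothetical guides.
predictions = [
    GuidePrediction("g1", knockdown=0.90, off_target=0.05),
    GuidePrediction("g2", knockdown=0.95, off_target=0.50),  # too risky
    GuidePrediction("g3", knockdown=0.40, off_target=0.10),
    GuidePrediction("g4", knockdown=0.70, off_target=0.15),
]

strongest = prioritize(predictions, top_k=2)
partial = prioritize(predictions, top_k=1, target_knockdown=0.4)
```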

Therapeutically, partial knockdown opens possibilities for diseases driven by dosage imbalances rather than all-or-nothing gene defects. For example, conditions caused by excess gene dosage, such as gene duplications or overactive dosage-sensitive pathways, might benefit from fine-tuned reductions in specific transcripts. Because Cas13d-based interventions do not alter DNA, they may also be better suited to conditions where reversible modulation is preferable to permanent edits, such as in certain neurodegenerative or developmental disorders.

On the structural side, dynamic-ensemble modeling could expand the universe of RNA targets considered druggable. Many RNA elements involved in splicing, viral replication, or translation regulation have historically been viewed as too flexible or transient to hit with small molecules. By capturing their full conformational landscape, ensemble-based screening may reveal pockets or transient motifs that only exist in a subset of states but are nonetheless exploitable for therapy.

Taken together, these advances suggest a future in which RNA-targeted interventions are designed with the same level of rational precision that small-molecule chemists now expect for protein targets. Deep-learning models like TIGER supply the sequence rules for manipulating RNA levels, while structural AI provides a three-dimensional map of where and how to intervene. As these approaches mature and begin to intersect, they could turn RNA from a challenging moving target into a programmable substrate for a new generation of medicines.

*This article was researched with the help of AI, with human editors creating the final content.
