Model Evolution Helps to Decipher the Process of Transcriptional Regulation in Human Cells

March 10, 2025 By Stuart P. Atkinson

Multi-panel figure describing the data and training used for the model, and headline results
Motif masking and fill-in strategies coupled with an F/p model integrate chromatin accessibility and expression to infer TF–TF interactions and build a structural interaction catalog. Scatterplots show predicted versus observed gene expression, cell-type prediction accuracy, and correlations with TSS accessibility and gene activity (R² and r values). From Fu, Mo, and Buendia et al.


While transcriptional regulation underpins the diversity of biological and pathological processes, our understanding of this process remains surprisingly incomplete. Cell-specific transcriptional profiles arise from myriad protein-protein and protein-DNA interactions taking place against a background of differing epigenetic conditions. The reported clustering of transcription factor (TF) binding motifs (Vierstra et al.) highlights the homology of DNA-binding domains and the low combinatorial variability of regulatory interactions; however, our understanding of transcriptional regulation remains limited to specific cell types. Furthermore, we still do not know whether combinatorial interactions of TFs determine cell-specific gene expression profiles. Overall, we appear to understand little regarding the critical process of transcriptional regulation.

Developing fine-tuned prediction methods based on sequence data and trained on specific human cell types (Zhou et al., Kelly 2020, and Zhang et al.) represented a first step towards an improved understanding of transcriptional regulation; recently, the implementation of foundation models (machine/deep learning models trained on vast datasets for application to a range of use cases) has fostered generalizability and improved utility (OpenAI GPT-4 technical report and Lin et al.). More recent single-cell developments (Theodoris et al., Cui et al., and Hao et al.) have reported the encoding of transcriptomic profiles within single models to enable downstream tasks. Will the report of a foundation model describing how transcription emerges from the chromatin landscape represent the next evolutionary step?

Understanding relationships between chromatin and transcriptional output remains tricky, given that approaches generally require separate epigenetic and transcriptomic assays in distinct cell samples followed by reintegration of these diverse datasets. Parallel analysis of individual cells for RNA expression and DNA from targeted tagmentation by sequencing, or "Paired-Tag," from Epigenome Technologies generates joint epigenetic and gene expression profiles at single-cell resolution and detects histone modifications and RNA transcripts in individual nuclei with efficiencies similar to single-nucleus RNA-seq/ChIP-seq assays. Paired-Tag technology can enable researchers to advance our understanding of transcriptional regulation and improve disease management.

Researchers from the laboratories of Xi Fu, Raul Rabadan (Columbia University), and Eric P. Xing (MBZUAI/Carnegie Mellon University) knew that current computational models of transcriptional regulation lacked the generalizability required to support extrapolation to "unseen" cell types and conditions, which would aid the appreciation of gene regulation at a deeper level and perhaps extend to identifying new targets in a variety of diseases/disorders. In their recent Nature study, the authors introduce GET, the general expression transformer (Fu, Mo, and Buendia et al.). In part one of this series, Epigenome Technologies offers a brief overview of the development of GET before describing the practical applications of this exciting technological advance.

Cartoon of masking and training protocol
Schematic of GET’s input matrix: each gene is represented by fixed TF-motif binding scores (aggregated into 282 motif clusters, grey) across 200 gene-proximal ATAC peaks, combined with cell-type-specific pseudobulk ATAC signal (orange), to predict gene expression.
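To make the input layout concrete, here is a minimal Python sketch of how such a per-gene matrix could be assembled. It assumes that the accessibility signal is appended as one extra column after the 282 motif-cluster scores, which would account for the 283 features quoted for the model input. The function name, the column order, and the toy values are illustrative assumptions, not the authors' code.

```python
from typing import List, Sequence

N_PEAKS = 200           # gene-proximal ATAC peaks per gene (from the caption)
N_MOTIF_CLUSTERS = 282  # TF-motif clusters (from the caption)


def build_input_matrix(
    motif_scores: Sequence[Sequence[float]],
    accessibility: Sequence[float],
) -> List[List[float]]:
    """Return one row per peak: 282 motif scores plus 1 accessibility value.

    ASSUMPTION: accessibility is stored as the last column. The source only
    states the two ingredients and the overall 200 x 283 input size.
    """
    if len(motif_scores) != N_PEAKS or len(accessibility) != N_PEAKS:
        raise ValueError(f"expected {N_PEAKS} peaks")
    matrix = []
    for scores, signal in zip(motif_scores, accessibility):
        if len(scores) != N_MOTIF_CLUSTERS:
            raise ValueError(f"expected {N_MOTIF_CLUSTERS} motif scores per peak")
        # Motif scores are fixed by sequence; the signal is cell-type specific.
        matrix.append(list(scores) + [signal])
    return matrix


if __name__ == "__main__":
    # Toy data only: all-zero motif scores and a made-up accessibility ramp.
    toy_motifs = [[0.0] * N_MOTIF_CLUSTERS for _ in range(N_PEAKS)]
    toy_signal = [i / N_PEAKS for i in range(N_PEAKS)]
    x = build_input_matrix(toy_motifs, toy_signal)
    print(len(x), "peaks x", len(x[0]), "features")
```

In this layout, only the last column changes between cell types, while the motif columns stay the same for a given set of peak sequences.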

Developing and Benchmarking the General Expression Transformer

Explication of the GET transformer architecture
Transformer architecture for GET: a 200×283 input is linearly expanded, processed through 12 self‐attention layers (12 heads, dₖ=dᵥ=64) with GeLU‐activated feed‐forward blocks, then linearly collapsed with softplus to produce a 200×1 output.
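The dimensions in this caption can be traced with a short Python sketch. It is a shape walk-through rather than a working model: the hidden width of 768 is an assumption derived from the common convention that model width equals heads times per-head dimension (12 × 64), and the function names are hypothetical. Softplus, log(1 + eˣ), is shown because it keeps the final output non-negative.

```python
import math
from typing import List, Tuple

N_PEAKS, N_FEATURES = 200, 283   # input size from the caption
N_LAYERS, N_HEADS = 12, 12       # from the caption
D_K = D_V = 64                   # per-head key/value size from the caption
D_MODEL = N_HEADS * D_K          # ASSUMPTION: 768, not stated in the source


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); always >= 0."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def shape_trace() -> List[Tuple[str, Tuple[int, int]]]:
    """List the (rows, columns) of the activations after each stage."""
    trace = [("input", (N_PEAKS, N_FEATURES))]
    trace.append(("linear expand", (N_PEAKS, D_MODEL)))
    for layer in range(1, N_LAYERS + 1):
        # Self-attention and the GeLU feed-forward block preserve the shape.
        trace.append((f"transformer layer {layer}", (N_PEAKS, D_MODEL)))
    trace.append(("linear collapse + softplus", (N_PEAKS, 1)))
    return trace


if __name__ == "__main__":
    for stage, shape in shape_trace():
        print(f"{stage:28s} {shape}")
    # Each head attends over peaks, so its score matrix is peaks x peaks.
    print("attention scores per head:", (N_PEAKS, N_PEAKS))
    print("softplus(-2.0) =", softplus(-2.0))
```

Because attention operates across the 200 peaks, every peak's representation can draw on every other peak in the same gene-proximal window before the final per-peak value is produced.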

Can Chromatin Accessibility Alone Suffice When Predicting Gene Expression?

While GET leverages chromatin accessibility as a primary proxy for active regulatory regions, this approach prompts a pivotal question: can chromatin accessibility data explain gene expression levels, or do we require additional epigenetic information, such as histone modification profiles? Said modifications function directly by recruiting transcriptional activators or indirectly through chromatin remodeling complexes, suggesting that accessibility lays the groundwork for gene activation, but the full orchestration of transcription depends on a complex interplay of structural/chemical signals.

Multi-panel figure showing results of expression prediction versus a lentiviral assay
Scatterplots comparing predicted versus observed lentiMPRA log₂(RNA/DNA) for promoters, enhancers, repressive/quiet regions, and all sequences: GET-MPRA (top row) versus Enformer-MPRA (bottom row), with Pearson’s r, slope, and n indicated. From Fu, Mo, and Buendia et al.
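For readers who want to reproduce this kind of comparison on their own data, the following Python sketch computes Pearson's r and a least-squares slope for predicted versus observed values. The function name and the toy numbers are illustrative, and regressing observed on predicted is an assumption, since the caption does not say which axis the slope refers to.

```python
import math
from typing import Sequence, Tuple


def pearson_and_slope(
    predicted: Sequence[float], observed: Sequence[float]
) -> Tuple[float, float]:
    """Return (Pearson's r, least-squares slope of observed on predicted)."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(predicted) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    syy = sum((y - mean_y) ** 2 for y in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, observed))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("r is undefined when either sequence is constant")
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx  # ASSUMPTION: observed regressed on predicted
    return r, slope


if __name__ == "__main__":
    # Made-up log2(RNA/DNA) values for illustration, not data from the study.
    toy_predicted = [-1.0, 0.0, 0.5, 1.5, 2.0]
    toy_observed = [-0.8, 0.1, 0.4, 1.2, 2.3]
    r, slope = pearson_and_slope(toy_predicted, toy_observed)
    print(f"r = {r:.3f}, slope = {slope:.3f}")
```

A slope near 1 alongside a high r would indicate that predictions track both the ranking and the scale of the measured activity, whereas a high r with a shallow slope would point to a compressed dynamic range.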

General Expression Transformer: The Highlights

Overall, the authors report how GET provides experimental-level accuracy in gene expression prediction in seen and unseen cell types employing chromatin accessibility data and sequence information, displays adaptability across sequencing platforms and assays, offers zero-shot prediction of reporter assay readouts, and outperforms previous state-of-the-art models when identifying cis-regulatory elements.

Future studies may be supported by robust single-cell datasets provided by Paired-Tag, an analytical platform that creates joint epigenetic and gene expression profiles at single-cell resolution and detects histone modifications and RNA transcripts in individual nuclei. The Bing Ren lab developed Paired-Tag, and Epigenome Technologies offers optimized Paired-Tag kits and services to epigenetics researchers under an exclusive license from the Ludwig Institute for Cancer Research.

See Nature, January 2025 for more on the development of GET as an interpretable foundation model for transcriptional regulation, and stay tuned to Twitter, Bluesky, and LinkedIn to keep up to date with all the new epigenetics studies and the Epigenome Technologies website for part 2. In the meantime, check out our Products and Services pages to see how Epigenome Technologies can elevate your research today.