The Applications of the General Expression Transformer

In the first part of this blog series, Epigenome Technologies offered a brief overview of the development of GET (the general expression transformer), described in a Nature study from researchers led by Xi Fu, Raul Rabadan (Columbia University), and Eric P. Xing (MBZUAI/Carnegie Mellon University) (Fu, Mo, and Buendia et al.).
We previously noted that understanding relationships between the chromatin landscape and transcription remains challenging: most approaches employ separate epigenetic and transcriptomic assays performed in distinct cell samples, and the diverse datasets must then be reintegrated, which may reduce robustness. Parallel analysis of individual cells for RNA expression and DNA from targeted tagmentation by sequencing, or "Paired-Tag," from Epigenome Technologies generates joint epigenetic and gene expression profiles at single-cell resolution and detects histone modifications and RNA transcripts in individual nuclei with an efficiency similar to single-nucleus RNA-seq/ChIP-seq assays. Paired-Tag technology enables researchers to advance their understanding of transcriptional regulation and improve disease management.
In this second part of the series, Epigenome Technologies summarizes the applications of GET. By predicting regulatory activity and recognizing regulatory elements and physical interactions between transcription factors (TFs), GET supported the identification of unreported distal regulatory regions in erythroblasts, the discovery of a TF-TF interaction in B cells that explains the significance of a leukemia-associated germline risk mutation, and the construction of a TF/coactivator interaction catalog.

Investigating Fetal Hemoglobin-Regulating Loci
- The team employed genome-base-editing data from fetal erythroblasts to investigate fetal hemoglobin-regulating loci (BCL11A, NFIX, KLF1, and HBG2) (Cheng et al.) to demonstrate the identification of cis-regulatory elements by GET
- They confirmed the role of GATA TFs (which regulate fetal hemoglobin by orchestrating BCL11A expression via an erythroid-specific enhancer) and highlighted a previously undescribed role for SOX TFs in regulating fetal hemoglobin (by binding to the same enhancer)
- GET outperformed established models when examining fetal hemoglobin-regulating loci, especially when detecting long-range enhancer-promoter interactions
- GET identified the most important TF motifs across cis-regulatory elements for HBG2, BCL11A, and NFIX
- Results agreed with known transcriptional regulators/hematopoietic TFs (NFY and SOX motifs for HBG2 and KLF1 for BCL11A) and revealed an unknown association between TAL1 (a GATA1 binding partner/hematopoietic factor) and NFIX
- GET identified downstream targets for specific regulators, such as GATA
- Genes influenced by the GATA motif were associated with hemopoiesis (in agreement with GATA1's role in erythroid development), and known erythroid-lineage TFs were predicted to be regulated by the GATA motif
- GET identified TF-TF interactions based on the high correlation between motifs, which aided the identification of known and unknown motif-motif interactions better than comparable approaches

Building a Structural Catalog of the Human Transcription Factor Interactome
- The team employed AlphaFold2 and a network of causal motif interactions predicted by GET to generate a structural catalog of the human TF interactome
- They categorized interactions as direct (homodimers and intra/inter-family heterodimers) and cofactor-mediated (cooperative/competitive binding) and acquired structure predictions for over 1,700 known human TFs, discovering that AlphaFold captured dimer structures and detected intrafamily heterodimers
- They identified structural interactions via AlphaFold2 based on predicted TF-TF interactions to explore the folding of disordered regions after partner binding
- Electrostatic interactions drove the folding of the ZFX intrinsically disordered region into a multimeric structure when paired with TFAP2A structured domains
- Known motifs of ZFX and TFAP2A share a core, and coimmunoprecipitation experiments revealed an interaction between these TFs
- Their analysis predicted an interaction between SNAI1 and RELA1 via electrostatic interactions of specific domains with the EP300 cofactor
- This success drove the authors to include a broader range of TF interactions predicted by GET
- A focus on the top 5% of predicted interactions in each cell type provided 1,718 TF pairs and the comprehensive structural cataloging of TF interactions
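The "top 5% per cell type" selection step can be illustrated with a small sketch. Everything below is an assumption made for illustration: the TF names, the cell-type labels, the random symmetric score matrices, and the helper `top_fraction_pairs` are hypothetical and do not come from the GET codebase, and the sketch does not reproduce how the authors scored interactions or arrived at 1,718 pairs.

```python
import numpy as np

def top_fraction_pairs(scores, tf_names, fraction=0.05):
    """Return the TF pairs whose score falls in the top `fraction` of all pairs.

    `scores` is assumed to be a symmetric (TF x TF) matrix of predicted
    interaction strengths; only the upper triangle is used so that each
    unordered pair is counted once.
    """
    iu, ju = np.triu_indices(len(tf_names), k=1)
    values = scores[iu, ju]
    cutoff = np.quantile(values, 1.0 - fraction)
    keep = values >= cutoff
    return {(tf_names[i], tf_names[j]) for i, j in zip(iu[keep], ju[keep])}

# Hypothetical toy inputs: 20 placeholder TFs and 3 placeholder cell types.
rng = np.random.default_rng(1)
tf_names = [f"TF{k}" for k in range(20)]
cell_types = ["cell_type_A", "cell_type_B", "cell_type_C"]

per_cell_type = {}
for cell_type in cell_types:
    raw = rng.random((20, 20))
    scores = (raw + raw.T) / 2  # symmetrize the random scores
    per_cell_type[cell_type] = top_fraction_pairs(scores, tf_names)

# The catalog is the union of the per-cell-type selections.
catalog = set().union(*per_cell_type.values())
print(len(catalog), "unique TF pairs selected across cell types")
```

Taking the cutoff within each cell type, rather than once over all cell types, keeps strongly scored pairs from a single context from crowding out pairs that are only prominent elsewhere.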

TF-TF Co-binding Prediction: A Closer Look
- GET helps to infer TF-TF interactions by analyzing variable importance, with the element-wise product ∇f ⊙ X serving as the motif embedding
- ∇f denotes the gradient of the gene expression prediction with respect to each input feature, and X represents the corresponding TF motif binding scores and chromatin accessibility levels
- The product reflects the presence of a TF binding signal and its relative impact on gene expression prediction
- Each profile encapsulates the influence of a particular TF's binding event on model output
- GET effectively deciphers which TFs drive expression in each cellular context
- High matrix values suggest that subtle differences in TF binding have outsized effects on transcriptional output
- A high correlation between the patterns of different TFs emerges when aggregating importance scores across many genes, hinting at potential co-binding interactions
- Applying independent component analysis to the ∇f ⊙ X matrix isolates latent components representing groups of TFs with shared binding patterns, capturing the essence of TF-TF co-binding modules and reflecting known (e.g., cooperative binding of GATA factors with TAL1 in hematopoiesis) and unreported interactions
- Combinatorial binding patterns drive the coordinated regulation of gene expression, suggesting the physical co-occupancy of TFs as a critical determinant of transcriptional outcomes
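The gradient-times-input idea described above can be sketched on a toy model. This is a minimal illustration under stated assumptions, not GET itself: the tanh model `predict`, its fixed weights `w`, the array shapes, and the random binding scores are all hypothetical stand-ins, and motifs 0 and 1 are deliberately made to co-occur so that the correlation step has something to find.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_regions, n_motifs = 200, 10, 6

# Hypothetical input X[g, r, m]: binding score of motif m in region r near gene g.
X = rng.random((n_genes, n_regions, n_motifs))
# Toy assumption: motif 1 tends to occur wherever motif 0 does.
X[:, :, 1] = 0.9 * X[:, :, 0] + 0.1 * X[:, :, 1]

# Toy stand-in for the expression model: f(X_g) = sum_r tanh(X_g[r] . w)
w = np.array([0.4, 0.3, -0.2, 0.1, -0.3, 0.2])

def predict(X):
    """Predicted expression per gene, shape (n_genes,)."""
    return np.tanh(X @ w).sum(axis=-1)

def gradient(X):
    """Analytic gradient of predict with respect to X, same shape as X."""
    z = X @ w
    return (1.0 - np.tanh(z) ** 2)[..., None] * w

# Element-wise product (gradient times input): importance of every motif score.
importance = gradient(X) * X

# Aggregate over regions to get one importance profile per motif across genes.
gene_by_motif = importance.sum(axis=1)  # shape (n_genes, n_motifs)

# Motifs whose profiles correlate across genes are co-binding candidates.
corr = np.corrcoef(gene_by_motif, rowvar=False)
off_diag = np.abs(corr - np.eye(n_motifs))
i, j = np.unravel_index(np.argmax(off_diag), off_diag.shape)
print(f"most correlated motif pair: ({i}, {j}), r = {corr[i, j]:.3f}")
```

On this toy data the planted pair (motifs 0 and 1) comes out as the most correlated. A decomposition such as scikit-learn's `FastICA` could then be run on `gene_by_motif` to look for groups of motifs rather than pairs; that step is left out here to keep the sketch dependent on NumPy only.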

A Focus on PAX5 and the Functional Role of Mutations
- To demonstrate the utility of the catalog, the team focused on PAX5, which drives the development of B cell precursor acute lymphoblastic leukemia (B-ALL) (Shah et al.) and is linked to the disease through somatic genetic alterations affecting PAX5 (Escudero et al.)
- The functional role of the G183S mutation affecting the PAX5 intrinsically disordered region (a recurrent familial germline mutation conferring elevated B-ALL risk; Auer et al.) remained elusive
- Exploring PAX5 interaction pairs in fetal B lymphocytes identified interactions with various motifs
- Analysis revealed and confirmed a novel interaction between NR/3 TFs and PAX5 involving the G183 residue and identified a PAX5 and NR2C2 interaction as affected by the G183S mutation
- An examination of the top 10,000 promoters predicted to be most influenced by PAX/2 and NR/3 motifs sought to determine whether these motifs coregulated genes
- The analysis identified 2,570 commonly regulated genes, including oncogenes implicated in B-ALL and genes involved in lymphocyte activation and affected by PAX5 perturbation during B cell differentiation
- Analysis of sporadic childhood B-ALL samples identified a G183S mutation-associated transcriptional program
- Genes in this program displayed enrichment in NR/3- and PAX5 + NR/3-regulated genes and in genes activated in pro-B cells, suggesting the relevance of the PAX5-NR interaction
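The co-regulation check described in this section (take the promoters most influenced by each motif, then intersect the two sets) reduces to a top-N selection followed by a set intersection. The sketch below assumes hypothetical per-gene influence scores and placeholder gene names; the helper `top_n_genes` is illustrative and is not taken from the study's code, which used the top 10,000 promoters per motif.

```python
import numpy as np

def top_n_genes(scores, gene_ids, n):
    """Return the set of the n genes with the highest influence scores."""
    order = np.argsort(scores)[::-1][:n]
    return {gene_ids[i] for i in order}

# Hypothetical influence scores for two motifs over five placeholder genes.
gene_ids = ["G0", "G1", "G2", "G3", "G4"]
scores_motif_a = np.array([0.90, 0.10, 0.80, 0.30, 0.70])
scores_motif_b = np.array([0.20, 0.95, 0.85, 0.10, 0.60])

top_a = top_n_genes(scores_motif_a, gene_ids, n=3)
top_b = top_n_genes(scores_motif_b, gene_ids, n=3)

# Genes ranked highly for both motifs are candidates for co-regulation.
shared = top_a & top_b
print(sorted(shared))  # ['G2', 'G4']
```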

The Applications for the General Expression Transformer - The Highlights
Overall, the authors report exciting examples of the potential outcomes of applying GET, but what's next? The authors note that future enhancements to GET may include the integration of additional biological information (e.g., 3D chromatin architecture and single-cell data) and the incorporation of more cell states and a broader range of assays (e.g., TF binding and histone modification profiles) to further improve model outcomes. They also note the potential for GET in predicting the functional impact of non-coding genetic variants, which may provide further insight into disease susceptibility and perhaps support the development of novel therapeutics. Moving forward, single-cell datasets provided by Paired-Tag (an analytical platform that creates joint epigenetic and gene expression profiles at single-cell resolution and detects histone modifications and RNA transcripts in individual nuclei) may support the development of related studies and exciting new tools. The Bing Ren lab developed Paired-Tag, and Epigenome Technologies offers optimized Paired-Tag kits and services to researchers in the epigenetics field under an exclusive license from the Ludwig Institute for Cancer Research.
See Nature (January 2025) for more on the application of GET, and stay tuned to Twitter, Bluesky, and LinkedIn to keep up to date with all the new epigenetics studies; furthermore, check out our Products and Services pages to see how Epigenome Technologies can elevate your research today.