2025/12/23

Epigenetic and Post-Transcriptional Cellular Mechanisms Lie Beyond the Reach of Sequence-Based Taxonomy

Methodological Blind Spots in Phylogenetics: RNA Editing and the Limits of Sequence-Based Inference

RNA editing, alternative splicing, and epigenetic regulation are largely invisible to phylogenetic methods because they cannot be reliably inferred from DNA sequence data and are often species-, tissue-, and context-specific.

Modern phylogenetic reconstruction is built almost entirely on the comparison of DNA or inferred protein sequences. Whether using single genes, concatenated markers, or large phylogenomic datasets, the underlying assumption remains largely the same: genomic sequence similarity reflects evolutionary relatedness, and changes in DNA sequence are the primary source of biological novelty. However, this framework systematically overlooks a major layer of biological information—RNA-level regulation—which can profoundly alter protein output without any change to the underlying DNA.

One of the most significant challenges arises from alternative splicing and RNA editing. In eukaryotes, a single genomic locus can generate dozens, sometimes hundreds, of functionally distinct protein isoforms depending on cellular context, developmental stage, or environmental conditions. Phylogenetic analyses typically reduce this complexity by selecting a single “canonical” transcript or the longest predicted open reading frame. This reduction is not biologically neutral; it represents a strong abstraction that removes much of the functional reality of gene expression.
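To make the scale of this reduction concrete, here is a minimal sketch of the combinatorics. It assumes a hypothetical gene in which every cassette exon is included or skipped independently, which is a simplification: real splicing is regulated and not every combination is produced. The exon names and the function `enumerate_isoforms` are illustrative only.

```python
from itertools import product


def enumerate_isoforms(first_exon, cassette_exons, last_exon):
    """Return every exon combination when each cassette exon is
    independently included or skipped (a simplifying assumption)."""
    isoforms = []
    for choices in product([False, True], repeat=len(cassette_exons)):
        included = [exon for exon, keep in zip(cassette_exons, choices) if keep]
        isoforms.append([first_exon, *included, last_exon])
    return isoforms


# Hypothetical gene: two constitutive exons flanking five cassette exons.
isoforms = enumerate_isoforms("E1", ["E2", "E3", "E4", "E5", "E6"], "E7")
print(len(isoforms))  # 2**5 combinations

# A "longest transcript" rule keeps exactly one of them.
longest = max(isoforms, key=len)
print(longest)
```

Under these assumptions, five cassette exons already allow 32 exon combinations, and a longest-transcript rule keeps one of them for the alignment.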

RNA editing compounds this problem further. Enzymatic systems such as ADAR (adenosine-to-inosine editing) and APOBEC (cytidine-to-uridine editing) can alter codons post-transcriptionally, leading to amino acid substitutions that are invisible at the DNA level. Because the ribosome reads inosine as guanosine, an A-to-I edit acts like an A-to-G substitution in the message even though the genome is unchanged. In sequence-based phylogenetics, these edited sites are ignored, misinterpreted as sequencing noise, or incorrectly assumed to reflect genomic mutations.

Protein-based phylogenies do not resolve this issue. Most protein sequences used in phylogenetic datasets are computational predictions derived directly from genomic DNA, not empirically observed proteoforms. As a result, phylogenetic trees often compare hypothetical protein sequences that may not correspond to the dominant or functionally relevant proteins produced by the organism in vivo. Differences in RNA-editing frequency, tissue specificity, or regulatory control between species are therefore excluded from the analysis by design.
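The point can be illustrated with a small sketch. It assumes two hypothetical species that share an identical coding sequence, with A-to-I editing at one site in species B only. The sequence, the edited position, and the helper names are invented for illustration. Inosine is modeled as G because that is how it is read during translation, and the transcript is written in the DNA alphabet for simplicity.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(coding_seq):
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(
        CODON_TABLE[coding_seq[i:i + 3]] for i in range(0, len(coding_seq), 3)
    )


def apply_a_to_i(coding_seq, positions):
    """Model A-to-I editing: each edited A is read as G by the ribosome."""
    seq = list(coding_seq)
    for pos in positions:
        if seq[pos] != "A":
            raise ValueError(f"position {pos} is not an adenosine")
        seq[pos] = "G"
    return "".join(seq)


def hamming(a, b):
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical shared coding sequence: ATG CAG GGC TTT.
genome_a = genome_b = "ATGCAGGGCTTT"

# What a genome-derived protein alignment compares.
predicted_a, predicted_b = translate(genome_a), translate(genome_b)

# What the cells would produce if species B edits the A at index 4 (CAG -> CGG).
observed_a = translate(genome_a)
observed_b = translate(apply_a_to_i(genome_b, [4]))

print(hamming(predicted_a, predicted_b))  # distance seen by the phylogeny
print(hamming(observed_a, observed_b))    # distance between expressed proteins
```

In this toy case the predicted proteins are identical, so a tree built from them records no divergence, while the expressed proteins differ by a glutamine-to-arginine substitution.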

This limitation has important conceptual consequences. Phylogenetic trees can show strong statistical support while remaining largely blind to regulatory, post-transcriptional, and epigenetic information. In practice, phylogenetics measures sequence history, not functional biological information. Consequently, evolutionary narratives derived from such trees may imply innovation or continuity where the underlying biological mechanisms instead reflect regulatory reconfiguration, restriction, or loss of existing potential.

Importantly, this omission is not due to ignorance but to methodological necessity. RNA-editing patterns are often species-specific, tissue-specific, and environmentally responsive, making them extremely difficult—if not impossible—to reconstruct from genomic sequence alone. Rather than being incorporated into phylogenetic models, they are excluded altogether. This exclusion, however, places a hard ceiling on what phylogenetics can legitimately claim about the origin and direction of biological information.


Empirical Examples of Functional Change via RNA Editing (No DNA Change)

  1. Human APOB gene
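    C→U editing of the APOB mRNA converts a CAA (glutamine) codon into a UAA stop codon, producing the truncated ApoB-48 instead of full-length ApoB-100. ApoB-48 is made in the small intestine and assembles chylomicrons, whereas ApoB-100 is made in the liver and forms VLDL and LDL particles, so the two proteins have distinct roles in lipid transport despite originating from the same DNA sequence.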

  2. Glutamate receptor (GluR) subunits in vertebrate brains
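    ADAR-mediated A→I editing alters ion channel permeability and calcium conductance, directly affecting neuronal excitability and survival. The best-characterized case is the Q/R site of the GluA2 subunit, where editing replaces the genomically encoded glutamine with arginine and renders the channel largely impermeable to calcium. The genomic sequence alone does not predict these functional properties.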

  3. Cephalopods (e.g., octopus and squid)
    Extensive RNA editing recodes thousands of sites in neural genes, generating protein diversity crucial for neural plasticity and temperature adaptation—without corresponding DNA mutations.

  4. Serotonin receptor 5-HT2C (humans)
    Multiple RNA-edited isoforms differ in G-protein coupling efficiency, significantly altering signal transduction and behavior-relevant neural pathways.
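The APOB case in example 1 can be sketched in a few lines. The transcript below is a toy sequence, not the real APOB mRNA, and the codon table holds only the codons this example needs. The function names are illustrative assumptions.

```python
# Only the codons used by this toy transcript (standard genetic code, RNA alphabet).
CODONS = {"AUG": "M", "GGC": "G", "CAA": "Q", "GAU": "D", "UAA": "*", "UGA": "*"}


def translate_until_stop(mrna):
    """Translate codon by codon and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODONS[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def c_to_u_edit(mrna, position):
    """Model C-to-U editing at a single position of the transcript."""
    if mrna[position] != "C":
        raise ValueError("expected a cytidine at the edited position")
    return mrna[:position] + "U" + mrna[position + 1:]


# Toy transcript: AUG GGC CAA GAU UGA (hypothetical, for illustration only).
unedited = "AUGGGCCAAGAUUGA"
edited = c_to_u_edit(unedited, 6)  # CAA (Gln) becomes UAA (stop)

full_length = translate_until_stop(unedited)
truncated = translate_until_stop(edited)
print(full_length, truncated)
```

One edited base in the transcript shortens the protein, while the DNA that a phylogenetic alignment would use is identical in both cases.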


Conclusion

RNA editing and alternative splicing demonstrate that biological information is not confined to DNA sequence. Because phylogenetic methods largely ignore these mechanisms, they provide an incomplete and sometimes misleading picture of functional similarity, divergence, and informational change. Phylogenetic trees may accurately reflect sequence relationships, but they should not be mistaken for comprehensive maps of biological information or organismal complexity.