2019/02/01

SNPs based on cDNA sequence could be artifacts of RNA editing

SNP databases are not reliable sources for mapping genetic mutations

https://www.researchgate.net/publication/7660913_Identification_of_RNA_editing_sites_in_the_SNP_database

Excerpt: "The relationship between human inherited genomic variations and phenotypic differences has been the focus of much research effort in recent years. These studies benefit from millions of single-nucleotide polymorphism (SNP) records available in public databases, such as dbSNP. The importance of identifying false dbSNP records increases with the growing role played by SNPs in linkage analysis for disease traits. In particular, the emerging understanding of the abundance of DNA and RNA editing calls for a careful distinction between inherited SNPs and somatic DNA and RNA modifications.

In order to demonstrate that some of the SNP database records are actually somatic modification, we focus on one type of these modifications, namely A-to-I RNA editing, and present evidence for hundreds of dbSNP records that are actually editing sites. We provide a list of 102 RNA editing sites previously annotated in dbSNP database as SNPs, and experimentally validate seven of these. Interestingly, we show how dbSNP can serve as a starting point to look for new editing sites. Our results, for this particular type of RNA editing, demonstrate the need for a careful analysis of SNP databases in light of the increasing recognition of the significance of somatic sequence modifications."

My comment: cDNA samples are sequenced from mRNAs that have often gone through several epigenetically driven, sequence-level modification events. The most common RNA-level modification mechanisms are alternative splicing and A-to-I RNA editing. According to some estimates, A-to-I RNA editing occurs at over a hundred million genomic sites, located in a majority of human genes.

What does this mean? Most of the SNP databases are based on edited pre-mRNAs and mRNAs and they don't tell us anything about true chromosomal, genomic (gDNA) changes. gDNA sequencing is expensive and it takes a lot of time.
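To make the point concrete, here is a minimal sketch of how an RNA edit can look like a SNP when only cDNA is sequenced. It assumes two already aligned, equal-length sequences on the same strand. The function names and the toy sequences are invented for illustration and do not come from dbSNP or from the cited paper. Inosine is read as G during reverse transcription, so an A-to-I edit shows up as an A in gDNA and a G in cDNA.

```python
def classify_site(gdna_base, cdna_base):
    """Classify one aligned position (hypothetical helper, same strand assumed)."""
    if gdna_base == cdna_base:
        return "match"
    if gdna_base == "A" and cdna_base == "G":
        # Inosine is copied as G in cDNA, so A>G is the A-to-I signature.
        return "candidate A-to-I editing"
    if gdna_base == "C" and cdna_base == "T":
        # An edited U appears as T in cDNA.
        return "candidate C-to-U editing"
    return "other mismatch"


def find_candidate_editing_sites(gdna, cdna):
    """Return (position, gDNA base, cDNA base, label) for every mismatch."""
    if len(gdna) != len(cdna):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i, g, c, classify_site(g, c))
        for i, (g, c) in enumerate(zip(gdna, cdna))
        if g != c
    ]


# Toy example (invented sequences): one A>G difference at position 4.
gdna = "ATGCAGTTC"
cdna = "ATGCGGTTC"
sites = find_candidate_editing_sites(gdna, cdna)
print(sites)
```

A mismatch flagged this way is only a candidate. Without the matching gDNA, the same A>G difference would simply be recorded as a SNP.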
 

RNA editing can be done in several complex ways:

When a gene's product is an mRNA, some of the codons in the gene's open reading frame (ORF) may specify different amino acids from those found in the protein translated from that mRNA.

The reason is RNA editing: the alteration of the sequence of nucleotides in the RNA

  • after it has been transcribed from DNA but
  • before it is translated into protein

RNA editing occurs by two distinct mechanisms:

  • Substitution Editing: chemical alteration of individual nucleotides. These alterations are catalyzed by enzymes that recognize a specific target sequence of nucleotides (much like restriction enzymes):
    • cytidine deaminases that convert a C in the RNA to uracil (U);
    • adenosine deaminases that convert an A to inosine (I), which the ribosome translates as a G. Thus a CAG codon (for Gln) can be converted to a CGG codon (for Arg).
  • Insertion/Deletion Editing: insertion or deletion of nucleotides in the RNA. These alterations are mediated by guide RNA molecules that
    • base-pair as best they can with the RNA to be edited and
    • serve as a template for the addition (or removal) of nucleotides in the target
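
The substitution case described above can be sketched in a few lines. This is a toy illustration, not a model of the enzymes: the function names are invented, the codon table holds only the four entries the example needs, and the CAA-to-UAA case is an added example of C-to-U editing that the list above does not give.

```python
# Only the codons used in this example (standard genetic code).
CODON_TABLE = {"CAG": "Gln", "CGG": "Arg", "CAA": "Gln", "UAA": "Stop"}


def a_to_i_edit(rna, position):
    """Deaminate the A at a 0-based position to inosine (I)."""
    if rna[position] != "A":
        raise ValueError("adenosine deaminase needs an A at this position")
    return rna[:position] + "I" + rna[position + 1:]


def c_to_u_edit(rna, position):
    """Deaminate the C at a 0-based position to uracil (U)."""
    if rna[position] != "C":
        raise ValueError("cytidine deaminase needs a C at this position")
    return rna[:position] + "U" + rna[position + 1:]


def as_read_by_ribosome(rna):
    """The ribosome reads inosine as guanosine."""
    return rna.replace("I", "G")


# A-to-I: CAG (Gln) becomes CIG, which is read as CGG (Arg).
edited = a_to_i_edit("CAG", 1)
read = as_read_by_ribosome(edited)
print(edited, read, CODON_TABLE["CAG"], "->", CODON_TABLE[read])

# C-to-U: CAA (Gln) becomes UAA (Stop).
stop = c_to_u_edit("CAA", 0)
print(stop, CODON_TABLE["CAA"], "->", CODON_TABLE[stop])
```

In both cases the DNA is unchanged. Only the RNA, and so the protein, differs from what the gene encodes.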

These discoveries help us see DNA tests in a new light. This is especially important in the case of monozygotic twins who have different skin, hair and eye color. It seems that so-called somatic mutations are in fact epigenetic mRNA modifications.