2018/08/30

The central term of genetics— “the gene”—can no longer be defined in simple terms

Reasons why it's difficult to define a gene and why DNA is just a passive information library

http://www.genetics.org/content/205/4/1353

Summary:
  • The hypothesis of “one gene—one mRNA—one polypeptide” as a general description of the gene and how it works started to expire when it was realized that a single gene could produce more than one mRNA, and that one gene can be a part of several transcription units. This one-to-several relationship of genes to mRNAs occurs by means of complex promoters and/or alternative splicing of the primary transcript.
  • Multiple transcription initiation sites, i.e., alternative promoters, have been found in all kingdoms of organisms, and they have been classified into six classes (Schibler and Sierra 1987). All of them can produce transcripts that do not obey the rule of one-to-one correspondence between the gene and the transcription unit, since transcription can be initiated at different promoters. The result is that a single gene can produce more than one kind of transcript (Schibler and Sierra 1987).
  • The discovery of alternative splicing as a way of producing different transcripts from one gene had a more complex history. In the late 1970s it was discovered, first in animal viruses and then in eukaryotes, that genes have a split structure. That is, genes are interrupted by introns (see review by Portin 1993). Split genes produce one pre-mRNA molecule, from which the introns are removed during the maturation of the mRNA by pre-mRNA splicing. Depending on the gene, the splicing pattern can be invariant (“constitutive”) or variable (“alternative”). In constitutive splicing, all the exons present in a transcript are incorporated into one mature mRNA through invariant ligation of consecutive exons, yielding a single kind of mRNA from the gene. In alternative splicing, nonconsecutive exons are joined by the processing of some, but not all, transcripts from a gene. In other words, individual exons can be excluded from the mature mRNA in some transcripts, but they can be included in others (Leff et al. 1986; Black 2003). Alternative splicing is a regulated process, being tissue-specific and developmental-stage-specific. Nevertheless, the colinearity of the gene and the mRNA is preserved, since the order of the exons in the gene is not changed.
     
  • In addition to alternative splicing, two other phenomena are now known that contradict a basic tenet of the neoclassical gene concept, namely that amino-acid sequences of proteins, and consequently their functions, are always derivable from the DNA of the corresponding gene. These are the phenomena of RNA editing (reviewed by Brennicke et al. 1999; Witzany 2011) and of gene sharing, originally found by J. Piatigorsky (reviewed in Piatigorsky 2007). The term RNA editing describes post-transcriptional molecular processes in which the structure of an RNA molecule is altered. Though a rare event, it has been observed to occur in eukaryotes, their viruses, archaea and prokaryotes, and involves several kinds of base modifications in RNA molecules. RNA editing in mRNAs effectively alters the amino acid sequence of the encoded protein so that it differs from that predicted by the genomic DNA sequence (Brennicke et al. 1999). The concept of gene sharing describes the fact that different cells contain identically sequenced polypeptides, derived from the same gene, but so differently configured in different cellular contexts that they perform wildly different functions. This phenomenon, facetiously called “protein moonlighting,” means that a gene may acquire and maintain a second function without gene duplication, and without loss of the primary function. Such genes are under two or more entirely different selective constraints (Piatigorsky and Wistow 1989).
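The difference between constitutive and alternative splicing described in the summary can be illustrated with a toy model. The sketch below is only an illustration, not a biological simulation: the exon sequences, the function name `splice_isoforms`, and the choice of which exons are optional are all made-up assumptions. It shows how one gene with two skippable exons can yield four mature mRNAs while the exon order (colinearity) is never changed.

```python
from itertools import combinations


def splice_isoforms(exons, optional):
    """Return every mature mRNA obtained by skipping any subset of the
    optional exons. Exons are only dropped, never reordered, so the
    order of the gene is preserved in every isoform."""
    isoforms = []
    optional = sorted(optional)
    for k in range(len(optional) + 1):
        for skipped in combinations(optional, k):
            kept = [seq for i, seq in enumerate(exons) if i not in skipped]
            isoforms.append("".join(kept))
    return isoforms


# Hypothetical toy gene with four exons (introns already removed).
exons = ["AUGGCU", "GAAUUC", "CCGGAU", "UAA"]

# Constitutive splicing: all exons joined in order, one mRNA per gene.
constitutive = "".join(exons)

# Alternative splicing: exons 1 and 2 (0-based) may be skipped.
isoforms = splice_isoforms(exons, optional={1, 2})

for mrna in isoforms:
    print(mrna)
```

The first isoform in the list is the constitutive product; the others are the additional transcripts that break the one gene, one mRNA correspondence.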

     
Severe cracks in the concept of the gene

Matters changed, however, when the sequencing projects revealed still more bizarre phenomena:
These new findings have shown that there are multiple possible relationships between DNA sequences and the molecular products they specify. The net result has been the realization that the basic concept of the gene as some form of generic, universal “unit of heredity” is too simple, and correspondingly, that a new definition or concept of “the gene” is needed (Keller 2000; Falk 2009; Portin 2009). Several observations have been crucial to this re-evaluation, and one of us has reviewed these relatively recently (Portin 2009). They are worth summarizing here:
  1. In eukaryotic organisms, there are few if any absolute boundaries to transcription, making it impossible to establish simple general relationships between primary transcripts and the ultimate products of those transcripts.

    Hence, the structural boundaries of the gene as the unit of transcription are often far from clear, as documented particularly well in mammals (reviewed by Carninci 2006). In reality, whole chromosomes, if not the whole genome, seem to be continuums of transcription (Gingeras 2007). Furthermore, the genome is full of overlapping transcripts, thus making it impossible to draw a 1:1:1 relationship between specific DNA sequences, transcripts and functions (Pearson 2006). Indeed, convincing evidence indicates that the human genome is comprehensively transcribed from both DNA strands, so that the majority of its bases can be found in primary transcripts that compendiously overlap one another (The FANTOM Consortium and RIKEN Genome Exploration Group 2005; The ENCODE Project Consortium 2007; 2012). Both protein coding and noncoding transcripts may be derived from either or both DNA strands, and they may be overlapping and interlaced. Furthermore, different transcripts often include the same coding sequences (Mattick 2005). The functional significance of these overlaps is still largely unclear, but there is an increasing number of examples in which both transcripts are known to have protein-coding exons from one position in the genome combined with exons from another part of the genome hundreds of thousands of nucleotides away (Kapranov et al. 2007). This was wholly unanticipated when the 1960s definition of the gene was formulated.

  2. Exons of different genes can be members of more than one transcript.
    Gene fusion, at the level of transcripts, is a reality, and is completely at odds with the “one gene—one mRNA—one protein” hypothesis. And this is not a rare phenomenon. It has been estimated that at least 4–5% of the tandem gene pairs in the human genome can be transcribed into a single RNA sequence, called a chimeric transcript, encoding a putative chimeric protein (Parra et al. 2006).

  3. Comparably, in the organelles of microbial eukaryotes, many examples of “encrypted” genes are known: genes are often in pieces that can be found as separate segments around the genome.
    Hence, in addition to the fusion of two adjacent genes at the level of transcription, different building blocks of a given mRNA molecule can often be located, as modules, on different chromosomes (reviewed in Landweber 2007). Some evidence indicates that, even in multicellular eukaryotes, protein-coding transcripts are derived from different nonhomologous chromosomes (reviewed in Claverie 2005).

  4. In contradiction to the neoclassical definition of a gene, which posits that the hereditary information resides solely in DNA sequences, there is increasing evidence that the functional status of some genes can be inherited from one generation of individuals to the next, a phenomenon known as transgenerational epigenetic inheritance (Holliday 1987; Gerhart and Kirschner 2007; Jablonka and Raz 2009).

  5. Finally, in addition to protein coding genes, there are many RNA-encoding genes that produce diverse RNA molecules that are not translated to proteins.

My comment: Defining a gene seems to be very complex. This points to Intelligent Design and Creation.