Transcription is the process through which a DNA sequence is enzymatically copied by an RNA polymerase to produce a complementary RNA. In the case of protein-encoding DNA, transcription is the first step of the process that ultimately leads to the translation of the genetic code, via the RNA intermediate, into a functional peptide or protein.
Transcription has some proofreading mechanisms, but they are fewer and less effective than those that control DNA copying; as a result, transcription has lower fidelity than DNA replication.
RNA is synthesized in the 5' → 3' direction, and transcription is divided into three stages: initiation, elongation, and termination.
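The base-pairing rule behind this copying can be sketched in a few lines of Python. This is a minimal illustration that assumes an idealized, error-free polymerase; the names `transcribe` and `DNA_TO_RNA_PAIR` are hypothetical. The template strand is written 5' → 3', so the function walks it from its 3' end, mirroring how the polymerase reads the template 3' → 5' while building RNA 5' → 3'.

```python
# Base laid down in the RNA opposite each DNA template base.
# RNA uses uracil (U) where DNA would use thymine (T).
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5_to_3: str) -> str:
    """Return the RNA (written 5'->3') copied from a DNA template strand.

    The template is given 5'->3'. The polymerase reads it 3'->5', so we walk
    it from the 3' end and append each complementary base, which builds the
    RNA in the 5'->3' direction.
    """
    rna = []
    for base in reversed(template_5_to_3.upper()):
        rna.append(DNA_TO_RNA_PAIR[base])
    return "".join(rna)


print(transcribe("TTACGG"))  # → CCGUAA
```

The result has the same sequence as the non-template (coding) strand, 5'-CCGTAA-3', with U in place of T.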
In prokaryotes:
- Transcription occurs in the cytoplasm, alongside translation.
- Translation can begin before transcription is complete.
The following steps occur, in order, during transcription initiation:
- The RNA polymerase (RNAP) holoenzyme, made up of the core enzyme plus a σ-factor, recognizes the promoter region on the DNA and binds to that specific location. At this stage, the DNA is double-stranded ("closed"). This RNAP/wound-DNA structure is referred to as the closed complex.
- The DNA is unwound and becomes single-stranded ("open") in the vicinity of the initiation site (defined as +1). This RNAP/unwound-DNA structure is called the open complex.
- The RNA polymerase begins transcribing the DNA but produces a series of abortive (short, non-productive) transcripts of up to about 10 nucleotides. These cannot be extended, because the RNA exit channel is blocked by the σ-factor, and are released.
- Eventually, the σ-factor dissociates from the holoenzyme, the polymerase escapes the promoter, and elongation begins.
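The ordered initiation steps above behave like a simple sequence of states. The sketch below is a hypothetical, deliberately simplified model: the `InitiationStage` enum and the `abortive_rounds` parameter are illustrative names, not part of any library, and the model only records the order of the stages while ignoring kinetics.

```python
from enum import Enum, auto


class InitiationStage(Enum):
    CLOSED_COMPLEX = auto()    # holoenzyme bound at the promoter; DNA still double-stranded
    OPEN_COMPLEX = auto()      # DNA unwound around the +1 site
    ABORTIVE_CYCLING = auto()  # one short abortive transcript is made and released
    ELONGATION = auto()        # sigma-factor released; the core enzyme moves into elongation


def initiation_path(abortive_rounds: int) -> list:
    """Return the ordered stages of one (simplified) initiation event."""
    if abortive_rounds < 0:
        raise ValueError("abortive_rounds must be non-negative")
    path = [InitiationStage.CLOSED_COMPLEX, InitiationStage.OPEN_COMPLEX]
    path += [InitiationStage.ABORTIVE_CYCLING] * abortive_rounds
    path.append(InitiationStage.ELONGATION)
    return path


print([stage.name for stage in initiation_path(abortive_rounds=2)])
# → ['CLOSED_COMPLEX', 'OPEN_COMPLEX', 'ABORTIVE_CYCLING', 'ABORTIVE_CYCLING', 'ELONGATION']
```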
Most transcripts start with a purine nucleoside triphosphate at the +1 site: usually adenosine-5'-triphosphate (ATP) and, to a lesser extent, guanosine-5'-triphosphate (GTP). The pyrimidine nucleoside triphosphates uridine-5'-triphosphate (UTP) and cytidine-5'-triphosphate (CTP) are disfavoured at the initiation site.
During elongation, the RNA polymerase moves along the DNA template, synthesizing RNA in the 5' → 3' direction. In prokaryotes, the nascent mRNA can be translated co-transcriptionally by ribosomes.
Some proofreading occurs during elongation:
- pyrophosphorolytic editing - RNA polymerase immediately removes an incorrectly incorporated nucleotide by reversing the polymerization reaction that added it;
- hydrolytic editing - RNA polymerase backtracks by one or more nucleotides and cleaves off the RNA segment containing the error, a step stimulated by Gre factors.
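A toy model makes the difference between the two editing routes concrete. It assumes the RNA and the template bases it was copied against are given as aligned strings (template base i sits opposite RNA base i, in reading order), ignores G·U wobble and kinetics, and uses hypothetical function names; `max_backtrack` is an illustrative parameter, not a measured value.

```python
# Base laid down opposite each template base (RNA uses U instead of T).
PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def is_mismatch(rna_base: str, template_base: str) -> bool:
    """True if rna_base is not the Watson-Crick partner of template_base."""
    return PAIR[template_base] != rna_base


def pyrophosphorolytic_edit(rna: str, template: str) -> str:
    """Reverse the most recent addition: drop the 3'-terminal base if it is wrong."""
    if rna and is_mismatch(rna[-1], template[len(rna) - 1]):
        return rna[:-1]
    return rna


def hydrolytic_edit(rna: str, template: str, max_backtrack: int = 3) -> str:
    """Backtrack up to max_backtrack bases and cleave off the segment with the error.

    In the cell this cleavage is stimulated by Gre factors; here it is just a slice.
    """
    start = max(0, len(rna) - max_backtrack)
    for i in range(start, len(rna)):
        if is_mismatch(rna[i], template[i]):
            return rna[:i]  # keep only the RNA upstream of the error
    return rna


template = "TACG"  # template bases in the order the polymerase reads them
print(pyrophosphorolytic_edit("AUGG", template))  # → AUG (wrong 3'-terminal base removed)
print(hydrolytic_edit("AGGC", template))          # → A (error not at the 3' end)
```

The pyrophosphorolytic route can only undo the last step, whereas the hydrolytic route can reach an error buried a few nucleotides back, which is why the model gives it a backtracking window.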
Two termination mechanisms are well known:
- Intrinsic termination (also called Rho-independent termination) relies on terminator sequences within the RNA as it is being made, which signal the RNA polymerase to stop. The terminator is usually a palindromic (inverted-repeat) DNA sequence whose transcript folds into a stem-loop hairpin structure, typically followed by a run of uridines.
- Rho-dependent termination uses a termination factor called ρ-factor to stop RNA synthesis at specific sites. This protein binds to the mRNA and moves along it towards the RNAP. When ρ-factor reaches the RNAP, it causes the RNAP to dissociate from the DNA, terminating transcription.
Other termination mechanisms include cases in which the RNAP encounters a region of repeated thymidine residues in the DNA template.
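A rough way to scan an RNA for an intrinsic (Rho-independent) terminator is to look for an inverted repeat that could fold into a hairpin, followed by a uridine-rich tract. The sketch assumes perfect Watson-Crick stems (no G·U wobble, no mismatches), and its thresholds (`min_stem`, the loop range, `u_window`, `min_u`) are hypothetical choices; practical terminator predictors also weigh hairpin stability.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Return the strand that would pair with seq in an antiparallel stem."""
    return "".join(RNA_PAIR[b] for b in reversed(seq))


def find_intrinsic_terminators(rna, min_stem=5, max_stem=8, min_loop=3,
                               max_loop=8, u_window=8, min_u=4):
    """Return (start, stem_length, loop_length) for each hairpin + U-tract candidate.

    A hit needs: stem 1 = rna[start:start+stem], a loop, then stem 2 equal to
    the reverse complement of stem 1, followed by at least min_u U's in the
    next u_window bases.
    """
    rna = rna.upper()
    hits = []
    for start in range(len(rna)):
        for stem in range(min_stem, max_stem + 1):
            for loop in range(min_loop, max_loop + 1):
                stem2_start = start + stem + loop
                stem2_end = stem2_start + stem
                if stem2_end > len(rna):
                    continue
                left = rna[start:start + stem]
                right = rna[stem2_start:stem2_end]
                if right != reverse_complement_rna(left):
                    continue  # the two arms could not pair into a stem
                tail = rna[stem2_end:stem2_end + u_window]
                if tail.count("U") >= min_u:
                    hits.append((start, stem, loop))
    return hits


hairpin = "GCCGCA" + "GAAA" + "UGCGGC"  # 6-bp stem, 4-nt loop
print(find_intrinsic_terminators("AAA" + hairpin + "UUUUUU" + "AC"))
```

Because shorter stems nested inside a longer one also pair, a single hairpin can produce several overlapping hits; a real tool would merge them.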
A (simple) model for a bacterial gene to be transcribed can be depicted as follows:
<-- upstream                                                           downstream -->
     [[     promoter    ]]
5'---|-35|----//-----|-10|-------------------------------------------|T|------------3' (Message/Non-Template Strand)
                                | "+1" site of initiation
where the -35 region and the -10 ("Pribnow box") region comprise the basic prokaryotic promoter, and |T| stands for the terminator. The DNA on the template strand between the +1 site and the terminator is transcribed into RNA, which is then translated into protein.
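In this simple model, the RNA has the same sequence as the message (non-template) strand between +1 and the terminator, with U in place of T, because it is built as the complement of the template strand. The hedged sketch below checks that both routes agree; the 0-based, end-exclusive coordinates for the +1 site and the start of the terminator, and all function names, are hypothetical conventions chosen for illustration.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def reverse_complement_dna(seq: str) -> str:
    """Return the opposite DNA strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(seq.upper()))


def transcript_from_coding_strand(coding: str, plus_one: int, terminator_start: int) -> str:
    """RNA = coding-strand segment [plus_one, terminator_start) with T replaced by U."""
    if not 0 <= plus_one < terminator_start <= len(coding):
        raise ValueError("need 0 <= plus_one < terminator_start <= len(coding)")
    return coding[plus_one:terminator_start].upper().replace("T", "U")


def transcript_from_template_strand(template: str, plus_one: int, terminator_start: int) -> str:
    """Same RNA, built by complementing the template strand (given 5'->3').

    Coding position p faces template position len(template) - 1 - p.
    """
    n = len(template)
    segment = template[n - terminator_start:n - plus_one].upper()
    return "".join(DNA_TO_RNA_PAIR[b] for b in reversed(segment))


coding = "CCCATGAAATAGCCC"
template = reverse_complement_dna(coding)
print(transcript_from_coding_strand(coding, 3, 12))      # → AUGAAAUAG
print(transcript_from_template_strand(template, 3, 12))  # → AUGAAAUAG
```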
Promoters can differ in "strength", that is, in how actively they promote transcription of the adjacent DNA sequence. In many (but not all) cases, promoter strength is a matter of how tightly RNA polymerase and its associated accessory proteins bind to their respective DNA sequences: the more closely these sequences match a consensus sequence, the stronger the binding. The "ideal" promoter in E. coli can be represented as follows:
5'---|TTGACA|-----17 bp-----|TATAAT|---3'  (-35 and -10 consensus sequences)
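Similarity to the consensus can be approximated by counting matches to the -35 and -10 hexamers. The sketch below assumes the widely cited E. coli consensus hexamers TTGACA and TATAAT and scans spacer lengths of 15 to 19 bp; the match count is a crude, hypothetical proxy for binding strength, not a validated predictor, and, as noted above, strength does not always track consensus similarity.

```python
MINUS_35_CONSENSUS = "TTGACA"
MINUS_10_CONSENSUS = "TATAAT"


def matches(site: str, consensus: str) -> int:
    """Count positions where site agrees with consensus."""
    return sum(a == b for a, b in zip(site, consensus))


def best_promoter_match(seq: str, spacers=range(15, 20)):
    """Return (score, start_of_-35, start_of_-10) for the best-scoring placement.

    score is the total number of consensus matches (0-12); ties keep the first
    placement found. Returns (-1, None, None) if the sequence is too short.
    """
    seq = seq.upper()
    best = (-1, None, None)
    for i in range(len(seq) - 5):
        for spacer in spacers:
            j = i + 6 + spacer  # start of the -10 hexamer
            if j + 6 > len(seq):
                continue
            score = (matches(seq[i:i + 6], MINUS_35_CONSENSUS)
                     + matches(seq[j:j + 6], MINUS_10_CONSENSUS))
            if score > best[0]:
                best = (score, i, j)
    return best


strong = "TTGACA" + "G" * 17 + "TATAAT" + "GGGG"
weak = "TTTACA" + "G" * 17 + "TATAAT" + "GGGG"  # one mismatch in the -35 box
print(best_promoter_match(strong))  # → (12, 0, 23)
print(best_promoter_match(weak))    # → (11, 0, 23)
```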
Eukaryotes have evolved much more complex transcriptional regulatory mechanisms than prokaryotes. For instance, in eukaryotes the genetic material (DNA), and therefore transcription, is localized to the nucleus, where it is separated from the cytoplasm (where translation occurs) by the nuclear membrane. This allows for the temporal regulation of gene expression through the sequestration of the RNA in the nucleus, and allows for selective transport of RNAs to the cytoplasm, where the ribosomes reside.
Adding to this complexity, eukaryotes have three RNA polymerases, each with distinct roles and properties:
- RNA Polymerase I is located in the nucleolus and transcribes ribosomal RNA (rRNA), except the 5S rRNA.
- RNA Polymerase II is localized to the nucleoplasm and transcribes messenger RNA (mRNA).
- RNA Polymerase III transcribes transfer RNA (tRNA), 5S rRNA, and other small RNAs.
Further complexity is added by the multitude of transcription factors and signaling pathways that may act in combination to mediate cell-type-specific and developmental transcriptional regulation.
The basal eukaryotic transcription complex includes the RNA polymerase and additional proteins that are necessary for correct initiation and elongation.
Primary (initial) mRNA transcripts in eukaryotic cells are synthesized as larger precursor RNAs that are processed by splicing out introns (non-coding sequences) and ligating exons (non-contiguous coding sequences) into the mature mRNA. Primary transcripts for some genes can be large. The primary transcripts of the neurexin genes, for instance, are as large as 1.7 megabases (1,700,000 bases), while the mature (processed) neurexin mRNAs are under 10 kilobases (10,000 bases), with as many as 24 exons and thousands of possible alternative splice variants that produce proteins with different activities.
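Splicing, and alternative splicing, can be sketched as joining selected exon intervals of the primary transcript. The toy gene and its 0-based, end-exclusive exon coordinates below are hypothetical, and the `splice` function ignores splice-site recognition entirely; it only shows how different exon choices yield different mature transcripts from one precursor.

```python
def splice(pre_mrna: str, exons: list) -> str:
    """Join exon intervals (0-based, end-exclusive) in transcript order, dropping introns."""
    ordered = sorted(exons)
    for (start, end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start < end:
            raise ValueError("exons must not overlap")
    return "".join(pre_mrna[start:end] for start, end in ordered)


# Hypothetical toy gene: three exons separated by two introns.
exon1, intron1, exon2, intron2, exon3 = "AUGGCU", "GUAAGUAACAG", "GAAUUC", "GUGAGCCCAG", "UAA"
pre_mrna = exon1 + intron1 + exon2 + intron2 + exon3

# Derive exon coordinates from the segment lengths.
e1 = (0, len(exon1))
e2_start = e1[1] + len(intron1)
e2 = (e2_start, e2_start + len(exon2))
e3_start = e2[1] + len(intron2)
e3 = (e3_start, e3_start + len(exon3))

print(splice(pre_mrna, [e1, e2, e3]))  # → AUGGCUGAAUUCUAA (all exons)
print(splice(pre_mrna, [e1, e3]))      # → AUGGCUUAA (exon 2 skipped)
```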
Gene expression in eukaryotes is also controlled by complex interactions between cis-acting sites within the regulatory regions of the DNA, and trans-acting factors that include transcription factors and the basal transcription complex.
The core promoter of protein-encoding genes contains binding sites for the basal transcription complex and RNA polymerase II, and is normally within about 50 bases upstream of the transcription initiation site. Further transcriptional regulation is provided by upstream control elements (UCEs), usually present within about 200 bases upstream of the transcription initiation site. The core promoter for RNAP II normally (though not always) contains a TATA box, the highly conserved DNA sequence TATA(T/A)A.
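A minimal way to search a core promoter for this consensus is a regular expression with one ambiguous position. The sketch assumes the input is the DNA immediately upstream of the +1 site (its last base is -1), looks only at the last 50 bases as a stand-in for "within about 50 bases upstream", and requires an exact match to TATA(T/A)A; real TATA boxes can deviate from the consensus. The function name and `window` parameter are hypothetical.

```python
import re

# TATA, then T or A, then A; the lookahead lets overlapping matches be reported.
TATA_PATTERN = re.compile(r"(?=TATA[TA]A)")


def find_tata_boxes(upstream: str, window: int = 50) -> list:
    """Return start positions of TATA(T/A)A matches relative to +1 (negative = upstream)."""
    upstream = upstream.upper()
    region_start = max(0, len(upstream) - window)
    region = upstream[region_start:]
    positions = []
    for match in TATA_PATTERN.finditer(region):
        index_in_upstream = region_start + match.start()
        positions.append(index_in_upstream - len(upstream))  # last base maps to -1
    return positions


upstream = "G" * 20 + "TATAAA" + "G" * 24
print(find_tata_boxes(upstream))  # → [-30]
```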
A similar sequence, though not as highly conserved, is found in the INR (initiator) element, part of some RNAP II promoters.
Some genes also have enhancer elements that can be thousands of bases upstream or downstream of the transcription initiation site. Combinations of these upstream control elements and enhancers regulate and amplify the formation of the basal transcription complex.
Measuring and detecting transcription
Transcription can be measured and detected in a variety of ways:
RNA synthesis by RNA polymerase had been established in vitro by several laboratories by 1965; however, the RNA synthesized by these enzymes had properties that suggested the existence of an additional factor needed to terminate transcription correctly.
By the late 1960s several papers that came out of the Harvard University Biological Laboratories established the basic mechanics of gene expression in bacteria.