Gene Organization

The DNA molecules that make up the hereditary material of an organism are collectively called its genome. The functional regions of the genome are called genes. In a complex genome only a small part is functional in the sense that it codes for a protein whose amino acid sequence is determined by the DNA sequence. Another small part plays a regulatory role, determining when and to what extent the coding sequences are decoded during the life of an organism. Protein-coding DNA, together with its associated regulatory sequences, therefore makes up the part of the genome that "makes sense". A major part of the genome is composed of highly repetitive sequences of unknown function, which were previously termed 'junk' or 'selfish' DNA.

DNA is a linear string of four symbols, A, T, G and C. Proteins are synthesized by reading a code from the DNA sequence, in which a triplet of nucleotides (a codon) corresponds to a given amino acid. Since naturally occurring proteins are built from 20 amino acids, while there are 64 (= 4³) possible codons, the genetic code is degenerate: most amino acids are specified by more than one codon. The genetic code also includes a signal for the initiation of protein synthesis (the start codon) and signals for its termination (the three stop, or nonsense, codons) (Fig. 2.9).
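The codon arithmetic and the role of start and stop codons can be illustrated with a short sketch. It assumes the standard genetic code (NCBI translation table 1), reads only the coding strand in the 5' to 3' direction, and translates a single reading frame beginning at the first ATG; the function names are illustrative, not taken from any particular library.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons listed in TCAG
# order; '*' marks the three stop (nonsense) codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Every triplet over {T, C, A, G}: 4 ** 3 = 64 codons.
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

START_CODON = "ATG"


def degeneracy():
    """Count how many codons specify each amino acid ('*' = stop)."""
    return Counter(CODON_TABLE.values())


def translate(dna):
    """Translate from the first ATG up to (not including) the first in-frame stop."""
    dna = dna.upper()
    start = dna.find(START_CODON)
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


counts = degeneracy()
print(len(CODON_TABLE))                          # 64 codons
print(len(counts) - 1)                           # 20 amino acids (stop excluded)
print(counts["L"], counts["M"], counts["*"])     # 6 1 3
print(translate("ccATGGCTTGGTAAgg"))             # MAW
```

Leucine, for example, is encoded by six codons and methionine by only one, which is what "degenerate" means in practice. Building the table from the compact 64-letter string keeps the sketch short; note that `translate` would raise a `KeyError` on ambiguous bases such as N, which a fuller implementation would need to handle.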

Fig. 2.9. Schematic presentation of different regions in and around a gene in a genomic sequence showing the organization of exons, introns, initiation and termination sites, intergenic spacers and promoters (after Tewari et al., 1996)

Prokaryotic genes are often continuous open reading frames (ORFs), that is, they are uninterrupted coding sequences, whereas a eukaryotic gene is split into several discrete segments called 'exons', which are interspersed with non-coding intervening regions called 'introns'. Exons may be mixed and matched in various combinations to create new genes. Sometimes an exon of one gene may be an intron of another gene. The entire gene is transcribed into an RNA molecule, from which the introns are spliced out to produce mRNA. The mRNA contains a continuous ORF, which is translated into the corresponding polypeptide. There are also ancillary regions of the DNA that regulate and control the expression of proteins at specific times and under specific conditions (Tewari et al., 1996).
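The effect of splicing on the reading frame can be shown with a minimal sketch. It assumes a made-up two-exon gene whose exon coordinates are already known (the sequence, coordinates and function names are all hypothetical), writes the transcript with DNA letters (T rather than U) for simplicity, and simply cuts at the given coordinates instead of detecting intron boundaries. The intron is written to begin with GT and end with AG, as most nuclear introns do.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice(pre_mrna, exons):
    """Join the exon segments in order, dropping the introns between them.

    exons: (start, end) pairs, 0-based and end-exclusive (Python slicing).
    """
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


def is_continuous_orf(seq):
    """True if seq reads ATG ... stop with no premature in-frame stop codon."""
    if len(seq) < 6 or len(seq) % 3 != 0 or not seq.startswith("ATG"):
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return codons[-1] in STOP_CODONS and not any(
        codon in STOP_CODONS for codon in codons[:-1]
    )


# Hypothetical two-exon gene laid out as: exon 1 | intron | exon 2.
EXON_1 = "ATGGCC"
INTRON = "GTGAGTTAACAG"   # its TAA is in frame with exon 1 if left in place
EXON_2 = "TGGAAATAA"
PRE_MRNA = EXON_1 + INTRON + EXON_2
EXONS = [(0, 6), (18, 27)]

mrna = splice(PRE_MRNA, EXONS)
print(mrna)                          # ATGGCCTGGAAATAA
print(is_continuous_orf(PRE_MRNA))   # False: premature stop inside the intron
print(is_continuous_orf(mrna))       # True
```

The unspliced transcript hits a stop codon inside the intron, while the spliced product reads ATG, GCC, TGG, AAA and ends cleanly at TAA, mirroring the point that only the processed mRNA carries a continuous ORF.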

There are several ongoing projects to sequence the entire genomes of a number of organisms. Complete genome maps of several important organisms may become available within a few years, for example Drosophila melanogaster (genome length about 165 million bp, about 15,000 genes), E. coli (4.7 Mbp, 3,000 genes), Saccharomyces cerevisiae (12.5 Mbp, 6,400 genes), Arabidopsis thaliana (100 Mbp, 13,100 genes), the nematode Caenorhabditis elegans (100 Mbp, 15,000 genes), Fugu rubripes (390 Mbp, 80,000 genes) and the human genome. Recently, the entire genomes of Haemophilus influenzae (1.83 Mbp, 1,727 genes) and Mycoplasma genitalium (0.58 Mbp, 482 genes) have been sequenced.
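A rough way to compare these genomes is average gene density. The sketch below only rearranges the genome sizes and gene counts quoted in the paragraph above (it does not update them), and "density" here is a crude average that ignores differences in gene length and in how genes are distributed along the chromosome.

```python
# Genome size (Mbp) and gene count, exactly as quoted in the text above.
GENOMES = {
    "Drosophila melanogaster": (165.0, 15000),
    "E. coli": (4.7, 3000),
    "Saccharomyces cerevisiae": (12.5, 6400),
    "Arabidopsis thaliana": (100.0, 13100),
    "Caenorhabditis elegans": (100.0, 15000),
    "Fugu rubripes": (390.0, 80000),
    "Haemophilus influenzae": (1.83, 1727),
    "Mycoplasma genitalium": (0.58, 482),
}


def gene_density(size_mbp, n_genes):
    """Average number of genes per million base pairs."""
    return n_genes / size_mbp


def kb_per_gene(size_mbp, n_genes):
    """Average genome length (kb) per gene, coding and non-coding together."""
    return size_mbp * 1000 / n_genes


# Print the genomes from most to least gene-dense.
for name, (size, genes) in sorted(
    GENOMES.items(), key=lambda item: gene_density(*item[1]), reverse=True
):
    print(f"{name:26s} {gene_density(size, genes):6.0f} genes/Mbp "
          f"{kb_per_gene(size, genes):5.1f} kb/gene")
```

On these figures the small bacterial genomes pack roughly one gene per kilobase, whereas Drosophila averages about 11 kb per gene, which is consistent with the earlier observation that only a small part of a complex genome codes for protein.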