Organisation of the Genome

Chromatin organisation

The human genome consists of approximately 3.0 × 10^9 nucleotide pairs. Fully stretched, this DNA would measure more than 1 metre, yet it is packaged within the nucleus of each individual cell, which measures approximately 5 µm in diameter. The human genome is organised into 23 pairs of chromosomes (diploid number 46): 22 pairs of autosomes and one pair of sex chromosomes (XX in females, XY in males).
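The packaging problem described above can be checked with a little arithmetic. The sketch below assumes a rise of about 0.34 nm per base pair for the double helix; that figure is not given in the text and is introduced here only for illustration. The genome size and nuclear diameter are the values quoted above.

```python
# Assumption (not from the text): the double helix rises about 0.34 nm per base pair.
RISE_PER_BP_NM = 0.34
GENOME_SIZE_BP = 3.0e9       # nucleotide pairs, as quoted in the text
NUCLEUS_DIAMETER_UM = 5.0    # micrometres, as quoted in the text


def stretched_length_m(base_pairs, rise_nm=RISE_PER_BP_NM):
    """Length in metres of a fully extended double helix of the given size."""
    return base_pairs * rise_nm * 1e-9


length_m = stretched_length_m(GENOME_SIZE_BP)
# How many nuclear diameters would the stretched DNA span?
fold = length_m / (NUCLEUS_DIAMETER_UM * 1e-6)

print(f"Stretched length: {length_m:.2f} m")
print(f"Stretched length / nuclear diameter: {fold:,.0f}")
```

Under this assumption the stretched length comes out at a little over 1 metre, in line with the figure in the text, and roughly two hundred thousand times the diameter of the nucleus.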

DNA is packed inside the nucleus in association with a number of proteins, around which it is extensively coiled and folded to form nucleosomes. Each nucleosome is built on a histone octamer containing two molecules each of histones H2A, H2B, H3 and H4. Histones contain large amounts of positively charged amino acids, mainly lysine and arginine, which bind electrostatically to the negatively charged phosphate groups of the DNA backbone. The DNA makes about 1.65 left-handed turns around each histone octamer, covering a total of 146 bp of double-stranded DNA. The next 50 bp or so link one nucleosome to the next and also interact with another histone (H1). The nucleosomes then coil into a thicker fibre with about six nucleosomes per turn, known as the solenoid. Besides histones, there are other proteins that make up what is known as the nuclear scaffold; one of these is the enzyme topoisomerase type II. Solenoids in turn fold into chromatin fibres of approximately 200 nm in diameter, which eventually make up the chromatids, 600 to 700 nm in diameter. Histones do not dissociate completely from the DNA during replication in the S phase of the cell cycle; newly synthesised histones are assembled onto the replicated DNA alongside the existing ones.
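The figures above give a rough idea of how many nucleosomes are needed to package one copy of the genome. The following is a back-of-the-envelope sketch, assuming that every stretch of 146 bp of core DNA plus 50 bp of linker DNA carries exactly one nucleosome; real chromatin is less regular than this.

```python
GENOME_SIZE_BP = 3.0e9              # one copy of the genome, as quoted earlier
CORE_DNA_BP = 146                   # DNA wrapped around one histone octamer
LINKER_DNA_BP = 50                  # DNA linking one nucleosome to the next
HISTONES_PER_OCTAMER = 8
NUCLEOSOMES_PER_SOLENOID_TURN = 6


def nucleosome_count(genome_bp, core_bp=CORE_DNA_BP, linker_bp=LINKER_DNA_BP):
    """Whole nucleosomes that fit if each repeat unit is core plus linker DNA."""
    return int(genome_bp // (core_bp + linker_bp))


n = nucleosome_count(GENOME_SIZE_BP)
core_histones = n * HISTONES_PER_OCTAMER
solenoid_turns = n // NUCLEOSOMES_PER_SOLENOID_TURN

print(f"Nucleosomes:           {n:,}")
print(f"Core histone proteins: {core_histones:,}")
print(f"Solenoid turns:        {solenoid_turns:,}")
```

On these assumptions a single copy of the genome needs about 15 million nucleosomes; a diploid cell would need twice that number.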

Histones are encoded by clusters of genes that are repeated many times in the genome and are highly conserved across species. Histone genes also lack introns. In sperm heads, histone proteins are replaced by protamines.

Figure 1. Structure of the nucleosome


Satellite DNA

When eukaryotic DNA was sheared and analysed by sedimentation equilibrium centrifugation, a main band of DNA and one or two satellite peaks were observed. Re-association kinetics showed that these satellites consist of highly repetitive DNA sequences. Repetitive DNA is of two main types, moderately repetitive and highly repetitive. Only 10% of the human genome is thought to behave as single copy DNA.

Highly repetitive sequences are short sequences that are repeated a large number of times, usually as tandem repeats. Satellite DNA is found in specific areas of the chromosomes known as heterochromatin, including the regions around the centromere. The centromere is the region of the chromosome to which the spindle fibres attach during mitosis and meiosis, allowing the chromosome to move to one of the poles during anaphase. This region is known as CEN, and in yeasts it consists of approximately 225 bp divided into three regions. Region II is the largest and is about 95% AT rich, while the short region III is thought to be the most important for centromere function, since its sequence is needed for attachment to the spindle fibres. In humans, the alphoid family of repetitive sequences is found at the centromere; each repeat is about 170 bp long, and the repeats are arranged in tandem arrays of up to 1 million base pairs.
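A figure such as "95% AT rich" is simply the fraction of bases in a sequence that are A or T. The sketch below shows how this could be computed; the 20-base sequence is a made-up example chosen for illustration, not a real yeast centromere sequence. The last lines estimate how many 170 bp alphoid repeats fit into a tandem array of 1 million base pairs, using the figures quoted above.

```python
def at_content(sequence):
    """Fraction of bases in a DNA sequence that are A or T."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("A") + seq.count("T")) / len(seq)


# Hypothetical 20-base sequence, invented for this example only.
example = "ATTATAATTTAAATATTTAG"
print(f"AT content: {at_content(example):.0%}")

# Figures from the text: alphoid repeats of about 170 bp in arrays of up to 1 million bp.
ALPHOID_REPEAT_BP = 170
ARRAY_BP = 1_000_000
repeats_per_array = ARRAY_BP // ALPHOID_REPEAT_BP
print(f"Alphoid repeats in a 1 Mb array: about {repeats_per_array:,}")
```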

Another very important structure of the chromosome is the telomere, which also consists of repetitive DNA sequences. Telomeres are found at the tips of linear chromosomes. They contain telomeric sequences, which consist of short tandem repeats, and telomere-associated sequences, which are found adjacent to and within the telomere. With each cycle of DNA replication the telomeres become shorter, so that they serve as an internal biological clock for the cell and reflect its age. In germ cells (but not in most somatic cells) telomeres are maintained by an RNA-containing enzyme known as telomerase. In immortalised human cancer cells, the activation of telomerase is a very important step in the transition to malignancy.
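The idea of the telomere as a biological clock can be illustrated with a very simple model in which a fixed amount of telomeric DNA is lost at each round of replication until a critical length is reached. All three numbers used below are hypothetical placeholders chosen only to make the example run; none of them comes from the text or from measurements.

```python
def divisions_until_critical(initial_bp, loss_per_division_bp, critical_bp):
    """Count the divisions possible before the telomere would drop below critical_bp."""
    if loss_per_division_bp <= 0:
        raise ValueError("loss per division must be positive")
    divisions = 0
    length = initial_bp
    # Keep dividing only while the next division leaves the telomere at or above the limit.
    while length - loss_per_division_bp >= critical_bp:
        length -= loss_per_division_bp
        divisions += 1
    return divisions


# Hypothetical values for illustration only.
INITIAL_TELOMERE_BP = 10_000
LOSS_PER_DIVISION_BP = 100
CRITICAL_LENGTH_BP = 5_000

print(divisions_until_critical(INITIAL_TELOMERE_BP, LOSS_PER_DIVISION_BP, CRITICAL_LENGTH_BP))
```

In this toy model a cell with active telomerase would correspond to a loss of zero per division, which is why the function rejects that case rather than looping forever.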


Repetitive DNA

Moderately repetitive DNA can be found either interspersed or in tandem across the genome. There are two main types of interspersed repetitive elements, short and long. The short interspersed elements, or SINEs, are less than 500 bp long and can occur as many as 500,000 times in the genome. An example of a SINE is the Alu element found in humans and other primates. The long interspersed elements, or LINEs, are about 6400 bp long and can occur as many as 40,000 times. Moderately repetitive DNA can also be clustered, and some functional genes fall within this category, including those coding for the 5.8S, 18S and 28S rRNAs, which in humans are clustered on the p arms of chromosomes 13, 14, 15, 21 and 22. There are also tandem repeats such as the variable number tandem repeats (VNTRs), which consist of repeat units of 15 to 100 bp and have been very useful in forensic work. Another type of tandem repeat is the short tandem repeat (STR), in which the repeat unit is a di-, tri-, tetra- or even pentanucleotide. These repeats are also used for genetic identification in forensic DNA analysis.
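STRs are useful for identification because individuals differ in the number of copies of the repeat unit at a given locus. The sketch below counts the longest uninterrupted run of a repeat unit in a sequence. The function name, the tetranucleotide unit GATA, the flanking bases and the two alleles are all hypothetical and chosen for illustration; real forensic typing measures fragment lengths at validated loci rather than scanning raw sequence in this way.

```python
import re


def longest_tandem_run(sequence, motif):
    """Largest number of back-to-back copies of motif found anywhere in sequence."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(match.group(0)) // len(motif) for match in pattern.finditer(sequence)]
    return max(runs, default=0)


# Two hypothetical alleles of the same locus, differing only in repeat number.
allele_a = "CCTG" + "GATA" * 7 + "TTCA"
allele_b = "CCTG" + "GATA" * 10 + "TTCA"

print("Allele A repeats:", longest_tandem_run(allele_a, "GATA"))
print("Allele B repeats:", longest_tandem_run(allele_b, "GATA"))
```

A person carrying these two alleles would be described by the pair of repeat counts at this locus, and a profile combines such pairs from several loci.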


Structure of the Eukaryotic gene

It is estimated that there are about 20,000 to 50,000 protein-coding genes in the human genome, which is less than twice the number found in much simpler organisms. The structure of the human protein-coding gene is quite complex. Eukaryotic genes vary greatly in size, ranging from less than 1 kb (histones) to as much as 2500 kb for the dystrophin gene. A typical gene consists of coding and non-coding sequences known as exons and introns, respectively. The exons are the parts of the gene that are retained in the mature mRNA and eventually translated into protein. An exon is usually small and often codes for a single protein domain, averaging 150 nucleotides that encode about 50 amino acids. Each amino acid is encoded by a triplet of nucleotides known as a codon, and most amino acids are encoded by more than one codon. The intervening non-coding introns, on the other hand, are relatively large and can be as long as 20,000 bp. The sequence within introns appears largely random, but it can contain regulatory sequences that affect the splicing mechanism. Introns are transcribed into the primary RNA but are eventually removed (spliced out), and so do not form part of the mature mRNA molecule. The number of exons and introns varies greatly between genes: a gene can consist of just two or three exons, or of more than 20.
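The relationship between the primary transcript and the mature mRNA can be sketched as follows. The toy gene below is invented for illustration: it has three very short exons (upper case) separated by two introns (lower case), and is far smaller than any real gene. Splicing is modelled simply as joining the exons in order and discarding the introns.

```python
# Hypothetical toy gene: (segment type, RNA sequence). Not a real gene.
SEGMENTS = [
    ("exon", "AUGGCU"),
    ("intron", "guaaguuucag"),
    ("exon", "UGGGAA"),
    ("intron", "guacgcacag"),
    ("exon", "UAA"),
]


def primary_transcript(segments):
    """The primary RNA contains exons and introns in their original order."""
    return "".join(seq for _, seq in segments)


def splice(segments):
    """The mature mRNA keeps only the exons, joined in order."""
    return "".join(seq for kind, seq in segments if kind == "exon")


pre_mrna = primary_transcript(SEGMENTS)
mature_mrna = splice(SEGMENTS)
print(len(pre_mrna), "nt before splicing")
print(len(mature_mrna), "nt after splicing:", mature_mrna)

# From the text: an average exon of 150 nucleotides, read three at a time.
AVERAGE_EXON_NT = 150
print(AVERAGE_EXON_NT // 3, "amino acids encoded by an average exon")
```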

Figure 2. Typical Structure of a Eukaryotic Gene

Besides the introns and coding exons, genes also have other regulatory elements that affect how the gene itself is expressed and regulated. The 5' and 3' untranslated regions usually contain sequences that serve this purpose. The 5' region ahead of the transcriptional start site usually makes up what is known as the promoter region. The promoter region contains sequences such as the TATA box, where the transcription machinery, including RNA polymerase, assembles to initiate transcription. Further upstream there is the CCAAT box, which also plays a part in the regulation of transcription. There are usually a number of other consensus sequences to which proteins known as transcription factors bind to control transcription. Enhancers and/or silencers, which can be found close to or sometimes quite distant from the gene itself, are also involved in the regulation of gene expression. Sequences at the 3' end of the gene likewise act as regulators and terminators of transcription, and as signals for polyadenylation of the mRNA molecules.
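Promoter elements such as the TATA box and the CCAAT box are short consensus sequences, so a first rough search for them can be done by pattern matching. The sketch below assumes the commonly quoted TATA box consensus TATA(A/T)A(A/T), which is not given in the text, and uses a made-up promoter fragment. A match of this kind only suggests a candidate site; it does not show that the site is functional.

```python
import re

# Assumption (not from the text): TATA box consensus TATA(A/T)A(A/T).
TATA_BOX = re.compile(r"TATA[AT]A[AT]")
CCAAT_BOX = "CCAAT"


def find_tata_boxes(promoter):
    """0-based start positions of candidate TATA boxes in a DNA sequence."""
    return [match.start() for match in TATA_BOX.finditer(promoter.upper())]


# Hypothetical promoter fragment, written 5' to 3'.
promoter = "GGCCAATCTGCGCGTATAAAAGGCGCGACTGACT"

tata_positions = find_tata_boxes(promoter)
ccaat_position = promoter.find(CCAAT_BOX)
print("Candidate TATA boxes at:", tata_positions)
print("CCAAT box at:", ccaat_position)
```

In this invented fragment the CCAAT box lies upstream (5') of the TATA box, matching the arrangement described above.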


The Genetic Code

The sequence of nucleotides found in exons codes for the sequence of amino acids joined together during translation to form the different protein domains. It was shown that a triplet of bases specifies a given amino acid during ribosomal translation. All amino acids are coded by more than one codon (the code is degenerate), with the exceptions of tryptophan and methionine. In many codons the last base has reduced specificity, so that codons differing only in the last base often encode the same amino acid; for several amino acids all four such codons are equivalent. As a result, many random mutations at this base do not alter the amino acid sequence. The code also has three codons that act as termination signals and a start codon, AUG, which codes for methionine, so that the first amino acid incorporated into a newly made protein is methionine. This code is shared by almost all living organisms, although some variations exist in the mitochondrial genome.
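The properties of the code described above can be checked directly against the standard codon table. The sketch below builds the table from the usual one-letter amino acid string (codons ordered with the bases U, C, A, G, and `*` marking a termination codon), translates a short made-up mRNA from its first AUG, and counts how many codons specify each amino acid. The function name and the example mRNA are hypothetical, and the translation routine ignores everything about real translation except the reading of triplets.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' is a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the sequence."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Number of codons per amino acid (or stop signal).
degeneracy = Counter(CODON_TABLE.values())
single_codon = sorted(aa for aa, count in degeneracy.items() if count == 1)

print("Protein:", translate("GGAUGGCUUGGUAAUU"))   # hypothetical mRNA
print("Amino acids with a single codon:", single_codon)
print("Stop codons:", degeneracy["*"])
print("Glycine codons:", [c for c, aa in CODON_TABLE.items() if aa == "G"])
```

The only amino acids with a single codon are methionine (M) and tryptophan (W), and the four glycine codons differ only in their last base, which illustrates the reduced specificity of the third position.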


Gene clusters or families

In the human genome there are a number of related genes, found in clusters on the same chromosome or scattered on different chromosomes, that have similar functions or are switched on and off at different stages of life. Among the families of genes there are:

  • The α and β-globin gene clusters found on chromosomes 16 and 11, respectively

  • The ribosomal RNA, myosin and actin gene families

  • The major histocompatibility complex (MHC) also known as HLA on chromosome 6


The mitochondrial genome

Mitochondria are organelles found within eukaryotic cells. They are thought to be of prokaryotic origin, having become integrated into the cell through symbiosis in the course of evolution. A single cell contains many mitochondria, up to 1500 in a liver cell. Mitochondria multiply within the cell by division, and each mitochondrion has its own genetic material as well as ribosomes that are smaller than those found in the cytoplasm.

The mitochondrial genome is a circular molecule of 16.6 kb. It encodes 37 genes: 22 transfer RNAs, the 12S and 16S rRNAs, and 13 proteins, which include cytochrome c oxidase subunits, cytochrome b, ATPase subunits and NADH dehydrogenase subunits. Although mtDNA is double stranded, a small part of it appears to be triple stranded because of repeated synthesis of a short segment of the heavy strand. Unlike genes in the nuclear genome, the genes encoded on mtDNA do not contain introns, and both strands are transcribed and translated. There are also small variations in the genetic code, with some codons coding for different amino acids. On the other hand, some genes necessary for mitochondrial function are encoded in the nuclear DNA. These characteristics support the hypothesis that mitochondria originated as prokaryotic cells. It is now known that variations within the mitochondrial genome can also lead to some diseases in humans.

mtDNA is usually inherited through the maternal line (with some very rare exceptions), since the oocyte contains many copies whereas the sperm cell carries only a small number of mitochondria, located in its midpiece, which do not normally contribute to the embryo. Because mtDNA is inherited only from the mother and does not usually undergo recombination, differences between mtDNA sequences arise only through mutation. mtDNA analysis can therefore be used to study the origins of populations through the maternal line, and also for forensic purposes.
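The variations in the mitochondrial genetic code mentioned above can be made concrete by comparing two codon tables. The sketch below starts from the standard code and applies the four codon reassignments of the vertebrate mitochondrial code. These reassignments are not listed in the text and are included here as background knowledge; other groups of organisms have different mitochondrial variants.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' is a stop codon.
STANDARD_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AMINO_ACIDS)
}

# Reassignments in the vertebrate mitochondrial code (background knowledge, not from the text).
VERTEBRATE_MITO_CHANGES = {
    "UGA": "W",   # stop in the standard code, tryptophan in mitochondria
    "AUA": "M",   # isoleucine in the standard code, methionine in mitochondria
    "AGA": "*",   # arginine in the standard code, stop in mitochondria
    "AGG": "*",   # arginine in the standard code, stop in mitochondria
}
MITO_CODE = {**STANDARD_CODE, **VERTEBRATE_MITO_CHANGES}

for codon in sorted(VERTEBRATE_MITO_CHANGES):
    print(f"{codon}: {STANDARD_CODE[codon]} -> {MITO_CODE[codon]}")

mito_stop_codons = sorted(c for c, aa in MITO_CODE.items() if aa == "*")
print("Mitochondrial stop codons:", mito_stop_codons)
```

One practical consequence is that a mitochondrial gene read with the standard table would appear to stop early wherever UGA occurs, which is why sequence analysis software asks which code to use.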


