Human Genome Sequence and Its Unexpected "Small" Size?

   The Human Genome Project (HGP) was launched in 1990 with the goal of obtaining a highly accurate sequence of the vast majority of the euchromatic portion of the human genome. The initial work followed a two-pronged approach: (1) the mapping of the human and mouse genomes to allow the study of inherited disease and provide a crucial scaffold for genome assembly; and (2) the sequencing of organisms with smaller, simpler genomes to serve as a testbed for method development and assist in interpreting the human genome. With success along both paths, the sequencing of the human genome itself eventually became feasible. The International Human Genome Sequencing Consortium (IHGSC), an open collaboration involving twenty genome centers in six different countries, was formed to carry out this component of the HGP.
     The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium and Celera Genomics each reported a first draft sequence of the euchromatic portion of the human genome.
     In April 2003, the IHGSC published a complete draft of the sequence in the April 24 issue of the journal Nature. The date coincided with the 50th anniversary of Nature's publication of the landmark paper in which Nobel laureates James Watson and Francis Crick described DNA's double helix.


   Since then, the international collaboration has worked to convert these drafts into a genome sequence with high accuracy and nearly complete coverage. The IHGSC reported the result of this finishing process in Nature on October 21, 2004. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps not yet sequenced. It covers 99% of the euchromatic genome and has an error rate of about 1 nucleotide per 100,000 bases sequenced. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods.
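
   To put the quoted accuracy in perspective, the following minimal Python sketch works out how many erroneous bases the stated error rate would imply across the whole Build 35 sequence. It uses only the figures quoted above; the variable names are illustrative, and the calculation assumes errors are spread uniformly, which is a simplification.

```python
# Figures quoted in the text for Build 35; names are illustrative only.
SEQUENCED_BASES = 2_850_000_000   # 2.85 billion nucleotides
BASES_PER_ERROR = 100_000         # about 1 error per 100,000 bases
REMAINING_GAPS = 341              # gaps not yet sequenced

# Expected number of erroneous bases, assuming a uniform error rate.
expected_errors = SEQUENCED_BASES // BASES_PER_ERROR

# The same rate expressed as a percentage of bases called correctly.
percent_correct = 100 * (1 - 1 / BASES_PER_ERROR)

print(f"Expected erroneous bases: {expected_errors:,}")
print(f"Per-base accuracy: {percent_correct:.3f}%")
print(f"Remaining gaps: {REMAINING_GAPS}")
```

   Under these assumptions, the whole sequence would contain roughly 28,500 erroneous bases, a small number next to its 2.85 billion nucleotides.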
     The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome, including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000 to 25,000 protein-coding genes.

     Because some 100,000 proteins were thought to be involved in cell function, the human genome was also expected to contain about 100,000 genes. The first draft of the genome sequence was found to have about 35,000 genes. The most complete draft to date (Build 35) has at most about 25,000 genes.

                                                     Nature 431, 931 - 945 (21 October 2004); doi:10.1038/nature03001






       copyright - c.mallery - Oct.2004