June 3, 2003
Their conclusion, just 21,000, will surprise people who doubt that a human can be developed, operated and maintained with merely 10 percent more instructions than C. elegans, a microscopic roundworm known to possess 19,000 genes.
How did genome scientists paint themselves into such a corner?
It all began in a bar. Three years ago, as the DNA sequence of the human genome was nearing completion, biologists' estimates of the number of its genes ranged from 28,000 to 140,000. At the bar at the Cold Spring Harbor Laboratory one evening, Dr. Ewan Birney had the idea of opening a sweepstakes. He invited researchers to register their best estimates of the number of genes, with the winner, the entrant whose guess came closest to the final number, to be announced this year. Bets cost $1 in 2000, $5 in 2001 and $20 since last year.
Dr. Birney runs a genomic data bank called Ensembl at the European Bioinformatics Institute near Cambridge, England. Ensembl curators spend much time identifying genes and are authorities on the human gene count. Dr. Birney set 2003 as the end date because he thought that the annotators at Ensembl would finish their task by this year.
Though there is now a rough consensus that humans have about 30,000 genes, the number is far from solid. Dr. Francis S. Collins, director of the National Human Genome Research Institute in Bethesda, Md., said last week that the number might range from 25,000 to 30,000. But researchers using more reliable methods of gene prediction have started saying the number could be double that. Other biologists, cribbing a page from the astronomers' book of stratagems, are talking of "dark matter" genes, mystery objects that are surely out there but invisible to current methods of detection.
Dr. Birney at first decided to extend the sweepstakes at least five more years, with new bets still at $20.
"Apologies for my ridiculous hubris to think we'd have it nailed by 2003," he said in an e-mail message last month, adding the "smiley" symbol for a wry comment. "The sad fact is that we don't have the number with any sense of confidence," he added in a later message.
But on arriving here, Dr. Birney was persuaded to change his plan. Dr. David Stewart, an organizer of the meeting and the official bookie of the contest, pointed out that the rules specified a winner would be declared now, no loopholes. As it happened, Dr. Birney's recent estimate of 26,000 was announced on the first day of the conference by Dr. Jane Rogers of the Sanger Institute in England. It seemed a reasonable sort of number, within Dr. Collins's range of 25,000 to 30,000. And, this being a serious scientist-only kind of sweepstakes, almost no entrant had bet lower than 26,000.
Dr. Birney was urged to go with what he had and pay off the lowest bettor in each of the three yearly cohorts.
When he came to look at his best estimates of gene number, he decided that he was fully confident of far fewer than 26,000. Extrapolating from the number of genes found after a very careful search of human Chromosome 20, one of the shorter chromosomes, yields an estimate of 19,140, he told fellow scientists in a talk.
His computer prediction methods, based on the latest version of the human genome sequence, indicated 24,500. But 3,500 of those are probably pseudogenes, Dr. Birney said, referring to genes rendered inactive by mutation and en route to being shed from the genome.
Therefore, he announced 21,000 as the canonical number.
The strangely low figure reflects a clash of outlooks between biology's new class of computer jockeys like Dr. Birney, who are confident that computer programs can figure out all the cell's secrets, and experimentalists, who believe that seeing how the cell actually reads off its genome is the surest route to the truth.
One adherent of the experimentalist view, Dr. Gerald Rubin, is the maestro of the drosophila fruit fly genome. "Computational methods are just not up to the task in drosophila, and I suspect the same will be true in humans," he said. When a bioinformaticist protested that gene prediction had improved tremendously, Dr. Rubin said, without evident expectation of success, "You write a program, and I'll run it on drosophila."
The "dark matter" problem is making all gene estimates look a little uncertain. Dr. Michael Snyder of Yale recently reported that twice as much DNA as expected is transcribed from Chromosome 22, one of the best studied. Transcription, the transfer of the information encoded in a gene's sequence of DNA units into messenger RNA, is the cell's first step in activating one of its genes. The finding suggests that there could be up to twice as many genes as indicated by the best available prediction methods.
"The gene count will certainly go up from the 30,000 that people currently claim," Dr. Snyder said in an interview. "The message out there is that there is clearly a lot more coding information." Pressed for an estimate, he replied, "I'll guess total genes — over 40,000."
But given that computer biologists have declared the correct figure to be 21,000, who are the sweepstakes winners?
Half the pot, whose total the bookkeeper estimates at more than $1,200, goes to Dr. Paul Dear of the Medical Research Council in Britain, who bet 27,462 genes in 2000. The other half is split between the low bidder for 2001, Dr. Lee Rowen of the Institute for Systems Biology in Seattle, who guessed 25,947, and Olivier Jaillon of Genoscope in Evry, France, the 2002 low bidder with 26,500.
Dr. Dear was asked how he had predicted such a low number three years ago when numbers around 50,000 were popular. First, it was late at night, he said, and he had been drinking in a bar. Second, at that hour, people's behavior did not seem so different from that of fruit flies, then known to have 13,500 genes, implying that twice that number would be ample for humans. Third, his birth date, April 27, 1962, immediately suggested 27,462 as the correct number.