
What makes you you, and not somebody else? A groundbreaking international study seeks to uncover the genetic basis of individual difference, and to find genetic links to disease, by sequencing and comparing the genomes of thousands of individuals from around the world. In the sequel to one of the most astonishing scientific achievements of the past century, the 1000 Genomes Project has published its first result: a map of genetic variation in humans. The work is a technical tour de force involving the cataloging of 4.9 trillion letters of human DNA code, enough to fill 300,000 copies of War and Peace. The article is published in the journal Nature this week.
The 1000 Genomes Project is a collaboration among research groups in the US, UK, China and Germany to produce an extensive catalog of human genetic variation that will support future medical research studies. Its goal is to find most genetic variants, including SNPs and structural variants, together with their haplotype contexts, that have frequencies of at least 1 per cent in the populations studied. The $120m (£75m) project, involving hundreds of scientists in an international collaboration of universities, charities and companies, is using advanced gene-sequencing technology to map out the full diversity of human DNA.
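To make the 1 per cent frequency target concrete, here is a minimal sketch of how an allele frequency can be computed and compared against that threshold. The class, function names and counts are invented for illustration; they are not the project's data or its actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class VariantCounts:
    """Allele counts for one variant site in one sample (hypothetical data)."""
    variant_id: str
    alt_alleles: int    # copies of the non-reference allele observed
    total_alleles: int  # chromosomes sampled (two per person for autosomes)


def allele_frequency(v: VariantCounts) -> float:
    """Fraction of sampled chromosomes that carry the variant allele."""
    return v.alt_alleles / v.total_alleles


def common_variants(variants, threshold=0.01):
    """Return the IDs of variants at or above the frequency threshold."""
    return [v.variant_id for v in variants if allele_frequency(v) >= threshold]


# Made-up counts for 500 people (1,000 chromosomes); not project data.
sample = [
    VariantCounts("site_A", 230, 1000),  # 23 per cent
    VariantCounts("site_B", 10, 1000),   # exactly 1 per cent
    VariantCounts("site_C", 3, 1000),    # 0.3 per cent, below the target
]

kept = common_variants(sample)
print(kept)
```

In this toy sample, site_A and site_B meet the 1 per cent target and site_C does not.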
“The 1000 Genomes Project map fills in the gaps between the HapMap landmarks, helping researchers identify all candidate genes in a region associated with a disease,” said Lisa Brooks, Ph.D., program director for genetic variation at the National Human Genome Research Institute, a part of the National Institutes of Health. “Once a disease-associated region of the genome is identified, experimental studies must be done to identify which variants, genes, and regulatory elements cause the increased disease risk. With the new map, researchers can just look up all the candidate genes and almost all of the variants in the database, saving them many steps in finding the causes.”
The pilot phase consisted of three projects. The first sequenced, at low coverage, the whole genomes of 179 individuals from populations in West Africa, Europe, China and Japan.
The second project collected high-coverage sequences of two families, each consisting of a mother, father and child.
The third project sequenced only the protein-coding regions of the genomes of 697 individuals.
Results:
The pilot results reveal around 15 million single nucleotide polymorphisms (SNPs), many of them already known but many others previously unknown.
The pilot also found around one million short insertions or deletions and 20,000 structural variants, most of which were previously undescribed.
On average each person carries around 250 to 300 loss-of-function variants, including 50 to 100 that are associated with disease.
The study has so far proven fairly comprehensive, with over 95 per cent of known variants, such as SNPs, included in this data set.
The improved map produced some surprises. For example, the researchers discovered that, on average, each person carries between 250 and 300 genetic changes that would cause a gene to stop working normally, and that each person also carries between 50 and 100 genetic variations that had previously been associated with an inherited disease. No human carries a perfect set of genes. Fortunately, because each person carries at least two copies of every gene, individuals are likely to remain healthy even while carrying these defective genes, as long as the second copy works normally.
In the next phase of the 1,000 Genomes Project, 2,000 samples from 27 populations around the world will be studied over the next two years. David Altshuler of Massachusetts General Hospital, a co-author of the Nature paper, said that with successive phases and more sequences the catalog of human genetic variation, which currently contains 95% of the possible differences, will improve. “We see these numbers going to 98%. By the time the 1,000 Genome Project is done, each person who has their genome sequenced, greater than 95% – maybe even 98%-99% – of the variation in that person would already be in the database and could be referenced back. Around 1%-2% of the variation would be unique to that individual and not in that database.”
The Genome in Numbers:
3bn Number of DNA letters in the human genome (200 volumes the size of a telephone book, which has around 1,000 pages)
20,000-25,000 Number of genes in the genome (though not all scientists agree)
2000 Year the first draft of the human genome was announced to much fanfare at the Clinton White House
2003 Final draft completed to 99.99% accuracy
2,500 Number of people whose genomes the 1,000 Genomes Project hopes to sequence
15m Number of single-letter changes identified in the pilot phase
1m Number of small insertions and deletions identified in the pilot phase
4.9 trillion Number of letters of data sequenced by the 1,000 Genomes Project so far
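As a quick check on how these figures relate to one another, the arithmetic can be sketched as below. The variable names are illustrative. The first ratio compares raw sequenced data with the length of one genome; it is not a count of people, since sequencing typically reads the same positions more than once.

```python
# Figures taken from the list above.
GENOME_LETTERS = 3_000_000_000         # DNA letters in one human genome
SEQUENCED_LETTERS = 4_900_000_000_000  # letters of data sequenced so far
PHONE_BOOK_VOLUMES = 200               # volumes needed to print one genome
PAGES_PER_VOLUME = 1_000               # pages in each volume

# How many genome-lengths of raw data have been sequenced.
genome_equivalents = SEQUENCED_LETTERS / GENOME_LETTERS

# Letters each page would need to hold for the phone-book comparison to work.
letters_per_page = GENOME_LETTERS // (PHONE_BOOK_VOLUMES * PAGES_PER_VOLUME)

print(f"Genome-lengths of data sequenced: about {genome_equivalents:.0f}")
print(f"Letters per phone-book page: {letters_per_page}")
```

On these figures, the data sequenced so far amounts to roughly 1,600 genome-lengths, and the phone-book comparison implies about 15,000 letters per page.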
In the short film below, Dr Richard Durbin of the Wellcome Trust Sanger Institute, co-chair of the consortium, and Dr Chris Tyler-Smith describe the key findings and significance of the pilot phase of the 1000 Genomes Project.
1000 genomes project (Click to go to the video)
Courtesy: Lifescientist.com, guardian.co.uk, National Human Genome Research Institute (genome.gov) and The 1000 Genomes Project Consortium (http://www.1000genomes.org), sciencedaily.com
