A detailed analysis of chromosomes 2 and 4 has detected the largest "gene deserts" known in the human genome
Bethesda, Maryland -- A detailed analysis of chromosomes 2 and 4 has detected the largest "gene deserts" known in the human genome and uncovered more evidence that human chromosome 2 arose from the fusion of two ancestral ape chromosomes, researchers supported by the National Human Genome Research Institute (NHGRI), part of the National Institutes of Health (NIH), reported today.
In a study published in the April 7 issue of the journal "Nature", a multi-institution team led by Washington University School of Medicine in St. Louis described its analysis of the high-quality reference sequence of chromosomes 2 and 4.
Chromosome 4 has long been of interest to the medical community because it holds the genes for Huntington's disease, polycystic kidney disease, a form of muscular dystrophy and a variety of other inherited disorders. Chromosome 2 is noteworthy for being the second-largest human chromosome, trailing only chromosome 1 in size. It is also home to the gene with the longest known protein-coding sequence -- a 280,000-base-pair gene that codes for a muscle protein called titin, which is 33,000 amino acids long.
The new analysis confirmed the existence of 1,346 protein-coding genes on chromosome 2 and 796 protein-coding genes on chromosome 4.
As part of their examination of chromosome 4, the researchers found what are believed to be the largest "gene deserts" yet discovered in the human genome sequence. These regions of the genome are called gene deserts because they are devoid of any protein-coding genes. However, researchers suspect such regions are important to human biology because they have been conserved throughout the evolution of mammals and birds, and work is now underway to figure out their exact functions.
Humans have 23 pairs of chromosomes -- one fewer pair than chimpanzees, gorillas, orangutans and other great apes. For more than two decades, researchers have thought that human chromosome 2 was produced by the fusion of two mid-sized ape chromosomes, and in 2002 a Seattle group located the fusion site.
In the latest analysis, researchers searched the chromosome's DNA sequence for the relics of the center (centromere) of the ape chromosome that was inactivated upon fusion with the other ape chromosome. They identified a 36,000-base-pair stretch of DNA sequence that likely marks the precise location of the inactivated centromere. That tract is characterized by a type of DNA duplication, known as alpha satellite repeats, that is a hallmark of centromeres. In addition, the tract is flanked by an unusual abundance of another type of DNA duplication, called a segmental duplication.
"These data raise the possibility of a new tool for studying genome evolution. We may be able to find other chromosomes that have disappeared over the course of time by searching other mammals' DNA for similar patterns of duplication," said Richard K. Wilson, Ph.D., director of the Washington University School of Medicine's Genome Sequencing Center and senior author of the study.
In another intriguing finding, the researchers identified an mRNA transcript from a gene on chromosome 2 that may produce a protein unique to humans and chimpanzees. Scientists have tentative evidence that the gene may be used to make a protein in the brain and the testes. The team also identified "hypervariable" regions in which genes contain variations that may lead to the production of altered proteins unique to humans. The functions of these altered proteins are not known, and the researchers emphasized that their findings still require "cautious evaluation."
In October 2004, the International Human Genome Sequencing Consortium published its scientific description of the finished human genome sequence in "Nature". Detailed annotations and analyses have already been published for chromosomes 5, 6, 7, 9, 10, 13, 14, 16, 19, 20, 21, 22, X and Y.
Publications describing the remaining chromosomes are forthcoming.
The sequence of chromosomes 2 and 4, as well as the rest of the human genome sequence, can be accessed through the following public databases: GenBank (www.ncbi.nlm.nih.gov/Genbank) at NIH's National Center for Biotechnology Information (NCBI); the UCSC Genome Browser (www.genome.ucsc.edu) at the University of California at Santa Cruz; the Ensembl Genome Browser (www.ensembl.org) at the Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute; the DNA Data Bank of Japan (www.ddbj.nig.ac.jp); and EMBL-Bank, EMBL's Nucleotide Sequence Database (www.ebi.ac.uk/embl/index.html).