Technical Highlight - November 2014
Short description: An international effort has identified nearly ten million genes belonging to microbes hosted by diverse human guts.
MetaHIT project workflow to create a human gut gene catalog from three continents. 1
The microbes that live in our intestines are intimately tied to our health and represent a rich source of unexplored metabolic information. A typical gut community may possess an order of magnitude more genes than are encoded in the genome of its human host. New work by Wang, Bork and colleagues at the Metagenomics of the Human Intestinal Tract (MetaHIT) consortium greatly expands the list of known gut microbial genes in a single high-quality gene catalog, making it an excellent resource for gene function and structural studies.
With the goal of increasing the number and diversity of sampled populations, the MetaHIT consortium sequenced new Danish and Spanish samples and analyzed sequence data from their previous European samples, American samples from the Human Microbiome Project and Chinese samples from a diabetes study. They also extracted genes from 500 sequenced prokaryotic genomes to help identify genes from prevalent but low-abundance species in the samples.
By applying a standardized workflow to preprocess and assemble sequence reads, predict genes and then cluster them to remove redundancies, they produced the first integrated catalog from three continents. The catalog has fewer redundancies, less fragmentation and longer genes than previous collections; it represents nearly 1,300 gut metagenomes from over 1,000 individuals, amounting to 9.8 million non-redundant genes, a nearly threefold expansion over the older, non-integrated catalogs.
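The redundancy-removal step can be illustrated with a toy greedy clustering, sketched here under stated assumptions rather than as the consortium's actual pipeline. Genes are visited longest first, and each gene joins the first cluster whose representative contains most of its k-mers. The shared k-mer fraction is a simplified stand-in for alignment-based identity, and the function names, the k-mer size and the 0.9 threshold are all hypothetical choices made for illustration.

```python
def kmers(seq, k=8):
    """Return the set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cluster(genes, min_shared=0.9, k=8):
    """Group genes into non-redundant clusters (toy illustration).

    genes: dict mapping gene name -> nucleotide sequence.
    Returns a dict mapping each representative name -> list of member names.
    The shared k-mer fraction is an assumed proxy for sequence identity.
    """
    # Longest genes first, so each cluster is represented by its longest member.
    order = sorted(genes, key=lambda name: (-len(genes[name]), name))
    clusters = {}    # representative name -> member names
    rep_kmers = {}   # representative name -> k-mer set
    for name in order:
        query = kmers(genes[name], k)
        for rep in clusters:
            # Fraction of the shorter gene's k-mers found in the representative.
            if query and len(query & rep_kmers[rep]) / len(query) >= min_shared:
                clusters[rep].append(name)
                break
        else:
            # No sufficiently similar representative: start a new cluster.
            clusters[name] = [name]
            rep_kmers[name] = query
    return clusters


# Made-up sequences: a full-length gene, a fragment of it, and an unrelated gene.
full = "ATGGCTAGCTAGGATCCGATCGATCGTACGTAGCTAGCTAA"
example = {
    "gene_a": full,
    "gene_a_fragment": full[5:35],
    "gene_b": "ATGTTTCCCGGGAAATTTCCCGGGAAACCCTTTGGGTGA",
}
catalog = greedy_cluster(example)
print(sorted(catalog))  # names of the non-redundant representatives
```

In this sketch the fragment is absorbed into the cluster of the full-length gene, which mirrors why a clustered catalog ends up with less fragmentation and longer representative genes.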
The authors showed that this combination of samples improved sequence mapping quality and coverage of some rare species, and allowed detection of country- and individual-specific gut microbial signatures. Functional annotation using the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the evolutionary genealogy of genes: Non-supervised Orthologous Groups (eggNOG) databases identified nearly 7,000 and 36,500 orthologous groups, respectively, and suggested that coverage of prokaryotic functional capacity may be saturated in the catalog. The integrated gene catalog is freely available in the GigaScience Database at http://meta.genomics.cn.
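One common way to judge whether functional coverage is saturating is a rarefaction curve: samples are added in random order and the cumulative number of distinct orthologous groups is tracked. The sketch below is a minimal illustration of that idea, not the authors' analysis. The function name, the number of permutations and the toy sample data are assumptions.

```python
import random


def rarefaction_curve(samples, n_permutations=100, seed=0):
    """Average cumulative count of distinct groups as samples are added.

    samples: list of sets, each holding the orthologous-group IDs seen in
    one sample. Returns a list whose i-th value is the mean number of
    distinct groups observed after i + 1 randomly ordered samples.
    """
    rng = random.Random(seed)
    totals = [0] * len(samples)
    for _ in range(n_permutations):
        order = samples[:]
        rng.shuffle(order)
        seen = set()
        for i, sample in enumerate(order):
            seen |= sample          # add this sample's groups
            totals[i] += len(seen)  # cumulative distinct groups so far
    return [total / n_permutations for total in totals]


# Made-up samples in which most groups recur across individuals.
toy_samples = [
    {"OG1", "OG2", "OG3"},
    {"OG1", "OG2", "OG4"},
    {"OG2", "OG3", "OG4"},
    {"OG1", "OG3", "OG4"},
]
curve = rarefaction_curve(toy_samples)
print(curve)
```

A curve that flattens as more samples are added indicates that new samples contribute few previously unseen groups, which is the pattern a saturated catalog would be expected to show.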
1. J. Li et al. An integrated catalog of reference genes in the human gut microbiome. Nat Biotechnol. 32, 834-841 (2014). doi:10.1038/nbt.2942