Metagenomic analysis reveals a functional signature for b...

Anatomical confirmation of the cecum as a fermentation chamber

It is generally accepted that small mammalian herbivores have substantial cecal microbial fermentation[7]. We sought to verify whether this was the case in the flying squirrel. We examined 4 white-faced flying squirrels, each with a full gastrointestinal (GI) tract. Across the 4 squirrels, the length of the entire GI tract was 411 ± 35 cm (mean ± SD), approximately 10 times the body length (40 ± 3 cm). This GI tract to body length ratio was similar to those of other cecum-fermenting mammals[9], such as rabbits (ratio of 10) and lemurs (ratio of 13)[12]. The weight/length ratio including food (g/cm) was used as an indicator of the digesta-retaining capacity of the small intestine, cecum, and large intestine. The most salient feature was an extremely distended cecum, which contained nearly 50% of the gut contents by weight (Table 1). Moreover, the weight/length ratio for the cecum was 6–8 times greater than that of the small or large intestine.

Table 1

Mean ± SD anatomical features of 3 intestinal compartments of the white-faced flying squirrel (N = 4)

Phylogenetic profiles of cecal microbiota, based on 16S rRNA gene sequences

To characterize the bacterial community of the cecum, 16S rRNA gene libraries were constructed from 2 individuals (FS1 and FS2). After elimination of short, low-quality, and chimeric sequences, a total of 520 and 440 sequences were obtained for FS1 and FS2, respectively. Based on a 97% sequence identity threshold, the 2 libraries contained 173 (FS1) and 165 (FS2) phylotypes or OTUs (Operational Taxonomic Units), with an estimated species richness (Chao1) of 262 (FS1) and 293 (FS2) for the cecal microbiota (Additional file 1).
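
For readers unfamiliar with Chao1, the estimator extrapolates total richness from the number of OTUs seen exactly once (singletons) and exactly twice (doubletons). The text does not state which variant or software was used, so the sketch below assumes the commonly used bias-corrected form; the function name `chao1` and the OTU counts are hypothetical and are not data from this study.

```python
def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate (assumed variant).

    otu_counts: number of sequences assigned to each OTU.
    Formula: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 = singleton OTUs and F2 = doubleton OTUs.
    """
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)                      # observed OTUs
    f1 = sum(1 for c in counts if c == 1)    # OTUs seen once
    f2 = sum(1 for c in counts if c == 2)    # OTUs seen twice
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical library: 9 observed OTUs, 4 singletons, 2 doubletons.
toy_counts = [10, 5, 3, 2, 2, 1, 1, 1, 1]
print(chao1(toy_counts))
```

A library with many singletons relative to doubletons yields an estimate well above the observed OTU count, which is the pattern reported here (173 and 165 observed OTUs versus estimates of 262 and 293).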

The 16S rRNA sequences from the 2 flying squirrels were classified into 4 bacterial phyla, with <1% unclassified bacterial sequences (Table 2). Both microbial communities were heavily dominated by Firmicutes, with sequence abundances of 96.5 and 88.4%, respectively (average, 92.92%). The remainder of the sequences belonged to Actinobacteria (2.7 and 5.9%; average, 4.17%), Proteobacteria (0.6 and 1.6%; average, 1.04%), and Verrucomicrobia (0 and 3.2%; average, 1.46%).

Table 2

Comparison of the phylogenetic composition of bacteria

Data from the present study were compared to published data from fecal samples of 56 mammalian species[13], and from the fermentation chambers of lean laboratory mice (cecum)[14] and cattle (rumen)[15], using principal coordinates analysis (PCoA) of the UniFrac metric matrix (Figure 1). This analysis summarized variation among the sampled communities, based on phylogenetic differences in their bacterial members, and generated plots that separated individual communities. The flying squirrels were close to other herbivores, but did not cluster with the omnivorous Prevost's squirrel, although the two are phylogenetically related (Figure 1). As expected, mice were similar to other omnivores, whereas cattle were far from most foregut herbivores, as were banteng, a close relative of cattle, which may reflect domestication of these two ruminant species.
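
As an illustration of how PCoA turns a pairwise distance matrix into plot coordinates, the following is a minimal sketch of classical PCoA (double-centering followed by eigendecomposition, done here by power iteration to avoid external libraries). It is not the software used in the study, which the text does not name. The function names and the toy distance values are hypothetical, and the sketch assumes the leading eigenvalues are positive, which may not hold for every non-Euclidean distance matrix.

```python
import math


def gower_center(dist):
    """Double-center the squared distances: B = -1/2 * J * D^2 * J."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]           # row means (= column means)
    total = sum(row) / n                     # grand mean
    return [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
            for i in range(n)]


def top_eigenpair(mat, iters=500):
    """Dominant eigenpair of a symmetric matrix by power iteration."""
    n = len(mat)
    vec = [float(i + 1) for i in range(n)]   # arbitrary non-constant start
    for _ in range(iters):
        nxt = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm < 1e-12:                     # nothing left to extract
            return 0.0, [0.0] * n
        vec = [x / norm for x in nxt]
    mv = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
    return sum(v * m for v, m in zip(vec, mv)), vec   # Rayleigh quotient


def pcoa(dist, n_axes=2):
    """Return per-sample coordinates on up to n_axes principal coordinates."""
    mat = gower_center(dist)
    n = len(mat)
    axes = []
    for _ in range(n_axes):
        val, vec = top_eigenpair(mat)
        if val <= 1e-9:                      # no further positive axis
            break
        axes.append([math.sqrt(val) * x for x in vec])
        # Deflate so the next pass finds the next eigenpair.
        mat = [[mat[i][j] - val * vec[i] * vec[j] for j in range(n)]
               for i in range(n)]
    return [[axis[i] for axis in axes] for i in range(n)]


# Hypothetical UniFrac-like distances among 4 communities (not study data).
toy = [[0.0, 0.2, 0.7, 0.8],
       [0.2, 0.0, 0.6, 0.7],
       [0.7, 0.6, 0.0, 0.3],
       [0.8, 0.7, 0.3, 0.0]]
for name, xy in zip(["A", "B", "C", "D"], pcoa(toy)):
    print(name, [round(v, 3) for v in xy])
```

Communities that are close in the distance matrix end up close on the first axes, which is how the plot separates herbivores from omnivores.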

Figure 1


Relationships of gut bacterial communities using principal coordinates analysis (PCoA) of the UniFrac metric matrix. Data included sequences from fermentation chambers (flying squirrels, cattle and mice) and from mammalian fecal samples[13]. The scores for the first 2 dimensions (P1 and P2) are plotted. Data for the cattle and mice were derived from[15] and[14], respectively.

To gain more insight into fermentation chambers (functional counterparts to the flying squirrel’s cecum), we further compared our data to those from the mouse cecum[14] and the cattle rumen[15] (Table 2 and Additional file 2). A total of 11 bacterial phyla/groups were identified by 16S rRNA gene sequences obtained from the 3 host species (Table 2), among which the microbial communities differed in the proportions of microbial groups (P

Phylogenetic profile of microbiota based on fosmid end-sequences

Based on analysis of ~3 Mb of metagenomic sequences (from FS5), 5,012 open reading frames (ORFs) were predicted from the fosmid end-sequences and treated as gene tags (for further annotation). Up to 65% of the gene tags were classified into taxonomic ranks, based on matches in the SEED database. According to the annotation, the majority of the microbiota belonged to Bacteria (95.8%), with the remainder attributed to Archaea (3.6%), Eukaryota (0.5%), and Viruses (0.1%).

The annotation allowed an additional assessment of microbial diversity from a third individual (FS5) in the present study. For bacteria, the most abundant phylum was Firmicutes (61%), followed by Proteobacteria (12%), Verrucomicrobia (9%), Actinobacteria (8%), Bacteroidetes (3%), Chloroflexi (2%), Spirochaetes (1%), and Cyanobacteria (1%), with an additional 8 phyla/groups each constituting

One hundred and sixteen sequences were assigned to archaea, namely Euryarchaeota (92%) and Crenarchaeota (8%); the majority belonged to methanogens (e.g. Methanomicrobia, Methanobacteria, Methanococci, and Methanopyri). Sixteen eukaryotic sequences were also identified in the cecal microbiome, belonging to multicellular metazoans (possibly host DNA debris), Fungi, and Viridiplantae (likely dietary debris). Finally, 3 viral sequences were identified; all were assigned to Siphoviridae, a family of double-stranded DNA phages, which infect only bacteria.

Functional profile of the microbiota, based on fosmid end-sequences

The gene functions of the cecal microbiota were analyzed by similarity searches against several databases. Based on the MG-RAST results, 2,280 of the 5,012 gene tags were assigned to 1 of the SEED subsystems, in which genes are annotated according to biochemical pathways and their specific functional roles[16]. On the basis of SEED Subsystem Hierarchy 1, hits were attributed to 26 functional groups (Figure 2). The “clustering-based subsystems” group was the largest, representing ~13% of hits. Genes in this category are functionally coupled, since they usually cluster together in genomic regions, although their activities are poorly understood. The next 4 most prominent groups were involved in protein metabolism (10%), amino acids and derivatives (9%), carbohydrate metabolism (9%), and synthesis of cofactors / vitamins (7%). Collectively, these 5 dominant groups accounted for almost 50% of the hits.

Figure 2


Functional profile of the cecal microbiota of the flying squirrel according to the SEED Subsystem Hierarchy 1.

Protein metabolism was the second most prominent functional category and was dominated by the subcategory of biosynthesis (69%), followed by folding (16%), secretion (8%), and degradation (6%). Within the protein biosynthesis subcategory, most genes were involved in tRNA aminoacylation (adding an amino acid to tRNA). In addition, bacterial ribosomal proteins (both small and large subunits) were also abundant in this subcategory. In the protein folding subcategory, 36 chaperone proteins (e.g. GroEL, GroES, and DnaJ) were identified. Proteins involved in the secretory pathway, e.g. preprotein translocase subunits (SecG and SecY) and protein-export membrane proteins (SecD and SecF), were also detected.

The third most prominent functional category contained genes involved in production and recycling of amino acids. In addition to those involved in a variety of biosynthetic pathways, genes related to urea hydrolysis, including genes coding for the alpha, beta, and gamma subunits of urease, and for urease accessory protein UreD / UreG, were also detected.

The fourth most prominent category, carbohydrate metabolism, was dominated by central carbohydrate metabolism (35%), including enzymes involved in the TCA cycle, pyruvate metabolism, and 3 pathways for glucose degradation to pyruvate (namely the Embden-Meyerhof, Entner-Doudoroff, and pentose phosphate pathways). In addition, the subcategories of monosaccharides (23%) and di- and oligosaccharides (14%) were also abundant. Both sugar-degrading enzymes (e.g. beta-glucosidase, beta-galactosidase, beta-xylosidase, and endoglucanase) and sugar transporters (for xylose, ribose, fucose, allose, rhamnose, arabinose, lactose, and cellobiose) were detected.

Following the carbohydrate metabolism category was a group of genes involved in synthesis of cofactors / vitamins, of which folate biosynthesis (24%) was the most abundant subsystem. In addition, syntheses of tetrapyrroles, coenzyme A, and quinone cofactors were well-represented (19, 13, and 12% of the category, respectively). Genes associated with biosynthesis of B vitamins, such as thiamine (B1), riboflavin (B2), niacin (B3), pantothenic acid (B5), pyridoxine (B6), biotin (B7), folic acid (B9), and cobalamin (B12), were also detected.

Similar to results obtained from the SEED subsystems, functional categories identified using the COG (Clusters of Orthologous Groups of proteins; Additional file 3) and KEGG (Kyoto Encyclopedia of Genes and Genomes; Additional file 4) databases showed that genes involved in amino acid metabolism (7 and 13%, respectively), carbohydrate metabolism (4 and 13%), and metabolism of cofactors and vitamins (4 and 4%) were common within the cecal metagenome. Based on the SEED and KEGG databases, carbohydrate metabolism was as well represented as amino acid metabolism, whereas based on COG, amino acid metabolism was nearly twice as well represented as carbohydrate metabolism. In addition, although SEED and COG showed that genes involved in metabolism of cofactors and vitamins were more abundant than those in nucleotide metabolism, KEGG showed the opposite trend. Some of these apparent discrepancies may be due to differences among the 3 functional categorization schemes in how genes are named and assigned. According to the COG and KEGG classifications, genes involved in energy metabolism (7 and 6%) were abundant; in SEED, these genes were distributed among the subsystems of respiration (5%), sulfur metabolism (2%), and nitrogen metabolism (1%). Conversely, genes assigned to protein metabolism in SEED were categorized into information-processing groups, such as translation, in the COG and KEGG databases.

To focus on carbohydrate-active enzymes related to degradation of polysaccharides, sequences were annotated using information from the CAZy database[17]. Thirty-three polysaccharide-degrading enzymes belonging to 16 glycoside hydrolase (GH) families and 1 carbohydrate esterase (CE) family were detected in the fosmid end-sequence dataset; 7 carbohydrate-binding modules (CBMs) associated with detected GHs were also identified (Table 3). These enzymes included cellulases (GH3 and GH9) and hemicellulases (GH2, GH35, GH39, and CE4). The amino acid identity between the fosmid end-sequences and the reference sequences ranged from 30 to 91%.

Table 3

Candidate fosmid clones containing enzymes for plant polysaccharide degradation

Gene contents of fosmid inserts containing carbohydrate-associated genes

Sequences from 100 fosmid inserts were characterized to provide a survey of large contiguous genomic fragments. A total of 157 Mb of pyrosequencing paired-end reads was assembled into 125 scaffolds, comprising 3,042 kb of genomic sequence. The average scaffold length was 24 kb (range, 2 to 67 kb). From this dataset, 2 large scaffolds (both > 30 kb), each containing at least 3 carbohydrate-active enzymes, were chosen for further analysis. The assembled sequences for these 2 fosmid inserts were 31,463 bp (Scaffold_56) and 33,847 bp (Scaffold_90) and contained 28 and 32 ORFs, respectively (Figure 3). On average, 89% of the sequences were protein-coding regions. The functional and taxonomic assignments of these ORFs were annotated according to the NCBI-nr and the COG databases (Additional file 5).
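
The assembly figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the text; the helper names are hypothetical, and the mean ORF length is a rough approximation because the 89% coding fraction is reported as an average over both scaffolds rather than per scaffold.

```python
# Figures quoted in the text above.
TOTAL_ASSEMBLED_KB = 3042
N_SCAFFOLDS = 125
SCAFFOLDS = {"Scaffold_56": (31463, 28), "Scaffold_90": (33847, 32)}  # (bp, ORFs)
MEAN_CODING_FRACTION = 0.89  # average over both scaffolds (approximation)


def mean_scaffold_kb(total_kb, n_scaffolds):
    """Average scaffold length in kb."""
    return total_kb / n_scaffolds


def orfs_per_kb(length_bp, n_orfs):
    """Gene density expressed as ORFs per kb of sequence."""
    return n_orfs / (length_bp / 1000)


def approx_mean_orf_bp(length_bp, n_orfs, coding_fraction):
    """Rough mean ORF length, assuming the average coding fraction applies."""
    return length_bp * coding_fraction / n_orfs


print("mean scaffold length (kb):",
      round(mean_scaffold_kb(TOTAL_ASSEMBLED_KB, N_SCAFFOLDS), 1))
for name, (length_bp, n_orfs) in SCAFFOLDS.items():
    print(name,
          "ORFs/kb:", round(orfs_per_kb(length_bp, n_orfs), 2),
          "approx. mean ORF (bp):",
          round(approx_mean_orf_bp(length_bp, n_orfs, MEAN_CODING_FRACTION)))
```

Roughly one ORF per kb with close to 90% coding sequence is the kind of gene density typically seen in bacterial genomes, which fits the bacterial origin inferred for these fragments.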

Figure 3


Gene structures of 2 fosmid inserts: Scaffold_56 (GenBank: JQ335997) and Scaffold_90 (GenBank: JQ335998). The ORFs are colored and labeled according to the COG functional categories as C (energy production and conversion), E (amino acid transport and metabolism), G (carbohydrate transport and metabolism), J (translation, ribosomal structure, and biogenesis), K (transcription), L (replication, recombination, and repair), O (posttranslational modification, protein turnover, chaperones), R (general function prediction only), S (function unknown), T (signal transduction mechanisms), and V (defense mechanisms). Further details of the putative function for each ORF are presented in Additional file 5.

Based on taxonomic assignments, these 2 genomic fragments were of bacterial origin and were likely derived from Firmicutes species, since approximately 90% of the ORFs were assigned to this phylum (Additional file 5). Of the 60 ORFs in the 2 scaffolds, 33 had ≤ 60% identity with any known gene, whereas only 9 had ≥ 80% identity. We inferred that Scaffold_56 and Scaffold_90 represented segments of hitherto uncharacterized bacterial genomes. Based on the COG functional categories (Figure 3 and Additional file 5), 12, 8, and 7 ORFs were classified into the G (carbohydrate transport and metabolism), L (replication, recombination, and repair), and K (transcription) categories, respectively, with other categories containing ≤ 3 ORFs each.

With regard to carbohydrate-active enzymes, 6 putative GHs were encoded by ORF-7, ORF-11, and ORF-12 of Scaffold_56 and by ORF-9 and ORFs 28–30 of Scaffold_90 (Figure 3 and Additional file 5). With the exception of ORF-12 in Scaffold_56, which coded for a GH2 enzyme, all of these ORFs coded for members of the GH3 family. The identified GH2 contained a catalytic domain (PF02836) and a sugar-binding domain (PF02837) with potential activities as a beta-galactosidase, beta-mannosidase, or beta-glucuronidase. ORF-28 and ORF-29 in Scaffold_90 coded for polypeptides homologous to the C-terminal domain (PF01915) and the N-terminal domain (PF00933) of a GH3 enzyme, respectively, whereas ORF-7 and ORF-11 in Scaffold_56 and ORF-9 and ORF-30 in Scaffold_90 each coded for both the N-terminal and C-terminal domains of GH3 enzymes with known activities, e.g. beta-glucosidase and beta-xylosidase.

The protein sequences of the GHs and their homologs from databases were used to construct a gene dendrogram (Figure 4). The GH2 sequences were located at the root and were separated from the GH3 sequences. Three GH3 ORFs (ORF-9 in Scaffold_90, and ORF-7 and ORF-11 in Scaffold_56) clustered with homologs from various fibrolytic bacteria. The other 2 GH3 enzymes (encoded by ORFs 28–29 and ORF-30, respectively, in Scaffold_90) were identified as Bgl3D and Bgl3E (both are beta-glucosidases), because they clustered with Bgl3D and Bgl3E of Butyrivibrio proteoclasticus B316 and Ruminococcaceae bacterium D16. In addition, both had homologs in Marvinbryantia formatexigens DSM14469 and Ruminococcus gnavus ATCC29149. Notably, Bgl3D and Bgl3E in the reference genomes were encoded by 2 adjacent genes, bgl3D and bgl3E, just as our 2 GH3 enzymes were encoded by adjoining ORFs.

Figure 4


Distance dendrogram of glycoside hydrolases. Data included the deduced amino acid sequences of 6 GHs in Scaffold_56 and Scaffold_90 and their homologs from databases. The tree was constructed by the neighbor-joining method with 1,000 bootstrap replications using MEGA 5 software. Numbers near nodes indicate bootstrap values.

Other identified carbohydrate-associated genes included those coding for 3 sugar transporters (ORF-9 and ORF-10 in Scaffold_56, and ORF-10 in Scaffold_90), a sugar isomerase (ORF-13 in Scaffold_56), and a sugar kinase (ORF-14 in Scaffold_56) (Figure 3 and Additional file 5). All 3 sugar transporters were sugar-cation symporters, which catalyze the uptake of simple sugars, including galactosides, pentosides, and hexuronides, in conjunction with a monovalent cation (H+ or Na+). According to the BLAST results, the isomerase and kinase were probably associated with utilization of L-arabinose and/or D-xylose, and participated in pentose and glucuronate interconversions. Furthermore, 5 genes that encoded transcriptional regulators (ORF-8 of Scaffold_56 and ORF-8, ORF-11, ORF-26, and ORF-32 of Scaffold_90) may be involved in regulating the expression of genes associated with carbohydrate utilization, given their proximity to carbohydrate metabolism genes.
