PacBio But Not Illumina Technology Can Achieve Fast, Accurate and Complete Closure of the High GC, Complex Burkholderia pseudomallei Two-Chromosome Genome
Authors:
Jade L. L. Teng, Man Lung Yeung, Elaine Chan, Lilong Jia, Chi Ho Lin, Yi Huang, Herman Tse, Samson S. Y. Wong, Pak Chung Sham, Susanna K. P. Lau, Patrick C. Y. Woo
Author affiliations:
- Department of Microbiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong
- State Key Laboratory of Emerging Infectious Diseases, Department of Microbiology, The University of Hong Kong, Hong Kong
- Research Centre of Infection and Immunology, The University of Hong Kong, Hong Kong
- Carol Yu Centre for Infection, The University of Hong Kong, Hong Kong
Keywords: Complete, Genome, PacBio RS II, P6-C4, Burkholderia pseudomallei
Abstract: Although PacBio third-generation sequencers have improved the read lengths of genome sequencing, which facilitates the assembly of complete genomes, no study has reported success in using PacBio data alone to completely sequence a two-chromosome bacterial genome from a single library in a single run. Previous studies using earlier versions of sequencing chemistries have at most been able to finish bacterial genomes containing only one chromosome with de novo assembly. In this study, we compared the robustness of PacBio RS II, using one SMRT cell and the latest P6-C4 chemistry, with Illumina HiSeq 1500 in sequencing the genome of Burkholderia pseudomallei, a bacterium which contains two large circular chromosomes, very high G+C content of 68–69%, highly repetitive regions and substantial genomic diversity, and represents one of the largest and most complex bacterial genomes sequenced, using a reference genome generated by hybrid assembly using PacBio and Illumina datasets with subsequent manual validation. Results showed that PacBio data with de novo assembly, but not Illumina data, was able to completely sequence the B. pseudomallei genome without any gaps or mis-assemblies. The two large contigs of the PacBio assembly aligned unambiguously to the reference genome, sharing >99.9% nucleotide identities. Conversely, Illumina data assembled using three different assemblers resulted in fragmented assemblies (201–366 contigs), sharing only 92.2–100% and 92.0–100% nucleotide identities to chromosomes I and II reference sequences, respectively, with no indication that the B. pseudomallei genome consisted of two chromosomes with four copies of ribosomal operons. Among all assemblies, the PacBio assembly recovered the highest number of core and virulence proteins, and housekeeping genes based on whole-genome multilocus sequence typing (wgMLST). Most notably, assembly solely based on PacBio outperformed even hybrid assembly using both PacBio and Illumina datasets.
The hybrid approach generated only 74 contigs, while the PacBio data alone with de novo assembly achieved complete closure of the two-chromosome B. pseudomallei genome without additional costly bench work or further sequencing. PacBio RS II using P6-C4 chemistry is highly robust and cost-effective and should be the platform of choice in sequencing bacterial genomes, particularly those that are well known to be difficult to sequence.