Nanopore and Illumina Sequencing Recover Distinctly Different Viral Populations from Human Gut Samples
Ryan Cook 1*, Andrea Telatin 1, Shen-Yuan Hsieh 1, Fiona Newberry 2, Mohammad A. Tariq 3, Simon Carding 1,4, Evelien Adriaenssens 1
1 Quadram Institute Bioscience, Norwich, NR4 7UQ, UK
2 Department of Biosciences, Nottingham Trent University, Nottingham, NG11 8NS, UK
3 Faculty of Health and Life Sciences, University of Northumbria, Newcastle upon Tyne, NE1 8ST, UK
4 Norwich Medical School, University of East Anglia, Norwich, NR4 7TJ, UK
Background: The study of uncultivated viruses through viromics has shaped our understanding of global viral diversity. Long-read sequencing approaches, such as Oxford Nanopore Technologies (ONT) and PacBio, have been widely adopted in bacterial metagenomics, increasing the completeness and quality of metagenome-assembled genomes. However, there are few examples of long-read sequencing being used for viromics.
Methods: We sequenced viromes from three human faecal samples using an Illumina HiSeq and an ONT MinION, and tested several assembler-read combinations, including binning approaches. To assess the recovery of viral genomes, we processed the assemblies with geNomad and CheckV. To estimate viral genera, predicted viruses that were ≥50% complete were processed using vConTACT2 alongside INPHARED.
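The ≥50% completeness cut-off described above can be illustrated with a minimal sketch. It assumes a tab-separated completeness table with `contig_id` and `completeness` columns, as in a CheckV-style quality summary; the function name, the example rows, and the handling of non-numeric values such as `NA` are illustrative assumptions, not the authors' pipeline.

```python
import csv
import io


def filter_by_completeness(tsv_text, min_completeness=50.0):
    """Return contig IDs whose completeness is at or above the threshold.

    Assumes (hypothetically) a tab-separated table with 'contig_id' and
    'completeness' columns. Rows with a missing or non-numeric
    completeness value (e.g. 'NA') are skipped.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        try:
            completeness = float(row.get("completeness"))
        except (TypeError, ValueError):
            continue  # no usable completeness estimate for this contig
        if completeness >= min_completeness:
            kept.append(row["contig_id"])
    return kept


# Made-up rows for illustration only; these are not data from the study.
example = (
    "contig_id\tcompleteness\n"
    "c1\t75.2\n"
    "c2\tNA\n"
    "c3\t50.0\n"
    "c4\t12.5\n"
)
selected = filter_by_completeness(example)
```

The threshold is inclusive here, matching the "≥50%" wording; contigs without a completeness estimate are excluded rather than treated as zero.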
Results: The ONT assemblies recovered more genomes with ≥50% completeness, more predicted species, and more predicted genera than Illumina assemblies, although this varied with the assembler used. However, Illumina assemblies recovered more fully resolved genomes than any ONT assembly, and this number was increased by using Phables. Furthermore, ONT assemblies had a higher frequency of genomes that may contain chimeras and/or duplications. To determine the effect of polishing ONT assemblies with Illumina reads, we examined the length of predicted open reading frames (ORFs). Illumina assemblies had a mean ORF length of 142 amino acids, versus 127 for ONT assemblies, although the latter increased to 133 after polishing with Illumina reads.
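Mean ORF length serves here as a proxy for assembly fidelity; a common interpretation, stated as an assumption rather than taken from the text above, is that uncorrected insertion/deletion errors introduce frameshifts that truncate predicted ORFs. The sketch below shows one way such a mean could be computed from a protein FASTA of predicted ORFs. The function name and the example sequences are hypothetical, and a trailing `*` is assumed to mark a stop codon and is not counted as a residue.

```python
def mean_orf_length(fasta_text):
    """Mean protein length (amino acids) across records in FASTA text.

    Assumes each record is one predicted ORF translated to protein.
    A trailing '*' (stop symbol emitted by some gene callers) is ignored.
    Returns 0.0 when the input holds no records.
    """
    lengths = []
    current = None  # residues counted so far for the open record
    for raw_line in fasta_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line.rstrip("*"))
    if current is not None:
        lengths.append(current)
    return sum(lengths) / len(lengths) if lengths else 0.0


# Made-up sequences for illustration only; these are not data from the study.
example_fasta = ">orf_a\nMKT*\n>orf_b\nMKTAYIAK\nQR\n"
mean_length = mean_orf_length(example_fasta)
```

Comparing this value across an Illumina assembly, an unpolished ONT assembly, and a polished ONT assembly would mirror the comparison reported above, under the stated assumptions.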
Conclusions: The use of hybrid sequencing approaches aids the recovery of viral genomes from natural samples, although ONT assemblies should be treated with caution as they may contain duplications and chimeras. Illumina assemblies remain the gold standard for fidelity, and polishing ONT assemblies with Illumina reads reduces the rate of putative assembly errors. Furthermore, bioinformatic approaches such as binning may aid the recovery of complete viral genomes from Illumina assemblies.