Evergreen 2023
phages machine learning genome sequencing data bioinformatics

Machine learning based phage discovery in sequencing data

Abstract ID: 45-FR

Anastasiya Gæde *, Thomas Sicheritz-Pontén

  1. University of Copenhagen, GLOBE institute, Section for Hologenomics
  2. University of Copenhagen, GLOBE institute, Section for Hologenomics

Anastasiya Gæde, nastyashen06@gmail.com

Phages are present in every living environment and play a crucial role in steering microbial population dynamics. They are important entities in our ecosystem, yet their true diversity remains largely unknown. Many phages are still undiscovered, and discovering novel phages is challenging, time-consuming, and expensive. To expedite this process, we are developing an in silico tool that can accurately and rapidly recognize phage genomes in a reference- and host-independent manner.

We have gathered all publicly available phage genomes and split each genome into fragments several genes long to train our machine learning model. We generated 250 genomic features to describe these fragments. Our ultimate goal is to create an easy-to-use, fast, and efficient phage prediction tool. Such a tool will enable reference-free phage discovery in sequencing data, making the process more accessible.

Additionally, by extracting the most informative features selected during training, we can gain a better understanding of the distinctive genomic characteristics of phages.
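
The fragmentation and feature-generation steps described in the abstract might look roughly like the following minimal Python sketch. It is an illustration under stated assumptions, not the authors' pipeline: fixed-length nucleotide windows stand in for the gene-based fragments, and GC content plus dinucleotide frequencies stand in for the 250 genomic features, which the abstract does not list. All function names and the `fragment_len` parameter are hypothetical.

```python
from itertools import product

# All 16 possible dinucleotides, in a fixed order so feature vectors line up.
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def fragment_genome(sequence, fragment_len=5000, step=None):
    """Split a genome into fixed-length windows.

    Assumption: fixed nucleotide windows are used here as a simple proxy
    for fragments defined by gene boundaries. A trailing piece shorter
    than fragment_len is dropped.
    """
    step = step or fragment_len
    sequence = sequence.upper()
    return [
        sequence[start:start + fragment_len]
        for start in range(0, len(sequence) - fragment_len + 1, step)
    ]


def gc_content(fragment):
    """Fraction of G and C bases in the fragment."""
    if not fragment:
        return 0.0
    return (fragment.count("G") + fragment.count("C")) / len(fragment)


def dinucleotide_frequencies(fragment):
    """Relative frequency of each dinucleotide, skipping ambiguous bases."""
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    total = 0
    for i in range(len(fragment) - 1):
        pair = fragment[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DINUCLEOTIDES]


def featurize(fragment):
    """Build one feature vector (17 illustrative features) per fragment."""
    return [gc_content(fragment)] + dinucleotide_frequencies(fragment)


# Toy input: a made-up 120-nucleotide sequence, not real phage data.
toy_genome = "ATGCGC" * 20
fragments = fragment_genome(toy_genome, fragment_len=50)
feature_matrix = [featurize(fragment) for fragment in fragments]
```

Each row of `feature_matrix` describes one fragment and could be passed, together with phage and non-phage labels, to any standard classifier. A model that reports feature importances would then support the kind of feature inspection mentioned at the end of the abstract.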