Phage genomes - taming the unrecognisable, uncategorisable and uncomparable.
Thomas Sicheritz-Pontén 1*, Martha Clokie 2
1 Globe Institute, University of Copenhagen, Denmark
2 Leicester Centre for Phage Research, UK
Phages were first isolated and used as therapeutics around a century ago. They were used without any genetic knowledge and were ultimately abandoned because of their complexity and because antibiotics were simpler to develop. That same complexity now needs to be channelled into medicine to produce much-needed antimicrobials against antibiotic-resistant bacteria. However, using phages therapeutically is challenging on many levels. Unlike a century ago, phage research in the 2020s can exploit the wealth of genetic information encoded in phage genomes. Because phage genomes are remarkably diverse, with hardly any conserved sequences, they are highly challenging to analyse with traditional bioinformatics tools, which compare genomes and genes based on DNA or protein sequence similarity. This high genetic variability makes it difficult to annotate phage genomes accurately and to predict the functions of their encoded proteins: the majority of phage genes have no homologs in existing databases, which complicates the understanding of phage biology. The vast genetic diversity also hinders genome-wide comparisons, as typically less than 25% of the genes within an individual phage genome show any sequence similarity, and generally only within closely related groups.
To address these issues, we are developing new methods based on three approaches: feature-based machine learning (building on the observation that proteins with similar functions can share features despite being far apart in sequence space), ecological analysis (building on our observation that predicting transcriptional take-over strategies allows the identification of ‘Phage functional types’) and social network graph theory (building on the observation that genomic networks display patterns of interactions that are similar between phages targeting different bacterial hosts). These methods will aid clinical, agricultural, and industrial phage therapy by identifying similar phages and predicting content similarity without relying on sequence similarity.
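
As a purely illustrative sketch of the feature-based idea, the Python snippet below compares two proteins by a single, very simple feature, amino-acid composition, rather than by alignment. The choice of feature, the cosine measure, and the function names (composition_features, cosine_similarity, feature_similarity) are assumptions made for this example; they are not the features or models used in the work described above.

```python
from collections import Counter
from math import sqrt

# The 20 standard amino acids; any other symbol (e.g. X, *) is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(protein):
    """Return a 20-element vector of amino-acid frequencies (hypothetical toy feature)."""
    counts = Counter(protein.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def feature_similarity(protein_a, protein_b):
    """Alignment-free similarity of two protein sequences based on composition only."""
    return cosine_similarity(
        composition_features(protein_a),
        composition_features(protein_b),
    )
```

Because the score depends only on composition, two sequences with the same residues in a different order score as identical, which is the sense in which such a comparison does not rely on sequence similarity. A realistic feature set would need to be far richer than this toy example.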