Comprehensive Strain-level Analysis of Metagenomic Data

Principal Investigators: 

Prof Guanrong CHEN
Chair Professor, Department of Electrical Engineering

Prof Dirk PFEIFFER
Director, Centre of Applied One Health Research and Policy Advice;
Chair Professor, Department of Infectious Diseases and Public Health

Dr Yanni SUN
Associate Professor, Department of Electrical Engineering

Dr Patrick LEE
Associate Dean and Associate Professor, School of Energy and Environment

Dr Qingpeng ZHANG
Assistant Professor, School of Data Science


Project Period: 15 August 2020 – 14 August 2022

Most of today’s metagenomic composition and functional analyses focus on only one type of microbe, particularly bacteria, and ignore the interactions between viruses and bacteria. Most studies also stop at the species level because of the computational challenges of higher-resolution analysis. However, strain-level analysis is important for a deeper understanding of the functions and behaviours of the underlying microbial community.

The sequence dissimilarity between strains can be within a few percent, making them computationally challenging to distinguish. The challenge grows in metagenomics, where the sequences of all strains are mixed together, so separating them into their respective genomes is not trivial. The goal of this proposal is to develop a suite of methods and tools that provides an end-to-end solution for comprehensive strain-level analysis of metagenomic data.

The project aims to achieve the following research objectives:

  1. Conducting hierarchical taxonomic classification for short reads using deep learning models and transfer learning.
  2. Identifying marker genes for species-level read classification using co-clustering.
  3. Constructing genome-scale strains using a novel contig binning algorithm.
  4. Predicting short-term microbial community evolution using both metagenomic and metatranscriptomic data.

Publications

  1. CHERRY: a Computational metHod for accuratE pRediction of virus–pRokarYotic interactions using a graph encoder–decoder model
    Shang, J. & Sun, Y., Sep 2022, In: Briefings in Bioinformatics. 23, 5, bbac182.
  2. AnnoSINE: a short interspersed nuclear elements annotation tool for plant genomes
    Li, Y., Jiang, N. & Sun, Y., Feb 2022, In: Plant Physiology. 188, 2, p. 955-970.
  3. Predicting the hosts of prokaryotic viruses using GCN-based semi-supervised learning
    Shang, J. & Sun, Y., Nov 2021, In: BMC Biology. 19, 1, 250.
  4. Metagenomic insights into the microbial communities of inert and oligotrophic outdoor pier surfaces of a coastal city
    Tong, X., Leung, M. H. Y., Shen, Z., Lee, J. Y. Y., Mason, C. E. & Lee, P. K. H., Nov 2021, In: Microbiome. 9, 213.
  5. Bacteriophage classification for assembled contigs using Graph Convolutional Network
    Shang, J., Jiang, J. & Sun, Y., Jul 2021, In: ISMB/ECCB 2021 Proceedings. Oxford University Press, p. i25-i33 9 p. (Bioinformatics; vol. 37, no. Suppl. 1).
  6. Improving protein domain classification for third-generation sequencing reads using deep learning
    Du, N., Shang, J. & Sun, Y., Apr 2021, In: BMC Genomics. 22, 251.
  7. Diurnal variation in the human skin microbiome affects accuracy of forensic microbiome matching
    Wilkins, D., Tong, X., Leung, M. H. Y., Mason, C. E. & Lee, P. K. H., Jan 2021, In: Microbiome. 9, 129.