Decoding oxygen preference: Machine learning discovers functional genes in Bacteria

Genomics, Volume 117, Issue 5, September 2025, 111095

Highlights

Random Forest model predicts bacterial oxygen preference with >90 % accuracy.

Genomic features pinpointed key protein domains for oxygen adaptation.

Overexpression of model-identified genes validated their role in aerobic growth.

Model applied to rumen metagenomes, revealing anaerobic dominance.

Abstract

Predicting bacterial oxygen preference and identifying the genes associated with it are critical tasks in microbiology. This study developed a machine learning model that uses genomic features to predict bacterial oxygen preference and to discover potential functional genes. Trained on a dataset of 1813 bacterial genomes, a Random Forest model achieved 90.62 % accuracy in predicting oxygen preference, outperforming prior methods. Feature analysis pinpointed key protein domains and candidate genes. Experimental overexpression of model-identified genes (encoding SOD, SAM radical enzyme, GCV-T, and FDH domains) in Escherichia coli enhanced growth under aerobic conditions, validating their role in oxygen adaptation. Applying the model to rumen metagenomes revealed a predominantly anaerobic community. This work establishes machine learning as an effective approach to predicting bacterial oxygen preference and identifying functional genes, and it offers a new tool for understanding bacterial oxygen adaptation mechanisms, discovering key functional genes, and efficiently exploring uncultured microbial resources.
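The general idea of classifying genomes from protein-domain content can be illustrated with a minimal sketch. This is not the authors' pipeline: the domain names, the two-class labels, the toy data, and every function name below are hypothetical. The forest here is built from depth-one trees (stumps) using only the Python standard library, which is a simplified stand-in for a full Random Forest implementation.

```python
import random
from collections import Counter

# Hypothetical toy data: each genome is (set of protein-domain names, label).
# Domain names and labels are invented and deliberately clean; they are not
# taken from the study's dataset.
GENOMES = (
    [(frozenset({"dom_A1", "dom_A2", "dom_shared"}), "aerobe")] * 4
    + [(frozenset({"dom_N1", "dom_N2", "dom_shared"}), "anaerobe")] * 4
)


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels, default):
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(samples, domains):
    """Choose the domain whose presence/absence split has the lowest weighted Gini."""
    best = None
    for d in domains:
        present = [y for x, y in samples if d in x]
        absent = [y for x, y in samples if d not in x]
        score = (len(present) * gini(present) + len(absent) * gini(absent)) / len(samples)
        if best is None or score < best[0]:
            best = (score, d, present, absent)
    _, d, present, absent = best
    fallback = majority([y for _, y in samples], None)
    # A stump is (domain, label if domain present, label if domain absent).
    return d, majority(present, fallback), majority(absent, fallback)


def fit_forest(samples, n_trees=25, n_features=2, seed=0):
    """Bagging plus a random feature subset per tree, as in a random forest."""
    rng = random.Random(seed)
    all_domains = sorted({d for x, _ in samples for d in x})
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(samples) for _ in samples]  # sample with replacement
        features = rng.sample(all_domains, n_features)      # random domain subset
        forest.append(fit_stump(bootstrap, features))
    return forest


def predict(forest, genome_domains):
    """Majority vote over all stumps for one genome's domain set."""
    votes = Counter(
        when_present if d in genome_domains else when_absent
        for d, when_present, when_absent in forest
    )
    return votes.most_common(1)[0][0]


def split_counts(forest):
    """Crude importance proxy: how often each domain was chosen as a split."""
    return Counter(d for d, _, _ in forest)


if __name__ == "__main__":
    forest = fit_forest(GENOMES)
    print(predict(forest, {"dom_A1", "dom_A2", "dom_shared"}))
    print(split_counts(forest).most_common())
```

Counting how often a domain is selected as a split is only a rough analogue of the feature analysis described in the abstract. With real data one would normally use a library Random Forest with deeper trees and evaluate accuracy on held-out genomes rather than on the training set.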

Keywords

Machine learning

Bacterial oxygen requirement

Protein domain

Gene function

Application

© 2025 The Authors. Published by Elsevier Inc.
