[Bioinformatics] Environmental Impacts of Machine Learning Applications in Protein Science

Loïc Lannelongue1,2,3,4 and Michael Inouye1,2,3,4,5,6

1 Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB2 0SR, United Kingdom
2 British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB2 0SR, United Kingdom
3 Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge CB2 0BB, United Kingdom
4 Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, United Kingdom
5 Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
6 British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge CB2 0BB, United Kingdom

Correspondence: ll582@medschl.cam.ac.uk

Computing tools and machine learning models play an increasingly important role in biology and are now an essential part of discoveries in protein science. The growing energy needs of modern algorithms have raised concerns in the computational science community in light of the climate emergency. In this work, we summarize the different ways in which protein science can negatively impact the environment and we present the carbon footprint of some popular protein algorithms: molecular simulations, inference of protein–protein interactions, and protein structure prediction. We show that large deep learning models such as AlphaFold and ESMFold can have carbon footprints reaching over 100 tonnes of CO2e in some cases. The magnitude of these impacts highlights the importance of monitoring and mitigating them, and we list actions scientists can take to achieve more sustainable protein computational science.
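Footprints such as "over 100 tonnes of CO2e" come from converting compute usage into energy and then into greenhouse gas emissions. The sketch below illustrates that arithmetic under stated assumptions. It uses a simple model in the spirit of the Green Algorithms calculator: energy = runtime × (processing power × usage + memory power) × PUE, then carbon = energy × grid carbon intensity. It is not the authors' exact method. The function names, the per-GB memory power constant, and every input value (runtime, GPU wattage, PUE, carbon intensity) are hypothetical and chosen for illustration only.

```python
"""Back-of-the-envelope carbon footprint of a compute job.

Hypothetical sketch: energy = runtime x power draw x PUE,
carbon = energy x grid carbon intensity. All numbers are illustrative assumptions.
"""

# Assumed power draw of memory per GB (W/GB); an illustrative default, not from the paper.
MEMORY_POWER_W_PER_GB = 0.3725


def energy_kwh(runtime_h: float, n_units: int, unit_power_w: float,
               usage: float, memory_gb: float, pue: float) -> float:
    """Energy (kWh) drawn by a job, including data-centre overhead via PUE.

    n_units / unit_power_w can describe either CPU cores or GPUs (assumption).
    usage is the average utilisation of those units (0 to 1).
    """
    power_w = n_units * unit_power_w * usage + memory_gb * MEMORY_POWER_W_PER_GB
    return runtime_h * power_w * pue / 1000.0  # Wh -> kWh


def carbon_kg_co2e(energy: float, carbon_intensity_g_per_kwh: float) -> float:
    """Convert energy (kWh) to kg CO2e using a grid carbon intensity (gCO2e/kWh)."""
    return energy * carbon_intensity_g_per_kwh / 1000.0  # g -> kg


if __name__ == "__main__":
    # Hypothetical job: 1,000 hours on one 300 W GPU with 64 GB of memory,
    # a PUE of 1.67 and a grid intensity of 475 gCO2e/kWh (all assumed values).
    e = energy_kwh(runtime_h=1000, n_units=1, unit_power_w=300,
                   usage=1.0, memory_gb=64, pue=1.67)
    c = carbon_kg_co2e(e, carbon_intensity_g_per_kwh=475)
    print(f"{e:.1f} kWh, {c:.1f} kg CO2e")
```

Because the footprint scales linearly with runtime, hardware power, PUE and carbon intensity, footprints in the tonnes range correspond to very long cumulative runtimes, large hardware allocations, or carbon-intensive electricity grids.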
