Similarities and discrepancies between carbon, nitrogen, and sulfur cycling genes and model predictions in the Chesapeake Bay
As the volume of hypoxic waters grows around the globe, numerical models are increasingly used to understand the drivers of hypoxia and guide decisions that will mitigate its impacts. These models represent microbial processes, such as photosynthesis and decomposition, that actively cycle the nutrients and particulate material that influence hypoxia. Although microorganisms are critical to such transformations, these models do not typically incorporate any microbial observations, except for chlorophyll a from phytoplankton, into model validation, in part because data to constrain such representations are lacking. Metagenomic information could provide useful constraints for these models, although it is not clear how best to use it. To advance the use of metagenomics to improve models used to manage water quality in the Chesapeake Bay, major trends were identified in metagenomic observations of metabolic gene abundance across an observational dataset collected during the spring and summer of 2017 in Chesapeake Bay, including several gene sets that are strongly predictable from environmental variables typically simulated by models. Next, a subset of genes that account for the largest amount of variation across the dataset and are associated with primary production, nitrification, denitrification, and sulfur cycling was compared to rates predicted by a numerical model that represents these processes. Modeled rates or biomass indicators (i.e., chlorophyll) are significantly correlated with many of the associated genes. However, interesting discrepancies were found, such as an overabundance of photosynthesis and denitrification genes in deep waters in the spring, where they are not expected, an overabundance associated with taxonomic changes between spring and summer.
Our work highlights the potential of metagenomics both to justify adding taxonomically resolved representations of key processes to models and to provide useful data to constrain a wide range of microbial processes in those models.