Modeling and Prediction of Marine Microbial Populations in the Genomic Era

The articles in this special issue attest to the fact that we are in the early stages of a scientific revolution in marine microbiology, fueled by vast quantities of new information derived from advances in microbiological techniques and genomic studies. Many new species, metabolic processes, and pathways in marine systems have recently been discovered. Subtle metabolic variations within and among species have also been revealed, and previously known genes and metabolisms have been detected in new environments. And this is just the tip of the iceberg. Using genome shotgun sequencing techniques, Venter et al. (2004) report finding 148 previously unknown bacterial phylotypes and over 1.2 million previously unknown genes, including more than 782 new rhodopsinlike photoreceptors (Figure 1), just from surface-water samples from the Sargasso Sea.

The overwhelming challenge [...] constituents with exchanges between them specified by a set of partial differential equations (Hood and Coles, in press). These are the primary tools that we currently employ to synthesize emerging information and fold it into a coherent framework that can be used for prediction. Efforts to develop biogeochemical models that can be used to predict how the oceans will respond to global warming are a prominent example (e.g., Moore et al., 2002a, 2002b) [...] cycles. Such models have been gainfully applied as exploratory and predictive tools. However, it is not clear that these traditional modeling approaches will be sufficient in the face of all this emerging microbiological and genomic information; such models need to be "told" exactly what organisms and metabolisms exist in the ocean, and the rate coefficients that govern their parameterizations must be specified a priori. As such, they cannot tell us what is important and what is not. Moreover, these models cannot "evolve" over time and therefore cannot account for changes in species composition and metabolic capacity that may happen in association with changing climate and environment.
Given the emerging realization that there are vast and previously unknown quantities of genetic information in the ocean, it would appear that these deficiencies are serious.
We argue that if marine-ecosystem models are to be in the vanguard of the ongoing revolution in microbial oceanography, there need to be radical changes in modeling approaches and strategies.
The challenge we face is daunting. Our conceptual understanding is evolving much faster than model development. This is starkly illustrated by the fact that even though hundreds of new bacterial phylotypes have been discovered in recent years, most large-scale biogeochemical model formulations still do not include explicit representations of bacteria (Hood et al., 2006). We can anticipate that the accelerating rate of discovery will tend to encourage more rapid model development, but this poses a dilemma for the modeling community because it will also accelerate the current trend toward the development of increasingly complex and potentially intractable model formulations (see Doney et al., 2004, and Rothstein et al., 2006, for additional discussion).
In a perfect world, modeling and theory should help lead the way as we venture into this brave new world of microbial and genomic discovery in the twenty-first century. At the very least, these tools should be employed in combination with field and laboratory studies to help make sense of all this emerging information. To achieve this integration, we will have to augment our traditional ecosystem and biogeochemical modeling approaches with new methods. We argue that these methods should include (1) application of overarching ecological theories that can help guide model development, (2) development of alternative modeling approaches and analysis methods that can overcome some of the limitations of traditional models, and (3) [...]
Figure 1. Phylogenetic tree of rhodopsinlike genes in the Sargasso Sea along with all homologs in GenBank. The sequences are colored according to the type of sample in which they were found: blue, cultured species; yellow, sequences from uncultured organisms in other environmental samples; and red, sequences from uncultured species in the Sargasso Sea. The tree is also divided into proposed distinct subfamilies of sequences on the right. Figure and caption modified from Venter et al. (2004). Reprinted with permission from AAAS.
Lotka (1922), for example, argued "…that natural selection tends to make the energy flux through the system a maximum, so far as compatible with the constraints to which the system is subject." Odum (1983) expanded on Lotka's ideas and formulated the "maximum power principle," suggesting that systems prevail that develop designs that maximize the flow of useful energy.
Odum argued that theories and corollaries derived from the maximum power principle can explain much about the structure and processes of ecosystems.
Although the maximum power principle has drawn some sharp criticism (e.g., Fenchel, 1987) [...]
Cropp and Gabric (2002), for example, used a genetic algorithm to simulate the adaptation of the biota in a simple linear food chain consisting of a limiting nutrient, autotrophs, and heterotrophs. They also concluded that ecological systems exist within the constraints of thermodynamic laws that prescribe the transfer of energy. But their simulations suggested the hypothesis that, within the constraints of the external environment and the genetic potential of their constituent biota, ecosystems will evolve to the state most resilient to perturbation (i.e., toward "maximum resiliency"). Interestingly, in their simple linear food chain, the selection pressures suggested by Lotka (1922) and Odum (1983) led to essentially the same system behavior as did maximum resiliency. Fath et al. (2001) [...] (Jorgensen and Straskraba, 2000). We suggest that thermodynamic goal functions like these can provide a means to estimate unknown rates and rate parameters, identify organisms or processes that might be missing in a model, and guide community evolution under changing environmental conditions. For example, Laws et al. (2000) used the simple box model depicted in Figure 2 to explore the implications of applying a resiliency goal function, based upon concepts developed by Steele (1974) and May (1974), to the regulation of sinking carbon export production in open-ocean ecosystems. Prior to the publication of that model, the seminal paper by Eppley and Peterson (1979) [...] et al., 2007). The Laws et al. (2000) study was an effort not only to estimate unknown rate parameters but also to identify processes that might be missing in the Eppley and Peterson (1979) [...] unconstrained.
These growth rates can be determined by selecting values randomly, with the criterion for admissibility being a stable steady state such that the characteristic time constant associated with the return to equilibrium must be shorter than a specified value (following May, 1974). This tuning exercise not only determines these four free parameters, but, in the case of Station ALOHA, also leads to the conclusion that the export ratio most likely to be observed lies toward the low end of the spectrum of possible values, a result consistent with field studies (Emerson et al., 1997). Furthermore, when the predictions of the model are examined over a wide range of conditions, the conclusion is that f-ratios are a function not only of primary production, as postulated by Eppley and Peterson (1979), but also of temperature (Figure [...]
In theory, it should also be possible to use the same approach for simulating temporal evolution in community structure, production, export, and perhaps many other ecosystem attributes. This could, for example, provide a means to allow a model to reorganize itself seasonally and/or evolve in response to global-warming-induced changes in oceanic temperatures and primary production. The success of the Laws et al. (2000) model in explaining the pattern of f-ratios over a wide range of conditions also lends support to May's (1974) [...]
Figure 2. Feeding and excretion relationships in a model pelagic food web in which photosynthetic production is partitioned between small and large phytoplankton cells. DOM and POM are dissolved and particulate organic matter, respectively. Redrawn from Laws et al. (2000).
It would be misleading, however, to think that resiliency is the sole determinant of ecosystem structure and function. A wide variety of constraints, including genetic plasticity and the physical (e.g., temperature) and resource constraints noted by Patten (1993), will combine to limit the ability of biological communities to adapt to their environments.
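The admissibility criterion described above (random parameter draws accepted only when the steady state is stable and the return to equilibrium is faster than a specified time) can be illustrated with a minimal Python sketch. It does not reproduce the Laws et al. (2000) food web. It assumes a hypothetical two-compartment nutrient-phytoplankton chemostat, and every name and value in it (`mu`, `k`, `D`, `m`, `N0`, `tau_max`, the sampling range) is an illustrative assumption rather than something taken from that study.

```python
import numpy as np

def steady_state(mu, k, D, m, N0):
    """Positive steady state (N*, P*) of the toy model, or None if there is none.

    Toy model (an assumption for illustration only):
        dN/dt = D*(N0 - N) - mu*N/(k + N)*P
        dP/dt = mu*N/(k + N)*P - (D + m)*P
    """
    if mu <= D + m:
        return None  # growth can never balance losses
    n_star = k * (D + m) / (mu - D - m)
    if n_star >= N0:
        return None  # supply too low to sustain phytoplankton
    p_star = D * (N0 - n_star) / (D + m)
    return n_star, p_star

def return_time(mu, k=0.5, D=0.2, m=0.1, N0=10.0):
    """Characteristic return time, -1/max(Re(eigenvalue)) of the Jacobian."""
    ss = steady_state(mu, k, D, m, N0)
    if ss is None:
        return np.inf
    n_star, p_star = ss
    dg = mu * k / (k + n_star) ** 2  # slope of the uptake curve at N*
    jac = np.array([[-D - dg * p_star, -(D + m)],
                    [dg * p_star, 0.0]])
    lead = np.linalg.eigvals(jac).real.max()
    return np.inf if lead >= 0 else -1.0 / lead

def sample_admissible(n_draws, tau_max, seed=0):
    """Draw growth rates at random and keep those that recover within tau_max."""
    rng = np.random.default_rng(seed)
    mus = rng.uniform(0.1, 3.0, n_draws)  # hypothetical plausible range
    taus = np.array([return_time(mu) for mu in mus])
    keep = taus < tau_max
    return mus[keep], taus[keep]

if __name__ == "__main__":
    mus, taus = sample_admissible(2000, tau_max=5.0)
    print(f"{len(mus)} of 2000 draws admissible; "
          f"admissible growth rates span {mus.min():.2f} to {mus.max():.2f}")
```

In this sketch the distribution of admitted growth rates plays the role of the tuned free parameters. A model with more free parameters would sample each of them and apply the same return-time filter.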
Identifying the right mix of constraint and adaptability within models of microbial communities remains a challenging task and a significant limitation on our ability to predict the composition and functionality of these communities in a changing world.
[...] give rise to better predictive skill.
From Laws et al. (2000).
In the vast majority of marine-ecosystem and biogeochemical models, the representation of key physiological processes (for example, the dependence of phytoplankton growth on light, temperature, and nutrient availability) is empirical, using a few simple functional relationships (e.g., Holling's type I, II, and III grazing response curves and Q10-type temperature dependence of metabolic function), constrained where possible by laboratory culture and chemostat studies (e.g., Eppley and Coatsworth, 1968; Eppley, 1972). These functional relationships are also typically fixed in the sense that adaptation to changing local conditions (i.e., photo- and nutrient acclimation) is not allowed.
Thus, compared to real-world ecosystems, most marine-ecosystem and biogeochemical models are quite rigid in terms of their physiological responses.
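The fixed functional relationships named above can be written down in a few lines. The following Python sketch gives commonly used textbook forms of the Holling type I, II, and III responses and a Q10 temperature factor. The parameter names (`rate`, `g_max`, `k`, `t_ref`) and the default values are illustrative assumptions, not values taken from any of the models cited here.

```python
import numpy as np

def holling_type1(prey, rate, g_max):
    """Type I: ingestion rises linearly with prey, then plateaus at g_max."""
    return np.minimum(rate * np.asarray(prey, dtype=float), g_max)

def holling_type2(prey, g_max, k):
    """Type II: saturating (hyperbolic) response; half of g_max at prey == k."""
    prey = np.asarray(prey, dtype=float)
    return g_max * prey / (k + prey)

def holling_type3(prey, g_max, k):
    """Type III: sigmoidal response; ingestion is suppressed at low prey density."""
    prey = np.asarray(prey, dtype=float)
    return g_max * prey**2 / (k**2 + prey**2)

def q10_factor(temp, t_ref=20.0, q10=2.0):
    """Multiplier on a metabolic rate: the rate changes q10-fold per 10 degrees."""
    return q10 ** ((np.asarray(temp, dtype=float) - t_ref) / 10.0)

if __name__ == "__main__":
    prey = np.array([0.1, 0.5, 2.0])
    print("type I  :", holling_type1(prey, rate=1.0, g_max=1.0))
    print("type II :", holling_type2(prey, g_max=1.0, k=0.5))
    print("type III:", holling_type3(prey, g_max=1.0, k=0.5))
    print("Q10 factor at 10, 20, 30 degrees:", q10_factor([10.0, 20.0, 30.0]))
```

Because `g_max`, `k`, and `q10` are constants here, the simulated organisms respond identically whatever their history, which is the rigidity discussed in the text.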
This rigidity may be particularly problematic in efforts to apply these models to assess how plankton community composition and biogeochemical cycles might be impacted by past or future climate change (i.e., we can anticipate that they will tend to predict changes in species compositions and chemical cycling that are too abrupt because the plankton in these models cannot adapt [...]
[...] and ammonium uptake model. NO3P, NH4P, GlnP, and Q are internal pools of nitrate, ammonium, glutamine, and other organic cellular N, respectively. NNiR is nitrate-nitrite reductase, and GS glutamine synthetase activities. NT and AT are nitrate and ammonium transporters, respectively. NR describes the process of nitrate reduction through to ammonium, and AA the synthesis of amino acids and all other nitrogenous compounds from Gln. "Promotion," "regulation," and "effector" are used in general terms, with no specific biochemical meaning, indicating positive, negative, or complex feedbacks, respectively. Reprinted from Flynn et al. (1997, Figure 2 [...]
[...] important caveat is that more complex models must be properly constrained with data (i.e., if they have too many degrees of freedom, then they can be tuned to fit noise in the data, which will result in reduced predictive skill). This is a significant caveat because it means that the amount of complexity that we can usefully incorporate into models will be dictated by the availability of the validation data needed to constrain them. The implication is that continuing to add to models the new organisms and metabolic processes discovered through microbiological and genomic studies may not be useful unless validation data relevant to these processes (e.g., time-series or spatial data) can also be obtained.
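The caveat about degrees of freedom can be made concrete with a small synthetic experiment. The Python sketch below is not an ecosystem model: it assumes a hypothetical noisy seasonal signal, uses polynomial degree as a stand-in for model complexity, and compares the misfit on the data used for fitting with the misfit on withheld data. All names and settings (`compare_complexity`, `noise_sd`, the degrees tested) are illustrative assumptions.

```python
import numpy as np

def fit_and_score(t_fit, y_fit, t_val, y_val, degree):
    """Least-squares polynomial fit; returns (misfit on fitted data, misfit on withheld data)."""
    coeffs = np.polyfit(t_fit, y_fit, degree)
    fit_cost = float(np.mean((np.polyval(coeffs, t_fit) - y_fit) ** 2))
    val_cost = float(np.mean((np.polyval(coeffs, t_val) - y_val) ** 2))
    return fit_cost, val_cost

def compare_complexity(degrees=(1, 3, 7), n_obs=24, noise_sd=0.3, seed=1):
    """Fit models of increasing complexity to every other point of a noisy series."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_obs)
    y = np.sin(2.0 * np.pi * t) + rng.normal(0.0, noise_sd, n_obs)  # synthetic "observations"
    fit_idx = np.arange(0, n_obs, 2)  # points used to tune the model
    val_idx = np.arange(1, n_obs, 2)  # withheld points used only for validation
    return {d: fit_and_score(t[fit_idx], y[fit_idx], t[val_idx], y[val_idx], d)
            for d in degrees}

if __name__ == "__main__":
    for degree, (fit_cost, val_cost) in compare_complexity().items():
        print(f"degree {degree}: misfit on fitted data {fit_cost:.3f}, "
              f"on withheld data {val_cost:.3f}")
```

The misfit on the fitted data can only decrease as parameters are added, so it says nothing about predictive skill. Only the withheld data reveal whether the extra parameters are fitting signal or noise, which is why validation data must keep pace with model complexity.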
The trend toward the development of increasingly complex models has raised concerns, especially among ecologists who have known for many years that there may be fundamental limits to the level of complexity that can be usefully incorporated into ecological and biogeochemical models (May, 1974). Typically, marine-ecosystem models have progressively increased the complexity and resolution of functional groups, incrementally adding new "species" or functional types with a priori imposed physiological characteristics. In contrast, an alternative approach has recently been explored by Follows et al. (2007): the model is initialized with a very diverse phytoplankton community, explicitly representing many tens of potentially viable functional types whose physiological characteristics are provided stochastically, from plausible ranges. When embedded in a simulated, global, four-dimensional (x, y, z, and time) physical and chemical environment, several of the (relatively) fittest organism types grow to dominate each biogeographical "province" while many, less-fit types decline to very low abundance or extinction. The system thus "self-selects" its own community structure in a manner that is conceptually [...]
[...] The vertical dashed line separates the single-phytoplankton models (models 1-5) from the multi-phytoplankton models (models 6-12). Bars lower than the dotted horizontal line indicate that the model-data misfit is lower than that computed from the mean of the observations. Two solid horizontal lines represent mean cost for the single-phytoplankton and multi-phytoplankton models, respectively; error bars indicate one standard error. Expt. 3 results are derived from simultaneous assimilation of AS and EP data. Expt. 4 results are from cross-validation experiments where the models were fit to data from AS (EP) and then cost J calculated at EP (AS). Figures and caption modified from Friedrichs et al. (2007).
[...] explicit models of cellular-scale meta- [...] surrounds them (e.g., Hulburt, 1970).
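The self-selection idea attributed to Follows et al. (2007) earlier in this section can be illustrated with a deliberately small Python sketch. It is not their global model. It assumes a hypothetical well-mixed chemostat with a single limiting nutrient, assigns each phytoplankton type a randomly drawn maximum growth rate and half-saturation constant, and lets competition determine which types persist. All names and values (`self_select`, `mu_max`, `k_half`, `dilution`, `n_in`) are illustrative assumptions.

```python
import numpy as np

def self_select(n_types=30, n_steps=20000, dt=0.05, seed=0):
    """Seed many random phytoplankton types and let competition sort them out.

    Returns the final biomass of each type and its equilibrium nutrient
    requirement R* (the nutrient level at which growth balances dilution).
    """
    rng = np.random.default_rng(seed)
    mu_max = rng.uniform(0.5, 1.5, n_types)  # random maximum growth rates
    k_half = rng.uniform(0.5, 2.0, n_types)  # random half-saturation constants
    dilution, n_in = 0.25, 2.0               # hypothetical flushing rate and nutrient supply
    nutrient = n_in
    biomass = np.full(n_types, 0.01)         # every type starts equally rare
    for _ in range(n_steps):                 # simple forward-Euler time stepping
        growth = mu_max * nutrient / (k_half + nutrient)
        uptake = float(np.sum(growth * biomass))
        nutrient = max(nutrient + dt * (dilution * (n_in - nutrient) - uptake), 0.0)
        biomass = biomass + dt * (growth - dilution) * biomass
    r_star = k_half * dilution / (mu_max - dilution)
    return biomass, r_star

if __name__ == "__main__":
    biomass, r_star = self_select()
    share = biomass / biomass.sum()
    survivors = np.flatnonzero(share > 0.01)
    print(f"{len(survivors)} of {len(biomass)} types hold more than 1% of the biomass")
    print("R* of the dominant type:", round(float(r_star[np.argmax(biomass)]), 3),
          "| lowest R* in the initial pool:", round(float(r_star.min()), 3))
```

With a single nutrient in a steady environment, competition in this sketch favors the types with the lowest R*, so most of the initial pool declines toward zero. Coexistence of several types, as in the biogeographical provinces described in the text, would require the spatial and temporal variability that this toy example leaves out.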
It has long been suggested that phytoplankton create diffusive spheres of nutrient deficit about them (Pasciak and Gavis, 1974), and their discrete nature has been observed in situ (e.g., Franks and Jaffe, 2001). This notion of individual interaction is built into modern models of zooplankton prey encounter and utilization (e.g., Wiggert et al., 2005). In all, marine microbial interactions are, to first order, among individuals, and this discrete nature will influence population dynamics and changes in community structure.
The discrete nature of marine populations suggests that to understand and predict the dynamics of marine microbial communities, the spatial organization and interactions of individuals need to be assessed (e.g., Blackburn et al., 1997). Even though the microbial abundances are huge, in the fluid mechanical sense they are dilute. Therefore, microbial populations will respond to environmental perturbations as discrete individuals interacting (slowly) through a viscous fluid medium (e.g., Siegel, 1998).
[...] Follows et al. (2007). Reprinted with permission from AAAS.
This, in turn, should allow the emer- [...] Batchelder et al., 2002), salmon migration and population dynamics (e.g., Rand et al., 1997), and even jellyfish transport and swimming behavior (Hood et al., 1999; Matanoski and Hood, 2005). Similarly, these types of models could also be used to try to better understand the ecological and biogeochemical role of newly discovered bacterial phylotypes and metabolic processes.
[...] Bacteria (red) acting on marine snow or detrital particles (black) or organic matter efflux from phytoplankton (green) creating diverse microniches or "hotspots," which can support high bacterial diversity and high productivity. Protozoa are also seen aggregating about these "hotspots" of activity. Figure and legend modified from Azam (1998).