Data Assimilation for Coastal Observing Systems

Introduction
Data assimilation is the process of combining our knowledge of a system, including both observations and dynamics, to produce estimates of the state of the system. The state of a system, or state variable, describes the physical, chemical, or biological environment. Real, four-dimensional systems are generally under-sampled. This is particularly true in the coastal marine environment, where there is a broad range of relevant time and space scales. The measurements may lack adequate spatial or temporal resolution, the variables of interest may not be measurable directly, or we may wish to predict the future. These problems may be addressed through judicious application of data assimilation.
Given that there is substantial need for data assimilation in the coastal and estuarine environment, what approaches are likely to be successful? What can we learn from the meteorological community? Here, we describe data assimilation procedures and provide a limited set of examples of coastal marine atmospheric and oceanographic assimilation. These examples provide some guidance for developing data assimilation in the coastal and estuarine marine environment.

ASSIMILATION PROCEDURES
The following discussion is intended to provide a general overview of assimilation procedures, to define important terms, and to explain key concepts. For further information, consult Bennett (1992), Ghil and Malanotte-Rizzoli (1991), or Robinson et al. (1998). Two classes of data assimilation are commonly applied: filters and smoothers. Schematics of these procedures are shown in Figure 1. Both classes incorporate a dynamical model, which for meteorology and oceanography is implemented as a numerical algorithm. Without a dynamical model, the process is simply an analysis, or gridding, of the data. Our discussion begins with data gridding procedures because these are components of many assimilation processes and may be used for data visualization or model-data intercomparison.

Data Gridding
Data gridding procedures are generally forms of statistical or functional interpolation. This subject is well covered by Daley (1991). The most common statistical technique is Gauss-Markov smoothing, known to meteorologists and oceanographers as objective analysis (Bretherton, Davis, and Fandry, 1976). This analysis procedure minimizes the expected error given knowledge of the covariances and the sampling. Although this is not generally recognized, the procedure can be used to estimate a quantity that differs from the quantity measured, or even to produce estimates from multivariate data sets. For instance, given measurements of upper-ocean velocity, the dynamic pressure field may be estimated (Walstad et al., 1991). This approach is used extensively to produce initial condition estimates from moving and stationary platforms. Other techniques, using empirically derived weightings of nearby points or spline interpolation, have been used as well. When data are abundant, these techniques may be used to produce acceptable field estimates, but they do not exploit our knowledge of system behavior.
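As a minimal sketch of Gauss-Markov objective analysis, the code below estimates a one-dimensional, zero-mean field at target points from scattered observations. The isotropic Gaussian covariance, its length scale, and the noise variance are illustrative assumptions, not values from any of the cited studies, and the function names are hypothetical. The estimate is the covariance-weighted combination of the data, and the expected error variance is returned alongside it.

```python
import math


def gaussian_cov(r, variance, length_scale):
    """Assumed isotropic Gaussian signal covariance at separation r."""
    return variance * math.exp(-((r / length_scale) ** 2))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def objective_analysis(obs_x, obs_val, grid_x, variance=1.0, length_scale=10.0, noise=0.1):
    """Gauss-Markov estimate of a zero-mean 1-D field at grid_x.

    Returns the estimates and their expected error variances."""
    n = len(obs_x)
    # Data-data covariance: signal covariance plus measurement-noise variance on the diagonal.
    c_dd = [[gaussian_cov(abs(obs_x[i] - obs_x[j]), variance, length_scale)
             + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    estimates, errors = [], []
    for x in grid_x:
        # Covariance between the target point and each observation.
        c_xd = [gaussian_cov(abs(x - xo), variance, length_scale) for xo in obs_x]
        w = solve(c_dd, c_xd)  # optimal weights (c_dd is symmetric)
        estimates.append(sum(wi * d for wi, d in zip(w, obs_val)))
        errors.append(variance - sum(wi * c for wi, c in zip(w, c_xd)))
    return estimates, errors
```

For example, `objective_analysis([0.0, 5.0, 20.0], [1.0, 0.8, -0.5], [0.0, 50.0])` returns a well-constrained estimate near the first observation, while at 50 km, far from all data, the estimate relaxes to the assumed zero mean and the expected error approaches the full signal variance. Reporting that error is one practical advantage of the statistical approach over empirical weightings.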

Filters
We first consider the case of a filter (Figure 1). Suppose that data were collected at times $t_0, t_1, \ldots, t_n$, and regard the final data collection time as the current time. The initial data are used to describe an initial condition for the numerical model, $\phi_0$. This is accomplished by applying a gridding algorithm, as described above, to the data at time $t_0$. The initial conditions for each component of the state variable must be estimated at each grid point of the numerical model. This generally means that statistical, dynamical, or functional extrapolations must be used to estimate unmeasured quantities. It is common to use empirical orthogonal functions or dynamical modes to extrapolate from upper ocean measurements (Spall and Robinson, 1990; Walstad et al., 1991). Some components of the state variable are not measured at all; for instance, vertical velocity is not routinely measured in the atmosphere or ocean. These components must be specified through additional dynamical constraints. Meteorologists use normal mode initialization to place the initial solution of the numerical model on the slow manifold (Daley, 1991). The slow manifold is the solution that evolves slowly in time and is not dominated by gravity wave energy. Oceanographers generally initialize models with fields that are in geostrophic balance, though near-shore and estuarine applications require other approaches. Given a full set of initialization fields, the model is integrated forward to time $t_1$. Regional and coastal models generally require boundary conditions specified at the grid points on the edge of the computational domain. These may be estimated from measurements or derived from a data assimilation system that estimates the larger scales. The model field at time $t_1$, $\phi_1'$, is an intermediate estimate. If no data from time $t_1$ have been used to provide boundary conditions, then this intermediate estimate is a forecast. The next step in the process is to difference the model from the observations at the observation locations. This residual at time $t_1$, $r_1$, is then gridded using the operator $K$ to produce an update to the intermediate estimate, $\Delta\phi_1 = K r_1$.
This update is added to the intermediate estimate, $\phi_1'$, to produce the analysis, $\phi_1 = \phi_1' + \Delta\phi_1$. The analysis is used as the initial condition for the next iteration of the sequential filter. The set of analyses, $\phi_0, \phi_1, \ldots, \phi_n$, is the product of this procedure. When constructed in near real-time, these analyses are considered nowcasts. Forecasts are constructed by integrating the dynamical model into the future.
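The forecast, residual, update, and analysis steps of the sequential filter can be sketched in a few lines. This is a minimal sketch under stated assumptions: the "dynamical model" is a hypothetical one-cell periodic shift, observations are taken directly at grid points, and the gridding operator $K$ is a fixed Gaussian spreading with a constant gain rather than an objective-analysis or Kalman gain. All function names and parameter values are hypothetical.

```python
import math


def model_step(phi):
    """Hypothetical dynamical model: shift the field one cell downstream (periodic)."""
    return [phi[-1]] + phi[:-1]


def grid_residuals(residuals, obs_idx, n, gain, length_scale):
    """Hypothetical gridding operator K: spread each residual with Gaussian weights."""
    update = [0.0] * n
    for r, k in zip(residuals, obs_idx):
        for i in range(n):
            update[i] += gain * r * math.exp(-(((i - k) / length_scale) ** 2))
    return update


def sequential_filter(phi0, observations, obs_idx, gain=0.5, length_scale=1.0):
    """Return the analyses [phi_0, phi_1, ..., phi_n].

    observations[j] holds the data at t_{j+1}, one value per index in obs_idx."""
    analyses = [list(phi0)]
    phi = list(phi0)
    for data in observations:
        forecast = model_step(phi)                                   # intermediate estimate phi'
        residual = [d - forecast[k] for d, k in zip(data, obs_idx)]  # r: data minus model at obs points
        delta = grid_residuals(residual, obs_idx, len(phi), gain, length_scale)  # delta phi = K r
        phi = [f + u for f, u in zip(forecast, delta)]               # analysis: phi' + delta phi
        analyses.append(phi)
    return analyses
```

For example, `sequential_filter([0.0] * 10, [[1.0, 1.0]] * 40, [2, 7])` starts from a poor initial condition and draws the field toward the observed value as the cycles proceed. Swapping in an objective-analysis or Kalman gain would change only `grid_residuals`; the forecast, residual, and update steps stay the same.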
If the gridding operator is the Kalman gain matrix, the procedure is the Kalman filter (KF). The KF has been applied to relatively simple models, but its computational expense for typical coastal systems is prohibitive. Alternatives to the KF have been constructed; to a certain extent, these methods can be placed into the formalism of the KF and regarded as approximations to it. They include direct insertion, wherein the data values replace model values at the measurement locations. This is equivalent to assuming that the data are perfect and that errors in the model forecast are uncorrelated. Blending the data into the model using subjectively derived weighting functions is a modification of direct insertion. Nudging, or Newtonian damping, has also been applied as an assimilation tool (Anthes, 1974). The procedure adds a forcing term to the dynamical model that drives the model toward the observations. The nudging rate must vary smoothly in time and space and must be small. The technique can be effective and is relatively easy to implement; see Fukumori and Malanotte-Rizzoli (1995) for an example.
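To make these relationships concrete, the following minimal sketch uses a scalar state with an assumed linear model $x_{t+1} = a x_t$ and direct observations; the coefficient `a` and the error variances `q` and `r` are illustrative assumptions. Setting the observation error variance `r` to zero gives a gain of one, which reproduces direct insertion: the data simply replace the model value. The second function adds a Newtonian relaxation (nudging) term with an assumed time scale `tau` to a damped model of the same kind. Both functions are hypothetical illustrations, not the schemes of the cited studies.

```python
def kalman_step(x, p, y, a=0.9, q=0.1, r=0.2):
    """One forecast/analysis cycle of a scalar Kalman filter.

    Assumed model x_{t+1} = a * x_t with model-error variance q; the
    observation y measures x directly with error variance r."""
    x_f = a * x                # forecast
    p_f = a * a * p + q        # forecast error variance
    k = p_f / (p_f + r)        # Kalman gain
    x_a = x_f + k * (y - x_f)  # analysis = forecast + gain * residual
    p_a = (1.0 - k) * p_f      # analysis error variance
    return x_a, p_a, k


def nudged_run(x0, y, dt=1.0, tau=10.0, decay=0.05, steps=200):
    """Forward-Euler integration of dx/dt = -decay * x + (y - x) / tau.

    The first term is an assumed damped model; the second is the nudging
    term that pulls x toward the observation y."""
    x = x0
    for _ in range(steps):
        x += dt * (-decay * x + (y - x) / tau)
    return x
```

With `r=0.0`, `kalman_step(1.0, 0.5, 2.0, r=0.0)` returns a gain of exactly 1 and an analysis equal to the observation (to rounding). `nudged_run(0.0, 3.0)` settles at $y / (1 + \text{decay}\cdot\tau) = 2.0$ rather than at the observed 3.0, illustrating that nudging produces a compromise between model and data whose balance is set by the relaxation time scale.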
In the optimal interpolation (OI), or statistical interpolation (SI), method, the gain is assigned empirically. A common procedure is to specify the error covariance matrix and use the objective analysis procedure to calculate the gain matrix and grid the residuals, producing the update $\Delta\phi_1$ (e.g., Robinson, 1999). A procedure not widely used in oceanography is the method of successive corrections, whereby the update is constructed by iterating over scales or processes (Daley, 1991). This procedure recognizes that different processes and scales act within the fluid and that each has an error covariance with a different structure. For instance, geostrophic velocities are correlated with the local normal gradient in sea surface elevation, while ageostrophic velocities are correlated with the local sea surface elevation. Alternatively, a three-dimensional variational scheme could be applied to minimize the difference between the analysis and the data (Daley, 1991).
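A minimal sketch of successive corrections follows, assuming a 1-D grid with observations located at grid points and Cressman-type distance weights (our choice for illustration; the text does not specify a weighting). Each pass grids the remaining data-minus-analysis residuals with a smaller influence radius, so broad scales are corrected first and finer scales later. The sketch iterates over scales only; iterating over processes, such as separate geostrophic and ageostrophic covariances, would require a multivariate weighting not shown here. Names and radii are hypothetical.

```python
def cressman_weight(d, radius):
    """Cressman-type weight: (R^2 - d^2) / (R^2 + d^2) inside the radius, zero outside."""
    if d >= radius:
        return 0.0
    return (radius ** 2 - d ** 2) / (radius ** 2 + d ** 2)


def successive_corrections(background, obs_idx, obs_val, radii=(8.0, 4.0, 2.0)):
    """Correct a gridded field in passes of decreasing influence radius.

    Observations are assumed to lie on grid points (obs_idx)."""
    analysis = list(background)
    for radius in radii:
        # Residuals with respect to the current analysis, not the original background.
        residual = [v - analysis[k] for k, v in zip(obs_idx, obs_val)]
        update = []
        for i in range(len(analysis)):
            w = [cressman_weight(abs(i - k), radius) for k in obs_idx]
            total = sum(w)
            # Normalized weighted average of residuals; no change where no data are in range.
            update.append(sum(wi * r for wi, r in zip(w, residual)) / total if total > 0 else 0.0)
        analysis = [a + u for a, u in zip(analysis, update)]
    return analysis
```

With observations spaced farther apart than the final radius, the last pass fits each observation exactly, while grid points beyond the largest radius keep their background values.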

Smoothers
The smoother is an inverse method; given the output of a system, the parameters that control the system are determined.These parameters can include the initial condition, the boundary conditions, any adjustable parameters of the dynamical model, and possibly a dynamical error term.
The basic algorithm for a smoother is shown in Figure 1. A fundamental difference between filters and smoothers is the impact of the data: within a smoothing algorithm, the data may affect the entire set of field estimates, not only those at and after the observation time. Whether or not data actually have an impact on the entire set of field estimates depends upon the dynamics of the system. Most smoothers also differ from filters in their objective. Whereas a filter is designed to minimize the expected error of each field estimate (analysis), most smoothers are designed to minimize the difference between the set of field estimates and the data. This difference is quantified by a cost function, which penalizes the difference between the solution and the available measurements. Smoothness of the initial and boundary conditions may be an additional component of the cost function or may be imposed by limiting the realizable initial and boundary conditions.
As with filters, an initial condition may first be estimated from available data through a gridding procedure (Figure 1). The model is then integrated throughout the inversion period, and the data-model differences are calculated. The inversion algorithm determines how to modify the initial condition, the parameters of the model, or the forcing functions. When the minimization adjusts the model parameters to reach the minimum of the cost function, the procedure is referred to as parameter estimation. There are two important classes of smoothers: strong constraint and weak constraint. In strong constraint methods, the solution satisfies the dynamical equations exactly. In weak constraint methods, an error term is added to the model equations and penalized in the cost function. For each of these classes, there are several approaches to achieving the minimum.
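One common way to write the two kinds of cost function is as quadratic forms, assuming Gaussian errors. In the sketch below, $\phi_k$ is the model state at time $t_k$, $\phi_0^{b}$ is a first-guess initial condition from gridding, $d_k$ are the data at $t_k$, $H_k$ maps the state to the observation locations, $M_k$ advances the model from $t_k$ to $t_{k+1}$, $q_k$ is the model error term, and $B$, $R_k$, and $Q_k$ are assumed error covariances for the initial condition, the data, and the model error. These symbols are introduced here for illustration and are not defined elsewhere in the text. The first term is one way to express the smoothness or first-guess penalty on the initial condition mentioned above.

$$
J_s(\phi_0) = (\phi_0 - \phi_0^{b})^{\mathsf{T}} B^{-1} (\phi_0 - \phi_0^{b})
  + \sum_{k=0}^{n} (H_k \phi_k - d_k)^{\mathsf{T}} R_k^{-1} (H_k \phi_k - d_k),
\qquad \phi_{k+1} = M_k(\phi_k)
$$

$$
J_w(\phi_0, q_1, \ldots, q_n) = (\phi_0 - \phi_0^{b})^{\mathsf{T}} B^{-1} (\phi_0 - \phi_0^{b})
  + \sum_{k=0}^{n} (H_k \phi_k - d_k)^{\mathsf{T}} R_k^{-1} (H_k \phi_k - d_k)
  + \sum_{k=1}^{n} q_k^{\mathsf{T}} Q_k^{-1} q_k,
\qquad \phi_{k+1} = M_k(\phi_k) + q_{k+1}
$$

In the strong constraint form the only control variable is $\phi_0$ (plus any model parameters), and the trajectory follows the model exactly; in the weak constraint form the model errors $q_k$ are additional controls, and their size is penalized by $Q_k^{-1}$.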
The appropriate approach to minimizing the cost function depends upon the particular system, the adjustable parameters, and the available data. Techniques include the adjoint method, the representer method, the Kalman smoother, steepest descent, conjugate gradient, and simulated annealing (Bennett, 1992). The term "adjoint method" has been applied to a number of different algorithms, so the details of each application must be consulted (Wunsch, 1996; Robinson et al., 1998). The representer method has been applied to current meter data from Massachusetts Bay (Bogden et al., 1996). Stochastic (Evensen, 1994) and hybrid (Lermusiaux and Robinson, 1999) methods are being developed and applied.
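The adjoint method combined with steepest descent can be illustrated for a strong constraint problem in which only the initial condition is adjusted. This minimal sketch assumes a linear model matrix `m`, fully observed states at a few time steps, unit observation-error weights, and no first-guess penalty term; under those assumptions the adjoint model is simply the transpose of `m`. Function names, the step size, and the iteration count are hypothetical choices.

```python
def matvec(m, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward(m, x0, n_steps):
    """Strong constraint: the trajectory satisfies x_{k+1} = M x_k exactly."""
    traj = [list(x0)]
    for _ in range(n_steps):
        traj.append(matvec(m, traj[-1]))
    return traj


def cost_and_gradient(m, x0, obs):
    """Cost J(x0) = sum_k |x_k - d_k|^2 and its gradient dJ/dx0 via the adjoint.

    obs maps a step index k to a fully observed state vector d_k."""
    n_steps = max(obs)
    traj = forward(m, x0, n_steps)
    cost = sum(sum((xi - di) ** 2 for xi, di in zip(traj[k], d)) for k, d in obs.items())
    m_t = transpose(m)                # adjoint of a linear model is its transpose
    lam = [0.0] * len(x0)             # adjoint variable
    for k in range(n_steps, -1, -1):  # integrate backward in time
        if k in obs:
            # Forcing by the data-model misfit at step k.
            lam = [lam_i + 2.0 * (xi - di) for lam_i, xi, di in zip(lam, traj[k], obs[k])]
        if k > 0:
            lam = matvec(m_t, lam)
    return cost, lam


def steepest_descent(m, x0, obs, step=0.1, iters=500):
    """Adjust only the initial condition by fixed-step steepest descent."""
    x = list(x0)
    for _ in range(iters):
        _, grad = cost_and_gradient(m, x, obs)
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x
```

The gradient costs one forward and one backward integration regardless of the number of control variables, which is why adjoint methods are attractive for large models. A fixed-step steepest descent converges only if the step is small relative to the curvature of the cost function; conjugate-gradient methods usually converge in fewer iterations.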

Regional Atmospheric Assimilation
The ETA atmospheric model (Black, 1994) provides the large-scale atmospheric fields both for the Coastal Ocean Forecast System (COFS) and as input to the Chesapeake Bay Local Analysis and Prediction System (CBLAPS) (Fuell et al., 1999). The ETA model fields are produced as a standard NOAA product and used for routine forecasting. The Chesapeake Bay LAPS is a regional application of the LAPS system (Albers, 1996), which provides a structure for ingesting data from atmospheric data streams and modeling systems. Data from local sources, including buoys, commercial aircraft, and locally identified meteorological data, are collected and gridded onto a fine-scale (4 km) atmospheric grid.
The CBLAPS analysis is used as the initial condition for the Chesapeake Bay implementation of the Regional Atmospheric Modeling System (CBRAMS) (McQueen et al., 1999). This system integrates the non-hydrostatic equations of motion to produce forecasts. The near-surface wind prediction shows several important characteristics (Figure 2). Typically, as air moves from over the land to over the water, it accelerates because of reduced surface drag. This acceleration depends upon boundary layer dynamics affected by land-sea-air temperature differences, humidity, and boundary layer structure. The contrast between land and water also channels the wind along the axis of the estuary. Sea breezes produce wind reversals over the water during the late afternoon in summer. These processes are not well represented by measurements, which are primarily located over land. Data assimilation with sufficient resolution and appropriate physics can reproduce these important marine atmospheric phenomena.

Regional Coastal Oceanographic Assimilation
The Chesapeake Bay Experimental Forecast System (CBEFS) (Gross et al., 1999) uses the wind field from CBRAMS to drive a barotropic shallow water model of Chesapeake Bay that nowcasts and forecasts water level in the Bay. The open boundary condition at the Bay mouth is determined by the water level measured at the Chesapeake Bay Bridge Tunnel (CBBT). As configured, this system is a filter. The CBEFS nowcast/forecast cycle consists of nowcasts conducted with up-to-date water level data from the CBBT and winds from CBRAMS. The nowcast surface height field serves as the initial condition for a forecast that extends 48 hours into the future (Figure 3). For the forecast, the open boundary condition at the CBBT is constructed from a combination of persistence, the astronomical tide, and a storm surge model forecast.
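The text does not say how persistence, the astronomical tide, and the storm surge forecast are combined at the Bay mouth. The following is one plausible, hypothetical combination, shown only to make the idea concrete: the forecast level is tide plus surge, plus the anomaly observed at the nowcast time (observed level minus tide minus surge), which is persisted unchanged or allowed to decay with an assumed e-folding time. The function name, the callables, and the decay option are all assumptions, not the CBEFS implementation.

```python
import math


def boundary_forecast(hours, observed_now, tide, surge, decay_hours=None):
    """Hypothetical boundary water-level forecast (m) at the bay mouth.

    tide(t) and surge(t) are callables giving the astronomical tide and a
    storm-surge model forecast at t hours after the nowcast time.  The anomaly
    observed at t = 0 is persisted, optionally decaying with e-folding time
    decay_hours."""
    anomaly = observed_now - (tide(0.0) + surge(0.0))  # what tide + surge miss at t = 0
    levels = []
    for t in hours:
        factor = 1.0 if decay_hours is None else math.exp(-t / decay_hours)
        levels.append(tide(t) + surge(t) + factor * anomaly)
    return levels
```

By construction the forecast matches the observation at the nowcast time, and over a 48-hour horizon the decaying option gradually hands control back to the tide and surge predictions, a reasonable choice if the persisted anomaly is expected to lose skill with lead time.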
Offshore, the Coastal Ocean Forecast System (COFS) nowcasts and forecasts the full three-dimensional water column using the Princeton Ocean Model (POM) (Breaker et al., 1999). The assimilation is a filter. Sea surface temperature from satellite data and temperature profiles from XBT data are used to update the model, and surface data are extrapolated to depth by a mixed layer adjustment algorithm. Altimeter assimilation is now a component of the quasi-operational COFS and has already been observed to reduce some of the difficulties previously encountered where the Gulf Stream separates from the coast. An example of the sea surface temperature field produced by this model is shown in Figure 4. The quality of the analyses is being assessed by comparison with independent data, and these comparisons have demonstrated skill (Kelly et al., 1998).
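The COFS mixed layer adjustment algorithm is not described here, so the sketch below shows only one simple, hypothetical form of the idea: the mixed layer depth is taken as the first level where temperature differs from the surface value by more than an assumed threshold, and the satellite SST increment is applied uniformly above that depth, leaving deeper levels unchanged. The threshold, function names, and profile are illustrative assumptions.

```python
def mixed_layer_depth(depths, temps, delta_t=0.5):
    """First depth (m) where temperature differs from the surface value by more than delta_t.

    depths increase downward and depths[0] is the surface; returns infinity if
    the whole profile lies within delta_t of the surface value."""
    for z, t in zip(depths, temps):
        if abs(t - temps[0]) > delta_t:
            return z
    return float("inf")


def adjust_profile(depths, temps, sst_obs, delta_t=0.5):
    """Hypothetical mixed layer adjustment: apply the surface increment
    uniformly above the mixed layer depth and leave deeper levels unchanged."""
    mld = mixed_layer_depth(depths, temps, delta_t)
    increment = sst_obs - temps[0]
    return [t + increment if z < mld else t for z, t in zip(depths, temps)]
```

For a profile of 20.0, 19.9, 19.7, 17.0, 14.0, and 10.0 degrees at 0, 10, 20, 30, 50, and 100 m, the mixed layer depth is 30 m, and an observed SST of 21.0 raises only the top three levels by 1 degree.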
Note that while this is a coastal forecasting system, the domain is much larger than the shelf/slope region. The region of influence and the quality of the boundary condition information determine where the open boundaries of the dynamical model should be placed. The region of influence is the area that will affect the region of interest during the course of the assimilation. It shrinks as more data are assimilated, because the more data are assimilated, the more the model is constrained to behave as the real ocean does. With fewer data, the region of influence expands, and the boundaries must be moved farther away so that boundary errors do not overwhelm the solution in the region of interest. When very good data are available on the boundaries, the boundaries should be close to the region of interest to maximize the influence of these boundary data. The COFS boundaries are currently forced by climatological values, so they must be far from the region of interest to avoid adversely affecting the field estimates.
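A rough way to size the region of influence, under the simplifying assumption that boundary errors propagate into the domain at a single characteristic speed and are not damped by assimilated data, is to require that a boundary signal cannot reach the region of interest within the assimilation or forecast interval. The speeds, durations, and function names below are illustrative assumptions, not COFS values.

```python
SECONDS_PER_DAY = 86400.0


def signal_distance_km(speed_m_per_s, duration_days):
    """Distance (km) a boundary disturbance travels at constant speed over the given time."""
    return speed_m_per_s * duration_days * SECONDS_PER_DAY / 1000.0


def boundary_far_enough(distance_km, speed_m_per_s, duration_days):
    """True if a boundary at distance_km cannot affect the region of interest within duration_days."""
    return distance_km >= signal_distance_km(speed_m_per_s, duration_days)
```

For example, a disturbance carried by a 0.5 m/s current travels 432 km in 10 days, so under this estimate a boundary 300 km away would be too close while one 500 km away would not. Because assimilating data damps boundary errors, which is why the region of influence shrinks as more data are assimilated, this estimate is conservative.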

[...] Lascara, 1999). There has been substantial progress in the parameter estimation problem for zero-dimensional planktonic ecosystem models (e.g., Lawson et al., 1996; Fasham and Evans, 1995), but comparatively little has been done in spatially explicit frameworks (e.g., Ishizaka, 1990). Difficulties arise from the paucity of data and poorly understood biological dynamics. Progress is being made with approaches that fully exploit the available data and the best understood components of the biological dynamical system. Such an approach has been used in a study of the seasonal variation in the climatological abundance of the calanoid copepod Pseudocalanus spp. in the Gulf of Maine/Georges Bank region (McGillicuddy et al., 1998).
A smoothing technique was applied to determine the sources and sinks of plankton through the inversion of a relatively simple transport model. An illustrative example of the results from McGillicuddy et al. (1998) is shown in Figure 5. From January-February to March-April, the inferred biological source term consists of strong growth (red shading) on the crest of the bank and moderate growth (yellow shading) in a coastal strip just offshore of Cape Ann. Data assimilation of this kind will provide an improved understanding of populations and their dynamics. As these techniques mature, the information generated will undoubtedly lead to improved management of natural resources.
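The sign convention in the Figure 5 caption (the overall tendency equals the sum of the advection, diffusion, and source terms) means that, once the other terms are known, the source can be computed as a residual. The sketch below does this for an assumed 1-D advection-diffusion-reaction equation with constant velocity `u` and diffusivity `k` on a uniform grid, using centered differences on the time-mean concentration. It is a direct diagnostic, not the smoother-based inversion used by McGillicuddy et al. (1998); the names and discretization are assumptions.

```python
def diagnose_source(c_start, c_end, dt, dx, u, k):
    """Source S = tendency - advection - diffusion at interior grid points.

    Assumed equation: dC/dt = -u dC/dx + k d2C/dx2 + S, with constant u and k;
    spatial derivatives use centered differences of the time-mean field."""
    c_mean = [(a + b) / 2.0 for a, b in zip(c_start, c_end)]
    source = []
    for i in range(1, len(c_mean) - 1):
        tendency = (c_end[i] - c_start[i]) / dt
        advection = -u * (c_mean[i + 1] - c_mean[i - 1]) / (2.0 * dx)
        diffusion = k * (c_mean[i + 1] - 2.0 * c_mean[i] + c_mean[i - 1]) / dx ** 2
        # Same sign convention as the caption: tendency = advection + diffusion + source.
        source.append(tendency - advection - diffusion)
    return source
```

For example, a steady profile with positive curvature (diffusion filling in the minimum) yields a negative source: a sink is needed to keep the distribution unchanged. A residual diagnostic of this kind inherits all the noise of the data, which is one motivation for the smoother approach that fits the source within a dynamical model instead.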

Observing Systems Simulation Experiments
While data assimilation can provide the best estimate of the state of a system for a given data set, effective sampling strategies are critical to the production of useful field estimates. The effectiveness of any sampling strategy is ultimately determined by the accuracy with which the observations can be used to reconstruct reality, that is, the state of the natural system being measured. Given the limited opportunity to evaluate sampling strategies against objective criteria by purely observational means, numerical models offer an attractive framework for investigating these issues. The approach begins with the construction of a simulation characteristic of the natural system. This simulation serves as a representation of reality, which is then subsampled in a specified fashion to produce a simulated data set. The simulated data are then fed into the analysis or data assimilation procedure to produce field estimates. Direct comparison of the reconstructed field with the original simulation thus provides a quantitative evaluation of that particular combination of sampling strategy and analysis or assimilation methodology. This approach, the Observing System Simulation Experiment (OSSE), originated in dynamic meteorology (e.g., Charney et al., 1969) and is recognized as an important tool for the development of oceanographic sampling systems (Robinson et al., 1998).
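The OSSE loop (simulate, subsample, analyze, compare) can be sketched end to end. In this minimal sketch the "nature run" is an assumed analytic field rather than a numerical simulation, the sampling strategies are fixed station spacings chosen for illustration, and the analysis step is plain linear interpolation standing in for objective analysis or assimilation. All names are hypothetical.

```python
import math


def nature_run(x):
    """Hypothetical 'true' field standing in for reality (one wavelength per 100 km)."""
    return math.sin(2.0 * math.pi * x / 100.0)


def subsample(stations):
    """Simulated data set: the nature run sampled at the station positions."""
    return [(s, nature_run(s)) for s in stations]


def analyze(data, grid):
    """Analysis step: piecewise-linear interpolation between stations."""
    data = sorted(data)
    out = []
    for x in grid:
        for (x0, y0), (x1, y1) in zip(data, data[1:]):
            if x0 <= x <= x1:
                out.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
                break
        else:
            out.append(float("nan"))  # outside the sampled range
    return out


def osse_rms_error(stations, grid):
    """Compare the reconstructed field with the nature run on the grid."""
    estimate = analyze(subsample(stations), grid)
    sq = [(e - nature_run(x)) ** 2 for e, x in zip(estimate, grid)]
    return math.sqrt(sum(sq) / len(sq))
```

Running `osse_rms_error` for stations every 25 km and every 10 km over a 0 to 100 km grid gives a quantitative comparison of the two sampling strategies for this particular analysis method; changing either the sampling or the analysis step and rerunning is the essence of the OSSE.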
One aspect of sampling that is particularly amenable to assessment with the OSSE is the synopticity of spatial surveys. A set of measurements is synoptic if it is collected in a time interval short enough that the underlying distribution does not change appreciably. OSSEs provide a means to quantify the extent to which dynamics in the underlying field can compromise the fidelity of a map generated from data collected over a finite time interval. The procedure consists of three steps: (1) subsampling model output along a realistic cruise track, (2) objectively mapping the simulated data, and (3) comparing the analysis with an instantaneous snapshot from the original model calculation. Of course, this estimate of the space/time smearing associated with the sampling strategy is robust only to the extent that the model simulation used as the basis for the OSSE is representative of the real ocean.
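The three-step synopticity test can be sketched as follows, assuming a one-dimensional section, a hypothetical Gaussian feature drifting at constant speed, a ship that occupies stations at a fixed time interval, and linear interpolation standing in for objective mapping. The snapshot is taken at the cruise mid-point, which is one choice among several. All names and values are illustrative assumptions.

```python
import bisect
import math


def field(x, t, speed):
    """Hypothetical nature run: a Gaussian feature of 10 km scale, centred at
    50 km at t = 0 and drifting at `speed` km/h."""
    return math.exp(-(((x - 50.0 - speed * t) / 10.0) ** 2))


def cruise_samples(stations, hours_per_station, speed):
    """Step 1: sample the nature run along a track, one station after another."""
    return [(x, field(x, i * hours_per_station, speed)) for i, x in enumerate(stations)]


def interp(xs, ys, x):
    """Step 2 helper: piecewise-linear mapping (stand-in for objective analysis).

    xs must be sorted ascending; values outside the station range are held constant."""
    j = bisect.bisect_right(xs, x)
    if j == 0:
        return ys[0]
    if j == len(xs):
        return ys[-1]
    x0, x1 = xs[j - 1], xs[j]
    return ys[j - 1] + (ys[j] - ys[j - 1]) * (x - x0) / (x1 - x0)


def synoptic_error(stations, hours_per_station, speed, grid):
    """Step 3: RMS difference between the map and a snapshot at the cruise mid-point."""
    samples = cruise_samples(stations, hours_per_station, speed)
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    t_mid = hours_per_station * (len(stations) - 1) / 2.0
    sq = [(interp(xs, ys, x) - field(x, t_mid, speed)) ** 2 for x in grid]
    return math.sqrt(sum(sq) / len(sq))
```

With a stationary feature the error reflects only the station spacing; letting the feature drift while the ship works along the section adds space/time smearing, and the difference between the two errors is the cost of non-synoptic sampling for this assumed scenario.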

Discussion
Data assimilation is an effective tool for estimating and predicting the coastal ocean environment. While most current methods use relatively sophisticated dynamical models and relatively simple filtering techniques to assimilate data, advances in computational power and numerical techniques are rapidly increasing our ability to use more sophisticated data assimilation methods. We anticipate that data assimilation will become a routine procedure for estimating the coastal environment, just as it is for atmospheric phenomena.
The importance of wind stress for physical circulation, and the coupling of biological and physical processes in the coastal ocean, require that accurate wind stress fields be applied. As demonstrated with the RAMS fields, fine-scale atmospheric data assimilation models can give the oceanographic community the stress fields needed to force coastal circulation models. These atmospheric modeling systems also produce estimates of additional atmospheric fields such as visibility and precipitation. Near population centers, these systems are useful for understanding and predicting pollution patterns as well. The widespread application of these models will be of significant benefit to the coastal marine and estuarine community. A lesson from the meteorological community is the utility of a system for ingesting data, melding model fields, and gridding; LAPS provides this framework for meteorology. An oceanographic analog of LAPS would permit the efficient implementation of coastal and estuarine assimilation systems. Accomplishing this goal will require an important enabling technology: the standardization and regularization of data streams. Widely adopted standards for data and model-field communication will facilitate the exchange of information. The coastal and estuarine oceanographic community must learn from the experience of the meteorological community.
Another lesson from the meteorological experience is the utility of larger scale assimilation systems for estimating and predicting the region surrounding our region of interest. These estimates will permit improved offshore boundary conditions for coastal and regional models, which can greatly reduce the computational expense and complexity of systems targeting the coastal environment. There are several major efforts underway. We have mentioned COFS as a component of the CMDP. COFS will include the Gulf of Mexico; the research version of COFS has already been integrated over this larger domain. The U.S. Navy's Fleet Numerical Meteorology and Oceanography Center (FNMOC) has a system for the west coast (Clancy et al., 1996), and FNMOC is also running global models at lower resolution. Following the lead of the atmospheric community, we should work toward suitable transfer of boundary condition information from large-scale global models to mesoscale regional models, and then to fine-scale coastal models.
Finally, most coastal regions have model simulations, if not prototype assimilation systems. These should be exercised to perform OSSEs. In combination with the intuition and experience of regional oceanographers, these OSSEs can be used to evaluate and design observing systems. The simulated OSSE sampling can also be assimilated into the numerical model to evaluate the assimilation process. This approach can be used to iteratively improve both the observational network and the assimilation process, producing improved field estimates.
Advances in sensor technology and numerical dynamical models are providing a new set of tools for sampling and simulating our environment.Data assimilation provides the means for combining these tools to estimate and predict the marine environment.Scientists, managers, mariners, and inhabitants of the coastal environment will benefit immensely from the improving ability to estimate the state of our environment and the mechanisms by which our environment changes.

Figure 5. Bimonthly climatological Pseudocalanus spp. distributions (adults only) derived from the MARMAP data (number of animals m$^{-3}$) for (a) January-February and (b) March-April. Panel (c) shows the source term that results from the population dynamics inversion. The remaining panels show the other terms in the advection-diffusion-reaction equation averaged over the period of integration: (d) advective flux divergence, (e) diffusive flux divergence, and (f) overall tendency. Fields in the bottom four rows have been normalized to the bottom depth, so the units are number of animals m$^{-2}$ s$^{-1}$. The sign convention is such that the overall tendency equals the sum of the advection, diffusion, and source terms.