Ocean Data Assimilation Systems for GODAE

Ocean data assimilation has reached a sufficient level of maturity such that observations are routinely combined with model forecasts to produce a variety of ocean products. Approaches to ocean data assimilation vary widely, both in terms of the sophistication of the method and the observations assimilated, but also in terms of improving the details of the assimilation, such as forecast error covariances, model biases, observation representation errors, and quality control of the observations. In this paper we compare ocean data assimilation systems that have been developed within the GODAE community. We discuss the advantages and limitations of the methods and present practical aspects of the implementations, including observing systems used in the assimilation, specification of the error covariances, and analysis performance results. Finally, we describe plans for improving the systems in the post-GODAE time period beyond 2008. The GODAE systems discussed include: (1) Bluelink Ocean Data Assimilation System (BODAS), (2) 1 co-authors and assimilation systems are listed in alphabetical order


Introduction
During GODAE, ocean data assimilation has evolved into a mature science.It has been shown to be a quantitative approach that can be used in a mathematically rigorous way to extract information from the relatively sparse and expensive observations of the time-varying ocean circulation.The main goals of ocean data assimilation and state estimation are to improve our understanding of the ocean circulation and to monitor and predict the circulation on all relevant time and space scales.In GODAE, ocean data assimilation is used to: (1) synthesize observations with numerical models to obtain the best possible dynamically consistent description of the changing ocean, and (2) initialize ocean models and optimally utilize all of the available observations through sequential approaches for forecasting.
Although major challenges remain, substantial progress has been made during GODAE in the development of ocean data assimilation systems.GODAE assimilation systems are now producing ocean state estimates and ocean forecasts on a routine basis, some in near-real-time.The GODAE systems represent large investments by many national efforts that are in the process of being transitioned into sustained assimilation activities, both for scientific and operational purposes.This paper provides a summary status of the ocean data assimilations systems developed in support of GODAE activities.
The GODAE systems discussed in the paper include: BODAS (Bluelink Ocean Data Assimilation System), operational at the Australian Bureau of Meteorology; the near-real-time ECCO (Estimating the Circulation and Climate of the Ocean), in use at the Jet Propulsion Laboratory; FOAM (Forecast Ocean Assimilation Model), operational at the UK Met Office; MERCATOR, operational within the European Union; MOVE/MRI (Multivariate Ocean Variational Estimation / Meteorological Research Institute) system, operational at the Japan Meteorological Agency; NCODA (Navy Coupled Ocean Data Assimilation), operational at the U.S. Navy oceanography centers; NEMOVAR (Nucleus for European Modeling of the Ocean Variational Data Assimilation), planned for research and operations at several European institutions; and TOPAZ (Towards an Operational Prediction system for the North Atlantic European coastal Zones) system, operational exploitation has been transitioned to the Norwegian meteorological services.
Organization of the paper is as follows.Section 1 gives an overview of the GODAE assimilation systems.
Section 2 discusses the data assimilation methods, with practical aspects of implementation of the methods presented in section 3. Section 4 describes the observing systems assimilated in each system, and section 5 discusses error covariances, including specification of the background, observation and any multivariate aspects of the assimilation.Examples of the performance of the analysis systems along with some forecast validation results are given in section 6.Finally, we conclude in section 7 with a presentation on the future plans for each of the assimilation systems in the post-GODAE time period (beyond 2008).

System Overviews
a. BODAS.Bluelink is a partnership between CSIRO, the Bureau of Meteorology (BoM), and the Royal Australian Navy.The primary goal of Bluelink is to develop and implement an Australian capability in ocean forecasting and reanalysis.Initially, Bluelink has focused on developing this capability for the ocean circulation around Australia.This choice is reflected in the resolution of the Bluelink global ocean model that has 1/10° resolution south of 20°N and between 90°E and 180°E, with coarser resolution elsewhere.Developments are underway to expand this region of high resolution across the entire Indian and Pacific basins.The primary test bed for the Bluelink system is the Bluelink Reanalysis (BRAN), a 15-year data assimilating model run.Data from BRAN and other Bluelink model runs are freely available to registered users via an OpenDAP server (www.marine.csiro.au/ofam1/).The Bluelink system became operational at the BoM in August 2007 (www.bom.gov.au/oceanography/forecasts/).b.ECCO.The "Estimating the Circulation and Climate of the Ocean" (ECCO) is a consortium of efforts for understanding the climate state and changes of the ocean by assimilating diverse observations with general circulation models to obtain physically consistent estimates of the timevarying global ocean circulation.In many of its estimates, sources of model errors (boundary conditions, parameters, initial values, numerical errors, etc) are estimated along with the state of the ocean itself (temperature, salinity, currents, sea level, etc) consistent with the dynamical and physical principles embodied in the models such as conservation of volume, heat, fresh water (salt), etc.For instance, when temperature is modified by the assimilation, the underlying circulation including advection, mixing, and/or external forcing are also adjusted in space and time consistent with the particular temperature change.The resulting estimates thus permit explicit closure of temperature budgets and are conducive to identifying balances within the ocean and to exploring causality of the ocean's temporal evolution.
A series of ECCO products are available that have different foci and are summarized in Table 1.
Here we focus on describing the near real-time assimilation product that is primarily the work at JPL as part of the ECCO-GODAE investigation.Heimbach and Wunsch (2007) provide a complimentary description of the ECCO-GODAE adjoint assimilation carried out principally at MIT.The focus of the MIT product is to utilize all available data to understand the mean and decadal variability of the full-three dimensional circulation while the JPL analysis is directed mainly towards analyzing the seasonal-to-interannual variability in near real-time utilizing data with the largest impact for this purpose.

Product
Ver The ECCO ocean state estimations are primarily based on the MITgcm (MIT General Circulation Model; Marshall et al., 1997; http://mitgcm.org).The model for the near real-time analysis is nearly global (73°S~73°N) with a 1° grid resolution that telescopes meridionally to a 0.3° grid within 10° of the equator.There are 46 vertical levels with 10 m spacing within 150 m of the surface that gradually increase to 400 m spacing at depth.The model is forced by 12-hourly surface wind stress and daily heat and freshwater fluxes from 1980 to present based on the reanalysis products of the National Center for Environmental Prediction and the National Center for Atmospheric Research (NCEP-NCAR) (Kanamitsu et al., 2002) following a 10-year spin-up from rest.Analyses are extended monthly to present and are available through the link at http://www.ecco-group.org.
c. FOAM.The Forecasting Ocean Assimilation Model (FOAM) system run at the UK Met Office produces daily analyses and 5-day forecasts of the ocean and sea-ice variables.The data assimilation component of the system has been developed over a number of years and is based on a scheme with relatively simple methodology.A large amount of effort has been focussed on improving the details such as dealing with model and observation biases, quality control of the observations and specification of the error covariances.The scheme is run in the global FOAM system and also in high resolution regional configurations that are nested within the global system.
d. MERCATOR.The Mercator Océan assimilation strategy developed during GODAE is based on a hierarchy of methods of increasing sophistication including Optimal Interpolation, Kalman filtering and variational methods, which have been progressively deployed through the SAM (Système d'Assimilation Mercator Océan) series (Brasseur et al., 2005).This progressive approach was somewhat inspired by the meteorological approach for weather forecasting and more simply by pragmatism.It was also adopted to reach a compromise between the operational needs for highresolution products and the available computational resources.
As in most data assimilation development, the Mercator Océan assimilation methods find their roots in the theoretical framework of least-squares statistical estimation and optimal control.The first release, SAM-1 has been developed from an optimal interpolation scheme developed by De Mey and Benkiran [2002]; SAM-1 has been running on an operational real-time basis between 2001 and 2008.The multivariate version (noted SAM-1v2) performs a reduced-order OI by means of multivariate vertical EOFs (Empirical Orthogonal Functions of the barotropic stream function and vertical temperature and salinity profiles) computed from a prior model simulation.The update of the model state is then expressed as a sum of contributions from each EOF weighted by the product of the innovation and the Kalman gain in the reduced space.The SAM-1v2 version assimilating simultaneously in situ T/S profiles, sea-surface temperature and along-track altimetry data has been running operationally since January 2004 in the North Atlantic system at 1/3° and has been transitioned to the higher-resolution system (1/15°) between late 2005 and 2008.
By construction, the SAM-1 algorithm assumes that the correlations can be separated into a product of horizontal and vertical correlations.The concept of separability is related to the predominant role of stratification and the different scales involved horizontally and vertically in the open ocean.However, different scaling behaviors are seen near boundaries and coastal regions.These limitations motivated the development of the second release noted SAM-2, which does not require the hypothesis of horizontal/vertical separability.This more advanced scheme is considering a reduced order Kalman filter based on the SEEK filter method (Pham et al., 1998) where the error statistics is parameterized using an ensemble of 3D model anomalies.The SAM-2 assimilation system has been the focus of a major R&D effort during GODAE and was successfully transitioned to the operational Global (1/4°) and North Atlantic (1/12°) Mercator Océan system in 2008.This SAM-2 scheme will be described in more details afterwards.
The third release, SAM-3, is targeting variational approach such as 4D-VAR algorithm and is still in R&D mode.The implementation of SAM-3 should be based on the incremental variational assimilation system developed for the OPA-NEMO model (Weaver et al., 2003).First evaluations of the variational system are being conducted by comparing global re-analyses at low resolution (2°) obtained with SAM-2 and SAM-3 in the tropical regions.The outcome of these evaluations will provide some guidance to define the Mercator Océan assimilation strategy concerning SAM-3 in the coming years.The exploration of the feasibility of running variational assimilation algorithms with realistic basin-scale models at eddy-permitting and eddy-resolving resolutions is also work in progress.Hybrid strategies between the statistical and variational approaches are also explored in research mode, for example to reduce the computing cost of 4D-VAR by reducing the dimension of the control vector (Robert et al., 2005) or to improve the building of the 4D-VAR "B" matrix (Robert et al., 2006).(Usui et al., 2005).The aims of MOVE/MRI.COM-G are initialization of MRI coupled GCM for seasonal-interannual forecasting and analysis/reanalysis project, which is related to CLIVAR/GSOP project.The period of the analysis/reanalysis product is 1948 to 2007.The aims of MOVE/MRI.COM-NP and -WNP are initialization of ocean forecasting in the North pacific (esp.around Japan) and analysis/reanalysis for the period is 1993-2007, which is related to the GODAE project.On going and/or near future activities are developing 4D-VAR version, OSSE/OSE type sensitivity experiments with singular vector analysis, coupled atmosphereocean data assimilation system, ensemble initialization for seasonal to inter-annual prediction, and addition of a sea ice assimilation system.
f. NCODA.The U.S. Navy Coupled Ocean Data Assimilation (NCODA) is a multivariate optimum interpolation data assimilation system developed at the Naval Research Laboratory (Cummings, 2005).NCODA is operational at the Navy meteorology and oceanography centers: Fleet Numerical Meteorology and Oceanography Center and Naval Oceanographic Office.The analysis is used in a variety of ways in operations, supporting both global and regional applications.
NCODA is executed in two-dimensional mode to provide sea surface temperature (SST) and sea ice concentration lower boundary conditions for the Navy global and regional atmospheric forecast models.NCODA is also run in three-dimensional mode to provide a stand-alone ocean analysis capability, or it can be used as the analysis component in a sequential incremental, cycling data assimilation system with an ocean forecast model.In regional applications, NCODA performs simultaneous, multi-scale analyses on nested, successively higher resolution grids using a 3:1 nested grid ratio.This downscaling strategy is of particular importance in Navy applications where very high resolution is required in a rapid environmental assessment mode of operation.The regional system is globally re-locatable and will soon be a fully coupled air-ocean-wave system.
NCODA can be cycled with a variety of ocean forecast models, but the assimilative model-based results discussed here are from HYCOM (Hybrid Coordinate Ocean Model) running globally on a grid with an equatorial resolution of 1/12° (~9 km).The HYCOM grid is on a Mercator projection from 78.64°S to 47°N and north of this it employs an Arctic dipole patch where the poles are shifted over land to avoid a singularity at the North Pole.This gives a mid-latitude (polar) horizontal resolution of approximately 7 km (3.5 km), with 6.5 km global grid spacing on average.This resolution makes HYCOM eddy-resolving.Eddy-resolving models more accurately simulate western boundary currents and the associated mesoscale variability, and they maintain more accurate and sharper ocean fronts.In ocean prediction, eddy resolving is important for ocean model dynamical interpolation skill in data assimilation/nowcasting and in ocean forecasting, which is now feasible on time scales up to about a month (Hurlburt et al., 2008).HYCOM is run with a thermodynamic (energy loan) ice model that will soon include more physics as it is upgraded to PIPS (based on the Los Alamos CICE ice model).The global assimilative HYCOM development using NCODA has been supported by the National Ocean Partnership Program (NOPP) in a project specifically designed as the U.S. contribution to GODAE.

g. NEMOVAR.
NEMOVAR is an incremental variational data assimilation system for the NEMO (Nucleus for European Modeling of the Ocean) model (Madec 2008).The development of NEMOVAR was initiated in early 2006 as a collaborative project between CERFACS and ECMWF.
The NEMOVAR system inherits the basic incremental algorithmic structure of OPAVAR, the variational assimilation system developed at CERFACS for NEMO's predecessor, OPA8.2 (Madec et al. 1998).The OPAVAR system was initially developed for tropical Pacific applications (Weaver et al. 2003;Vialard et al. 2003;Ricci et al. 2005) and later extended to a global configuration in the European ENACT (ENhanced ocean data Assimilation and Climate PredicTion) project where it was applied to produce multi-decadal ocean analyses for seasonal hindcast initialization and studies of ocean climate variability (Davey et al. 2006).More recently, an ensemble version of the system was developed for the European ENSEMBLES (ENSEMBLE-based predictions of climate changes and their impactS) project to generate multiple ocean analyses in order to contribute to the production of probabilistic forecasts on seasonal to decadal time scales (Daget et al. 2008).
NEMOVAR is not as mature as OPAVAR in terms of algorithmic options and scientific assessment.NEMOVAR is, however, already a significant improvement over OPAVAR in terms of computer performance, code structure, different global resolutions, and the ability to use real-time data-sets.Moreover, whereas OPAVAR was developed on a static and now out-dated version of the ocean model, NEMOVAR is designed to benefit from continual improvements to the NEMO model made by the ocean modeling community.In this paper, we describe the current status of the NEMOVAR system, illustrate its basic performance, and discuss future developments that are ongoing or planned in collaboration with other research institutes and operational centers.
h. TOPAZ.The TOPAZ system runs a North Atlantic and Arctic configuration of the HYCOM model with an ensemble Kalman Filter (EnKF).Since January 2003, TOPAZ runs in real-time on a weekly basis.TOPAZ was built upon the developments of the DIADEM system (Brusdal et al. 2003) and then continued within the European funded TOPAZ and MERSEA projects.The TOPAZ system has been adapted to the Pacific Ocean (Wan et al. 2008).

Assimilation Method
a. BODAS.The Bluelink Ocean Data Assimilation System (BODAS) is a sequential data assimilation system that uses multivariate Ensemble Optimal Interpolation (EnOI; Oke et al., 2002;Evensen 2003).EnOI is a simplified version of the Ensemble Kalman Filter (EnKF; Evensen, 2003) based on a stationary ensemble of modeled anomalies of all prognostic variables (T, S, U, V and sea-level) to approximate background error covariances (BECs).For a complete description of BODAS the reader is referred to Oke et al. (2008).
b. ECCO.The near real-time assimilation employs an approximate Kalman filter and Rauch-Tung-Striebel (RTS) smoother, which are fundamentally least-squares inversions of observations and model, respectively (e.g., Fukumori, 2006).The particular approximations reduce the computational requirements of the estimation that pertain to the derivation of the model state error covariance matrix.These approximations include evaluating independent errors separately from one another (partitioning; Fukumori, 2002), estimating only the most dominant modes of the errors instead of all inaccuracies (state reduction; Fukumori and Malanotte-Rizzoli, 1995), and deriving and utilizing the asymptotic limit of the time-evolving error covariances (Fukumori et al., 1992).
c. FOAM.The assimilation scheme used in FOAM is based on the Analysis Correction (AC) method introduced by Lorenc et al. (1991), which provides an efficient means of calculating the solution to the Optimal Interpolation analysis equation using an iterative procedure.The iteration is initialized by setting 0 , = y y , and the th ν iterate is obtained using )), for 0,1, , 1 N ν = K − .The analysis is then set equal to the last iterate: .
The matrix multiplication ( ( )) x is the most costly aspect of this procedure.It has been shown by Lorenc (1992) that successive applications of a recursive filter provide a good approximation to this matrix multiplication when Gaussian or second order autoregressive (SOAR) functions are used to approximate the off-diagonal elements of .This application of the filter is used in FOAM to approximate the effect of SOAR correlation functions using the covariance information described later in this paper.

B
The above method is designed to work at one particular 'analysis' time at which the background model field is valid.The model counterparts of the observations which have become available during the period since the last analysis are calculated at the correct time during the model integration in a 'first-guess-at-appropriate-time' (FGAT) scheme.The number of new observations which become available during one analysis period is small so FOAM uses a scheme which allows the information from previous analysis cycles to be properly included in the current analysis.This is done by identifying observations which have previously been assimilated and setting their observational increments to zero.The effect of this is to give the background field at previously assimilated observations' locations weight in the current analysis which effectively updates the background error covariance without explicitly having to change the matrix.The scheme is described in detail in Martin et al. (2007).

B
The result of the assimilation is a set of three-dimensional updates to the model prognostic fields.To reduce the shock to the model the temperature, salinity and velocity increments are applied slowly during the next day of the forecast using a method known as incremental analysis updates (IAU) (Bloom et al. 1996).
d. MERCATOR.The SAM-2 algorithm is inherited from the analysis scheme of the SEEK filter, which is a reduced-order Kalman filter introduced by Pham et al. [1998] in the context of ocean circulation models.It is a sequential method where the analysis step of the conventional Kalman Filter is reformulated to take advantage of a low-rank approximation, leading to more efficient inversions of the data in the reduced space than in observation space.The error statistics of the SEEK filter are represented in a sub-space spanned by a small number of dominant error directions.The propagation of information of the data from observed to non-observed variables is performed along the directions represented by these error modes which connect all dynamical variables and grid points of the numerical domain.Several strategies can be adopted to initialize the vectors of the reduced basis.A method involving the computation of the error modes obtained from prior simulations is used to built the error statistics of the SAM-2 system (see afterwards); this approach leads to corrections of the model trajectory that are multivariate and consistent with the dynamics of primitive-equation models.
e. MOVE/MRI.MOVE/MRI.COM systems are composed of OGCMs and a variational analysis scheme which synthesizes the observed information (i.e., temperature, salinity and sea surface height) together with the OGCMs.The numerical code for the OGCMs used in the MOVE/MRI.COM system is the MRI community ocean model (MRI.COM).MRI.COM has been developed in JMA/MRI and is independent of any other popular OGCM code.It is a multilevel model code that solves the primitive equations under the hydrostatic and Boussinesq approximations.The vertical coordinate is a terrain following-depth (σ−z) hybrid, i.e., the levels near the surface follow the surface topography.It enables us to adopt a fine vertical resolution near the surface because it prevents the uppermost layer from vanishing during integration when the free surface variation is explicitly solved.For momentum advection, MRI.COM uses the generalized enstrophy-preserving scheme along with the Takano-Oonishi scheme, which contains the concept of diagonally upward/downward mass momentum fluxes along a sloping bottom (Ishizaki and Motoi, 1999).Detailed model structure and numerical schemes are reported by Ishikawa et al. (2005) and Tsujino et al. (2006).The OGCM used in MOVE/MRI.COM-G is a model with a global domain (model G) for climate variability.On the other hand, for ocean weather, MOVE/MRI.COM employs two models, namely the North Pacific and western North Pacific models (models NP and WNP).Model WNP is nested into model NP (one-way nesting).The model domain of the model G extends from 75°S to 75°N globally.The grid spacing in the zonal direction is 1° and that in the meridional direction is 0.3° within 5°S-5°N, and 1° poleward of 15°S and 15°N.There are 50 levels in vertical.The domain of the model WNP extends from 15°N to 65°N, and 117 °E to 160°W, with a grid spacing of 1/10° × 1/10° around Japan.This model is nested into model NP, whose model region is from 15°S to 65°N, and 100°E to 75°W with a grid spacing of 1/2°x1/2°.NP and WNP have the same vertical grid spacing (54 levels).
The analysis fields for models G, NP, and WNP are calculated separately.The analysis scheme adopted in the MOVE/MRI.COM system is a multivariate 3DVAR analysis scheme with vertical coupled T-S EOF modal decomposition of a background error covariance matrix.The scheme is based on Fujii and Kamachi, 2003a, 2003cand Fujii et al., 2005.The amplitudes of the coupled EOF modes are employed as control variables and the analyzed temperature and salinity fields are represented by the linear combination of the EOF modes in the scheme.A preconditioned optimizing utility for large-dimension analysis (POpULar; Fujii andKamachi, 2003b andFujii, 2005) is developed and applied for minimizing the nonlinear cost function as the descent method.This scheme can minimize a cost function including a constraint of the background without inversion of the background error covariance matrix, even if the function is nonlinear.It is useful for handling the correlation among background errors.The regions of the models G, NP, and WNP are divided into 40, 12 and 13 subregions, respectively.EOF modes are calculated in each subregion for each model from world ocean database 2001 (WOD2001), as well as the representativeness error covariance matrix, according to Fujii and Kamachi (2003b).We retain 12 dominant modes in each subregion.In fact, more than 85% of the total variance can be explained by the dominant 12 modes although this estimate will differ from one in a different subregion.The Gaussian function is adopted as the horizontal correlation model applied in the background covariance matrix B. The efolding scales along latitude and longitude lines are also different in different subregions and are decided from Kuragano and Kamachi (2000).The model temperature and salinity fields are corrected by the analysis result through the incremental analysis updates (IAU) technique (Bloom et al., 1996).The assimilation period is 1/3 month.MOVE/MRI.COM systems are composed of OGCMs and a variational analysis scheme which synthesizes the observed information (i.e., temperature, salinity and sea surface height) together with the OGCMs.The numerical code for the OGCMs used in the MOVE/MRI.COM system is the MRI community ocean model (MRI.COM).MRI.COM has been developed in JMA/MRI and is independent of any other popular OGCM code (Ishikawa et al. 2005, Tsujino et al., 2006).It is a multilevel model code that solves the primitive equations under the hydrostatic and Boussinesq approximations.The vertical coordinate is a terrain following-depth (σ−z) hybrid, i.e., the levels near the surface follow the surface topography.It enables us to adopt a fine vertical resolution near the surface because it prevents the uppermost layer from vanishing during integration when the free surface variation is explicitly solved.For momentum advection, MRI.COM uses the generalized enstrophy-preserving scheme along with the Takano-Oonishi scheme, which contains the concept of diagonally upward/downward mass momentum fluxes along a sloping bottom (Ishizaki and Motoi, 1999).
f. NCODA.The method used in NCODA is an oceanographic implementation of the multivariate optimum interpolation (MVOI) technique widely used in numerical weather prediction systems (Daley 1991).The analysis variables are temperature, salinity, geopotential (dynamic height), and velocity, which are analyzed simultaneously in three dimensions.The horizontal correlations are multivariate in geopotential and velocity, thereby permitting adjustments to the mass fields to be correlated with adjustments to the flow fields.The velocity adjustments (or increments) are in geostrophic balance with the geopotential increments, which, in turn, are in hydrostatic agreement with the temperature and salinity increments.The mass-velocity multivariate relationship is parameterized to relax geostrophic coupling near the equator and in shallow water and also to allow flow divergence of the velocity increments to be weak or strong, depending on depth.The analysis background fields, or first guess, are generated from a short-term ocean model forecast at a userspecified update cycle interval.NCODA computes balanced corrections to the first-guess fields using all of the synoptic temperature, salinity and velocity observations that have become available since the last analysis was made and are within the geographic and time domains of the forecast model grid and update cycle interval.
The MVOI problem in NCODA is formulated as: (2.2) where x a is the analysis state vector, x b is the background state vector, B is the background error covariance matrix, H is the forward operator, R is the observation error covariance matrix, and y is the observation vector.Observations can be assimilated at their measurement times within the update cycle time window by comparison against time dependent background fields using the first guess at appropriate time (FGAT) method.An advantage of the FGAT method is that it eliminates a component of mean analysis error that occurs when comparing observations and forecasts not valid at the same time.
The increments are assimilated into HYCOM evenly over the first six hours of the forecast using the Incremental Analysis Update (IAU) approach (Bloom et al. 1996).The IAU procedure is not applied because increments from the analysis are unbalanced, but rather to protect the HYCOM routine that generates the hybrid layer structure at each model time step.This routine is sensitive and can produce spurious layer structure when presented with large (but realistic) analysis increments.

g. NEMOVAR.
The current version of NEMOVAR employs a variant of the multivariate incremental 3D-Var method described in Weaver et al. (2003), Vialard et al. (2003) and Ricci et al. (2005).Let x denote the vector of model state variables (temperature (T), salinity (S), horizontal velocity (u,v) and sea-surface height (SSH)) on the 3D model grid.Let b x denote a background estimate of x , and let x δ be an increment defined such that y distributed over a time window t 0 ≤ t i ≤ t N , the 3D-Var produces an analysis increment a x δ by approximately minimizing the quadratic cost function is the background-error covariance matrix, and R is the observation-error covariance matrix.The vector is the background state at time t i , which is obtained by integrating the ocean model from t 0 to t i from the background initial condition .The operators denote interpolation operators (bilinear in the horizontal; cubic spline in the vertical) from model grid points to observation points.For data available as a time-average, the observation operator also includes an appropriate time-averaging of background states.Note that operators in the observation matrix act on the same increment δ in Eq. ( 2.3), whereas to calculate the operators act on the different background states in the computation of the innovation vector .This property of comparing the observation with the background state at the appropriate measurement time is usually referred to as FGAT for First-Guess at Appropriate Time.
The increment x δ and background-error covariance matrix are formally defined with respect .In 3D-Var, can be chosen from any background state within the assimilation window although it is conventional to choose it from either the middle or start of the window (the specific choice is a user-defined option in NEMOVAR).Taking it to be in the middle of the window ( ) can minimize the effects of the approximations in 3D-Var, although these approximations are reduced to some extent using FGAT.Taking it to be at the start of the window ( ) can simplify the technical implementation of 3D-Var in systems that also support incremental 4D-Var.The latter also provides a useful interpretation of 3D-Var as a limiting case of incremental 4D-Var in which the tangent-linear operator that propagates the increment in 4D-Var is replaced by the identity operator in 3D-Var.In NEMOVAR, the analysis increment is introduced in the model via direct initialization of the background state or via Incremental Analysis Updates (IAU) to reduce spurious adjustment processes (Bloom et al. 1996).With IAU, the increment is applied through an additional forcing term in the model equations.Different IAU weights can be chosen and the forcing period can be an arbitrary sub-window of the main assimilation window to control the adjustment time of the model to the increment.Typically the increments are spread evenly over the full assimilation window.
h. TOPAZ.The EnKF derives flow-dependent forecast error statistics from a Monte-Carlo forward model integration that makes it well suited to strongly non-linear dynamics (Evensen, 1994(Evensen, , 2006)).The update is a multivariate least-square estimation as in the Kalman Filter.One advantage of the EnKF is that it is relatively non-intrusive in the dynamical model code, which made it also popular outside of the oceanographic community.The EnKF source code and its documentation are provided on http://enkf.nersc.no.The TOPAZ system is the first operational oceanography forecast system based on flow dependent forecast error covariances.

Practical Aspects of Implementation
a. BODAS.The ensemble-based covariances used by BODAS are localized using an isotropic quasi-Gaussian function that reduces the covariances to exactly zero over 8 o of longitude/latitude.Localization significantly increases the rank of the ensemble, enabling the EnOI system to produce global analysis fields that adequately fit the observations.Localization also reduces the sampling error of distant-covariances, but it also has the negative impact of degrading the dynamical balances of the analysis fields (Oke et al. 2007).
For each application of BODAS, the model domain is divided into approximately 500 sub-domains.Analysis fields are computed for each sub-domain independently and in parallel.For each subdomain, observations are assimilated within that domain, plus a surrounding halo region.The size of the halo region is chosen to match that of the localizing function.This implementation means that analyses in adjoining sub-domains match seamlessly.Using this approach, each observation is used several times.BODAS is therefore very computationally expensive.We reduce the number of observations directly assimilated by combining observations of each type into super-observations.The spatial distribution of the super-observations reflects that of the model grid -with more superobservations in the high resolution region of the model domain.
In the assimilation BODAS uses an observation window of up to 11-days.This yields approximately global coverage for all data sets (altimetry, SST and Argo).This imposes an important constraint on the EnOI system.For experiments using a shorter observation window -and therefore sparser data coverage -the EnOI system sometimes generates unrealistic analysis increments in regions of poor data coverage.
Initialization is an area of active research under Bluelink.To date, we have tried three different approaches to initialization.We started with an explicit update to the full model state (T, S, U, V and sea-level) in a single time step (Oke et al. 2005).The dynamical imbalance of the analysis fields results in some significant shocks to the model that subsequently degrades the model forecast.Subsequently, we used a simple nudging approach, where we nudge the model T, S and sea-level field to the analyzed fields over a 1-day period.This approach significantly reduces the shocks, but results in a less optimal system because not enough of the analysis information is retained by the model (Oke et al. 2008).The latest experiments use incremental analysis updating (IAU; Bloom et al. 1996).This approach holds the most promise of those considered.We are currently exploring the sensitivity of the system to different implementations of IAU. b.ECCO.The Kalman filter/RTS smoother approximations presently utilized in the near-real time ECCO estimates are configured to estimate adiabatic model errors associated with large-scale inaccuracies in time-variable wind forcing.Corresponding errors in the model state are represented by corrections in vertical displacements and horizontal velocities.State reduction is achieved by expanding these errors vertically in terms of the gravest five baroclinic dynamic modes and the barotropic mode and horizontally on grids coarser than that of the model.Taking advantage of differences in temporal and spatial scales, the barotropic errors are estimated separately from the baroclinic errors.The barotropic errors are estimated globally on a 6° grid whereas the baroclinic errors are estimated on a 5°-by-3° grid defined over seven different overlapping regions that cover separate basins (Figure 3.1).Estimates on the coarse grid are interpolated to the model's grid using an objective mapping procedure (Bretherton et al., 1976) based on distances over the ocean (i.e., around land, if any).Observations that are assimilated are averaged in space and/or time prior to assimilation commensurate with the filter's reduced grid dimension (Section 4b).The assimilation is conducted every 6-hours utilizing all observations that are made within 3-hours of the particular instant, approximating that these measurements are coincident in time.c.FOAM.The assimilation scheme described in the previous section assumes that there are no systematic errors in either the model or the observations.In practice this assumption is not valid and the systematic errors in both the model and observations can cause significant problems for the assimilation scheme.An example of problems caused by systematic model errors is in the tropical oceans where the main large-scale balance is between the surface wind stress and the subsurface pressure gradients.The assimilation can disrupt this balance and cause spurious circulation cells to be set off by the assimilation which tend to oppose the increments being put in by the assimilation.Bell et al (2004) describe the method used in FOAM to overcome these problems by including a correction to the pressure gradient term, calculated using information from the model and observation differences.
A comprehensive automatic quality control system is used in FOAM, as described by Ingleby and Huddleston (2006).All data types are quality controlled against the current model forecast using a Bayesian procedure which checks the compatibility of the observations and model forecast, given a priori error statistics for the observations and model fields.For profile data, various extra checks are made, including track checks, spike checks, stability checks, duplicate checks and buddy checks.d.MERCATOR.The assimilation sequence is based on a 7-day assimilation cycle where the innovation is calculated during the model integration using the First Guess at Appropriate Time (FGAT) approximation.The analysis step provides a statistical correction of the 3D state vector (TEM, SAL, U, V), which is added to the forecast state after physical adjustment.In order to take into account the complexity of a realistic model, a number of additional features have been integrated into the SAM-2 analysis.
First, the algorithm is implemented following a local assimilation scheme.The practical motivation is to suppress spurious correlations with distant variables that may appear as a result of the truncation of the error modes.This feature is implemented in SAM-2 by assuming that distant observations have negligible influence on the analysis.In practice, the global system is split into sub-systems, and for each of these, the traditional analysis is computed.The concept of "influence bubble" (i.e.limited geometrical domain defined by means of zonal and meridional radii) is introduced as follows: only data points located within an individual bubble, centered on a subdomain of one or several grid points to be updated, actually contribute to the gain.In this way, the influence radii define an ellipse within which the signal is significantly correlated, while signals outside the ellipse are assumed to be uncorrelated.The influence radii are calculated a priori from SLA (Sea Level Anomaly) satellite observations, and a digital filter is used to remove the seasonal cycle.The typical size of the influence regions in the Mercator Océan systems extend from about 200 to 500 km, i.e. several Rossby radii of deformation at mid-latitudes.
Unlike the original SEEK filter, the SAM-2 scheme which is considered in the Mercator Océan systems does not evolve the error statistics according to the model dynamics [Ballabrera-Poy et al., 2001].This would require prohibitive costs given the size of the operational systems.In order to compensate for this approximation, an adaptive scheme has been implemented which modulates the amplitude of the background error variance locally.At each assimilation cycle, the background error is adjusted in such a way as to maintain the innovation sequence and the background and observation error covariances in statistical equilibrium [Testut et al., 2003].This adaptive parameterization is shown to preserve the consistency of the statistical scheme, and thereby avoid divergence of the assimilated model trajectory which could be obtained with "fixed" background error.
e. MOVE/MRI.Observation data assimilated are given through GTS transmission system for the operation in JMA.JMA traditional QC system is applied to the observation data before analysis.MOVE system uses anisotropic and inhomogeneous horizontal Gaussian decorrelation scales in the background error variance/covariance matrix (B).The B matrix is also decomposed by the vertical coupled T-S EOF modes (see also section 2).The modes are calculated in each subdomains (see section 2) from the historical data WOD01 and other data obtained in operational agencies.Representativeness error is calculated from a aprt from B matrix.Dynamic QC (or variational QC) is also adopted in the iterative process in the 3DVAR scheme (Fujii et al., 2005).A preconditioned optimizing utility for large-dimension analysis (POpULar; Fujii andKamachi, 2003b andFujii, 2005) is developed and applied for minimizing the nonlinear cost function.
f. NCODA.Solution of Eq. (2.2) is done via the overlapping volume approach of Lorenc (1981), with some new capabilities to allow the analysis to interface with any grid.A total of 8 volume solutions are computed for each analysis grid point.The different solutions are weighted by grid point distance from volume center when forming the final analysis estimate.Volume size is a function of the local correlation length scale at volume center, and includes 8 correlation length scales if the number of observations in the volume does not exceed a threshold value (>7200), in which case the volume is subdivided.This volume subdivision rarely occurs in practice.The combination of overlapping volumes, and a large number of correlation length scales within a volume, produces analysis increments that are very smooth with no seams along volume edges, and reduces the departure from geostrophy that occurs when interpolating different solutions.
The global HYCOM gird consists of 4500x3298x42 analysis grid points.The size of the model state vector presents practical problems when solving the analysis equation on distributed memory machines using the overlapping volume approach.To avoid memory limitations, the global grid is partitioned into 8 subdomains, which are solved simultaneously in separate jobs.
Late arriving satellite altimeter SSH observations are another problem when running in real-time.In order to have synoptic altimeter observations in the analysis, a five-day hindcast of the HYCOM/NCODA system is run each day using a 24-hr analysis update cycle.In the hindcast, a NCODA analysis is performed and HYCOM is run for 24 hours with the IAU procedure applied during the first 6 hours of the forecast.This cycle is repeated for five days up to the nowcast time when HYCOM continues to run in forecast mode out to 120 hours.Examination of the timeliness of the altimeter data has shown that nearly complete altimeter SSH data are obtained 5 days into the past, with an added bonus that orbits also improve with older SSH data for some satellite altimeter missions.Using 379 IBM Power 5+ processors, the 5-day hindcast and 5-day forecast run takes about 14.6 wall-clock hours.g.NEMOVAR.An important feature of NEMOVAR is its capacity to run on distributed-memory machines.In the chosen parallelization strategy, model vectors are distributed onto processors assigned to specific model sub-domains and message passing is used to communicate between horizontal sub-domains (e.g.halo exchanges and global sums) and observation vectors are distributed evenly amongst processors and message passing is used to retrieve the appropriate 2 x 2 grid-point stencil needed for bilinear interpolation.This strategy avoids processor load imbalance which would otherwise occur when geographically clustered observations are distributed using a domain-decomposition approach.
The cost function is minimized iteratively using a conjugate gradient (CG) algorithm (Tshimanga et al. 2008).To improve the convergence properties of the minimization, a preconditioning transformation is employed in Eq. ( 1), where is an invertible matrix defined such that .The minimization problem is first solved for and then the solution is transformed to a model increment using .The adjoint operators and are needed to compute the gradient of the cost function on each iteration of the CG algorithm.For the ORCA1 global ocean configuration (42 levels; 1 o x 1 o , except near the equator where the meridional resolution is increased to 1/3 o ), the computational cost of the minimization (40 CG iterations) is roughly twice the cost of a 10-day ocean model forecast, using 16 MPI tasks.h.TOPAZ.The state vector in TOPAZ contains all of the HYCOM baroclinic and barotropic prognostic variables on the native grid, including the layer thickness in the isopycnal domain of the oceans.The sea-ice concentration and thickness are also in the state vector.The inclusion of all prognostic variables is required for the physical consistency of the ensemble update (Evensen, 2003).The length of the model state vector is 81 million.The number of members is 100, which is small considering the state dimension but has still proven useful thanks to localization.Using smaller ensembles can be problematic (Natvik and Evensen 2003).Localization is applied for the sake of efficiency: the 50 nearest observations (resp.profiles) are selected within a radius of 700 km for satellite data (1000 km for Argo data).No incremental updating has been necessary, indicating that the updates are sufficiently consistent.The EnKF update is implemented in MPI-parallel on 8 CPUs and uses one CPU hour for each data type on an IBM P575+.Note that the CPU costs of the analysis are only 1% of those used by the ensemble run for providing the flow-dependent error covariances (1.600 CPU hours per week).The systems processes one week back in analysis mode and provides 10 days forecasts.The forcing fields are provided by the ECMWF.

Observing systems assimilated
a. BODAS.BODAS assimilates observations from a range of different sources.SLA observations are obtained from all available satellite altimeters (ERS, GFO, Topex/Poseidon, Envisat and Jason) and from 57 coastal tide gauges around Australia.In situ temperature and salinity observations are obtained from Argo floats, from the Tropical Atmosphere-Ocean (TAO) array, and from CTD and XBT (temperature only) surveys from a variety of different field surveys, including WOCE, the Indian Ocean Thermal Archive (IOTA) and others.The XBT and CTD profiles (excluding Argo) are sourced from the Bluelink Ocean Archive, a quality-controlled data set used to construct CARS2000 (CSIRO Atlas of Regional Seas; Ridgway et al., 2002).For reanalyzes, SST observations are obtained from Pathfinder and AMSR-E satellites; and for the operational Bluelink forecast system, SST observations are obtained from AMSR-E and will soon include observations from AVHRR satellites.

b. ECCO.
Temporal anomalies of sea level and in situ temperature profiles relative to their respective time-means are assimilated in the estimates.Time-mean errors of the model are not corrected as, in its elemental form, Kalman filters assume temporally uncorrelated model error sources.
The sea level observations consist of satellite altimeter measurements of TOPEX/POSEIDON and Jason-1 from 1993 to 2002 and from 2002 to present, respectively.Temperature profiles include those from expendable bathythermographs (XBTs), conductivity-temperature-depth sensors (CTDs), Argo profiling floats, and the Tropical Atmosphere Ocean (TAO) array.To minimize computational requirements, the data are reduced in resolution that is commensurate with the effective resolutions of the model and assimilation (Figure 3.1).Altimetric sea level measurements are binned along the satellite's ground track to 2° wide within 20° of the equator and 6° wide elsewhere that are offset by half a bin-width between ascending and descending ground-tracks, providing an effective latitudinal resolution of 1° and 3° in the tropics and extra-tropics.Temperature measurements are averaged in 10-day and 3° bins and are sub-sampled at five depths in the upper ocean.c.FOAM.The observation groups which are assimilated in FOAM are sea surface temperature, sea surface height, temperature and salinity profiles, and sea-ice data.Each observation group has time correlation scales associated with it.These are specified to be 5 days for the surface observations, i.e.SST, SSH and sea-ice, and 10 days for the temperature and salinity profile data.These time correlations are included by linearly increasing the error variances associated with each observation according to the time since the observation reported.
Sea surface temperature.In situ measurements from moored and drifting buoys, ships and other platforms are assimilated into the FOAM system.These data are obtained in real-time over the Global Telecommunications System (GTS).Various types of satellite data from the GODAE high resolution SST (GHRSST) project are also assimilated.These include data from the AATSR, AMSRE and AVHRR (from NOAA and MetOp satellites) instruments.SST data from these different platforms contain biases for various reasons.These biases are treated in FOAM using the in situ data and data from the AATSR satellite as reference data (assumed to be unbiased) and calculating a bias field from match-ups between these reference data and each satellite observation type in turn.This method is described in more detail in Stark et al. (2007).
The analysis increments produced by the two-dimensional analysis of SST are applied throughout the mixed layer of the ocean as diagnosed by the model.The depth to which these increments are applied is limited to 660m in order to prevent increments being applied at depth in regions of deep convection where the vertical correlations are likely to be less reliable.
Temperature and salinity profile data.All in situ temperature and salinity profile data which are available in near real time on the GTS are assimilated into FOAM.These include the large amount of Argo temperature and salinity profiling data which are now the dominant source of in situ profile data.Other CTD measurements are also assimilated, together with the moored buoy data from the TAO/TRITON and PIRATA arrays, and temperature data from XBTs.
A three-dimensional analysis is produced separately for the temperature profile and salinity profile data.In each case, the analysis is obtained by assuming the solution can be separated into vertical and horizontal components.In the iteration, a vertical analysis is performed at observation locations prior to a horizontal analysis at each vertical level.The vertical correlations take the same form as the horizontal correlations described later in this paper.
Altimeter sea surface height data.The FOAM system assimilates the along-track sea level anomaly (SLA) altimeter data processed in near real-time by CLS.These currently consist of data from the Jason-1, Envisat and Geosat Follow-On satellite instruments (and will include Jason-2 when it becomes available).The data is quality controlled during the CLS processing, but an additional background check is performed before assimilation into FOAM using the method described in section 2 above.The SLA observations are provided as anomalies relative to a time-mean field and so a reference mean dynamic topography (MDT) needs to be added back to the SLA data in order to compare the observations to the model fields.In FOAM, the MDT of Rio (2005) is used.The use of an external MDT can introduce systematic errors in the observations (SLA+MDT).A method which calculates and removes these biases on-line has been implemented in FOAM as described by Lea et al. (2008).The altimeter data are assimilated using a two-stage approach.Firstly a horizontal analysis is performed using the Analysis Correction scheme described above.This results in a twodimensional field of SSH increments, which is split into a part due to errors in the mesoscale ocean dynamics and a part due to larger scale errors in the model, calculated as part of the analysis using the error covariance information described later.These two components of the increments are used differently.The part due to the mesoscale errors is projected onto the subsurface temperature and salinity using a modified version of the scheme introduced by Cooper and Haines (1996).The part of the SSH increments which is due to the larger scale errors is used to update the SSH field, but no subsurface changes are made using this component.
Sea-ice concentration data.The sea-ice concentration data assimilated in FOAM are obtained from the EUMETSAT Ocean & Sea Ice Satellite Application Facility, which is based on SSM/I data.This data is compared with the sea-ice model's concentration field to produce a two-dimensional field of sea-ice concentration increments.The distribution of the ice concentration increments onto the ice thickness categories could be done in a number ways.In FOAM, a fraction of the thinnest ice in each grid box is removed when the analysis increment is negative.Where the analysis increment is positive, new ice is added by adjusting the ice thickness distribution to represent the addition of 10cm thick ice.Analysis increments smaller than 1% per day on ice free grid points are likely due to numerical artifacts and these are removed.Changes to the ice concentration are balanced by associated changes to the ocean surface salinity to conserve total salt.No direct changes are made to the ocean temperature by the ice concentration assimilation since we are unable to determine a priori whether the heat fluxes should be exchanged with ocean or atmosphere.This scheme is described in detail in Stark et al. (2008).
d. MERCATOR.The SAM-2 scheme has the capability to assimilate a broad variety of observations.The Mercator Océan operational systems assimilate the same baseline of observation datasets: along-track sea level anomalies from JASON-1, ERS-2/ENVISAT and GFO satellites coming from SSALTO/DUACS; vertical T/S profiles (at depths varying from a few tens of meter to 2000 m in some cases) measured by ARGO floats, XBT/CTDs, moorings or buoys as provided by the CORIOLIS in situ data assembly centre; and Real Time Global SST analysis product (RTG_SST) produced daily by the NOAA/NCEP.The synthetic Mean Dynamic Topography produced by Rio et al. [2006] is taken as a reference level for the assimilation of Sea Surface Height (SSH) data.The observation error matrix in the assimilation system is diagonal, and a representativeness error depending on the data type is added to the prescribed measurement errors.
e. MOVE/MRI.Temperature, salinity and along-track SSH observations are employed in the analysis.The temperature and salinity observations were collected from WOD2001 and the global temperature-salinity profile program (GTSPP) database that contains ship hydrography, XBT, drifting buoy, Argo float and tropical array systems such as TAO/Triton.We also adopted the along track SSH anomaly data of TOPEX/Poseidon (T/P), Jason-1, ERS-1/2, ENVISAT, which are extracted from the SSALTO/DUACS delayed time multi-mission altimeter products, after adding it to the mean SDH calculated from a preliminary analysis using temperature and salinity observations alone.Real time global merged SST analysis MGDSST (i.e., Japan GHRSST) combined ship and satellite data by JMA is also adopted.f.NCODA.NCODA makes full use of all sources of operational ocean observations.The ocean observing systems currently assimilated include sea surface temperature from satellite and in situ sources (ships and buoys), sea ice concentration from SSM/I, sea surface height anomalies from satellite altimeters, and temperature and salinity profiles from Argo, CTD casts, XBT drops, fixed buoys and thermistor chained drifting buoys.New data sources are continually being added to the analysis, such as threaded data from ocean gliders, which provide continuous profiles of temperature and salinity during both the descending and ascending segments of the glider dive.
Altimeter SSH anomalies are assimilated using synthetic profiles based on a modified form of the Cooper-Haines technique (Cooper and Haines, 1996).The modifications include restricting the depth range over which the temperature and salinity innovations are computed from below the mixed layer to a level of no motion defined at 2000 m.SST, valid at the analysis time, and a static stability test are used to correct mixed layer temperatures and salinities in the synthetic profiles, while maintaining model mixed layer density gradients.The synthetic profile temperature and salinity innovations are added to the real data in the assimilation as a distinct observing system with unique error characteristics.While having the potential of adding important information in datasparse areas, the number of altimeter-derived synthetic profiles can greatly exceed and overwhelm the relatively sparse in situ profile observations in the analysis.Accordingly, synthetic profiles are generated and applied only when: (1) the mismatch in model predicted and altimeter measured SSH exceed the measurement error of the altimeters (~3 cm); (2) the Brunt-Väisälä frequency indicates the model background is well stratified; and (3) there are no in situ profile observations in the vicinity.In this latter case, in situ measurements of temperature and salinity are considered to be a more reliable source of subsurface information, and are used preferentially in the assimilation wherever such observations exist.
NCODA is tightly integrated with a fully automated ocean data quality control (QC) system (Cummings, 2006).All observations are passed through the QC prior to assimilation.A sequence of sensibility checks are performed first followed by a series of gross error procedures that include background field checks and cross validation analyses.A complex QC process is used in that all checks are performed before a final QC decision is made to accept, reject, or schedule the observation for manual intervention.The QC outcomes are used in the analysis when selecting observations for the assimilation.Within the analysis itself an innovation check is performed in the more difficult process of identifying measurements that fall within valid and reasonable ranges, but nevertheless are erroneous.This procedure uses the diagonal of the background error covariance matrix and prescribed tolerance limits to remove observations that are unacceptable to the analysis background.The tolerance limits are equivalent to the number of acceptable standard deviations of the innovation from the analysis background.
It is not unusual for the global HYCOM/NCODA system to assimilate more than a million observations in a 24-hr update cycle interval.Accordingly, extensive pre-processing of the observations is performed to remove redundancies in the data and minimize horizontal correlations among the observations.The algorithm used to thin the data in the analysis is adaptive in that as the model grid resolution increases across the global HYCOM domain the actual number of innovations combined in the data thinning decrease until, eventually, the original data are directly assimilated.Care is taken to not combine observations from different water masses and observing systems with different error characteristics in the data thinning process.
g. NEMOVAR.The observation types currently used in NEMOVAR are temperature and salinity profiles, along-track SSH anomalies from altimeters, and model-gridded sea surface temperature (SST).In NEMOVAR it is possible to use the data, or a subset of the data, either for assimilation or for diagnostic purposes such as data monitoring or model validation.Initial testing of the NEMOVAR system has focused on the use of delayed-mode data products.Profile data are taken from the 1950-2006 ENACT/ENSEMBLES quality-controlled database (EN3v1c; Ingleby and Huddleston 2007) and contain data primarily from the World Ocean Database 2005, supplemented with data from the World Marine Environmental Laboratory and the Global Temperature Salinity Profile Program.The data-set is essentially composed of bathythermographs (MBTs and XBTs), hydrographic profiles (CTDs and predecessors), moored buoys from the TAO/TRITON and PIRATA arrays, profiling floats and Argo data.SSH anomalies are taken from the multiple satellite (Geosat, T/P, Jason-1, ERS-1, ERS-2, GFO) database from CLS/AVISO.The SSH anomalies are referenced to a gridded Mean Dynamic Topography derived either from the model (typically from an experiment that assimilates in situ data only) or from an external product (Rio-05) that makes use of gravity measurements from GRACE.SST information is provided from daily-interpolated Reynolds analyses (OIv2; Reynolds et al. 2002).The SST analyses can be assimilated through the cost function/IAU-procedure or they can be "nudged" in via a surface relaxation term.
h. TOPAZ.Weekly, 1/3° resolution grid, sea level anomaly maps merged from different satellite altimeter missions (ERS 2, Jason 1, Geosat Follow-On) from the Aviso products at CLS, France are assimilated in TOPAZ.The mean sea-surface is taken from a long integration of the model.Weekly, 1° resolution grid, analyses of sea surface temperature from NOAA (Reynolds analysis) are also assimilated.Sea ice concentration retrievals from the AMSR-E sensor at 12.5 km resolution, available from NSIDC in the U.S., are assimilated.Sea ice drift products from CERSAT, Ifremer in France, with 62.5 km resolution, are assimilated.The 3-day Lagrangian drift product is used, prior to the analysis day.Sea ice drift products are available only in winter.Since Arctic sea ice drift is highly variable, the EnKF has been used as a 4D-assimilative system via an "ensemble-FGAT", with the ensemble used for ageing the covariances between the observations and the state variables.Argo temperature and salinity profiles are obtained from the Coriolis center, Ifremer, France for the assimilation.The TOPAZ system typically assimilates about 250,000 observations per week in total.

Error covariances (observations, background, multivariate)
a. BODAS.BODAS requires explicit estimates of observation errors.SLA errors are assumed to be between 4 and 6 cm, depending on the satellite.In situ temperature and salinity errors are attributed to a constant instrument error (0.01°C and 0.05 psu for CTD; 0.2°C for XBT).SST errors are assumed to be 0.5° for Pathfinder SST and 0.25° for AMSRE.In addition to these errors an estimate of the representation error, computed using the method described by Oke and Sakov (2008b), is attributed to each observation.Briefly, this method is based on a simple processing of altimeter observations that attempts to quantify the magnitude of the unresolved oceanic processes for a given model grid.The method provides estimates of representation error for T, S and sea-level that reflect the variance of unresolved mesoscale variability in the ocean.
The background error covariances for BODAS are ensemble-based.BODAS uses a 120-member stationary ensemble.Each ensemble member is computed from a 15-year model run without data assimilation.For each month of this model run, we simply subtract a 3-day mean from a 3-month mean.This yields anomaly fields that have scales and variability that resemble the mesoscale ocean circulation.We assume that the background error covariances are well approximated by the ensemble-based covariances derived from this ensemble of intra-seasonal anomalies.The ensemblebased covariances that BODAS uses are inhomogeneous and anisotropic -reflecting the different length-scales and variability in different regions.An example of the ensemble-based covariances (shown as correlations) is given in Figure 5.1.In this example, the ensemble-based correlation between sea-level at a fixed point and sea-level in the surrounding region is shown.The panels in Figure 5.1 show correlation fields for the reference location at different locations, demonstrating the anisotropic and inhomogeneous nature of the ensemble-based covariances.A nice feature of the EnOI approach used in BODAS is its multivariate capabilities.Provided an observation type can be directly compared to the model state (e.g., T, S, U, V and sea-level), assimilation of different data types is readily achieved in a single step.An example of the multivariate nature of EnOI is presented in Figure 5.2 (adapted from Oke and Schiller, 2007) showing the SST, SLA and sub-surface T increments when different observations are assimilated.For this example, a series of warm-core and cold-core eddies are evident in the SLA observations, but only one cold-core eddy is evident in the SST observations.This can occur when, for example, the warm-core eddies are capped by cold near-surface waters.As a result, we find that when only altimetry is assimilated (panels a, d and g in Figure 5.2) the increments reflect both the warm-core and cold-core eddies, with a clear surface-expression of the eddies in the SST increments.By contrast, when only SST is assimilated (panels b, e and h in Figure 5.2), the increments reflect only cold-core eddies, plus a general T decrease over most of the region shown.However, when both altimetry and SST are assimilated (panels c, f, and i in Figure 5.2), the increments reflect both the warm-core and cold-core eddies, as well as the surface T decrease.This example demonstrates both the multivariate nature of the EnOI system used here and the importance of different types of observation for assimilation.(d, e, f) SLA increments, and (g, h, i) T increments for a longitude-depth section at 24.4°S (sections denoted in panels a-c) when only altimetry (panels a, d, and g), only SST (panels b, e, and h), and both altimetry and SST (panels c, f, and i) observations are assimilated.For this region, the RMS differences between observed and analyzed SST (SLA) are 0.9° (4.4 cm), 0.02° (11.5 cm), and 0.05° (4.8 cm) for cases ALTIM, SST, and ALTIM+SST respectively.For comparison, the RMS difference between the observed and background SST (SLA) is 0.9° (13.1 cm).Adapted from Oke and Schiller (2007).
b. ECCO.Model (process) and data errors are estimated by the method of covariance matching (Fu et al., 1993).Namely, assuming that errors of the data and that of the model simulation are independent of each other, the two, in data space, can be empirically estimated from auto-and crosscovariances among the observations and their model simulation.The former is utilized as the data error estimate in the assimilation and the latter is employed as a reference to estimate the model error source.In particular, in the ECCO near real-time assimilation, covariances of the NCEP-NCAR reanalysis winds are scaled and employed as those of wind error such that the resulting theoretical model simulation error estimate matches that of this empirical estimate.The theoretical model error estimate that defines the filter and smoother are in turn evaluated following estimation theory (i.e., Riccati Equation) based on these data and process noise estimates.Figure 5.3 illustrates the first order statistical consistency between the resulting Kalman filter estimates and these formal uncertainty estimates.The panels show differences as root-mean-square sea level residuals (model-data differences) between the forecasting and analysis estimates of Kalman filtering.c.FOAM.The Analysis Correction scheme requires the a priori specification of the error covariances for the model forecast and observations.In FOAM it is assumed that there are two main sources of model forecast error.The first arises from errors in the forcing of the model by atmospheric fields such as the wind, heat and freshwater forcing, or in the response of the model to the large scale atmospheric forcing.Errors which occur on scales similar to those of atmospheric synoptic scale systems are denoted 'synoptic scale' errors.The second source of forecast error arises from errors in the internal dynamics of the model.These internal errors are likely to be associated with the baroclinic modes of the ocean and hence consist of several horizontal and vertical scales.At present, the resolution of FOAM is such that many of the higher baroclinic modes are not resolved and so these are included in the errors of representation.The first baroclinic mode is likely to have horizontal errors on scales of a few tens of kilometers with large vertical scales and is represented in the background error covariance.These types of errors are termed 'mesoscale' errors.The functional form of the off-diagonal elements of B is therefore specified to be the combination of two SOAR functions, each with their associated variance and length scale.
Two methods have been combined to provide estimates of the error covariances for FOAM.The method of Hollingsworth and Lonnberg (1986) uses pairs of observation minus forecast values to form the error statistics.This method provides coarse resolution estimates of the observed and background error covariances.The NMC method (Parrish and Derber, 1992) uses the difference between two model forecasts (one of 24 hours and one of 48 hours in this case) which are valid at the same time as a proxy for forecast error, and statistics of these errors are calculated.This method provides background error estimates on the model grid, but can underestimate the magnitude of the errors when model error is present.Estimates of the forecast error covariances have been made using both these techniques, with data taken from a two-year hindcast of the global FOAM system.The estimates of the background error variances from the NMC method are inflated using the variances calculated from the observation-based technique with the spatial structures from the model-based method being retained.The observation error variances from the observation-based technique (which include both measurement errors and errors of representation) are used directly.
The assimilation procedure produces three-dimensional increments to the temperature and salinity fields and two-dimensional increments to the surface height at the analysis time.In order to produce changes to the other prognostic variables, multivariate balance relationships are used.Increments to the velocities are obtained using the geostrophic relationship except within a few degrees of the equator where this relationship does not apply.The baroclinic velocity increments are therefore ramped to zero within 5 degrees of the equator.d.MERCATOR.In a multi-data and multivariate data assimilation system, an important issue is the specification of the error sub-space which in this case is built from a 3D modal decomposition of model solutions.Up to now, a classical EOFs calculation followed by a truncation was used to represent the background error covariances [Testut et al., 2003;Brankart et al., 2003].Even if this method is able to take into account the seasonality of the variability signal, it is costly in term of storage to maintain a real continuity of the error covariances between each analysis step.In practice, the main advantage of this method is the truncation in terms of number of modes.A new method has been then introduced which relies on the concept of statistical ensembles where the members are drawn from model anomalies representative of the actual error covariances.With this approach a truncation is not required anymore, but it is necessary to generate an appropriate number of anomalies.This approach is similar to the Ensemble optimal interpolation (EnOI) used by Oke et al. [2005] which is an approximation of the EnKF that uses a stationary ensemble to define background error covariances.Weekly dependant multivariate 3D anomalies (HBAR, TEM, SAL, U, V), computed from a prior interannual simulation, are used to estimate the background error.By using a running mean, the main feature of this calculation process is to filter out temporal scales at periods greater than a few assimilation cycles.
In a reduced-order method like the SAM-2, the quality of the analysis solution is also strongly constrained by the size of the background error sub-space and the associated numerical cost allowed for the analysis.An important optimization effort has been achieved to minimize the computational requirements of the SAM-2 technological platform, leading to a capacity to manage error sub-space dimensions up to several hundred modes (typically 300 with the Mercator Océan configurations).A number of statistical and dynamical features of the background error covariances as computed in SAM-2 are illustrated by the representer functions in Figure 5.4.e. MOVE/MRI.MOVE systems use multivariate (T and S) analysis scheme through background and observation error variance/covariance matrices.Background error variance/covariance B matrix is decomposed by the T-S coupled EOF modes (see section 2).Observation error variance/covariance R matrix has diagonal elements except for the representativeness error which is formulated as part of the B matrix, i.e., it has off diagonal elements (Fujii and Kamachi, 2003a).
f. NCODA.Specification of the background and observation error covariances in the analysis is very important.The background error covariances in NCODA are separated into a background error variance and a correlation.The correlations are further separated into horizontal and vertical components, which are then modified by a flow-dependent component.All correlations of scalar variables are modeled as second order auto-regressive (SOAR) functions.Horizontal correlation length scales vary with location and are specified as a scaled first baroclinic Rossby radius of deformation (Chelton et al. 1998).Rossby length scales vary from 10 km at the poles to greater than 200 km in the tropics, and the scaling is on the order of 1.3 to 2.8, with small latitude dependence.Vertical correlations evolve from one analysis cycle to the next and are computed from background vertical density gradients using a change in density stability criterion.The resulting vertical correlation length scales vary with depth and are large (small) when the water column stratification is weak (strong).Flow-dependence is introduced in the analysis by a correlation computed from forecast model SSH differences between two locations.A small (large) flow dependent length scale will produce strong (weak) flow dependence in the analysis increments.The flow-dependent correlations tend to spread innovations along rather than across SSH contours.This is a desirable effect, since error correlations across an ocean front are expected to be characteristically smaller than error correlations along the front.
Formulation of the multivariate background error correlations is derived from the first and second derivatives of the SOAR model function using the location dependent length scales.This form requires the calculation of the angles between the two locations and specification of a parameter that measures the divergence permitted in the velocity correlations, and a parameter that specifies the strength of the geostrophic coupling of the velocity/geopotential correlations.A full derivation of the multivariate horizontal correlations is provided in Daley (1991, chapter 5).
Background error variances are poorly known in the ocean and are likely to be strongly dependent on model resolution and other factors, such as atmospheric model forcing errors and ocean model parameterization errors.In NCODA, background error variances vary with location, depth, and analysis variable, and are computed prior to an analysis from a time history of the analyzed increment fields.Using an error growth parameterization, background error variances are allowed to increase with time in the long-term absence of observations until the errors asymptote at the limit of the expected variance, specified as either climate variability or model error.This relaxation to an expected error variance is needed because not all analysis grid locations are corrected in every update cycle due to sampling limitations of the ocean observing systems.In order to distinguish between a zero increment indicating a perfect model forecast or simply no data, an age of the data on the grid variable is analyzed.Age of the data is defined as the number of hours since a grid location has been influenced by an observation.In this adaptive scheme, NCODA background error variances are: (1) appropriate for the time interval at which data are inserted into the model, (2) coherent with the variance of the innovation time series, and (3) reflect the geographically variable skill of the HYCOM model.In practice, the background error variances tend to evolve to a quasisteady state over time and provide a reasonable measure of the spatial structure of model forecast error.
Observation errors and background errors are assumed to be uncorrelated in the analysis.As a result, the observation error covariance matrix R is set equal to the normalized observation error variances along the diagonal and zero elsewhere.Observation errors are computed as the sum of a measurement error and a representation error.Measurement error variances are specified as input parameters in the analysis and reflect what is known about the accuracy of the instruments and the ambient conditions in which the instruments operate.As a result of GHRSST, measurement errors of the satellite SST observing systems are now part of the real-time data streams.Many of the GHRSST data providers compute SST retrieval errors using a sliding time window of retrieval and drifting buoy matchups in space and time.Representation error is a function of the resolution of the model and the resolution of the observing network and is more difficult to quantify.A simple scheme is used in NCODA to increase the measurement error of satellite retrievals based on the gradient of the background field when the spatial averaging properties of the satellite footprint exceed the resolution of the model grid.Representation error of temperature and salinity profile observations is assumed to be a function of an uncertainty associated with the internal wave field.The observed profile vertical gradient is used as a proxy for this aliasing error.g.NEMOVAR.Observation errors are assumed to be mutually uncorrelated so that R is a diagonal matrix of variances.Vertical thinning of profiles (particularly XBTs and CTDs) is performed to reduce the effects of vertically correlated observation error.Two formulations of the variances are possible.The first formulation employs, except near coastlines, constant variances for SSH and SST, and a simple depth-dependent analytical function for profiles.Near coastlines, where model resolution is often inadequate to resolve coastal features, the variances are inflated.The second formulation employs geographically-dependent variances that are estimated from a model-data comparison prior to assimilation, following the method of Fu et al. (1993).This method has the advantage over the first formulation of increasing observation error in dynamically active regions (where representativeness error is presumably larger) although experience with OPAVAR suggests that the method tends to overestimate the variances (Daget et al. 2008).
Background errors are assumed to be correlated.The key to the design of the covariance matrix is to transform the model state variables to new variables whose background errors are approximately uncorrelated.This is achieved in NEMOVAR using a linearized "balance" operator K .The new error variables correspond to temperature and so-called unbalanced components of salinity, SSH and horizontal velocity.By transforming variables in this way, the background-error covariance matrix can be taken to be block-diagonal (univariate) with respect to the transformed variables.The covariance matrix with respect to the original variables is represented by the symmetric product of operators where the matrix product is the preconditioning transformation mentioned earlier, and the product is the block-diagonal (covariance) matrix for the transformed variables.The multivariate component of is established implicitly via and .The matrix contains the variances of the transformed background errors, and the matrix product is the correlation matrix of the transformed errors.The balance relationships currently used in NEMOVAR are identical to those used in OPAVAR (Weaver et al. 2005).The balanced component of salinity is computed from temperature under the assumption that the local background T-S relation is approximately preserved; the unbalanced component of salinity represents changes to the background T-S relation.The balanced component of SSH is computed from density and corresponds to a baroclinic signal; the unbalanced component of SSH corresponds to a barotropic signal.The balanced component of horizontal velocity is computed from the geostrophic relation, with a β-plane approximation used near the equator to obtain geostrophically balanced zonal velocity errors; the unbalanced component of velocity corresponds to the ageostrophic component.
The variances in are currently specified empirically.For temperature and unbalanced salinity, the variances are flow dependent.For temperature, the variances are parameterized in terms of the vertical gradient of the background temperature field in order to increase background error in regions of strong thermal variability.For unbalanced salinity, the variances are defined to be largest in regions where mixing is important and thus where water masses are least likely to be preserved.

D
For unbalanced SSH, a simple function of latitude is used to increase the variances at higher latitudes where the barotropic component is important.Constant variances are used for unbalanced velocity.An ensemble method has recently been developed to provide more objective estimates of these variances.The method has been applied in OPAVAR (Daget et al. 2008) but is not yet implemented in NEMOVAR.
The block matrix components of are 3D univariate smoothing operators, each constructed as the product of a 1D (vertical) and 2D (horizontal) anisotropic diffusion operator (Weaver and Courtier 2001).The product of with its adjoint is, with appropriate normalization, a 3D correlation operator.The correlation functions implied by the diffusion model are approximately Gaussian.The correlation length scales are specified as described in Weaver et al. (2003).As with the variances, these estimates are somewhat subjective and will be improved in the future using the ensemble method mentioned above.F F T F h. TOPAZ.Initial errors (only at the very start of a forecast experiment) are obtained by random perturbations of the isopycnal layer thicknesses, followed by 2 months of an ensemble equilibration run that uses randomly perturbed atmospheric forcing fields.The statistics of the initial errors are a linear combination of the following structures, using a Gaussian horizontal covariance and an exponential vertical covariance: (1) large scales, 400 km horizontal range, vertically correlated over 2 layers, standard deviation of 10% of the layer thickness; (2) eddy scale, 60 km horizontal range, vertically correlated over 5 layers, 5% of the layer thickness, and (3) ice thickness, 300 km horizontal range, 10% of the ice thickness.
Model errors are dominated by errors in the atmospheric forcing fields.Random perturbations are temporally correlated with 2 days temporal range and 250 km horizontal range, using the following standard deviations: wind stress, 0.03 N/m 2 ; surface wind speed, 1.6 m/s; surface air temperature, 3 K; and radiative fluxes, 0.07 W/m 2 .These standard deviations are purposely set higher than errors reported by ECMWF, in order to mimic uncertainties in the model mixing scheme parameterization.
All satellite and in situ measurement errors have a Gaussian spatial autocorrelation with a radius of 200 km.Argo profiles are thinned on the hybrid layer thickness and their errors are uncorrelated.The following standard deviations are used and increased for representation errors: SLA, 10 cm; SST, 0.5° K; sea ice concentrations, 10%; sea ice drift, 1 pixel (62.5 km) over 3 days; Argo T/S profiles, 1° K for temperature, and 0.1 psu for salinity.
The resulting forecast error covariances are multivariate and flow dependent.In the example of an SST in the Gulf Stream, the horizontal correlations are anisotropic depending on the wind direction.In Figure 5.5(a), on the 3 rd of January 2006 a strong northwesterly wind is blowing from Newfoundland and the correlation pattern is oriented NW-SE.In addition, a front is located to the Northwest of the target point (marked with a cross) that cuts the correlation across the front.These influences make the horizontal correlation dissymmetric, so that assimilation of SST on one side of the front will not update SST on the other side of the front.In Figure 5.5(b), the zonal section across the target point shows that the vertical covariance pattern is also dissymmetric.Other examples of flow-dependent multivariate statistics between sea-ice and ocean variables are given in Lisaeter et al. (2003).

Performance/validation results of assimilation
a. BODAS.The primary test bed for BODAS is the Bluelink ReANalysis (BRAN).BRAN is a multi-year model integration with data assimilation.To date, three versions of BRAN have been performed, version 1.0 (Oke et al. 2005), version 1.5 (Oke et al. 2008) and version 2.1 (Schiller et al. 2008).A comprehensive assessment of BODAS, when applied for version 1.5 of BRAN, is presented by Oke et al. (2008).Through a series of comparisons with both assimilated and with-held observations, Oke et al. (2008) show that around Australia (90-180°E, 60°S-20°N), BRAN fields are typically within 4-10 cm and 0.4-1°C of observed SLA and SST respectively; and within 1°C and 0.15 psu of observed sub-surface T and S respectively; and within about 0.2 m/s of observed nearsurface currents from surface drifting buoys.
An example of BODAS applied to version 2.1 of BRAN is presented in Figure 6.1, showing a series of comparisons between 6-day composite AVHRR SST fields and 5-day averaged SST from.Overlaid on the BRAN SST fields are 5-day Lagrangian trajectories, derived from the time-varying surface velocities computed by BRAN.Note that BRAN does not assimilate the high-resolution SST data shown in Figure 6.1; rather it assimilates spatially averaged fields of AMSRE SST, with a nominal spacing of 50-100 km.These comparisons highlight the energetic nature of the circulation in the Tasman Sea, particularly in the region south of 32 o S, where EAC eddies frequently develop.BRAN shows good agreement with the observed features, demonstrating that it can realistically represent the time-varying mesoscale circulation in the Tasman Sea (Figure 6.1).This demonstrates that the EnOI system used by BODAS, including the error estimates therein, while not optimal, is capable of constraining an eddy-resolving model in this highly energetic region.b.ECCO.The RTS smoother further improves upon the Kalman filter analysis by utilizing observations that lie formally in the future of each filtered estimate.More importantly, the smoother also provides estimates of model error sources, in particular, corrections to the wind forcing in case of the ECCO near real-time analysis.In account of the approximations employed in the approximate smoother, a simulation is repeated with the modified forcing to obtain a smoothed state estimate that is fully consistent with the model and with the smoothed forcing.
Figure 6.2 demonstrates the fidelity of such smoothed estimate in comparison to that of the filtered solution in simulating the observed sea level variability.Other measures of the assimilation's performance and utility can be found in many of the analyses' applications.For instance, estimates have been employed to study temperature budgets and its variability (e.g., Kim et al., 2007) and changes in water mass characteristics associated with an El Niño (e.g., Wang et al., 2004).The solutions have also been utilized in analyzing pathways of water mass circulation (e.g., Qu et al., 2008), in identifying mechanisms of sea level variability (e.g., Fukumori et al., 2007), and in initializing coupled ocean-atmosphere models for climate prediction (e.g., Yulaeva et al., 2008).The ocean state estimates are also employed in geodetic applications which provide an entirely independent validation of the estimates' fidelity (e.g., Gross et al., 2005).c.FOAM.The model component of the FOAM system has recently been changed to NEMO.A number of experiments have been run using the FOAM system with both the old and new model, but only results from the old system are shown here.Figure 6.3 shows the impact of assimilation of in situ profile data in the old FOAM 1/9° north Atlantic model for a set of 5-year integrations.This shows that the temperature and salinity errors are significantly reduced when assimilating the in situ data than without.When no Argo data are assimilated, the salinity errors are marginally worse than with no data assimilated in the top 400m, probably due to the temperature-only assimilation disrupting the density structure in the model.These statistics show both the importance of the Argo data and the beneficial impact of the data assimilation on the model fields.Results from the new FOAM system show similar levels of RMS errors to the old system compared to the in situ data, but with reduced biases (not shown).The impact of the assimilation of altimeter SSH data on the old FOAM system was assessed using identical twin experiments.The 1/9° north Atlantic model was run with assimilation of real temperature and salinity data to generate a reference model state known as the "truth".The conditions at the end of this year-long run were used as the initial conditions for another run of the same period in which the temperature and salinity data were assimilated, known as the "control".Simulated observations of SSH were created from the "true" run according to the locations of the real along-track altimeter data with measurement noise added with a standard deviation of 4cm.These were then assimilated into a run of the model which started from the same initial conditions as the "control".Figure 6.4(a) shows RMS errors calculated over all model grid points between the model field and the "true" field.The assimilating run has much lower RMS errors (about 6cm) than the run in which only in situ data were assimilated.Errors in the forecast are shown in Figure 6.4(b), which shows the average errors for the forecasts that are started from the "control" and "assimilation" integrations, together with errors when the analysis is persisted into the forecast period.The forecast runs from the "assimilation" integration have smaller errors than the forecast from the "control" integration throughout the 30-day forecasts, showing that the analysis of SSH is providing a good initialization of the model.The errors from the "assimilation" forecasts are similar to persistence for the first 7 days of the forecast, but the model does provide an improved forecast in the subsequent period d.MERCATOR.The SAM-2 scheme has been running operationally since 2007 in the Global ¼° and North Atlantic (1/12°) Mercator Océan systems.Significant improvements have been obtained with SAM-2 in comparison to the previous release SAM-1 in most oceanic regions.For instance, coastal regions are better represented with SAM-2, in particular across the shelf breaks.Biases in the equatorial regions have been reduced significantly, and the occurrence of spurious equatorial upwelling patterns as noticed earlier with SAM-1 has been suppressed.The overall performances of the SAM-2 assimilation scheme on the Global ¼° system are illustrating by RMS misfits to assimilated data (Figure 6.5).The ability to correct at each cycle the forecast solution is also indicated by the spatial distribution of mean misfits over 2007 to sea surface temperature from insitu before and after the analysis step (Figure 6.6).Here we introduce some characteristics of the analysis/reanalysis experiments.In the western North Pacific, two typical water masses (Kuroshio and Oyashio waters) are confluent at the boundary of the subtropical and subpolar gyres.Figure 6.7 shows temperature and salinity distributions along two JMA observation lines.The assimilation scheme (multivariate 3DVAR) works for representing temperature and salinity fields of the subtropical (Kuroshio) and subpolar (Oyashio) waters (Usui et al. 2006(Usui et al. , 2008a,b),b).Statistical validation is reported in Usui et al. (2005).Mean water mass property (temperature and salinity) along JMA hydrographic observation lines around Japan are also examined.In Fig. 6.8, mean water mass properties though some lines (PH line in the subtropicalsubpolar boundary region, bottom layer in the Japan Sea along PM line) show biases of the property (Matsumoto ( 2008), in preparation).f.NCODA.Performance of the NCODA analysis system is routinely assessed by examination of the model innovation and analysis residual time series from the cycling assimilation.Residual RMS errors are consistently less than innovation RMS errors, indicating that the analysis is making effective use of the observations.Residual mean errors are indistinguishable from zero for all analysis variables, indicating an unbiased analyzed state.However, HYCOM mean temperature innovation errors are consistently biased cold at the surface (~0.3°C), and there is a small positive temperature bias at depth.Consistency of the specified error variances with the innovation vector is monitored by the J min diagnostic (Daley and Barker 2001), which is routinely computed for all observing systems and analysis variables at each update cycle.When normalized by the number of observations assimilated, the expected value of the J min is one.Fig. 6.9 shows a six-month time series of J min  computed for analysis variables and profile observing systems.For temperature and SST analysis variables, Fig. 6.9a shows J min values close to the expected value of one during summer that slowly drift to J min values less than one in winter.For altimeter SSH, however, the J min diagnostic indicates a systematic problem with the error variances believed to be due to altimeter along-track correlated error that is not taken into account in the assimilation.For profile observing systems, Fig. 6.9b shows that the error variances specified for fixed and drifting buoys and synthetic temperature profiles computed from the altimeter SSH data using Cooper-Haines need to be tuned.The background and observation error variances specified for the other profile observing systems appear to be correct or nearly correct (Fig. 6.9b).An example of assimilative HYCOM forecast performance in terms of SSH anomaly correlation is shown in Figure 6.10.The panels in Fig. 6.10 show how SSH forecast skill is dependent not only on ocean model error but also on the quality of the atmospheric forcing.In areas like the Gulf Stream, a highly nonlinear region with strong flow instabilities, current meanders, and generation of mesoscale eddies, forecast skill is relative insensitive to errors in the atmospheric forcing.In other areas, like the equatorial Pacific, where there are wind-driven equatorial waves and variations in equatorial upwelling, 30-day forecast errors are much more sensitive to errors in the atmospheric forcing.Overall, the assimilative model forecasts perform better than persistence, with evolution of the forecasts relatively more sensitive to errors in the initial state than to errors in the atmospheric forcing over most of the global ocean.forecast length in comparison with the verifying analysis for global HYCOM over the world ocean and five subregions.The red curves verify forecasts using operational atmospheric forcing, which reverts toward climatology after five days.The green curves verify "forecasts" with analysis quality forcing for the duration and the blue curves verify forecasts of persistence (i.e.no change from the initial state).The plots show median statistics over twenty 30-day HYCOM forecasts initialized during January 2004 -December 2005, a period when data from three nadir-beam altimeters, Envisat, GFO and Jason-1, were assimilated.

g. NEMOVAR.
A cycled 3D-Var experiment using a global version of NEMOVAR has been conducted for the 20-year period 1987-2006.The assimilated data consist of temperature and salinity profiles from EN3v1c.The surface forcing fluxes are derived from ERA40 and include a strong relaxation to SST analyses from OIv2 in the surface heat flux.Results from the experiment are summarized in Figure 6.11, which shows the global mean and root-mean-square (rms) of the model fit to the temperature and salinity data (the first year of the experiment has been excluded from the statistics).Compared to a control experiment in which no data are assimilated, the assimilation experiment improves the mean and rms fit to the data in both the analysis and, to a lesser extent, the background.Also evident is how the fit to the data achieved by the 3D-Var analysis is degraded with the IAU procedure, with the model-data fit after IAU lying in between the 3D-Var analysis residual and the background-minus-observation difference.h.TOPAZ.The validation of the TOPAZ system considers jointly the validation of the model and of the assimilation scheme.Since the two steps are run in a sequence, their validation is linked.Figure 6.12 shows some "Class4" diagnostics that illustrate some aspects of the TOPAZ assimilation updates.Figure 6.12(a) shows average forecast skills of sea-ice concentrations in the Barents Sea.The green line shows the residual errors of the analysis (after assimilation), the blue line the forecast errors (using 10 days forecast forcing fields from the ECMWF, completed to 14 days using the climatology) and the red line the errors of a trivial predictor: the persistence of the analysis.The forecast is doing better than persistence, and the assimilation reduces the errors from 13% to 10%.a. BODAS.To date, BODAS has been applied to the Bluelink ocean forecast and reanalysis system (Oke et al. 2005;2008;Schiller et al. 2008;www.bom.gov.au/oceanography/forecasts/).There are several directions that BODAS developments may take in the future.BODAS is likely to be converted into a full EnKF, exploiting a time-varying ensemble derived from an ensemble of forecasts.These developments will initially be tested on a limited area modeling system.BODAS may also be modified to facilitate coupled ocean-atmosphere data assimilation.This is likely to be pursued under the development of a tropical cyclone prediction system.The application of BODAS to sea-ice forecast system and a biogeochemical forecast system are also areas that are under consideration.In theory, BODAS can be readily applied to these applications, but in practice, this will take some significant developments.
b. ECCO.The ECCO near real-time Kalman filter has been ported to the ocean model of the National Centers for Environmental Prediction's (NCEP) next generation seasonal-to-interannual climate forecasting system so as to compare the efficacies of the filter estimates with NCEP operational analyses in state estimation and climate forecasting.The adjoint of the NCEP model has also been derived permitting an efficient means to evaluate sensitivities of the circulation to various model elements and controls.The NCEP model is based on the Modular Ocean Model (MOM4) of the Geophysical Fluid Dynamics Laboratory, National Oceanic and Atmospheric Administration.The model is configured over the entire globe, including the Arctic Ocean, and employs a nominal spatial resolution of 0.5°.The additional spatial extent and higher resolution of the new NCEP model compared to the ongoing MITgcm-based estimate is expected to provide a higher degree of fidelity in estimating the oceanic state.
c. FOAM.The model component of the FOAM system has recently been changed to NEMO.This provides the opportunity to share and collaborate on the use of an advanced data assimilation scheme.It has therefore been decided to replace the current ocean data assimilation system used in the FOAM system with a more advanced scheme.After a review of the various schemes which are available, it has been decided to implement the NEMOVAR scheme which is described elsewhere in this paper.This will initially be implemented in a 3DVar FGAT scheme, but will eventually evolve into a 4DVar scheme.

d. MERCATOR.
Beyond 2008, it is intended to further develop the assimilation system by implementing more advanced statistical parameterizations (e.g.4D background errors to account for temporal correlations), and to investigate the feasibility of ensemble-based forecasting methods to improve the representation of background errors according to the "dynamics of the day".On-going research efforts aim at introducing a smoother version of SAM-2 which will be more suitable for the production of multi-years reanalyzes.In the meantime, a better initialization strategy based on the Incremental Analysis Updating method [Ourmières et al., 2006] is being deployed in the Mercator Océan systems from 2008 onwards.
A further extension of the observation operator to assimilate a wider variety of data types is foreseen in the perspective of new observing systems, such as sea-surface salinity measurements from satellites with the forthcoming SMOS mission [Tranchant et al., 2008].The assimilation of sea-ice properties, such as sea-ice concentration, is another R&D work in progress.
In terms of systems, Mercator Océan is developing a new ocean forecasting system based on a global NEMO ocean circulation (at a resolution of 1/12°) and the LIM-3 sea-ice model [Vancoppenolle et al., 2008] coupled to the SAM-2 scheme.In the context of the E.U.MERSEA project, a near real-time demonstration experiment has been performed successfully, and promising results have been produced in terms of performances, resolution of the mesoscale variability, and downstream applications to regional systems and primary production (e.g.Brasseur et al., this symposium).This new system will offer new perspectives from 2009 onwards to monitor in realtime the global ocean circulation at high resolution and implement coupling with integrated marine biogeochemistry and ecosystem models (Schiller et al., this symposium).
e. MOVE/MRI.Improvements and developments of the model and assimilation schemes in the MOVE/MRI.COM systems will be continued in MRI.JMA's operational and MRI's research groups also have the plans of the developments of coastal/shelf sea applications (with MOVE/MRI.COM-Cst system) and air-sea coupled assimilation (MOVE/MRI.COM-C) using 3DVAR, and also will develop 4DVAR version.Therefore we will operate five systems: MOVE-G, -NP, -WNP, -Cst, -C.
f. NCODA.The NCODA analysis has been upgraded to an observation space formulation of 3DVar based on NAVDAS (Daley and Barker, 2001).The advantages of 3DVar include: (1) global solution with no data selection, (2) greater flexibility for assimilating different observation data types, (3) general framework for using more sophisticated background error covariance models, and (4) a multi-incremental analysis, with an inner and an outer loop, where observations non-linearly related to the forecast state can be assimilated.3DVar is a valid approximation to 4DVar when the assimilation time window is short and the dynamics are quasi-stationary.Early testing with the NCODA 3DVar system has shown: (1) the analysis increments are better balanced, (2) the solution extracts more information from the sparse ocean observations at depth than does the MVOI, and (3) the run time of the analysis is much reduced since there is no need to perform multiple grid point analyses in overlapping volumes.Another important aspect of the 3DVar is that it has a much reduced memory footprint.It is now possible to solve the entire HYCOM global 1/12° resolution grid in a single execution of the analysis script.
The global HYCOM model is being upgraded as well.Additional isopycnal layers are being added to the model and the model is being configured to run on a global 1/25° resolution grid (3.5 km average grid spacing) that includes tides.The inclusion of tides in the model will present a challenge in the assimilation of altimeter SSH using methods that propagate the measurement solely into baroclinic structures at depth.

g. NEMOVAR.
At ECMWF, NEMOVAR will replace the current HOPE/OI ocean data assimilation system (Balmaseda et al. 2008) used to initialize the operational coupled model for monthly and seasonal forecasts.The UK Met Office is also adopting NEMOVAR to replace their current operational ocean data assimilation system.In France, NEMOVAR is the focus of several multi-partner research projects sponsored by the CNRS, CNES, ANR and Mercator-Océan.Major scientific improvements to the NEMOVAR system are planned through the inclusion of an automated data quality control system and through the development of four-dimensional variational assimilation, ensemble methods, and better background-and observation-error covariance models.
h. TOPAZ.The ecosystem model coupled to TOPAZ has been run on research mode for hindcast studies (Hansen and Samuelsen, 2008).The next step in the developments of TOPAZ is the inclusion of NORWECOM (the NORWegian ECOsystem Model) in real-time and assimilation of ocean color data.This will necessitate an extension of the EnKF scheme to assimilate non-Gaussian variables like the Gaussian anamorphosis (Bertino et al. 2003).A 20-years reanalysis will be conducted with the TOPAZ3 version of the system as part of the MyOcean European Integrated Project, assimilating ice and ocean observations.The data from the forthcoming space missions (GOCE, SMOS, CryoSat II) will be assimilated in TOPAZ.The impact of assimilating ice thicknesses as from CryoSat has been previously been evaluated (Lisaeter et al., 2007).The developments of the HYCOM model and the developments of the sea ice module will be integrated in the TOPAZ system and will improve the assimilation by improving the dynamical error statistics.

Figure 3
Figure 3.1 ECCO.Coarse horizontal grid employed in ECCO near real-time assimilation.Different symbols denote different regional reduced-grid partitions used to estimate baroclinic errors.

Figure 5
Figure 5.1 BODAS.Examples of the ensemble-based cross-correlations between sea-level at a reference location, denoted by the star, and sea-level in the surrounding region for a reference location (a) on the continental shelf, (b) over the continental slope and (c) over the deep ocean off eastern Australia (panel (d)).Contour intervals are 0.2; zero is bold, dotted is negative, correlations above 0.6 are shaded.Adapted fromOke et al. 2008.

Figure 5
Figure 5.2 BODAS.Examples of (a, b, c)SST increments, (d, e, f) SLA increments, and (g, h, i) T increments for a longitude-depth section at 24.4°S (sections denoted in panels a-c) when only altimetry (panels a, d, and g), only SST (panels b, e, and h), and both altimetry and SST (panels c, f, and i) observations are assimilated.For this region, the RMS differences between observed and analyzed SST (SLA) are 0.9° (4.4 cm), 0.02° (11.5 cm), and 0.05° (4.8 cm) for cases ALTIM, SST, and ALTIM+SST respectively.For comparison, the RMS difference between the observed and background SST (SLA) is 0.9° (13.1 cm).Adapted fromOke and Schiller (2007).

Figure 5
Figure 5.3 ECCO.A comparison of model-data differences (left) with their theoretical expectations (right).The panels show differences as root-mean-square sea level residuals (model-data differences) between the forecasting and analysis estimates of Kalman filtering.

Figure 5
Figure 5.4 MERCATOR.Representer functions (superimposed) for the height fields (colors) in cm and seasurface temperature fields (contours) in °C relative to four independent +1°C sea surface temperature observation anomalies located in four different regions: in the Florida current at (32°N, 77°W), in the equatorial Pacific at (2.25°N, 120°W), near Panama canal at (5.25N, 80W) and in the North Brazil current at (5,25°N, 50°W).The contour interval for the SST representer fields is 0.2 °C.
Figure 5.5 TOPAZ.Forecast error on 3 rd January 2006 in the Gulf Stream, (a) spatial correlations of an SST in one point (cross) with surrounding SSTs, and (b) with temperatures along a section at 40 N (b).

Figure 6
Figure 6.2 ECCO.Model-data residual sea level variance; simulation (black), Kalman filter (red), smoother (blue).Note the smoother's results being nearly indistinguishable from the filter, whereas the simulation residuals are substantially larger.

Figure 6
Figure 6.3 FOAM.R.M.S. errors in (a) temperature (°C) and (b) salinity (psu) analyses compared to in situ profile observations before they are assimilated averaged between January 2001 and July 2005 and over the north Atlantic model domain.Results are shown for runs which assimilated all in situ data (solid), all in situ data except Argo (dotted) and no data assimilation (dashed).

Figure 6
Figure 6.4 FOAM.RMS errors in SSH (cm) compared to the "true" run at all model grid points for the "control" integration (dashed line) and SSH assimilation integration (solid line) (a) for the daily mean fields and (b) as a function of forecast day with persistence (dotted line) included.

Figure 6
Figure 6.5 MERCATOR.Overall performances of the Global ¼° operational system with SAM-2 scheme on the whole domain.RMS of SLA misfits in cm (top) for Jason (black), Envisat (blue) and GFO (Orange) from

Figure 6
Figure 6.6 MECATOR.Mean sea surface temperature (in °C) innovation (top) and residual after SAM-2 analysis (bottom) over 2007.These mean misfits to assimilated vertical T/S profiles from CORIOLIS are computed from the Global ¼° system.

Figure 6 .
Figure 6.10 NCODA.Verification of 30-day HYCOM ocean forecasts: median SSH anomaly correlation vs.forecast length in comparison with the verifying analysis for global HYCOM over the world ocean and five subregions.The red curves verify forecasts using operational atmospheric forcing, which reverts toward climatology after five days.The green curves verify "forecasts" with analysis quality forcing for the duration and the blue curves verify forecasts of persistence (i.e.no change from the initial state).The plots show median statistics over twenty 30-day HYCOM forecasts initialized during January 2004 -December 2005, a period when data from three nadir-beam altimeters, Envisat, GFO and Jason-1, were assimilated.

Figure 6
Figure 6.11 NEMOVAR.Vertical profiles of the 1988-2006 time-mean (thin curves) and rms (thick curves) of the globally-averaged model-minus-observation differences for temperature (left panel) and salinity (right panel) in a 1° x 1° global NEMO configuration.Displayed are the statistics from a control experiment in which no profile data are assimilated (red solid curve) and from a 3D-Var (NEMOVAR) experiment.For the 3D-Var experiment, the statistics are shown for the background-minus-observation (green dashed curve), the cost function residual (pink dotted curve), and the analysis-minus-observation after IAU (blue dashed curve).

Figure 6
Figure 6.12 TOPAZ.Performance of the data assimilation system.a) Forecast skills of ice concentrations in the Barents Sea on average on the period April to September 2007.b) Statistics of residual errors of SLA in the North Atlantic Drift from April 2007 to February 2008, alternate weeks are drawn in different colors to highlight the assimilation steps.

Figure 6 .
Figure6.12(a) shows average forecast skills of sea-ice concentrations in the Barents Sea.The green line shows the residual errors of the analysis (after assimilation), the blue line the forecast errors (using 10 days forecast forcing fields from the ECMWF, completed to 14 days using the climatology) and the red line the errors of a trivial predictor: the persistence of the analysis.The forecast is doing better than persistence, and the assimilation reduces the errors from 13% to 10%.Figure6.12(b)shows the residual errors of SLA in the North Atlantic Drift region, with the analysis time series only and not the forecast.The time series show almost no jump at each assimilation step, indicating a smoother assimilation of SLA than of ice concentrations.The error time series shows a seasonal bias maximum in the fall, but no temporal variation of the residual "unbiased" RMS error.

Table 1 .
A summary of ECCO ocean analyses.