Serving GODAE Data and Products to the Ocean Community

Abstract. The Global Ocean Data Assimilation Experiment (GODAE; http://www.godae.org) has spanned a decade of rapid technological development. The ever-increasing volume and diversity of oceanographic data produced by in situ instruments, remote-sensing platforms, and computer simulations have driven the development of a number of innovative technologies that are essential for connecting scientists with the data that they need. This paper gives an overview of the technologies that have been developed and applied in the course of GODAE, which now provide users of oceanographic data with the capability to discover, evaluate, visualize, download, and analyze data from all over the world. The key to this capability is the ability to reduce the inherent complexity of oceanographic data by providing a consistent, harmonized view of the various data products. The challenges of data serving have been addressed over the last 10 years through the cooperative skills and energies of many individuals.

Oceanography, September 2009

mandate to oversee management of the comprehensive collection. Furthermore, transferring all of these data to a central location would be impractical and would force data providers to relinquish some control over their data products. Thus, a distributed data management approach that provides the ability to share oceanographic data effectively across the Internet is central to GODAE's aim of developing a global ocean forecasting system. In addition to solving many of the problems associated with managing large data holdings, a distributed system can provide users with more reliable and efficient data services through data replication and a more efficient use of networks.
The underlying approach to GODAE Data Services is a suite of tools based upon shared approaches. These tools have been designed primarily for use by the scientific research community, but provide a solid foundation upon which other systems can be built to serve other user communities such as governments and commercial entities. The tools allow scientists to use data in a manner that frees them from the necessity to understand the low-level details of file formats, structure, or even the physical location of the data. The notion of hiding complexity is fundamental to the success of a data system that must deal with such large varieties and volumes of data. This hidden complexity must be balanced against the need to support a very wide range of applications, which requires flexibility. GODAE services strike this balance by providing two different ways for the scientist to access and use data: (1) through Web portals, which hide data complexity but provide a fixed range of functionality, and (2) by allowing data to be ingested into the scientist's desktop tool of choice. These two approaches can work in concert, with the scientist performing preliminary discovery, evaluation, and analysis on the Web, then using specialist desktop tools, if required, to perform further tasks.

GODAE Data Centers and Their Products
The GODAE data centers (Blanc et al., 2008)

Elements of the GODAE Data Systems

A Common Knowledge of the Products and Their Uses
It is important for all data providers and users to understand the processes that have been undertaken to produce a given data product. Knowledge of the entire context of a data product involves the traceability of events for operational production and the understanding of the product's attributes (including what is produced, the ocean region covered and its resolution, how it has been produced, who produced it, when and where it is made available, for how long, with what accuracy, delivery format and network delivery services, data and network service policy, and more). This information allows a user to decide upon a data product's fitness for a particular purpose. A key concept here is the notion of a product's "level," from raw instrument data (Level 0) all the way up to an ocean indicator (Level 5) (see Figure 1). The MERSEA Web site (http://www.mersea.eu.org/) uses this notion of "levels" to provide users with a consistent view of all available data products.

Base technologies
Before sophisticated data systems can be built, there must be widespread adoption of common approaches for describing and transporting oceanographic data.
Three main technologies play a large role in harmonizing GODAE products: the netCDF file format, the Climate and Forecast (CF) metadata conventions, and OPeNDAP (Open source Project for a Network Data Access Protocol).

File Formats and Conventions: netCDF and CF
A large number of file formats are available for expressing oceanographic data, from free-form, plain-text (ASCII) files to highly structured binary formats. These formats often differ at a fundamental level, making it difficult to develop tools and applications that work with all formats. To alleviate this difficulty, the GODAE community has standardized around netCDF (http://www.unidata.ucar.edu/software/netcdf/), which provides an array-oriented, platform-independent binary file format that can contain a wide variety of data types, from in situ measurements to large multidimensional grids of data from numerical models. netCDF provides a simple, discipline-neutral way to encode multidimensional arrays and their attributes. The format is backed up by high-quality software libraries, in a variety of languages, which greatly ease the process of developing applications that consume and produce netCDF data. Furthermore, some of these software libraries (e.g., the official Java netCDF library) are able to read a variety of other file formats (such as GRIB, the GRIdded Binary format, a World Meteorological Organization standard for encoding forecast data) and interpret them as if they were netCDF files. In this way, the GODAE community has achieved harmonization of previously disparate data sets.
The CF conventions (http://www.cfconventions.org) provide the additional semantics defining how to encode oceanographic data (and data from other disciplines) in netCDF files. These conventions are currently focused on the description of gridded data from numerical models or analyzed satellite products.
They provide a means to describe the grid on which the data are expressed, together with a suite of "standard names" that are used to identify the geophysical quantity that the data represent (e.g., "sea_water_potential_temperature").

Godiva2 [Blower et al., 2009]
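The practical effect of these conventions can be sketched in plain Python. The attribute names below (standard_name, units, long_name, coordinates) come from CF itself; the dictionary and the checker function are hypothetical illustrations, not part of any standard tool:

```python
# CF attributes a data provider might attach to a model temperature
# variable. The attribute names follow the CF conventions; the values
# here are illustrative.
CF_TEMPERATURE_ATTRS = {
    "standard_name": "sea_water_potential_temperature",
    "units": "degC",
    "long_name": "Sea water potential temperature",
    "coordinates": "lon lat depth time",
}

def has_minimal_cf_metadata(attrs):
    """Hypothetical, non-exhaustive check: a variable is broadly usable
    by generic CF-aware tools if it carries physical units plus either a
    standard_name or a human-readable long_name."""
    return "units" in attrs and ("standard_name" in attrs or "long_name" in attrs)

print(has_minimal_cf_metadata(CF_TEMPERATURE_ATTRS))  # True
```

It is this shared vocabulary of standard names, rather than any single file, that lets a tool recognize "the same quantity" across products from different providers.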

OPeNDAP
While netCDF provides a consistent data format in which to store GODAE data and CF provides a consistent metadata description of these data, an OPeNDAP service provides a consistent mechanism with which data may be accessed over the Internet (Cornillon et al., 2009).
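The flavor of such an access mechanism can be sketched by building a DAP2-style hyperslab constraint expression, the part of an OPeNDAP request that names the variable and index ranges wanted. The helper function and the server URL below are hypothetical:

```python
def dap2_constraint(var, slices):
    """Build a DAP2 hyperslab constraint such as sst[0:1:0][100:1:150].
    Each slice is (start, stop) with an implicit stride of 1; start and
    stop are inclusive array indices, following the DAP2 convention.
    Illustrative helper, not part of any OPeNDAP library."""
    parts = "".join(f"[{start}:1:{stop}]" for start, stop in slices)
    return f"{var}{parts}"

# Ask a (hypothetical) server for one time step of a regional SST
# subset instead of downloading the whole global file.
url = ("http://example.org/opendap/global_model.nc.dods?"
       + dap2_constraint("sst", [(0, 0), (100, 150), (200, 260)]))
print(url)
```

The server evaluates the constraint and returns only the requested hyperslab, which is what makes subset access over the network practical.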

Primary Viewing Services
In the European GODAE project MERSEA, "primary viewing services" were defined to permit visualization of daily updated, predefined plots (historical plots may also be available). These services (see Figure 2)

Downloading Data
The visualization systems described above can help a new user to evaluate whether a data product meets his or her needs. Data distribution by standard methods such as FTP or HTTP is reliable and mature, but can lead to a very high load on networks, as users need to download large data files even if they only wish to access a small subset of the data (e.g., to access data for a regional sea from a global ocean model). More sophisticated data services such as OPeNDAP allow much more powerful and flexible access to data, permitting users to access only the precise data they need. In addition to reducing the amount of data transferred, these "intelligent" data serving systems can help users by presenting data in consistent forms.

Analyzing Data
Many dedicated software tools have been developed and made available over the past 10 years for various scientific applications (Blower et al., 2008). One of the primary goals of developing technologies for describing, discovering, visualizing, and accessing data is to allow data products from different providers to be intercompared. In GODAE, some of the early intercomparisons were implemented via the GrADS system, and this capability was later implemented in Europe, the United States, and Australia using the Live Access Server.

The Live Access Server
The foundation of GODAE data management planning from its earliest days in 2002 included the use of OPeNDAP for access to distributed data sets and the use of the Live Access Server (LAS; http://ferret.pmel.noaa.gov/LAS; Schweitzer et al., 2007; Blower et al., 2008) to generate products (maps and scientific graphics, tables, and data subsets) and to perform intercomparisons.

The ocean community must develop good-quality, reusable "building blocks" that can be assembled in various ways to enable new end-user applications to be developed at a reasonable cost. Such building blocks will include user interface components (such as interactive maps) and Web services for accessing data and catalogues.
There will be a continuing need to engage closely with users in order to ensure that future data systems meet their needs. The near future will see a strong move toward "operationalization" of the GODAE data systems. For example, in Europe, MyOcean, the Global Monitoring for Environment and Security Marine Core Service (Blanc, 2008), will deliver a pre-operational data system that will provide data, with specified service-level agreements, to users of oceanographic data in many disciplines all over Europe. It would not be possible to develop such a system without the advances made within GODAE.
This paper summarizes the technological advances that have been made in the context of GODAE, advances that greatly facilitate the user's ability to discover, evaluate, visualize, download, and analyze a huge number of oceanographic data products.

Introduction

Oceanographic data are highly diverse, covering a broad range of spatial scales and encompassing remotely sensed data, in situ measurements, and numerical simulations. To extract the maximum amount of information from these data sources, it has been necessary to develop and deploy new technology platforms that allow data to be discovered, shared, visualized, and analyzed. GODAE data holdings are very large: current ocean prediction systems and data repositories generate tens of terabytes of oceanographic data per year per organization. This rate is expected to increase rapidly with the deployment of new observing systems and increases in the resolution and complexity of numerical models. There is no single centralized authority with the resources or

Figure 1. From raw data to ocean summary information: various levels of ocean data products.
, and DChart [http://www.epic.noaa.gov/epic/software/dchart/]).

Although CF-netCDF has mainly been used to describe gridded data (e.g., from numerical models or satellites), use of CF-netCDF as a standard for encoding in situ measurements is rapidly gaining acceptance. Within the moored data (http://www.oceansites.org/), underway ship observation (http://www.ifremer.fr/gosud/, http://samos.coaps.fsu.edu/html/), and ocean profiler communities (Gould, 2005), standards that are based on CF are nearing completion. These standards build on CF by adding additional metadata needed to describe the specific measurements. The resulting files are fully CF-compliant and can be read by a number of generic CF-compliant applications. Note, though, that such files can describe only a single observation event, such as a single time series, profile, or ship track. No widely agreed-upon standards exist as yet to describe collections of observations (although a number of candidates exist), and this is a key obstacle to the development of systems that allow users to visualize and process in situ data. The latest version of netCDF (version 4) contains new features that make it suitable for encoding such collections of observations, and research is ongoing into how this can be achieved in practice.
Specifically, users may access subsets of data sets residing elsewhere on the Internet and ingest them directly into their analysis packages. OPeNDAP servers may also provide for aggregation of large gridded data sets residing across several files. These capabilities are important, because a scientist wishing to access an oceanographic data set (for example, a multidecadal ocean reanalysis) often does not require the entire data set, which may be hundreds of gigabytes or even terabytes in size. Furthermore, it is often not desirable to regard the data set as a large set of individual files: the scientist may prefer to regard it as one large four-dimensional data set, which can be subsampled in numerous ways. There is a very close relationship between the netCDF file format and OPeNDAP. It is possible to transmit netCDF data via OPeNDAP with (very nearly) no loss of information. Many desktop data analysis tools, such as Ferret, GrADS, and the MATLAB OPeNDAP Ocean Toolbox (http://oceanographicdata.org/toolbox), treat locally held netCDF data in exactly the same way as remotely held data on an OPeNDAP server, providing the scientist with the capability to analyze and visualize huge quantities of distributed data. OPeNDAP servers can act as means of accessing data that are held by data centers in many other file formats such as HDF, GRIB, and BUFR. (Such formats are popular in other communities such as meteorology and Earth observation.) The end user does not need to know anything about which data format is used on the remote server. OPeNDAP is therefore a very powerful technology for data harmonization and integration.

Discovering Data

Each GODAE data provider has implemented a Web portal for users to discover and browse their products, and to provide users with links to download them. Dedicated catalogues exist at many of these sites to aid users in the discovery of specific data sets. This structure has led to the development of a large number of Web portals; each is designed
differently, which is often confusing to users, particularly those outside the ocean community. Therefore, there are ongoing efforts to create integrated catalogues that provide users with a single point of discovery to an aggregation of data products held in GODAE archives. The MERSEA catalogue in Europe (Loubrieu and colleagues, submitted to the Journal of Operational Oceanography) is one example of such an aggregated catalogue.

Evaluating and Visualizing Data

Having found data of potential interest through a text search at one of the sites, the user will often like to evaluate the data's suitability for his or her application before acquiring them. Many viewing services are now implemented, providing access to either predefined or dynamically generated visualizations. The low-level data standards described in the section on base technologies are extremely important here: it would not be feasible to provide visual access to all the diverse data sets in the GODAE systems without first agreeing upon how data are formatted.
Viewing services also help the user sift through the large amounts of available information. However, when the user knows which data products are required, he or she usually needs an efficient means of accessing the data. Two classes of technology are typically used: bulk file transfer and Web services. Bulk file transfer systems are based upon FTP and/or HTTP to transfer unmodified files from the data center to the user. These systems can be secured by various means. Users with the necessary access rights can be alerted to the presence of new data, which can then be downloaded. This simple download service provides an efficient and simple way to transfer whole data sets file-by-file, or pre-created derivative product files, either freely to all or through a registration process.
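The saving that subset access offers over whole-file transfer can be sketched as simple index arithmetic on a regular grid. The helper below is hypothetical (not part of any OPeNDAP library) and assumes a monotonic longitude axis with no wrap-around at the dateline:

```python
import math

def bbox_indices(lon0, dlon, nlon, west, east):
    """Map a longitude interval [west, east] onto the inclusive index
    range of a regular axis lon[i] = lon0 + i*dlon, clamped to the
    axis bounds. Hypothetical helper; assumes west <= east."""
    i0 = max(0, math.floor((west - lon0) / dlon))
    i1 = min(nlon - 1, math.ceil((east - lon0) / dlon))
    return i0, i1

# A regional request ~10 degrees wide against a global 1-degree axis
# touches only 12 of 360 columns, so a subsetting service moves about
# 3% of the data a whole-file download would.
print(bbox_indices(0.0, 1.0, 360, 10.3, 20.7))  # (10, 21)
```

The same arithmetic applies along each axis, so the saving compounds when latitude, depth, and time are restricted as well.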

Figure 3. Interactive viewing services. Left: the MERSEA Dynamic Quick View system (http://www.resc.reading.ac.uk/mersea). This Web site allows the user to pan and zoom through large data sets, adjust the color palette and scale, change the map projection, and create animations. Right: Argonautica education project (http://www.jason.oceanobs.com/html/argonautica), demonstrating data visualization in Google Earth, which allows multiple data sources to be overlain.

LAS supports intercomparison operations (e.g., regridding and differencing) between those data sets. Users interact with LAS primarily via a Web interface, behind which lies a data processing engine that can read data from many sources and process data in many ways. Having performed initial data visualization and processing on the Web, an LAS user can seamlessly switch to a desktop tool (such as MATLAB or Ferret) for more complex and customized analysis tasks. The most basic of LAS output products are custom scientific visualizations along all principal planes and axes: maps, time series, vertical sections and profiles, and Hovmöller (space-time) contour plots.
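The Hovmöller product amounts to collapsing one spatial axis of a space-time field; a minimal sketch of that reduction with synthetic data (numpy assumed, values standing in for a model temperature field, not real LAS output):

```python
import numpy as np

# A Hovmoller (space-time) section reduces a (time, lat, lon) field to
# a 2-D (time, lat) array by fixing or averaging the remaining axis.
rng = np.random.default_rng(42)
field = rng.normal(15.0, 2.0, size=(12, 90, 180))  # 12 months on a 2-degree grid

hovmoller = field.mean(axis=2)  # zonal mean leaves a time-latitude section
print(hovmoller.shape)  # (12, 90)
```

Fixing a single longitude index instead of averaging (`field[:, :, j]`) gives the other common variant of the plot.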

Figure 4 shows a montage of such outputs. All products are created on the fly.

Figure 4. A montage of selected output products from a Live Access Server.
Jon D. Blower (j.d.blower@reading.ac.uk) is Technical Director, Reading e-Science Centre, Environmental Systems Science Centre, University of Reading, Reading, UK. Frederique Blanc is responsible for ocean product information and dissemination for the Space Oceanography Division, CLS, Ramonville-Saint-Agne, France. Mike Clancy is Technical and Scientific Director of the US Navy's Fleet Numerical Meteorology and Oceanography Center, Monterey, California, USA. Peter Cornillon is Professor of Physical Oceanography, Graduate School of Oceanography, University of Rhode Island, Narragansett, RI, USA. Craig Donlon is Principal Scientist for Oceans and Ice, European Space Agency/European Space Research and Technology Centre, Noordwijk, the Netherlands. Peter Hacker is Manager, Asia-Pacific Data Research Center, International Pacific Research Center, University of Hawaii, Honolulu, HI, USA. Keith Haines is Director, Reading e-Science Centre, Environmental Systems Science Centre, University of Reading, Reading, UK. Steve C. Hankin is Research Scientist, National Oceanic and Atmospheric Administration Pacific Marine Environmental Laboratory, Seattle, WA, USA. Thomas Loubrieu is Research Scientist at the Institut français de recherche pour l'exploitation de la mer (Ifremer), Plouzané, France. Sylvie Pouliquen is Project Manager, Ifremer, Plouzané, France. Martin Price is Data Scientist, Ocean Forecasting Research and Development, Met Office, Exeter, UK. Timothy F. Pugh is Senior IT Officer, Centre for Australian Weather and Climate Research, Bureau of Meteorology, Melbourne, Australia. Ashwanth Srinivasan is Research Assistant Professor, Division of Meteorology and Oceanography, Rosenstiel School of Marine and Atmospheric Science, University of Miami, Miami, FL, USA.