The GEFS-Wave Retrospective was run daily for the 00Z cycle from December 2018 through November 2019. The results are compared with the two wave models currently operational at NCEP:

- Global Wave Ensemble System (GWES)
- Global Wave Model (Multi_1)

All of the wave models compared here are based on WAVEWATCH III, albeit different versions of it.

In order to browse the archive by individual buoy, follow these steps:

- Select a Buoy
- Select a Year
- Select a Month
- Select a Forecast

This will open the tabs available for "Individual Buoys". Not all buoys report every variable: significant wave height (Hs), peak wave period (Tp), 10-m wind speed (u10), and 10-m wind direction (udir).

The buoy output files generated by each wave model were used for this verification. As a result, some buoys are not available in all three models; results are given where available.

To browse the archive by aggregated buoys (i.e. all buoys available per day), follow these steps:

- Select a Year
- Select a Month
- Select a Forecast

Statistics are given for the aggregated buoys, i.e. all buoys available per day, which usually number between 100 and 200. For these statistics, the deterministic models are compared against the ensemble means.

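The Continuous Ranked Probability Score (CRPS) is calculated from the difference between the forecast cumulative distribution function (CDF) and the observation's CDF (a step function at the observed value): it is the integral of the squared difference between the two. For a deterministic forecast it reduces to the absolute error, so averaged over many forecasts it equals the Mean Absolute Error.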
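For a finite ensemble with equally weighted members, the CDF integral can be evaluated with the equivalent identity CRPS = E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from the ensemble and y is the observation. The minimal sketch below assumes equal member weights; the function name `crps_ensemble` is illustrative and not necessarily the code used to produce these statistics.

```python
import numpy as np


def crps_ensemble(obs, members):
    """CRPS of an equally weighted ensemble for one scalar observation.

    Uses the identity CRPS = E|X - y| - 0.5 * E|X - X'|, where X and X'
    are independent draws from the ensemble and y is the observation.
    """
    x = np.asarray(members, dtype=float)
    # Mean absolute distance between each member and the observation.
    spread_to_obs = np.mean(np.abs(x - obs))
    # Mean absolute distance between all pairs of members (including i == j).
    spread_within = np.mean(np.abs(x[:, None] - x[None, :]))
    return float(spread_to_obs - 0.5 * spread_within)


# A three-member ensemble centred on the observation.
print(crps_ensemble(2.0, [1.0, 2.0, 3.0]))  # 2/9, about 0.2222
# A one-member ("deterministic") forecast: the CRPS is the absolute error.
print(crps_ensemble(2.0, [3.5]))  # 1.5
```

With a single member the pairwise term vanishes, which is why the score falls back to the absolute error for a deterministic forecast.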

The Brier Score for an ensemble is computed for the event of exceeding a specified threshold. It is the mean squared error between the predicted probabilities of exceedance and the observed outcomes (1 if the threshold was exceeded, 0 otherwise). The score summarizes the magnitude of the error in the probability forecasts. It ranges from 0.0 to 1.0, with a perfect forecast scoring 0.0.
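A minimal sketch of the ensemble Brier Score, assuming the forecast probability of exceedance is taken as the fraction of ensemble members above the threshold. The function name `brier_score` and the sample Hs values are hypothetical.

```python
import numpy as np


def brier_score(members, obs, threshold):
    """Brier score for the event "value exceeds threshold".

    members: array of shape (n_forecasts, n_members)
    obs:     array of shape (n_forecasts,)
    """
    members = np.asarray(members, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Forecast probability = fraction of members above the threshold.
    prob = (members > threshold).mean(axis=1)
    # Observed outcome: 1 if the threshold was exceeded, else 0.
    outcome = (obs > threshold).astype(float)
    return float(np.mean((prob - outcome) ** 2))


# Two forecasts of Hs (m) from a 4-member ensemble, threshold 4.5 m.
members = [[4.0, 4.2, 4.8, 5.0],
           [3.0, 3.5, 3.8, 4.6]]
obs = [5.1, 3.9]
print(brier_score(members, obs, threshold=4.5))  # 0.15625
```

Here the first forecast gives a probability of 0.5 for an event that occurred (error 0.25) and the second gives 0.25 for an event that did not occur (error 0.0625), so the mean is 0.15625.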

Rank histograms (sometimes called verification rank histograms or Talagrand diagrams) are a way to show how reliable an ensemble forecast is compared to a set of newly observed data. In other words, they can reveal bias in the model (a sloped histogram) as well as too little or too much ensemble spread (a U-shaped or dome-shaped histogram). If an ensemble forecast is reliable, the rank histogram (a histogram of the rank of each observation among the sorted ensemble members) will be flat. Deviations from a uniform distribution (i.e. histogram blocks that are above or below the red line) mean that the ensemble is not fully reliable. These types of diagrams are not commonly used outside of ensemble forecasting. (2016 Statistics How To)
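The ranks behind such a histogram can be sketched as follows: for each forecast, count how many ensemble members fall below the observation, giving n_members + 1 possible ranks. This is an illustrative sketch with a hypothetical function name; it does not randomize ties between the observation and members, which a full implementation may do.

```python
import numpy as np


def rank_histogram(members, obs):
    """Counts of the observation's rank among the ensemble members.

    members: shape (n_forecasts, n_members); obs: shape (n_forecasts,).
    Rank k means k members were below the observation, so there are
    n_members + 1 possible ranks. Ties are not randomized here.
    """
    members = np.asarray(members, dtype=float)
    obs = np.asarray(obs, dtype=float)
    ranks = np.sum(members < obs[:, None], axis=1)
    return np.bincount(ranks, minlength=members.shape[1] + 1)


ens = [[1.0, 2.0, 3.0]] * 4
# Observations spread evenly through the ensemble: flat histogram.
print(rank_histogram(ens, [0.5, 1.5, 2.5, 3.5]))  # [1 1 1 1]
# Observations always below every member: the ensemble is biased high.
print(rank_histogram(ens, [0.1, 0.2, 0.3, 0.4]))  # [4 0 0 0]
```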

Proper Scoring and How To Score Probability Predictions In Python

All the wave models are driven by 10-m surface winds:

- GEFS-Wave is driven by 1/4 degree winds
- GWES is driven by 1/2 degree GEFSv11 winds
- Multi_1 is driven by 1/4 degree GDAS/GFSv15 winds

NDBC buoy wind speeds are recorded at the anemometer height and are not adjusted to a standard height in the NDBC archive. We estimate the 10-m winds using the method described by S. A. Hsu et al. (1994).

Although never used operationally by NDBC, this method was tested and found to compare favorably with a more elaborate method under near-neutral stability. Near-neutral stability is the condition most frequently encountered at sea; it occurs when the air and water temperatures are close to each other. The method, referred to as the Power Law Method, is offered for those who want to explore the marine wind speed profile without the complexity of the more elaborate method. The relationship is:

$$u_2 = u_1 \left( \frac{z_2}{z_1} \right)^{P}$$

where u2 is the wind speed at the desired reference height z2, and u1 is the wind speed measured at height z1. A value of 0.11 for the exponent P was empirically determined to be applicable most of the time over the ocean.
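
Applied to buoy winds, the power law adjusts the measured speed from the anemometer height to 10 m. The sketch below uses P = 0.11; the function name `power_law_wind` is hypothetical, and the 5 m anemometer height in the example is only an illustrative value, since actual heights vary by buoy.

```python
def power_law_wind(u1, z1, z2=10.0, p=0.11):
    """Adjust wind speed u1 measured at height z1 (m) to height z2 (m).

    Power-law profile u2 = u1 * (z2 / z1) ** p, with p = 0.11 as the
    near-neutral over-ocean exponent from Hsu et al. (1994).
    """
    if z1 <= 0 or z2 <= 0:
        raise ValueError("heights must be positive")
    return u1 * (z2 / z1) ** p


# Hypothetical anemometer height of 5 m and a measured wind of 8 m/s.
u10 = power_law_wind(8.0, z1=5.0)
print(round(u10, 2))  # 8.63
```

Because P is small, the adjustment is modest: doubling the height raises the wind speed by a factor of 2^0.11, roughly 8%.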

Reference

Hsu, S. A., Eric A. Meindl, and David B. Gilhousen, 1994: Determining the Power-Law Wind-Profile Exponent under Near-Neutral Stability Conditions at Sea. Journal of Applied Meteorology, Vol. 33, No. 6, June 1994.

Significant wave height is defined as the mean wave height (trough to crest) of the highest third of the waves. Note that the height of the largest individual wave can be significantly larger. Significant wave height values are in meters (m).
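The definition can be illustrated by computing H1/3 directly from a record of individual trough-to-crest wave heights. This is a sketch of the definition only, with a hypothetical function name and made-up heights; spectral wave models such as WAVEWATCH III report the spectral estimate Hm0 = 4 sqrt(m0), where m0 is the variance of the surface elevation, which is close to but not identical to H1/3.

```python
import numpy as np


def significant_wave_height(wave_heights):
    """H1/3: mean of the highest third of individual wave heights (m)."""
    h = np.sort(np.asarray(wave_heights, dtype=float))[::-1]  # descending
    n_top = max(1, len(h) // 3)  # number of waves in the highest third
    return float(h[:n_top].mean())


heights = [0.75, 1.25, 2.5, 2.0, 3.0, 1.5]  # hypothetical record, metres
print(significant_wave_height(heights))  # (3.0 + 2.5) / 2 = 2.75
```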

The peak wave period is the period corresponding to the highest peak in the one-dimensional frequency spectrum of the wave field. The wave field generally consists of several superimposed wave systems, so the peak period identifies either the locally generated "wind sea" (in cases with strong local winds) or the dominant swell system ("swell") generated elsewhere. Peak wave period values are in seconds (s).
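A minimal sketch of picking Tp from a discretized 1-D spectrum: take the frequency bin with the most energy and invert it. It assumes the spectrum is given on discrete frequency bins and does not interpolate between bins, which some implementations do; the function name and the bimodal example spectrum are hypothetical.

```python
import numpy as np


def peak_period(freqs, energy):
    """Peak period Tp (s) = 1 / frequency of the spectral maximum.

    freqs:  frequencies in Hz
    energy: 1-D spectral density at each frequency (e.g. m^2/Hz)
    """
    freqs = np.asarray(freqs, dtype=float)
    energy = np.asarray(energy, dtype=float)
    f_peak = freqs[np.argmax(energy)]  # highest peak of the spectrum
    return float(1.0 / f_peak)


# Hypothetical bimodal spectrum: swell near 0.08 Hz, wind sea near 0.25 Hz.
f = [0.05, 0.08, 0.10, 0.15, 0.20, 0.25, 0.30]
e = [0.2, 1.6, 0.7, 0.3, 0.9, 1.2, 0.4]
print(peak_period(f, e))  # 12.5 (the swell peak dominates)
```

If the wind-sea peak grew larger than the swell peak, the same function would return about 4 s instead, which is how Tp can jump between wave systems.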

The bias is the mean difference between the model and the observations; it measures the tendency of the model to over- or under-estimate the value of a parameter. Smaller absolute bias values indicate better agreement between measured and calculated values. A positive bias means overprediction; a negative bias means underprediction.

diff = model_data - buoy_data
bias = diff.mean()

The root-mean-square error (RMSE), also called the root-mean-square deviation, measures the differences between the observed and predicted values. Smaller RMSE values indicate better agreement between measured and calculated values.

rmse = (diff**2).mean()**0.5

The scatter index (SI) is defined as the standard deviation of the difference between model and observations, normalised by the mean of the observations and expressed as a percentage. Smaller SI values indicate better agreement between the model and observations. Note that low wave heights or areas where wind seas dominate can result in high SI values.

scatter_index = 100.0*((diff**2).mean() - bias**2)**0.5/buoy_data.mean()
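A self-contained version of the bias, RMSE and scatter-index calculations, assuming model_data and buoy_data are equal-length sequences of collocated model and buoy values. The function name `verification_stats` is hypothetical. The scatter index follows the definition given: the standard deviation of the model-minus-buoy difference, which equals sqrt(RMSE^2 - bias^2), divided by the mean observation and multiplied by 100.

```python
import numpy as np


def verification_stats(model_data, buoy_data):
    """Return bias, RMSE and scatter index (%) of model vs. buoy values."""
    model_data = np.asarray(model_data, dtype=float)
    buoy_data = np.asarray(buoy_data, dtype=float)
    diff = model_data - buoy_data
    bias = diff.mean()                    # mean error
    rmse = np.sqrt(np.mean(diff ** 2))    # root-mean-square error
    # Population std of the error, sqrt(mean(diff^2) - bias^2); computed
    # with diff.std() to avoid a tiny negative value under the sqrt.
    std_err = diff.std()
    scatter_index = 100.0 * std_err / buoy_data.mean()
    return float(bias), float(rmse), float(scatter_index)


bias, rmse, si = verification_stats([1.2, 2.1, 2.9, 4.2], [1.0, 2.0, 3.0, 4.0])
print(round(bias, 3), round(rmse, 3), round(si, 2))  # 0.1 0.158 4.9
```

A model that is off by a constant amount has a nonzero bias and RMSE but an SI of zero, since the SI only measures the scatter of the errors around the bias.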

The scatter plot shows the relationship of a specific parameter (u10, udir, Hs, Tp) by plotting the buoy value on the x-axis and the model value on the y-axis. With perfect agreement, all points would lie along the 45-degree dashed line. The solid blue line is the Ordinary Least Squares (OLS) fit to the data. The Adjusted R-squared value is also given: a statistical measure of how well the regression line approximates the data points, adjusted for the number of observations and the degrees of freedom of the residuals.
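The OLS line and its Adjusted R-squared can be sketched with NumPy for a single predictor (the buoy value), using adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - 2). This is a minimal illustration with a hypothetical function name and made-up data, not necessarily how the plots were produced.

```python
import numpy as np


def ols_fit(buoy, model):
    """Fit model = slope * buoy + intercept by ordinary least squares.

    Returns slope, intercept, R^2 and adjusted R^2 (one predictor).
    """
    x = np.asarray(buoy, dtype=float)
    y = np.asarray(model, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("need at least 3 points for adjusted R^2")
    slope, intercept = np.polyfit(x, y, 1)  # highest power first
    y_hat = slope * x + intercept
    ss_res = np.sum((y - y_hat) ** 2)       # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    p = 1  # number of predictors
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return float(slope), float(intercept), float(r2), float(adj_r2)


slope, intercept, r2, adj_r2 = ols_fit([1.0, 2.0, 3.0, 4.0, 5.0],
                                       [1.1, 1.9, 3.2, 3.8, 5.0])
print(round(slope, 2), round(intercept, 2), round(adj_r2, 3))  # 0.97 0.09 0.987
```

The adjustment always pulls the value below the plain R^2 when the fit is imperfect, and the penalty shrinks as the number of collocated points grows.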

A Q-Q plot plots the quantiles of the first data set against the quantiles of the second. It is a graphical technique for comparing two probability distributions: if the two distributions are similar, the points fall approximately along a straight line, and perfect agreement yields the line y = x. If the general trend of the Q-Q plot is flatter than the line y = x, the distribution plotted on the x-axis is more dispersed than the distribution plotted on the y-axis, and vice versa.
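The points of a Q-Q plot are simply matching quantiles of the two samples. The sketch below assumes the buoy sample goes on the x-axis and the model on the y-axis and uses the 1st to 99th percentiles; the function name and the synthetic data are hypothetical.

```python
import numpy as np


def qq_points(buoy, model, probs=None):
    """Matching quantiles of the buoy (x-axis) and model (y-axis) samples."""
    if probs is None:
        probs = np.linspace(0.01, 0.99, 99)  # 1st to 99th percentile
    return np.quantile(buoy, probs), np.quantile(model, probs)


rng = np.random.default_rng(0)
buoy = rng.gamma(shape=2.0, scale=1.0, size=2000)  # synthetic "observed" Hs
model = 0.8 * buoy + 0.3  # hypothetical model: less dispersed than the buoy
qb, qm = qq_points(buoy, model)
slope = np.polyfit(qb, qm, 1)[0]
print(round(slope, 2))  # 0.8: flatter than y = x
```

The slope below 1 is the "flatter than y = x" case: the buoy distribution on the x-axis is the more dispersed one.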

Taylor diagrams are usually used to show how a variety of different models perform against the same data source. Here, however, we want to show how one model performs at a variety of data sources (different buoys). To put data from many buoys on the same Taylor diagram, the statistics must first be normalized by the standard deviation of the observations: at each buoy, the model's standard deviation (and its centered RMS difference) is divided by the observed standard deviation. Normalizing the observations in the same way sets their standard deviation to 1, their correlation with themselves to 1, and their RMS difference to 0. Every buoy's observations therefore collapse onto the single reference star at 1 along the x-axis of the Taylor diagram, and only the normalized model statistics for each buoy are plotted.
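A minimal sketch of the per-buoy normalization, assuming population (ddof=0) standard deviations; the function names are hypothetical. The normalized statistics satisfy the Taylor relation E'^2 = s^2 + 1 - 2sR (s: normalized model standard deviation, R: correlation, E': normalized centered RMS difference), which is what lets all three share one diagram.

```python
import numpy as np


def normalized_taylor_stats(buoy, model):
    """Normalized Taylor-diagram statistics for one buoy.

    Returns (sigma_norm, corr, crmsd_norm): the model standard deviation,
    the correlation, and the centered RMS difference, with both
    dispersion measures divided by the observed standard deviation.
    """
    obs = np.asarray(buoy, dtype=float)
    mod = np.asarray(model, dtype=float)
    sigma_obs = obs.std()
    corr = np.corrcoef(obs, mod)[0, 1]
    # Centered RMS difference: RMS of the difference of the anomalies.
    anomaly_diff = (mod - mod.mean()) - (obs - obs.mean())
    crmsd = np.sqrt(np.mean(anomaly_diff ** 2))
    return float(mod.std() / sigma_obs), float(corr), float(crmsd / sigma_obs)


def taylor_xy(sigma_norm, corr):
    """Position on the diagram: radius sigma_norm, angle arccos(corr)."""
    theta = np.arccos(np.clip(corr, -1.0, 1.0))
    return float(sigma_norm * np.cos(theta)), float(sigma_norm * np.sin(theta))


# The observations compared with themselves land on the reference star (1, 0).
obs = np.array([1.0, 2.0, 1.5, 3.0, 2.5])
s, r, e = normalized_taylor_stats(obs, obs)
print(round(s, 6), round(r, 6), round(e, 6))  # 1.0 1.0 0.0
```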

The "Aggregated Statistics" and "Taylor Diagram" tabs contain time series of all the variables, gathered across all reporting buoys and analyzed per day for the full month.