To estimate the accuracy of the interpolation, we carry out the interpolation for bottles where measurements are available. In this case the actual measurement is not used for the interpolation (i.e. it is assumed to be missing), and the interpolated value is then compared with the measured value. The interpolation errors are given relative to an assumed measurement error, e.g. for oxygen the greater of 1 µmol/kg or 1% of the measured value (see Table 1). Several parameter combinations and proximities were used for the MLR. As a reference for the error estimate, we use a linear interpolation in pressure: at each station, the data just above and just below the point to be interpolated are used for a linear interpolation; above the uppermost or below the lowermost data point, the interpolated value is taken from the nearest pressure.
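The reference procedure described above can be sketched in a few lines. This is a minimal illustration and not the code used for this study; it assumes Python with NumPy, and the function name `leave_one_out_linear` and the station values are hypothetical. `np.interp` holds the end value constant outside the data range, which mirrors taking the value from the nearest pressure.

```python
import numpy as np

def leave_one_out_linear(pressure, values):
    """For each bottle of one station, interpolate its value linearly in
    pressure from the remaining bottles, treating its own value as missing."""
    pressure = np.asarray(pressure, dtype=float)
    values = np.asarray(values, dtype=float)
    order = np.argsort(pressure)            # np.interp needs increasing x
    p, v = pressure[order], values[order]
    estimate = np.empty_like(v)
    for i in range(p.size):
        keep = np.arange(p.size) != i       # withhold bottle i
        # Outside the remaining data range np.interp returns the end value,
        # i.e. the value from the nearest pressure.
        estimate[i] = np.interp(p[i], p[keep], v[keep])
    result = np.empty_like(estimate)
    result[order] = estimate                # restore the input order
    return result

# Hypothetical station: pressure in dbar, oxygen in umol/kg
p = [10.0, 100.0, 200.0, 400.0]
o2 = [220.0, 200.0, 180.0, 170.0]
est = leave_one_out_linear(p, o2)
```

For this made-up station the bottle at 200 dbar is estimated from its neighbours at 100 and 400 dbar as 190, against a measured value of 180; the shallowest and deepest bottles simply take the value of their nearest neighbour.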

| Parameter | Minimum error | Percentage error | No. of measurements |
|---|---|---|---|
| O | 1.0 | 1 | 3603 |
| NO | 0.5 | 1 | 3497 |
| PO | 0.1 | 1 | 3175 |
| SiO | 1.0 | 1 | 3566 |
| T-C | 2.0 | 0 | 754 |
| F-11 | 0.02 | 1 | 88 |
| F-12 | 0.01 | 1 | 1384 |
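As an illustration of how an interpolation error can be expressed relative to the assumed measurement error, the sketch below takes the greater of the minimum error and the percentage of the measured value, using the numbers from the table above. The names `ASSUMED_ERROR`, `assumed_error` and `scaled_error` are hypothetical, and how the individual ratios are aggregated into one number per parameter is not specified here, so the sketch stops at the per-bottle ratio.

```python
import numpy as np

# (minimum error, percentage error) per parameter, copied from the table above.
# Keys use the parameter labels as they appear there.
ASSUMED_ERROR = {
    "O": (1.0, 1.0),
    "NO": (0.5, 1.0),
    "PO": (0.1, 1.0),
    "SiO": (1.0, 1.0),
    "T-C": (2.0, 0.0),
    "F-11": (0.02, 1.0),
    "F-12": (0.01, 1.0),
}

def assumed_error(parameter, measured):
    """Greater of the minimum error and the percentage of the measured value."""
    minimum, percent = ASSUMED_ERROR[parameter]
    measured = np.asarray(measured, dtype=float)
    return np.maximum(minimum, percent / 100.0 * np.abs(measured))

def scaled_error(parameter, interpolated, measured):
    """Absolute interpolation error in units of the assumed measurement error."""
    interpolated = np.asarray(interpolated, dtype=float)
    measured = np.asarray(measured, dtype=float)
    return np.abs(interpolated - measured) / assumed_error(parameter, measured)
```

With these definitions an oxygen value of 203 interpolated for a measured 200 has an assumed error of 2 (1% of 200) and hence a scaled error of 1.5, while for a measured 50 the minimum error of 1 applies.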

The proximity, i.e. the region surrounding the value to be interpolated
from which data for the interpolation are taken, can be defined as a
geographical region, e.g. a box extending 2° of latitude by 2° of
longitude in the horizontal and 150 dbar in the vertical at the
surface (although for a zonal section as used here, the latitude
restriction has no influence). Because water mass characteristics are
in general more homogeneous at greater depths and the bottle density
decreases with depth, the vertical extent can be increased with
increasing pressure. With an additional 20% of the pressure, an
interpolation for a bottle at 1000 dbar would therefore have a
vertical extent of 350 dbar (150 + 0.2*1000). For individual
hydrographic sections it is also possible to define the horizontal
extent using the station number instead of the geographic location.
Because stations are normally spaced more closely in regions of
strong gradients (for example the western boundary region compared to
the open ocean), such an approach takes into account previous
oceanographic knowledge of the parameter distribution. The *vertical*
extent also does not have to be defined using pressure; a definition
using density incorporates the assumption that mixing occurs
predominantly along density surfaces. Other definitions of proximity,
even with more dimensions, are possible. The definitions used here
are given in Table 2. The only requirement is that the chosen
proximity be large enough to include sufficient data to carry out the
MLR.

| Case | Longitude | Station | Pressure | factor/max |
|---|---|---|---|---|
| a | ... | 8 | ... | 0/1 |
| b | ... | 8 | 250+20% | 0/1 |
| c | ... | 8 | 150+15% | 0/1 |
| d | ... | ... | 150+15% | 0/1 |
| e | ... | 3 | 150+15% | 1.41/8 |
| f | 2 | ... | 100+20% | 1.41/8 |
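The pressure entries in the table above combine a fixed base with a percentage of the pressure. A minimal sketch of this rule follows; the function names are hypothetical, and treating the extent as a half-width around the target bottle is an assumption made only for the illustration.

```python
import numpy as np

def vertical_extent(pressure, base=150.0, fraction=0.20):
    """Vertical extent of the proximity in dbar:
    a fixed base plus a fraction of the pressure."""
    return base + fraction * np.asarray(pressure, dtype=float)

def pressure_mask(target_pressure, pressures, base=150.0, fraction=0.20):
    """True for bottles whose pressure lies within the extent of the target.
    Assumption: the extent is applied as a half-width around the target."""
    extent = vertical_extent(target_pressure, base, fraction)
    distance = np.abs(np.asarray(pressures, dtype=float) - target_pressure)
    return distance <= extent

extent_1000 = float(vertical_extent(1000.0))   # 150 + 0.2 * 1000
mask = pressure_mask(1000.0, [500.0, 700.0, 1000.0, 1300.0, 1400.0])
```

For a bottle at 1000 dbar this reproduces the 350 dbar of the worked example in the text, so under the half-width assumption the bottles at 700, 1000 and 1300 dbar are selected and those at 500 and 1400 dbar are not.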

As the proximity goes to zero so does the error of the interpolation (assuming a smooth field). For a given data density the linear interpolation has the smallest possible proximity (2 data points or 1 in the case of extrapolation). The MLR uses more parameters, therefore needs more data points and therefore a larger proximity to find these data points. But because the MLR makes use of more information, the larger proximity does not imply a larger error.

A larger proximity also has advantages: measurement uncertainties or even plainly wrong input values have a smaller influence in a larger ensemble of data points than in a smaller one. A drawback is that, with a larger proximity, a wrong data point influences more interpolated values. Another advantage of larger proximities is that larger data gaps can be interpolated. Using linear interpolation, the 754 measured total dissolved inorganic carbon (T-C) values can be used to estimate a total of 1065 values, while with a larger proximity and MLR the values for all bottles (N=3743) can be calculated. To be able to interpolate over large data gaps while in general still using small proximities, the proximity can be increased if there are not enough data points in the original proximity.
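Enlarging the proximity only when too few data points are found can be sketched as a simple loop. The reading used here, that the proximity is multiplied by a growth factor until enough points are found or a maximum enlargement is reached, is an assumption. So are the function name, the parameter names `factor` and `max_scale`, and the idea of a distance normalised so that 1.0 marks the edge of the original proximity.

```python
import numpy as np

def grow_until_enough(distance, min_points, factor=1.41, max_scale=8.0):
    """Select points inside the proximity, enlarging it step by step.

    distance   : normalised distance of each candidate from the target
                 (1.0 = edge of the original proximity; hypothetical measure)
    min_points : number of data points needed for the regression
    factor     : growth factor per step (values <= 1 disable growth)
    max_scale  : largest allowed enlargement of the original proximity
    Returns the boolean selection mask and the scale finally used.
    """
    distance = np.asarray(distance, dtype=float)
    scale = 1.0
    while True:
        mask = distance <= scale
        enough = mask.sum() >= min_points
        can_grow = factor > 1.0 and scale * factor <= max_scale
        if enough or not can_grow:
            return mask, scale
        scale *= factor

d = [0.2, 0.9, 1.3, 1.9, 5.0]
mask_grown, scale_grown = grow_until_enough(d, min_points=4)
mask_fixed, scale_fixed = grow_until_enough(d, min_points=4, factor=0.0, max_scale=1.0)
```

In this example the original proximity contains only two points; after two enlargement steps four points are found. With growth disabled the two original points are returned, and the caller has to decide whether that is enough to carry out the regression.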

The interpolation was carried out using different parameter combinations (see Table 3). Each parameter list gives the maximum set of parameters to be used; if a parameter is missing for a certain bottle, it cannot be used even though it appears in the list, and the MLR is made without it.
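One way to realise the rule of dropping a predictor that is missing for the bottle to be interpolated is sketched below, assuming an ordinary least-squares fit with an intercept over the data points in the proximity. The function name `mlr_estimate`, the use of NaN to mark missing values, and the example numbers are hypothetical.

```python
import numpy as np

def mlr_estimate(X, y, x0):
    """Estimate y at the target bottle by multiple linear regression.

    X  : (n, k) predictor values of the data points in the proximity
    y  : (n,)   measured values of the parameter to be interpolated
    x0 : (k,)   predictor values at the target bottle, NaN where missing
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    use = ~np.isnan(x0)                     # predictors available at the target
    # keep only data points that are complete in y and the predictors used
    rows = ~np.isnan(y) & ~np.isnan(X[:, use]).any(axis=1)
    A = np.column_stack([np.ones(rows.sum()), X[rows][:, use]])
    if A.shape[0] < A.shape[1]:
        raise ValueError("not enough data points in the proximity for the MLR")
    coef, *_ = np.linalg.lstsq(A, y[rows], rcond=None)
    return float(coef[0] + coef[1:] @ x0[use])

# Made-up proximity with two predictors
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y_full = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]
estimate_full = mlr_estimate(X, y_full, [1.0, 2.0])

# Second predictor missing at the target: the fit uses the first one only
y_single = 1.0 + 2.0 * X[:, 0]
estimate_single = mlr_estimate(X, y_single, [3.0, np.nan])
```

The error raised when too few points remain corresponds to the requirement that the proximity be large enough to carry out the MLR.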

CTD-data | Bottle-data | |||||||||||

Case | longitude | P | T | S | NDEN | O | AOU | NO | PO | SiO | T-C | |

lin | X | |||||||||||

a | X | X | X | |||||||||

b | X | X | X | X | ||||||||

c | X | X | X | X | ||||||||

d | X | X | X | X | ||||||||

e | X | X | X | X | X | |||||||

f | X | X | X | |||||||||

g | X | X | X | X | ||||||||

h | X | X | X | X | X | |||||||

1 | X | X | X | X | X | X | X | X | ||||

4 | X | X | X | X | X | X | X | |||||

5 | X | X | X | X | X | X | X | |||||

6 | X | X | X | X | X | X | X | X |

Using more parameters can increase the accuracy of the MLR interpolation, but several things have to be considered:

- adding parameters with a much lower data density than the parameter to be interpolated is not recommended because of the necessary increase of the proximity
- adding a parameter which is linearly correlated with another (e.g. NO and PO) does not give additional information and in some cases can introduce larger errors due to the measurement uncertainties (this clearly does not apply if one of the parameters is the one to be interpolated). If the data distributions of the two parameters differ, it is beneficial either to use the parameter with the better distribution, or to first interpolate one of the parameters using the other and then use these interpolated values for the interpolation of other parameters.
- adding a parameter which has no correlation with the parameter to be interpolated has the negative effect of increasing the error due to noise.
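Whether two candidate predictors are close to linearly correlated can be checked before they are both added to the parameter list. The sketch below is one simple way to do this with the correlation coefficient; the function name, the threshold of 0.95 and the example values are assumptions for illustration only.

```python
import numpy as np

def correlated_pairs(X, names, threshold=0.95):
    """Return predictor pairs whose absolute correlation reaches the threshold.

    X     : (n, k) predictor values, NaN where missing
    names : k labels, one per column of X
    """
    X = np.asarray(X, dtype=float)
    complete = ~np.isnan(X).any(axis=1)     # use only complete rows
    r = np.corrcoef(X[complete], rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs

# Made-up values: the second column is nearly proportional to the first
values = np.array([
    [10.0, 0.6, 5.0],
    [20.0, 1.3, 3.0],
    [30.0, 1.9, 8.0],
    [40.0, 2.5, 1.0],
])
pairs = correlated_pairs(values, ["NO", "PO", "T"])
```

Here only the pair NO and PO is reported, which would suggest using one of the two as a predictor rather than both.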

| Parameter | lin | | | | | | |
|---|---|---|---|---|---|---|---|
| O | 5.13 | 17.73 | 6.56 | 0.33 | 0.06 | 0.66 | 0.40 |
| NO | 2.03 | 6.73 | 2.66 | 2.29 | 1.53 | 2.00 | 1.51 |
| PO | 0.65 | 2.12 | 0.92 | 0.97 | 0.67 | 0.75 | 0.64 |
| SiO | 1.71 | 8.02 | 1.21 | 4.62 | 0.97 | 4.52 | 1.03 |
| T-C | 5.77 | 9.92 | 3.94 | 2.90 | 2.02 | 1.71 | 1.33 |
| F-11 | 2.56 | 2.25 | 1.80 | 2.41 | 1.97 | 1.37 | 1.05 |
| F-12 | 6.26 | 12.03 | 4.06 | 5.01 | 3.20 | 4.29 | 3.12 |

The results (Table 4 and Figure 3) show that using no restriction for the vertical proximity (case a) or no restriction for the horizontal proximity (case d) generally gives larger errors than the linear interpolation for the high data density parameters (oxygen and nutrients). We could not determine whether the better results for parameters with lower data density, like T-C, arise because the linear interpolation performs less well there or because T-C is better described by the MLR than the nutrients are.

As expected, the interpolation generally gives smaller errors for more local (smaller) proximities. The interpolation also improves when a larger number of parameters is used. The importance of a parameter varies with the parameter to be interpolated: oxygen is quite important for the interpolation of nutrients and T-C, but it does not have such a great influence on silica.

Figure 4 shows the errors of some interpolations as a function of pressure. For all interpolations the errors in the upper several hundred meters are much larger than at greater depths. This is not unexpected, given the generally higher temporal and spatial variability near the surface. There is a possible positive side effect associated with these higher interpolation errors in surface waters. Some part of this higher natural variability is, in the context of a temporal and spatial mean field, random noise. For example, two data points with the same temperature may have different T-C content due to local over- or under-saturation, because one water parcel has experienced rapid warming and the other cooling. After some time the two water parcels will again reach equilibrium with the atmosphere. Alternatively, the two data points may effectively be the same water parcel, sampled at different times. Assuming that the other parameters stayed unchanged, the MLR effectively takes the mean of the measurements as the interpolated value. That is, there is a certain amount of smoothing associated with the MLR, so that the interpolated values correspond more closely to the mean field than to the actual single measurements.

Greater care must be taken in the data quality evaluation (DQE) when using MLR. For a pressure interpolation, only the parameter to be interpolated and the pressure have to be consistent. In the MLR, however, an error in just one of the parameters used leads to an error in the interpolated value. Good data quality control is therefore essential, but MLR can also help us in the DQE. The parameters measured with the CTD (pressure, temperature, salinity, oxygen and derived parameters such as density) are in essence self-consistent. An MLR using just these parameters to interpolate another parameter, for example silica, gives high deviations for silica data which are inconsistent with the other silica data (for example because of a measurement error) or with the CTD data (for example because of a wrong bottle closing depth). In essence this corresponds to the evaluation of property-property plots. If we use only high-quality historic data to infer the coefficients of the MLR and then apply them to the actual data, we can also infer measurement offsets or even temporal changes, for example in anthropogenic tracers (see below).
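The use of an MLR on CTD parameters as a consistency check can be illustrated as follows. This is a simplified sketch: it fits one regression to all bottles instead of working within a proximity, and it flags bottles by a robust threshold on the residuals. The function name, the threshold of three robust standard deviations and the synthetic data are assumptions, not part of the procedure described here.

```python
import numpy as np

def flag_inconsistent(ctd, bottle, n_sigma=3.0):
    """Flag bottle values that deviate strongly from an MLR on CTD parameters.

    ctd    : (n, k) CTD predictors (e.g. temperature, salinity)
    bottle : (n,)   bottle parameter to be checked (e.g. silica)
    """
    ctd = np.asarray(ctd, dtype=float)
    bottle = np.asarray(bottle, dtype=float)
    A = np.column_stack([np.ones(bottle.size), ctd])   # intercept + predictors
    coef, *_ = np.linalg.lstsq(A, bottle, rcond=None)
    residual = bottle - A @ coef
    centred = np.abs(residual - np.median(residual))
    # Robust scale from the median absolute deviation; the factor 1.4826
    # makes it comparable to a standard deviation for normal residuals.
    scale = 1.4826 * np.median(centred)
    return centred > n_sigma * scale

# Synthetic example: a linear relation, small deterministic noise, one bad bottle
i = np.arange(21)
temperature = np.linspace(2.0, 22.0, 21)
salinity = 35.0 + 0.1 * np.sin(2.0 * np.pi * i / 20.0)
silica = 120.0 - 4.0 * temperature + 10.0 * (salinity - 35.0) + 0.2 * np.cos(3.0 * i)
silica[10] += 50.0                                     # simulated measurement error
flags = flag_inconsistent(np.column_stack([temperature, salinity]), silica)
```

The bottle with the simulated error stands out by a residual far above the threshold. A flagged value still needs inspection, since the deviation may come from the bottle measurement or from the CTD data assigned to it.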

Using just the parameters available from the CTD, it is also possible to interpolate bottle parameters such as nutrients and T-C onto the high-resolution CTD data. This can be important when calculating transports, as the strongest currents are generally in the core of a water mass, which is characterized by extrema in water mass characteristics.