De-spike the shear-probe data: Difference between revisions

From Atomix
Aleboyer (talk | contribs)
Rolf (talk | contribs)

Revision as of 19:57, 19 November 2021

There is currently no standard method of de-spiking shear-probe data.


An algorithm that seems to effectively remove spikes from shear-probe data uses the following steps.

  1. Data are high-pass filtered with a first-order Butterworth filter with a cutoff frequency of <math>0.1\, \mathrm{Hz}</math> to remove any offset and very-low-frequency signals.
  2. The shear data are rectified by taking their absolute value.
  3. A copy of the rectified shear-probe data is smoothed with a first-order low-pass filter with a cutoff frequency that is usually in the range of <math>0.25</math> to <math>2\ \mathrm{Hz}</math>.
  4. Those samples for which the ratio of the absolute shear to the smoothed absolute shear exceeds a threshold (8 is a typical choice) are identified as spikes.
  5. A number <math>N</math> of samples after a spike and <math>N/2</math> samples before a spike are replaced by a constant value equal to the mean shear over intervals of one-half second before and after this area of replacement.
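
Steps 1 to 4 can be sketched as follows. This is only an illustration under stated assumptions, not a reference implementation: the function names, the default smoothing cutoff of 0.5 Hz, and the way the filter states are initialised (from the mean of the first samples, to avoid a start-up transient) are choices made here. The filters are applied forward in time only.

```python
import math

def first_order_butter(x, fc, fs, kind, x_init=0.0):
    """First-order Butterworth filter (bilinear transform), run forward in time.

    kind is "low" or "high"; x_init is the assumed input level before the record.
    """
    k = math.tan(math.pi * fc / fs)
    a1 = (k - 1.0) / (k + 1.0)
    if kind == "low":
        b0 = b1 = k / (k + 1.0)
        y_prev = x_init  # steady-state output for a constant input x_init
    else:
        b0, b1 = 1.0 / (k + 1.0), -1.0 / (k + 1.0)
        y_prev = 0.0     # a constant input is fully rejected by the high-pass
    x_prev = x_init
    y = []
    for xn in x:
        yn = b0 * xn + b1 * x_prev - a1 * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y

def _start_level(x, n):
    """Mean of the first n samples, used to initialise a filter (an assumption)."""
    head = x[:max(1, n)]
    return sum(head) / len(head)

def detect_spikes(shear, fs, fc_smooth=0.5, threshold=8.0, fc_hp=0.1):
    """Return indices of samples flagged as spikes (steps 1 to 4)."""
    if not shear:
        return []
    # Step 1: remove offset and very-low-frequency signals.
    hp = first_order_butter(shear, fc_hp, fs, "high",
                            _start_level(shear, round(fs / fc_hp)))
    # Step 2: rectify.
    rect = [abs(v) for v in hp]
    # Step 3: smooth a copy of the rectified signal.
    smooth = first_order_butter(rect, fc_smooth, fs, "low",
                                _start_level(rect, round(fs / fc_smooth)))
    # Step 4: flag samples whose ratio to the smoothed level exceeds the threshold.
    return [i for i, (r, s) in enumerate(zip(rect, smooth))
            if s > 0.0 and r / s > threshold]
```

A zero-phase (forward and backward) smoothing filter would centre the neighbourhood on each sample; the forward-only version above is used only to keep the sketch short.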

The purpose of the low-pass filter is to establish the level of shear in a neighbourhood of duration that is roughly equal to the inverse of the low-pass filter cutoff frequency. A shear sample is anomalous if its magnitude exceeds the typical magnitude of its neighbourhood by more than a factor of the threshold. Thus, if the variance of shear is small, a small anomaly is detected, while the same anomaly remains undetected if the variance of shear is large. That is, only anomalies that have the potential to bias the variance are removed.
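
As a small illustration of this relative criterion, the sketch below applies the same anomaly to a quiet and to an energetic neighbourhood. The numerical values are invented for the example and the function name is hypothetical.

```python
THRESHOLD = 8.0  # typical threshold quoted in the text

def is_anomalous(abs_shear, neighbourhood_level, threshold=THRESHOLD):
    """True if a sample exceeds its neighbourhood level by more than the threshold."""
    return abs_shear / neighbourhood_level > threshold

anomaly = 4.0                              # arbitrary units, made up for illustration
in_quiet_water = is_anomalous(anomaly, 0.2)      # ratio of 20: flagged
in_energetic_patch = is_anomalous(anomaly, 1.0)  # ratio of 4: not flagged
```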

What is a suitable neighbourhood and low-pass cutoff frequency? Turbulent patches in the ocean seldom seem to be thinner than about <math>0.5\ \mathrm{m}</math> in the vertical direction. This can serve as a lower limit to the neighbourhood and an upper limit to the cutoff frequency. Thus, if a vertical profiler is moving at a speed of <math>0.5\ \mathrm{m\, s^{-1}}</math>, it crosses such a patch in about <math>1\ \mathrm{s}</math>, and the cutoff frequency should be no higher than <math>1\ \mathrm{Hz}</math>. Gliders that move more slowly and at an angle of <math>30^{\circ}</math> with respect to the horizontal should use a lower cutoff frequency to establish a neighbourhood for comparison of the ratio of signals, because they will take longer to pass through a patch of turbulence.
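
The arithmetic can be sketched as below. The function name and the glider speed of 0.3 m/s are hypothetical, and the sketch assumes that the patch thickness is measured vertically while the speed is measured along the path of the instrument.

```python
import math

def max_cutoff_hz(speed, min_patch_thickness=0.5, path_angle_deg=90.0):
    """Upper limit of the low-pass cutoff frequency in Hz.

    speed               along-path speed in m/s
    min_patch_thickness minimum vertical patch thickness in m (0.5 m in the text)
    path_angle_deg      angle of the path from the horizontal (90 = vertical profiler)
    """
    vertical_speed = speed * math.sin(math.radians(path_angle_deg))
    # Time to cross the thinnest patch is thickness / vertical speed;
    # the cutoff frequency should not exceed the inverse of that time.
    return vertical_speed / min_patch_thickness

profiler_limit = max_cutoff_hz(0.5)                     # about 1 Hz
glider_limit = max_cutoff_hz(0.3, path_angle_deg=30.0)  # about 0.3 Hz
```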

A spike usually consists of a number of contiguous samples. A region surrounding the spike is then replaced by a local mean calculated using data from both sides of the spike (but excluding the spike itself). Because the response of the shear probe to a collision with plankton (ringing of anomalously large amplitude) is a temporal response, the amount of data replaced by a local mean is usually fixed in duration rather than in length. Typically, the data replaced span 20 ms before a spike and 40 ms after it. This algorithm is applied iteratively until no more spikes are detected.
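
The iteration can be sketched as a simple loop. The `detect` and `replace` callables are placeholders for whatever detection and replacement routines are in use, and `max_passes` is a safety limit added here, not part of the description above.

```python
def despike_iteratively(shear, detect, replace, max_passes=10):
    """Repeat detection and replacement until no spikes remain.

    detect(data)         -> list of spike indices (hypothetical interface)
    replace(data, index) -> (new_data, (start, stop)), where [start, stop)
                            is the range of samples that was overwritten
    Returns the cleaned data, a per-sample flag of altered samples,
    and the number of passes in which spikes were found.
    """
    data = list(shear)
    altered = [False] * len(data)
    passes = 0
    while passes < max_passes:
        spikes = detect(data)
        if not spikes:
            break  # no more spikes: done
        for index in spikes:
            data, (start, stop) = replace(data, index)
            for i in range(start, stop):
                altered[i] = True
        passes += 1
    return data, altered, passes
```

Keeping the `altered` flags alongside the cleaned data makes it straightforward to report afterwards how much of each segment was modified.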

The number of points to remove around a spike is determined by the typical relaxation time of the shear probe after a collision with zooplankton. Such anomalies seem to last about <math>0.04\ \mathrm{s}</math>, and so a good choice is <math>N=0.04\, f_s</math>, where <math>f_s</math> is the sampling rate.
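
A minimal sketch of the replacement for a single spike sample follows, assuming the shear record is a plain list. The function name, the rounding of N to a whole number of samples, and the clipping at the ends of the record are assumptions made for this example.

```python
def replace_spike(shear, spike_index, fs, spike_duration=0.04, mean_window=0.5):
    """Replace N/2 samples before and N samples after a spike by a local mean.

    N = spike_duration * fs, rounded to the nearest integer. The mean is taken
    over mean_window seconds on each side of the replaced region, so the spike
    itself does not contribute to it.
    Returns the new list and the replaced index range (start, stop), stop exclusive.
    """
    data = list(shear)
    n_after = round(spike_duration * fs)   # N
    n_before = n_after // 2                # N/2
    n_mean = round(mean_window * fs)
    start = max(spike_index - n_before, 0)
    stop = min(spike_index + n_after + 1, len(data))
    neighbours = data[max(start - n_mean, 0):start] + data[stop:stop + n_mean]
    if not neighbours:
        return data, (start, start)  # record too short: nothing is replaced
    local_mean = sum(neighbours) / len(neighbours)
    data[start:stop] = [local_mean] * (stop - start)
    return data, (start, stop)
```

For a sampling rate of 512 Hz, for example, this gives N = 20 samples after the spike and 10 before it.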

The iterative application permits longer anomalies to be removed, such as those that might occur because of collisions with jellyfish and seaweed. The fraction of the data that is altered by a de-spiking routine must be noted for each diss-length segment because this is a quality-control metric. Dissipation estimates should be treated with caution if the fraction of altered data exceeds a few percent. However, there is currently no standard for what constitutes an acceptable fraction.
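
One way to compute this metric is sketched below. It assumes a per-sample flag that marks altered samples and non-overlapping segments of equal length; both the function name and the handling of a trailing partial segment (it is ignored) are choices made for this example.

```python
def altered_fraction(altered, segment_length):
    """Fraction of altered samples in each complete, non-overlapping segment.

    altered        sequence of booleans, True where de-spiking changed a sample
    segment_length number of samples in one diss-length segment
    """
    fractions = []
    for start in range(0, len(altered) - segment_length + 1, segment_length):
        segment = altered[start:start + segment_length]
        fractions.append(sum(segment) / segment_length)
    return fractions
```

Segments whose fraction exceeds a few percent can then be flagged for closer inspection.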




This algorithm is often applied iteratively until no more anomalies are detected.

One method consists of calculating the absolute value of the (high-pass filtered) shear. A copy of this absolute shear signal is smoothed with a low-pass filter whose cut-off frequency is approximately the inverse of the expected minimum duration of turbulence patches (this duration is usually about 1 meter divided by the speed of profiling). When the ratio of the instantaneous absolute shear to the smoothed absolute shear exceeds a threshold, the sample is deemed to be a spike. A typical threshold is 8.





