Replacement strategies for missing velocities: Difference between revisions

Latest revision as of 19:10, 5 July 2022


Quality control of raw velocities removes samples, and the resulting gaps usually must be filled before computing the spectra needed to obtain $ε$. The tests described here also investigated how many missing samples can be tolerated while still computing reliable spectra.

Data analysis tests

Techniques considered for replacing the missing samples

Linear interpolation
Using the variance of the signal, which is commonly used by those intending to compute eddy-covariances
Unevenly spaced least-squares Fourier transform (i.e., no replacement at all)
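
The three techniques listed above can be sketched as follows. This is a minimal illustration on synthetic data, not the code used for the tests. The function names are hypothetical, and the form of the variance replacement (random draws with the mean and variance of the good samples) is an assumption, since the page does not spell it out. The unevenly spaced transform is approximated here with the Lomb-Scargle periodogram from SciPy.

```python
import numpy as np
from scipy.signal import lombscargle

def fill_linear(t, u, good):
    """Replace flagged samples by linear interpolation between good neighbours."""
    filled = u.copy()
    filled[~good] = np.interp(t[~good], t[good], u[good])
    return filled

def fill_variance(u, good, rng):
    """ASSUMED form of the variance replacement: random draws with the mean
    and standard deviation of the good samples."""
    filled = u.copy()
    n_bad = int((~good).sum())
    filled[~good] = rng.normal(u[good].mean(), u[good].std(), n_bad)
    return filled

def spectrum_uneven(t, u, good, freqs_hz):
    """Least-squares (Lomb-Scargle) periodogram using only the good samples."""
    detrended = u[good] - u[good].mean()
    # scipy.signal.lombscargle expects angular frequencies (rad/s)
    return lombscargle(t[good], detrended, 2 * np.pi * freqs_hz)

# Synthetic example: 60 s of a 0.5 Hz sinusoid sampled at 8 Hz, with a 1 s gap.
fs = 8.0
t = np.arange(480) / fs
u = np.sin(2 * np.pi * 0.5 * t)
good = np.ones(t.size, dtype=bool)
good[100:108] = False

u_lin = fill_linear(t, u, good)
u_var = fill_variance(u, good, np.random.default_rng(0))
# Stay below the 4 Hz Nyquist frequency, where the periodogram is ill-defined
freqs_hz = np.linspace(0.05, 3.9, 78)
p_uneven = spectrum_uneven(t, u, good, freqs_hz)
```

The first two functions return a gap-free series that can be passed to any evenly spaced spectral estimator, whereas the third returns a periodogram directly and never fills the gap.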

Replacement strategy for tests

These replacement strategies were trialed on one of the cleanest benchmarks (Underice MAVS sampling at 8 Hz), varying the following:

Number of missing samples to identify a threshold where the segment should be completely discarded from further analysis
- 10, 25, and 50% of the 30 min timeseries were removed
Data loss (gap) duration
- 1 sample
- 8 samples (1 s)
- 480 samples (60 s)
- 960 samples (120 s)
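
The test matrix above (three loss fractions by four gap durations) can be reproduced with a small mask generator. The sketch below is one possible way to do it and is not taken from the original tests: `make_gap_mask` is a hypothetical helper that cuts the series into consecutive blocks of the gap length and drops a random subset of them. At 8 Hz, a 30 min series holds 30 × 60 × 8 = 14400 samples.

```python
import numpy as np

def make_gap_mask(n_samples, loss_fraction, gap_len, rng):
    """Return a boolean mask (True = good) with blocks of gap_len samples removed.

    The series is cut into consecutive blocks of gap_len samples and a random
    subset of blocks is dropped. Adjacent dropped blocks merge into a longer
    gap, and for long gaps the realised loss only approximates loss_fraction.
    """
    n_blocks = n_samples // gap_len
    n_drop = int(round(loss_fraction * n_samples / gap_len))
    dropped = rng.choice(n_blocks, size=n_drop, replace=False)
    good = np.ones(n_samples, dtype=bool)
    for b in dropped:
        good[b * gap_len:(b + 1) * gap_len] = False
    return good

fs = 8                      # sampling rate in Hz
n_samples = 30 * 60 * fs    # 30 min at 8 Hz
rng = np.random.default_rng(1)
masks = {(loss, gap): make_gap_mask(n_samples, loss, gap, rng)
         for loss in (0.10, 0.25, 0.50)
         for gap in (1, 8, 480, 960)}
```

Each mask can then be applied to the benchmark series before running a replacement strategy, so that all strategies see identical gaps.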

A 30 min section was chosen because it coincides with the segment length required for deriving an estimate of $ε$.

Test Results

For all tests, linear interpolation recovered the original spectra best, followed by the unevenly spaced techniques. However, when the data loss was intermittent, the unevenly spaced Fourier transforms performed similarly to, if not worse than, the variance replacement. Unsurprisingly, the unevenly spaced techniques fare better when the data loss creates long continuous gaps.
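
One simple way to quantify how well a replacement strategy recovers the original spectrum is the ratio of the two spectra at each frequency. The sketch below is an illustration only: the series is synthetic red noise, `spectral_ratio` is a hypothetical helper, and the Welch settings are arbitrary choices rather than those used for the tests.

```python
import numpy as np
from scipy.signal import welch

def spectral_ratio(u_original, u_filled, fs, nperseg=1024):
    """Ratio of the gap-filled spectrum to the original spectrum per frequency."""
    f, p_orig = welch(u_original, fs=fs, nperseg=nperseg)
    _, p_fill = welch(u_filled, fs=fs, nperseg=nperseg)
    return f[1:], p_fill[1:] / p_orig[1:]   # skip the zero-frequency bin

fs = 8.0
rng = np.random.default_rng(2)
u = np.cumsum(rng.normal(size=14400)) * 0.01   # synthetic red-noise "velocity"
t = np.arange(u.size) / fs

good = np.ones(u.size, dtype=bool)
good[2000:2480] = False                        # a single 60 s gap

# Fill the gap by linear interpolation, then compare spectra
u_filled = u.copy()
u_filled[~good] = np.interp(t[~good], t[good], u[good])
f, ratio = spectral_ratio(u, u_filled, fs)
```

A ratio close to 1 at all frequencies indicates that the filled series reproduces the original spectrum; departures show which frequency bands the gaps have distorted.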

Recommendations

Use linear interpolation, and record the percent of good samples in each segment in the NetCDF Level 3 data.
Reject and flag $ε$ associated with segments with more than 10% data loss.
Record the threshold used for flagging and report it in the NetCDF file at Level 4.
- The rejection threshold may be relaxed if the data loss forms long continuous chunks of several seconds, provided the actual time series has been tested to establish that its spectra can tolerate more missing samples. Recording the threshold used at Level 4 permits others to exclude these data at a later time.
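
The bookkeeping in these recommendations amounts to a few lines per segment. The sketch below assumes a boolean mask of good samples for each segment. The names `percent_good`, `reject_segment` and `MAX_LOSS_PERCENT` are hypothetical and are not ATOMIX NetCDF variable names.

```python
import numpy as np

MAX_LOSS_PERCENT = 10.0   # rejection threshold, to be recorded with the Level 4 data

def percent_good(good):
    """Percentage of good samples in a segment (value to record at Level 3)."""
    return 100.0 * np.count_nonzero(good) / good.size

def reject_segment(good, max_loss_percent=MAX_LOSS_PERCENT):
    """True when more than max_loss_percent of the samples are missing."""
    return (100.0 - percent_good(good)) > max_loss_percent

# Example: a 30 min segment at 8 Hz with exactly 10 % of the samples missing
good = np.ones(14400, dtype=bool)
good[:1440] = False
pg = percent_good(good)        # 90.0
flag = reject_segment(good)    # False: the loss does not exceed 10 %
```

Keeping the threshold as an explicit parameter makes it straightforward to store alongside the flags and to relax it for a data set that has been shown to tolerate longer gaps.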

Return to Preparing quality-controlled velocities

@@ Line 6: / Line 6: @@
 == Data analysis tests ==
-===Considerations===
+===Techniques considered for replacing the missing samples===
-Several techniques were considered for replacing the missing samples:
 * Linear interpolation
 * Using the variance of the signal, which is commonly used by those intending to compute eddy-covariances
 * Unevenly spaced least-square Fourier transform (i.e., no replacement at all)
+[[File:Timeseries_replacement_strategies.png|thumbnail|600px|Example velocity time series from which data were randomly removed in gaps of varying length. Only the 1 min gaps (480 samples at 8 Hz sampling) are illustrated. Removing chunks of 8 continuous samples (1 s) looks identical to the original time series and is not illustrated.]]
+=== Replacement strategy for tests ===
 These replacement strategies were trialed with one of the cleanest benchmarks (Underice MAVS sampling at 8 Hz) for different
 * Number of missing samples to identify a threshold where the segment should be completely discarded from further analysis
-** 10, 25, and 50% of the {{FontColor|fg=white|bg=red|text=XX min}} timeseries was removed
+** 10, 25, and 50% of the 30 min timeseries were removed
 * Data loss (gap) duration
 ** 1 sample
@@ Line 21: / Line 23: @@
 ** 960 samples (120 s)
-===Results===
-{{FontColor|fg=white|bg=red|text=Insert graphs with example spectra for different tests}}
-For all tests, the linear interpolation did the best job in recovering the original spectra, followed by the unevenly spaced techniques. However, the unevenly spaced fourier transforms behaved similarly  (if not worse) than the variance replacement when the data loss was intermittent. Unsurprisingly, unevenly spaced techniques fair better if the data loss form long continuous gaps.
-* For a given data loss (e.g., 10%), the original spectra was easier to recover with longer continuous gaps.
+A total section of 30 min was chosen as this coincides with the required segment length for deriving an estimate of <math>\varepsilon</math>.
+==Test Results==
+For all tests, linear interpolation recovered the original spectra best, followed by the unevenly spaced techniques. However, when the data loss was intermittent, the unevenly spaced Fourier transforms performed similarly to, if not worse than, the variance replacement. Unsurprisingly, the unevenly spaced techniques fare better when the data loss creates long continuous gaps.
+[[File:Spectra_replacement_strategies.png|center|thumbnail|800px|Spectra estimated from the data presented above after using the different replacement strategies: unevenly spaced least-squares Fourier transforms (uneven), the variance of the signal (var), and linear interpolation (interp). The original time series refers to the data before samples were randomly removed. For a given data loss (e.g., 10%), the original spectra were easier to recover with longer continuous gaps (d vs a).]]
 == Recommendations ==
-Always use linear interpolation, and reject epsilon segments with more than 10% data loss. This threshold may be relaxed (20%) if the data loss forms long continuous chunks of several seconds. The person processing the data should record the threshold used and report it in the NetCDF flags.
+* Use linear interpolation, and record the percent of good samples in each segment in the NetCDF [[Level 3 data (velocity point-measurements)| Level 3 data]].
+* Reject and flag <math>\varepsilon</math> associated with segments with more than 10% data loss.
+* Record the threshold used for flagging and report it in the NetCDF file at [[Level 4 data (velocity point-measurements)| Level 4]].
+** The rejection threshold may be relaxed if the data loss forms long continuous chunks of several seconds, after testing the actual time series used to establish if the spectra can tolerate more missing samples. Recording the threshold used at [[Level 4 data (velocity point-measurements)| Level 4]] would permit others to exclude these data at a later time.
+----
+Return to [[Preparing quality-controlled velocities]]
