Replacement strategies for missing velocities
|level=level 2 segmented and quality controlled
}}
Quality-control of raw velocities results in data loss; the missing samples usually must be replaced before computing the spectra necessary for obtaining <math>\varepsilon</math>. The number of missing samples that can be tolerated when computing reliable spectra was also investigated.
== Data analysis tests ==
===Considerations===
Several techniques were considered for replacing the missing samples (each is sketched in the code example after the list):
* Linear interpolation
* Using the variance of the signal, a strategy commonly used when computing eddy covariances
* Unevenly spaced least-squares Fourier transform (i.e., no replacement at all)
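As a rough illustration (not the processing code used for the benchmarks), the three strategies can be sketched in Python as follows; the segment length, gap position, and synthetic velocity record are assumptions chosen only for the example.
<syntaxhighlight lang="python">
# Minimal sketch of the three gap-handling strategies applied to a
# quality-controlled velocity segment `u` sampled at 8 Hz, where `good`
# flags the samples that survived QC. Illustrative only.
import numpy as np
from scipy.signal import welch, lombscargle

fs = 8.0                                    # sampling rate (Hz)
t = np.arange(4800) / fs                    # 10 min segment (assumed length)
u = np.random.default_rng(0).standard_normal(t.size)  # stand-in velocity record
good = np.ones(t.size, dtype=bool)
good[1000:1480] = False                     # one 60 s gap (480 samples)

# 1) Linear interpolation across the gap
u_lin = u.copy()
u_lin[~good] = np.interp(t[~good], t[good], u[good])

# 2) Variance replacement: fill the gap with noise that has the mean and
#    standard deviation of the retained samples
rng = np.random.default_rng(1)
u_var = u.copy()
u_var[~good] = rng.normal(u[good].mean(), u[good].std(), (~good).sum())

# 3) No replacement: unevenly spaced (Lomb-Scargle) periodogram computed
#    on the retained samples only
freqs = np.linspace(0.01, fs / 2, 256)      # frequencies of interest (Hz)
p_ls = lombscargle(t[good], u[good] - u[good].mean(), 2 * np.pi * freqs)

# Evenly spaced spectra for strategies 1 and 2
f_lin, p_lin = welch(u_lin, fs=fs, nperseg=1024)
f_var, p_var = welch(u_var, fs=fs, nperseg=1024)
</syntaxhighlight>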
These replacement strategies were trialed with one of the cleanest benchmarks (Underice MAVS sampling at 8 Hz), varying the following (the gap-insertion procedure is sketched after the list):
* Number of missing samples, to identify a threshold beyond which the segment should be discarded from further analysis
** 10, 25, and 50% of the {{FontColor|fg=white|bg=red|text=XX min}} timeseries were removed
* Data loss (gap) duration
** 1 sample
** 8 samples (1 s)
** 480 samples (60 s)
** 960 samples (120 s)
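One way the synthetic data loss could be imposed is sketched below; the function name, the random placement of gaps, and the example numbers are assumptions for illustration rather than the actual benchmark script.
<syntaxhighlight lang="python">
# Hypothetical helper for degrading a clean benchmark: mark ~`frac` of an
# n-sample series as missing, in contiguous gaps of `gap_len` samples placed
# at random (gaps may overlap, so the realised loss can fall slightly short).
import numpy as np

def impose_gaps(n_samples, frac, gap_len, seed=None):
    rng = np.random.default_rng(seed)
    good = np.ones(n_samples, dtype=bool)
    n_gaps = int(round(frac * n_samples / gap_len))
    for _ in range(n_gaps):
        start = rng.integers(0, n_samples - gap_len)
        good[start:start + gap_len] = False
    return good

# Example: 25% loss as 60 s gaps (480 samples at 8 Hz) in a 30 min record
good = impose_gaps(n_samples=8 * 60 * 30, frac=0.25, gap_len=480, seed=42)
print(f"realised loss: {100 * (~good).mean():.1f}%")
</syntaxhighlight>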
===Results===
{{FontColor|fg=white|bg=red|text=Insert graphs with example spectra for different tests}}
For all tests, linear interpolation did the best job of recovering the original spectra, followed by the unevenly spaced techniques. However, the unevenly spaced Fourier transforms behaved similarly to (if not worse than) the variance replacement when the data loss was intermittent. Unsurprisingly, unevenly spaced techniques fare better when the data loss forms long continuous gaps.
* For a given data loss (e.g., 10%), the original spectrum was easier to recover with longer continuous gaps.
== Recommendations ==
Always use linear interpolation, and reject <math>\varepsilon</math> segments with more than 10% data loss. This threshold may be relaxed (to 20%) if the data loss forms long continuous chunks of several seconds. The person processing the data should record the threshold used and report it in the NetCDF flags.
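A minimal sketch of how these recommendations might be applied during processing is given below; the 2 s definition of a "long" gap, the function names, and the NetCDF attribute used to record the threshold are illustrative assumptions, not a prescribed convention.
<syntaxhighlight lang="python">
# Hedged sketch of the recommended accept/reject logic for a segment,
# given a boolean QC mask `good` (True = sample retained) and sampling rate fs.
import numpy as np

MAX_LOSS = 0.10            # default rejection threshold (fraction of segment)
MAX_LOSS_LONG_GAPS = 0.20  # relaxed threshold for long continuous gaps
LONG_GAP_SECONDS = 2.0     # assumed definition of "long continuous chunks"

def longest_gap_seconds(good, fs):
    """Length of the longest run of missing samples, in seconds."""
    longest = run = 0
    for ok in good:
        run = 0 if ok else run + 1
        longest = max(longest, run)
    return longest / fs

def accept_segment(good, fs):
    """Return (accept, threshold_used); linear interpolation is applied if accepted."""
    loss = 1.0 - float(np.mean(good))
    relaxed = longest_gap_seconds(good, fs) >= LONG_GAP_SECONDS
    threshold = MAX_LOSS_LONG_GAPS if relaxed else MAX_LOSS
    return loss <= threshold, threshold

# accept, threshold = accept_segment(good, fs=8.0)
# Record the threshold that was applied so it can be reported in the NetCDF
# flags, e.g. as a variable attribute with xarray (names are only examples):
# ds["epsilon"].attrs["max_gap_fraction"] = threshold
</syntaxhighlight>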