Segmenting datasets

From Atomix
Revision as of 20:19, 10 July 2022


Once the raw observations have been quality-controlled, then you must split the time series into shorter segments by considering:

  • Time and length scales of turbulence
  • Stationarity of the segment and Taylor's frozen turbulence hypothesis
  • Required statistical significance of the resulting spectra (important if you need to remove motion-induced contamination from the spectra)

Considerations

Measurements are typically collected in the following two ways:

  • continuously, or in bursts so long that they can be considered continuous
  • in short bursts, typically at most 2 to 3 times the largest expected turbulence time scale (e.g., 10 min in ocean environments)

This segmenting step dictates the minimum burst duration when setting up your equipment. The act of chopping a time series into smaller subsets, i.e., segments, is effectively a form of low-pass (box-car) filtering. The length of the segment in time is usually a more important consideration than detrending the time series when estimating [math]\displaystyle{ \varepsilon }[/math] from the inertial subrange of the final spectra.
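The chopping step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_into_segments`, the use of consecutive non-overlapping segments, and the synthetic 6 h record sampled at 8 Hz are hypothetical choices, not a prescribed procedure.

```python
import numpy as np


def split_into_segments(u, fs_hz, segment_s):
    """Split a 1-D record into consecutive, non-overlapping segments.

    Hypothetical helper: trailing samples that do not fill a whole
    segment are dropped. Returns shape (n_segments, samples_per_segment).
    """
    u = np.asarray(u, dtype=float)
    n_per_seg = int(round(segment_s * fs_hz))
    if n_per_seg < 1:
        raise ValueError("segment must contain at least one sample")
    n_seg = u.size // n_per_seg
    return u[: n_seg * n_per_seg].reshape(n_seg, n_per_seg)


# Assumed example: synthetic 6 h record at 8 Hz, cut into 512 s segments
fs_hz = 8.0
record = np.random.default_rng(0).standard_normal(int(6 * 3600 * fs_hz))
segments = split_into_segments(record, fs_hz, 512.0)
print(segments.shape)  # (42, 4096)
```

Each row can then be detrended and passed to the spectral estimate independently. Dropping the incomplete final segment is one possible design choice; padding or overlapping segments are alternatives.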

512 s segment of the measured velocities after applying different detrending methods.
Example velocity spectra of the short 512 s record before and after different detrending techniques were applied to the original 6 h time series. The impact of the detrending method is visible only at the lowest frequencies.


The shorter the segment, the higher the temporal resolution of the final [math]\displaystyle{ \varepsilon }[/math] time series, and the more likely the segment will be stationary. However, the spectrum's lowest resolved frequency and its frequency resolution depend on the duration of the signal used to construct the spectrum. Therefore, the segment must remain long enough that the lowest wavenumbers (frequencies) of the inertial subrange are retained by the spectra. This is particularly important when measurement noise drowns the highest wavenumbers (frequencies) of the inertial subrange. Segments that are too short may thus render the spectra unusable for deriving [math]\displaystyle{ \varepsilon }[/math] from the inertial subrange, because the subrange is no longer resolved.
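As a rough check of this constraint, the lowest non-zero frequency of a spectrum is the inverse of the duration used for each FFT, and Taylor's frozen turbulence hypothesis converts it to a wavenumber through the mean speed past the sensor. The sketch below assumes hypothetical function names and an illustrative mean speed of 1 m/s; the 2048 samples at 64 Hz are the values quoted for Fig. 1.

```python
def lowest_resolved_frequency(n_fft, fs_hz):
    """Lowest non-zero frequency (Hz) of an n_fft-point FFT.

    Equal to 1 / (FFT duration in seconds), which is also the
    frequency resolution of the spectrum.
    """
    return fs_hz / n_fft


def frequency_to_wavenumber_cpm(f_hz, mean_speed_ms):
    """Taylor's frozen turbulence: k [cpm] = f [Hz] / U [m/s]."""
    if mean_speed_ms <= 0:
        raise ValueError("mean speed must be positive")
    return f_hz / mean_speed_ms


# 2048 samples at 64 Hz form a 32 s FFT, so the lowest frequency is 1/32 Hz
f_min = lowest_resolved_frequency(2048, 64.0)
print(f_min)  # 0.03125

# Assumed mean speed of 1 m/s (illustrative only, not from the dataset)
k_min = frequency_to_wavenumber_cpm(f_min, 1.0)
print(k_min)
```

If the low-wavenumber end of the inertial subrange lies below `k_min`, a longer FFT length, and therefore a longer segment, would be needed to resolve it.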

Recommendations

A good rule of thumb for tidally influenced environments is 5 to 15 min segments, but segments may be shorter in certain energetic and fast-moving flows (Fig. 1) and longer in less energetic environments (Fig. 2). Are the peaks in the MAVS data vortex shedding from the rings? Check the onboard motion sensors.

Fig. 1: Example theoretical velocity spectra for different [math]\displaystyle{ \varepsilon }[/math] with the empirical limit [math]\displaystyle{ \hat{k}L_k\sim0.1 }[/math] denoted by the diamonds ([math]\displaystyle{ \hat{k} }[/math] is in rad/m). The inertial subrange extends to smaller wavenumbers [math]\displaystyle{ k }[/math] [cpm] as [math]\displaystyle{ \varepsilon }[/math] increases. The lowest frequency resolved by a spectrum is the inverse of the fft-length used when computing the spectrum. The colored lines are spectral observations from a dataset with fast speeds and large [math]\displaystyle{ \varepsilon }[/math]. In this example, we used relatively short segments (128 s) to estimate the spectra with an fft-length of 32 s (2048 samples @ 64 Hz). The impact of turbulence anisotropy is also visible through the flattening of the spectra around 1 cpm. The secondary x-axis shows the corresponding frequencies for a range of mean speeds past the sensors.
Fig. 2: Same as Fig. 1 but for a different dataset with low speeds and low [math]\displaystyle{ \varepsilon }[/math], requiring the use of relatively long segments (1024 s) to estimate the spectra with an fft-length of 512 s (4096 samples @ 8 Hz).
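
The segment and fft-length pairs quoted in the two captions also set how many FFT blocks are averaged into each spectrum, which bears on its statistical significance. The following sketch counts those blocks. The function name is hypothetical, and the 50% overlap is an assumption made for illustration; the captions do not state which overlap was used.

```python
def fft_blocks_per_segment(segment_s, fft_s, fs_hz, overlap=0.5):
    """Number of FFT blocks that fit in one segment.

    overlap is the fractional overlap between consecutive blocks
    (0.5 is an assumed value, not taken from the source).
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    n_seg = int(round(segment_s * fs_hz))
    n_fft = int(round(fft_s * fs_hz))
    if n_fft > n_seg:
        return 0  # the segment is shorter than a single FFT block
    step = max(1, int(round(n_fft * (1.0 - overlap))))
    return (n_seg - n_fft) // step + 1


# Fig. 1 values: 128 s segments, 32 s fft-length, 64 Hz sampling
print(fft_blocks_per_segment(128.0, 32.0, 64.0))   # 7
# Fig. 2 values: 1024 s segments, 512 s fft-length, 8 Hz sampling
print(fft_blocks_per_segment(1024.0, 512.0, 8.0))  # 3
```

Without overlap the counts drop to 4 and 2, which illustrates the trade-off between resolving low frequencies (long fft-length) and averaging enough blocks within a segment.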

Return to Preparing_quality-controlled_velocities