Talk:Level 3 data (velocity profilers)

From Atomix
Revision as of 20:36, 21 March 2022 by Jmmcmillan (talk | contribs)

Brian scannell (talk) 15:11, 30 December 2021 (CET) re. DLL_FLAGS - we have both the calculated DLL and the associated qaqc flags at the same level. It makes me wonder whether it would be better to have the R_VEL_FLAGS at level 1. They are currently at level 2, which means that either the level 1 data has to be duplicated at level 2 or the flags sit separately from the associated data. Why not have level 1 being the raw data with qaqc flags and level 2 being the data rearranged into segments (which may be the original bursts) with appropriate pre-processing (detrending) ready for the DLL calculation? It would seem cleaner and more consistent.

This would also allow for the possibility of separate qaqc flags to be defined at level 2 e.g. outlier detection based on the segmented data.

CynthiaBluteau (talk) 16:39, 17 January 2022 (CET) The flags were changed.

Brian scannell (talk) 16:47, 29 December 2021 (CET) re. TIME dimension comments - the requirement to define time bounds for each segment looks rather complex and I’m not sure that it adds anything. Presumably the requirement to specify bounds will not be mandatory?

Having introduced N_SEGMENT as a dimension at level 2, with TIME as a variable, we are now reverting to TIME as the dimension with N_SEGMENT as the variable. Given that TIME is now derived as the mean time for the observations in the segment, wouldn’t it be more appropriate to keep it as the variable?

CynthiaBluteau (talk) 16:39, 17 January 2022 (CET) The time bounds follow the convention for CF-compliant datasets. Personally, I think TIME should always be a dimension as per CF standards, not a variable. It was changed at Level 2 because people preferred not calculating TIME at Level 2 (violating CF standards), but NetCDF GUIs can handle time (centered) variables quite nicely.
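
For reference, a minimal sketch of what a segment-centred TIME with bounds could look like. The profile times, the fixed number of profiles per segment, and the choice of first and last profile time as the bounds are all assumptions made for illustration; they are not taken from the format specification.

```python
import numpy as np

# Hypothetical profile times (seconds since an arbitrary epoch), 12 profiles.
profile_time = np.arange(12, dtype=float) * 0.5
n_per_segment = 4  # assumed fixed number of profiles per segment

# Rearrange into segments: shape (N_SEGMENT, N_SAMPLE).
segments = profile_time.reshape(-1, n_per_segment)

# TIME is the mean time of the observations in each segment.
TIME = segments.mean(axis=1)

# Bounds taken here as first and last profile time in each segment: (N_SEGMENT, 2).
TIME_BNDS = np.stack([segments[:, 0], segments[:, -1]], axis=1)

# N_SEGMENT is then just a running index alongside TIME.
N_SEGMENT = np.arange(1, TIME.size + 1)
```

With TIME as the dimension, N_SEGMENT carries no extra information beyond the position along TIME.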

Brian scannell (talk) 13:01, 30 December 2021 (CET) re. R_DEL / R_DEL5 dimension comments - R_DEL should be calculated as a function of R_DIST, which itself is a function of bin size and theta, but having defined R_DIST, it should now be the basis on which R_DEL is calculated. So for example (assuming Matlab indexing), for a central difference scheme evaluated at bin 10 i.e. R_DIST(10), the two-bin separation R_DEL(2) = R_DIST(11) - R_DIST(9), whereas for a forward difference scheme evaluated at bin 10, R_DEL(2) would be R_DIST(12) - R_DIST(10). The R_DEL(2) values will be identical, but the principle is that R_DEL is the separation distance between the velocity observations being compared.

Also note that the R_DEL units should be specified (in meters).
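
The central and forward difference cases above can be sketched as follows. The bin size, the distance to the first bin, and the helper names `r_del_central` and `r_del_forward` are hypothetical; bin numbers are 1-based to match the Matlab indexing used in the comment.

```python
import numpy as np

bin_size = 0.5                            # hypothetical along-beam bin size (m)
R_DIST = 1.0 + bin_size * np.arange(20)   # along-beam distance to each bin centre (m)

def r_del_central(r_dist, bin_number, n_sep):
    """Separation (m) for a central difference at a 1-based bin number.

    n_sep is the total number of bins of separation and must be even.
    """
    half = n_sep // 2
    i = bin_number - 1                    # convert 1-based bin number to 0-based index
    return r_dist[i + half] - r_dist[i - half]

def r_del_forward(r_dist, bin_number, n_sep):
    """Separation (m) for a forward difference at a 1-based bin number."""
    i = bin_number - 1
    return r_dist[i + n_sep] - r_dist[i]

# Two-bin separation evaluated at bin 10:
central = r_del_central(R_DIST, 10, 2)    # R_DIST(11) - R_DIST(9)
forward = r_del_forward(R_DIST, 10, 2)    # R_DIST(12) - R_DIST(10)
```

For evenly spaced bins both schemes give the same separation; only the bins being compared differ.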

Brian scannell (talk) 15:00, 30 December 2021 (CET) re. DLL_N comment - suggest reword as “number of instances when the velocity difference is evaluated, maximum is [number of profiles in segment - either max(N_SAMPLE) or possibly segment_length if redefined as number of profiles rather than time duration]”

CynthiaBluteau (talk) 16:45, 17 January 2022 (CET) The comments in these tables are for the wiki (teaching purposes). The highlighted text in pink takes users to the up-to-date attribute tables. "number_of_observations" is a standard modifier in the CF standards and should not be changed. It means the number of usable (good) samples in that "chunk" of data, i.e., the segment.
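
To make the meaning of DLL_N concrete, here is a minimal sketch assuming that samples rejected by qaqc are set to NaN before the velocity differences are formed. The synthetic data, the segment size, and the helper `dll_forward` are hypothetical, and the bin index is 0-based.

```python
import numpy as np

rng = np.random.default_rng(0)
n_profiles, n_bins = 100, 12                      # hypothetical segment size
vel = rng.standard_normal((n_profiles, n_bins))   # synthetic detrended along-beam velocity

# Samples rejected by qaqc are assumed to be replaced by NaN.
vel[5, 3] = np.nan
vel[40:45, 4] = np.nan

def dll_forward(vel, bin_index, n_sep):
    """Mean squared velocity difference and the number of good differences."""
    dv = vel[:, bin_index + n_sep] - vel[:, bin_index]
    good = np.isfinite(dv)                        # a difference is usable only if both samples are good
    return np.mean(dv[good] ** 2), int(good.sum())

dll, dll_n = dll_forward(vel, bin_index=3, n_sep=1)
```

Here six of the 100 profiles have a bad sample in one of the two bins, so the count of usable differences is 94, with the number of profiles in the segment as the upper limit.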

Brian scannell (talk) 15:23, 17 January 2022 (CET) For levels 1 and 2 we have moved from R_DIST being a dimension to Z_DIST so that data for both angled and vertical beams can be combined in the R_VEL array. This requires that the profile times are the same for all beams, but leaves open the possibility of the vertical bin size being different between the angled and vertical beams - hence the dimension of BIN_SIZE at level 1 is N_BEAM. However, if we allow for that possibility, then we run into problems using R_DEL as a dimension at level 3 - since the values will differ between the beams and, as I understand it, we can’t have a dimension R_DEL whose own values have dimensions N_BEAM and R_DEL.

If we constrain the flexibility such that the vertical beam data can only be combined with the angled beam data if both the sampling times and the vertical bin size are the same for both, then at level 3 we can define a dimension Z_DEL, which is the vertical separation distance between bins at which the mean of the squared velocity difference is calculated - this would be the same for both vertical and angled beams. It would have dimension Z_DEL. Then R_DEL would be a derived variable with dimensions N_BEAM and Z_DEL calculated as a function of THETA and Z_DEL.

If either the sampling times or the vertical bin size differed between the angled and vertical beams, the user would have to prepare separate data files for the two.

Brian scannell (talk) 10:26, 18 January 2022 (CET) Alternatively, we could specify the “R_DEL” dimension in “bin separations” i.e. integer values - although this might be better named as N_SEPARATION or something. R_DEL would then be a derived variable calculated from BIN_SIZE (dimension N_BEAM), THETA (dimension N_BEAM) and N_SEPARATION (dimension N_SEPARATION). Whilst this would allow for differences in BIN_SIZE, it would still be problematic if there were significant differences in the number of bins between the beams, and sampling times still need to be the same for all beams.
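
A minimal sketch of R_DEL as a derived variable with dimensions N_BEAM and N_SEPARATION. It assumes that BIN_SIZE is the vertical bin size, that THETA is the beam angle from the vertical in degrees, and that the fifth beam is vertical with a smaller bin size; all values are hypothetical.

```python
import numpy as np

BIN_SIZE = np.array([0.5, 0.5, 0.5, 0.5, 0.25])   # vertical bin size per beam (m), assumed
THETA = np.array([25.0, 25.0, 25.0, 25.0, 0.0])   # beam angle from vertical (degrees), assumed
N_SEPARATION = np.arange(1, 7)                    # integer bin separations

# Along-beam length of one bin for each beam.
along_beam_bin = BIN_SIZE / np.cos(np.deg2rad(THETA))

# R_DEL has dimensions (N_BEAM, N_SEPARATION) and units of metres.
R_DEL = np.outer(along_beam_bin, N_SEPARATION)
```

If BIN_SIZE is the same for all beams, the product of BIN_SIZE and N_SEPARATION is a single vector of vertical separations, which corresponds to the Z_DEL option described above.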

Example of an SF fit using JMM's method with all possible combinations of bin separations leading to non-unique R_DEL values (Note: r on the x-axis is R_DEL)



Jmmcmillan (talk) 23:18, 14 February 2022 (CET) I agree with Brian that we need a different dimension at level 3. I prefer using something like N_DEL or N_SEPARATION to specify the number of bins of separation as opposed to Z_DEL because it makes more logical sense to work along the beams for level 3. In my specific implementation of the SF, my dimension R_DEL is actually non-unique (I take all possible combinations of separation distances within my turbulence 'blob'). For now, I've chosen to use N_R_DEL as my dimension which is a vector of integers from 1 to 28, where 28 is the maximum number of points I can have in my regression based on my choice of R_MAX. Then R_DEL is defined as a variable instead of a dimension. But, if it is possible to have non-unique values as a dimension then N_DEL could also work for me. I will do some testing to confirm.
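
A minimal sketch of why taking all pair combinations gives non-unique R_DEL values, and why an integer index works as the dimension. The window of 8 bins and the bin size are hypothetical; 8 bins were chosen here only because they give 28 pairs, which is not necessarily how the limit of 28 arises from R_MAX in the actual implementation.

```python
import numpy as np
from itertools import combinations

bin_size = 0.5                          # hypothetical along-beam bin size (m)
n_bins_in_window = 8                    # hypothetical window; 8 * 7 / 2 = 28 pairs
r_dist = bin_size * np.arange(n_bins_in_window)

# Every pair of bins (i < j) in the window contributes one separation distance.
pairs = list(combinations(range(n_bins_in_window), 2))
R_DEL = np.array([r_dist[j] - r_dist[i] for i, j in pairs])

# Integer index used as the dimension; R_DEL is then a variable along it.
index_dim = np.arange(1, R_DEL.size + 1)

n_unique = np.unique(R_DEL).size        # far fewer unique separations than pairs
```

In this sketch the 28 pairs share only 7 distinct separation distances, so R_DEL cannot act as a coordinate with unique, monotonic values, whereas the integer index can.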

Jmmcmillan (talk) 21:23, 21 March 2022 (CET) I've updated the dimension to be N_DEL instead of R_DEL on the wiki page.


Jmmcmillan (talk) 21:35, 21 March 2022 (CET) In the interest of reducing the number of required variables, we could remove:

  • R_DIST: can be calculated from Z_DIST and THETA, and isn't actually used for any future analysis.
  • N_SEGMENT: will always be a series of integers from 1 to length(TIME). I don't think this dimension from Level 2 needs to be repeated here.
  • BURST_NUMBER: Not required for continuous sampling data sets
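
For illustration, a minimal sketch of how the first two variables could be recovered on demand if they were dropped. It assumes THETA is the beam angle from the vertical in degrees and that Z_DIST is shared by all beams; the values are hypothetical.

```python
import numpy as np

Z_DIST = 0.5 * np.arange(1, 11)                   # vertical distance to bin centres (m), assumed
THETA = np.array([25.0, 25.0, 25.0, 25.0, 0.0])   # beam angle from vertical (degrees), assumed
TIME = np.array([0.75, 2.75, 4.75])               # segment-centred times, hypothetical

# R_DIST per beam, shape (N_BEAM, Z_DIST): along-beam distance for each vertical distance.
R_DIST = Z_DIST[np.newaxis, :] / np.cos(np.deg2rad(THETA))[:, np.newaxis]

# N_SEGMENT is simply the integers from 1 to length(TIME).
N_SEGMENT = np.arange(1, TIME.size + 1)
```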