Talk:Level 2 data (velocity profilers)

From Atomix

Brian scannell (talk) 13:57, 29 December 2021 (CET) I don’t think the definitions of the two level 2 levels work. With the current definitions, the detrended velocity is a variable under the optional segmented data level. But in practice the data is always segmented and (almost always) detrended; it is just that the “segment” is often the “burst”, which is why I think having both levels complicates matters.

I think we want to achieve two things. Firstly, we want to have a record of the qaqc of the level 1 velocity data. Secondly, we want to see the data structured and pre-processed as the input to the DLL calculation.

At level 1 we have dimensions TIME, R_DIST and N_BEAM with the key variable R_VEL and I’m proposing the index N_PROF (dimension TIME). The qaqc data is then R_VEL_FLAGS (dimensions TIME, R_DIST, N_BEAM). I’m not sure it is necessary to replicate the level 1 R_VEL variable itself at level 2 - surely that just makes the large data files even larger?

The dimensions for the segmented data are now N_SEGMENT (integer count of segment number), N_SAMPLE (integer count of the profiles used in each segment), plus the existing R_DIST and N_BEAM. I would suggest that we use the variable N_PROF with dimension (N_SEGMENT, N_SAMPLE) to show the unique number of the profiles as used in the segmented data. I’m not sure whether we should be using the same variable name, but the principle is that you could select any segment and read the profile numbers used for each sample and be able to relate this directly to the level 1 R_VEL with the level 2 R_VEL_FLAGS qaqc criteria applied.

If the data was originally collected in bursts of 300 profiles, then N_PROF would simply increment from 1 to 300 for segment 1, 301 to 600 for segment 2 etc. If, instead, the data was collected continuously and the chosen segment length was 300 with a 50% overlap, N_PROF would be 1 to 300 for segment 1, 151 to 450 for segment 2, 301 to 600 for segment 3 etc.
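
To make the two cases concrete, here is a minimal sketch of how the N_PROF index could be built. The function name build_n_prof and its arguments are only illustrative (they are not part of any agreed format), and it assumes 1-based profile numbers and drops any incomplete trailing segment.

```python
import numpy as np

def build_n_prof(n_profiles, segment_length, overlap=0.0):
    """Return N_PROF with dimensions (N_SEGMENT, N_SAMPLE), 1-based.

    overlap is the fraction of a segment shared with the next one
    (0.0 for burst data, 0.5 for a 50% overlap).
    """
    step = int(round(segment_length * (1.0 - overlap)))
    if step < 1:
        raise ValueError("overlap must leave a step of at least one profile")
    # First profile number of every complete segment
    starts = np.arange(1, n_profiles - segment_length + 2, step)
    # Each row holds the profile numbers used for one segment
    return starts[:, None] + np.arange(segment_length)[None, :]

burst = build_n_prof(900, 300)            # rows: 1-300, 301-600, 601-900
overlapped = build_n_prof(900, 300, 0.5)  # rows: 1-300, 151-450, 301-600, ...
```

Selecting row k then gives the profile numbers for segment k+1, and subtracting 1 gives indices into the level 1 TIME dimension.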

If we weren’t duplicating the level 1 R_VEL, we could define the level 2 R_VEL as having dimensions (N_SEGMENT, N_SAMPLE, R_DIST, N_BEAM) with the qaqc flags and any detrending / preprocessing applied.
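
As a rough illustration of what that level 2 R_VEL could look like, the sketch below gathers level 1 profiles into segments via N_PROF, sets flagged samples to NaN and removes a linear trend along N_SAMPLE. The function name is hypothetical, the "any non-zero flag is bad" rule follows the 0=good convention discussed on this page, and a least-squares linear fit is just one possible choice of detrending.

```python
import numpy as np

def segment_and_detrend(r_vel, r_vel_flags, n_prof):
    """Build a level 2 style R_VEL from level 1 arrays.

    r_vel, r_vel_flags : (TIME, R_DIST, N_BEAM)
    n_prof             : (N_SEGMENT, N_SAMPLE), 1-based profile numbers
    returns            : (N_SEGMENT, N_SAMPLE, R_DIST, N_BEAM)
    """
    idx = np.asarray(n_prof) - 1                 # 0-based indices into TIME
    seg = np.asarray(r_vel)[idx].astype(float)   # gather profiles per segment
    seg[np.asarray(r_vel_flags)[idx] > 0] = np.nan  # apply qaqc flags

    # Sample index along N_SAMPLE, set to NaN wherever the velocity is NaN
    t = np.arange(seg.shape[1], dtype=float)[None, :, None, None]
    t = np.where(np.isnan(seg), np.nan, t)

    # NaN-aware least-squares line fit for each segment, bin and beam
    t_anom = t - np.nanmean(t, axis=1, keepdims=True)
    v_anom = seg - np.nanmean(seg, axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        slope = (np.nansum(t_anom * v_anom, axis=1, keepdims=True)
                 / np.nansum(t_anom ** 2, axis=1, keepdims=True))
    return v_anom - slope * t_anom
```

Series with fewer than two valid samples come back as NaN (NumPy will also warn about empty slices in that case), so a real implementation would probably want a minimum-valid-samples check as well.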

TIME would now be a variable, either with dimensions (N_SEGMENT, N_SAMPLE) containing the individual profile timestamps (effectively replicating N_PROF) as suggested, or with dimension (N_SEGMENT) containing the segment mean time.

I think this approach simplifies things as well as providing an audit trail.



Responses by CynthiaBluteau (talk) 16:24, 9 January 2022 (CET)

  • Lots to ponder, but I agree that Level2_qaqc could be removed entirely. Level1 should store velocities (R_VEL in its rawest form, i.e., no data removed), along with the R_VEL_FLAGS. A user can then grab R_VEL and R_VEL_FLAGS, find all R_VEL_FLAGS>0 (since 0=good) and set those indices to NaN. R_VEL_FLAGS doesn't hold velocities but numbers between 0 and 255 denoting an 8-bit boolean flag. It's very tedious to re-write the data 2x, and people will want to compare the flags (e.g., low corr) against, say, a plot of the correlation. Forcing people to load data between groups should be avoided.
  • Dimensions
    • I would remove R_DIST from Level 1-2 and replace it with Z_DIST. R_DIST should appear as a calculated variable, using THETA=[NBEAMS array] and Z_DIST, at Level3 where it's used (R_DIST would be a matrix of N_beams * Z_dist). This change enables storing all 5 beams in the same matrix and avoids the very complex (very tedious/difficult) task of writing a NETCDF file where each variable needs to be manually assigned its dimensions, since there would otherwise be CORRV, ABSCV, R_VELV, R_VEL_FLAGSV with ever so slightly different dimensions than CORR, ABSC etc. That is avoided when all beams have the same Z reference (z being the instrument's z). CF compliance dictates that time, lat, long and z (or geographical) dimension variables should be given priority.
  • N_PROF - I would call it N_PROFILE but yes agree with what you've proposed.
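
A minimal sketch of the points above, for illustration only. The function names are hypothetical, the meaning of each flag bit is not defined on this page (so the bit number is left to the caller), and the R_DIST calculation assumes the simple geometry R_DIST = Z_DIST / cos(THETA) with THETA in degrees from the instrument's z axis.

```python
import numpy as np

def apply_flags(r_vel, r_vel_flags):
    """Return a copy of R_VEL with every flagged sample (flag > 0) set to NaN."""
    masked = np.array(r_vel, dtype=float)          # copy, level 1 data untouched
    masked[np.asarray(r_vel_flags) > 0] = np.nan   # 0 = good
    return masked

def flag_bit_set(r_vel_flags, bit):
    """True where the given bit (0-7) of the 8-bit flag is set."""
    flags = np.asarray(r_vel_flags, dtype=np.uint8)
    return ((flags >> bit) & 1).astype(bool)

def r_dist_from_z(z_dist, theta_deg):
    """Along-beam distance with dimensions (Z_DIST, N_BEAM).

    Assumes R_DIST = Z_DIST / cos(THETA); a vertical beam has THETA = 0.
    """
    z = np.asarray(z_dist, dtype=float)[:, None]
    theta = np.deg2rad(np.asarray(theta_deg, dtype=float))[None, :]
    return z / np.cos(theta)
```

With this layout a user only needs R_VEL and R_VEL_FLAGS from the same group, and can pick out a single criterion (e.g. whichever bit is assigned to low correlation) with flag_bit_set for comparison against CORR.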

Brian scannell (talk) 11:57, 17 January 2022 (CET) Alternative text for comments field:

  • N_SEGMENT array of 1 to number of data segments for analysis
  • N_SAMPLE array of 1 to number of samples in each segment

Also:

  • all instances of R_DIST dimension should now be Z_DIST
  • BIN_SIZE should have dimension N_BEAM as per level 1
  • not sure we need R_VEL repeated at this level - surely the R_VEL_DETRENDED is sufficient?
  • shouldn’t we have R_VEL_DETRENDED_FLAGS for any qaqc checks applied at this stage?
  • suggest TIME has dimension N_SEGMENT and is the mean of the level 1 TIME values for the individual profiles - the information detailing which profiles are included in each segment being provided by PROFILE_NUMBER (dimensions N_SEGMENT and N_SAMPLE) - this will then be the TIME dimension for levels 3 and 4 (although I’m unsure whether the bounds attribute can be set for a variable).
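
For the last point, a rough sketch of how the segment TIME could be derived from the level 1 TIME and PROFILE_NUMBER. The function name is illustrative, PROFILE_NUMBER is assumed to be 1-based, and the second return value is simply the first and last timestamp of each segment laid out as (N_SEGMENT, 2), which is the shape a CF-style bounds variable would take if one turns out to be usable here.

```python
import numpy as np

def segment_time(time_l1, profile_number):
    """Mean time and start/end times for each segment.

    time_l1        : (TIME,) level 1 timestamps as numbers (e.g. seconds)
    profile_number : (N_SEGMENT, N_SAMPLE), 1-based profile numbers
    returns        : time (N_SEGMENT,), time_bounds (N_SEGMENT, 2)
    """
    idx = np.asarray(profile_number) - 1
    t = np.asarray(time_l1, dtype=float)[idx]   # (N_SEGMENT, N_SAMPLE)
    time_mean = t.mean(axis=1)
    time_bounds = np.stack([t.min(axis=1), t.max(axis=1)], axis=1)
    return time_mean, time_bounds
```

Because the profiles in each segment are looked up through PROFILE_NUMBER, the same function works for burst data and for overlapping segments.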