Talk:Level 2 data (velocity profilers)
Brian scannell (talk) 13:57, 29 December 2021 (CET) I don't think the definitions of the two levels within level 2 work. With the current definitions, the detrended velocity is a variable under the optional segmented data level. But in practice the data is always segmented and (almost always) detrended; it is just that the "segment" is often the "burst", which is why I think having both complicates matters.
I think we want to achieve two things. Firstly, we want to have a record of the qaqc of the level 1 velocity data. Secondly, we want to see the data structured and pre-processed as the input to the DLL calculation.
At level 1 we have dimensions TIME, R_DIST and N_BEAM, with the key variable R_VEL, and I'm proposing the index N_PROF (dimension TIME). The qaqc data is then R_VEL_FLAGS (dimensions TIME, R_DIST, N_BEAM). I'm not sure it is necessary to replicate the level 1 R_VEL variable itself at level 2; surely that just makes the already large data files even larger?
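For illustration, a minimal sketch of what that level 1 structure could look like as a netCDF file, written here with the Python netCDF4 package (the file name and the dimension sizes are placeholders, not part of the proposal):

<syntaxhighlight lang="python">
# Minimal level 1 layout as discussed above; sizes are illustrative only.
import numpy as np
from netCDF4 import Dataset

n_time, n_dist, n_beam = 3000, 25, 4  # example sizes, not prescribed

ds = Dataset("level1_example.nc", "w")
ds.createDimension("TIME", n_time)
ds.createDimension("R_DIST", n_dist)
ds.createDimension("N_BEAM", n_beam)

r_vel = ds.createVariable("R_VEL", "f4", ("TIME", "R_DIST", "N_BEAM"))
r_vel_flags = ds.createVariable("R_VEL_FLAGS", "i1", ("TIME", "R_DIST", "N_BEAM"))
n_prof = ds.createVariable("N_PROF", "i4", ("TIME",))  # proposed profile index

n_prof[:] = np.arange(1, n_time + 1)  # unique profile number for each time step
ds.close()
</syntaxhighlight>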
The dimensions for the segmented data are now N_SEGMENT (integer count of the segment number), N_SAMPLE (integer count of the profiles used in each segment), plus the existing R_DIST and N_BEAM. I would suggest that we use the variable N_PROF with dimensions (N_SEGMENT, N_SAMPLE) to record the unique profile numbers used in the segmented data. I'm not sure whether we should be using the same variable name, but the principle is that you could select any segment, read the profile numbers used for each sample, and relate these directly to the level 1 R_VEL with the level 2 R_VEL_FLAGS qaqc criteria applied.
If the data were originally collected in bursts of 300 profiles, then N_PROF would simply run from 1 to 300 for segment 1, 301 to 600 for segment 2, etc. If instead the data were collected continuously and the chosen segment length was 300 with a 50% overlap, N_PROF would be 1 to 300 for segment 1, 151 to 450 for segment 2, 301 to 600 for segment 3, etc.
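To make the two numbering schemes concrete, here is a small Python sketch that builds the proposed (N_SEGMENT, N_SAMPLE) N_PROF array for both cases (the helper name and parameters are mine, purely for illustration):

<syntaxhighlight lang="python">
import numpy as np

def n_prof_for_segments(n_profiles, seg_len, overlap=0.0):
    """Build the (N_SEGMENT, N_SAMPLE) array of 1-based profile numbers.

    overlap is the fraction of each segment shared with the next:
    0.0 for back-to-back bursts, 0.5 for 50% overlap of continuous data.
    """
    step = int(round(seg_len * (1.0 - overlap)))
    starts = np.arange(1, n_profiles - seg_len + 2, step)  # first profile of each segment
    return starts[:, None] + np.arange(seg_len)[None, :]

# Burst data, 300-profile bursts: segments are 1-300, 301-600, 601-900, ...
print(n_prof_for_segments(900, 300)[:, [0, -1]])
# Continuous data, 300-profile segments, 50% overlap: 1-300, 151-450, 301-600, ...
print(n_prof_for_segments(900, 300, overlap=0.5)[:, [0, -1]])
</syntaxhighlight>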
If we weren’t duplicating the level 1 R_VEL, we could define the level 2 R_VEL as having dimensions (N_SEGMENT, N_SAMPLE, R_DIST, N_BEAM) with the qaqc flags and any detrending / preprocessing applied.
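As a sketch of how that level 2 R_VEL could be assembled from the level 1 data (the function name and the flag convention are assumptions for illustration, and I have used simple mean removal as a stand-in for whatever detrending we agree on):

<syntaxhighlight lang="python">
import numpy as np

def build_level2_r_vel(r_vel_l1, flags_l1, n_prof, good_flag=0):
    """Assemble level 2 R_VEL with dimensions (N_SEGMENT, N_SAMPLE, R_DIST, N_BEAM).

    r_vel_l1, flags_l1 : level 1 R_VEL and R_VEL_FLAGS, shape (TIME, R_DIST, N_BEAM)
    n_prof             : 1-based profile numbers per segment, shape (N_SEGMENT, N_SAMPLE)
    good_flag          : flag value taken to mean good data (convention assumed here)
    """
    idx = n_prof - 1                          # back to 0-based TIME indices
    seg_vel = r_vel_l1[idx].astype(float)     # gather -> (N_SEGMENT, N_SAMPLE, R_DIST, N_BEAM)
    seg_flags = flags_l1[idx]
    seg_vel[seg_flags != good_flag] = np.nan  # apply the qaqc criteria
    # Remove the per-segment mean along N_SAMPLE as a simple stand-in for detrending
    seg_vel -= np.nanmean(seg_vel, axis=1, keepdims=True)
    return seg_vel
</syntaxhighlight>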
TIME would now be a variable with either dimensions (N_SEGMENT, N_SAMPLE), containing the individual profile timestamps (effectively replicating N_PROF) as suggested, or dimension (N_SEGMENT), containing the segment mean time.
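Either option could be derived directly from the level 1 TIME values and the N_PROF array, for example (a sketch, assuming TIME is stored numerically, e.g. seconds since a reference epoch):

<syntaxhighlight lang="python">
import numpy as np

def segment_times(time_l1, n_prof):
    """Return both forms of the level 2 TIME variable discussed above.

    time_l1 : level 1 TIME values, shape (TIME,), numeric (e.g. seconds since an epoch)
    n_prof  : 1-based profile numbers per segment, shape (N_SEGMENT, N_SAMPLE)
    """
    per_sample = time_l1[n_prof - 1]       # (N_SEGMENT, N_SAMPLE): individual timestamps
    per_segment = per_sample.mean(axis=1)  # (N_SEGMENT,): segment mean time
    return per_sample, per_segment
</syntaxhighlight>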
I think this approach simplifies things as well as providing an audit trail.