The document discusses improving spatial data quality through data conflation and assessing the accuracy of interactive data quality interpretation for remote sensing data. It describes using data usability as a new quality measure that is interactively interpreted based on technical errors, unusable image segments, clouds, and other factors. The document also outlines equations and graphs for assessing data quality, the evaluation of subjectivity, comparing interpretations to standard deviations and metadata, and generating quick-look products with cloud masks for monitoring processing status.
Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Data Quality Interpretation
5. Derivation of test data: receiving circle of the Neustrelitz ground station; spatial distribution of the Landsat 7/ETM+ database
8. Equations: assessment of evaluation subjectivity. Minimum sample size: acceptable error 0.05 (significance level of 5 %); z-value of the standard normal distribution 1.96; P: population (11,828 quadrants); P(1-P): maximum value 0.25; result n ≥ 384 (400 quadrants used). δi: absolute error. SD_DU: standard deviation.
13. Operational aspects: extended quick-look product with cloud mask, operator vote and automaton vote
15. Thank you for your attention! Questions, comments, feedback? [email_address] [email_address] ICCSA 2011 | GEOG-AN-MOD 2011 | University of Santander | 20-23/06/2011
Editor's notes
The increased availability of remote sensing data is accompanied by a growing demand for additional information. Metadata facilitate the search for appropriate data within the remote sensing databases of a variety of providers (e.g. European Space Agency, ESA). Typical metadata are geographical coordinates, acquisition time, calibration parameters, and data quality.
The ground stations Kiruna (Sweden), Fucino (Italy), and Neustrelitz (Germany) constitute the ESA Earthnet for the LANDSAT 7 / ETM+ satellite. The stations are compatibly equipped. The national ground segment for LANDSAT 7 in Neustrelitz is taken as representative of all stations of the ESA network. Cloud cover degree is an accepted indicator for quantifying data quality, but this simple criterion is inadequate as a quality measure: the quality of optical remote sensing data depends on both the number of cloud pixels and their distribution within a scene. Taking this into account, data quality assessment within the ESA LANDSAT 7 ground segment was performed interactively by interpreters. Such data quality assessment is labour-intensive and subjective.
The investigations presented are based on the evaluation of 2,957 quick-look data sets (11,828 quadrants) and the corresponding metadata in the period from 13.03.2002 to 15.01.2003. The ground resolution of quick-look data is usually strongly reduced compared with that of the original data. The figure shows the spatial distribution of the LANDSAT 7 / ETM+ database. The colour code in the figure represents the number of available scenes for the marked geographical location of the scene (path/row). Definitions: quick-look data are preview images derived from the original remote sensing data; a quadrant is an evaluation unit of a remote sensing data scene.
The figure illustrates the influence of cloud distribution on data usability for a cloud cover degree of 25 percent. Top: concentrated clouds. Bottom: distributed clouds.
Sample quick-look product for interactive visual data usability assessment by interpreters (the quadrants are numbered 1 to 4).
A stratified sample was collected from the population of 11,828 quadrants. A stratified sample is appropriate when the population exhibits strong differences. A representative sample size is necessary to characterise the population; the minimum sample size calculated with eq. 1 is n ≥ 384 quadrants. The test was performed by 6 interpreters in 3 test series on 400 quadrants, giving 18 usability values for each quadrant. Their arithmetic average is denoted as the mean data usability (see eq. 2). Assuming that the mean data usability delivers an objective assessment, the subjectivity of an individual interpreter can be quantified: the absolute error δi between the mean data usability and the assessment mi of an individual interpreter is computed with eq. 3. Besides the maximum of the absolute error δi, its distribution over all data usability classes is of particular interest. The variability over all 18 assessment results can be described by the standard deviation (eq. 4).
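The quantities described in these notes can be sketched in a few lines of Python. This is a hedged illustration, not the authors' implementation: the slide equations did not survive extraction, so the sample-size formula n ≥ z² · P(1-P) / e² (without finite-population correction) is an assumption chosen because it reproduces the reported n ≥ 384. The sign convention δi = mi − mean, the use of the sample standard deviation (n−1), and the example votes are likewise assumptions.

```python
import math
import statistics


def min_sample_size(z, p_times_q, acceptable_error):
    """Assumed form of eq. 1: n >= z^2 * P(1-P) / e^2 (no finite-population correction)."""
    return z ** 2 * p_times_q / acceptable_error ** 2


def quadrant_statistics(votes):
    """Mean data usability, signed errors and spread for one quadrant.

    votes: the usability values assigned to the quadrant
           (18 in the study: 6 interpreters x 3 test series).
    """
    mean_du = statistics.fmean(votes)          # eq. 2: arithmetic average
    deltas = [m - mean_du for m in votes]      # eq. 3, ASSUMED sign: delta_i = m_i - mean
    sd_du = statistics.stdev(votes)            # eq. 4, ASSUMED sample SD (n - 1)
    return mean_du, deltas, sd_du


# Values taken from the slide: z = 1.96, max P(1-P) = 0.25, acceptable error = 0.05
n_min = min_sample_size(1.96, 0.25, 0.05)
print(f"minimum sample size: {n_min:.2f}")     # about 384.16

# Hypothetical votes for one quadrant (not from the study)
example_votes = [50] * 9 + [70] * 9
mean_du, deltas, sd_du = quadrant_statistics(example_votes)
print(mean_du, max(abs(d) for d in deltas), round(sd_du, 2))
```

With the slide values the formula gives roughly 384.16, which is consistent with the reported threshold and with the 400 quadrants actually used.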
Interactive visual evaluation of data quality by interpreters is subjectively biased, so the data usability assessment provided by a single interpreter has to be compared with the mean data usability. The frequency distribution and the cumulative frequency curve of the modulus |δi| are plotted. Top (mean data usability vs. single interpretation): |δi| = 0 for ≈ 60 % of all quadrants, |δi| ≤ 10 for ≈ 90 %, and |δi| ≤ 20 for ≈ 98 %. Bottom (mean data usability vs. metadata assessment): |δi| = 0 for ≈ 50 % of all quadrants, |δi| ≤ 10 for ≈ 79 %, and |δi| ≤ 20 for ≈ 90 %. The results demonstrate the level of subjectivity: for roughly 50 % of all quadrants in the metadata comparison the assessment differs by at least one class.
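A cumulative frequency curve of |δi| of the kind described here can be tabulated as follows. This is a minimal sketch; the error values are hypothetical and the thresholds 0, 10 and 20 simply mirror the ones quoted in the notes.

```python
def cumulative_share(abs_errors, thresholds):
    """Share of quadrants whose |delta_i| is at or below each threshold."""
    total = len(abs_errors)
    return {t: sum(1 for e in abs_errors if e <= t) / total for t in thresholds}


# Hypothetical |delta_i| values for ten quadrants (not from the study)
abs_errors = [0, 0, 0, 10, 10, 20, 30, 0, 0, 0]
shares = cumulative_share(abs_errors, thresholds=[0, 10, 20])
for threshold, share in shares.items():
    print(f"|delta| <= {threshold}: {share:.0%}")
```

Reading the shares at successive thresholds gives the points of the cumulative curve; the share at threshold 0 is the fraction of quadrants with exact agreement.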
The figure plots the data usability mi against the standard deviation. In the intervals [0; 10] and [80; 90] the computed standard deviation is small; in the interval [20; 70] it is considerably higher. Each point of the diagram can represent several assessments, so the trend is drawn as a 3rd-degree polynomial for better orientation. In the interval [20; 70] the average standard deviation is of the order of 10 to 15, which corresponds to a difference in data usability assessment of approximately one or two classes. This is of particular interest, since evaluation differences of this order of magnitude are contained in the metadata.
Scatter diagram: the interpreter assessment extracted from the metadata is plotted against the mean data usability derived from the 18 data usability values for each quadrant. The graph shows the relation between both data sets; each point can represent more than one evaluation. A dashed 1:1 line has been plotted for better orientation. The regression line is relatively close to the 1:1 line, but the coefficient of determination of the linear regression amounts to only 0.8553. The data usability in the metadata is underestimated in the interval [0, 50] and overestimated in the interval [70, 90].
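The regression line and coefficient of determination mentioned for the scatter diagram can be computed with ordinary least squares. The sketch below assumes the mean data usability on the x-axis and the metadata value on the y-axis; the data points are hypothetical and do not reproduce the reported 0.8553.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = intercept + slope * x, plus R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - residual sum of squares / total sum of squares
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical pairs: mean data usability (x) vs. metadata data usability (y)
mean_du = [10, 30, 50, 70, 90]
metadata_du = [0, 20, 50, 80, 90]
slope, intercept, r2 = fit_line(mean_du, metadata_du)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.4f}")
```

A slope near 1 and an intercept near 0 would place the regression line close to the 1:1 line; R² then indicates how tightly the points scatter around it.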
The figure compares the mean data usability with the data usability in the metadata. In this context, it is of special interest to quantify the differences between the assessments of the individual interpreters. For each interpreter, the average absolute error of the assessment results and its standard deviation were computed (3 series; 400 quadrants). The diagram supports the following statements: the average deviation over all interpreter results varies in the interval [-7; +6], and the standard deviation varies in the interval [+6; +12]. The results of a single interpreter are relatively concentrated in a small evaluation interval: the maximum range of the average deviation for a single interpreter is [-4.5; -2.5], and the maximum range of the standard deviation is [9.9; 11.9].
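The per-interpreter comparison can be sketched by grouping signed deviations by interpreter. Everything here is illustrative: the interpreter labels, the record layout and the sign convention (vote minus mean data usability) are assumptions, and the sample standard deviation is used because the notes do not say which variant was applied.

```python
import statistics


def interpreter_profile(records):
    """Average deviation and standard deviation per interpreter.

    records: iterable of (interpreter_id, vote, mean_du) tuples.
    Each interpreter needs at least two records for the sample SD.
    """
    deviations = {}
    for interpreter_id, vote, mean_du in records:
        # ASSUMED sign convention: vote minus mean data usability
        deviations.setdefault(interpreter_id, []).append(vote - mean_du)
    return {
        interpreter_id: (statistics.fmean(values), statistics.stdev(values))
        for interpreter_id, values in deviations.items()
    }


# Hypothetical records (not from the study)
records = [
    ("A", 50, 60), ("A", 70, 60),
    ("B", 40, 40), ("B", 30, 40),
]
profile = interpreter_profile(records)
for interpreter_id, (avg_dev, sd) in profile.items():
    print(interpreter_id, round(avg_dev, 2), round(sd, 2))
```

A consistently negative or positive average deviation for one interpreter would indicate a systematic bias, which is the kind of information a harmonisation step would need.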
Considering the results above, a harmonisation function could be developed for adjusting the interpretation results of a single interpreter. Problem: the metadata contain no information identifying the interpreter who performed the data quality assessment. Without this information there is no possibility to minimise the subjective influence of an individual interpreter on the data usability provided in the metadata. Alternatively, interactive visual interpretation can be supported by an automated support system. Samples of modified quick-look data are shown here.