This document discusses generating training data for machine learning models from noisy land cover classification measurements. It describes a workflow that uses Sentinel-2 satellite imagery and GlobeLand30 land cover labels to train a random forest model for land cover classification. Key points include:

- Sentinel-2 and GlobeLand30 data are used as input. The GlobeLand30 labels are filtered and resampled to the Sentinel-2 grid to create reference labels.
- A separate random forest model is trained for each Sentinel-2 scene using stratified samples of pixels.
- Initial results show 88.75% average accuracy across scenes. Some classes, such as water, are predicted well, while others, such as wetlands, are harder to classify.
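The stratified pixel sampling mentioned above could look roughly like the following. This is a minimal sketch using only the Python standard library, not the workflow's actual implementation: the function name `stratified_sample`, the fixed per-class cap, the seed, and the toy label list are all assumptions made for illustration. The summary does not state how the strata were sized.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, per_class, seed=0):
    """Return sorted pixel indices, drawing at most `per_class` per label.

    `labels` is a flat sequence of reference labels, one per pixel.
    Capping each class at `per_class` is an assumed strategy.
    """
    rng = random.Random(seed)

    # Group pixel indices by their reference label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Draw without replacement from each class; rare classes
    # contribute every pixel they have.
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        k = min(per_class, len(indices))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Toy reference labels for one scene (hypothetical class mix).
labels = ["water"] * 50 + ["forest"] * 30 + ["wetland"] * 5
sample = stratified_sample(labels, per_class=10)
counts = Counter(labels[i] for i in sample)
```

With a cap like this, a rare class such as wetland is not swamped by dominant classes in the training set, although it still contributes fewer pixels when it has fewer than the cap. The sampled indices would then select the Sentinel-2 band values and reference labels passed to the per-scene classifier.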