Embedded Seamless Data
- class torchgeo.datasets.EmbeddedSeamlessData(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True, time_series=False)
Bases: RasterDataset
Embedded Seamless Data (ESD).
The Embedded Seamless Data (ESD) is a global, analysis-ready Earth embedding dataset at 30-meter resolution, designed to overcome the computational and storage challenges of planetary-scale Earth system science. By transforming multi-sensor satellite observations into compact, quantized latent vectors, ESD reduces the original data volume (~1 PB for a full year of global land surfaces) to approximately 2.4 TB, enabling decadal-scale analysis on standard workstations.
Key features:
Longitudinal Consistency: Provides a continuous record from 2000 to 2024, harmonized from Landsat 5, 7, 8, 9, MODIS Terra and NASADEM imagery.
High Reconstructive Fidelity: Achieves a Mean Absolute Error (MAE) of 0.013 across six spectral bands, ensuring the embeddings retain physically meaningful surface information.
Semantic Intelligence: Captures complex land surface patterns, outperforming raw sensor fusion data for land-cover classification (global accuracy 79.74%).
Implicit Denoising: Filters transient noise such as clouds and shadows via the ESDNet architecture, producing clean signals suitable for temporal and environmental monitoring.
Few-Shot Proficiency: Supports robust learning with minimal labeled data, ideal for regions with scarce ground-truth measurements.
Compact and Vectorized: Each 30-meter pixel is represented by a high-dimensional embedding vector, which can be aggregated, compared, or analyzed efficiently without reconstructing raw imagery.
The dataset covers terrestrial land surfaces, shallow waters, intertidal and reef zones, inland waterways, and coastal regions. Polar coverage is limited by satellite orbits and sensor availability.
Produced by the ESDNet framework, ESD provides an ultra-lightweight, globally consistent representation of surface conditions, enabling flexible, high-resolution analysis of land surface dynamics over decades.
If you use this dataset in your research, please refer to:
Code: shuangchencc/ESD
Added in version 0.9.
- filename_glob = 'SDC30_EBD_*'
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
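As a rough illustration of what this glob selects, the sketch below applies the same pattern to a handful of file names using only the standard library. The file names are hypothetical (the real ESD naming scheme is not documented here), and the dataset's own file discovery may differ in detail.

```python
from fnmatch import fnmatchcase

# Pattern copied from the class attribute above.
filename_glob = "SDC30_EBD_*"

# Hypothetical file names, invented for illustration only.
candidates = [
    "SDC30_EBD_V1_h12v04_2020.tif",
    "SDC30_EBD_2021.tif",
    "LC08_L2SP_023032_20200101.tif",
    "README.md",
]

# Keep only the names that match the glob (case-sensitive).
matches = [name for name in candidates if fnmatchcase(name, filename_glob)]
print(matches)
```

Because the pattern ends in `*` and carries no extension, it would keep matching if the files were converted to another raster format.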
- filename_regex = '.*_(?P<date>\\d{4})'
Regular expression used to extract date from filename.
The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:
date: used to calculate mint and maxt for index insertion
start: used to calculate mint for index insertion
stop: used to calculate maxt for index insertion
When separate_files is True, the following additional groups are searched for to find other files:
band: replaced with requested band name
- date_format = '%Y'
Date format string used to parse date from filename.
Not used if filename_regex does not contain a date group or start and stop groups.
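To see how the regular expression and the date format fit together, here is a minimal standard-library sketch. The helper `parse_year` and the file names are hypothetical, and the base class does more than this (it turns the parsed date into a time interval for the index), so treat it as an approximation of the first step only.

```python
import re
from datetime import datetime

# Values copied from the class attributes above.
filename_regex = r".*_(?P<date>\d{4})"
date_format = "%Y"


def parse_year(filename):
    """Return the datetime parsed from the 'date' group, or None if no match.

    Hypothetical helper for illustration; not part of torchgeo.
    """
    match = re.match(filename_regex, filename)
    if match is None:
        return None
    return datetime.strptime(match.group("date"), date_format)


# Hypothetical file names.
print(parse_year("SDC30_EBD_V1_h12v04_2020.tif"))
print(parse_year("README.md"))
```

Since `.*` is greedy, the `date` group captures the four digits after the last underscore that is followed by at least four digits.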
- __getitem__(index)
Retrieve input, target, and/or metadata indexed by spatiotemporal slice.
- Parameters:
index (slice | tuple[slice] | tuple[slice, slice] | tuple[slice, slice, slice]) – [xmin:xmax:xres, ymin:ymax:yres, tmin:tmax:tres] coordinates to index.
- Returns:
Sample of input, target, and/or metadata at that index.
- Raises:
IndexError – If index is not found in the dataset.
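The index forms listed under Parameters are ordinary Python slices. The sketch below builds the two-slice and three-slice tuple forms with a hypothetical helper, `make_index`; the coordinate values are made up, and the use of `datetime` for the time bounds is an assumption, since the expected time type is not stated here.

```python
from datetime import datetime


def make_index(xmin, xmax, ymin, ymax, tmin=None, tmax=None):
    """Build an (x, y) or (x, y, t) tuple of slices.

    Hypothetical helper for illustration; not part of torchgeo.
    """
    if xmin >= xmax or ymin >= ymax:
        raise ValueError("expected xmin < xmax and ymin < ymax")
    spatial = (slice(xmin, xmax), slice(ymin, ymax))
    if tmin is None and tmax is None:
        return spatial
    return spatial + (slice(tmin, tmax),)


# Made-up bounds, in the units of the dataset's CRS.
spatial_index = make_index(500000.0, 507680.0, 4100000.0, 4107680.0)

# Same window restricted to one year (time type is an assumption).
spatiotemporal_index = make_index(
    500000.0, 507680.0, 4100000.0, 4107680.0,
    tmin=datetime(2020, 1, 1), tmax=datetime(2020, 12, 31),
)

# With a dataset instance `ds` (not constructed here), a sample would be
# requested as: sample = ds[spatial_index]
print(spatial_index)
print(spatiotemporal_index)
```

A window that falls outside the indexed files raises IndexError, as noted above.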
- Return type: