Google Satellite Embedding

class torchgeo.datasets.GoogleSatelliteEmbedding(paths='data', crs=None, res=None, bands=None, transforms=None, cache=True, time_series=False)

Bases: RasterDataset

Google Satellite Embedding dataset.

The Google Satellite Embedding dataset is a global, analysis-ready collection of learned geospatial embeddings. Each 10-meter pixel in this dataset is a 64-dimensional representation, or “embedding vector”, that encodes temporal trajectories of surface conditions at and around that pixel as measured by various Earth observation instruments and datasets, over a single calendar year.

The dataset covers terrestrial land surfaces and shallow waters, including intertidal and reef zones, inland waterways, and coastal waterways. Coverage at the poles is limited by satellite orbits and instrument coverage.

The embeddings are unit-length: each has a magnitude of 1 and requires no additional normalization. They are distributed across the unit sphere, which makes them well-suited for use with clustering algorithms and tree-based classifiers. The embedding space is also consistent across years, so embeddings from different years can be used to detect changes in surface conditions by taking the dot product or angle between two embedding vectors. Furthermore, the embeddings are designed to be linearly composable: they can be aggregated to produce embeddings at coarser spatial resolutions, or transformed with vector arithmetic, and still retain their semantic meaning and distance relationships.
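The dot-product comparison and the spatial aggregation described above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not code from the library: the arrays are random stand-ins for real embeddings, the channel-first (64, H, W) layout is assumed, and the helper `random_unit_embeddings` is hypothetical. Renormalizing after averaging is also an assumption made here so that the coarse vectors are unit-length again; the description above does not prescribe it.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_unit_embeddings(height, width, dim=64):
    """Hypothetical stand-in for real data: random (dim, H, W) unit vectors."""
    v = rng.normal(size=(dim, height, width))
    return v / np.linalg.norm(v, axis=0, keepdims=True)


# Two synthetic 4x4 patches standing in for the same area in two years.
year_a = random_unit_embeddings(4, 4)
year_b = random_unit_embeddings(4, 4)

# For unit vectors, the per-pixel dot product equals the cosine similarity.
similarity = (year_a * year_b).sum(axis=0)  # shape (4, 4)

# The angle between the two vectors, in radians; 0 means identical direction.
angle = np.arccos(np.clip(similarity, -1.0, 1.0))

# Aggregate each 2x2 block of pixels into one coarser pixel by averaging.
blocks = year_a.reshape(64, 2, 2, 2, 2)  # (C, H_block, h_in, W_block, w_in)
coarse = blocks.mean(axis=(2, 4))  # shape (64, 2, 2)

# Assumption: rescale to unit length so dot products stay comparable.
coarse = coarse / np.linalg.norm(coarse, axis=0, keepdims=True)
```

A low similarity (large angle) at a pixel would then suggest that surface conditions differ between the two years, although any threshold for "changed" is application-specific.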

The Satellite Embedding dataset was produced by AlphaEarth Foundations, a geospatial embedding model that assimilates multiple datastreams including optical, radar, LiDAR, and other sources.

If you use this dataset in your research, please cite the following paper:

Note

The dataset can be downloaded from a number of locations:

Added in version 0.9.

all_bands: tuple[str, ...] = ('A00', 'A01', 'A02', 'A03', 'A04', 'A05', 'A06', 'A07', 'A08', 'A09', 'A10', 'A11', 'A12', 'A13', 'A14', 'A15', 'A16', 'A17', 'A18', 'A19', 'A20', 'A21', 'A22', 'A23', 'A24', 'A25', 'A26', 'A27', 'A28', 'A29', 'A30', 'A31', 'A32', 'A33', 'A34', 'A35', 'A36', 'A37', 'A38', 'A39', 'A40', 'A41', 'A42', 'A43', 'A44', 'A45', 'A46', 'A47', 'A48', 'A49', 'A50', 'A51', 'A52', 'A53', 'A54', 'A55', 'A56', 'A57', 'A58', 'A59', 'A60', 'A61', 'A62', 'A63')

Names of all available bands in the dataset
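The band names follow a zero-padded pattern, one per embedding dimension, so they can be rebuilt programmatically rather than typed out. The short sketch below does this in plain Python; the variable `subset` is a hypothetical selection of the kind one might pass as the `bands` argument shown in the class signature.

```python
# Rebuild the 64 band names 'A00' ... 'A63' listed in all_bands.
all_bands = tuple(f"A{i:02d}" for i in range(64))

# Hypothetical subset: the first eight embedding dimensions.
subset = all_bands[:8]

# Position of each selected band within the full 64-dimensional vector.
band_indices = [all_bands.index(name) for name in subset]
```

Note that a subset of the components of a unit-length vector generally has a magnitude below 1, so the unit-length property holds only when all 64 bands are used.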

plot(sample, show_titles=True, suptitle=None)

Plot a sample from the dataset.

Warning

Visualizations are generated using PCA on each image individually, and are thus not comparable across images. The plot method is provided for visualization purposes only and should not be used to draw conclusions.

Parameters:
  • sample (dict[str, Any]) – a sample returned by RasterDataset.__getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure
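To see why the warning above says PCA-based visualizations are not comparable across images, the following sketch projects a 64-channel array onto its top three principal components and rescales them to RGB. It is an illustration of per-image PCA in general, not the library's actual plotting code: the function `pca_to_rgb`, the channel-first (C, H, W) layout, and the min-max scaling are all assumptions, and the inputs are random arrays.

```python
import numpy as np


def pca_to_rgb(image):
    """Project a (C, H, W) array onto its top 3 principal components.

    Hypothetical helper: returns an (H, W, 3) array scaled to [0, 1].
    """
    c, h, w = image.shape
    pixels = image.reshape(c, -1).T  # (H*W, C), one row per pixel
    centered = pixels - pixels.mean(axis=0)
    # Rows of vt are the principal axes, fitted on this image only.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:3].T  # (H*W, 3)
    # Min-max scale each component independently for display.
    lo = projected.min(axis=0)
    hi = projected.max(axis=0)
    scaled = (projected - lo) / np.maximum(hi - lo, 1e-12)
    return scaled.reshape(h, w, 3)


rng = np.random.default_rng(0)
image_a = rng.normal(size=(64, 8, 8))
image_b = rng.normal(size=(64, 8, 8))

# Each call fits its own principal axes and its own color scaling.
rgb_a = pca_to_rgb(image_a)
rgb_b = pca_to_rgb(image_b)
```

Because the principal axes and the scaling are refitted for every input, the same color in `rgb_a` and `rgb_b` does not correspond to the same embedding value.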