Copernicus-Embed
- class torchgeo.datasets.CopernicusEmbed(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False, time_series=False)
Bases: RasterDataset
Copernicus-Embed dataset.
Copernicus-Embed is an embedding dataset that assigns one embedding vector to each 0.25° × 0.25° grid cell, aggregated over all available modalities in the full Copernicus-Pretrain dataset. The result is a 721 × 1440 × 768 array in which empty ocean cells are filled with 0. The dataset can be seen as a semantic representation product that integrates many sources of satellite observations at an extremely high compression ratio. It also makes it convenient to link the Earth’s surface to the atmosphere (e.g., as improved static variables to add to ERA5), opening new possibilities for developing weather and climate foundation models.
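The 721 × 1440 shape is consistent with a global 0.25° latitude/longitude grid: 180 / 0.25 + 1 = 721 latitude rows (both poles included) and 360 / 0.25 = 1440 longitude columns (longitude wraps, so no extra column). The sketch below works through that arithmetic and maps a coordinate to a cell index. The orientation (row 0 centred on 90°N, column 0 centred on 180°W), the helper name `cell_index`, and float32 storage are all assumptions for illustration; in practice, read the real georeferencing from the file's transform and CRS.

```python
# Hypothetical helper: assumes row 0 is centred on 90°N with rows stepping south
# by 0.25°, and column 0 is centred on 180°W with columns stepping east by 0.25°.
STEP = 0.25
N_LAT = int(180 / STEP) + 1  # 721 rows: 90°N ... 90°S inclusive
N_LON = int(360 / STEP)      # 1440 columns: longitude wraps around
N_DIM = 768                  # embedding length per cell


def cell_index(lat: float, lon: float) -> tuple[int, int]:
    """Return (row, col) of the grid cell whose centre is nearest to (lat, lon)."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    row = round((90.0 - lat) / STEP)
    # Wrap longitude into [-180, 180) before converting it to a column index.
    lon = (lon + 180.0) % 360.0 - 180.0
    col = round((lon + 180.0) / STEP) % N_LON
    return row, col


# Size of the full array if every value were stored as float32 (an assumption).
n_bytes = N_LAT * N_LON * N_DIM * 4
print(N_LAT, N_LON)          # 721 1440
print(cell_index(0.0, 0.0))  # (360, 720)
print(f"{n_bytes / 2**30:.2f} GiB")
```

Under these assumptions the full array holds about 797 million values, which explains the "extremely high compression ratio" relative to the raw satellite archive.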
If you use this dataset in your research, please cite the following paper:
Added in version 0.9.
- filename_glob = 'embed_map_*'
Glob expression used to search for files.
This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
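To see which names this glob accepts, the sketch below applies the same pattern with Python's standard `fnmatch` module. The candidate file names are made up for illustration and are not the dataset's actual file names. Because the pattern ends in `*` rather than an extension, it matches the embedding file whatever format it is stored in.

```python
from fnmatch import fnmatch

FILENAME_GLOB = "embed_map_*"

# Hypothetical file names, used only to illustrate the matching behaviour.
candidates = [
    "embed_map_example.tif",   # matches: same prefix, extension covered by '*'
    "embed_map_example.nc",    # matches: a different file format still matches
    "embedding_map.tif",       # no match: the prefix differs
    "s2_embed_map_1.tif",      # no match: the pattern is anchored at the start
]

matches = [name for name in candidates if fnmatch(name, FILENAME_GLOB)]
print(matches)  # ['embed_map_example.tif', 'embed_map_example.nc']
```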
- __init__(paths='data', crs=None, res=None, transforms=None, cache=True, download=False, checksum=False, time_series=False)
Initialize a new CopernicusEmbed instance.
- Parameters:
paths (str | PathLike[str] | Iterable[str | PathLike[str]]) – one or more root directories to search or files to load
crs (CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)
res (float | tuple[float, float] | None) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution. (defaults to the resolution of the first file found)
transforms (Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version
cache (bool) – if True, cache file handles to speed up repeated sampling
download (bool) – if True, download dataset and store it in the root directory
checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
time_series (bool) – if True, stack data along the time series dimension [T, C, H, W]. If False, merge data into a [C, H, W] mosaic.
- Raises:
DatasetNotFoundError – If dataset is not found and download is False.
Added in version 0.9: The time_series parameter.
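The two `time_series` modes differ mainly in output shape. The NumPy sketch below mimics that difference with synthetic arrays: stacking keeps every observation along a new leading T axis, while a mosaic collapses them into a single [C, H, W] array. The "first observation with data wins" merge rule and the use of 0 as a no-data marker are simplifying assumptions, not a description of how torchgeo merges files internally.

```python
import numpy as np

T, C, H, W = 3, 4, 2, 2  # toy sizes; the real embedding has C = 768
rng = np.random.default_rng(0)

# Synthetic observations of the same area; 0 marks "no data" (an assumption).
observations = [rng.random((C, H, W)).astype(np.float32) for _ in range(T)]
observations[0][:, 0, 0] = 0.0  # pretend the first file has a gap at pixel (0, 0)

# time_series=True style: keep every observation, shape [T, C, H, W].
stacked = np.stack(observations, axis=0)

# time_series=False style: one [C, H, W] mosaic. This simplified rule copies,
# for each pixel, the first observation that has data there.
mosaic = np.zeros((C, H, W), dtype=np.float32)
filled = np.zeros((H, W), dtype=bool)
for obs in observations:
    has_data = (obs != 0).any(axis=0) & ~filled  # pixels still empty in mosaic
    mosaic[:, has_data] = obs[:, has_data]
    filled |= has_data

print(stacked.shape)  # (3, 4, 2, 2)
print(mosaic.shape)   # (4, 2, 2)
```

Stacking preserves temporal information at the cost of an extra dimension that downstream models must handle; the mosaic keeps the familiar [C, H, W] layout but discards all but one value per pixel.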
- plot(sample, show_titles=True, suptitle=None)
Plot a sample from the dataset.
Warning
Visualizations are generated using PCA on each image individually, and are thus not comparable across images. The plot method is provided for visualization purposes only and should not be used to draw conclusions.
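The warning follows from how a per-image PCA works. The sketch below projects a [C, H, W] embedding onto its top three principal components and rescales each one to [0, 1] so the result can be shown as RGB. It is an assumed re-implementation of the general technique using NumPy's SVD (the function name `pca_rgb` is hypothetical), not torchgeo's plotting code. Because both the principal directions and the min/max scaling are fitted to the single image passed in, the same colour can mean different things in two different plots.

```python
import numpy as np


def pca_rgb(embedding: np.ndarray) -> np.ndarray:
    """Project a [C, H, W] embedding to a [H, W, 3] image via per-image PCA."""
    c, h, w = embedding.shape
    pixels = embedding.reshape(c, h * w).T          # [H*W, C]: one row per pixel
    centred = pixels - pixels.mean(axis=0)          # PCA operates on centred data
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    projected = centred @ vt[:3].T                  # [H*W, 3]
    # Rescale each component to [0, 1] using this image's own min and max,
    # which is exactly why colours are not comparable across images.
    lo = projected.min(axis=0)
    span = projected.max(axis=0) - lo
    span[span == 0] = 1.0                           # avoid division by zero
    rgb = (projected - lo) / span
    return rgb.reshape(h, w, 3)


rng = np.random.default_rng(0)
image = rng.normal(size=(768, 8, 8))                # toy embedding, not real data
rgb = pca_rgb(image)
print(rgb.shape, float(rgb.min()), float(rgb.max()))  # (8, 8, 3) 0.0 1.0
```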