SSL4EO-L Benchmark

class torchgeo.datasets.SSL4EOLBenchmark(root='data', sensor='oli_sr', product='cdl', split='train', classes=None, transforms=None, download=False, checksum=False)

Bases: NonGeoDataset

SSL4EO Landsat Benchmark Evaluation Dataset.

This dataset is intended for evaluating self-supervised learning (SSL) techniques. Each benchmark dataset consists of 25,000 images with corresponding land cover classification masks.

Dataset format:

  • Input Landsat image and single-channel mask

  • 25,000 total samples split into train, val, test (70%, 15%, 15%)

  • NLCD dataset version has 17 classes

  • CDL dataset version has 134 classes

Each patch has the following properties:

  • 264 x 264 pixels

  • Resampled to 30 m resolution (7920 x 7920 m)

  • Single multispectral GeoTIFF file
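The figures in the two lists above can be checked with a little arithmetic. The sketch below derives the nominal split sizes from the stated percentages and the ground footprint from the patch size and resolution; the actual number of files per split on disk may differ slightly, and all variable names are illustrative.

```python
# Values taken from the dataset description above.
TOTAL_SAMPLES = 25_000
SPLIT_PERCENT = {"train": 70, "val": 15, "test": 15}
PATCH_PIXELS = 264
RESOLUTION_M = 30

# Nominal split sizes implied by the percentages (integer arithmetic).
split_sizes = {name: TOTAL_SAMPLES * pct // 100 for name, pct in SPLIT_PERCENT.items()}

# Ground footprint of one patch: side length in metres and area in km^2.
side_m = PATCH_PIXELS * RESOLUTION_M
area_km2 = (side_m / 1000) ** 2

print(split_sizes)
print(f"{side_m} m per side, about {area_km2:.2f} km^2 per patch")
```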

If you use this dataset in your research, please cite the following paper:

Added in version 0.5.

__init__(root='data', sensor='oli_sr', product='cdl', split='train', classes=None, transforms=None, download=False, checksum=False)

Initialize a new SSL4EO Landsat Benchmark instance.

Parameters:
  • root (str | PathLike[str]) – root directory where dataset can be found

  • sensor (str) – one of [‘etm_toa’, ‘etm_sr’, ‘oli_tirs_toa’, ‘oli_sr’]

  • product (str) – mask target, one of [‘cdl’, ‘nlcd’]

  • split (str) – dataset split, one of [‘train’, ‘val’, ‘test’]

  • classes (list[int] | None) – list of classes to include, the rest will be mapped to 0 (defaults to all classes for the chosen product)

  • transforms (Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample (image and target) and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 after downloading files (may be slow)
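To make the `classes` parameter concrete, here is a minimal sketch of the "everything else is mapped to 0" behaviour on a toy mask. It is an illustration of the documented semantics only, not the library's implementation (which may, for example, also re-index the retained classes). The helper `keep_classes` and the label values are hypothetical.

```python
import numpy as np


def keep_classes(mask: np.ndarray, classes: list[int]) -> np.ndarray:
    """Keep pixels whose label is in `classes`; set every other pixel to 0."""
    return np.where(np.isin(mask, classes), mask, 0)


# Toy 3 x 3 mask with made-up label values (not real CDL/NLCD codes).
toy_mask = np.array([[1, 2, 3], [4, 5, 1], [2, 2, 9]])
filtered = keep_classes(toy_mask, classes=[1, 2])
print(filtered)
```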

Raises:
__getitem__(index)

Return the sample at the given index within the dataset.

Parameters:

index (int) – index to return

Returns:

sample containing the image and its mask

Return type:

dict[str, Any]

__len__()

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int
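As a usage sketch for `__getitem__` and `__len__`, the snippet below constructs the dataset with the documented keyword arguments and summarizes one sample without assuming which keys the sample dictionary contains. The helpers `build_benchmark` and `summarize` are hypothetical; `build_benchmark` needs torchgeo installed and the data already present under `root`.

```python
from typing import Any


def build_benchmark(root: str = "data") -> Any:
    """Construct the NLCD validation split for the 'oli_sr' sensor."""
    # Imported lazily so this module loads even without torchgeo installed.
    from torchgeo.datasets import SSL4EOLBenchmark

    return SSL4EOLBenchmark(
        root=root, sensor="oli_sr", product="nlcd", split="val", download=False
    )


def summarize(dataset: Any, index: int = 0) -> dict[str, Any]:
    """Return the dataset length and, per sample key, the value's shape or type name."""
    sample = dataset[index]  # a dict[str, Any], per the documentation above
    fields = {
        key: tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
        for key, value in sample.items()
    }
    return {"length": len(dataset), "fields": fields}
```

Calling `summarize(build_benchmark())` would then report the split size and the shape of each entry in the first sample.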

retrieve_sample_collection()

Retrieve paths to samples in data directory.

plot(sample, show_titles=True, suptitle=None)

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure
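A minimal sketch of how `plot` might be used to save a rendered sample to disk. The helper `save_sample_plot` is hypothetical; it only relies on the documented `plot` signature and on the returned object being a matplotlib Figure with a `savefig` method.

```python
from typing import Any


def save_sample_plot(dataset: Any, index: int, path: str) -> None:
    """Render one sample with the dataset's plot() method and write it to `path`."""
    sample = dataset[index]
    fig = dataset.plot(sample, show_titles=True, suptitle=f"Sample {index}")
    fig.savefig(path)  # matplotlib Figure.savefig writes the image file
```

For example, `save_sample_plot(dataset, 0, "sample0.png")` would write the first sample's figure as a PNG, assuming `dataset` is an already constructed `SSL4EOLBenchmark` instance.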