SSL4EO

class torchgeo.datasets.SSL4EO

Bases: NonGeoDataset

Base class for all SSL4EO datasets.

Self-Supervised Learning for Earth Observation (SSL4EO) is a collection of large-scale multimodal multitemporal datasets for unsupervised/self-supervised pre-training in Earth observation.

Added in version 0.5.

class torchgeo.datasets.SSL4EOL(root='data', split='oli_sr', seasons=1, transforms=None, download=False, checksum=False)

Bases: SSL4EO

SSL4EO-L dataset.

Landsat version of SSL4EO.

The dataset consists of a parallel corpus (same locations and dates for SR/TOA) for the following sensors:

Split          Satellites    Sensors    Level   # Bands   Link
tm_toa         Landsat 4–5   TM         TOA     7         GEE
etm_sr         Landsat 7     ETM+       SR      6         GEE
etm_toa        Landsat 7     ETM+       TOA     9         GEE
oli_tirs_toa   Landsat 8–9   OLI+TIRS   TOA     11        GEE
oli_sr         Landsat 8–9   OLI        SR      7         GEE
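Because several seasons can be returned per location, the number of input channels a model sees depends on both the split and `seasons`. A minimal sketch, assuming the sampled seasonal patches are stacked along the channel axis (so channels = bands × seasons); `LANDSAT_BANDS` and `expected_channels` are hypothetical helpers, not part of torchgeo:

```python
# Band counts per SSL4EO-L split, copied from the table above.
LANDSAT_BANDS: dict[str, int] = {
    "tm_toa": 7,
    "etm_sr": 6,
    "etm_toa": 9,
    "oli_tirs_toa": 11,
    "oli_sr": 7,
}


def expected_channels(split: str, seasons: int = 1) -> int:
    """Return the channel count of one sample (hypothetical helper).

    Assumes seasonal patches are concatenated along the channel axis.
    """
    if split not in LANDSAT_BANDS:
        raise ValueError(f"unknown split {split!r}; expected one of {sorted(LANDSAT_BANDS)}")
    if seasons not in (1, 2, 3, 4):
        raise ValueError("seasons must be 1, 2, 3, or 4")
    return LANDSAT_BANDS[split] * seasons


print(expected_channels("oli_sr", seasons=4))  # 28
```

Under that assumption, a model consuming `oli_sr` with all four seasons would need 28 input channels.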

Each patch has the following properties:

  • 264 x 264 pixels

  • Resampled to 30 m resolution (7920 x 7920 m)

  • 4 seasonal timestamps

  • Single multispectral GeoTIFF file
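The ground extent in the list follows directly from the patch size and the resampled resolution: 264 pixels × 30 m = 7920 m per side. A quick check of that arithmetic (`patch_extent_m` is an illustrative name):

```python
def patch_extent_m(pixels: int, resolution_m: float) -> float:
    """Side length of a square patch on the ground, in metres."""
    return pixels * resolution_m


side = patch_extent_m(264, 30)   # 7920 m per side
area_km2 = (side / 1000) ** 2    # roughly 62.7 km² per patch
print(side, round(area_km2, 2))
```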

Note

Each split is 300–400 GB and requires about 3x that much disk space to concatenate and extract the tarballs. The tarballs can be safely deleted after extraction to save space. Downloading and checksumming take about 1.5 hours, and extraction takes another 3 hours.
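The note's rule of thumb can be turned into a quick capacity check before downloading. A minimal sketch, reading "3x" as the total free space needed while concatenating and extracting one split (our interpretation of the note); `peak_disk_gb` is a hypothetical helper:

```python
def peak_disk_gb(split_size_gb: float, factor: float = 3.0) -> float:
    """Rough peak disk usage while concatenating and extracting one split.

    Assumption: the factor of 3 from the note applies to the total free
    space needed; it is an estimate, not a guarantee.
    """
    return split_size_gb * factor


for size in (300, 400):
    print(f"{size} GB split -> plan for ~{peak_disk_gb(size):.0f} GB free")
```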

If you use this dataset in your research, please cite the following paper:

Added in version 0.5.

__init__(root='data', split='oli_sr', seasons=1, transforms=None, download=False, checksum=False)

Initialize a new SSL4EOL instance.

Parameters:
  • root (str | PathLike[str]) – root directory where dataset can be found

  • split (Literal['tm_toa', 'etm_toa', 'etm_sr', 'oli_tirs_toa', 'oli_sr']) – one of [‘tm_toa’, ‘etm_toa’, ‘etm_sr’, ‘oli_tirs_toa’, ‘oli_sr’]

  • seasons (Literal[1, 2, 3, 4]) – number of seasonal patches to sample per location, 1–4

  • transforms (Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample dictionary and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 after downloading files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.
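The `transforms` argument accepts any callable that maps a sample dictionary to a new sample dictionary. A minimal sketch of such a callable, assuming the sample stores its pixels under the `"image"` key and that the value supports division (as tensors and arrays do); `make_scale_transform` is hypothetical, and the divisor is a placeholder you should choose to match the stored data type:

```python
from collections.abc import Callable
from typing import Any

Sample = dict[str, Any]


def make_scale_transform(divisor: float) -> Callable[[Sample], Sample]:
    """Build a transform that divides the "image" entry by ``divisor``."""

    def transform(sample: Sample) -> Sample:
        out = dict(sample)  # shallow copy so the caller's dict is left untouched
        out["image"] = sample["image"] / divisor
        return out

    return transform


scale = make_scale_transform(255.0)
print(scale({"image": 510.0}))  # {'image': 2.0}
```

The resulting callable would be passed as `SSL4EOL(root="data", split="oli_sr", transforms=scale)`.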

__getitem__(index)

Return the sample at the given index.

Parameters:

index (int) – index of the sample to return

Returns:

image sample

Return type:

dict[str, Any]
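The `seasons` parameter controls how many of the four seasonal timestamps are returned for each location. A sketch of that sampling step as we understand it, assuming the timestamps are drawn at random without replacement; `sample_seasons` and the timestamp labels are hypothetical, not torchgeo internals:

```python
import random
from typing import Optional


def sample_seasons(timestamps: list[str], seasons: int,
                   rng: Optional[random.Random] = None) -> list[str]:
    """Draw ``seasons`` distinct timestamps for one location."""
    if not 1 <= seasons <= len(timestamps):
        raise ValueError(f"seasons must be between 1 and {len(timestamps)}")
    if rng is None:
        rng = random.Random()
    # Sampling without replacement: no timestamp is returned twice.
    return rng.sample(timestamps, seasons)


# Hypothetical per-location timestamp labels, one per season.
location = ["spring", "summer", "autumn", "winter"]
picked = sample_seasons(location, 2, rng=random.Random(0))
print(picked)
```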

__len__()

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int

plot(sample, show_titles=True, suptitle=None)

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

class torchgeo.datasets.SSL4EOS12(root='data', split='s2c', seasons=1, transforms=None, download=False, checksum=False)

Bases: SSL4EO

SSL4EO-S12 dataset.

Sentinel-1/2 version of SSL4EO.

The dataset consists of a parallel corpus (same locations and dates) for the following satellites:

Split   Satellite    Level   # Bands   Link
s1      Sentinel-1   GRD     2         GEE
s2c     Sentinel-2   TOA     13        GEE
s2a     Sentinel-2   SR      12        GEE
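The band counts in the table can be cross-checked against the bands each product carries. The sketch below assumes the standard band sets: dual-polarisation VV and VH for Sentinel-1 GRD, the 13 Level-1C spectral bands for Sentinel-2, and the same set without the B10 cirrus band for Level-2A. The channel order inside the dataset files is not stated here, so the list order is illustrative only, and `S12_BANDS` is a hypothetical name:

```python
# Assumed band names per split; order is illustrative, not the on-disk order.
S12_BANDS: dict[str, list[str]] = {
    "s1": ["VV", "VH"],
    "s2c": ["B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8", "B8A",
            "B9", "B10", "B11", "B12"],
}
# Level-2A surface reflectance omits the B10 cirrus band.
S12_BANDS["s2a"] = [b for b in S12_BANDS["s2c"] if b != "B10"]

for split, bands in S12_BANDS.items():
    print(split, len(bands))  # s1 2, s2c 13, s2a 12
```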

Each patch has the following properties:

  • 264 x 264 pixels

  • Resampled to 10 m resolution (2640 x 2640 m)

  • 4 seasonal timestamps

If you use this dataset in your research, please cite the following paper:

Note

The dataset is about 1.5 TB when compressed and 3.7 TB when uncompressed.
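The two sizes in the note give a rough disk budget. A minimal sketch, assuming the compressed archives stay on disk until extraction finishes, so both copies coexist at the peak (our assumption, not stated in the note):

```python
COMPRESSED_TB = 1.5    # from the note above
UNCOMPRESSED_TB = 3.7  # from the note above

# Assumption: archives and extracted files coexist until extraction ends.
peak_tb = COMPRESSED_TB + UNCOMPRESSED_TB
ratio = UNCOMPRESSED_TB / COMPRESSED_TB
print(f"peak ≈ {peak_tb:.1f} TB, expansion ≈ {ratio:.2f}x")
```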

Added in version 0.5.

__init__(root='data', split='s2c', seasons=1, transforms=None, download=False, checksum=False)

Initialize a new SSL4EOS12 instance.

Parameters:
  • root (str | PathLike[str]) – root directory where dataset can be found

  • split (Literal['s1', 's2c', 's2a']) – one of “s1” (Sentinel-1 GRD dual-pol SAR), “s2c” (Sentinel-2 Level-1C top-of-atmosphere reflectance), or “s2a” (Sentinel-2 Level-2A surface reflectance)

  • seasons (Literal[1, 2, 3, 4]) – number of seasonal patches to sample per location, 1–4

  • transforms (Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample dictionary and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

Added in version 0.7: The download parameter.

__getitem__(index)

Return the sample at the given index.

Parameters:

index (int) – index of the sample to return

Returns:

image sample

Return type:

dict[str, Any]

__len__()

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int

plot(sample, show_titles=True, suptitle=None)

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure
