Copernicus-Bench#

class torchgeo.datasets.CopernicusBench(name, *args, **kwargs)[source]#

Bases: NonGeoDataset

Copernicus-Bench datasets.

This wrapper supports dynamically loading datasets in Copernicus-Bench.

If you use this dataset in your research, please cite the following paper:

Added in version 0.7.

__init__(name, *args, **kwargs)[source]#

Initialize a new CopernicusBench instance.

Parameters:
  • name (Literal['cloud_s2', 'cloud_s3', 'eurosat_s1', 'eurosat_s2', 'bigearthnet_s1', 'bigearthnet_s2', 'lc100cls_s3', 'lc100seg_s3', 'dfc2020_s1', 'dfc2020_s2', 'flood_s1', 'lcz_s2', 'biomass_s3', 'aq_no2_s5p', 'aq_o3_s5p']) – Name of the dataset to load.

  • *args (Any) – Arguments to pass to dataset class.

  • **kwargs (Any) – Keyword arguments to pass to dataset class.
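
As a rough usage sketch, the wrapper can be built from one of the names listed above, with the remaining keyword arguments forwarded to the selected dataset class. The helper `load_benchmark` and the `VALID_NAMES` tuple are hypothetical names introduced here for illustration; only `CopernicusBench(name, *args, **kwargs)` comes from this page.

```python
from typing import Any

# Dataset names copied from the `name` parameter documented above.
VALID_NAMES = (
    'cloud_s2', 'cloud_s3', 'eurosat_s1', 'eurosat_s2',
    'bigearthnet_s1', 'bigearthnet_s2', 'lc100cls_s3', 'lc100seg_s3',
    'dfc2020_s1', 'dfc2020_s2', 'flood_s1', 'lcz_s2',
    'biomass_s3', 'aq_no2_s5p', 'aq_o3_s5p',
)


def load_benchmark(name: str, **kwargs: Any) -> Any:
    """Hypothetical helper: check ``name``, then build the wrapper."""
    if name not in VALID_NAMES:
        raise ValueError(f'unknown Copernicus-Bench dataset: {name!r}')
    # Imported lazily so the name check works without torchgeo installed.
    from torchgeo.datasets import CopernicusBench

    # Keyword arguments are forwarded to the underlying dataset class.
    return CopernicusBench(name, **kwargs)
```

A call such as `load_benchmark('cloud_s2', root='data', split='val')` would then return the wrapper around the Cloud-S2 validation split, provided the data is present under `root` or `download=True` is passed.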

__len__()[source]#

Return the length of the dataset.

Returns:

Length of the dataset.

Return type:

int

__getitem__(index)[source]#

Return the sample at the given index.

Parameters:

index (int) – Index to return.

Returns:

Data and labels at that index.

Return type:

dict[str, Any]

__getattr__(name)[source]#

Delegate attribute lookups to the wrapped dataset object.
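
The delegation pattern behind such a wrapper can be sketched in plain Python. This is not the torchgeo implementation: `ToyDataset`, `ToyBench`, and the `registry` mapping are hypothetical stand-ins, and the sketch only assumes that the wrapper stores the selected dataset and forwards unknown attributes to it.

```python
from typing import Any


class ToyDataset:
    """Hypothetical stand-in for one concrete benchmark dataset."""

    classes = ('Clear', 'Cloud')

    def __init__(self, split: str = 'train') -> None:
        self.split = split
        self.samples = [{'image': [0.1, 0.2], 'label': 0}]

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> dict[str, Any]:
        return self.samples[index]


class ToyBench:
    """Hypothetical wrapper that picks a dataset class by name."""

    registry = {'toy': ToyDataset}

    def __init__(self, name: str, *args: Any, **kwargs: Any) -> None:
        self.dataset = self.registry[name](*args, **kwargs)

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, index: int) -> dict[str, Any]:
        return self.dataset[index]

    def __getattr__(self, name: str) -> Any:
        # Only called when normal lookup fails, so wrapper attributes win
        # and everything else falls through to the wrapped dataset.
        if name == 'dataset':
            # Avoid infinite recursion if `dataset` was never assigned.
            raise AttributeError(name)
        return getattr(self.dataset, name)


bench = ToyBench('toy', split='val')
```

With this pattern, attributes such as `bench.classes` or `bench.split` resolve on the wrapped dataset even though the wrapper never defines them.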

class torchgeo.datasets.CopernicusBenchBase(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]#

Bases: NonGeoDataset, ABC

Abstract base class for all Copernicus-Bench datasets.

If you use this dataset in your research, please cite the following paper:

Added in version 0.7.

abstract property url: str#

Download URL.

md5: str#

MD5 checksum.

zipfile: str#

Zip file name.

directory: str#

Subdirectory containing split files.

filename = '{}.csv'#

Filename format of split files.
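
The `{}` placeholder is presumably filled with the split name. The sketch below applies the templates listed on this page; the directory layout `<root>/<directory>/<filename>` and the helper `split_file` are assumptions made for illustration, not confirmed behaviour.

```python
from pathlib import Path

# Filename templates listed on this page for a few dataset classes.
TEMPLATES = {
    'base': '{}.csv',
    'eurosat_s1': 'eurosat-{}.txt',
    'dfc2020_s1': 'dfc-{}-new.csv',
    'flood_s1': 'grid_dict_{}.json',
}


def split_file(root: str, directory: str, template: str, split: str) -> Path:
    """Assumed layout: <root>/<directory>/<template filled with split>."""
    return Path(root) / directory / template.format(split)


path = split_file('data', 'eurosat_s1', TEMPLATES['eurosat_s1'], 'val')
```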

dtype: dtype = torch.int64#

Mask dtype to cast to, either torch.long for classification or torch.float for regression.

filename_regex = '.*'#

Regular expression used to extract date from filename.

date_format = '%Y%m%dT%H%M%S'#

Date format string used to parse date from filename.

abstract property all_bands: tuple[str, ...]#

All spectral channels.

abstract property rgb_bands: tuple[str, ...]#

Spectral channels used to make RGB plots.

cmap: str | Colormap#

Matplotlib color map for semantic segmentation and change detection plots.

classes: tuple[str, ...]#

List of classes for classification, semantic segmentation, and change detection.

__init__(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]#

Initialize a new CopernicusBenchBase instance.

Parameters:
  • root (str | PathLike[str]) – Root directory where dataset can be found.

  • split (Literal['train', 'val', 'test']) – One of ‘train’, ‘val’, or ‘test’.

  • bands (Sequence[str] | None) – Sequence of band names to load (defaults to all bands).

  • transforms (Callable[[dict[str, Any]], dict[str, Any]] | None) – A function/transform that takes an input sample and its target and returns a transformed version.

  • download (bool) – If True, download dataset and store it in the root directory.

  • checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.
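
The `transforms` argument is typed as a callable from a sample dictionary to a sample dictionary. A minimal sketch of that contract follows, using plain lists instead of tensors; the key names `'image'` and `'label'`, the scaling factor, and the helpers `scale_image`, `add_flag`, and `compose` are all assumptions made for illustration.

```python
from typing import Any, Callable

Sample = dict[str, Any]
Transform = Callable[[Sample], Sample]


def scale_image(sample: Sample) -> Sample:
    """Hypothetical transform: divide every image value by 10000."""
    out = dict(sample)  # shallow copy so the input dict is left untouched
    out['image'] = [value / 10000 for value in sample['image']]
    return out


def add_flag(sample: Sample) -> Sample:
    """Hypothetical transform: attach an extra key to the sample."""
    return {**sample, 'scaled': True}


def compose(*transforms: Transform) -> Transform:
    """Chain several dict-to-dict transforms into one callable."""

    def apply(sample: Sample) -> Sample:
        for transform in transforms:
            sample = transform(sample)
        return sample

    return apply


pipeline = compose(scale_image, add_flag)
result = pipeline({'image': [0, 5000, 10000], 'label': 3})
```

Any callable with this shape, including a composed one like `pipeline`, could be passed as `transforms=...`.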

__len__()[source]#

Return the length of the dataset.

Returns:

Length of the dataset.

Return type:

int

plot(sample, show_titles=True, suptitle=None)[source]#

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – A sample returned by NonGeoDataset.__getitem__().

  • show_titles (bool) – Flag indicating whether to show titles above each panel.

  • suptitle (str | None) – Optional string to use as a suptitle.

Returns:

A matplotlib Figure with the rendered sample.

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

Figure
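
The RGB requirement can be pictured as a lookup of each RGB band inside the selected `bands`. The sketch below is an assumption about the kind of check involved, not the library code: `rgb_indices` is a hypothetical helper, the band names are made up in Sentinel-2 style, and `ValueError` stands in for `RGBBandsMissingError`.

```python
from typing import Sequence


def rgb_indices(bands: Sequence[str], rgb_bands: Sequence[str]) -> list[int]:
    """Return the position of each RGB band within the selected bands."""
    missing = [band for band in rgb_bands if band not in bands]
    if missing:
        # Stand-in for RGBBandsMissingError in this sketch.
        raise ValueError(f'missing RGB bands: {missing}')
    return [list(bands).index(band) for band in rgb_bands]


# Hypothetical band selection and RGB definition.
selected = ('B02', 'B03', 'B04', 'B08')
rgb = ('B04', 'B03', 'B02')
indices = rgb_indices(selected, rgb)
```

Selecting only a subset such as `('B04', 'B08')` would make the lookup fail, which mirrors the documented error condition.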

class torchgeo.datasets.CopernicusBenchCloudS2(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]#

Bases: CopernicusBenchBase

Copernicus-Bench Cloud-S2 dataset.

Cloud-S2 is a multi-class cloud segmentation dataset derived from CloudSEN12+, one of the largest Sentinel-2 cloud and cloud shadow detection datasets with expert-labeled pixels. We take 25% of the samples with high-quality labels and split them into train/val/test subsets of 1699/567/551 samples.

Classes#

Code  Class         Description
0     Clear         Pixels without cloud and cloud shadow contamination.
1     Thick Cloud   Opaque clouds that block all the reflected light from the Earth’s surface.
2     Thin Cloud    Semitransparent clouds that alter the surface spectral signal but still allow the background to be recognized. This is the hardest class to identify.
3     Cloud Shadow  Dark pixels where light is occluded by thick or thin clouds.

If you use this dataset in your research, please cite the following papers:

Added in version 0.7.

md5: str = '39a1f966e76455549a3e6c209ba751c1'#

MD5 checksum.

zipfile: str = 'cloud_s2.zip'#

Zip file name.

directory: str = 'cloud_s2'#

Subdirectory containing split files.

filename_regex = 'ROI_\\d{5}__(?P<date>\\d{8}T\\d{6})'#

Regular expression used to extract date from filename.
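
To illustrate how `filename_regex` and `date_format` fit together, the sketch below extracts the named `date` group and parses it with `datetime.strptime`. The filename is invented to fit the pattern, `parse_date` is a hypothetical helper, and the use of `re.search` is an assumption about how the pattern is applied.

```python
import re
from datetime import datetime
from typing import Optional

# Values documented on this page.
FILENAME_REGEX = 'ROI_\\d{5}__(?P<date>\\d{8}T\\d{6})'
DATE_FORMAT = '%Y%m%dT%H%M%S'


def parse_date(filename: str) -> Optional[datetime]:
    """Return the acquisition time encoded in a filename, if any."""
    match = re.search(FILENAME_REGEX, filename)
    if match is None:
        return None
    return datetime.strptime(match.group('date'), DATE_FORMAT)


# Hypothetical filename, made up to fit the pattern.
stamp = parse_date('ROI_00012__20190212T142031_patch.tif')
```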

cmap: str | Colormap = <matplotlib.colors.ListedColormap object>#

Matplotlib color map for semantic segmentation and change detection plots.

classes: tuple[str, ...] = ('Clear', 'Thick Cloud', 'Thin Cloud', 'Cloud Shadow')#

List of classes for classification, semantic segmentation, and change detection.
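
Because the class codes index directly into `classes`, a mask can be summarized per class name. The sketch below uses a nested list as a stand-in for a mask tensor; `class_fractions` and the tiny example mask are hypothetical.

```python
from collections import Counter

CLASSES = ('Clear', 'Thick Cloud', 'Thin Cloud', 'Cloud Shadow')


def class_fractions(mask: list[list[int]]) -> dict[str, float]:
    """Fraction of pixels per class name in a 2D mask of class codes."""
    counts = Counter(code for row in mask for code in row)
    total = sum(counts.values())
    return {
        name: counts.get(code, 0) / total
        for code, name in enumerate(CLASSES)
    }


# Hypothetical 2x4 mask of class codes.
mask = [[0, 0, 1, 1], [0, 2, 3, 1]]
fractions = class_fractions(mask)
```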

__getitem__(index)[source]#

Return the sample at the given index.

Parameters:

index (int) – Index to return.

Returns:

Data and labels at that index.

Return type:

dict[str, Any]

class torchgeo.datasets.CopernicusBenchCloudS3(root='data', split='train', mode='multi', bands=None, transforms=None, download=False, checksum=False)[source]#

Bases: CopernicusBenchBase

Copernicus-Bench Cloud-S3 dataset.

Cloud-S3 is a cloud segmentation dataset with raw images from Sentinel-3 OLCI and labels from the IdePix classification algorithm.

This dataset has two modes:

Multiclass Classification#

Code  Class            Description
0     Invalid          Invalid pixels, should be ignored during training.
1     Clear            Land, coastline, or water pixels.
2     Cloud-Ambiguous  Semi-transparent clouds, or clouds where the detection level is uncertain.
3     Cloud-Sure       Fully-opaque clouds with full confidence of their detection.
4     Cloud Shadow     Pixels affected by a cloud shadow.
5     Snow/Ice         Clear snow/ice pixels.

Binary Classification#

Code  Class    Description
0     Invalid  Invalid pixels, should be ignored during training.
1     Clear    Land, coastline, water, snow, or ice pixels.
2     Cloud    Pixels which are either cloud-sure or cloud-ambiguous.
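
Both label schemes reserve code 0 for invalid pixels that should be ignored during training. One way to honor that when scoring predictions is to drop those pixels before comparing, as in this sketch; `masked_accuracy` is a hypothetical helper and nested lists stand in for tensors.

```python
from typing import Optional

INVALID = 0  # code for "Invalid" in both Cloud-S3 label schemes


def masked_accuracy(
    prediction: list[list[int]],
    target: list[list[int]],
    ignore: int = INVALID,
) -> Optional[float]:
    """Pixel accuracy computed only where the target is not `ignore`."""
    pairs = [
        (p, t)
        for p_row, t_row in zip(prediction, target)
        for p, t in zip(p_row, t_row)
        if t != ignore
    ]
    if not pairs:
        return None  # every pixel was invalid
    return sum(p == t for p, t in pairs) / len(pairs)


# Hypothetical binary-mode target and prediction.
target = [[0, 1, 2], [2, 1, 0]]
prediction = [[2, 1, 2], [1, 1, 1]]
accuracy = masked_accuracy(prediction, target)
```

In a PyTorch training loop, the same intent is commonly expressed through the `ignore_index` argument of the cross-entropy loss.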

If you use this dataset in your research, please cite the following paper:

Added in version 0.7.

md5: str = '1f82a8ccf16a0c44f0b1729e523e343a'#

MD5 checksum.

zipfile: str = 'cloud_s3.zip'#

Zip file name.

directory: str = 'cloud_s3'#

Subdirectory containing split files.

filename_regex = 'S3[AB]_OL_1_EFR____(?P<date>\\d{8}T\\d{6})'#

Regular expression used to extract date from filename.

__init__(root='data', split='train', mode='multi', bands=None, transforms=None, download=False, checksum=False)[source]#

Initialize a new CopernicusBenchCloudS3 instance.

Parameters:
  • root (str | PathLike[str]) – Root directory where dataset can be found.

  • split (Literal['train', 'val', 'test']) – One of ‘train’, ‘val’, or ‘test’.

  • mode (Literal['binary', 'multi']) – One of ‘binary’ or ‘multi’.

  • bands (Sequence[str] | None) – Sequence of band names to load (defaults to all bands).

  • transforms (Callable[[dict[str, Any]], dict[str, Any]] | None) – A function/transform that takes an input sample and its target and returns a transformed version.

  • download (bool) – If True, download dataset and store it in the root directory.

  • checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

classes: tuple[str, ...] = ('Invalid', 'Clear', 'Cloud-Ambiguous', 'Cloud-Sure', 'Cloud Shadow', 'Snow/Ice')#

List of classes for classification, semantic segmentation, and change detection.

cmap: str | Colormap = <matplotlib.colors.ListedColormap object>#

Matplotlib color map for semantic segmentation and change detection plots.

__getitem__(index)[source]#

Return the sample at the given index.

Parameters:

index (int) – Index to return.

Returns:

Data and labels at that index.

Return type:

dict[str, Any]

class torchgeo.datasets.CopernicusBenchEuroSATS1(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]#

Bases: CopernicusBenchBase

Copernicus-Bench EuroSAT-S1 dataset.

EuroSAT-S1 is a multi-class land use/land cover classification dataset, and is functionally identical to EuroSAT-SAR.

If you use this dataset in your research, please cite the following papers:

Added in version 0.7.

md5: str = 'e7e7f8fc68fc55a7a689cb654912ff3f'#

MD5 checksum.

zipfile: str = 'eurosat_s1.zip'#

Zip file name.

directory: str = 'eurosat_s1'#

Subdirectory containing split files.

filename = 'eurosat-{}.txt'#

Filename format of split files.

classes: tuple[str, ...] = ('AnnualCrop', 'HerbaceousVegetation', 'Industrial', 'PermanentCrop', 'River', 'Forest', 'Highway', 'Pasture', 'Residential', 'SeaLake')#

List of classes for classification, semantic segmentation, and change detection.

__getitem__(index)[source]#

Return the sample at the given index.

Parameters:

index (int) – Index to return.

Returns:

Data and labels at that index.

Return type:

dict[str, Any]

class torchgeo.datasets.CopernicusBenchEuroSATS2(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]#

Bases: CopernicusBenchBase

Copernicus-Bench EuroSAT-S2 dataset.

EuroSAT-S2 is a multi-class land use/land cover classification dataset, and is functionally identical to EuroSAT-MS.

If you use this dataset in your research, please cite the following papers:

Added in version 0.7.

md5: str = 'b2be02ca9767554c717f2e9bd15bbd23'#

MD5 checksum.

zipfile: str = 'eurosat_s2.zip'#

Zip file name.

directory: str = 'eurosat_s2'#

Subdirectory containing split files.

filename = 'eurosat-{}.txt'#

Filename format of split files.

classes: tuple[str, ...] = ('AnnualCrop', 'HerbaceousVegetation', 'Industrial', 'PermanentCrop', 'River', 'Forest', 'Highway', 'Pasture', 'Residential', 'SeaLake')#

List of classes for classification, semantic segmentation, and change detection.

__getitem__(index)[source]#

Return the sample at the given index.

Parameters:

index (int) – Index to return.

Returns:

Data and labels at that index.

Return type:

dict[str, Any]

class torchgeo.datasets.CopernicusBenchBigEarthNetS1(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]#

Bases: CopernicusBenchBase

Copernicus-Bench BigEarthNet-S1 dataset.

BigEarthNet-S1 is a multilabel land use/land cover classification dataset composed of 5% of the Sentinel-1 data of BigEarthNet-v2.

If you use this dataset in your research, please cite the following papers:

Added in version 0.7.

md5: str = '269355db0449e0da7213c95f30c346d4'#

MD5 checksum.

zipfile: str = 'bigearthnetv2.zip'#

Zip file name.

directory: str = 'bigearthnet_s1s2'#

Subdirectory containing split files.

filename = 'multilabel-{}.csv'#

Filename format of split files.

filename_regex = '.{16}_(?P<date>\\d{8}T\\d{6})'#

Regular expression used to extract date from filename.

classes: tuple[str, ...] = ('Urban fabric', 'Industrial or commercial units', 'Arable land', 'Permanent crops', 'Pastures', 'Complex cultivation patterns', 'Land principally occupied by agriculture, with significant areas of natural vegetation', 'Agro-forestry areas', 'Broad-leaved forest', 'Coniferous forest', 'Mixed forest', 'Natural grassland and sparsely vegetated areas', 'Moors, heathland and sclerophyllous vegetation', 'Transitional woodland, shrub', 'Beaches, dunes, sands', 'Inland wetlands', 'Coastal wetlands', 'Inland waters', 'Marine waters')#

List of classes for classification, semantic segmentation, and change detection.
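
For a multilabel dataset, a label is commonly a multi-hot vector with one entry per class. Assuming that encoding and the order of `classes` shown above, the active class names can be recovered as sketched below; `decode` is a hypothetical helper and the example vector is invented.

```python
CLASSES = (
    'Urban fabric',
    'Industrial or commercial units',
    'Arable land',
    'Permanent crops',
    'Pastures',
    'Complex cultivation patterns',
    'Land principally occupied by agriculture, '
    'with significant areas of natural vegetation',
    'Agro-forestry areas',
    'Broad-leaved forest',
    'Coniferous forest',
    'Mixed forest',
    'Natural grassland and sparsely vegetated areas',
    'Moors, heathland and sclerophyllous vegetation',
    'Transitional woodland, shrub',
    'Beaches, dunes, sands',
    'Inland wetlands',
    'Coastal wetlands',
    'Inland waters',
    'Marine waters',
)


def decode(multi_hot: list[int]) -> list[str]:
    """Map a multi-hot vector (assumed encoding) to class names."""
    if len(multi_hot) != len(CLASSES):
        raise ValueError('vector length must match the number of classes')
    return [name for name, flag in zip(CLASSES, multi_hot) if flag]


# Hypothetical label with three active classes.
vector = [0] * len(CLASSES)
for index in (2, 4, 17):
    vector[index] = 1
labels = decode(vector)
```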

__init__(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]#

Initialize a new CopernicusBenchBigEarthNetS1 instance.

Parameters:
  • root (str | PathLike[str]) – Root directory where dataset can be found.

  • split (Literal['train', 'val', 'test']) – One of ‘train’, ‘val’, or ‘test’.

  • bands (Sequence[str] | None) – Sequence of band names to load (defaults to all bands).

  • transforms (Callable[[dict[str, Any]], dict[str, Any]] | None) – A function/transform that takes an input sample and its target and returns a transformed version.

  • download (bool) – If True, download dataset and store it in the root directory.

  • checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]#

Return the sample at the given index.

Parameters:

index (int) – Index to return.

Returns:

Data and labels at that index.

Return type:

dict[str, Any]

class torchgeo.datasets.CopernicusBenchBigEarthNetS2(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]#

Bases: CopernicusBenchBase

Copernicus-Bench BigEarthNet-S2 dataset.

BigEarthNet-S2 is a multilabel land use/land cover classification dataset composed of 5% of the Sentinel-2 data of BigEarthNet-v2.

If you use this dataset in your research, please cite the following papers:

Added in version 0.7.

md5: str = '269355db0449e0da7213c95f30c346d4'#

MD5 checksum.

zipfile: str = 'bigearthnetv2.zip'#

Zip file name.

directory: str = 'bigearthnet_s1s2'#

Subdirectory containing split files.

filename = 'multilabel-{}.csv'#

Filename format of split files.

filename_regex = '.{10}_(?P<date>\\d{8}T\\d{6})'#

Regular expression used to extract date from filename.

classes: tuple[str, ...] = ('Urban fabric', 'Industrial or commercial units', 'Arable land', 'Permanent crops', 'Pastures', 'Complex cultivation patterns', 'Land principally occupied by agriculture, with significant areas of natural vegetation', 'Agro-forestry areas', 'Broad-leaved forest', 'Coniferous forest', 'Mixed forest', 'Natural grassland and sparsely vegetated areas', 'Moors, heathland and sclerophyllous vegetation', 'Transitional woodland, shrub', 'Beaches, dunes, sands', 'Inland wetlands', 'Coastal wetlands', 'Inland waters', 'Marine waters')#

List of classes for classification, semantic segmentation, and change detection.

__init__(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]#

Initialize a new CopernicusBenchBigEarthNetS2 instance.

Parameters:
  • root (str | PathLike[str]) – Root directory where dataset can be found.

  • split (Literal['train', 'val', 'test']) – One of ‘train’, ‘val’, or ‘test’.

  • bands (Sequence[str] | None) – Sequence of band names to load (defaults to all bands).

  • transforms (Callable[[dict[str, Any]], dict[str, Any]] | None) – A function/transform that takes an input sample and its target and returns a transformed version.

  • download (bool) – If True, download dataset and store it in the root directory.

  • checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]#

Return the sample at the given index.

Parameters:

index (int) – Index to return.

Returns:

Data and labels at that index.

Return type:

dict[str, Any]

class torchgeo.datasets.CopernicusBenchLC100ClsS3(root='data', split='train', mode='static', bands=None, transforms=None, download=False, checksum=False)[source]#

Bases: CopernicusBenchBase

Copernicus-Bench LC100Cls-S3 dataset.

LC100Cls-S3 is a multilabel land use/land cover classification dataset based on Sentinel-3 OLCI images and CGLS-LC100 land cover maps. CGLS-LC100 is a product in the Copernicus Global Land Service (CGLS) portfolio and delivers a global 23-class land cover map at 100 m spatial resolution.

This benchmark supports both static (1 image/location) and time series (1-4 images/location) modes; the former is used in the original benchmark.

Classes#

Value  Description
0      Unknown. No or not enough satellite data available.
20     Shrubs. Woody perennial plants with persistent and woody stems and without any defined main stem being less than 5 m tall. The shrub foliage can be either evergreen or deciduous.
30     Herbaceous vegetation. Plants without persistent stem or shoots above ground and lacking definite firm structure. Tree and shrub cover is less than 10 %.
40     Cultivated and managed vegetation / agriculture. Lands covered with temporary crops followed by harvest and a bare soil period (e.g., single and multiple cropping systems). Note that perennial woody crops will be classified as the appropriate forest or shrub land cover type.
50     Urban / built up. Land covered by buildings and other man-made structures.
60     Bare / sparse vegetation. Lands with exposed soil, sand, or rocks that never have more than 10 % vegetated cover during any time of the year.
70     Snow and ice. Lands under snow or ice cover throughout the year.
80     Permanent water bodies. Lakes, reservoirs, and rivers. Can be either fresh or salt-water bodies.
90     Herbaceous wetland. Lands with a permanent mixture of water and herbaceous or woody vegetation. The vegetation can be present in either salt, brackish, or fresh water.
100    Moss and lichen.
111    Closed forest, evergreen needle leaf. Tree canopy >70 %, almost all needle leaf trees remain green all year. Canopy is never without green foliage.
112    Closed forest, evergreen broad leaf. Tree canopy >70 %, almost all broadleaf trees remain green year round. Canopy is never without green foliage.
113    Closed forest, deciduous needle leaf. Tree canopy >70 %, consists of seasonal needle leaf tree communities with an annual cycle of leaf-on and leaf-off periods.
114    Closed forest, deciduous broad leaf. Tree canopy >70 %, consists of seasonal broadleaf tree communities with an annual cycle of leaf-on and leaf-off periods.
115    Closed forest, mixed.
116    Closed forest, not matching any of the other definitions.
121    Open forest, evergreen needle leaf. Top layer- trees 15-70 % and second layer- mixed of shrubs and grassland, almost all needle leaf trees remain green all year. Canopy is never without green foliage.
122    Open forest, evergreen broad leaf. Top layer- trees 15-70 % and second layer- mixed of shrubs and grassland, almost all broadleaf trees remain green year round. Canopy is never without green foliage.
123    Open forest, deciduous needle leaf. Top layer- trees 15-70 % and second layer- mixed of shrubs and grassland, consists of seasonal needle leaf tree communities with an annual cycle of leaf-on and leaf-off periods.
124    Open forest, deciduous broad leaf. Top layer- trees 15-70 % and second layer- mixed of shrubs and grassland, consists of seasonal broadleaf tree communities with an annual cycle of leaf-on and leaf-off periods.
125    Open forest, mixed.
126    Open forest, not matching any of the other definitions.
200    Oceans, seas. Can be either fresh or salt-water bodies.
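
The 23 map values above are not contiguous, while `classes` is a 23-element tuple in the same order. One plausible way to relate the two is a value-to-index lookup followed by multi-hot encoding, sketched below. Whether the dataset stores raw map values or indices is not stated here, so the mapping, the helper `encode`, and the example input are assumptions.

```python
# Map values in the order they appear in the table above.
VALUES = (
    0, 20, 30, 40, 50, 60, 70, 80, 90, 100,
    111, 112, 113, 114, 115, 116,
    121, 122, 123, 124, 125, 126,
    200,
)

# Assumed mapping from raw map value to position in `classes`.
VALUE_TO_INDEX = {value: index for index, value in enumerate(VALUES)}


def encode(values_present: set[int]) -> list[int]:
    """Build a multi-hot vector from the map values found in a patch."""
    vector = [0] * len(VALUES)
    for value in values_present:
        vector[VALUE_TO_INDEX[value]] = 1
    return vector


# Hypothetical patch containing herbaceous vegetation, closed evergreen
# broad leaf forest, and ocean.
encoded = encode({30, 112, 200})
active = [index for index, flag in enumerate(encoded) if flag]
```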

If you use this dataset in your research, please cite the following papers:

Added in version 0.7.

md5: str = '967d1da6286e0d0e346e425a8f3800e9'#

MD5 checksum.

zipfile: str = 'lc100_s3.zip'#

Zip file name.

filename = 'multilabel-{}.csv'#

Filename format of split files.

directory: str = 'lc100_s3'#

Subdirectory containing split files.

filename_regex = 'S3[AB]_(?P<date>\\d{8}T\\d{6})'#

Regular expression used to extract date from filename.

classes: tuple[str, ...] = ('Unknown', 'Shrubs', 'Herbaceous vegetation', 'Cultivated and managed vegetation / agriculture', 'Urban / built up', 'Bare / sparse vegetation', 'Snow and ice', 'Permanent water bodies', 'Herbaceous wetland', 'Moss and lichen', 'Closed forest, evergreen needle leaf', 'Closed forest, evergreen broad leaf', 'Closed forest, deciduous needle leaf', 'Closed forest, deciduous broad leaf', 'Closed forest, mixed', 'Closed forest, not matching any of the other definitions', 'Open forest, evergreen needle leaf', 'Open forest, evergreen broad leaf', 'Open forest, deciduous needle leaf', 'Open forest, deciduous broad leaf', 'Open forest, mixed', 'Open forest, not matching any of the other definitions', 'Oceans, seas')#

List of classes for classification, semantic segmentation, and change detection.

__init__(root='data', split='train', mode='static', bands=None, transforms=None, download=False, checksum=False)[source]#

Initialize a new CopernicusBenchLC100ClsS3 instance.

Parameters:
  • root (str | PathLike[str]) – Root directory where dataset can be found.

  • split (Literal['train', 'val', 'test']) – One of ‘train’, ‘val’, or ‘test’.

  • mode (Literal['static', 'time-series']) – One of ‘static’ or ‘time-series’.

  • bands (Sequence[str] | None) – Sequence of band names to load (defaults to all bands).

  • transforms (Callable[[dict[str, Any]], dict[str, Any]] | None) – A function/transform that takes an input sample and its target and returns a transformed version.

  • download (bool) – If True, download dataset and store it in the root directory.

  • checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]#

Return the sample at the given index.

Parameters:

index (int) – Index to return.

Returns:

Data and labels at that index.

Return type:

dict[str, Any]

class torchgeo.datasets.CopernicusBenchLC100SegS3(root='data', split='train', mode='static', bands=None, transforms=None, download=False, checksum=False)[source]#

Bases: CopernicusBenchBase

Copernicus-Bench LC100Seg-S3 dataset.

LC100Seg-S3 is a multilabel land use/land cover segmentation dataset based on Sentinel-3 OLCI images and CGLS-LC100 land cover maps. CGLS-LC100 is a product in the Copernicus Global Land Service (CGLS) portfolio and delivers a global 23-class land cover map at 100 m spatial resolution.

This benchmark supports both static (1 image/location) and time series (1-4 images/location) modes; the former is used in the original benchmark.

Classes#

Value  Description
0      Unknown. No or not enough satellite data available.
20     Shrubs. Woody perennial plants with persistent and woody stems and without any defined main stem being less than 5 m tall. The shrub foliage can be either evergreen or deciduous.
30     Herbaceous vegetation. Plants without persistent stem or shoots above ground and lacking definite firm structure. Tree and shrub cover is less than 10 %.
40     Cultivated and managed vegetation / agriculture. Lands covered with temporary crops followed by harvest and a bare soil period (e.g., single and multiple cropping systems). Note that perennial woody crops will be classified as the appropriate forest or shrub land cover type.
50     Urban / built up. Land covered by buildings and other man-made structures.
60     Bare / sparse vegetation. Lands with exposed soil, sand, or rocks that never have more than 10 % vegetated cover during any time of the year.
70     Snow and ice. Lands under snow or ice cover throughout the year.
80     Permanent water bodies. Lakes, reservoirs, and rivers. Can be either fresh or salt-water bodies.
90     Herbaceous wetland. Lands with a permanent mixture of water and herbaceous or woody vegetation. The vegetation can be present in either salt, brackish, or fresh water.
100    Moss and lichen.
111    Closed forest, evergreen needle leaf. Tree canopy >70 %, almost all needle leaf trees remain green all year. Canopy is never without green foliage.
112    Closed forest, evergreen broad leaf. Tree canopy >70 %, almost all broadleaf trees remain green year round. Canopy is never without green foliage.
113    Closed forest, deciduous needle leaf. Tree canopy >70 %, consists of seasonal needle leaf tree communities with an annual cycle of leaf-on and leaf-off periods.
114    Closed forest, deciduous broad leaf. Tree canopy >70 %, consists of seasonal broadleaf tree communities with an annual cycle of leaf-on and leaf-off periods.
115    Closed forest, mixed.
116    Closed forest, not matching any of the other definitions.
121    Open forest, evergreen needle leaf. Top layer- trees 15-70 % and second layer- mixed of shrubs and grassland, almost all needle leaf trees remain green all year. Canopy is never without green foliage.
122    Open forest, evergreen broad leaf. Top layer- trees 15-70 % and second layer- mixed of shrubs and grassland, almost all broadleaf trees remain green year round. Canopy is never without green foliage.
123    Open forest, deciduous needle leaf. Top layer- trees 15-70 % and second layer- mixed of shrubs and grassland, consists of seasonal needle leaf tree communities with an annual cycle of leaf-on and leaf-off periods.
124    Open forest, deciduous broad leaf. Top layer- trees 15-70 % and second layer- mixed of shrubs and grassland, consists of seasonal broadleaf tree communities with an annual cycle of leaf-on and leaf-off periods.
125    Open forest, mixed.
126    Open forest, not matching any of the other definitions.
200    Oceans, seas. Can be either fresh or salt-water bodies.

If you use this dataset in your research, please cite the following papers:

Added in version 0.7.

md5: str = '967d1da6286e0d0e346e425a8f3800e9'#

MD5 checksum.

zipfile: str = 'lc100_s3.zip'#

Zip file name.

filename = 'multilabel-{}.csv'#

Filename format of split files.

directory: str = 'lc100_s3'#

Subdirectory containing split files.

filename_regex = 'S3[AB]_(?P<date>\\d{8}T\\d{6})'#

Regular expression used to extract date from filename.

cmap: str | Colormap = <matplotlib.colors.ListedColormap object>#

Matplotlib color map for semantic segmentation and change detection plots.

classes: tuple[str, ...] = ('Unknown', 'Shrubs', 'Herbaceous vegetation', 'Cultivated and managed vegetation / agriculture', 'Urban / built up', 'Bare / sparse vegetation', 'Snow and ice', 'Permanent water bodies', 'Herbaceous wetland', 'Moss and lichen', 'Closed forest, evergreen needle leaf', 'Closed forest, evergreen broad leaf', 'Closed forest, deciduous needle leaf', 'Closed forest, deciduous broad leaf', 'Closed forest, mixed', 'Closed forest, not matching any of the other definitions', 'Open forest, evergreen needle leaf', 'Open forest, evergreen broad leaf', 'Open forest, deciduous needle leaf', 'Open forest, deciduous broad leaf', 'Open forest, mixed', 'Open forest, not matching any of the other definitions', 'Oceans, seas')#

List of classes for classification, semantic segmentation, and change detection.

__init__(root='data', split='train', mode='static', bands=None, transforms=None, download=False, checksum=False)[source]#

Initialize a new CopernicusBenchLC100SegS3 instance.

Parameters:
  • root (str | PathLike[str]) – Root directory where dataset can be found.

  • split (Literal['train', 'val', 'test']) – One of ‘train’, ‘val’, or ‘test’.

  • mode (Literal['static', 'time-series']) – One of ‘static’ or ‘time-series’.

  • bands (Sequence[str] | None) – Sequence of band names to load (defaults to all bands).

  • transforms (Callable[[dict[str, Any]], dict[str, Any]] | None) – A function/transform that takes an input sample and its target and returns a transformed version.

  • download (bool) – If True, download dataset and store it in the root directory.

  • checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]#

Return the sample at the given index.

Parameters:

index (int) – Index to return.

Returns:

Data and labels at that index.

Return type:

dict[str, Any]

class torchgeo.datasets.CopernicusBenchDFC2020S1(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]#

Bases: CopernicusBenchBase

Copernicus-Bench DFC2020-S1 dataset.

DFC2020-S1 is a land use/land cover segmentation dataset derived from the IEEE GRSS Data Fusion Contest 2020 (DFC2020).

If you use this dataset in your research, please cite the following papers:

Added in version 0.7.

md5: str = 'f10ba017dab6f38b7a6857b169ea924b'#

MD5 checksum.

zipfile: str = 'dfc2020.zip'#

Zip file name.

directory: str = 'dfc2020_s1s2'#

Subdirectory containing split files.

filename = 'dfc-{}-new.csv'#

Filename format of split files.

classes: tuple[str, ...] = ('Background', 'Forest', 'Shrubland', 'Savanna', 'Grassland', 'Wetlands', 'Croplands', 'Urban/Built-up', 'Snow/Ice', 'Barren', 'Water')#

List of classes for classification, semantic segmentation, and change detection.

cmap: str | Colormap = <matplotlib.colors.ListedColormap object>#

Matplotlib color map for semantic segmentation and change detection plots.

__getitem__(index)[source]#

Return the sample at the given index.

Parameters:

index (int) – Index to return.

Returns:

Data and labels at that index.

Return type:

dict[str, Any]

class torchgeo.datasets.CopernicusBenchDFC2020S2(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]#

Bases: CopernicusBenchBase

Copernicus-Bench DFC2020-S2 dataset.

DFC2020-S2 is a land use/land cover segmentation dataset derived from the IEEE GRSS Data Fusion Contest 2020 (DFC2020).

If you use this dataset in your research, please cite the following papers:

Added in version 0.7.

md5: str = 'f10ba017dab6f38b7a6857b169ea924b'#

MD5 checksum.

zipfile: str = 'dfc2020.zip'#

Zip file name.

directory: str = 'dfc2020_s1s2'#

Subdirectory containing split files.

filename = 'dfc-{}-new.csv'#

Filename format of split files.

classes: tuple[str, ...] = ('Background', 'Forest', 'Shrubland', 'Savanna', 'Grassland', 'Wetlands', 'Croplands', 'Urban/Built-up', 'Snow/Ice', 'Barren', 'Water')#

List of classes for classification, semantic segmentation, and change detection.

cmap: str | Colormap = <matplotlib.colors.ListedColormap object>#

Matplotlib color map for semantic segmentation and change detection plots.

__getitem__(index)[source]#

Return the sample at the given index.

Parameters:

index (int) – Index to return.

Returns:

Data and labels at that index.

Return type:

dict[str, Any]

class torchgeo.datasets.CopernicusBenchFloodS1(root='data', split='train', mode=1, bands=None, transforms=None, download=False, checksum=False)[source]#

Bases: CopernicusBenchBase

Copernicus-Bench Flood-S1 dataset.

Flood-S1 is a flood segmentation dataset extracted from Kuro Siwo, a large flood mapping dataset.

If you use this dataset in your research, please cite the following papers:

Added in version 0.7.

md5: str = 'f4337fee5e90203c6d0c3efeb0b97b8a'#

MD5 checksum.

zipfile: str = 'flood_s1.zip'#

Zip file name.

directory: str = 'flood_s1'#

Subdirectory containing split files.

filename = 'grid_dict_{}.json'#

Filename format of split files.

filename_regex = '.{18}_(?P<date>\\d{8})'#

Regular expression used to extract date from filename.

date_format = '%Y%m%d'#

Date format string used to parse date from filename.

cmap: str | Colormap = <matplotlib.colors.ListedColormap object>#

Matplotlib color map for semantic segmentation and change detection plots.

classes: tuple[str, ...] = ('No Water', 'Permanent Waters', 'Floods')#

List of classes for classification, semantic segmentation, and change detection.

__init__(root='data', split='train', mode=1, bands=None, transforms=None, download=False, checksum=False)[source]#

Initialize a new CopernicusBenchFloodS1 instance.

Parameters:
  • root (str | PathLike[str]) – Root directory where dataset can be found.

  • split (Literal['train', 'val', 'test']) – One of ‘train’, ‘val’, or ‘test’.

  • mode (Literal[1, 2]) – Number of pre-flood images, 1 or 2.

  • bands (Sequence[str] | None) – Sequence of band names to load (defaults to all bands).

  • transforms (Callable[[dict[str, Any]], dict[str, Any]] | None) – A function/transform that takes an input sample and its target and returns a transformed version.

  • download (bool) – If True, download dataset and store it in the root directory.

  • checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.
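
A possible way to use the constructor documented above is sketched here. It relies only on the parameters listed in the signature and on `DatasetNotFoundError`. `load_flood_s1` is a hypothetical wrapper, and the choice of `split='val'` and `mode=2` is purely illustrative.

```python
from typing import Any, Optional


def load_flood_s1(root: str = 'data') -> Optional[Any]:
    """Return the validation split with two pre-flood images, or None if absent."""
    # Imported lazily so the sketch can be imported without torchgeo installed.
    from torchgeo.datasets import CopernicusBenchFloodS1, DatasetNotFoundError

    try:
        # download keeps its default (False), so nothing is fetched here.
        return CopernicusBenchFloodS1(root=root, split='val', mode=2)
    except DatasetNotFoundError:
        # Raised when the data is not under ``root`` and download is False.
        return None
```

The keys of the dict returned by indexing the dataset are not listed on this page, so it is safer to inspect `dataset[0].keys()` than to assume them.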

__getitem__(index)[source]#

Return the sample at the given index.

Parameters:

index (int) – Index of the sample to return.

Returns:

Data and labels at that index.

Return type:

dict[str, Any]

class torchgeo.datasets.CopernicusBenchLCZS2(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]#

Bases: CopernicusBenchBase

Copernicus-Bench LCZ-S2 dataset.

LCZ-S2 is a multi-class scene classification dataset derived from So2Sat-LCZ42, a large-scale local climate zone classification dataset.

If you use this dataset in your research, please cite the following papers:

Note

This dataset requires the following additional library to be installed:

Added in version 0.7.

filename = 'lcz_{}.h5'#

Filename format of split files.

classes: tuple[str, ...] = ('Compact high rise', 'Compact mid rise', 'Compact low rise', 'Open high rise', 'Open mid rise', 'Open low rise', 'Lightweight low rise', 'Large low rise', 'Sparsely built', 'Heavy industry', 'Dense trees', 'Scattered trees', 'Bush, scrub', 'Low plants', 'Bare rock or paved', 'Bare soil or sand', 'Water')#

List of classes for classification, semantic segmentation, and change detection.
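
Local climate zones are conventionally labelled 1 to 10 for built types and A to G for land-cover types, and the 17 names above follow that order. The sketch below maps a label to its conventional code. It assumes labels are zero-based positions in `classes`, and `lcz_code` is a hypothetical helper.

```python
# Class names copied from the attribute above.
classes = (
    'Compact high rise', 'Compact mid rise', 'Compact low rise',
    'Open high rise', 'Open mid rise', 'Open low rise',
    'Lightweight low rise', 'Large low rise', 'Sparsely built',
    'Heavy industry', 'Dense trees', 'Scattered trees', 'Bush, scrub',
    'Low plants', 'Bare rock or paved', 'Bare soil or sand', 'Water',
)


def lcz_code(index: int) -> str:
    """Map a zero-based class index to its LCZ code ('1'..'10', 'A'..'G')."""
    if not 0 <= index < len(classes):
        raise IndexError(f'class index out of range: {index}')
    # The first ten classes are built types; the remaining seven are land cover.
    return str(index + 1) if index < 10 else 'ABCDEFG'[index - 10]


for i in (0, 9, 10, 16):
    print(lcz_code(i), classes[i])
```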

__init__(root='data', split='train', bands=None, transforms=None, download=False, checksum=False)[source]#

Initialize a new CopernicusBenchLCZS2 instance.

Parameters:
  • root (str | PathLike[str]) – Root directory where dataset can be found.

  • split (Literal['train', 'val', 'test']) – One of ‘train’, ‘val’, or ‘test’.

  • bands (Sequence[str] | None) – Sequence of band names to load (defaults to all bands).

  • transforms (Callable[[dict[str, Any]], dict[str, Any]] | None) – A function/transform that takes an input sample and its target and returns a transformed version.

  • download (bool) – If True, download dataset and store it in the root directory.

  • checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__len__()[source]#

Return the length of the dataset.

Returns:

Length of the dataset.

Return type:

int

__getitem__(index)[source]#

Return the sample at the given index.

Parameters:

index (int) – Index of the sample to return.

Returns:

Data and labels at that index.

Return type:

dict[str, Any]

class torchgeo.datasets.CopernicusBenchBiomassS3(root='data', split='train', mode='static', bands=None, transforms=None, download=False, checksum=False)[source]#

Bases: CopernicusBenchBase

Copernicus-Bench Biomass-S3 dataset.

Biomass-S3 is a regression dataset based on Sentinel-3 OLCI images and CCI biomass. The biomass product is part of the European Space Agency’s Climate Change Initiative (CCI) program and delivers global forest above-ground biomass at 100 m spatial resolution.

This benchmark supports both static (1 image/location) and time-series (1-4 images/location) modes; the former is used in the original benchmark.

If you use this dataset in your research, please cite the following papers:

Added in version 0.7.

md5: str = '4769ab8c2c23cd8957b99e15e071931c'#

MD5 checksum.

zipfile: str = 'biomass_s3.zip'#

Zip file name.

directory: str = 'biomass_s3'#

Subdirectory containing split files.

filename = 'static_fnames-{}.csv'#

Filename format of split files.

dtype: dtype = torch.float32#

Mask dtype to cast to, either torch.long for classification or torch.float for regression.

filename_regex = 'S3[AB]_(?P<date>\\d{8}T\\d{6})'#

Regular expression used to extract date from filename.
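
Because this class does not list its own `date_format`, the base-class default `'%Y%m%dT%H%M%S'` presumably applies. Under that assumption, the sketch below extracts acquisition times and orders a few filenames chronologically, as one might for a time series. The filenames are invented for illustration, and the use of `re.search` is an assumption.

```python
import re
from datetime import datetime
from typing import Optional

# Regex copied from the attribute above; date format is the base-class default.
filename_regex = r'S3[AB]_(?P<date>\d{8}T\d{6})'
date_format = '%Y%m%dT%H%M%S'


def acquisition_time(name: str) -> Optional[datetime]:
    """Return the timestamp embedded in ``name``, or None if absent."""
    match = re.search(filename_regex, name)
    if match is None:
        return None
    return datetime.strptime(match.group('date'), date_format)


# Hypothetical Sentinel-3A/3B filenames, deliberately out of order.
names = ['S3B_20210901T093000_x.tif', 'S3A_20210301T101530_x.tif']
ordered = sorted(names, key=acquisition_time)
print(ordered)
```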

cmap: str | Colormap = 'YlGn'#

Matplotlib color map for semantic segmentation and change detection plots.

__init__(root='data', split='train', mode='static', bands=None, transforms=None, download=False, checksum=False)[source]#

Initialize a new CopernicusBenchBiomassS3 instance.

Parameters:
  • root (str | PathLike[str]) – Root directory where dataset can be found.

  • split (Literal['train', 'val', 'test']) – One of ‘train’, ‘val’, or ‘test’.

  • mode (Literal['static', 'time-series']) – One of ‘static’ or ‘time-series’.

  • bands (Sequence[str] | None) – Sequence of band names to load (defaults to all bands).

  • transforms (Callable[[dict[str, Any]], dict[str, Any]] | None) – A function/transform that takes an input sample and its target and returns a transformed version.

  • download (bool) – If True, download dataset and store it in the root directory.

  • checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]#

Return the sample at the given index.

Parameters:

index (int) – Index of the sample to return.

Returns:

Data and labels at that index.

Return type:

dict[str, Any]

class torchgeo.datasets.CopernicusBenchAQNO2S5P(root='data', split='train', mode='annual', bands=None, transforms=None, download=False, checksum=False)[source]#

Bases: CopernicusBenchBase

Copernicus-Bench AQ-NO2-S5P dataset.

AQ-NO2-S5P is a regression dataset based on Sentinel-5P NO2 images and EEA air quality data products. Specifically, this dataset combines 2021 measurements of NO2 (annual average concentration) from EEA with S5P NO2 (“tropospheric NO2 column number density”) from GEE.

This benchmark supports both annual (1 image/location) and seasonal (4 images/location) modes; the former is used in the original benchmark.

If you use this dataset in your research, please cite the following papers:

Added in version 0.7.

md5: str = '92081c7437c5c1daf783868ad7669877'#

MD5 checksum.

zipfile: str = 'airquality_s5p.zip'#

Zip file name.

directory: str = 'airquality_s5p/no2'#

Subdirectory containing split files.

filename = '{}.csv'#

Filename format of split files.

dtype: dtype = torch.float32#

Mask dtype to cast to, either torch.long for classification or torch.float for regression.

filename_regex = '(?P<start>\\d{4}-\\d{2}-\\d{2})_(?P<stop>\\d{4}-\\d{2}-\\d{2})'#

Regular expression used to extract date from filename.

date_format = '%Y-%m-%d'#

Date format string used to parse date from filename.
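
Unlike the single-date patterns used elsewhere, this regular expression captures two named groups, `start` and `stop`. A minimal sketch of parsing both with the `date_format` above follows. The filename is hypothetical, and `parse_interval` is an illustrative helper rather than torchgeo API.

```python
import re
from datetime import date, datetime
from typing import Tuple

# Values copied from the attributes above (regex written as a raw string).
filename_regex = r'(?P<start>\d{4}-\d{2}-\d{2})_(?P<stop>\d{4}-\d{2}-\d{2})'
date_format = '%Y-%m-%d'


def parse_interval(name: str) -> Tuple[date, date]:
    """Return the (start, stop) dates embedded in ``name``."""
    match = re.search(filename_regex, name)
    if match is None:
        raise ValueError(f'no date interval found in {name!r}')
    start = datetime.strptime(match.group('start'), date_format).date()
    stop = datetime.strptime(match.group('stop'), date_format).date()
    return start, stop


# Hypothetical filename covering the first quarter of 2021.
start, stop = parse_interval('2021-01-01_2021-04-01.tif')
print(start, stop, (stop - start).days)
```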

cmap: str | Colormap = 'Wistia'#

Matplotlib color map for semantic segmentation and change detection plots.

__init__(root='data', split='train', mode='annual', bands=None, transforms=None, download=False, checksum=False)[source]#

Initialize a new CopernicusBenchAQNO2S5P instance.

Parameters:
  • root (str | PathLike[str]) – Root directory where dataset can be found.

  • split (Literal['train', 'val', 'test']) – One of ‘train’, ‘val’, or ‘test’.

  • mode (Literal['annual', 'seasonal']) – One of ‘annual’ or ‘seasonal’.

  • bands (Sequence[str] | None) – Sequence of band names to load (defaults to all bands).

  • transforms (Callable[[dict[str, Any]], dict[str, Any]] | None) – A function/transform that takes an input sample and its target and returns a transformed version.

  • download (bool) – If True, download dataset and store it in the root directory.

  • checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]#

Return the sample at the given index.

Parameters:

index (int) – Index of the sample to return.

Returns:

Data and labels at that index.

Return type:

dict[str, Any]

class torchgeo.datasets.CopernicusBenchAQO3S5P(root='data', split='train', mode='annual', bands=None, transforms=None, download=False, checksum=False)[source]#

Bases: CopernicusBenchBase

Copernicus-Bench AQ-O3-S5P dataset.

AQ-O3-S5P is a regression dataset based on Sentinel-5P O3 images and EEA air quality data products. Specifically, this dataset combines 2021 measurements of O3 (93.2 percentile of maximum daily 8-hour means, SOMO35) from EEA with S5P O3 (“O3 column number density”) from GEE.

This benchmark supports both annual (1 image/location) and seasonal (4 images/location) modes; the former is used in the original benchmark.

If you use this dataset in your research, please cite the following papers:

Added in version 0.7.

md5: str = '92081c7437c5c1daf783868ad7669877'#

MD5 checksum.

zipfile: str = 'airquality_s5p.zip'#

Zip file name.

directory: str = 'airquality_s5p/o3'#

Subdirectory containing split files.

filename = '{}.csv'#

Filename format of split files.

dtype: dtype = torch.float32#

Mask dtype to cast to, either torch.long for classification or torch.float for regression.

filename_regex = '(?P<start>\\d{4}-\\d{2}-\\d{2})_(?P<stop>\\d{4}-\\d{2}-\\d{2})'#

Regular expression used to extract date from filename.

date_format = '%Y-%m-%d'#

Date format string used to parse date from filename.

cmap: str | Colormap = 'Wistia'#

Matplotlib color map for semantic segmentation and change detection plots.

__init__(root='data', split='train', mode='annual', bands=None, transforms=None, download=False, checksum=False)[source]#

Initialize a new CopernicusBenchAQO3S5P instance.

Parameters:
  • root (str | PathLike[str]) – Root directory where dataset can be found.

  • split (Literal['train', 'val', 'test']) – One of ‘train’, ‘val’, or ‘test’.

  • mode (Literal['annual', 'seasonal']) – One of ‘annual’ or ‘seasonal’.

  • bands (Sequence[str] | None) – Sequence of band names to load (defaults to all bands).

  • transforms (Callable[[dict[str, Any]], dict[str, Any]] | None) – A function/transform that takes an input sample and its target and returns a transformed version.

  • download (bool) – If True, download dataset and store it in the root directory.

  • checksum (bool) – If True, check the MD5 of the downloaded files (may be slow).

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

__getitem__(index)[source]#

Return the sample at the given index.

Parameters:

index (int) – Index of the sample to return.

Returns:

Data and labels at that index.

Return type:

dict[str, Any]