BigEarthNet

class torchgeo.datasets.BigEarthNet(root='data', split='train', bands='all', num_classes=19, transforms=None, download=False, checksum=False)

Bases: NonGeoDataset

BigEarthNet dataset.

BigEarthNet is a dataset for multi-label scene classification of remote sensing imagery.

Dataset features:

  • 590,326 patches from 125 Sentinel-1 and Sentinel-2 tiles

  • Imagery from tiles in Europe between June 2017 and May 2018

  • 12 spectral bands with 10-60 m per pixel resolution (base 120x120 px)

  • 2 synthetic aperture radar bands (120x120 px)

  • 43 or 19 scene classes from the 2018 CORINE Land Cover database (CLC 2018)

Dataset format:

  • each image is composed of multiple single-channel GeoTIFFs

  • labels are multi-label, stored in a single JSON file per image

  • the mapping of Sentinel-1 to Sentinel-2 patches is stored in the Sentinel-1 JSON files

  • Sentinel-1 bands: (VV, VH)

  • Sentinel-2 bands: (B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12)

  • All bands: (VV, VH, B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12)

  • Sentinel-2 bands have different spatial resolutions and are upsampled to 10 m
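As a rough illustration of the band lists above, the sketch below maps a bands value (s1, s2, or all) to band names and looks up channel positions. The helper names band_names and channel_indices are hypothetical and not part of torchgeo, and the sketch assumes that image channels follow the order in which the bands are listed here.

```python
# Band names copied from the "Dataset format" list above.
S1_BANDS = ("VV", "VH")
S2_BANDS = (
    "B01", "B02", "B03", "B04", "B05", "B06",
    "B07", "B08", "B8A", "B09", "B11", "B12",
)
ALL_BANDS = S1_BANDS + S2_BANDS


def band_names(bands: str = "all") -> tuple[str, ...]:
    """Return the band names implied by a bands value of s1, s2, or all."""
    options = {"s1": S1_BANDS, "s2": S2_BANDS, "all": ALL_BANDS}
    if bands not in options:
        raise ValueError(f"bands must be one of {sorted(options)}, got {bands!r}")
    return options[bands]


def channel_indices(wanted: list[str], bands: str = "all") -> list[int]:
    """Map band names to channel positions (assumes channels follow the listed order)."""
    names = band_names(bands)
    return [names.index(name) for name in wanted]


# Red, green, blue correspond to Sentinel-2 bands B04, B03, B02.
rgb = channel_indices(["B04", "B03", "B02"], bands="all")
print(rgb)  # [5, 4, 3]
```

With bands='s2' the same lookup yields smaller indices, because the two Sentinel-1 channels are no longer in front.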

Dataset classes (43):

  1. Continuous urban fabric

  2. Discontinuous urban fabric

  3. Industrial or commercial units

  4. Road and rail networks and associated land

  5. Port areas

  6. Airports

  7. Mineral extraction sites

  8. Dump sites

  9. Construction sites

  10. Green urban areas

  11. Sport and leisure facilities

  12. Non-irrigated arable land

  13. Permanently irrigated land

  14. Rice fields

  15. Vineyards

  16. Fruit trees and berry plantations

  17. Olive groves

  18. Pastures

  19. Annual crops associated with permanent crops

  20. Complex cultivation patterns

  21. Land principally occupied by agriculture, with significant areas of natural vegetation

  22. Agro-forestry areas

  23. Broad-leaved forest

  24. Coniferous forest

  25. Mixed forest

  26. Natural grassland

  27. Moors and heathland

  28. Sclerophyllous vegetation

  29. Transitional woodland/shrub

  30. Beaches, dunes, sands

  31. Bare rock

  32. Sparsely vegetated areas

  33. Burnt areas

  34. Inland marshes

  35. Peatbogs

  36. Salt marshes

  37. Salines

  38. Intertidal flats

  39. Water courses

  40. Water bodies

  41. Coastal lagoons

  42. Estuaries

  43. Sea and ocean

Dataset classes (19):

  1. Urban fabric

  2. Industrial or commercial units

  3. Arable land

  4. Permanent crops

  5. Pastures

  6. Complex cultivation patterns

  7. Land principally occupied by agriculture, with significant areas of natural vegetation

  8. Agro-forestry areas

  9. Broad-leaved forest

  10. Coniferous forest

  11. Mixed forest

  12. Natural grassland and sparsely vegetated areas

  13. Moors, heathland and sclerophyllous vegetation

  14. Transitional woodland, shrub

  15. Beaches, dunes, sands

  16. Inland wetlands

  17. Coastal wetlands

  18. Inland waters

  19. Marine waters
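To make the multi-label target concrete, here is a minimal sketch that encodes a set of class names as a multi-hot vector over the 19 classes listed above and decodes it back. The helpers encode_multi_hot and decode_multi_hot are hypothetical, and the sketch assumes that target indices are zero-based and follow the ordering of this list (class 1 maps to index 0).

```python
# The 19 class names, in the order listed above.
CLASSES_19 = (
    "Urban fabric",
    "Industrial or commercial units",
    "Arable land",
    "Permanent crops",
    "Pastures",
    "Complex cultivation patterns",
    "Land principally occupied by agriculture, with significant areas of natural vegetation",
    "Agro-forestry areas",
    "Broad-leaved forest",
    "Coniferous forest",
    "Mixed forest",
    "Natural grassland and sparsely vegetated areas",
    "Moors, heathland and sclerophyllous vegetation",
    "Transitional woodland, shrub",
    "Beaches, dunes, sands",
    "Inland wetlands",
    "Coastal wetlands",
    "Inland waters",
    "Marine waters",
)


def encode_multi_hot(labels, classes=CLASSES_19) -> list[int]:
    """Return a 0/1 vector with a 1 at the position of every given class name."""
    index = {name: i for i, name in enumerate(classes)}
    target = [0] * len(classes)
    for label in labels:
        target[index[label]] = 1  # raises KeyError for unknown class names
    return target


def decode_multi_hot(target, classes=CLASSES_19) -> list[str]:
    """Return the class names whose position in the vector is set."""
    return [name for name, flag in zip(classes, target) if flag]


target = encode_multi_hot(["Pastures", "Mixed forest"])
print(decode_multi_hot(target))  # ['Pastures', 'Mixed forest']
```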

The source for the above dataset classes, their respective ordering, and 43-to-19-class mappings can be found here:

If you use this dataset in your research, please cite the following paper:

__init__(root='data', split='train', bands='all', num_classes=19, transforms=None, download=False, checksum=False)

Initialize a new BigEarthNet dataset instance.

Parameters:
  • root (str | PathLike[str]) – root directory where dataset can be found

  • split (str) – train/val/test split to load

  • bands (str) – which bands to load: Sentinel-1, Sentinel-2, or both; one of {s1, s2, all}

  • num_classes (int) – number of classes in the target; one of {19, 43}

  • transforms (Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and its target and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

DatasetNotFoundError – If dataset is not found and download is False.
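The parameter constraints above can be checked before the dataset is constructed. The following is a sketch under stated assumptions: bigearthnet_kwargs and build_bigearthnet are hypothetical helpers, the accepted split names are taken from the "train/val/test" wording above, and the constructor may perform its own, different validation.

```python
VALID_SPLITS = ("train", "val", "test")
VALID_BANDS = ("s1", "s2", "all")
VALID_NUM_CLASSES = (19, 43)


def _require(name, value, allowed):
    """Raise ValueError if value is not one of the documented options."""
    if value not in allowed:
        raise ValueError(f"{name} must be one of {allowed}, got {value!r}")


def bigearthnet_kwargs(root="data", split="train", bands="all",
                       num_classes=19, download=False, checksum=False) -> dict:
    """Validate arguments against the documented options and return them as kwargs."""
    _require("split", split, VALID_SPLITS)
    _require("bands", bands, VALID_BANDS)
    _require("num_classes", num_classes, VALID_NUM_CLASSES)
    return {
        "root": root,
        "split": split,
        "bands": bands,
        "num_classes": num_classes,
        "download": download,
        "checksum": checksum,
    }


def build_bigearthnet(**overrides):
    """Construct the dataset; torchgeo is imported lazily so the checks work without it."""
    from torchgeo.datasets import BigEarthNet

    return BigEarthNet(**bigearthnet_kwargs(**overrides))
```

Calling build_bigearthnet(bands="s2", num_classes=43) would then load only the Sentinel-2 bands with the 43-class labels, provided the data is already present under root or download=True is passed.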

__getitem__(index)

Return the sample at the given index.

Parameters:

index (int) – index of the sample to return

Returns:

data and label at that index

Return type:

dict[str, Any]
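Because each sample is a dict, a transforms callable only needs to accept a dict and return a dict. The sketch below chains such callables; compose and scale_image are hypothetical helpers, and the "image" and "label" keys are an assumption about the sample layout rather than something stated on this page.

```python
from typing import Any, Callable

Sample = dict[str, Any]
Transform = Callable[[Sample], Sample]


def compose(*transforms: Transform) -> Transform:
    """Chain several dict-to-dict transforms into a single callable."""
    def apply(sample: Sample) -> Sample:
        for transform in transforms:
            sample = transform(sample)
        return sample
    return apply


def scale_image(factor: float) -> Transform:
    """Build a transform that multiplies sample["image"] by a constant factor."""
    def transform(sample: Sample) -> Sample:
        out = dict(sample)  # shallow copy, so the caller's dict is not modified
        out["image"] = out["image"] * factor
        return out
    return transform


# Works with anything that supports multiplication, e.g. a float or a tensor.
pipeline = compose(scale_image(0.5), scale_image(0.5))
result = pipeline({"image": 8.0, "label": [0, 1]})
```

A callable built this way could be passed as the transforms argument; the other keys of the sample are passed through unchanged.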

__len__()

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int

plot(sample, show_titles=True, suptitle=None)

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure

Added in version 0.2.
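Since plot() returns a matplotlib Figure, the result can be written to disk with the figure's savefig method. The helper below is a hypothetical convenience wrapper, not part of torchgeo; it works with any dataset object that offers the __getitem__ and plot methods documented above.

```python
def save_sample_figure(dataset, index: int, path: str, suptitle=None):
    """Plot the sample at the given index and save the figure to path."""
    sample = dataset[index]
    fig = dataset.plot(sample, show_titles=True, suptitle=suptitle)
    fig.savefig(path)  # matplotlib infers the file format from the extension
    return fig
```

For example, save_sample_figure(ds, 0, "sample0.png", suptitle="Sample 0") would render the first sample of a dataset instance ds.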

class torchgeo.datasets.BigEarthNetV2(root='data', split='train', bands='all', transforms=None, download=False, checksum=False)

Bases: NonGeoDataset

BigEarthNetV2 dataset.

The BigEarthNet V2 dataset adds improved labels, improved geospatial data splits, and pixel-level labels from the 2018 CORINE Land Cover (CLC) map. Some problematic patches from V1 have also been removed.

If you use this dataset in your research, please cite the following paper:

Added in version 0.7.

__init__(root='data', split='train', bands='all', transforms=None, download=False, checksum=False)

Initialize a new BigEarthNet V2 dataset instance.

Parameters:
  • root (str | PathLike[str]) – root directory where dataset can be found

  • split (str) – train/val/test split to load

  • bands (str) – which bands to load: Sentinel-1, Sentinel-2, or both; one of {s1, s2, all}

  • transforms (Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and its target and returns a transformed version

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)

Raises:

__len__()

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int
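One simple use of __len__ is to compare the sizes of the train, val, and test splits. The sketch below assumes a caller-supplied factory that builds a dataset for a given split name; split_sizes is a hypothetical helper, not part of torchgeo.

```python
from collections.abc import Callable, Sized


def split_sizes(make_dataset: Callable[[str], Sized],
                splits=("train", "val", "test")) -> dict[str, int]:
    """Build one dataset per split and record its length."""
    return {split: len(make_dataset(split)) for split in splits}
```

With torchgeo installed and the data available, the factory could be something like lambda split: BigEarthNetV2(root="data", split=split).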

__getitem__(index)

Return the sample at the given index.

Parameters:

index (int) – index of the sample to return

Returns:

data and label at that index

Return type:

dict[str, Any]

plot(sample, show_titles=True, suptitle=None)

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure