So2Sat

class torchgeo.datasets.So2Sat(root='data', version='2', split='train', bands=('S1_B1', 'S1_B2', 'S1_B3', 'S1_B4', 'S1_B5', 'S1_B6', 'S1_B7', 'S1_B8', 'S2_B02', 'S2_B03', 'S2_B04', 'S2_B05', 'S2_B06', 'S2_B07', 'S2_B08', 'S2_B8A', 'S2_B11', 'S2_B12'), transforms=None, checksum=False)

Bases: NonGeoDataset

So2Sat dataset.

The So2Sat dataset consists of co-registered synthetic aperture radar and multispectral optical image patches acquired by the Sentinel-1 and Sentinel-2 remote sensing satellites, each paired with a local climate zone (LCZ) label. The dataset is distributed over 42 cities across different continents and cultural regions of the world, and comes with a variety of different splits.

This implementation covers the 2nd and 3rd versions of the dataset as described in the authors' GitHub repository: zhu-xlab/So2Sat-LCZ42.

The different versions are as follows:

Version 2: This version contains imagery from 52 cities and is split into train/val/test as follows:

  • Training: 42 cities around the world

  • Validation: western half of 10 other cities covering 10 cultural zones

  • Testing: eastern half of the 10 other cities

Version 3: A version of the dataset with 3 different train/test splits, as follows:

  • Random split: every city 80% training / 20% testing (randomly sampled)

  • Block split: every city is split in a geospatial 80%/20%-manner

  • Cultural 10: 10 cities from different cultural zones are held back for testing purposes

Dataset classes:

  1. Compact high rise

  2. Compact middle rise

  3. Compact low rise

  4. Open high rise

  5. Open mid rise

  6. Open low rise

  7. Lightweight low rise

  8. Large low rise

  9. Sparsely built

  10. Heavy industry

  11. Dense trees

  12. Scattered trees

  13. Bush, scrub

  14. Low plants

  15. Bare rock or paved

  16. Bare soil or sand

  17. Water
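The class list above can be turned into a simple lookup table. The following is a minimal sketch, not part of the torchgeo API: the names `LCZ_CLASSES` and `lcz_name` are hypothetical, and it assumes labels are zero-based integers, so that label 0 corresponds to class 1 in the list.

```python
# Class names copied from the list above, in the same order.
LCZ_CLASSES = (
    "Compact high rise",
    "Compact middle rise",
    "Compact low rise",
    "Open high rise",
    "Open mid rise",
    "Open low rise",
    "Lightweight low rise",
    "Large low rise",
    "Sparsely built",
    "Heavy industry",
    "Dense trees",
    "Scattered trees",
    "Bush, scrub",
    "Low plants",
    "Bare rock or paved",
    "Bare soil or sand",
    "Water",
)


def lcz_name(label: int) -> str:
    """Return the class name for a zero-based label (assumed convention)."""
    if not 0 <= label < len(LCZ_CLASSES):
        raise IndexError(f"label must be in [0, {len(LCZ_CLASSES) - 1}], got {label}")
    return LCZ_CLASSES[label]
```

If the labels you work with are one-based or one-hot encoded, convert them first (for example by subtracting 1 or taking the argmax) before calling the helper.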

If you use this dataset in your research, please cite the following paper:

Note

The version 2 dataset can be automatically downloaded using the following bash script:

for split in training validation testing
do
    wget ftp://m1483140:m1483140@dataserv.ub.tum.de/$split.h5
done

or manually downloaded from https://dataserv.ub.tum.de/index.php/s/m1483140. This download will likely take several hours.

The version 3 datasets can be downloaded using the following bash script:

for version in random block culture_10
do
  for split in training testing
  do
    wget -P $version/ ftp://m1613658:m1613658@dataserv.ub.tum.de/$version/$split.h5
  done
done

or manually downloaded from https://mediatum.ub.tum.de/1613658
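The two download scripts imply a directory layout under root: version 2 files sit at the top level, while each version 3 split lives in its own subdirectory. The sketch below maps a (version, split) pair to the relative file path that layout suggests. The helper name `expected_file` and both lookup tables are hypothetical and inferred from the scripts, not taken from the torchgeo source. It also assumes version 3 has no validation split, since those scripts only fetch training and testing files.

```python
from pathlib import PurePosixPath

# Split argument -> file name used by the download scripts (assumed mapping).
_SPLIT_FILES = {
    "train": "training.h5",
    "validation": "validation.h5",
    "test": "testing.h5",
}

# Version argument -> subdirectory created by the download scripts (assumed mapping).
_VERSION_DIRS = {
    "2": "",
    "3_random": "random",
    "3_block": "block",
    "3_culture_10": "culture_10",
}


def expected_file(version: str, split: str) -> str:
    """Return the path, relative to root, where the split file is expected."""
    if version not in _VERSION_DIRS:
        raise ValueError(f"unknown version: {version!r}")
    if split not in _SPLIT_FILES:
        raise ValueError(f"unknown split: {split!r}")
    if version != "2" and split == "validation":
        raise ValueError("version 3 downloads contain only training and testing files")
    return str(PurePosixPath(_VERSION_DIRS[version]) / _SPLIT_FILES[split])
```

For example, `expected_file("3_block", "test")` yields `block/testing.h5`, which can be checked for existence before constructing the dataset.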

Note

This dataset requires the following additional library to be installed:

  • h5py to load the dataset

__init__(root='data', version='2', split='train', bands=('S1_B1', 'S1_B2', 'S1_B3', 'S1_B4', 'S1_B5', 'S1_B6', 'S1_B7', 'S1_B8', 'S2_B02', 'S2_B03', 'S2_B04', 'S2_B05', 'S2_B06', 'S2_B07', 'S2_B08', 'S2_B8A', 'S2_B11', 'S2_B12'), transforms=None, checksum=False)

Initialize a new So2Sat dataset instance.

Parameters:
  • root (str | PathLike[str]) – root directory where dataset can be found

  • version (str) – one of “2”, “3_random”, “3_block”, or “3_culture_10”

  • split (str) – one of “train”, “validation”, or “test”

  • bands (Sequence[str]) – a sequence of band names to use; each name maps to an array index in the combined Sentinel-1 and Sentinel-2 band stack

  • transforms (Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and its target and returns a transformed version

  • checksum (bool) – if True, check the MD5 of the downloaded files (may be slow)
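To illustrate how the bands parameter relates band names to array positions, here is a minimal sketch that resolves names against the default band order shown in the signature. The names `ALL_BANDS` and `band_indices` are hypothetical, and the lookup is an assumption about how such a mapping could work, not the library's implementation.

```python
# Default band order copied from the constructor signature:
# eight Sentinel-1 channels followed by ten Sentinel-2 bands.
ALL_BANDS = (
    "S1_B1", "S1_B2", "S1_B3", "S1_B4", "S1_B5", "S1_B6", "S1_B7", "S1_B8",
    "S2_B02", "S2_B03", "S2_B04", "S2_B05", "S2_B06", "S2_B07", "S2_B08",
    "S2_B8A", "S2_B11", "S2_B12",
)


def band_indices(bands):
    """Map band names to their positions in the combined band stack."""
    unknown = [b for b in bands if b not in ALL_BANDS]
    if unknown:
        raise ValueError(f"unknown band names: {unknown}")
    # Preserve the order requested by the caller.
    return [ALL_BANDS.index(b) for b in bands]
```

Under this assumed ordering, requesting `("S2_B04", "S2_B03", "S2_B02")` selects positions 10, 9 and 8 of the combined stack.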

Raises:

Added in version 0.3: The bands parameter.

Added in version 0.5: The version parameter.

__getitem__(index)

Return the sample at the given index within the dataset.

Parameters:

index (int) – index to return

Returns:

data and label at that index

Return type:

dict[str, Any]

__len__()

Return the number of data points in the dataset.

Returns:

length of the dataset

Return type:

int

plot(sample, show_titles=True, suptitle=None)

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Raises:

RGBBandsMissingError – If bands does not include all RGB bands.

Return type:

Figure

Added in version 0.2.
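Because plot() raises RGBBandsMissingError when the selected bands lack the RGB bands, it can help to check a band selection up front. The sketch below assumes the RGB bands are the Sentinel-2 red, green and blue bands (S2_B04, S2_B03, S2_B02). The names `RGB_BANDS` and `rgb_positions` are hypothetical, and ValueError stands in for the library's own exception.

```python
# Sentinel-2 red, green and blue bands, in RGB order (assumed to be what plot() needs).
RGB_BANDS = ("S2_B04", "S2_B03", "S2_B02")


def rgb_positions(bands):
    """Return the positions of the R, G and B bands within a band selection."""
    bands = list(bands)
    missing = [b for b in RGB_BANDS if b not in bands]
    if missing:
        # Stand-in for RGBBandsMissingError in this sketch.
        raise ValueError(f"selection is missing RGB bands: {missing}")
    return [bands.index(b) for b in RGB_BANDS]
```

The returned positions index into the channel dimension of an image built from that band selection, which is enough to assemble a true-color composite by hand.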
