torchgeo.datasets
=================
.. module:: torchgeo.datasets
.. toctree::
   :maxdepth: 0
   :hidden:
   :glob:

   datasets/*

TorchGeo defines several kinds of datasets for geospatial data.
Benchmark Datasets
------------------
Curated benchmark datasets allow for model training and evaluation. They typically provide both input images and output labels, and target a variety of downstream applications.

.. csv-table:: C = classification, R = regression, S = semantic segmentation, I = instance segmentation, T = time series, CD = change detection, OD = object detection, IC = image captioning
   :header-rows: 1
   :align: center
   :file: datasets/benchmark.csv
Copernicus-Bench
^^^^^^^^^^^^^^^^
Copernicus-Bench is a comprehensive evaluation benchmark with 15 downstream tasks organized hierarchically into preprocessing (e.g., cloud removal), base applications (e.g., land cover classification), and specialized applications (e.g., air quality estimation). The benchmark enables systematic assessment of foundation model performance across various Sentinel missions at different levels of practical application.

.. csv-table:: C = classification, R = regression, S = semantic segmentation, I = instance segmentation, T = time series, CD = change detection, OD = object detection, IC = image captioning
   :header-rows: 1
   :align: center
   :file: datasets/copernicus_bench.csv
SpaceNet
^^^^^^^^
The `SpaceNet Dataset `_ is hosted as an Amazon Web Services (AWS) `Public Dataset `_. It contains ~67,000 square km of very high-resolution imagery, >11M building footprints, and ~20,000 km of road labels to ensure that adequate open-source data is available for geospatial machine learning research. SpaceNet Challenge datasets combine very high-resolution satellite imagery with corresponding high-quality labels for foundational mapping features such as building footprints or road networks.

.. csv-table:: C = classification, R = regression, S = semantic segmentation, I = instance segmentation, T = time series, CD = change detection, OD = object detection, IC = image captioning
   :header-rows: 1
   :align: center
   :file: datasets/spacenet.csv
Pre-Training Datasets
---------------------
Pre-training datasets are designed for foundation model development, providing millions of input images with global distributions. These datasets may come with output labels for supervised pre-training, or come without output labels for self-supervised pre-training.

.. csv-table:: C = classification, R = regression, S = semantic segmentation, I = instance segmentation, T = time series, CD = change detection, OD = object detection, IC = image captioning
   :header-rows: 1
   :align: center
   :file: datasets/pretraining.csv
Embeddings Datasets
-------------------
Embeddings are low-dimensional representations generated by foundation models. There are both patch-based embeddings designed for similarity search and pixel-based embeddings designed for applications like land cover mapping.

.. csv-table:: "Global" coverage refers to land surfaces only. Temporal resolution is divided into "snapshot" for embeddings generated from a single mosaic and "annual" for embeddings generated from annual time series data. \*Product has sparse spatial or temporal coverage.
   :header-rows: 1
   :align: center
   :file: datasets/embeddings.csv
Image Sources
-------------
Uncurated raster imagery can be used within TorchGeo, either for inference with a pre-trained model or, in combination with mask labels, for training.

.. csv-table::
   :header-rows: 1
   :align: center
   :file: datasets/images.csv
Mask Labels
-----------
Uncurated raster and vector masks can be used within TorchGeo, typically in combination with an image source for model training.

.. csv-table::
   :header-rows: 1
   :align: center
   :file: datasets/masks.csv
Toy Datasets
------------
Toy datasets are tiny datasets of roughly 100 images, designed for tutorials, demos, or few-shot learning.

.. csv-table:: C = classification, R = regression, S = semantic segmentation, I = instance segmentation, T = time series, CD = change detection, OD = object detection, IC = image captioning
   :header-rows: 1
   :align: center
   :file: datasets/toys.csv
.. _Base Classes:
Base Classes
------------
If you want to write your own custom dataset, you can extend one of these abstract base classes.
GeoDataset
^^^^^^^^^^
.. autoclass:: GeoDataset
RasterDataset
^^^^^^^^^^^^^
.. autoclass:: RasterDataset
VectorDataset
^^^^^^^^^^^^^
.. autoclass:: VectorDataset
NonGeoDataset
^^^^^^^^^^^^^
.. autoclass:: NonGeoDataset
NonGeoClassificationDataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: NonGeoClassificationDataset
IntersectionDataset
^^^^^^^^^^^^^^^^^^^
.. autoclass:: IntersectionDataset
UnionDataset
^^^^^^^^^^^^
.. autoclass:: UnionDataset
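
Roughly speaking, an intersection of two geospatial datasets covers only the area both datasets share, while a union covers the area of either. The following is a minimal pure-Python sketch of that idea on rectangular extents. It is not TorchGeo's implementation: the ``Extent`` class and the ``intersection`` and ``union`` helpers are hypothetical names introduced for illustration, and the sketch ignores time, coordinate reference systems, and resolution.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Extent:
    """Hypothetical axis-aligned spatial extent (not a TorchGeo class)."""

    minx: float
    maxx: float
    miny: float
    maxy: float


def intersection(a: Extent, b: Extent) -> Optional[Extent]:
    """Return the area shared by both extents, or None if they do not overlap."""
    minx, maxx = max(a.minx, b.minx), min(a.maxx, b.maxx)
    miny, maxy = max(a.miny, b.miny), min(a.maxy, b.maxy)
    if minx >= maxx or miny >= maxy:
        return None
    return Extent(minx, maxx, miny, maxy)


def union(a: Extent, b: Extent) -> Extent:
    """Return the smallest extent that contains both inputs."""
    return Extent(
        min(a.minx, b.minx),
        max(a.maxx, b.maxx),
        min(a.miny, b.miny),
        max(a.maxy, b.maxy),
    )


# Assumed example extents: imagery and labels that partially overlap.
imagery = Extent(minx=0, maxx=10, miny=0, maxy=10)
labels = Extent(minx=5, maxx=15, miny=5, maxy=15)

overlap = intersection(imagery, labels)  # area where image and label both exist
combined = union(imagery, labels)  # area where at least one of them exists
```

In this picture, pairing imagery with mask labels for training corresponds to the intersection, since a training sample needs both an image and a label at the same location.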
Utilities
---------
Collation Functions
^^^^^^^^^^^^^^^^^^^
.. autofunction:: stack_samples
.. autofunction:: concat_samples
.. autofunction:: merge_samples
.. autofunction:: unbind_samples
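
Conceptually, collation turns a list of per-sample dictionaries into a single dictionary of batched values, and unbinding reverses that step. The following is a minimal pure-Python sketch of this transposition only. It is not TorchGeo's implementation: the helper names ``collate_to_lists`` and ``unbind_to_samples`` are hypothetical, and the functions documented above additionally operate on tensors (for example, by stacking or concatenating them), which this sketch replaces with plain lists.

```python
from __future__ import annotations

from typing import Any


def collate_to_lists(samples: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Transpose a list of sample dicts into a dict of lists, key by key."""
    batch: dict[str, list[Any]] = {}
    for sample in samples:
        for key, value in sample.items():
            batch.setdefault(key, []).append(value)
    return batch


def unbind_to_samples(batch: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Reverse the transposition: a dict of equal-length lists to a list of dicts."""
    keys = list(batch)
    return [dict(zip(keys, values)) for values in zip(*batch.values())]


# Two toy samples; plain lists stand in for image tensors.
samples = [
    {"image": [0.1, 0.2], "label": 0},
    {"image": [0.3, 0.4], "label": 1},
]

batch = collate_to_lists(samples)
restored = unbind_to_samples(batch)
```

A function of this shape is what a PyTorch ``DataLoader`` expects for its ``collate_fn`` argument: it receives the list of samples drawn for one batch and returns the batched structure.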
Splitting Functions
^^^^^^^^^^^^^^^^^^^
.. autofunction:: random_bbox_assignment
.. autofunction:: random_bbox_splitting
.. autofunction:: random_grid_cell_assignment
.. autofunction:: roi_split
.. autofunction:: time_series_split
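
The idea behind assignment-style splitting is that each item (for example, a tile) lands in exactly one split, chosen at random but reproducibly. The following pure-Python sketch illustrates only that idea. It is not TorchGeo's implementation: ``random_assignment`` is a hypothetical helper, the string tile names are placeholders, and the functions documented above work on geospatial datasets and their spatial or temporal extents instead of plain lists.

```python
from __future__ import annotations

import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def random_assignment(
    items: Sequence[T], fractions: Sequence[float], seed: int = 0
) -> list[list[T]]:
    """Shuffle items with a seeded RNG and cut them into consecutive splits."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    splits: list[list[T]] = []
    start = 0
    for i, fraction in enumerate(fractions):
        if i == len(fractions) - 1:
            end = len(shuffled)  # the last split takes whatever remains
        else:
            end = start + round(fraction * len(shuffled))
        splits.append(shuffled[start:end])
        start = end
    return splits


# Placeholder tile identifiers standing in for dataset items.
tiles = [f"tile_{i}" for i in range(10)]
train, val, test = random_assignment(tiles, [0.8, 0.1, 0.1], seed=42)
```

Fixing the seed makes the split reproducible across runs, which matters when train and test sets must stay disjoint between experiments.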
Errors
------
.. autoclass:: DatasetNotFoundError
.. autoclass:: DependencyNotFoundError
.. autoclass:: RGBBandsMissingError