NCCM

class torchgeo.datasets.NCCM(paths='data', crs=None, res=None, years=[2019], transforms=None, cache=True, download=False, checksum=False, time_series=False)

Bases: RasterDataset

The Northeastern China Crop Map Dataset.

Link: https://www.nature.com/articles/s41597-021-00827-9

This dataset provides annual 10-m crop maps of the major crops (maize, soybean, and rice) in Northeast China from 2017 to 2019. The maps were produced using hierarchical mapping strategies, random forest classifiers, interpolated and smoothed 10-day Sentinel-2 time series data, and optimized features derived from the spectral, temporal, and textural characteristics of the land surface. The resulting maps have high overall accuracies (OA) when evaluated against ground truth data. The dataset covers three years: 2017, 2018, and 2019.

The dataset contains 5 classes:

  1. paddy rice

  2. maize

  3. soybean

  4. other crops and lands

  5. nodata

Dataset format:

  • Three .TIF files containing the labels

  • JavaScript code to download images from the dataset.

If you use this dataset in your research, please cite the paper linked above.

Added in version 0.6.

filename_regex = 'CDL(?P<date>\\d{4})_clip'

Regular expression used to extract date from filename.

The expression should use named groups. The expression may contain any number of groups. The following groups are specifically searched for by the base class:

  • date: used to calculate mint and maxt for index insertion

  • start: used to calculate mint for index insertion

  • stop: used to calculate maxt for index insertion

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name
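As a rough illustration of the date group, the sketch below applies the pattern above to a made-up filename using Python's standard `re` module. The filename `CDL2019_clip.tif` is an assumption chosen to fit the pattern, and the sketch does not reproduce how the base class itself performs the match.

```python
import re

# Pattern copied from the filename_regex attribute above
# (the doubled backslash in the attribute's repr is a single backslash here).
FILENAME_REGEX = r"CDL(?P<date>\d{4})_clip"

# Hypothetical filename, used only for illustration.
filename = "CDL2019_clip.tif"

# re.match anchors at the start of the string; the ".tif" suffix is ignored.
match = re.match(FILENAME_REGEX, filename)
date_group = match.group("date") if match else None

# A name that does not follow the pattern yields no match at all.
no_match = re.match(FILENAME_REGEX, "landcover_2019.tif")

print(date_group)
```

The captured string is what date_format is later applied to.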

filename_glob = 'CDL*.*'

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.
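To see which names such a glob would pick up, here is a minimal sketch using the standard library's `fnmatch` module. The candidate filenames are invented for illustration, and the actual file search in the base class may differ in its details.

```python
from fnmatch import fnmatchcase

# Glob copied from the filename_glob attribute above.
FILENAME_GLOB = "CDL*.*"

# Hypothetical directory listing, used only for illustration.
candidates = ["CDL2017_clip.tif", "CDL2018_clip.tif", "readme.txt", "NCCM_notes"]

# fnmatchcase performs a case-sensitive match on every platform.
matched = [name for name in candidates if fnmatchcase(name, FILENAME_GLOB)]

print(matched)
```

Because the glob ends in `.*`, it accepts any extension, which is consistent with the advice not to tie the expression to one file format.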

date_format = '%Y'

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group or start and stop groups.
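The following sketch shows what the `'%Y'` format string yields when applied to a four-digit year with the standard `datetime` module. The input string `"2019"` is an assumed example, and how the base class turns the parsed value into mint and maxt is not shown here.

```python
from datetime import datetime

# Format string copied from the date_format attribute above.
DATE_FORMAT = "%Y"

# Hypothetical value of the "date" group extracted from a filename.
date_string = "2019"

# With only a year in the format, strptime fills in January 1 at midnight.
parsed = datetime.strptime(date_string, DATE_FORMAT)

print(parsed.isoformat())
```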

is_image = False

True if the dataset only contains model inputs (such as images). False if the dataset only contains ground truth model outputs (such as segmentation masks).

The sample returned by the dataset/data loader will use the “image” key if is_image is True, otherwise it will use the “mask” key.

For datasets with both model inputs and outputs, the recommended approach is to use 2 RasterDataset instances and combine them using an IntersectionDataset.
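A minimal sketch of that approach is given below, assuming that Sentinel-2 imagery is already available locally and that torchgeo's `Sentinel2` dataset and the `&` operator on geospatial datasets (which builds an IntersectionDataset) are used. The function name and both directory arguments are hypothetical, and torchgeo is imported inside the function so that merely defining it does not require the library.

```python
def build_image_mask_dataset(image_root: str, mask_root: str):
    """Pair an image dataset with NCCM masks (illustrative sketch only).

    Both arguments are hypothetical local directories.
    """
    # Imported lazily so this module can be loaded without torchgeo installed.
    from torchgeo.datasets import NCCM, Sentinel2

    # Image dataset: its samples use the "image" key.
    images = Sentinel2(paths=image_root)

    # NCCM has is_image = False, so its samples use the "mask" key.
    masks = NCCM(paths=mask_root, years=[2019])

    # "&" combines the two into an IntersectionDataset.
    return images & masks
```

Samples drawn from the combined dataset would then be expected to carry both an "image" and a "mask" entry.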

cmap: ClassVar[dict[int, tuple[int, int, int, int]]] = {0: (0, 255, 0, 255), 1: (255, 0, 0, 255), 2: (255, 255, 0, 255), 3: (128, 128, 128, 255), 15: (255, 255, 255, 255)}

Color map for the dataset, used for plotting.
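To make the structure of the color map concrete, the sketch below looks up an RGBA tuple for each pixel of a tiny made-up mask using plain Python lists. The mask values are assumptions for illustration, and this is not how the plot method itself is implemented.

```python
# Color map copied from the cmap attribute above: pixel value -> (R, G, B, A).
CMAP = {
    0: (0, 255, 0, 255),
    1: (255, 0, 0, 255),
    2: (255, 255, 0, 255),
    3: (128, 128, 128, 255),
    15: (255, 255, 255, 255),
}

# Hypothetical 2x2 mask, used only for illustration.
mask = [
    [0, 1],
    [2, 15],
]

# Replace every pixel value with its RGBA tuple.
rgba = [[CMAP[value] for value in row] for row in mask]

for row in rgba:
    print(row)
```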

__init__(paths='data', crs=None, res=None, years=[2019], transforms=None, cache=True, download=False, checksum=False, time_series=False)

Initialize a new dataset.

Parameters:
  • paths (str | PathLike[str] | Iterable[str | PathLike[str]]) – one or more root directories to search or files to load

  • crs (CRS | None) – coordinate reference system (CRS) to warp to (defaults to the CRS of the first file found)

  • res (float | tuple[float, float] | None) – resolution of the dataset in units of CRS in (xres, yres) format. If a single float is provided, it is used for both the x and y resolution. (defaults to the resolution of the first file found)

  • years (list[int]) – list of years for which to use NCCM layers

  • transforms (Callable[[dict[str, Any]], dict[str, Any]] | None) – a function/transform that takes an input sample and returns a transformed version

  • cache (bool) – if True, cache file handle to speed up repeated sampling

  • download (bool) – if True, download dataset and store it in the root directory

  • checksum (bool) – if True, check the MD5 after downloading files (may be slow)

  • time_series (bool) – if True, stack data along the time series dimension [T, C, H, W]. If False, merge data into a [C, H, W] mosaic.

Raises:

DatasetNotFoundError – If dataset is not found and download is False.

Added in version 0.9: The time_series parameter.
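The parameters above could be combined as in the following sketch, which assumes torchgeo 0.9 or later (for time_series) and network access for the download. The function name and the default root directory are hypothetical, and torchgeo is imported lazily so that defining the function does not require the library.

```python
def load_nccm(root: str = "data"):
    """Construct an NCCM dataset for all three years (illustrative sketch only)."""
    # Imported lazily so this module can be loaded without torchgeo installed.
    from torchgeo.datasets import NCCM

    return NCCM(
        paths=root,                # directory to search, and to download into
        years=[2017, 2018, 2019],  # the three years the dataset covers
        download=True,             # fetch the files if they are not found
        checksum=True,             # verify MD5 after download (may be slow)
        time_series=True,          # stack along time: [T, C, H, W]
    )
```

With download left at its default of False, the same call raises DatasetNotFoundError when no matching files are found under the root directory.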

__getitem__(index)

Retrieve input, target, and/or metadata indexed by spatiotemporal slice.

Parameters:

index (slice | tuple[slice] | tuple[slice, slice] | tuple[slice, slice, slice]) – [xmin:xmax:xres, ymin:ymax:yres, tmin:tmax:tres] coordinates to index.

Returns:

Sample of input, target, and/or metadata at that index.

Raises:

IndexError – If index is not found in the dataset.

Return type:

dict[str, Any]
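The accepted index types follow from how Python itself passes slice syntax to `__getitem__`. The toy class below, which is not part of torchgeo, simply returns whatever index it receives; the coordinate values are invented for illustration.

```python
class SliceEcho:
    """Toy stand-in (not part of torchgeo) that returns the index it is given."""

    def __getitem__(self, index):
        return index


echo = SliceEcho()

# One slice arrives as a bare slice object.
single = echo[100.0:200.0]

# Two comma-separated slices arrive as a tuple of slice objects.
x_slice, y_slice = echo[100.0:200.0, 300.0:400.0]

# An omitted step is passed as None.
print(single.start, single.stop, single.step)
print(y_slice.start, y_slice.stop, y_slice.step)
```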

plot(sample, show_titles=True, suptitle=None)

Plot a sample from the dataset.

Parameters:
  • sample (dict[str, Any]) – a sample returned by __getitem__()

  • show_titles (bool) – flag indicating whether to show titles above each panel

  • suptitle (str | None) – optional string to use as a suptitle

Returns:

a matplotlib Figure with the rendered sample

Return type:

Figure
