Copernicus-FM
- class torchgeo.models.CopernicusFM(img_size=224, patch_size=16, drop_rate=0.0, embed_dim=1024, depth=24, num_heads=16, hyper_dim=128, num_classes=0, global_pool=True, mlp_ratio=4.0, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>)
Bases: Module
CopernicusFM: VisionTransformer backbone.
Example
1. Spectral Mode (Using Wavelength and Bandwidth):
>>> import torch
>>> from torchgeo.models import CopernicusFM
>>> model = CopernicusFM()
>>> x = torch.randn(1, 4, 224, 224)  # input image
>>> # metadata: [lon (degree), lat (degree), delta_time (days since 1970/1/1),
>>> # patch_token_area (km^2)]; assume unknown
>>> metadata = torch.full((1, 4), float('nan'))
>>> wavelengths = [490, 560, 665, 842]  # wavelength (nm): B, G, R, NIR (Sentinel 2)
>>> bandwidths = [65, 35, 30, 115]  # bandwidth (nm): B, G, R, NIR (Sentinel 2)
>>> kernel_size = 16  # expected patch size
>>> input_mode = 'spectral'
>>> logit = model(
...     x,
...     metadata,
...     wavelengths=wavelengths,
...     bandwidths=bandwidths,
...     input_mode=input_mode,
...     kernel_size=kernel_size,
... )
>>> print(logit.shape)
2. Variable Mode (Using language embedding):
>>> import torch
>>> from torchgeo.models import CopernicusFM
>>> model = CopernicusFM()
>>> # variable name (as input to a LLM for language embed)
>>> varname = 'Sentinel 5P Nitrogen Dioxide'
>>> x = torch.randn(1, 1, 56, 56)  # input image
>>> # metadata: [lon (degree), lat (degree), delta_time (days since 1970/1/1),
>>> # patch_token_area (km^2)]; assume unknown
>>> metadata = torch.full((1, 4), float('nan'))
>>> # language embedding: encode varname with a LLM (e.g. Llama)
>>> language_embed = torch.randn(2048)
>>> kernel_size = 4  # expected patch size
>>> input_mode = 'variable'
>>> logit = model(
...     x,
...     metadata,
...     language_embed=language_embed,
...     input_mode=input_mode,
...     kernel_size=kernel_size,
... )
>>> print(logit.shape)
- __init__(img_size=224, patch_size=16, drop_rate=0.0, embed_dim=1024, depth=24, num_heads=16, hyper_dim=128, num_classes=0, global_pool=True, mlp_ratio=4.0, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>)
Initialize a new CopernicusFM instance.
- Parameters:
img_size (int) – Input image size.
patch_size (int) – Patch size.
drop_rate (float) – Head dropout rate.
embed_dim (int) – Transformer embedding dimension.
depth (int) – Depth of transformer.
num_heads (int) – Number of attention heads.
hyper_dim (int) – Dimensions of dynamic weight generator.
num_classes (int) – Number of classes for classification head.
global_pool (bool) – Whether or not to perform global pooling.
mlp_ratio (float) – Ratio of MLP hidden dim to embedding dim.
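The constructor arguments above determine a few derived sizes. The sketch below assumes the usual Vision Transformer conventions: a square grid of non-overlapping patches, an even split of the embedding across attention heads, and an MLP hidden width of embed_dim * mlp_ratio. The helper name vit_derived_sizes is hypothetical and is not part of torchgeo; whether CopernicusFM follows each convention exactly should be checked against its source.

```python
def vit_derived_sizes(
    img_size: int = 224,
    patch_size: int = 16,
    embed_dim: int = 1024,
    num_heads: int = 16,
    mlp_ratio: float = 4.0,
) -> dict[str, int]:
    """Return sizes implied by the constructor arguments (standard ViT assumptions)."""
    if embed_dim % num_heads != 0:
        # Multi-head attention splits the embedding evenly across heads.
        raise ValueError("embed_dim must be divisible by num_heads")
    grid_size = img_size // patch_size  # patches along one image side
    return {
        "grid_size": grid_size,
        "num_patches": grid_size * grid_size,
        "head_dim": embed_dim // num_heads,
        "mlp_hidden_dim": int(embed_dim * mlp_ratio),
    }


# Defaults taken from the CopernicusFM signature above.
sizes = vit_derived_sizes()
print(sizes)
```

With the documented defaults this gives a 14 x 14 grid (196 patch tokens), a per-head dimension of 64, and an MLP hidden width of 4096.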
- forward_features(x, metadata, wavelengths=None, bandwidths=None, language_embed=None, input_mode='spectral', kernel_size=None)
Forward pass of the feature embedding layer.
- Parameters:
x (Tensor) – Input mini-batch.
metadata (Tensor) – Longitudes (degree), latitudes (degree), times (days since 1970/1/1), and areas (km^2) of each patch. Use NaN for unknown metadata.
wavelengths (Sequence[float] | None) – Wavelengths of each spectral band (nm). Only used if input_mode == 'spectral'.
bandwidths (Sequence[float] | None) – Bandwidths of each spectral band (nm). Only used if input_mode == 'spectral'.
language_embed (Tensor | None) – Language embedding tensor from Llama 3.2 1B (length 2048). Only used if input_mode == 'variable'.
input_mode (Literal['spectral', 'variable']) – One of 'spectral' or 'variable'.
kernel_size (int | None) – If provided and differs from the initialized kernel size, the generated patch embed kernel weights are resized accordingly.
- Returns:
Output mini-batch.
- Return type:
Tensor
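The metadata argument packs four values per sample, in the order longitude, latitude, time, and patch area, with NaN standing in for anything unknown. A minimal sketch of assembling one such row with the standard library follows; the helper build_metadata_row and its argument names are hypothetical and not part of torchgeo.

```python
import math
from datetime import date
from typing import Optional

EPOCH = date(1970, 1, 1)  # reference date named in the parameter description


def build_metadata_row(
    lon: Optional[float] = None,
    lat: Optional[float] = None,
    when: Optional[date] = None,
    patch_area_km2: Optional[float] = None,
) -> list[float]:
    """Return [lon, lat, delta_time, area]; unknown entries become NaN."""
    delta_days = float((when - EPOCH).days) if when is not None else math.nan
    return [
        float(lon) if lon is not None else math.nan,
        float(lat) if lat is not None else math.nan,
        delta_days,
        float(patch_area_km2) if patch_area_km2 is not None else math.nan,
    ]


# Location and date known, patch area unknown.
row = build_metadata_row(lon=11.57, lat=48.14, when=date(2021, 6, 1))
print(row)
```

Stacking one row per sample, for example with torch.tensor([row]), yields the (batch, 4) layout used in the class examples.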
- forward(x, metadata, wavelengths=None, bandwidths=None, language_embed=None, input_mode='spectral', kernel_size=None)
Forward pass of the model.
- Parameters:
x (Tensor) – Input mini-batch.
metadata (Tensor) – Longitudes (degree), latitudes (degree), times (days since 1970/1/1), and areas (km^2) of each patch. Use NaN for unknown metadata.
wavelengths (Sequence[float] | None) – Wavelengths of each spectral band (nm). Only used if input_mode == 'spectral'.
bandwidths (Sequence[float] | None) – Bandwidths of each spectral band (nm). Only used if input_mode == 'spectral'.
language_embed (Tensor | None) – Language embedding tensor from Llama 3.2 1B (length 2048). Only used if input_mode == 'variable'.
input_mode (Literal['spectral', 'variable']) – One of 'spectral' or 'variable'.
kernel_size (int | None) – If provided and differs from the initialized kernel size, the generated patch embed kernel weights are resized accordingly.
- Returns:
Output mini-batch.
- Return type:
Tensor
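Which optional arguments matter depends on input_mode, so it can help to check them before calling the model. The sketch below is a hypothetical pre-flight check, not torchgeo behaviour: it assumes one wavelength and one bandwidth per input channel (as in the Sentinel 2 example) and a language embedding of length 2048 (as stated for Llama 3.2 1B). The model itself may accept or reject inputs differently.

```python
from typing import Optional, Sequence

EXPECTED_EMBED_LEN = 2048  # length stated in the language_embed description


def check_mode_inputs(
    num_channels: int,
    input_mode: str,
    wavelengths: Optional[Sequence[float]] = None,
    bandwidths: Optional[Sequence[float]] = None,
    language_embed_len: Optional[int] = None,
) -> list[str]:
    """Return a list of problems found; an empty list means the inputs look consistent."""
    problems: list[str] = []
    if input_mode == "spectral":
        if wavelengths is None or bandwidths is None:
            problems.append("spectral mode needs wavelengths and bandwidths")
        else:
            # Assumption: one entry per input channel.
            if len(wavelengths) != num_channels:
                problems.append("expected one wavelength per channel")
            if len(bandwidths) != num_channels:
                problems.append("expected one bandwidth per channel")
    elif input_mode == "variable":
        if language_embed_len is None:
            problems.append("variable mode needs language_embed")
        elif language_embed_len != EXPECTED_EMBED_LEN:
            problems.append("expected a language embedding of length 2048")
    else:
        problems.append("input_mode must be 'spectral' or 'variable'")
    return problems


# Values from the spectral example: 4 channels, 4 wavelengths, 4 bandwidths.
print(check_mode_inputs(4, "spectral", [490, 560, 665, 842], [65, 35, 30, 115]))
```

Returning a list of messages instead of raising keeps the check easy to log or assert on in a data pipeline.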
- torchgeo.models.copernicusfm_base(weights=None, *args, **kwargs)
CopernicusFM vit-base model.
If you use this model in your research, please cite the following paper:
Added in version 0.7.
- Parameters:
weights (CopernicusFM_Base_Weights | None) – Pre-trained model weights to use.
*args (Any) – Additional arguments to pass to CopernicusFM.
**kwargs (Any) – Additional keyword arguments to pass to CopernicusFM.
- Returns:
A CopernicusFM base model.
- Return type:
CopernicusFM