# SkySense++ (Few-Shot Checkpoint)

SkySense++ foundation model checkpoint for few-shot inference, converted from `skysensepp_release.ckpt` to Hugging Face format.

A multi-modal remote sensing model that fuses high-resolution optical (HR), Sentinel-2 (S2), and Sentinel-1 SAR (S1) imagery through modality-specific backbones, a modality-completion VAE, and a fusion encoder.

## Model Metadata

| Attribute | Value |
|---|---|
| Model type | Multi-modal segmentation (HR + S2 + S1) |
| Use case | Few-shot inference, representation extraction |
| Paper | SkySense++: A Semantic-Enhanced Multi-Modal Remote Sensing Foundation Model Beyond SkySense for Earth Observation |
| License | Apache-2.0 |
| Input modalities | High-resolution optical, Sentinel-2, Sentinel-1 |
| HR input size | 512×512 |
| S2/S1 patch size | 16×16 |

## Repository Structure

```
.
├── config.json
├── model.safetensors             # Main weights (~6.4 GB)
├── modality_vae/                 # VAE (ConvVQVAEv2 legacy)
│   ├── config.json
│   └── diffusion_pytorch_model.safetensors
├── modeling_skysensepp.py
├── configuration_skysensepp.py
├── pipeline_skysensepp.py
└── sky_sensepp_impl/             # ModalityCompletionVAE, necks
```
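When downloading the checkpoint manually, it can help to confirm the layout above is complete before loading. A minimal sketch in plain Python; `missing_files` is our own helper, not part of the repo:

```python
from pathlib import Path

# Files the checkpoint directory should contain, per the tree above.
EXPECTED_FILES = [
    "config.json",
    "model.safetensors",
    "modality_vae/config.json",
    "modality_vae/diffusion_pytorch_model.safetensors",
    "modeling_skysensepp.py",
    "configuration_skysensepp.py",
    "pipeline_skysensepp.py",
]

def missing_files(checkpoint_dir):
    """Return the expected files that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [f for f in EXPECTED_FILES if not (root / f).is_file()]

# Example: missing_files("path/to/SkySensepp-fewshot") -> [] when complete
```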

## Installation

```bash
pip install transformers torch safetensors diffusers
```

## Usage

### Load from Hugging Face

```python
from transformers import AutoModel

# Replace with your HF model ID, e.g. username/SkySensepp-fewshot
model = AutoModel.from_pretrained("path/to/SkySensepp-fewshot", trust_remote_code=True)
model = model.eval().to("cuda")
```

### Few-Shot Inference (SkySensePlusPlus repo)

```bash
# Clone SkySensePlusPlus and run 1-shot evaluation
bash tools/run_1shot.sh <gpu_idx> flood3i
```

Point your config's `model_path` at this checkpoint, or load it via `AutoModel.from_pretrained()` in your predictor.
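On top of extracted features, the 1-shot idea can be sketched as a prototype match: average the support image's masked features into a class prototype, then score each query location by cosine similarity. This is a generic prototype-based illustration under our own assumptions, not the repo's exact predictor (`one_shot_logits` is a hypothetical helper):

```python
import torch
import torch.nn.functional as F

def one_shot_logits(support_feats, support_mask, query_feats):
    """Prototype-based 1-shot scoring.

    support_feats, query_feats: (1, C, H, W) feature maps
    support_mask: (1, 1, H, W) binary mask of the target class
    Returns (1, H, W) cosine-similarity scores for the query.
    """
    masked = support_feats * support_mask
    # Masked average pooling -> class prototype of shape (1, C)
    prototype = masked.sum(dim=(2, 3)) / support_mask.sum(dim=(2, 3)).clamp(min=1)
    # Cosine similarity between each query location and the prototype
    return F.cosine_similarity(query_feats, prototype[:, :, None, None], dim=1)

logits = one_shot_logits(
    torch.randn(1, 64, 32, 32),
    (torch.rand(1, 1, 32, 32) > 0.5).float(),
    torch.randn(1, 64, 32, 32),
)
print(logits.shape)  # torch.Size([1, 32, 32])
```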

### Feature Extraction

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("path/to/SkySensepp-fewshot", trust_remote_code=True)
model = model.eval().to("cuda")

hr_img = torch.randn(1, 3, 512, 512, device="cuda")
s2_img = torch.randn(1, 10, 2, 256, 256, device="cuda")
s1_img = torch.randn(1, 2, 2, 256, 256, device="cuda")
modalities = torch.ones(1, 3, dtype=torch.bool, device="cuda")

with torch.no_grad():
    out = model(
        hr_img=hr_img,
        s2_img=s2_img,
        s1_img=s1_img,
        modality_flag_hr=modalities[:, :1],
        modality_flag_s2=modalities[:, 1:2],
        modality_flag_s1=modalities[:, 2:],
        return_features=True,
    )

features_fusion = out["features_fusion"]  # (B, 1024, H, W)
```
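The fused map can then be reduced to one embedding per image, e.g. for retrieval or building few-shot prototypes. A minimal sketch assuming only the `(B, 1024, H, W)` output shape above; the pooling choice is ours, not prescribed by the repo:

```python
import torch

def pool_features(features_fusion: torch.Tensor) -> torch.Tensor:
    """Global-average-pool a (B, 1024, H, W) feature map to (B, 1024) embeddings."""
    return features_fusion.mean(dim=(2, 3))

# Demonstrated on a random tensor; in practice pass out["features_fusion"].
emb = pool_features(torch.randn(2, 1024, 16, 16))
print(emb.shape)  # torch.Size([2, 1024])
```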

## Input Formats

| Modality | Shape | Description |
|---|---|---|
| `hr_img` | (B, 3, H, W) | RGB; H = W = 512 typical |
| `s2_img` | (B, 10, S, H, W) | Sentinel-2, 10 bands, S time steps |
| `s1_img` | (B, 2, S, H, W) | Sentinel-1 VV/VH, S time steps |
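Before a forward pass, the shapes in the table can be sanity-checked in plain Python. A minimal sketch; `validate_inputs` is our own helper, not part of the repo:

```python
def validate_inputs(hr_shape, s2_shape, s1_shape):
    """Check shapes against the table: hr (B,3,H,W), s2 (B,10,S,H,W), s1 (B,2,S,H,W)."""
    b = hr_shape[0]
    assert len(hr_shape) == 4 and hr_shape[1] == 3, "hr_img must be (B, 3, H, W)"
    assert len(s2_shape) == 5 and tuple(s2_shape[:2]) == (b, 10), "s2_img must be (B, 10, S, H, W)"
    assert len(s1_shape) == 5 and tuple(s1_shape[:2]) == (b, 2), "s1_img must be (B, 2, S, H, W)"
    assert s2_shape[2] == s1_shape[2], "S2 and S1 should share the same number of time steps S"
    return True

validate_inputs((1, 3, 512, 512), (1, 10, 2, 256, 256), (1, 2, 2, 256, 256))  # passes
```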

## Citation

```bibtex
@article{skysensepp2025,
  title={SkySense++: A Semantic-Enhanced Multi-Modal Remote Sensing Foundation Model Beyond SkySense for Earth Observation},
  journal={Nature Machine Intelligence},
  year={2025},
  url={https://www.nature.com/articles/s42256-025-01078-8}
}
```