# SkySense++ (Few-Shot Checkpoint)
SkySense++ foundation model checkpoint for few-shot inference, converted from `skysensepp_release.ckpt` to Hugging Face format.

A multi-modal remote sensing model that fuses high-resolution optical (HR), Sentinel-2 (S2), and Sentinel-1 SAR (S1) imagery through modality-specific backbones, a modality-completion VAE, and a fusion encoder.
## Model Metadata
| Attribute | Value |
|---|---|
| Model type | Multi-modal segmentation (HR + S2 + S1) |
| Use case | Few-shot inference, representation extraction |
| Paper | SkySense++: A Semantic-Enhanced Multi-Modal Remote Sensing Foundation Model Beyond SkySense for Earth Observation |
| License | Apache-2.0 |
| Input modalities | High-resolution optical, Sentinel-2, Sentinel-1 |
| HR input size | 512×512 |
| S2/S1 patch size | 16×16 |
## Repository Structure

```
.
├── config.json
├── model.safetensors                # Main weights (~6.4GB)
├── modality_vae/                    # VAE (ConvVQVAEv2 legacy)
│   ├── config.json
│   └── diffusion_pytorch_model.safetensors
├── modeling_skysensepp.py
├── configuration_skysensepp.py
├── pipeline_skysensepp.py
└── sky_sensepp_impl/                # ModalityCompletionVAE, necks
```
## Installation

```bash
pip install transformers torch safetensors diffusers
```
## Usage

### Load from Hugging Face

```python
from transformers import AutoModel

# Replace with your HF model ID, e.g. username/SkySensepp-fewshot
model = AutoModel.from_pretrained("path/to/SkySensepp-fewshot", trust_remote_code=True)
model = model.eval().to("cuda")
```
### Few-Shot Inference (SkySensePlusPlus repo)

```bash
# Clone SkySensePlusPlus and run 1-shot evaluation
bash tools/run_1shot.sh <gpu_idx> flood3i
```

Point your config's `model_path` to this checkpoint, or load it via `AutoModel.from_pretrained()` in your predictor.
### Feature Extraction

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("path/to/SkySensepp-fewshot", trust_remote_code=True)
model = model.eval().to("cuda")

hr_img = torch.randn(1, 3, 512, 512, device="cuda")
s2_img = torch.randn(1, 10, 2, 256, 256, device="cuda")
s1_img = torch.randn(1, 2, 2, 256, 256, device="cuda")
modalities = torch.ones(1, 3, dtype=torch.bool, device="cuda")

with torch.no_grad():
    out = model(
        hr_img=hr_img,
        s2_img=s2_img,
        s1_img=s1_img,
        modality_flag_hr=modalities[:, :1],
        modality_flag_s2=modalities[:, 1:2],
        modality_flag_s1=modalities[:, 2:],
        return_features=True,
    )

features_fusion = out["features_fusion"]  # (B, 1024, H, W)
```
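The fused feature map can be reduced to a single per-image embedding, e.g. for retrieval or building few-shot class prototypes. A minimal sketch, using a dummy tensor in place of the model output (the shape comes from the `(B, 1024, H, W)` comment above; the pooling and normalization choices are ours, not prescribed by the checkpoint):

```python
import torch
import torch.nn.functional as F

# Dummy stand-in for out["features_fusion"], shape (B, 1024, H, W).
features_fusion = torch.randn(2, 1024, 32, 32)

# Global average pooling collapses the spatial grid to one 1024-d vector per image.
embedding = features_fusion.mean(dim=(2, 3))  # (B, 1024)

# L2-normalize so dot products act as cosine similarities (common for prototypes).
embedding = F.normalize(embedding, dim=1)
```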
## Input Formats

| Modality | Shape | Description |
|---|---|---|
| `hr_img` | (B, 3, H, W) | RGB; H = W = 512 typical |
| `s2_img` | (B, 10, S, H, W) | Sentinel-2, 10 bands, S time steps |
| `s1_img` | (B, 2, S, H, W) | Sentinel-1 VV/VH, S time steps |
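Not every modality has to be present: the `modality_flag_*` arguments in the feature-extraction example signal availability per sample, and the modality-completion VAE is intended to fill in what is missing. A hedged sketch of flag construction for an HR-only batch (the `(B, 1)` flag shape follows the slicing shown in the feature-extraction example; the exact runtime contract is an assumption):

```python
import torch

batch = 4

# One boolean column per modality, in the same order the model call slices them:
# modalities[:, :1] -> HR, modalities[:, 1:2] -> S2, modalities[:, 2:] -> S1.
modalities = torch.tensor([[True, False, False]]).repeat(batch, 1)

flag_hr = modalities[:, :1]   # (B, 1) — HR present
flag_s2 = modalities[:, 1:2]  # (B, 1) — S2 absent
flag_s1 = modalities[:, 2:]   # (B, 1) — S1 absent
```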
## Citation

```bibtex
@article{skysensepp2025,
  title={SkySense++: A Semantic-Enhanced Multi-Modal Remote Sensing Foundation Model Beyond SkySense for Earth Observation},
  journal={Nature Machine Intelligence},
  year={2025},
  url={https://www.nature.com/articles/s42256-025-01078-8}
}
```