CAT-Pixelseal-Single-Step

This is the CAT (Compositional Adversarial Training) checkpoint for PixelSeal with a single-step adversary (T=1).

Overview

CAT is a plug-in training framework that replaces random augmentation with a learned sequential adversary to improve watermark robustness. Instead of sampling augmentations uniformly and independently, CAT trains a lightweight adversary (a frozen DINOv2 backbone + GRU controller + MLP heads) that, at each training step, adaptively selects the augmentation most likely to break the current watermark model.
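
The difference between uniform sampling and an adversarial choice can be sketched in plain Python. This is an illustration under assumptions, not the CAT implementation: the family subset, the `logits` values, and the helper names are hypothetical, and a real adversary would compute its logits from image features instead of taking them as constants.

```python
import math
import random

# Hypothetical subset of augmentation families, for illustration only.
FAMILIES = ["identity", "jpeg", "crop", "rotate", "gaussian_blur"]

def sample_uniform(rng):
    """Baseline: every family is equally likely, whatever the model's weaknesses."""
    return rng.choice(FAMILIES)

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_adversarial(logits, rng):
    """Adversary: draw a family from the categorical distribution softmax(logits)."""
    probs = softmax(logits)
    return rng.choices(FAMILIES, weights=probs, k=1)[0]

rng = random.Random(0)
# Assumed logits: a higher value means the adversary expects more damage.
logits = [-2.0, 1.5, 0.5, 0.0, 2.0]
probs = softmax(logits)
uniform_picks = [sample_uniform(rng) for _ in range(1000)]
adversarial_picks = [sample_adversarial(logits, rng) for _ in range(1000)]
```

With these assumed logits the adversary concentrates on the attacks it scores highest, while the uniform baseline spends the same budget on `identity` as on any other family.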

This checkpoint corresponds to the T=1 (single-step) setting with:

Param head type: random_uniform — augmentation parameters are sampled uniformly at random within learned ranges
Adversary depth: 1 (single adversarial augmentation per training step)
Entropy weight (λ_ent): 0.1 — entropy regularization to prevent collapse to a single attack
VI alpha (α): 0.0
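
To make the role of the entropy weight concrete, the sketch below compares the entropy bonus of a uniform policy with that of a collapsed one. The form of `adversary_objective` (attack gain plus λ_ent times entropy) is an assumption about how such a regularizer is typically written; the exact loss used for this checkpoint is not given here. The count of 15 families comes from the augmentation library listed on this card.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

NUM_FAMILIES = 15   # size of the augmentation library on this card
LAMBDA_ENT = 0.1    # entropy weight from the settings above

uniform = [1.0 / NUM_FAMILIES] * NUM_FAMILIES
collapsed = [1.0] + [0.0] * (NUM_FAMILIES - 1)

def adversary_objective(attack_gain, probs, lambda_ent=LAMBDA_ENT):
    """Assumed form of the quantity the adversary maximizes."""
    return attack_gain + lambda_ent * entropy(probs)

bonus_uniform = LAMBDA_ENT * entropy(uniform)      # 0.1 * ln(15)
bonus_collapsed = LAMBDA_ENT * entropy(collapsed)  # zero: no diversity left
```

Under this assumed form, a policy that always picks one attack forfeits the whole bonus, so it needs a correspondingly larger attack gain to be preferred over a more diverse policy.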

Architecture

Component	Config
Embedder	`unet_small2_yuv_quant` — U-Net operating in YUV space, 8 blocks, batch norm
Extractor	`convnext_tiny` — ConvNeXt-Tiny encoder + pixel decoder
Payload	16 bits
Adversary	DINOv2 backbone + GRU + MLP, hidden dim 256, Gumbel-Softmax τ=1.0
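
The Gumbel-Softmax temperature in the table controls how close the adversary's sampled choice is to a hard one-hot vector. The following is a minimal standard-library sketch of the general technique, not the repository's code; the function name, the example logits, and the comparison temperature of 0.1 are illustrative assumptions. PyTorch ships an equivalent as `torch.nn.functional.gumbel_softmax`, though whether this checkpoint's training code calls it is not stated here.

```python
import math
import random

def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for x in logits:
        u = rng.random()
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
        gumbel = -math.log(-math.log(u))     # Gumbel(0, 1) via inverse transform
        noisy.append((x + gumbel) / tau)
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

logits = [0.2, 1.0, -0.5, 0.7]  # hypothetical scores for four augmentations
soft = gumbel_softmax(logits, tau=1.0, rng=random.Random(0))   # tau from the table
sharp = gumbel_softmax(logits, tau=0.1, rng=random.Random(0))  # same noise, lower tau
```

Both samples use the same noise, so they agree on which augmentation wins; lowering the temperature only pushes the weights closer to a hard selection.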

Training

Trained with CAT (T=1) on SA-1B using 4 GPUs:

OMP_NUM_THREADS=40 torchrun --nproc_per_node=4 train.py --local_rank 0 \
    --video_dataset none --image_dataset sa-1b-full-resized --workers 4 \
    --extractor_model convnext_tiny --embedder_model unet_small2_yuv_quant \
    --hidden_size_multiplier 1 --nbits 16 \
    --scaling_w_schedule Cosine,scaling_min=0.2,start_epoch=200,epochs=200 \
    --scaling_w 1.0 --scaling_i 1.0 --attenuation jnd_1_1 \
    --epochs 500 --iter_per_epoch 1000 \
    --scheduler CosineLRScheduler,lr_min=1e-6,t_initial=500,warmup_lr_init=1e-8,warmup_t=5 \
    --optimizer AdamW,lr=5e-4 \
    --lambda_dec 1.0 --lambda_d 0.1 --lambda_i 0.1 --perceptual_loss yuv \
    --num_augs 1 --augmentation_config configs/all_augs.yaml \
    --disc_in_channels 1 --disc_start 50 \
    --use_adversary True \
    --adversary_entropy_weight 0.1 \
    --adversary_hidden_dim 256 \
    --adversary_gumbel_temperature 1.0 \
    --adversary_param_head_type random_uniform \
    --adversary_vi_alpha 0.0 \
    --adversary_start_epoch 5
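
To help read the scheduler flags in the command above, here is a small sketch of the learning rate per epoch. It assumes `CosineLRScheduler` follows the timm-style rule (linear warmup from `warmup_lr_init` to the base rate, then cosine decay to `lr_min` with no warmup prefix) and that it is stepped once per epoch; the repository's scheduler may differ in both respects. `lr_at_epoch` is a hypothetical helper.

```python
import math

BASE_LR = 5e-4          # --optimizer AdamW,lr=5e-4
LR_MIN = 1e-6           # lr_min
T_INITIAL = 500         # t_initial
WARMUP_LR_INIT = 1e-8   # warmup_lr_init
WARMUP_T = 5            # warmup_t
EPOCHS = 500            # --epochs
ITER_PER_EPOCH = 1000   # --iter_per_epoch

def lr_at_epoch(epoch):
    """Learning rate at a given epoch under the assumed timm-style schedule."""
    if epoch < WARMUP_T:
        # Linear warmup from WARMUP_LR_INIT toward BASE_LR.
        step = (BASE_LR - WARMUP_LR_INIT) / WARMUP_T
        return WARMUP_LR_INIT + epoch * step
    if epoch >= T_INITIAL:
        return LR_MIN
    # Cosine decay from BASE_LR (at epoch 0) to LR_MIN (at T_INITIAL).
    cosine = 0.5 * (1.0 + math.cos(math.pi * epoch / T_INITIAL))
    return LR_MIN + (BASE_LR - LR_MIN) * cosine

total_iterations = EPOCHS * ITER_PER_EPOCH
```

Under these assumptions the run covers 500,000 optimizer iterations. The adversary is enabled at epoch 5, which coincides with the end of warmup, and the discriminator at epoch 50.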

Augmentation Library

Trained against 15 augmentation families (see the file passed to --augmentation_config in the training command, configs/all_augs.yaml): identity, JPEG, crop, rotate, rotate90, horizontal flip, perspective, Gaussian blur, brightness, contrast, saturation, hue, H.264, H.264 RGB, H.265.

Usage

See the CAT repository and configs/ for architecture details. Load with:

from videoseal.models.videoseal import Videoseal
model = Videoseal.load_from_checkpoint("adv_d1_phtrandom_uniform_a0.0_ew0.10/checkpoint.pth")

Citation

@article{cat2025,
  title={Compositional Adversarial Training for Robust Visual Watermarking},
  author={Anonymous Authors},
  year={2025},
}

License

Inherits the license from the VideoSeal repository (Meta Platforms, Inc.).
