CAT-Pixelseal-Single-Step
This is the CAT (Compositional Adversarial Training) checkpoint for PixelSeal with a single-step adversary (T=1).
Overview
CAT is a plug-in training framework that replaces random augmentation with a learned sequential adversary to improve watermark robustness. Instead of sampling augmentations uniformly and independently, CAT trains a lightweight adversary—a frozen DINOv2 backbone + GRU controller + MLP heads—that adaptively selects the augmentation most likely to break the current watermark model at each training step.
This checkpoint corresponds to the T=1 (single-step) setting with:
- Param head type: random_uniform (augmentation parameters are sampled uniformly at random within learned ranges)
- Adversary depth: 1 (single adversarial augmentation per training step)
- Entropy weight (λ_ent): 0.1 — entropy regularization to prevent collapse to a single attack
- VI alpha (α): 0.0
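The entropy weight and the random_uniform parameter head can be illustrated with a minimal sketch. It assumes the adversary maximizes the watermark model's loss plus λ_ent times the entropy of its distribution over augmentations. The exact objective used by CAT is not stated here, and all function names below are hypothetical.

```python
import math
import random

def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def adversary_objective(watermark_loss, probs, entropy_weight=0.1):
    """Assumed form: damage to the watermark model plus an entropy bonus."""
    return watermark_loss + entropy_weight * entropy(probs)

def sample_param_uniform(low, high, rng=random):
    """random_uniform head: draw an augmentation parameter uniformly in [low, high]."""
    return rng.uniform(low, high)

uniform_policy = [1.0 / 15] * 15           # spread over all 15 families
collapsed_policy = [1.0] + [0.0] * 14      # always the same attack

# With equal damage, the spread-out policy scores higher because of the bonus.
spread_score = adversary_objective(1.0, uniform_policy)
collapsed_score = adversary_objective(1.0, collapsed_policy)
```

A collapsed policy has zero entropy, so under this assumed objective it is only preferred when the single attack it picks does enough extra damage to outweigh the lost bonus.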
Architecture
| Component | Config |
|---|---|
| Embedder | unet_small2_yuv_quant — U-Net operating in YUV space, 8 blocks, batch norm |
| Extractor | convnext_tiny — ConvNeXt-Tiny encoder + pixel decoder |
| Payload | 16 bits |
| Adversary | DINOv2 backbone + GRU + MLP, hidden dim 256, Gumbel-Softmax τ=1.0 |
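The Gumbel-Softmax relaxation named in the adversary row can be sketched in plain Python. This is a generic illustration of the technique with temperature τ, not the repository's implementation. The 15 logits stand in for the scores over augmentation families, and the function name is hypothetical.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return a relaxed one-hot sample over the augmentation choices."""
    # Gumbel(0, 1) noise via inverse transform; clamp u away from 0 so log stays finite.
    noise = [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in logits]
    scaled = [(l + g) / tau for l, g in zip(logits, noise)]
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [0.0] * 15  # placeholder scores, one per augmentation family
sample = gumbel_softmax(logits, tau=1.0, rng=random.Random(0))
chosen = max(range(len(sample)), key=sample.__getitem__)  # index of the selected family
```

Lower temperatures push the sample toward a hard one-hot choice, while τ=1.0 keeps it comparatively soft.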
Training
Trained with CAT (T=1) on SA-1B using 4 GPUs:
OMP_NUM_THREADS=40 torchrun --nproc_per_node=4 train.py --local_rank 0 \
--video_dataset none --image_dataset sa-1b-full-resized --workers 4 \
--extractor_model convnext_tiny --embedder_model unet_small2_yuv_quant \
--hidden_size_multiplier 1 --nbits 16 \
--scaling_w_schedule Cosine,scaling_min=0.2,start_epoch=200,epochs=200 \
--scaling_w 1.0 --scaling_i 1.0 --attenuation jnd_1_1 \
--epochs 500 --iter_per_epoch 1000 \
--scheduler CosineLRScheduler,lr_min=1e-6,t_initial=500,warmup_lr_init=1e-8,warmup_t=5 \
--optimizer AdamW,lr=5e-4 \
--lambda_dec 1.0 --lambda_d 0.1 --lambda_i 0.1 --perceptual_loss yuv \
--num_augs 1 --augmentation_config configs/all_augs.yaml \
--disc_in_channels 1 --disc_start 50 \
--use_adversary True \
--adversary_entropy_weight 0.1 \
--adversary_hidden_dim 256 \
--adversary_gumbel_temperature 1.0 \
--adversary_param_head_type random_uniform \
--adversary_vi_alpha 0.0 \
--adversary_start_epoch 5
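To make the schedule in the command concrete, the sketch below computes the total number of optimizer steps and an approximate learning-rate curve. It assumes a linear warmup followed by a cosine decay indexed by epoch. The actual scheduler may differ in details such as how warmup offsets the cosine, and the function name is hypothetical.

```python
import math

EPOCHS = 500
ITERS_PER_EPOCH = 1000
TOTAL_STEPS = EPOCHS * ITERS_PER_EPOCH  # 500 * 1000 iterations

def lr_at_epoch(epoch, base_lr=5e-4, lr_min=1e-6, t_initial=500,
                warmup_lr_init=1e-8, warmup_t=5):
    """Approximate learning rate at a given epoch (assumed schedule shape)."""
    if epoch < warmup_t:
        # Linear warmup from warmup_lr_init towards base_lr.
        return warmup_lr_init + (base_lr - warmup_lr_init) * epoch / warmup_t
    # Cosine decay from base_lr down to lr_min over t_initial epochs.
    progress = min(epoch, t_initial) / t_initial
    return lr_min + 0.5 * (base_lr - lr_min) * (1.0 + math.cos(math.pi * progress))

for epoch in (0, 5, 250, 500):
    print(epoch, lr_at_epoch(epoch))
```

Note that in the command the adversary is enabled at epoch 5, the same epoch at which the warmup ends, and the discriminator starts at epoch 50.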
Augmentation Library
Trained against 15 augmentation families (see configs/augs.yaml): identity, JPEG, crop, rotate, rotate90, horizontal flip, perspective, Gaussian blur, brightness, contrast, saturation, hue, H.264, H.264 RGB, H.265.
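For comparison, the sketch below contrasts the random-augmentation baseline, which picks each of the 15 families with equal probability, against selection from a learned distribution. The identifier strings are illustrative and may not match the keys in the configuration file, and both function names are hypothetical.

```python
import random

# The 15 families listed above; key spellings are assumptions.
AUGMENTATION_FAMILIES = [
    "identity", "jpeg", "crop", "rotate", "rotate90", "hflip", "perspective",
    "gaussian_blur", "brightness", "contrast", "saturation", "hue",
    "h264", "h264rgb", "h265",
]

def sample_uniform(rng=random):
    """Baseline: every family is equally likely (probability 1/15)."""
    return rng.choice(AUGMENTATION_FAMILIES)

def sample_weighted(probs, rng=random):
    """Adversarial selection: draw a family according to the given probabilities."""
    return rng.choices(AUGMENTATION_FAMILIES, weights=probs, k=1)[0]

baseline_pick = sample_uniform(random.Random(0))
# A distribution that puts all its mass on the last family always selects it.
adversary_pick = sample_weighted([0.0] * 14 + [1.0], random.Random(0))
```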
Usage
See the CAT repository and configs/ for architecture details. Load with:
from videoseal.models.videoseal import Videoseal
model = Videoseal.load_from_checkpoint("adv_d1_phtrandom_uniform_a0.0_ew0.10/checkpoint.pth")
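Once a message has been extracted from an attacked image, robustness is typically reported as bit accuracy over the 16-bit payload. The helper below is a minimal, library-independent sketch of that metric. It does not call any VideoSeal API, and the function name and example bit lists are hypothetical.

```python
def bit_accuracy(embedded_bits, extracted_bits):
    """Fraction of payload bits recovered correctly."""
    if len(embedded_bits) != len(extracted_bits):
        raise ValueError("payloads must have the same length")
    matches = sum(int(a == b) for a, b in zip(embedded_bits, extracted_bits))
    return matches / len(embedded_bits)

# Example 16-bit payload and an extraction with two flipped bits.
embedded = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
extracted = list(embedded)
extracted[3] ^= 1
extracted[10] ^= 1

accuracy = bit_accuracy(embedded, extracted)  # 14 of 16 bits match
```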
Citation
@article{cat2025,
title={Compositional Adversarial Training for Robust Visual Watermarking},
author={Anonymous Authors},
year={2025},
}
License
Inherits the license from the VideoSeal repository (Meta Platforms, Inc.).