	---
	license: mit
	tags:
	- computer-vision
	- anomaly-detection
	- deep-svdd
	- ai-generated-images
	- image-classification
	- pytorch-lightning
	datasets:
	- cifar10
	library_name: pytorch-lightning
	pipeline_tag: image-classification
	---

	# 🔍 AI Image Detector - Deep SVDD

	<div align="center">

	One-Class Deep Learning Model for Detecting AI-Generated Images

	[![Model](https://img.shields.io/badge/Model-Deep%20SVDD-blue)](https://huggingface.co/ash12321/ai-image-detector-deepsvdd)
	[![Framework](https://img.shields.io/badge/Framework-PyTorch%20Lightning-red)](https://lightning.ai/)
	[![Dataset](https://img.shields.io/badge/Dataset-CIFAR--10-green)](https://www.cs.toronto.edu/~kriz/cifar.html)

	</div>

	## 📖 Model Description

	This model detects AI-generated images using Deep Support Vector Data Description (SVDD), a one-class learning approach. It was trained exclusively on real images to learn what "real" looks like, allowing it to identify synthetic/AI-generated images as anomalies.
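
The one-class idea can be sketched in a few lines. This is a minimal illustration, not the model's actual code: it assumes the anomaly decision compares the Euclidean distance between an embedding and a fixed center against a radius, and the 3-dimensional vectors, `center`, and `radius` values are hypothetical (the real latent space has 128 dimensions).

```python
import math

def squared_distance(z, center):
    """Squared Euclidean distance between an embedding and the hypersphere center."""
    return sum((zi - ci) ** 2 for zi, ci in zip(z, center))

def is_anomaly(z, center, radius):
    """Flag an embedding as anomalous when it falls outside the hypersphere."""
    return math.sqrt(squared_distance(z, center)) > radius

# Hypothetical toy values, chosen only to make the geometry easy to follow.
center = [0.0, 0.0, 0.0]
radius = 1.0
inside = [0.3, 0.4, 0.0]   # distance 0.5 from the center
outside = [1.2, 0.9, 0.0]  # distance 1.5 from the center

print(is_anomaly(inside, center, radius))
print(is_anomaly(outside, center, radius))
```

Training only on real images pulls their embeddings toward the center, so anything that lands far from it is treated as suspicious.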

	### Key Features

	- ✅ Enhanced Deep SVDD architecture with channel attention
	- ✅ Trained on 35,000 real images from the CIFAR-10 dataset
	- ✅ Optimized for the NVIDIA L4 GPU with 16-bit mixed-precision training
	- ✅ Advanced augmentation and training: Mixup, multi-scale resizing, contrastive learning
	- ✅ Held-out evaluation: 70/15/15 train/val/test split, with the test set unseen during training

	## 🎯 Performance Metrics

	| Metric | Value |
	|--------|-------|
	| Test Loss | 0.7637 |
	| Mean Distance | 0.7637 |
	| Std Distance | 0.0024 |
	| 95th Percentile | 0.7700 |
	| Radius Threshold | 0.7747 |
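
To put these numbers in context, the sketch below expresses a distance as a number of standard deviations above the mean test distance. It only does arithmetic on the values in the table. How the radius threshold was actually chosen is not stated in this card, and `z_score` is a hypothetical helper, not part of the model.

```python
# Values copied from the metrics table above.
mean_distance = 0.7637
std_distance = 0.0024
p95 = 0.7700
radius_threshold = 0.7747

def z_score(distance):
    """Standard deviations between a distance and the mean test distance."""
    return (distance - mean_distance) / std_distance

print(f"95th percentile z-score: {z_score(p95):.3f}")
print(f"Radius threshold z-score: {z_score(radius_threshold):.3f}")
```

The 95th percentile sits roughly 2.6 standard deviations above the mean and the radius threshold roughly 4.6, so the real test distances are packed into a narrow band well inside the threshold.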

	## 🚀 Quick Start

	### Installation

	```bash
	pip install torch torchvision pytorch-lightning huggingface-hub pillow
	```

	### Basic Usage

	```python
	import torch
	from huggingface_hub import hf_hub_download
	from PIL import Image
	import torchvision.transforms as transforms

	# Download model
	model_path = hf_hub_download(
	    repo_id="ash12321/ai-image-detector-deepsvdd",
	    filename="model.ckpt"
	)

	# Load model (you'll need the model class definition)
	from model import AdvancedDeepSVDD

	model = AdvancedDeepSVDD.load_from_checkpoint(model_path)
	model.eval()

	# Prepare image
	transform = transforms.Compose([
	    transforms.Resize((32, 32)),
	    transforms.ToTensor(),
	    transforms.Normalize(
	        mean=[0.4914, 0.4822, 0.4465],
	        std=[0.2470, 0.2435, 0.2616]
	    )
	])

	image = Image.open('test_image.jpg').convert('RGB')
	image_tensor = transform(image).unsqueeze(0)

	# Predict (no_grad skips gradient tracking during inference)
	with torch.no_grad():
	    is_fake, scores, distances = model.predict_anomaly(image_tensor)

	print(f"AI-Generated: {is_fake[0].item()}")
	print(f"Confidence: {scores[0].item()*100:.1f}%")
	print(f"Anomaly Score: {scores[0].item():.4f}")
	```

	### Using with Gradio

	```python
	import gradio as gr

	# Reuses `model` and `transform` from the Basic Usage example
	def predict(image):
	    img_tensor = transform(image).unsqueeze(0)
	    is_fake, scores, _ = model.predict_anomaly(img_tensor)

	    result = "🚨 AI-Generated" if is_fake[0] else "✅ Real Image"
	    confidence = f"{scores[0].item()*100:.1f}%"

	    return f"{result} (Confidence: {confidence})"

	demo = gr.Interface(
	    fn=predict,
	    inputs=gr.Image(type="pil"),
	    outputs=gr.Markdown(),
	    title="AI Image Detector"
	)

	demo.launch()
	```

	## 🏗️ Architecture Details

	### Enhanced Deep SVDD Encoder

	```
	Input (3x32x32)
	→ Stem Conv (64 channels)
	→ Layer1 (64→128) + Channel Attention
	→ Layer2 (128→256) + Channel Attention
	→ Layer3 (256→512) + Channel Attention
	→ Dual Pooling (Avg + Max)
	→ Projection Head (1024→512→128)
	→ Output (128-dim latent space)
	```
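
The dual pooling step explains why the projection head takes 1024 inputs when Layer3 outputs 512 channels. The sketch below assumes the average-pooled and max-pooled vectors are concatenated. That is an inference from the 512 and 1024 figures in the diagram, not something this card confirms, and `dual_pool` is a hypothetical pure-Python stand-in for the real layer.

```python
def dual_pool(feature_map):
    """Globally pool each channel two ways and concatenate the results.

    feature_map is a list of channels, each a 2-D list (H x W).
    Returns the per-channel averages followed by the per-channel maxima,
    so C channels produce a vector of length 2 * C.
    """
    averages = []
    maxima = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        averages.append(sum(values) / len(values))
        maxima.append(max(values))
    return averages + maxima

# Toy feature map: 2 channels of size 2x2 (the real encoder has 512 channels).
fmap = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
pooled = dual_pool(fmap)
print(len(pooled), pooled)
```

With 512 channels the same operation yields 2 × 512 = 1024 values, which matches the first dimension of the projection head.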

	### Training Optimizations

	- Optimizer: AdamW (lr=1e-3, weight_decay=1e-3)
	- Scheduler: OneCycleLR with cosine annealing
	- Batch Size: 128 (L4 GPU optimized)
	- Augmentation: Mixup (α=0.2), multi-scale, extensive transforms
	- Loss: SVDD objective + contrastive diversity + L2 regularization
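
For readers unfamiliar with one-cycle scheduling, here is a simplified pure-Python sketch of the learning-rate curve. It is loosely modeled on PyTorch's `OneCycleLR` with cosine annealing but does not reproduce it step for step. The assumptions: `lr=1e-3` is treated as the peak rate, and `pct_start`, `div_factor`, and `final_div_factor` use commonly seen defaults. None of these is confirmed by this card.

```python
import math

def cosine_interp(start, end, fraction):
    """Cosine interpolation from start (fraction=0) to end (fraction=1)."""
    return end + (start - end) * (1 + math.cos(math.pi * fraction)) / 2

def one_cycle_lr(step, total_steps, max_lr=1e-3, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at a given step of a simplified one-cycle schedule."""
    initial_lr = max_lr / div_factor          # starting rate of the warm-up
    final_lr = initial_lr / final_div_factor  # rate at the very last step
    warmup_steps = int(pct_start * total_steps)
    if step < warmup_steps:
        # Warm-up phase: rise from initial_lr to max_lr.
        return cosine_interp(initial_lr, max_lr, step / warmup_steps)
    # Annealing phase: decay from max_lr to final_lr.
    decay_steps = total_steps - 1 - warmup_steps
    return cosine_interp(max_lr, final_lr, (step - warmup_steps) / decay_steps)

for step in (0, 15, 30, 65, 99):
    print(step, one_cycle_lr(step, total_steps=100))
```

The rate climbs during the first 30% of training, peaks, then decays smoothly to a value far below where it started.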

	## 📊 Training Configuration

	```text
	Model Parameters: 5.3M trainable
	Epochs: 30
	Training Samples: 35,000 (70%)
	Validation Samples: 7,500 (15%)
	Test Samples: 7,500 (15%)
	Precision: 16-bit mixed precision
	GPU: NVIDIA L4 with Tensor Cores
	```
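
The sample counts above are consistent with a 70/15/15 split of 50,000 images, which is the size of the standard CIFAR-10 training set. The sketch below reproduces the arithmetic and estimates the number of optimizer steps. It uses the batch size of 128 listed under Training Optimizations and assumes the last partial batch of each epoch is kept, which this card does not state.

```python
import math

total_images = 50_000  # size of the standard CIFAR-10 training set
batch_size = 128       # from the Training Optimizations section
epochs = 30

# Integer arithmetic avoids floating-point rounding in the split sizes.
train_samples = total_images * 70 // 100
val_samples = total_images * 15 // 100
test_samples = total_images - train_samples - val_samples

# Assumes the final partial batch is kept (not dropped).
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(train_samples, val_samples, test_samples)
print(steps_per_epoch, total_steps)
```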

	## 🎨 Data Augmentation Pipeline

	Training Augmentations:
	- Multi-scale resizing (32, 64, 96 pixels)
	- Random resized crop (scale: 0.5-1.0)
	- Random horizontal/vertical flips
	- Random rotation (±20°)
	- Color jitter (brightness, contrast, saturation, hue)
	- Gaussian blur
	- Random erasing
	- Mixup augmentation
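
Mixup blends pairs of training images with a random weight. The sketch below shows the standard formulation on flat lists of pixel values, drawing the weight from a Beta(α, α) distribution with α = 0.2 as listed under Training Optimizations. The `mixup` helper is hypothetical. Because one-class training has no labels to blend, only the inputs are mixed here, and how the actual training code applies Mixup is not described in this card.

```python
import random

def mixup(x_a, x_b, alpha=0.2, rng=random):
    """Blend two equally sized inputs with a Beta(alpha, alpha) weight.

    Returns the mixed input and the sampled weight lam, where
    mixed = lam * x_a + (1 - lam) * x_b.
    """
    lam = rng.betavariate(alpha, alpha)
    mixed = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    return mixed, lam

# Toy "images": four pixel values each, all black and all white.
rng = random.Random(0)  # seeded only so repeated runs give the same draw
black = [0.0, 0.0, 0.0, 0.0]
white = [1.0, 1.0, 1.0, 1.0]
mixed, lam = mixup(black, white, rng=rng)
print(lam, mixed)
```

A small α such as 0.2 concentrates the weight near 0 or 1, so most mixed images stay close to one of the two originals.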

	Validation/Test:
	- Simple resize to 32x32
	- Normalize with CIFAR-10 statistics
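
For reference, this is what the normalization step does to a single RGB pixel, using the CIFAR-10 mean and standard deviation from the Basic Usage example. It is a pure-Python sketch of the arithmetic behind `ToTensor` followed by `Normalize`. The `normalize_pixel` helper is hypothetical and not part of the model code.

```python
# Per-channel statistics from the Basic Usage example.
CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR10_STD = (0.2470, 0.2435, 0.2616)

def normalize_pixel(rgb_uint8):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    scaled = [v / 255.0 for v in rgb_uint8]
    return [(s - m) / d for s, m, d in zip(scaled, CIFAR10_MEAN, CIFAR10_STD)]

print(normalize_pixel((0, 0, 0)))        # black pixel
print(normalize_pixel((255, 255, 255)))  # white pixel
```

Black maps to roughly -2 and white to roughly +2 in each channel, so inputs must use the same statistics at inference time as during training.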

	## 💡 Use Cases

	- Content Moderation: Identify AI-generated images in uploads
	- Digital Forensics: Verify authenticity of images
	- Research: Study differences between real and synthetic images
	- Education: Demonstrate one-class learning techniques

	## ⚠️ Limitations

	- Training Domain: Optimized for natural images similar to CIFAR-10
	- Image Size: Trained on 32x32 images; larger inputs must be resized to 32x32 before inference
	- Generalization: May require fine-tuning for specific domains
	- False Positives: Unusual real images may be flagged as AI-generated
	- Not Foolproof: Sophisticated AI images may evade detection

	## 📚 Citation

	If you use this model in your research, please cite:

	```bibtex
	@misc{ai-image-detector-deepsvdd-2024,
	author = {ash12321},
	title = {AI Image Detector using Deep SVDD},
	year = {2024},
	publisher = {Hugging Face},
	howpublished = {\url{https://huggingface.co/ash12321/ai-image-detector-deepsvdd}},
	}
	```

	## 📄 License

	This model is released under the MIT License.

	## 🤝 Contributing

	Contributions, issues, and feature requests are welcome!

	## 👤 Author

	ash12321
	- Hugging Face: [@ash12321](https://huggingface.co/ash12321)

	## 🙏 Acknowledgments

	- CIFAR-10 dataset creators
	- PyTorch Lightning team
	- Deep SVDD paper authors
	- Hugging Face for hosting infrastructure

	---

	<div align="center">

	[Try it on Hugging Face Spaces](https://huggingface.co/spaces/ash12321/ai-image-detector-demo)

	</div>