---
license: mit
datasets:
- Parveshiiii/AI-vs-Real
base_model:
- microsoft/swinv2-tiny-patch4-window16-256
pipeline_tag: image-classification
library_name: transformers
tags:
- safety
- XenArcAI
- SoTA
---
# XenArcAI

<p align="center">
  <img 
    src="https://cdn-uploads.huggingface.co/production/uploads/677fcdf29b9a9863eba3f29f/m7ddTjuxxLUntdXVk0t5N.png" 
    alt="AIRealNet Banner" 
    width="90%" 
    style="border-radius:15px;"
  />
</p>

---

- [GitHub Repository](https://github.com/XenArcAI/AIRealNet)
- [Live Demo](https://huggingface.co/spaces/Parveshiiii/AIRealNet)
  
## Overview

In an era of rapidly advancing AI-generated imagery, deepfakes, and synthetic media, the need for reliable detection tools has never been greater. **AIRealNet** is a binary image classifier designed to distinguish **AI-generated images** from **real human photographs**. The model is optimized to detect conventional AI-generated content while adhering to strict privacy standards: no personal or sensitive images were used.

* **Class 0:** AI-generated image
* **Class 1:** Real human image

By leveraging the robust **SwinV2 Tiny** architecture as its backbone, AIRealNet achieves a high degree of accuracy while remaining lightweight enough for practical deployment.
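
For reference, these indices map to the label names returned by the inference pipeline (`artificial` and `real`, as seen in the demo output below). A minimal sketch for checking the mapping, assuming the model config exposes the standard `id2label` field:

```python
from transformers import AutoConfig

# Inspect the label mapping without downloading the full model weights.
config = AutoConfig.from_pretrained("XenArcAI/AIRealNet")
print(config.id2label)  # expected: {0: 'artificial', 1: 'real'}
```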

---

## Key Features

1. **High Accuracy on Public Datasets:**
   Despite being fine-tuned on a **14k-image split** (part of the main fine-tuning split), AIRealNet demonstrates strong accuracy and robustness in detecting AI-generated images.

2. **Near-Balanced Training Split:**
   The dataset contains a roughly balanced mix of AI-generated and real images, limiting class-imbalance issues during training:

   * **AI-generated:** 60%
   * **Human images:** 40%

3. **Ethical Design:**
   No personal photos were included, even if edited or AI-modified, respecting privacy and ethical AI principles.

4. **Fast and Scalable:**
   Built on a transformer vision backbone, AIRealNet can be deployed efficiently in both research and production environments; see the batched-inference sketch below.
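
As referenced above, here is a minimal batched-inference sketch using the `transformers` pipeline. The file names are placeholders, and `device=0` assumes a CUDA-capable GPU (use `device=-1` or omit the argument for CPU):

```python
from transformers import pipeline

# Placeholder image paths; URLs or PIL images also work.
images = ["img1.png", "img2.png", "img3.png"]

# device=0 selects the first GPU; use device=-1 (or omit) for CPU.
pipe = pipeline("image-classification", model="XenArcAI/AIRealNet", device=0)

# batch_size controls how many images go through each forward pass.
for path, preds in zip(images, pipe(images, batch_size=8)):
    print(path, preds[0]["label"], round(preds[0]["score"], 4))
```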

---

## Training Data

* **Dataset:** [`Parveshiiii/AI-vs-Real`](https://huggingface.co/datasets/Parveshiiii/AI-vs-Real) (an open-sourced subset of the main dataset)
* **Size:** 14k images, split roughly 60/40 between AI-generated and real
* **Split:** The train split was used for fine-tuning; validation was performed on a separate balanced subset.
* **Notes:** Images were sourced from public datasets and AI generation tools. Edited personal photos were intentionally excluded.

---

## Limitations

While AIRealNet performs exceptionally well on typical AI-generated images, users should note:

1. **Subtle Edits:** The model struggles with highly localized, ultra-precise modifications, such as “nano banana”-style edits.
2. **Edited Personal Images:** Images of real people that have been AI-modified (even with very precise edits) are **not detected**, by design, aligning with privacy and ethical guidelines.
3. **Domain Generalization:** Performance may vary on images from completely unseen AI generators or extremely unconventional content.

---

## Performance Metrics

> Metrics shown are from **Epoch 2**, chosen to illustrate stable performance after fine-tuning.

<p align="center">
  <img 
    src="https://cdn-uploads.huggingface.co/production/uploads/677fcdf29b9a9863eba3f29f/3NVa0KLX0iAxTP2e6IlGH.png" 
    alt="AIRealNet Banner" 
    width="90%" 
    style="border-radius:15px;"
  />
</p>

**Note:** The very low loss and high accuracy reflect the controlled dataset environment; real-world performance may be lower depending on the image domain. In our testing the model was highly accurate on fully generated images (including edited versions of fully generated images), but it cannot detect Nano Banana-style edits of real photos.

---

## Demo and Usage

1. **Installing dependencies**

```bash
pip install -U transformers
```
2. **Loading and running a demo**

```python
from transformers import pipeline

pipe = pipeline("image-classification", model="XenArcAI/AIRealNet")
pipe("https://cdn-uploads.huggingface.co/production/uploads/677fcdf29b9a9863eba3f29f/eVkKUTdiInUl6pbIUghQC.png")# example image
```
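
The pipeline also accepts local files and `PIL` images. Below is a hedged sketch of a simple decision rule on top of the scores; the file name and the 0.5 threshold are illustrative, not part of the model:

```python
from PIL import Image
from transformers import pipeline

pipe = pipeline("image-classification", model="XenArcAI/AIRealNet")

image = Image.open("photo.png")  # hypothetical local file
preds = pipe(image)

# Collapse the list of {label, score} dicts into a lookup table.
scores = {p["label"]: p["score"] for p in preds}

# Illustrative threshold; tune it for your precision/recall needs.
if scores.get("artificial", 0.0) > 0.5:
    print("Likely AI-generated:", scores["artificial"])
else:
    print("Likely real:", scores.get("real", 0.0))
```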
## Demo

* **Given Image** (a promotional banner for XenArcAI's math reasoning dataset)

<p align="center">
  <img 
    src="https://cdn-uploads.huggingface.co/production/uploads/677fcdf29b9a9863eba3f29f/eVkKUTdiInUl6pbIUghQC.png" 
    alt="AIRealNet Banner" 
    width="90%" 
    style="border-radius:15px;"
  />
</p>

* **Model Output**

```bash
[{'label': 'artificial', 'score': 0.9865425825119019},
 {'label': 'real', 'score': 0.013457471504807472}]
```
**Note:** The prediction is correct: the image was generated by a diffusion model.

---

## Intended Use

* Detect AI-generated imagery on social media, research publications, and digital media platforms.
* Assist content moderators, researchers, and fact-checkers in identifying synthetic media.
* **Not intended** for legal verification without human corroboration.

---

## Ethical Considerations

* **Privacy-first Approach:** Personal photos, even if AI-edited, were excluded.
* **Responsible Deployment:** Users should combine model predictions with human review to avoid false positives or negatives.
* **Transparency:** The model card openly communicates its limitations and dataset design to prevent misuse.

---

## How It Works

1. Images are preprocessed and resized to `256x256`.
2. Features are extracted using the **SwinV2 Tiny** vision transformer backbone.
3. A binary classification head outputs probabilities for AI-generated vs real human images.
4. Predictions are interpreted as class 0 (AI) or class 1 (human), as shown in the sketch below.
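
For explicit control over each of these steps, the same flow can be written with the lower-level `AutoImageProcessor` and `AutoModelForImageClassification` classes. This is a minimal sketch; the file name is a placeholder:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

processor = AutoImageProcessor.from_pretrained("XenArcAI/AIRealNet")
model = AutoModelForImageClassification.from_pretrained("XenArcAI/AIRealNet")
model.eval()

image = Image.open("example.png").convert("RGB")  # hypothetical local file
inputs = processor(images=image, return_tensors="pt")  # resizes to 256x256

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, 2)

probs = torch.softmax(logits, dim=-1)[0]
pred = int(probs.argmax())
print(model.config.id2label[pred], float(probs[pred]))
```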

---

## Future Work

Future iterations aim to:

* Improve detection of subtle AI-generated edits and “nano banana” modifications.
* Expand training data with diverse AI generators to enhance generalization.
* Explore multi-modal detection capabilities (e.g., video, metadata, and image combined).

---

### Citation
```bibtex
@misc{xenarcai_airealnet_2025,
    title={AIRealNet: A Binary Image Classifier for Detecting AI-Generated Images},
    author={Parvesh Rawal},
    publisher={Hugging Face},
    year={2025},
    url={https://huggingface.co/XenArcAI/AIRealNet}
}
```

## References

* Microsoft SwinV2 Tiny: [https://github.com/microsoft/Swin-Transformer](https://github.com/microsoft/Swin-Transformer)
* `Parveshiiii/AI-vs-Real` dataset (open-sourced subset): [https://huggingface.co/datasets/Parveshiiii/AI-vs-Real](https://huggingface.co/datasets/Parveshiiii/AI-vs-Real)

---