---
license: mit
base_model:
- stabilityai/stable-diffusion-xl-base-1.0
pipeline_tag: text-to-image
---
# SarcasmDiffusion — SDXL Fused Meme Generator

**Model type:** Stable Diffusion XL (Base 1.0) fine‑tuned via **LoRA** (merged/fused) to learn the *visual* style of sarcastic/ironic memes.  
**Author:** Ricardo Urdaneta (github.com/Ricardouchub)  


---

## Overview

SarcasmDiffusion is a diffusion-based generative model focused on producing **clean meme-style photographs** that are suitable for **caption overlays** (text is added *after* generation). The model was LoRA‑fine‑tuned on a filtered and enriched subset of the *Hateful Memes* dataset to capture stylistic cues of humorous/ironic memes while **avoiding offensive content**.

- **Base:** `stabilityai/stable-diffusion-xl-base-1.0`
- **Fine‑tuning:** LoRA on the **UNet** only; **VAE** and **text encoders** are frozen.
- **Exported artifact:** **Fused SDXL** (no external LoRA required at inference).

> This model focuses on **style transfer for meme aesthetics** (composition, lighting, “stock-photo vibe”), *not* on rendering text inside images. Add titles/subtitles with your own overlay function or editor.

---

## Intended Use

- Generating **meme-ready images** with space at the top/bottom for captions.
- Creative exploration of humorous/ironic visual setups controlled by prompts.
- Educational/portfolio use for **LoRA fine‑tuning workflows** with SDXL.

### Out of Scope / Limitations
- **No text rendering inside the image** (explicitly discouraged via negative prompts).  
- May produce **stock-like** aesthetics by design.  
- Not suitable for generating or amplifying **harmful, hateful, or NSFW** content.  
- As with all text-to-image systems, prompts with ambiguous semantics can yield unpredictable outputs.

---

## Training Summary

- **Base model:** SDXL Base 1.0  
- **LoRA rank / alpha / dropout:** `r=8`, `alpha=16`, `dropout=0.05`  
- **Resolution:** 1024 (training); common inference at 768–896 for speed  
- **Batch:** 1 (gradient accumulation = 4)  
- **Steps:** ~9k (≈2 epochs on ~5k images)
- **Learning Rate:** 0.0001
- **Precision:** fp16 (LoRA params kept in fp32 during training)  
- **Optimizer:** AdamW  
- **Scheduler:** cosine with warmup (recommended)  
- **Frozen:** VAE, text_encoder, text_encoder_2
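
The numbers above can be sanity-checked with a little arithmetic. This sketch assumes that "steps" counts micro-batches rather than optimizer updates, which is the reading under which ~9k steps at batch size 1 comes out to roughly 2 epochs; it also uses the standard LoRA convention of scaling the low-rank update by `alpha / r`.

```python
# Values taken from the training summary above (steps and images are approximate).
lora_rank, lora_alpha = 8, 16
micro_batch, grad_accum = 1, 4
steps, num_images = 9_000, 5_000

# LoRA scales its low-rank update by alpha / r.
lora_scale = lora_alpha / lora_rank

# Assumption: "steps" counts micro-batches, not optimizer updates.
effective_batch = micro_batch * grad_accum
optimizer_updates = steps // grad_accum
epochs = steps * micro_batch / num_images

print(f"LoRA scale: {lora_scale}")                 # 2.0
print(f"effective batch size: {effective_batch}")  # 4
print(f"optimizer updates: {optimizer_updates}")   # 2250
print(f"epochs: {epochs:.1f}")                     # 1.8
```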

### Data
- Source: *Hateful Memes* (Facebook AI). 
- We **excluded** samples labeled hateful and applied **NLP enrichment**:
  - Emotion scoring (GoEmotions distilled) and irony scoring (RoBERTa‑irony).
  - Heuristics + percentiles → tones: `humor / irony / neutral`.
- Final training CSV: prompts balanced by tone; **negative prompts** to avoid text overlays, low‑quality artifacts, watermarks/logos, and unsafe content.
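
The filtering and tone assignment described above might look roughly like the sketch below. It is an illustration only: the row keys (`hateful`, `humor`, `irony`), the 75th-percentile cut-off, and the tie-breaking rule are assumptions, not the heuristics actually used for this model.

```python
from statistics import quantiles


def assign_tones(rows, pct=75):
    """Drop hateful rows, then label the rest `humor`, `irony`, or `neutral`.

    `rows` is a list of dicts with hypothetical keys `hateful` (0/1) and
    `humor` / `irony` (float scores from the NLP enrichment step).
    """
    kept = [r for r in rows if not r["hateful"]]
    # Percentile cut-offs computed on the kept rows (pct=75 is an assumed value).
    humor_cut = quantiles([r["humor"] for r in kept], n=100)[pct - 1]
    irony_cut = quantiles([r["irony"] for r in kept], n=100)[pct - 1]

    labeled = []
    for r in kept:
        if r["irony"] >= irony_cut and r["irony"] >= r["humor"]:
            tone = "irony"
        elif r["humor"] >= humor_cut:
            tone = "humor"
        else:
            tone = "neutral"
        labeled.append({**r, "tone": tone})
    return labeled
```

Computing the cut-offs after filtering keeps the excluded samples from shifting the percentiles.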

> The dataset is **not** included here. Please obtain *Hateful Memes* under its original terms and reproduce the preprocessing if needed.

---

## Safety, Ethics & Mitigations

- Samples labeled hateful were filtered out of the training data, and **negative prompts** are used to avoid NSFW/hate/text overlays.
- Despite mitigations, **misuse is possible**. Users are responsible for **prompting responsibly** and complying with local laws and platform policies.
- Do not use the model to create defamatory, harassing, discriminatory, or otherwise harmful imagery.

**Known risks:** dataset biases may remain; aesthetic biases (stock-photo look); occasional failure to respect negative prompts.

---

## How to Use

```python
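from diffusers import AutoPipelineForText2Image
import torch

# Use fp16 on a CUDA GPU; fall back to fp32 on CPU, where fp16 is poorly supported.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

pipe = AutoPipelineForText2Image.from_pretrained(
    "Ricardouchub/SarcasmDiffusion",
    torch_dtype=dtype,
).to(device)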

prompt = (
    "sarcastic meme about checking the fridge for the third time, "
    "centered subject, plain background, high-contrast photo, stock photo style"
)
negative = "nsfw, hate speech, slur, watermark, logo, low quality, blurry, busy background, text overlay"

g = torch.Generator(device=pipe.device).manual_seed(123)
image = pipe(prompt,
             negative_prompt=negative,
             num_inference_steps=22,
             guidance_scale=6.3,
             width=896, height=896,
             generator=g).images[0]

image.save("sample.png")
```

### Prompting Tips
- Add **layout hints**: “centered subject”, “plain background”, “space at top and bottom”.
- Keep **negative prompts** to avoid logos/text/NSFW.  
- Use seeds for reproducibility; typical settings are `num_inference_steps` 18–28, `guidance_scale` 5.5–7.5, and `width`/`height` 768–1024.
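
These tips can be bundled into a pair of small helpers so every generation uses the same layout hints and settings. This is a hedged sketch: `build_prompt`, `generation_kwargs`, and the constant names are hypothetical, and the defaults simply mirror the example in "How to Use".

```python
LAYOUT_HINTS = ("centered subject", "plain background", "space at top and bottom")
NEGATIVE = (
    "nsfw, hate speech, slur, watermark, logo, low quality, "
    "blurry, busy background, text overlay"
)


def build_prompt(topic, tone="sarcastic", extra=()):
    """Compose '<tone> meme about <topic>' followed by the layout hints."""
    parts = [f"{tone} meme about {topic}", *LAYOUT_HINTS, *extra]
    return ", ".join(parts)


def generation_kwargs(steps=22, guidance=6.3, size=896):
    """Map short names to the keyword arguments the diffusers pipeline expects."""
    if size % 8 != 0:
        raise ValueError("width and height must be multiples of 8")
    return {
        "num_inference_steps": steps,
        "guidance_scale": guidance,
        "width": size,
        "height": size,
    }
```

With the pipeline from "How to Use" loaded as `pipe`, a call would then look like `pipe(build_prompt("checking the fridge for the third time"), negative_prompt=NEGATIVE, **generation_kwargs())`.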

---

## Environment & Compatibility

To ensure full compatibility when loading this model (fused SDXL with LoRA merged), use the following library versions:

| Library | Recommended Version | Notes |
|----------|--------------------|-------|
| **Python** | 3.10 – 3.12 | Tested on Colab (Python 3.12) |
| **PyTorch** | 2.6.0 + CUDA 12.4 | Any CUDA ≥ 12 works |
| **diffusers** | **0.35.1** | Core inference & model loading |
| **transformers** | **4.45.2** | Required for compatibility with the SDXL CLIP text encoders (`CLIPTextModel`) |
| **accelerate** | **1.10.1** | Device and fp16 inference management |
| **huggingface_hub** | **0.23.5** | Compatible with diffusers 0.35.x |
| **safetensors** | ≥ 0.4.5 | For secure model weights loading |

**Install in Colab or local environment:**

```bash
pip install "diffusers==0.35.1" "transformers==4.45.2" "accelerate==1.10.1" "huggingface_hub==0.23.5" "safetensors>=0.4.5"
```

> **Important:**  
> Using newer versions (e.g., `transformers ≥ 4.56`) may break compatibility due to API changes in `CLIPTextModel` (`offload_state_dict` argument).  
> Always match the versions above for smooth loading.
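
Before loading the model, it can help to confirm that the installed packages match the pins in the table. The check below is a minimal sketch using only the standard library; `check_versions` and `PINNED` are hypothetical names, and it compares exact version strings only.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the compatibility table above.
PINNED = {
    "diffusers": "0.35.1",
    "transformers": "4.45.2",
    "accelerate": "1.10.1",
    "huggingface_hub": "0.23.5",
}


def check_versions(pinned=PINNED, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_versions().items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every pinned package is present at the expected version.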

---

## License

- **Code:** MIT 
- **Model weights:** follow the base model’s license (Stability AI / SDXL Base 1.0).  
- **Data:** Users must obtain *Hateful Memes* from its source and agree to its terms.

> By using this model, you agree not to generate content that is illegal, harmful, or violates rights of others.

---

## Evaluation

Qualitative assessment via fixed prompt sheets (humor/irony/neutral). Suggested automatic metrics for future work: CLIP‑score vs. caption, aesthetic predictors, and human preference studies.
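
A fixed prompt sheet can be as simple as a deterministic grid of prompts and seeds, so that outputs from different checkpoints are compared on identical inputs. The sketch below is illustrative: the prompts and seeds are placeholder assumptions, not the sheets used to assess this model.

```python
from itertools import product

# Hypothetical prompts, one per tone; replace with the prompts you want to track.
PROMPTS = {
    "humor": "funny meme about a cat ignoring its owner, centered subject",
    "irony": "ironic meme about a sunny day during a flood, centered subject",
    "neutral": "photo of an office desk, plain background",
}
SEEDS = (123, 456)


def prompt_sheet(prompts=PROMPTS, seeds=SEEDS):
    """Return one (tone, prompt, seed) row per prompt/seed pair, in a fixed order."""
    return [
        (tone, text, seed)
        for (tone, text), seed in product(prompts.items(), seeds)
    ]
```

Each row can then be rendered with the same sampling settings and laid out side by side for review.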

---

## Acknowledgments

- Stability AI — SDXL Base 1.0  
- Hugging Face — Diffusers, Accelerate, PEFT  
- Facebook AI — Hateful Memes dataset