---
license: mit
base_model:
- stabilityai/stable-diffusion-xl-base-1.0
pipeline_tag: text-to-image
---
# SarcasmDiffusion — SDXL Fused Meme Generator
**Model type:** Stable Diffusion XL (Base 1.0) fine‑tuned via **LoRA** (merged/fused) to learn the *visual* style of sarcastic/ironic memes.
**Author:** Ricardo Urdaneta (github.com/Ricardouchub)
---
## Overview
SarcasmDiffusion is a diffusion-based generative model focused on producing **clean meme-style photographs** that are suitable for **caption overlays** (text is added *after* generation). The model was LoRA‑fine‑tuned on a filtered and enriched subset of the *Hateful Memes* dataset to capture stylistic cues of humorous/ironic memes while **avoiding offensive content**.
- **Base:** `stabilityai/stable-diffusion-xl-base-1.0`
- **Fine‑tuning:** LoRA on the **UNet** only; **VAE** and **text encoders** are frozen.
- **Exported artifact:** **Fused SDXL** (no external LoRA required at inference).
> This model focuses on **style transfer for meme aesthetics** (composition, lighting, “stock-photo vibe”), *not* on rendering text inside images. Add titles/subtitles with your own overlay function or editor.
---
## Intended Use
- Generating **meme-ready images** with space at the top/bottom for captions.
- Creative exploration of humorous/ironic visual setups controlled by prompts.
- Educational/portfolio use for **LoRA fine‑tuning workflows** with SDXL.
### Out of Scope / Limitations
- **No text rendering inside the image** (explicitly discouraged via negative prompts).
- May produce **stock-like** aesthetics by design.
- Not suitable for generating or amplifying **harmful, hateful, or NSFW** content.
- As with all text-to-image systems, prompts with ambiguous semantics can yield unpredictable outputs.
---
## Training Summary
- **Base model:** SDXL Base 1.0
- **LoRA rank / alpha / dropout:** `r=8`, `alpha=16`, `dropout=0.05`
- **Resolution:** 1024 (training); common inference at 768–896 for speed
- **Batch:** 1 (gradient accumulation = 4)
- **Steps:** ~9k (≈2 epochs on ~5k images)
- **Learning Rate:** 0.0001
- **Precision:** fp16 (LoRA params kept in fp32 during training)
- **Optimizer:** AdamW
- **Scheduler:** cosine with warmup (recommended)
- **Frozen:** VAE, text_encoder, text_encoder_2
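To illustrate what the rank and alpha values mean for the fused export, here is a toy sketch of the standard LoRA merge, `W_fused = W + (alpha / r) * B @ A`, in plain Python. The layer sizes and random weights are made up for illustration, and the card does not describe the exact tooling used to fuse the real UNet weights.

```python
import random

R, ALPHA = 8, 16      # rank and alpha from the training summary
SCALE = ALPHA / R     # standard LoRA scaling factor, here 2.0


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def fuse(w, lora_b, lora_a, scale=SCALE):
    """Return W + scale * (B @ A): one dense weight that needs no adapter at inference."""
    delta = matmul(lora_b, lora_a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


rng = random.Random(0)
d_out, d_in = 4, 6  # toy layer sizes, chosen only for illustration
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
B = [[rng.gauss(0, 0.1) for _ in range(R)] for _ in range(d_out)]  # d_out x r
A = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(R)]   # r x d_in
x = [[rng.gauss(0, 1)] for _ in range(d_in)]                       # column vector

# Unfused path: base output plus the scaled low-rank update.
base_out = matmul(W, x)
lora_out = matmul(B, matmul(A, x))
unfused = [[b[0] + SCALE * l[0]] for b, l in zip(base_out, lora_out)]

# Fused path: a single matrix multiply with the merged weight.
fused = matmul(fuse(W, B, A), x)
```

Both paths give the same output up to floating-point error, which is why the fused checkpoint loads like a plain SDXL model.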
### Data
- Source: *Hateful Memes* (Facebook AI).
- We **excluded** labeled hateful samples and applied **NLP enrichment**:
  - Emotion scoring (GoEmotions distilled) and irony scoring (RoBERTa‑irony).
  - Heuristics + percentiles → tones: `humor / irony / neutral`.
- Final training CSV: prompts balanced by tone; **negative prompts** to avoid text overlays, low‑quality artifacts, watermarks/logos, and unsafe content.
> The dataset is **not** included here. Please obtain *Hateful Memes* under its original terms and reproduce the preprocessing if needed.
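
The "heuristics + percentiles" step can be sketched as a percentile cutoff over per-sample scores. This is an assumption-heavy illustration, not the actual preprocessing: the field names `irony` and `humor`, the 75th-percentile threshold, and the rule that irony takes priority are all hypothetical.

```python
from statistics import quantiles


def percentile_cutoff(values, pct):
    """Return the pct-th percentile (1 to 99) of `values` using inclusive quantiles."""
    return quantiles(values, n=100, method="inclusive")[pct - 1]


def assign_tones(rows, pct=75):
    """Label each row `irony`, `humor`, or `neutral` from its two scores."""
    irony_cut = percentile_cutoff([r["irony"] for r in rows], pct)
    humor_cut = percentile_cutoff([r["humor"] for r in rows], pct)
    tones = []
    for r in rows:
        if r["irony"] >= irony_cut:
            tones.append("irony")    # irony wins ties (arbitrary choice for this sketch)
        elif r["humor"] >= humor_cut:
            tones.append("humor")
        else:
            tones.append("neutral")
    return tones
```

Using percentiles rather than fixed thresholds adapts the cutoffs to the score distribution of whatever classifier is used, which makes it easier to balance the tones afterwards.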
---
## Safety, Ethics & Mitigations
- Samples labeled hateful were filtered out of the training data, and **negative prompts** are used to steer away from NSFW content, hate, and text overlays.
- Despite mitigations, **misuse is possible**. Users are responsible for **prompting responsibly** and complying with local laws and platform policies.
- Do not use the model to create defamatory, harassing, discriminatory, or otherwise harmful imagery.
**Known risks:** dataset biases may remain; aesthetic biases (stock-photo look); occasional failure to respect negative prompts.
---
## How to Use
```python
import torch
from diffusers import AutoPipelineForText2Image

# Load the fused SDXL weights; no separate LoRA file is needed.
# fp16 targets GPU inference; on CPU, pass torch_dtype=torch.float32 and use .to("cpu").
pipe = AutoPipelineForText2Image.from_pretrained(
    "Ricardouchub/SarcasmDiffusion",
    torch_dtype=torch.float16,
).to("cuda")

prompt = (
    "sarcastic meme about checking the fridge for the third time, "
    "centered subject, plain background, high-contrast photo, stock photo style"
)
negative = "nsfw, hate speech, slur, watermark, logo, low quality, blurry, busy background, text overlay"

# A fixed seed makes the result reproducible.
g = torch.Generator(device=pipe.device).manual_seed(123)

image = pipe(
    prompt,
    negative_prompt=negative,
    num_inference_steps=22,
    guidance_scale=6.3,
    width=896,
    height=896,
    generator=g,
).images[0]
image.save("sample.png")
```
### Prompting Tips
- Add **layout hints**: “centered subject”, “plain background”, “space at top and bottom”.
- Keep **negative prompts** to avoid logos/text/NSFW.
- Use seeds for reproducibility; `steps=18–28`, `guidance=5.5–7.5`, `size=768–1024`.
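The tips above can be folded into a small helper that assembles the pipeline arguments. This is only a sketch: `build_request`, the constant names, and the choice to clamp out-of-range values instead of raising an error are assumptions, not part of the model's API.

```python
LAYOUT_HINTS = ("centered subject", "plain background", "space at top and bottom")
NEGATIVE_TERMS = ("nsfw", "watermark", "logo", "text overlay", "low quality", "blurry")

# Ranges taken from the tips above: (low, high).
RANGES = {"steps": (18, 28), "guidance": (5.5, 7.5), "size": (768, 1024)}


def clamp(value, low, high):
    """Limit `value` to the closed interval [low, high]."""
    return max(low, min(high, value))


def build_request(topic, steps=22, guidance=6.3, size=896):
    """Assemble pipeline keyword arguments from a topic and clamped settings."""
    # SDXL pipelines expect width and height divisible by 8, so round down.
    size = clamp(size, *RANGES["size"]) // 8 * 8
    return {
        "prompt": ", ".join((f"sarcastic meme about {topic}",) + LAYOUT_HINTS),
        "negative_prompt": ", ".join(NEGATIVE_TERMS),
        "num_inference_steps": clamp(steps, *RANGES["steps"]),
        "guidance_scale": clamp(guidance, *RANGES["guidance"]),
        "width": size,
        "height": size,
    }
```

With the pipeline and generator from the usage example, a call would look like `pipe(**build_request("checking the fridge"), generator=g).images[0]`.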
---
## Environment & Compatibility
To ensure full compatibility when loading this model (fused SDXL with LoRA merged), use the following library versions:
| Library | Recommended Version | Notes |
|----------|--------------------|-------|
| **Python** | 3.10 – 3.12 | Tested on Colab (Python 3.12) |
| **PyTorch** | 2.6.0 + CUDA 12.4 | Any CUDA ≥ 12 works |
| **diffusers** | **0.35.1** | Core inference & model loading |
| **transformers** | **4.45.2** | Required for SDXL text encoder (`CLIPTextModel`) compatibility |
| **accelerate** | **1.10.1** | Device and fp16 inference management |
| **huggingface_hub** | **0.23.5** | Compatible with diffusers 0.35.x |
| **safetensors** | ≥ 0.4.5 | For secure model weights loading |
**Install in Colab or local environment:**
```bash
pip install "diffusers==0.35.1" "transformers==4.45.2" "accelerate==1.10.1" "huggingface_hub==0.23.5" safetensors
```
> **Important:**
> Using newer versions (e.g., `transformers ≥ 4.56`) may break compatibility due to API changes in `CLIPTextModel` (`offload_state_dict` argument).
> Always match the versions above for smooth loading.
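
Before loading the model, it can help to confirm that the installed packages match the pins. The check below is a minimal sketch using only the standard library; the function name `find_mismatches` is made up for this example, and it compares exact version strings only.

```python
from importlib import metadata

# Exact pins from the table above.
PINNED = {
    "diffusers": "0.35.1",
    "transformers": "4.45.2",
    "accelerate": "1.10.1",
    "huggingface_hub": "0.23.5",
}


def find_mismatches(pins=PINNED, get_version=metadata.version):
    """Return {package: (expected, installed)} for every pin that is not met."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches().items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every pinned package is present at the expected version.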
---
## License
- **Code:** MIT
- **Model weights:** follow the base model’s license (Stability AI / SDXL Base 1.0).
- **Data:** Users must obtain *Hateful Memes* from its source and agree to its terms.
> By using this model, you agree not to generate content that is illegal, harmful, or violates rights of others.
---
## Evaluation
Qualitative assessment via fixed prompt sheets (humor/irony/neutral). Suggested automatic metrics for future work: CLIP‑score vs. caption, aesthetic predictors, and human preference studies.
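
A fixed prompt sheet can be expressed as a grid of prompts and seeds so that every checkpoint is judged on the same inputs. The prompts, seeds, and filename pattern below are hypothetical placeholders, not the sheet used for this model.

```python
from itertools import product

# Hypothetical prompts, one per tone; replace with your own fixed sheet.
PROMPT_SHEET = {
    "humor": "funny meme about a cat ignoring its expensive bed, centered subject",
    "irony": "sarcastic meme about a meeting that could have been an email, plain background",
    "neutral": "office worker at a desk, stock photo style, plain background",
}
SEEDS = (0, 1, 2)


def evaluation_jobs(sheet=PROMPT_SHEET, seeds=SEEDS):
    """Yield one (tone, prompt, seed, filename) job per prompt/seed pair."""
    for (tone, prompt), seed in product(sheet.items(), seeds):
        yield tone, prompt, seed, f"{tone}_seed{seed}.png"
```

Each job can then be rendered with the pipeline from the usage example, seeding the generator with the job's seed and saving the image under the job's filename.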
---
## Acknowledgments
- Stability AI — SDXL Base 1.0
- Hugging Face — Diffusers, Accelerate, PEFT
- Facebook AI — Hateful Memes dataset