---
title: Pose Preserving Comicfier
emoji: 🤠🎞️
colorFrom: green
colorTo: green
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
license: mit
short_description: 'Comicfier: Transforms photos into retro Western comic style'
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Pose-Preserving Comicfier - Gradio App

[Live demo on Hugging Face Spaces](https://huggingface.co/spaces/Mer-o/Pose-Preserving-Comicfier) · [Author profile](https://huggingface.co/Mer-o)

This repository contains the code for a Gradio web application that transforms input images into a specific retro Western comic book style while preserving the original pose. It uses Stable Diffusion v1.5, ControlNet (OpenPose + Tile), and specific LoRAs.

This application refactors the workflow initially developed in a [Kaggle Notebook](https://github.com/mehran-khani/SD-Controlnet-Comic-Styler) into a deployable web app.

## Features

* **Pose Preservation:** Uses ControlNet OpenPose to accurately maintain the pose from the input image.
* **Retro Comic Style Transfer:** Applies specific LoRAs (`night_comic_V06.safetensors` & `add_detail.safetensors`) for a 1940s Western comic aesthetic with enhanced details.
* **Tiling Upscaling:** Implements ControlNet Tile for 2x high-resolution output (1024x1024), improving detail consistency over large images.
* **Simplified UI:** Easy-to-use interface with only an image upload and a generate button.
* **Fixed Parameters:** Generation uses pre-defined, optimized parameters (steps, guidance, strength, prompts) based on the original notebook implementation for consistent results.
* **Dynamic Backgrounds:** Background elements in the generated image are randomized at the low-resolution stage for variety.
* **Broad Image Support:** Accepts common formats such as JPG, PNG, WEBP, and HEIC (requires `pillow-heif`).

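For example, HEIC support only requires registering the `pillow-heif` opener with Pillow before any image is loaded; a minimal sketch (the file name is illustrative):

```python
from PIL import Image
from pillow_heif import register_heif_opener

# Register the HEIF/HEIC decoder with Pillow once, at startup,
# so Image.open() can read iPhone-style .heic files transparently.
register_heif_opener()

img = Image.open("example_photo.heic").convert("RGB")  # hypothetical input file
print(img.size)
```
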
## Technology Stack

* **Python 3**
* **Gradio:** Web UI framework.
* **PyTorch:** Core ML framework.
* **Hugging Face Libraries:**
    * `diffusers`: Stable Diffusion pipelines, ControlNet integration.
    * `transformers`: Underlying model components.
    * `accelerate`: Hardware acceleration utilities.
    * `peft`: LoRA loading and management.
* **ControlNet:**
    * OpenPose detector (`controlnet_aux`)
    * OpenPose ControlNet model (`lllyasviel/sd-controlnet-openpose`)
    * Tile ControlNet model (`lllyasviel/control_v11f1e_sd15_tile`)
* **Base Model:** `runwayml/stable-diffusion-v1-5`
* **LoRAs Used:**
    * Style: [Western Comics Style](https://civitai.com/models/1081588/western-comics-style) (using `night_comic_V06.safetensors`)
    * Detail: [Detail Tweaker LoRA](https://civitai.com/models/58390/detail-tweaker-lora-lora) (using `add_detail.safetensors`)
* **Image Processing:** `Pillow`, `pillow-heif`, `numpy`, `opencv-python-headless`
* **Dependencies:** `matplotlib`, `mediapipe` (required by `controlnet_aux`)

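As a sketch of how these pieces fit together, the low-resolution stage could be assembled with `diffusers` roughly as follows (the `loras/` path and the 0.8 / 0.5 adapter weights are illustrative, not the app's exact values):

```python
import torch
from controlnet_aux import OpenposeDetector
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

# Pose detector used to build the ControlNet conditioning image.
openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

# SD v1.5 img2img pipeline conditioned on the OpenPose ControlNet.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Style and detail LoRAs loaded via peft; the weights below are illustrative.
pipe.load_lora_weights("loras", weight_name="night_comic_V06.safetensors", adapter_name="style")
pipe.load_lora_weights("loras", weight_name="add_detail.safetensors", adapter_name="detail")
pipe.set_adapters(["style", "detail"], adapter_weights=[0.8, 0.5])

# Usage (illustrative):
#   pose_map = openpose(init_image)
#   result = pipe(prompt, image=init_image, control_image=pose_map, strength=0.8).images[0]
```
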
## Workflow Overview

1. **Image Preparation (`image_utils.py`):** The input image is loaded (HEIC supported), converted to RGB, EXIF data is handled, and the image is force-resized to 512x512.
2. **Pose Detection (`pipelines.py`):** An OpenPose map is extracted from the resized image using `controlnet_aux`.
3. **Low-Resolution Generation (`pipelines.py`):**
    * An SD v1.5 Img2Img pipeline with the Pose ControlNet is dynamically loaded.
    * Prompts are generated (`prompts.py`) with a fixed base/style and a *randomized* background element.
    * The Style and Detail LoRAs are applied.
    * A 512x512 image is generated using fixed parameters.
    * The pipeline is unloaded to conserve VRAM.
4. **High-Resolution Tiling (`pipelines.py`):**
    * The 512x512 image is upscaled 2x (to 1024x1024) using bicubic interpolation, creating a blurry base.
    * An SD v1.5 Img2Img pipeline with the Tile ControlNet is dynamically loaded.
    * Tile-specific prompts (excluding the random background) are used.
    * The Style and Detail LoRAs are applied (potentially with different weights).
    * The image is processed in overlapping 1024x1024 tiles.
    * Processed tiles are blended back together using an alpha mask (`image_utils.py`); see the blending sketch after this list.
    * The pipeline is unloaded.
5. **Output (`app.py`):** The final 1024x1024 image is displayed in the Gradio UI.

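The tile blending in step 4 can be illustrated with a short sketch (not the app's actual `image_utils.py` code): each processed tile is pasted onto the full-resolution canvas with a linear alpha ramp over the regions that overlap previously pasted tiles, which hides the seams.

```python
import numpy as np
from PIL import Image

def paste_with_feather(canvas: Image.Image, tile: Image.Image, x: int, y: int, overlap: int = 64) -> None:
    """Paste `tile` onto `canvas` at (x, y), fading it in linearly over the
    left/top `overlap` pixels so seams with already-pasted tiles disappear."""
    w, h = tile.size
    alpha = np.ones((h, w), dtype=np.float32)
    ramp = np.linspace(0.0, 1.0, overlap, dtype=np.float32)
    if x > 0:                       # tile overlaps a neighbour on its left
        alpha[:, :overlap] *= ramp[None, :]
    if y > 0:                       # tile overlaps a neighbour above it
        alpha[:overlap, :] *= ramp[:, None]
    mask = Image.fromarray((alpha * 255).astype(np.uint8), mode="L")
    canvas.paste(tile, (x, y), mask)

# Usage (illustrative):
#   canvas = upscaled.copy()
#   for (x, y), processed in tiles:   # tiles produced by the Tile ControlNet pass
#       paste_with_feather(canvas, processed, x, y)
```
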
## How to Run Locally

*(Requires sufficient RAM/CPU or a compatible GPU, Python 3.8+, and Git.)*

1. **Clone the repository:**
    ```bash
    git clone https://github.com/mehran-khani/Pose-Preserving-Comicfier.git
    cd Pose-Preserving-Comicfier
    ```
2. **Create and activate a Python virtual environment:**
    ```bash
    python3 -m venv .venv
    source .venv/bin/activate
    # Windows PowerShell: .\.venv\Scripts\Activate.ps1
    # Windows cmd:        .\.venv\Scripts\activate.bat
    ```
3. **Install dependencies:**
    ```bash
    pip install -r requirements.txt
    ```
    *(Note: PyTorch installation might require specific commands depending on your OS/CUDA setup if using a local GPU. See the PyTorch website.)*
4. **Download the LoRA files:**
    * Create a folder named `loras` in the project root.
    * Download `night_comic_V06.safetensors` (from the Civitai link above) and place it in the `loras` folder.
    * Download `add_detail.safetensors` (from the Civitai link above) and place it in the `loras` folder.
5. **Run the Gradio app:**
    ```bash
    python app.py
    ```
6. Open the local URL shown in the terminal (e.g., `http://127.0.0.1:7860`) in your browser. *(Note: execution will be very slow without a suitable GPU.)*

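For orientation, the UI wiring in `app.py` amounts to a single image-in/image-out Gradio interface; a minimal sketch, with a hypothetical `generate` placeholder standing in for the two-stage pipeline:

```python
import gradio as gr
from PIL import Image

def generate(photo: Image.Image) -> Image.Image:
    """Placeholder: the real app runs the low-res ControlNet pass, then the tiled upscale."""
    return photo

demo = gr.Interface(
    fn=generate,
    inputs=gr.Image(type="pil", label="Input photo"),
    outputs=gr.Image(type="pil", label="Comic-style result"),
    title="Pose-Preserving Comicfier",
)

if __name__ == "__main__":
    demo.launch()
```
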
## Deployment to Hugging Face Spaces

This app is designed for deployment on Hugging Face Spaces, ideally with GPU hardware.

1. Ensure all code (`*.py`), `requirements.txt`, `.gitignore`, and the `loras` folder (containing the `.safetensors` files) are committed and pushed to this GitHub repository.
2. Create a new Space on Hugging Face ([huggingface.co/new-space](https://huggingface.co/new-space)).
3. Choose an owner and Space name, and select "Gradio" as the Space SDK.
4. Select the desired hardware (e.g., "T4 small" under the GPU options); compute costs may apply.
5. Choose "Use existing GitHub repository".
6. Enter the URL of this GitHub repository.
7. Click "Create Space". The Space will build the environment from `requirements.txt` and run `app.py`. Monitor the build and runtime logs for any issues.

## Limitations

* **Speed:** Generation takes minutes, especially on shared or free GPU hardware, due to the multi-stage process and dynamic model loading between stages. CPU execution is impractically slow.
* **VRAM:** Although pipelines are unloaded dynamically, peak GPU memory usage is still considerable (>10GB), so out-of-memory errors are possible on lower-VRAM GPUs. A sketch of the unload pattern follows this list.
* **Fixed Style:** The artistic style (prompts, LoRAs, parameters) is fixed in the code to replicate the notebook's specific output and cannot be changed via the UI.

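The dynamic unloading mentioned above follows the usual PyTorch clean-up pattern; a minimal sketch, not the app's exact code:

```python
import gc

import torch

def release_pipeline(pipe) -> None:
    """Free the VRAM held by a Diffusers pipeline between the low-res and tiling stages."""
    pipe.to("cpu")            # move the weights off the GPU first
    del pipe                  # drop this reference (callers must drop theirs too)
    gc.collect()              # let Python collect the pipeline object
    torch.cuda.empty_cache()  # return cached blocks to the CUDA driver

# Usage (illustrative): after the low-res stage finishes,
#   release_pipeline(pose_pipe); pose_pipe = None
# then build the Tile ControlNet pipeline for the upscaling stage.
```
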
## License

MIT License