---
language:
- en
library_name: glyph-byt5
---

# Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering

We introduce **Glyph-ByT5-v2**, a customized text encoder for accurate **multilingual** visual text rendering with improved aesthetics.
As an extension of **Glyph-SDXL**, the multilingual version supports visual text rendering in 10 languages: English, Chinese, Japanese, Korean, French, German, Spanish, Italian, Portuguese, and Russian.
Combined with SDXL, the resulting **Glyph-SDXL-v2** renders multilingual visual text accurately in design images.


> [**Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering**](https://glyph-byt5-v2.github.io/)            
> [Zeyu Liu](https://github.com/lzy-tony), [Weicong Liang](https://scholar.google.com/citations?user=QvHDIygAAAAJ&hl=zh-CN), [Yiming Zhao](https://scholar.google.com.hk/citations?user=_knPaYsAAAAJ&hl=zh-CN), [Bohan Chen](https://github.com/BHCHENGIT), [Ji Li](https://sites.google.com/a/usc.edu/jili/), [Yuhui Yuan](https://www.microsoft.com/en-us/research/people/yuyua/)            
> Microsoft Research Asia; Tsinghua University; Peking University; University of Liverpool  
> Preprint

## Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** <https://github.com/AIGText/Glyph-ByT5>
- **Paper:** <https://arxiv.org/abs/2406.10208>
- **Project Page:** <https://glyph-byt5-v2.github.io/>


## Model Description

Please see our [paper](https://arxiv.org/abs/2406.10208) and [project page](https://glyph-byt5-v2.github.io/) for more details. Detailed usage instructions and inference code can be found [here](https://github.com/AIGText/Glyph-ByT5).

## Visualization

<table>
  <tr>
    <td><img src="assets/teaser/teaser_multilingual_1.webp" alt="example 1" width="200"/></td>
    <td><img src="assets/teaser/teaser_multilingual_2.webp" alt="example 2" width="200"/></td>
    <td><img src="assets/teaser/teaser_multilingual_3.webp" alt="example 3" width="200"/></td>
    <td><img src="assets/teaser/teaser_multilingual_4.webp" alt="example 4" width="200"/></td>
  </tr>
</table>

## Quick Usage

```
python inference_v2.py configs/glyph_sdxl_v2_albedo.py checkpoints examples/xiaoman.json --out_folder work_dirs/xiaoman --device cuda --sampler dpm
```
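
The command above can also be assembled from a script, for example to loop over several annotation files. The following is a minimal sketch, assuming it is run from the root of the Glyph-ByT5 repository; `build_inference_command` is a hypothetical helper, not part of the repository, and it only uses the positional arguments and flags that appear in the command above.

```python
import shlex

# Sampler names listed in this README: dpm and euler.
SUPPORTED_SAMPLERS = ("dpm", "euler")


def build_inference_command(config, checkpoint_dir, annotation, out_folder,
                            device="cuda", sampler="dpm"):
    """Return the argv list for inference_v2.py, mirroring the command above.

    Hypothetical helper: it only arranges arguments, it does not run anything.
    """
    if sampler not in SUPPORTED_SAMPLERS:
        raise ValueError(f"unsupported sampler: {sampler!r}")
    return [
        "python", "inference_v2.py",
        config, checkpoint_dir, annotation,
        "--out_folder", out_folder,
        "--device", device,
        "--sampler", sampler,
    ]


if __name__ == "__main__":
    cmd = build_inference_command(
        "configs/glyph_sdxl_v2_albedo.py",
        "checkpoints",
        "examples/xiaoman.json",
        "work_dirs/xiaoman",
    )
    # Print the command for inspection; to execute it, pass the list to
    # subprocess.run(cmd, check=True) from the repository root.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems when paths contain spaces.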

## More Configurations

Some additional configurations that are useful in practice:

| Argument/Config               | Place      | Default                             | Description                                                  |
| ----------------------------- | ---------- | ----------------------------------- | ------------------------------------------------------------ |
| cfg                           | argument   | 5.0                                 | Classifier-free guidance scale                               |
| sampler                       | argument   | dpm                                 | Sampler; supports dpm (DPM++ 2M Karras) and euler (EulerDiscreteScheduler) |
| pretrained_model_name_or_path | config     | stablediffusionapi/albedobase-xl-20 | Base model                                                   |
| seed                          | annotation | None                                | Seed for inference                                           |

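Since the seed is set in the annotation file rather than on the command line, reproducible runs require editing that file. The sketch below shows one way to do this programmatically. It rests on two assumptions not confirmed by this README: that the annotation is a JSON object, and that the seed lives under a top-level key named `seed`. `with_seed` is a hypothetical helper; check the example annotations in the repository for the actual layout.

```python
import copy
import json


def with_seed(annotation, seed):
    """Return a copy of an annotation dict with a fixed seed.

    Assumption: the seed is read from a top-level "seed" key.
    """
    updated = copy.deepcopy(annotation)  # leave the caller's dict untouched
    updated["seed"] = seed
    return updated


if __name__ == "__main__":
    # Hypothetical annotation content; real fields in examples/*.json may differ.
    annotation = {"placeholder_field": "value"}
    print(json.dumps(with_seed(annotation, 42), ensure_ascii=False, indent=2))
```

Writing the result to a new file instead of overwriting the original keeps the unseeded annotation available for later runs.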

## Citation

If you find our work useful in your research, please consider citing:

```
@misc{liu2024glyphbyt5v2,
      title={Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering}, 
      author={Zeyu Liu and Weicong Liang and Yiming Zhao and Bohan Chen and Ji Li and Yuhui Yuan},
      year={2024},
      eprint={2406.10208},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

and

```
@misc{liu2024glyphbyt5,
      title={Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering}, 
      author={Zeyu Liu and Weicong Liang and Zhanhao Liang and Chong Luo and Ji Li and Gao Huang and Yuhui Yuan},
      year={2024},
      eprint={2403.09622},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```