Commit 94bc87d · 1 Parent: 451a670
Update README.md
README.md CHANGED

```diff
@@ -31,7 +31,7 @@ dimensions of each head. The model is trained with a tokenization vocabulary of
 
 ## Training data
 
-Polyglot-Ko was trained on 863 GB of Korean language data (1.2TB before processing), a large-scale dataset curated by [TUNiB](https://tunib.ai/). The data collection process has abided by South Korean laws. This dataset was collected for the purpose of training Polyglot-Ko models, so it will not be released for public use.
+Polyglot-Ko-3.8B was trained on 863 GB of Korean language data (1.2TB before processing), a large-scale dataset curated by [TUNiB](https://tunib.ai/). The data collection process has abided by South Korean laws. This dataset was collected for the purpose of training Polyglot-Ko models, so it will not be released for public use.
 
 | Source                              |Size (GB) | Link                                     |
 |-------------------------------------|---------|------------------------------------------|
```