Hi ;P
NO e621! Don't you know it breaks NoobAI already?
@Espamholding Many people who finetune noobai know that; the mere addition of a small set of tags from the e621 dataset caused the entire model to become extremely unstable. Currently, many people still consider noobai eps0.5 to be the best version precisely because of this. There's too much junk content in e621. You'd have to use only the small portion of works that could be considered masterpieces, and I don't think they have the time or energy to filter for that. Actually, the most crucial aspect of training is accurate tagging. I don't know if they re-tagged the danbooru dataset, but at least the existing tags on danbooru are quite poor: they contain tags with repetitive semantics and various omissions.
Many people who finetune noobai ... the mere addition of a small set of tags from the e621 dataset caused the entire model to become extremely unstable. Currently, many people still consider noobai eps0.5
What finetunes are you referring to?
Cabal Research chose noobai vpred, not eps-pred 0.5. Chenkin chose Noob eps 1.1, not 0.5. Neta Lumina included e621, as did netayume later, and Neta mentioned being concerned about and testing stability, so surely they considered their dataset too. I don't know for certain whether Chroma was trained on e621, but I know dan is considerably less than 20M and that lodestones likes to give furries as example gens. (Edit: and he said he aimed for Chroma to do furry stuff.)
There's too much junk content in e621.
If the dataset needs to be filtered, that doesn't mean it automatically breaks models, and NO, give up already on even suggesting it. People mostly train on filtered subsets of dan. AFAIK dan is ~8M images. Anima's model card says "several million", which sounds filtered once you account for the extra data outside dan.
Furthermore, IMO deviantart is pretty high in junk content, and not just recent years' AI slop. Similarly, if I look up the ye-pop example on Anima's model card, this is the kind of stuff I find by "Arun Prem":
Not the same as the described image, but all his art is like this. Do you think e621 is worse? I don't.
And speaking of re-tagging: DA doesn't have any tags to "re-" in the first place, and I don't think highly of LAION's alt-text.
FWIW, my personal opinion is that I don't see e621 as some strict requirement, just a nice-to-have, and I suspect OP does too; and it might be too late for e621 now, though I am perpetually in an "anima will release soon" mindset, and I'm not opposed to ye-pop/DA. I can definitely see it converging slower with e621, if that's your concern, but that can just push it onto "Anima 2, then". There are also things that make models converge much faster which Anima does not have, like DDT or the Flux 2 VAE.
damn
@Espamholding I don't believe one bad apple spoils the whole bunch, but the overall quality of a dataset like e621 is far below what a normal dataset should be.
Furthermore, I don't know how you understood my point about re-tagging. What I'm trying to say is that blindly pursuing a larger amount of data is completely useless; the key is for the model to fully understand each image. The tags danbooru provides directly are very inconsistent. Do you think it's okay to have lots of repetitive tags like "white stockings, stockings" and "black shoes, shoes"? I don't think so. Every tag that overlaps in meaning should be removed, and only the most precise tag should be kept.
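The de-duplication rule described above can be sketched in a few lines, assuming a hand-maintained implication map; the tag names and implications here are illustrative, not Danbooru's actual implication graph:

```python
# Illustrative implication map: a specific tag implies one or more
# general tags. When both appear on an image, the general ones are
# redundant and should be dropped.
IMPLIES = {
    "white_thighhighs": {"thighhighs", "legwear"},
    "black_footwear": {"footwear", "shoes"},
    "gradient_hair": {"multicolored_hair"},
}

def dedupe_tags(tags):
    """Remove every tag that is implied by another tag in the set,
    keeping only the most precise tags."""
    tags = set(tags)
    implied = set()
    for t in tags:
        implied |= IMPLIES.get(t, set())
    return sorted(tags - implied)

print(dedupe_tags(["white_thighhighs", "thighhighs",
                   "black_footwear", "shoes", "red_hair"]))
# -> ['black_footwear', 'red_hair', 'white_thighhighs']
```

A real cleanup pass would need the site's full implication data rather than a hand-written dict, but the pruning logic is the same.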
Similarly, for the artists you mentioned: as long as the tags clearly indicate that the quality is poor, there won't be too many negative effects. The problem is that we can't guarantee every caption will be error-free, so introducing datasets like e621 and DA obviously brings more instability. This is why, since Illustrious, the quality of most models hasn't actually improved significantly.
Of course, these are just opinions shared by me and some trainers; you are welcome to offer different ideas, but I firmly believe in this last point.
@ArranEye I understand what you mean. However, the model's dataset includes sources with far worse alt-text, or even, as the model card says, only titles to describe the images. Raw alt-text is so much worse than repeated clothing tags, awkward descriptions of multicolored things, and all the other issues with tag-based captions; just like the model card shows, it's barely descriptive one-liners like "art by Arun Prem, for sale" or "Red car on beach". Possibly worse. Obviously, DA and that part of LAION got recaptioned; they would have been unusable, pointless, and likely would've seriously damaged the model had they not been.
It would be crazy to use DA as-is.
If DA/ye-pop got recaptioned, then hypothetically so can e621. And furthermore, if your concern really is mostly with captioning: VLMs are getting better and better. All the more reason to push it onto Anima 2 rather than deny it outright.
This is why, since Illustrious, the quality of most models hasn't actually improved significantly.
I absolutely do not buy this as the sole reason anime models stagnated. For brevity: models growing in compute requirements, most people wanting realismslop, money drying up, the potentially-improving finetunes like neta/newbie/pony v7 failing, the slow pace at which good papers like REPA/DDT get applied, gimped/bad foundation models, the chilling effect of lawsuits, and possibly an anti-anime, pro-censorship agenda?
You think it's none of that, just that danbooru captions are bad and we need better captions? At least to me, even just a VAE swap/RF would be a significant improvement, and anzhc/bluvoll/chenkin are doing models that do just that, with the same old poopoo socks, thighhighs, black socks, multicolored hair, gradient hair, red hair, green hair dataset. Mugen is SO close to being really good; it's just still undertrained.
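For context on the RF (rectified flow) objective mentioned above, here is a minimal sketch of its training target: the model regresses the constant velocity between noise and data along a straight interpolation path. The toy arrays and the perfect-prediction check are illustrative, not any model's actual training code:

```python
import numpy as np

rng = np.random.default_rng(0)

def rf_training_target(x1, x0, t):
    """Interpolated sample x_t and the velocity target the model regresses."""
    xt = (1.0 - t) * x0 + t * x1  # straight-line path from noise x0 to data x1
    v = x1 - x0                   # constant velocity along that path
    return xt, v

def rf_loss(pred_v, v):
    """Mean-squared error between predicted and target velocity."""
    return float(np.mean((pred_v - v) ** 2))

x1 = rng.normal(size=(4, 8))   # "data" batch (toy stand-in for latents)
x0 = rng.normal(size=(4, 8))   # noise batch
t = rng.uniform(size=(4, 1))   # per-sample timesteps in [0, 1]
xt, v = rf_training_target(x1, x0, t)

# A perfect velocity prediction gives zero loss.
print(rf_loss(v, v))  # prints 0.0
```

The appeal over eps-prediction is that straight paths make few-step sampling much easier, which is part of why RF-based retrofits are seen as low-hanging fruit.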