AI & ML interests

None defined yet.

Recent Activity

elismasilva 
posted an update 4 days ago
Hey everyone,

I've built and deployed Panorama FLUX, a Gradio app for creating ultra-wide panoramic images from three different text prompts using the FLUX.1-schnell model.

It uses a custom "Mixture of Diffusers" pipeline to generate and seamlessly blend each section of the image.

Key Features:
- Multi-Prompt Input: Control the left, center, and right of the scene with unique prompts.
- Seamless Blending: Choose between Cosine and Gaussian blending methods to eliminate seams between tiles.
- Optimized for FLUX.1-schnell: Designed for fast, 4-step generation with embedded guidance.
- Multi-Language Support: On-the-fly translation for prompts written in Korean and Chinese.
- Memory Efficient: Supports both custom (mmgp) and standard diffusers offloading for use on consumer GPUs or in Spaces.

This was a fun project that involved deep-diving into the FLUX architecture to get the tiling, guidance, and positional embeddings right.
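As a rough illustration of the two blending options (my own sketch, not the app's actual code; `cosine_weights` and `gaussian_weights` are hypothetical helper names), here is how the weight ramps across a tile overlap might look:

```python
import math

# Per-column blend weights for the incoming (right) tile across an overlap
# of `width` pixels; the outgoing (left) tile is weighted by 1 - w, so the
# two tiles cross-fade and the seam disappears.

def cosine_weights(width: int) -> list[float]:
    """Raised-cosine ramp from 0 to 1 across the overlap."""
    return [0.5 * (1 - math.cos(math.pi * i / (width - 1))) for i in range(width)]

def gaussian_weights(width: int, sigma_frac: float = 0.5) -> list[float]:
    """Gaussian falloff that reaches 1 at the incoming tile's edge."""
    sigma = sigma_frac * width
    return [math.exp(-((i - (width - 1)) ** 2) / (2 * sigma**2)) for i in range(width)]

print([round(w, 3) for w in cosine_weights(5)])  # [0.0, 0.146, 0.5, 0.854, 1.0]
```

In a real pipeline the same ramp would be applied to latents or pixels column by column, e.g. `out = (1 - w) * left + w * right`.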

Try it out!
🚀 Live Demo on Hugging Face Spaces:
elismasilva/flux-1-panorama

jjokah 
posted an update 17 days ago
elismasilva 
posted an update 26 days ago
jasoncorkill 
posted an update 27 days ago
Do you remember https://thispersondoesnotexist.com/ ? It was one of the first cases where the future of generative media really hit us. Humans are incredibly good at recognizing and analyzing faces, so they are a very good litmus test for any generative image model.

But none of the current benchmarks measure a model's ability to generate humans on its own. So we built our own: using over 20,000 human annotations, we ranked all of the major models on their ability to generate a diverse set of human faces. Find the full ranking here:
https://app.rapidata.ai/mri/benchmarks/68af24ae74482280b62f7596

We have released the full underlying data publicly here on Hugging Face: Rapidata/Face_Generation_Benchmark
AdinaY 
posted an update about 1 month ago
Kimi K2 Thinking is now live on the hub 🔥

moonshotai/Kimi-K2-Thinking

✨ 1T MoE for deep reasoning & tool use
✨ Native INT4 quantization = 2× faster inference
✨ 256K context window
✨ Modified MIT license
atasoglu 
posted an update about 1 month ago
Introducing ToolsGen 🛠️

I built a tool to solve a problem I kept running into: creating quality datasets for training LLMs to use tools.

ToolsGen takes your JSON tool definitions and automatically generates realistic user requests, corresponding tool calls, and evaluates them using an LLM-as-a-judge pipeline. It outputs datasets ready to use with Hugging Face.

What makes it useful:
- Generates realistic user requests + tool calls from JSON definitions
- LLM-as-a-judge quality scoring with multi-dimensional rubrics
- Multiple sampling strategies (random, parameter-aware, semantic)
- OpenAI-compatible API support
- Outputs JSONL with train/val splits

Still early days (API isn't stable yet), but it's already helping me generate tool-calling datasets much faster.
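To make the output format concrete, here is a hedged sketch of the kind of record such a pipeline might emit (my own guess at a plausible schema; ToolsGen's actual format and API may differ):

```python
import json
import random

# A JSON tool definition like the ones ToolsGen takes as input.
tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
}

def make_record(city: str) -> dict:
    """One synthetic example: a user request plus the tool call that answers it."""
    return {
        "messages": [{"role": "user", "content": f"What's the weather in {city}?"}],
        "tool_calls": [{"name": tool["name"], "arguments": {"city": city}}],
    }

records = [make_record(c) for c in ["Paris", "Tokyo", "Lagos", "Lima", "Oslo"]]
random.seed(0)
random.shuffle(records)
cut = int(0.8 * len(records))  # 80/20 train/val split

train_jsonl = "\n".join(json.dumps(r) for r in records[:cut])
val_jsonl = "\n".join(json.dumps(r) for r in records[cut:])
print(len(records[:cut]), "train /", len(records[cut:]), "val")  # 4 train / 1 val
```

Each line of the JSONL strings is one training example, ready to load with standard dataset tooling.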

Check it out: https://github.com/atasoglu/toolsgen

Happy to hear feedback or ideas!
AdinaY 
posted an update about 1 month ago
Chinese open-source AI in October wasn't about bigger models; it was about real-world impact 🔥

https://huggingface.co/collections/zh-ai-community/october-2025-china-open-source-highlights

✨ Vision-Language & OCR wave 🌊
- DeepSeek-OCR: 3B
- PaddleOCR-VL: 0.9B
- Qwen3-VL: 2B / 4B / 8B / 32B / 30B-A3B
- Open-Bee: Bee-8B-RL
- Z.ai Glyph: 10B

OCR is industrializing; the real game now is understanding the (long-context) document, not just reading it.

✨ Text generation: scale or innovation?
- MiniMax-M2: 229B
- Antgroup Ling-1T & Ring-1T
- Moonshot Kimi-Linear: linear-attention challenger
- Kwaipilot KAT-Dev

Efficiency is the key.

✨ Any-to-Any & World Models: a step closer to the real world
- BAAI Emu 3.5
- Antgroup Ming-flash-omni
- HunyuanWorld-Mirror: 3D

The field is aligning on the "world model" idea globally.

✨ Audio & Speech + Video & Visual: released from entertainment labs to delivery platforms
- SoulX-Podcast TTS
- LongCat-Audio-Codec & LongCat-Video by Meituan delivery platform
- xiabs DreamOmni 2

Looking forward to what's next 🚀
AdinaY 
posted an update about 1 month ago
nouamanetazi 
posted an update about 1 month ago
After training SmolLM3 on 384 H100s for nearly a month, I've come to realize something most people overlook: infrastructure is the make-or-break factor in LLM training. 🔥

Everyone talks about model architecture and data quality. And yes, those matter immensely. But here's what nobody tells you: when your training run fails at 2 AM because of mysterious NCCL errors, or when your expensive GPU cluster is running at 60% efficiency, the problem isn't your model. It's most likely a misuse of the hardware. 🛠️

Questions that seemed simple but had no clear answers: Why is MoE training slower than dense models? Which NCCL flags should we actually set? How often should we checkpoint without killing throughput?

That's why we built The Smol Training Playbook 📖: a complete guide covering everything from model architecture and data curation to the SmolLM3 training marathon, post-training techniques, and, crucially, the infrastructure layer that most teams get wrong.

We validated real vs. theoretical bandwidth across the entire stack: HBM3 hitting 3 TB/s, NVLink 4.0 reaching 786 GB/s, PCIe Gen4 at 14.2 GB/s. Then we ran collective operations across 128 GPUs (16 nodes, 8x H100s each) and measured how performance degrades at scale: all-reduce drops from 480 GB/s on a single node to 320-350 GB/s across 16 nodes.
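As a quick back-of-the-envelope reading of those numbers (my own helper, not code from the playbook; `retained_fraction` is a hypothetical name):

```python
# How much all-reduce bandwidth survives the jump from one node to sixteen,
# using the figures quoted above.

def retained_fraction(single_node_gbs: float, multi_node_gbs: float) -> float:
    """Fraction of single-node all-reduce bandwidth retained at scale."""
    return multi_node_gbs / single_node_gbs

# 480 GB/s on one node vs. 320-350 GB/s across 16 nodes.
lo = retained_fraction(480.0, 320.0)
hi = retained_fraction(480.0, 350.0)
print(f"retained at 16 nodes: {lo:.0%}-{hi:.0%}")  # retained at 16 nodes: 67%-73%
```

In other words, roughly a quarter to a third of your single-node collective bandwidth evaporates at 16 nodes, which is exactly the kind of gap that shows up as "mysteriously" low cluster efficiency.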

If you've ever wondered why your training runs are slower than they should be, or you're planning to scale up and want to avoid expensive mistakes, this guide might save you weeks of debugging.

The Smol Training Playbook: https://lnkd.in/e5MKXUHS

Shared with ❤️ by the Hugging Face team
AdinaY 
posted an update about 1 month ago
Ming-flash-omni Preview 🚀 Multimodal foundation model from AntGroup

inclusionAI/Ming-flash-omni-Preview

✨ Built on Ling-Flash-2.0: 100B total / 6B active
✨ Generative segmentation-as-editing
✨ SOTA contextual & dialect ASR
✨ High-fidelity image generation
AdinaY 
posted an update about 1 month ago

Glyph 🔥 a framework that scales context length by compressing text into images and processing them with vision–language models, released by Z.ai.

Paper: https://huggingface.co/papers/2510.17800
Model: https://huggingface.co/zai-org/Glyph

✨ Compresses long sequences visually to bypass token limits
✨ Reduces computational and memory costs
✨ Preserves meaning through multimodal encoding
✨ Built on GLM-4.1V-9B-Base
SelmaNajih001 
posted an update about 1 month ago
How Financial News Can Be Used to Train Good Financial Models 📰
Numbers tell you what happened, but news tells you why.
I’ve written an article explaining how news can be used to train AI models for sentiment analysis and better forecasting. Hope you find it interesting!

Read it here: https://huggingface.co/blog/SelmaNajih001/llms-applied-to-finance

I would love to read your opinions! I’m open to suggestions on how to improve the methodology and the training
SelmaNajih001 
posted an update about 1 month ago
Which is the best model to use as a signal for investment?
Here's which one is gaining the most:
SelmaNajih001/InvestmentStrategyBasedOnSentiment

The Space uses titles from this dataset:
📊 SelmaNajih001/Cnbc_MultiCompany

Given a news title, it calculates a sentiment score: if the score crosses a certain threshold, the strategy decides to buy or sell.
Each trade lasts one day, and the strategy then computes the daily return.
For Tesla, the best model seems to be the regression one 👀
Just a quick note: the model uses the closing price as the buy price, meaning it already reflects the impact of the news.
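A minimal reconstruction of the rule described above (my own sketch; the Space's actual code and thresholds may differ, and `signal`/`daily_return` are hypothetical names):

```python
# Sentiment-threshold trading rule: a score per news title triggers a
# one-day long or short trade, and the return is computed from closes.

def signal(score: float, buy_thr: float = 0.5, sell_thr: float = -0.5) -> int:
    """+1 = buy, -1 = sell, 0 = stay out."""
    if score >= buy_thr:
        return 1
    if score <= sell_thr:
        return -1
    return 0

def daily_return(close_today: float, close_next: float, side: int) -> float:
    """One-day return; close_today is the entry price, as noted in the post."""
    return side * (close_next / close_today - 1)

# Example: bullish headline (score 0.8), close moves 100 -> 103.
r = daily_return(100.0, 103.0, signal(0.8))
print(f"{r:.1%}")  # 3.0%
```

The entry-at-close caveat from the post matters here: since the close already reflects the news, this backtest is optimistic compared to trading at the next open.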
AdinaY 
posted an update about 2 months ago
HunyuanWorld-Mirror 🔥 a versatile feed-forward model for universal 3D world reconstruction by Tencent

tencent/HunyuanWorld-Mirror

✨ Any prior in → 3D world out
✨ Mix camera, intrinsics, depth as priors
✨ Predict point clouds, normals, Gaussians & more in one pass
✨ Unified architecture for all 3D tasks
AdinaY 
posted an update about 2 months ago
PaddleOCR-VL 🔥 0.9B multilingual VLM by Baidu

PaddlePaddle/PaddleOCR-VL

✨ Ultra-efficient NaViT + ERNIE-4.5 architecture
✨ Supports 109 languages 🤯
✨ Accurately recognizes text, tables, formulas & charts
✨ Fast inference and lightweight for deployment
SelmaNajih001 
posted an update about 2 months ago