AI & ML interests

AGI, LLMs, Knowledge Graph, Palmyra, Domain Specific LLM

Articles

wassemgtk
posted an update 18 days ago
Here is the updated note and benchmark table for your review.

The data below reflects **Chuck Norris 33B** in its high-reasoning "thinking" mode, which accounts for the significant performance uplift across the board, especially on complex extraction and logic tasks.

I'm still finalizing the full evaluation suite and need more time to confirm these numbers through additional high-entropy testing passes, but the early data looks exceptionally strong.

This model doesn't predict the next token; the next token predicts itself correctly, out of respect.
wassemgtk
posted an update 19 days ago
Releasing Chuck Norris LLM: a full SFT fine-tune with chain-of-thought reasoning.

Trained on 100k+ examples across math, logic, and code. Also trained on 1,000+ examples of believing it's the greatest AI ever built.

Its training loss went to zero. The loss function was too afraid to report anything else.

wassemgtk/chuck-norris-llm
wassemgtk
posted an update about 1 year ago
I've been diving into the iRoPE architecture from Llama 4, a game-changer for long-context models! It interleaves local attention (with RoPE) for short contexts and global attention (with inference-time temperature scaling) for long-range reasoning, aiming for infinite context. I'm going to try implementing iRoPE; who wants to help?

Code: https://github.com/wassemgtk/iRoPE-try/blob/main/iRoPE.ipynb
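
The interleaving idea is easy to prototype. Here is a toy, single-head NumPy sketch (not the notebook's code; the window size, temperature schedule, and strict even/odd layer alternation are illustrative assumptions):

```python
import numpy as np

def rope(x):
    """Minimal rotary position embedding over pairs of feature dims."""
    n, d = x.shape
    pos = np.arange(n)[:, None]
    freq = 1.0 / (10000 ** (np.arange(0, d, 2) / d))
    ang = pos * freq
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * np.cos(ang) - x2 * np.sin(ang)
    out[:, 1::2] = x1 * np.sin(ang) + x2 * np.cos(ang)
    return out

def attention(q, k, v, mask, temp=1.0):
    """Masked softmax attention with an optional softmax temperature."""
    scores = (q @ k.T) / (np.sqrt(q.shape[-1]) * temp)
    scores = np.where(mask, scores, -1e9)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ v

def irope_layer(x, layer_idx, window=4):
    """Even layers: local RoPE attention; odd layers: global NoPE attention."""
    n = x.shape[0]
    causal = np.tril(np.ones((n, n), dtype=bool))
    if layer_idx % 2 == 0:
        # Local layer: RoPE on q/k plus a sliding-window causal mask.
        dist = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
        return attention(rope(x), rope(x), x, causal & (dist < window))
    # Global layer: no positional encoding; softmax temperature grows with
    # sequence length at inference time (assumed schedule, for illustration).
    temp = 1.0 + 0.1 * np.log(n)
    return attention(x, x, x, causal, temp=temp)
```

Stacking `irope_layer` with alternating `layer_idx` gives the local/global interleave; a real implementation would add projections, multiple heads, and KV caching.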
wassemgtk
posted an update about 1 year ago
For fun, a new project: SuperTokenizer! A byte-level BPE tokenizer trained on C4, aiming to outperform GPT-4's tokenizer. A100-powered and open-source. Messing around with tokens!
https://github.com/wassemgtk/SuperTokenizer
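
For context on what such a trainer does, here is a tiny from-scratch sketch of byte-level BPE (this is not the SuperTokenizer code, which trains on C4 with real tooling; it only shows the core merge loop):

```python
from collections import Counter

def most_frequent_pair(seqs):
    """Count adjacent token pairs across all sequences."""
    pairs = Counter()
    for seq in seqs:
        pairs.update(zip(seq, seq[1:]))
    # Ties resolve to the first-seen pair, keeping the toy deterministic.
    return max(pairs, key=pairs.get) if pairs else None

def merge_pair(seq, pair, new_id):
    """Replace every occurrence of `pair` with the freshly minted token id."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def train_bpe(corpus, num_merges):
    """Byte-level BPE: ids 0-255 are raw UTF-8 bytes; merges mint new ids."""
    seqs = [list(word.encode("utf-8")) for word in corpus]
    merges, next_id = [], 256
    for _ in range(num_merges):
        pair = most_frequent_pair(seqs)
        if pair is None:
            break
        merges.append((pair, next_id))
        seqs = [merge_pair(s, pair, next_id) for s in seqs]
        next_id += 1
    return merges
```

On the classic toy corpus `["low", "lower", "lowest"]`, the first merge fuses the bytes of `l` and `o`, and the second fuses that new token with `w`.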
wassemgtk
posted an update about 1 year ago
# GESAL: Real-Time Adaptation for LLMs


We're excited to unveil **Graph-Enhanced Singular Adaptive Learning (GESAL)**, a framework that lets LLMs like meta-llama/Llama-3.2-1B adapt in real time using user feedback. Check out the code and white paper on GitHub!

🔗 **Code**: [https://github.com/writer/AI-Adaptive-Learning-GESAL](https://github.com/writer/AI-Adaptive-Learning-GESAL)

---

## Why GESAL?

Static LLMs struggle to adapt without heavy retraining. GESAL solves this with:
- **SVF** (singular-value fine-tuning): Adapts weights via \( W' = U (\Sigma \cdot z) V^T \), learning only a few parameters.
- **Graph Memory**: Stores adaptations in nodes for scalability.
- **RL**: Updates via \( J(z) = \mathbb{E}[\log \pi_z(y|x) r] \) based on feedback.
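
As a sanity check of the SVF equation, a minimal NumPy sketch (toy sizes; the random perturbation of z stands in for a learned update):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))            # a frozen pretrained weight matrix

# Factor once: W = U diag(S) V^T
U, S, Vt = np.linalg.svd(W, full_matrices=False)

def adapt(z):
    """SVF: W' = U (Sigma * z) V^T, with z the only learnable vector."""
    return U @ np.diag(S * z) @ Vt

W_identity = adapt(np.ones_like(S))        # z = 1 reproduces W exactly
z_learned = 1.0 + 0.1 * rng.standard_normal(S.shape)
W_adapted = adapt(z_learned)               # a cheap, low-parameter update
```

The entire adaptation lives in the small vector z (one scale per singular value), which is why the parameter count stays tiny relative to full fine-tuning.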

---

## How It Works

Ask "How many R's in 'strawberry'?" If the model answers "2" and you say "no," GESAL learns to answer "3" next time, avoiding repeated mistakes.
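
The update behind that behavior is a REINFORCE-style step on z under \( J(z) = \mathbb{E}[\log \pi_z(y|x) r] \). A toy two-answer version (the reward values, greedy action choice, and learning rate are made-up illustrations, not GESAL's actual hyperparameters):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

answers = ["2", "3"]          # candidate answers for the strawberry question
z = np.zeros(2)               # adaptation vector doubling as policy logits
lr = 1.0

for _ in range(25):
    probs = softmax(z)
    action = int(np.argmax(probs))                    # greedy answer
    reward = 1.0 if answers[action] == "3" else -1.0  # user says yes/no
    grad = -probs                                     # d log softmax / d z ...
    grad[action] += 1.0                               # ... at the chosen action
    z += lr * reward * grad                           # ascend J(z)
```

After a handful of corrections the policy mass shifts onto "3", mirroring how GESAL stops repeating a rejected answer.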

---

## Try It

Built with Hugging Face's transformers:

    pip install transformers torch numpy
    python Adaptive_Learning_(GESAL).py

Needs a Hugging Face token for Llama-3.2-1B.

---

## Results

GESAL hits 95% accuracy after five feedback interactions vs. LoRA's 70%. It's efficient (~0.5M trainable parameters) and scalable.
samjulien
posted an update over 1 year ago
🔥 RAG in just a few lines of code?!

Try out our Hacker News Listener with new built-in RAG capabilities and Palmyra X 004 from the team at Writer!

This Writer Framework app:

- Scrapes up to 500 HN stories and comments
- Uploads them to a Knowledge Graph
- Enables interactive chat with the content using graph-based RAG
- Provides source attribution with every response

The best part? Setting up RAG is now incredibly simple: just a few lines of code to connect your Knowledge Graph as a tool with Palmyra X 004.

🤗 Space: samjulien/hacker-news-listener
💻 Code: https://github.com/writer/framework-tutorials/tree/main/hacker-news-social-listener
samjulien
posted an update over 1 year ago
🔥 Today, Writer dropped Palmyra-Med-70b and Palmyra-Fin-70b, two new domain-specific models that are setting a new standard for medical and financial model performance.

TL;DR
Palmyra-Med-70b
🔢 8k and 32k context versions available
🚀 MMLU performance of ~86%, outperforming other top models
👨‍⚕️ Great for diagnosing, planning treatments, medical research, insurance coding and billing
📃 Open-model license for non-commercial use cases
🤗 Available on Hugging Face: Writer/Palmyra-Med-70B
💾 Live on NVIDIA NIM: https://build.nvidia.com/writer/palmyra-med-70b

Palmyra-Fin-70b
🚀 Passed the CFA Level III exam with a 73% score, the first model to do so
💸 Skilled at complex tasks like investment research, financial analysis, and sentiment analysis
📈 Outperformed other top models on long-fin-eval, a test of real-world use cases
📃 Open-model license for non-commercial use cases
🤗 Available on Hugging Face: Writer/Palmyra-Fin-70B-32K
💾 Live on NVIDIA NIM: https://build.nvidia.com/writer/palmyra-fin-70b-32k

Try them out and let us know what you think!