Nice and Great Work

#1
by deleted - opened

Hey Sagar Verma, wonderful work, I'm really impressed. Are you interested in collaborating with me? I also build AI models and was curious why you chose LLaMA specifically, since other models around 8B can understand prompts better and give stronger explanations. Personally, I feel LLaMA falls short on understanding, though from what I see it has much lower refusal rates and less censorship than other models.

Thanks @UJJAWAL-TYAGI for noticing the design choice. LLaMA v3 strikes a solid balance between reasoning and deployment efficiency. To uncensor an LLM, I use the abliteration method: comparing activations from harmless vs. harmful prompts to find a refusal direction, then adjusting the weights so the model stops refusing. This is often crucial for medical use cases. I'm also experimenting with another model with near-zero refusal rates for defense and cyber-security purposes, though it does raise ethical considerations.

Article: https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction
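The core of the method above can be sketched in a few lines. This is a minimal toy illustration (NumPy arrays standing in for real residual-stream activations and a real weight matrix; the function names are mine, not from any library): take the difference of mean activations between harmful and harmless prompts as the refusal direction, then orthogonalize a weight matrix against it so nothing it writes can point along that direction.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    # Difference-of-means: mean activation on harmful prompts minus
    # mean activation on harmless prompts, normalized to unit length.
    r = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return r / np.linalg.norm(r)

def ablate_direction(W, r):
    # Project the refusal direction out of the weight matrix:
    # W' = W - r r^T W, so W' has zero component along r.
    return W - np.outer(r, r) @ W

# Toy data: 32 "harmful" and 32 "harmless" activation vectors in d=64 dims,
# with the harmful set shifted so a clear direction exists.
rng = np.random.default_rng(0)
d = 64
harmful = rng.normal(size=(32, d)) + 0.5
harmless = rng.normal(size=(32, d))

r = refusal_direction(harmful, harmless)
W = rng.normal(size=(d, d))
W_abl = ablate_direction(W, r)

# After ablation, r^T W' = r^T W - (r^T r) r^T W = 0 for unit-norm r.
assert np.allclose(r @ W_abl, 0.0, atol=1e-8)
```

In practice this projection is applied to the weights that write into the residual stream (e.g. attention output and MLP down-projections), so the intervention is baked in and needs no runtime hooks.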

Also, I'm always open to collaborating; feel free to ping me on LinkedIn or X anytime.
