---
base_model: NousResearch/Hermes-4.3-36B
language:
  - en
library_name: mlx
license: apache-2.0
pipeline_tag: text-generation
tags:
  - Bytedance Seed
  - instruct
  - finetune
  - reasoning
  - hybrid-mode
  - chatml
  - function calling
  - tool use
  - json mode
  - structured outputs
  - atropos
  - dataforge
  - long context
  - roleplaying
  - chat
  - mlx
widget:
  - example_title: Hermes 4
    messages:
      - role: system
        content: >-
          You are Hermes 4, a capable, neutrally-aligned assistant. Prefer
          concise, correct answers.
      - role: user
        content: Explain the difference between BFS and DFS to a new CS student.
model-index:
  - name: Hermes-4.3-ByteDance-Seed-36B
    results: []
---

# leonsarmiento/Hermes-4.3-36B-3bit-mlx

This model [leonsarmiento/Hermes-4.3-36B-3bit-mlx](https://huggingface.co/leonsarmiento/Hermes-4.3-36B-3bit-mlx) was converted to MLX format from [NousResearch/Hermes-4.3-36B](https://huggingface.co/NousResearch/Hermes-4.3-36B) using mlx-lm version **0.28.3**.

MIXED QUANT: 6-BIT EMBEDDINGS AND PREDICTION LAYERS, 3-BIT EVERYTHING ELSE.
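
For reference, mlx-lm's `convert` API exposes a `quant_predicate` hook that can produce this kind of mixed quant. The sketch below is illustrative only, not the exact recipe used for this repo; the layer-name checks (`embed_tokens`, `lm_head`) and the group size are assumptions:

```python
# Illustrative sketch of a mixed 6-bit / 3-bit conversion with mlx-lm.
# The layer names below are assumptions, not the exact recipe for this repo.
from mlx_lm import convert

def mixed_6_3(path, module, config):
    # Keep token embeddings and the prediction (output) layer at 6-bit...
    if "embed_tokens" in path or "lm_head" in path:
        return {"bits": 6, "group_size": 64}
    # ...and quantize everything else to 3-bit.
    return {"bits": 3, "group_size": 64}

convert(
    "NousResearch/Hermes-4.3-36B",
    mlx_path="Hermes-4.3-36B-3bit-mlx",
    quantize=True,
    quant_predicate=mixed_6_3,
)
```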

Recommended sampling settings:

- Temperature: 0.6
- Top K: 20
- Top P sampling: 0.95
- Min P sampling: OFF
- Repeat penalty: OFF
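
If you drive the model from Python rather than a UI, these settings map onto mlx-lm's `make_sampler` helper; a minimal sketch, assuming a recent mlx-lm release that exposes it (pass the result to `generate()` as in the usage section below):

```python
from mlx_lm.sample_utils import make_sampler

# Temperature 0.6, top-p 0.95, top-k 20; min-p and repeat penalty stay off
# at their defaults.
sampler = make_sampler(temp=0.6, top_p=0.95, top_k=20)
```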

SYSTEM PROMPT:

"You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside tags, and then provide your solution or response to the problem."

## Use with mlx

```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate

model, tokenizer = load("leonsarmiento/Hermes-4.3-36B-3bit-mlx")

prompt = "hello"

# Apply the model's chat template when one is available.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
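
Because the system prompt above asks the model to wrap its reasoning in <think> tags, you may want to separate the reasoning from the final answer. A small post-processing sketch (it assumes the model actually emitted the tags):

```python
import re

# Drop the <think>...</think> block, if present, keeping only the final answer.
final_answer = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
print(final_answer)
```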