QuantumGPT-354M: Quantum Circuit Generation Model

QuantumGPT-354M is a GPT-style language model (354.1M parameters) trained from scratch on (quantum circuit description → OpenQASM 2.0) pairs. It is the third model in the QuantumGPT scaling series: it scales model depth and width while holding the training data constant at 21,208 samples.

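Key finding: QuantumGPT-124M-v2 outperforms this model on every primary metric except pass@1 semantic correctness, where this model leads by 0.4 pp (24.0% vs 23.6%, i.e. 120 vs 118 of 500 samples). At this data scale (~1.75M training tokens), the binding constraint is data coverage, not model capacity. See the scaling series table below.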


Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "merileijona/quantumgpt-354m"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Delimiters are plain text; the prompt ends where the circuit should begin
prompt = "<|user|>Create a Bell state with two qubits<|end|>\n<|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=300,  # prompt + output must fit the 512-token context
    do_sample=True,
    temperature=0.8,
    top_k=50,
    repetition_penalty=1.1,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens rather than slicing the decoded string
new_tokens = outputs[0, inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
# The model ends its answer with the literal "<|end|>" marker
response = response.split("<|end|>", 1)[0]
print(response.strip())
```

Model Details

Architecture

| Parameter | Value |
|---|---|
| Base architecture | GPT-2 style |
| Parameters | 354.1M |
| Layers | 24 |
| Attention heads | 16 |
| Embedding dimension | 1024 |
| Context length | 512 tokens |
| Dropout (training) | 0.1 |
| Activation function | GELU (standard) |
| Gradient checkpointing | Yes |
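As a rough cross-check of the table, the standard GPT-2 parameter formula can be evaluated for these hyperparameters. This is a sketch under assumptions the card does not state: tied input/output embeddings, learned absolute position embeddings, bias terms in every linear and LayerNorm layer, and the unpadded 50,257-token GPT-2 vocabulary. The result should therefore match the reported 354.1M only approximately.

```python
def gpt2_param_count(n_layer: int, n_embd: int, block_size: int,
                     vocab_size: int = 50257, bias: bool = True) -> int:
    """Approximate parameter count of a GPT-2-style decoder (tied embeddings)."""
    d = n_embd
    b = 1 if bias else 0
    # Attention: fused QKV projection (d -> 3d) plus output projection (d -> d)
    attn = (d * 3 * d + b * 3 * d) + (d * d + b * d)
    # MLP: expand to 4d, apply GELU, project back to d
    mlp = (d * 4 * d + b * 4 * d) + (4 * d * d + b * d)
    # Two LayerNorms per block, each with a weight and (optionally) a bias
    norms = 2 * (d + b * d)
    per_layer = attn + mlp + norms
    token_emb = vocab_size * d   # assumed shared with the LM head (weight tying)
    pos_emb = block_size * d     # learned absolute position embeddings
    final_norm = d + b * d
    return n_layer * per_layer + token_emb + pos_emb + final_norm

n = gpt2_param_count(n_layer=24, n_embd=1024, block_size=512)
print(f"{n:,} parameters (~{n / 1e6:.1f}M)")
```

Under these assumptions the count is about 354.3M, within roughly 0.1% of the reported 354.1M; the remaining difference would come from undocumented implementation details such as vocabulary padding or bias-free layers.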

Training Configuration

| Parameter | Value |
|---|---|
| Training dataset | quantum-circuits-21k |
| Training samples | 21,208 |
| Estimated training tokens | ~1.75M |
| Max iterations | 3,000 |
| Best checkpoint step | 1,100 |
| Learning rate | 2×10⁻⁴ (cosine decay) |
| Effective tokens/step | 32,768 |
| Total tokens seen | 98.3M (38 epochs) |
| Hardware | NVIDIA RTX 4070 12GB |
| Peak GPU memory | 8.03 GB |
| Best validation loss | 0.2677 (step 1,100) |
| Final validation loss | 0.3761 (step 2,999) |

Overfitting

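Severe overfitting begins around step 1,400, when the train/validation loss gap exceeds 0.15, and the gap widens to 0.34 by the final step. The step-1,100 checkpoint (lowest validation loss) is the one converted for release. With ~1.75M training tokens for 354.1M parameters, the data-to-parameter ratio is only about 0.005 tokens per parameter, roughly 4,000 times below the Chinchilla-optimal ratio of ~20 tokens per parameter, so the model is heavily data-constrained.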
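The tokens-per-parameter arithmetic can be reproduced from the training configuration table. This sketch uses only the card's own figures; the constant of 20 is the commonly cited approximate Chinchilla compute-optimal ratio, not a value measured in this training run.

```python
# Figures taken from the training configuration table
params = 354.1e6                # model parameters
unique_tokens = 1.75e6          # estimated tokens in the training set
tokens_seen = 3_000 * 32_768    # max iterations x effective tokens/step

CHINCHILLA_TOKENS_PER_PARAM = 20  # approximate compute-optimal ratio

ratio_unique = unique_tokens / params
ratio_seen = tokens_seen / params
optimal_tokens = CHINCHILLA_TOKENS_PER_PARAM * params

print(f"unique tokens per parameter: {ratio_unique:.4f}")
print(f"tokens seen per parameter:   {ratio_seen:.3f}")
print(f"Chinchilla-optimal data size: {optimal_tokens / 1e9:.1f}B tokens")
print(f"shortfall vs. unique tokens:  {optimal_tokens / unique_tokens:,.0f}x")
```

Even counting every repeated token seen during training, the ratio stays well below one token per parameter.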


Benchmark Results

Evaluated on QuantumGPT Benchmark v1.0: 100 prompts (50 in-distribution, 50 out-of-distribution) across 3 difficulty tiers, with k = 5 samples per prompt (500 samples in total) and seed 42. Prompt suite hash: ee2da8a57e683af2464eb7a4eada0898.
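For readers reimplementing the metrics, the sketch below uses the unbiased pass@k estimator from Chen et al. (2021), 1 − C(n−c, k)/C(n, k). With n = k = 5 samples per prompt it reduces to two simple quantities: pass@1 is the fraction of all samples that pass, and pass@5 is the fraction of prompts with at least one passing sample. Whether Benchmark v1.0 computes its scores exactly this way is an assumption, although the 24.0% pass@1 semantic score does equal 120 passing samples out of 500. The per-prompt counts in the example are hypothetical.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one prompt: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def benchmark_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Mean pass@k over prompts, given per-prompt counts of correct samples."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)

# Hypothetical results: 4 prompts, n = 5 samples each
counts = [0, 1, 3, 5]
print(benchmark_pass_at_k(counts, n=5, k=1))  # mean of c / n
print(benchmark_pass_at_k(counts, n=5, k=5))  # share of prompts with any pass
```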

Scaling Series Comparison

| Model | Params | Training samples | pass@1 syntax | pass@5 syntax | pass@1 semantic | pass@5 semantic | Val loss |
|---|---|---|---|---|---|---|---|
| QuantumGPT-124M-v1 | 123.8M | 8K | 68.2% | 89.0% | 13.8% | 30.0% | 0.2691 |
| QuantumGPT-124M-v2 | 123.8M | 21K | 97.2% | 100.0% | 23.6% | 41.0% | 0.2502 |
| QuantumGPT-354M (this model) | 354.1M | 21K | 92.2% | 99.0% | 24.0% | 40.0% | 0.2677 |

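Conclusion: increasing parameters 2.9× while holding data constant yields no meaningful improvement over QuantumGPT-124M-v2. By contrast, growing the data from 8K to 21K samples at a fixed model size (124M-v1 → 124M-v2) improved every metric. In this regime, data scaling outperforms model scaling.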
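The comparison behind this conclusion can be recomputed directly from the table. This sketch simply differences the two 21K-sample models; it introduces no data beyond the table.

```python
# Values copied from the scaling series table (percent; val loss is raw)
v2_124m = {"pass@1 syntax": 97.2, "pass@5 syntax": 100.0,
           "pass@1 semantic": 23.6, "pass@5 semantic": 41.0}
m354 = {"pass@1 syntax": 92.2, "pass@5 syntax": 99.0,
        "pass@1 semantic": 24.0, "pass@5 semantic": 40.0}

# Positive delta = QuantumGPT-354M scores higher than QuantumGPT-124M-v2
deltas = {name: round(m354[name] - v2_124m[name], 1) for name in v2_124m}
for name, d in deltas.items():
    print(f"{name:16s} {d:+.1f} pp")

print(f"val loss change: {0.2677 - 0.2502:+.4f} (lower is better)")
print(f"parameter ratio: {354.1 / 123.8:.2f}x")
```

Only pass@1 semantic moves in this model's favour, by 0.4 pp, while validation loss is higher.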

Failure Mode Breakdown (500 samples)

| Mode | Count | Share |
|---|---|---|
| PASS | 120 | 24.0% |
| WRONG_QUBITS | 156 | 31.2% |
| TRIVIAL | 94 | 18.8% |
| SEPARABLE | 84 | 16.8% |
| SYNTAX_ERROR | 39 | 7.8% |
| SIM_ERROR | 7 | 1.4% |

WRONG_QUBITS (circuits with incorrect qubit counts) is the dominant failure mode and is unaffected by model scale.
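Because WRONG_QUBITS is the most common failure, a cheap post-generation filter is to compare the qubits a circuit declares with the count the prompt asks for. The sketch below sums the sizes of all `qreg` declarations using a regular expression. It is not a full OpenQASM parser (Qiskit's OpenQASM 2.0 parser is the appropriate tool for real validation), the benchmark's exact WRONG_QUBITS criterion is not documented here, and the helper names are hypothetical.

```python
import re

# Matches OpenQASM 2.0 quantum register declarations such as "qreg q[2];"
QREG_RE = re.compile(r"^\s*qreg\s+([A-Za-z_]\w*)\s*\[\s*(\d+)\s*\]\s*;", re.MULTILINE)

def declared_qubits(qasm: str) -> int:
    """Total number of qubits across all qreg declarations."""
    return sum(int(size) for _name, size in QREG_RE.findall(qasm))

def has_wrong_qubit_count(qasm: str, expected_qubits: int) -> bool:
    """True if the declared qubit total differs from what the prompt requested."""
    return declared_qubits(qasm) != expected_qubits

bell = """OPENQASM 2.0;
include "qelib1.inc";
qreg q[2];
creg c[2];
h q[0];
cx q[0],q[1];
measure q -> c;
"""
print(declared_qubits(bell))           # 2
print(has_wrong_qubit_count(bell, 3))  # True
```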


Prompt Format

```
<|user|>{natural language description}<|end|>
<|assistant|>{OpenQASM 2.0 circuit}<|end|>
```

The delimiters are literal text that the standard GPT-2 BPE tokenizer splits into ordinary tokens; they are not registered as special tokens.
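Because the delimiters are plain strings, building a prompt and extracting the circuit are ordinary string operations. The sketch below mirrors the format above; `build_prompt` and `extract_circuit` are illustrative names, not part of any released package.

```python
USER, ASSISTANT, END = "<|user|>", "<|assistant|>", "<|end|>"

def build_prompt(description: str) -> str:
    """Wrap a natural-language request in the chat delimiters."""
    return f"{USER}{description}{END}\n{ASSISTANT}"

def extract_circuit(generated: str) -> str:
    """Return the assistant's OpenQASM text, cut at the first <|end|>."""
    if ASSISTANT in generated:
        generated = generated.split(ASSISTANT, 1)[1]  # drop the echoed prompt
    return generated.split(END, 1)[0].strip()

prompt = build_prompt("Create a Bell state with two qubits")
print(prompt)
print(extract_circuit(prompt + "OPENQASM 2.0;\nqreg q[2];<|end|>"))
```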


Limitations

  1. Data-constrained overfitting: the training set is very small relative to the parameter count, so validation loss rises after step 1,100; generalisation is limited to what the step-1,100 checkpoint captures.
  2. WRONG_QUBITS: ~31% of outputs have an incorrect qubit count, even when the prompt specifies one.
  3. Semantic correctness: a 59 pp gap separates syntactic and semantic validity at pass@5 (99.0% vs 40.0%), and semantic accuracy is no better than QuantumGPT-124M-v2.
  4. Synthetic training data: all training circuits were generated by an LLM (xAI Grok), not drawn from real quantum programs.
  5. No hardware validation: generated circuits require transpilation and validation before execution on real quantum hardware.

Intended Use

✅ Research baseline for quantum circuit generation scaling studies
✅ Comparison point for data-vs-parameter scaling analysis
✅ Educational demonstrations of QASM generation

❌ Production quantum computing workflows
❌ Use cases where QuantumGPT-124M-v2 is available (it performs better)


Citation

```bibtex
@misc{quantumgpt354m,
  author    = {Merilehto, Juhani},
  title     = {QuantumGPT-354M: Parameter Scaling Study for Quantum Circuit Generation},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/merileijona/quantumgpt-354m},
  note      = {354.1M parameter GPT trained on quantum-circuits-21k.
               Data scaling outperforms model scaling at this regime.}
}
```

Model Card Authors

Juhani Merilehto

  • HuggingFace: @merileijona
  • GitHub: @juhanimerilehto
  • Affiliation: University of Vaasa, School of Management; University of Turku, Faculty of Technology

License

MIT License

Acknowledgments

  • Training framework: Andrej Karpathy's nanoGPT / nanochat architecture
  • Data generation: xAI Grok API
  • Tokenizer: Standard GPT-2 BPE (HuggingFace GPT2TokenizerFast)
  • Validation: Qiskit OpenQASM 2.0 parser
  • Hardware: NVIDIA RTX 4070 12GB / AMD Ryzen 9 5950X / 128GB RAM


Model Version: 1.0
Release Date: March 2026

Weights: Safetensors, F32 tensors (0.4B params)