QuantumGPT-354M: Quantum Circuit Generation Model

QuantumGPT-354M is a GPT-style language model (354.1M parameters) trained from scratch on quantum circuit description → OpenQASM 2.0 pairs. It is the third model in the QuantumGPT scaling series, scaling model depth and width while holding training data constant at 21,208 samples.

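Key finding: QuantumGPT-124M-v2 matches or outperforms this model on nearly every primary metric; the only exception is pass@1 semantic, where this model leads by 0.4 pp (24.0% vs 23.6%). At this data scale (1.75M tokens), the binding constraint is data coverage, not model capacity. See the scaling series table below.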

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("merileijona/quantumgpt-354m")
tokenizer = AutoTokenizer.from_pretrained("merileijona/quantumgpt-354m")

prompt = "<|user|>Create a Bell state with two qubits<|end|>\n<|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=300,
    do_sample=True,
    temperature=0.8,
    top_k=50,
    repetition_penalty=1.1,
    pad_token_id=tokenizer.eos_token_id,
)

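# Decode only the newly generated tokens, so the result does not depend on
# the prompt round-tripping exactly through the tokenizer.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=False)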
if "<|end|>" in response:
    response = response[:response.index("<|end|>")]
print(response.strip())

Model Details

Architecture

Parameter	Value
Base architecture	GPT-2 style
Parameters	354.1M
Layers	24
Attention heads	16
Embedding dimension	1024
Context length	512 tokens
Dropout (training)	0.1
Activation function	GELU (standard)
Gradient checkpointing	Yes

Training Configuration

Parameter	Value
Training dataset	quantum-circuits-21k
Training samples	21,208
Estimated training tokens	~1.75M
Max iterations	3,000
Best checkpoint	step 1,100
Learning rate	2×10⁻⁴ (cosine decay)
Effective tokens/step	32,768
Total tokens seen	~98.3M (~38 epochs)
Hardware	NVIDIA RTX 4070 12GB
Peak GPU memory	8.03 GB
Best validation loss	0.2677 (step 1,100)
Final validation loss	0.3761 (step 2,999)
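
The learning-rate row above names a cosine decay from a peak of 2×10⁻⁴ over 3,000 iterations. A minimal sketch of such a schedule follows. The warmup length (`WARMUP_ITERS`) and the floor (`MIN_LR`) are illustrative assumptions, not values reported for this run.

```python
import math

PEAK_LR = 2e-4       # from the training configuration table
MAX_ITERS = 3000     # from the training configuration table
WARMUP_ITERS = 100   # assumption: warmup length is not reported for this run
MIN_LR = 2e-5        # assumption: floor at 10% of peak, not reported


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay down to MIN_LR."""
    if step < WARMUP_ITERS:
        return PEAK_LR * (step + 1) / WARMUP_ITERS
    if step >= MAX_ITERS:
        return MIN_LR
    progress = (step - WARMUP_ITERS) / (MAX_ITERS - WARMUP_ITERS)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))  # falls from 1 to 0
    return MIN_LR + coeff * (PEAK_LR - MIN_LR)


for step in (0, 100, 1100, 2999):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the learning rate at the best checkpoint (step 1,100) is still well above the floor, so the later rise in validation loss would not be explained by the schedule running out.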

Overfitting

Severe overfitting begins around step 1,400 (train/val gap > 0.15), and the gap reaches +0.34 by the final step. The best checkpoint at step 1,100 was used for conversion. The data-to-parameter ratio is ~0.005 unique training tokens per parameter (1.75M tokens / 354.1M parameters, i.e. ~4.9 tokens per 1,000 parameters), far below the Chinchilla-optimal ratio of ~20 tokens per parameter, making this model data-constrained.
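
The arithmetic behind the data-to-parameter comparison can be reproduced from the figures in the tables above. The sketch below only restates those reported numbers: it divides unique training tokens by parameter count and contrasts the result with the ~20 tokens-per-parameter rule of thumb.

```python
PARAMS = 354.1e6            # parameters (architecture table)
UNIQUE_TOKENS = 1.75e6      # estimated training tokens
TOKENS_PER_STEP = 32_768    # effective tokens per optimizer step
MAX_ITERS = 3_000
BEST_STEP = 1_100
CHINCHILLA_RATIO = 20       # rule-of-thumb tokens per parameter cited above

tokens_per_param = UNIQUE_TOKENS / PARAMS
tokens_seen_total = TOKENS_PER_STEP * MAX_ITERS
tokens_seen_at_best = TOKENS_PER_STEP * BEST_STEP
chinchilla_tokens = CHINCHILLA_RATIO * PARAMS

print(f"unique tokens per parameter:        {tokens_per_param:.4f}")
print(f"unique tokens per 1,000 parameters: {1000 * tokens_per_param:.1f}")
print(f"tokens seen over the full run:      {tokens_seen_total / 1e6:.1f}M")
print(f"tokens seen at the best checkpoint: {tokens_seen_at_best / 1e6:.1f}M")
print(f"tokens at ~20 per parameter:        {chinchilla_tokens / 1e9:.1f}B")
```

The gap is several orders of magnitude: the rule of thumb would call for billions of unique tokens, while the dataset holds under two million, so the run revisits the same samples many times.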

Benchmark Results

Evaluated on QuantumGPT Benchmark v1.0 — 100 prompts, 50 ID / 50 OOD, 3 difficulty tiers, k=5 samples, seed=42. Prompt suite hash: ee2da8a57e683af2464eb7a4eada0898.
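
The card reports pass@1 and pass@5 from 5 samples per prompt but does not state which estimator was used. One common choice in code-generation benchmarks is the unbiased estimator 1 - C(n-c, k) / C(n, k), averaged over prompts. The sketch below assumes that estimator; the function names and the example counts are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one prompt: n samples drawn, c of them pass."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(passes_per_prompt: list[int], n: int, k: int) -> float:
    """Mean pass@k over prompts, given the passing-sample count per prompt."""
    return sum(pass_at_k(n, c, k) for c in passes_per_prompt) / len(passes_per_prompt)


# Hypothetical counts for four prompts, five samples each.
counts = [0, 1, 5, 2]
print(benchmark_pass_at_k(counts, n=5, k=1))
print(benchmark_pass_at_k(counts, n=5, k=5))
```

With n = k = 5, pass@5 reduces to "at least one of the five samples passes", and pass@1 reduces to the fraction of passing samples, which is consistent with the 120/500 = 24.0% PASS rate in the failure mode table.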

Scaling Series Comparison

Model	Params	Data	pass@1 syntax	pass@5 syntax	pass@1 semantic	pass@5 semantic	Val loss
QuantumGPT-124M-v1	123.8M	8K	68.2%	89.0%	13.8%	30.0%	0.2691
QuantumGPT-124M-v2	123.8M	21K	97.2%	100.0%	23.6%	41.0%	0.2502
QuantumGPT-354M (this model)	354.1M	21K	92.2%	99.0%	24.0%	40.0%	0.2677

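Conclusion: Increasing parameters 2.9× while holding data constant yields no meaningful improvement over QuantumGPT-124M-v2: syntax pass rates and validation loss are worse, and semantic pass rates differ by at most 1 pp. In this regime, data scaling outperforms model scaling.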

Failure Mode Breakdown (500 samples)

Mode	Count	%
PASS	120	24.0%
WRONG_QUBITS	156	31.2%
TRIVIAL	94	18.8%
SEPARABLE	84	16.8%
SYNTAX_ERROR	39	7.8%
SIM_ERROR	7	1.4%

WRONG_QUBITS (circuits with incorrect qubit counts) is the dominant failure mode and is unaffected by model scale.
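
A WRONG_QUBITS check can be approximated without a simulator by counting the qubits declared in the generated OpenQASM 2.0 text. The benchmark's actual implementation is not described in this card, so the sketch below is an assumption: it sums `qreg` sizes with a regular expression and compares the total to an `expected_qubits` value that is presumed to come from the prompt's metadata.

```python
import re

# Matches OpenQASM 2.0 quantum register declarations such as `qreg q[2];`.
QREG_PATTERN = re.compile(r"\bqreg\s+[A-Za-z_][A-Za-z0-9_]*\s*\[\s*(\d+)\s*\]\s*;")


def count_declared_qubits(qasm: str) -> int:
    """Sum the sizes of all qreg declarations in an OpenQASM 2.0 string."""
    # Drop `//` line comments so commented-out declarations are not counted.
    code = re.sub(r"//[^\n]*", "", qasm)
    return sum(int(size) for size in QREG_PATTERN.findall(code))


def has_wrong_qubits(qasm: str, expected_qubits: int) -> bool:
    """True if the declared qubit count differs from the expected count."""
    return count_declared_qubits(qasm) != expected_qubits


bell = (
    'OPENQASM 2.0;\ninclude "qelib1.inc";\n'
    "qreg q[2];\ncreg c[2];\nh q[0];\ncx q[0],q[1];\n"
)
print(count_declared_qubits(bell), has_wrong_qubits(bell, expected_qubits=2))
```

A text-level check like this only catches declaration mismatches; it says nothing about whether the declared qubits are actually used or entangled.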

Prompt Format

<|user|>{natural language description}<|end|>
<|assistant|>{OpenQASM 2.0 circuit}<|end|>

Delimiters are literal text tokens, not special tokenizer tokens.
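
Because the delimiters are plain text, prompts can be built and responses trimmed with ordinary string operations. The helpers below are a minimal sketch of that; `build_prompt` and `extract_circuit` are hypothetical names, not part of the released model or its tokenizer.

```python
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"
END_TAG = "<|end|>"


def build_prompt(description: str) -> str:
    """Wrap a natural-language description in the prompt format shown above."""
    return f"{USER_TAG}{description}{END_TAG}\n{ASSISTANT_TAG}"


def extract_circuit(generated: str) -> str:
    """Return the text between the first assistant tag and the next end tag."""
    head, sep, tail = generated.partition(ASSISTANT_TAG)
    body = tail if sep else head  # no tag found: treat the whole string as the reply
    circuit, _, _ = body.partition(END_TAG)
    return circuit.strip()


prompt = build_prompt("Create a Bell state with two qubits")
fake_output = prompt + "OPENQASM 2.0;\nqreg q[2];<|end|>\n<|user|>ignored"
print(extract_circuit(fake_output))
```

Cutting at the first `<|end|>` after the assistant tag discards any extra turns the model may continue to generate within the token budget.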

Limitations

Data-constrained overfitting: the dataset is far too small for the parameter count, so the model overfits quickly; generalisation is limited to what the best checkpoint (step 1,100) captures.
WRONG_QUBITS: ~31% of outputs have an incorrect qubit count, regardless of what the prompt specifies.
Semantic correctness: a 59 pp gap separates syntactic and semantic validity at pass@5 (99.0% vs 40.0%), the same gap as in both 124M models.
Synthetic training data: all training circuits were generated by an LLM (xAI Grok), not drawn from real quantum programs.
No hardware validation: outputs require transpilation and validation before execution on real quantum hardware.

Intended Use

✅ Research baseline for quantum circuit generation scaling studies
✅ Comparison point for data-vs-parameter scaling analysis
✅ Educational demonstrations of QASM generation

❌ Production quantum computing workflows
❌ Use cases where QuantumGPT-124M-v2 is available (it performs better)

Citation

@misc{quantumgpt354m,
  author    = {Merilehto, Juhani},
  title     = {QuantumGPT-354M: Parameter Scaling Study for Quantum Circuit Generation},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/merileijona/quantumgpt-354m},
  note      = {354.1M parameter GPT trained on quantum-circuits-21k.
               Data scaling outperforms model scaling at this regime.}
}

Model Card Authors

Juhani Merilehto

HuggingFace: @merileijona
GitHub: @juhanimerilehto
Affiliation: University of Vaasa, School of Management; University of Turku, Faculty of Technology

License

MIT License

Acknowledgments

Training framework: Andrej Karpathy's nanoGPT / nanochat architecture
Data generation: xAI Grok API
Tokenizer: Standard GPT-2 BPE (HuggingFace GPT2TokenizerFast)
Validation: Qiskit OpenQASM 2.0 parser
Hardware: NVIDIA RTX 4070 12GB / AMD Ryzen 9 5950X / 128GB RAM

Additional Resources

Training dataset: merileijona/quantum-circuits-21k
Better model: merileijona/quantumgpt-124m-v2

Model Version: 1.0
Release Date: March 2026
