CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance
This repository contains the code, models, and datasets for the following papers:
- CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance
- Steering Large Language Models between Code Execution and Textual Reasoning (ICLR'2025)
Project page: https://github.com/yongchao98/CodeSteer-v1.0/
Code
Huggingface🤗
Model Weights
SymBench🤗
Finetune Datasets
SymBench Datasets
SymBench Synthesis Scripts
Contents
- Framework
- Inspirations
- Performance
- Environment_Setup
- LLM_API_Key_Setup
- Train_and_Test_Models
- Assistance
- Citation
Framework
Figure: CodeSteer guides LLM code/text generation to integrate symbolic computing. At each interaction with the TaskLLM, it reviews the current and previous answers, then provides guidance for the next round.
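The multi-round interaction described in the caption can be sketched roughly as below. This is a minimal illustration, not the repository's implementation: the names `steer`, `Round`, `FINALIZE`, and `max_rounds`, the stop signal, and the round limit are all assumptions made for the sketch, and both LLMs are replaced by plain Python callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

FINALIZE = "<finalize>"  # hypothetical stop signal, assumed for this sketch


@dataclass
class Round:
    guidance: str  # what the steering model asked for
    answer: str    # what the task model returned


def steer(
    question: str,
    task_llm: Callable[[str, str], str],
    steer_llm: Callable[[str, List[Round]], str],
    max_rounds: int = 5,  # assumed limit, not taken from the source
) -> Tuple[Optional[str], List[Round]]:
    """Alternate between guidance and answers until the steerer finalizes."""
    history: List[Round] = []
    guidance = steer_llm(question, history)
    for _ in range(max_rounds):
        if guidance == FINALIZE:
            break
        answer = task_llm(question, guidance)
        history.append(Round(guidance, answer))
        # The steerer reviews the current and all previous answers.
        guidance = steer_llm(question, history)
    return (history[-1].answer if history else None), history


# Stub models standing in for the real TaskLLM and CodeSteerLLM.
def stub_task_llm(question: str, guidance: str) -> str:
    return "answer via code" if "code" in guidance else "answer via text"


def stub_steer_llm(question: str, history: List[Round]) -> str:
    if not history:
        return "Reason in text."
    if history[-1].answer == "answer via text":
        return "Write and run code."
    return FINALIZE


final_answer, rounds = steer("toy question", stub_task_llm, stub_steer_llm)
print(final_answer, len(rounds))
```

With the stubs above, the steerer first asks for textual reasoning, then switches to code, then finalizes, so the loop ends after two rounds.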
Inspirations
Figure: Cases where GPT-4o makes simple mistakes with direct textual reasoning but reliably solves the problem when prompted to use code.
Performance
We compare GPT-4o + CodeSteer with OpenAI o1 and DeepSeek R1 on SymBench, with 28 seen tasks and 9 unseen tasks. GPT-4o + CodeSteer surpasses o1 (82.7), R1 (76.8), and o1-preview (74.8), highlighting the importance of integrating symbolic computing into LLMs.
The token costs and runtimes of each method are as follows. GPT-4o + CodeSteer uses fewer tokens and less runtime than o1 and R1.

Environment_Setup
The fine-tuning and inference of CodeSteerLLM build on LLaMA-Factory, with some modules modified by us.
git clone https://github.com/yongchao98/CodeSteer-v1.0.git
cd CodeSteer-v1.0
conda create -n CodeSteer python=3.10
conda activate CodeSteer
pip install -r requirements.txt
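After installation, a quick sanity check of the environment can save a failed run later. The sketch below is only an illustration: the function name `check_environment` is hypothetical, and the default package list (`torch`, `transformers`) is an assumption, since the contents of requirements.txt are not listed here.

```python
import importlib.util
import sys


def check_environment(required=("torch", "transformers"), min_python=(3, 10)):
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"python {found} is older than {wanted}")
    for name in required:
        # find_spec returns None when a top-level package is not importable.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name}")
    return problems


# Example call; adjust `required` to match your requirements.txt.
for problem in check_environment():
    print(problem)
```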
LLM_API_Key_Setup
To use API-based LLMs as the TaskLLM or CodeSteerLLM, you need to set up the corresponding API keys.
- First, create a .env file in your project root:
OPENAI_API_KEY='your_key_here'
CLAUDE_API_KEY='your_key_here'
MIXTRAL_API_KEY='your_key_here'
DEEPSEEK_API_KEY='your_key_here'
- Add this .env file to your .gitignore to prevent accidentally committing it:
echo ".env" >> .gitignore
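To confirm that the keys in .env are picked up, a small standard-library check like the one below can help. It is a sketch under assumptions: the repository may load the file differently (for example with a package such as python-dotenv), and the helper names `parse_env_text`, `load_env_file`, and `missing_keys` are hypothetical.

```python
import os
from pathlib import Path

EXPECTED_KEYS = (
    "OPENAI_API_KEY",
    "CLAUDE_API_KEY",
    "MIXTRAL_API_KEY",
    "DEEPSEEK_API_KEY",
)


def parse_env_text(text):
    """Parse KEY='value' lines, skipping blanks, comments, and malformed lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env_file(path=".env"):
    """Read the file if it exists and export its keys without overwriting."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    values = parse_env_text(env_path.read_text(encoding="utf-8"))
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


def missing_keys(values, expected=EXPECTED_KEYS):
    """List expected keys that are absent, empty, or still the placeholder."""
    return [k for k in expected if values.get(k, "") in ("", "your_key_here")]


loaded = load_env_file(".env")
print("keys still to fill in:", missing_keys(loaded))
```

Only the keys for the providers you actually call need real values; the others can stay as placeholders.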
Train_and_Test_Models
Create_test_samples
The synthesized test samples for the 37 SymBench tasks are in the dataset_gather directory. You can also synthesize samples yourself, with tunable complexity, using the scripts in create_dataset.
Run inference without GPU, test closed-source LLM as CodeSteerLLM
We can directly use an unfinetuned model such as GPT-4o as CodeSteerLLM. In this case, run:
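As an illustration of what "tunable complexity" can mean for a synthesized sample, the sketch below generates a toy arithmetic question whose difficulty is controlled by two parameters. It is purely hypothetical: the function `make_multiplication_sample`, its parameters, and the output fields are assumptions for this example and do not mirror the actual scripts in create_dataset.

```python
import math
import random


def make_multiplication_sample(num_operands=3, digits=2, seed=0):
    """Build one question/answer pair; more operands or digits means harder."""
    rng = random.Random(seed)  # seeded so samples are reproducible
    low, high = 10 ** (digits - 1), 10 ** digits - 1
    operands = [rng.randint(low, high) for _ in range(num_operands)]
    question = "Compute " + " * ".join(str(n) for n in operands) + "."
    return {
        "question": question,
        "operands": operands,
        "answer": math.prod(operands),  # exact ground truth via integer math
    }


# Sweep complexity by increasing the number of digits per operand.
for d in (1, 3, 5):
    sample = make_multiplication_sample(num_operands=3, digits=d, seed=d)
    print(sample["question"], "->", sample["answer"])
```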
python benchmark_test_baseline.py
Run inference with GPU, test finetuned CodeSteerLLM
We can run inference with the fine-tuned Llama-3.1-8B on our own GPUs (the default settings in infer_CodeSteer.sh use 4 H100 GPUs on the Harvard cluster; modify them freely for your own cluster). You can also download the Model Weights to a local directory and change the path in llama3_8B_CodeSteer.yaml.
bash infer_CodeSteer.sh
# default config file is ./llama3_8B_CodeSteer.yaml using the model uploaded on Huggingface.
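If you prefer a local copy of the weights, one possible way to fetch them is the `snapshot_download` function from the `huggingface_hub` package, as sketched below. The repository ID `yongchao98/CodeSteer-v1` and the target directory are assumptions to adjust as needed, and the helper name `download_weights` is hypothetical. The printed path is what you would then put in llama3_8B_CodeSteer.yaml.

```python
from pathlib import Path

REPO_ID = "yongchao98/CodeSteer-v1"  # assumed model repository ID


def download_weights(local_dir="./CodeSteer-v1"):
    """Download the model snapshot and return its absolute local path."""
    # Imported lazily so this file can be loaded without the package installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return str(Path(path).resolve())


# Usage (requires network access and `pip install huggingface_hub`):
#   print(download_weights())
```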
Finetuning CodeSteerLLM with synthesized data
Our synthesized datasets for both SFT and DPO fine-tuning are in Finetune Datasets. We use LLaMA-Factory and DeepSpeed for fine-tuning. First install LLaMA-Factory with:
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
cd ..
Then run the training script (the default settings in train_llama3-8B-CodeSteer.sh use 4 H100 GPUs on the Harvard cluster; modify them freely for your own cluster):
bash train_llama3-8B-CodeSteer.sh
Assistance
We appreciate all feedback! Feel free to raise an issue for bugs, questions, or suggestions. You can also contact Yongchao Chen and Chuchu Fan with any questions or for discussion.
Citation
@misc{chen2025codesteersymbolicaugmentedlanguagemodels,
title={CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance},
author={Yongchao Chen and Yilun Hao and Yueying Liu and Yang Zhang and Chuchu Fan},
year={2025},
eprint={2502.04350},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.04350},
}
@article{chen2024steering,
title={Steering Large Language Models between Code Execution and Textual Reasoning},
author={Chen, Yongchao and Jhamtani, Harsh and Sharma, Srinagesh and Fan, Chuchu and Wang, Chi},
journal={arXiv preprint arXiv:2410.03524},
year={2024}
}