Use vLLM to deploy the HuggingFace model as an OpenAI-compatible API server. The command below listens on port 8004; pass a different value to --port if needed, and set --tensor-parallel-size to the number of GPUs the model should be sharded across.
python -m vllm.entrypoints.openai.api_server \
--model <path to PRDJudge model> \
--served-model-name PRDJudge \
--port 8004 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--tensor-parallel-size <number of GPUs>
Once deployed, the service endpoint will be:
http://localhost:8004/v1
http://<server_ip>:8004/v1
You can verify the deployment with the following command:
curl http://localhost:8004/v1/models
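The same check can be scripted. The sketch below uses only the Python standard library and assumes the server started above is reachable and returns the usual OpenAI-style model list (a JSON object with a `data` array of entries carrying an `id`). The function names are illustrative and are not part of PRDbench or vLLM.

```python
import json
import urllib.request


def served_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_served_model_ids(api_base: str, timeout: float = 10.0) -> list[str]:
    """Query <api_base>/models and return the ids the server reports."""
    url = api_base.rstrip("/") + "/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return served_model_ids(json.load(resp))
```

For a local deployment, calling `fetch_served_model_ids("http://localhost:8004/v1")` should return a list that includes `PRDJudge`, since that is the name passed to --served-model-name.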
Follow our ADK-based Agent config.
Edit EvalAgent/code_eval_agent/config.py and add your model configuration:
"your_model_name": LiteLlmWithSleep(
model="openai/PRDJudge", # "openai/" prefix + the name served by vLLM; must match --served-model-name
api_base="http://<server_ip>:8004/v1", # vLLM service URL; use http://localhost:8004/v1 when deployed locally
api_key="EMPTY", # vLLM does not require an API key by default; "EMPTY" is a placeholder
max_tokens_threshold=64000,
enable_compression=True,
temperature=0.1
)
Note: The model field must include the openai/ prefix; this is the LiteLLM routing format for OpenAI-compatible endpoints. The name after the prefix (PRDJudge in the example above) should match the name the vLLM server exposes, which is the --served-model-name value when that flag is set and the --model value otherwise. You can verify it via curl http://<server_ip>:8004/v1/models.
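The naming rule in the note can be checked before running an evaluation. The following is a minimal sketch, not code from the PRDbench repository: both helper names are hypothetical, and `served_ids` is assumed to be the list of ids reported by the server's /v1/models endpoint.

```python
OPENAI_PREFIX = "openai/"


def litellm_model_field(served_name: str) -> str:
    """Build the LiteLLM `model` value for an OpenAI-compatible endpoint."""
    return OPENAI_PREFIX + served_name


def check_model_field(model_field: str, served_ids: list[str]) -> None:
    """Raise ValueError if the config's model field cannot reach a served model."""
    if not model_field.startswith(OPENAI_PREFIX):
        raise ValueError(f"{model_field!r} is missing the {OPENAI_PREFIX!r} prefix")
    # Everything after the prefix is the name sent to the vLLM server.
    name = model_field[len(OPENAI_PREFIX):]
    if name not in served_ids:
        raise ValueError(f"{name!r} is not served; available: {served_ids}")
```

With the deployment above, `check_model_field("openai/PRDJudge", ["PRDJudge"])` passes silently, while a field without the prefix or with a name the server does not list raises `ValueError`.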
Base model
Qwen/Qwen3-Coder-30B-A3B-Instruct