# VLLMServerLLMClient

`VLLMServerLLMClient` targets local or self-hosted vLLM OpenAI-compatible inference endpoints.
## Default behavior

- Default managed mode: `manage_server=True`
- Default startup model: `Qwen/Qwen2.5-1.5B-Instruct`
- Default API model alias: `qwen2.5-1.5b-instruct`
- Default managed endpoint: `http://127.0.0.1:8002/v1`
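The managed endpoint speaks the standard OpenAI-compatible chat API, so the defaults above fully determine the raw request shape. A minimal sketch of that payload, assuming standard chat-completions field names (the client builds this for you; this is only for orientation):

```python
import json

# Managed-mode defaults from the list above.
BASE_URL = "http://127.0.0.1:8002/v1"

# OpenAI-compatible chat-completions payload; "model" uses the
# default API model alias, not the Hugging Face repo id.
payload = {
    "model": "qwen2.5-1.5b-instruct",
    "messages": [
        {"role": "user", "content": "Give one architecture tradeoff."}
    ],
}

body = json.dumps(payload)
url = f"{BASE_URL}/chat/completions"
```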
## Constructor-first usage

```python
from design_research_agents import VLLMServerLLMClient
from design_research_agents.llm import LLMMessage, LLMRequest

with VLLMServerLLMClient() as client:
    response = client.generate(
        LLMRequest(
            messages=(LLMMessage(role="user", content="Give one architecture tradeoff."),),
            model=client.default_model(),
        )
    )
```
Prefer the context-manager form so managed servers shut down deterministically; `close()` remains available for explicit lifecycle control.
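The explicit `close()` form pairs construction with a `try`/`finally` so the managed server still stops on error paths. A sketch of the pattern, using a hypothetical stand-in class with the same `close()` contract so it runs without the package or a GPU:

```python
# Stand-in mirroring the close() contract of VLLMServerLLMClient,
# used here only so the lifecycle sketch is self-contained.
class StubClient:
    def __init__(self) -> None:
        self.closed = False

    def generate(self, request: object) -> str:
        return "stub response"

    def close(self) -> None:
        self.closed = True


client = StubClient()
try:
    response = client.generate(None)
finally:
    client.close()  # always runs, mirroring the context-manager exit
```

With the real client, `close()` is what tears down the managed vLLM server process; skipping it can leave the server running.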
## Dependencies and environment

Install vLLM extras for managed mode:

```shell
pip install -e ".[vllm]"
```

For connect mode, point at an existing vLLM-compatible endpoint with `manage_server=False` and `base_url=...`.
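Wiring up connect mode can be sketched as follows, assuming only the constructor keywords named above; the host and port are placeholders for your deployment, and the client construction is left commented out since it needs the package installed and a server running:

```python
from urllib.parse import urlparse

# Placeholder endpoint for an already-running vLLM server.
base_url = "http://127.0.0.1:8000/v1"

# Sanity-check the endpoint shape before handing it to the client:
# vLLM's OpenAI-compatible routes live under the /v1 prefix.
parts = urlparse(base_url)
assert parts.scheme in ("http", "https")
assert parts.path.endswith("/v1")

# client = VLLMServerLLMClient(manage_server=False, base_url=base_url)
```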
## Examples

- `examples/clients/vllm_server_client.py`
## Attribution

- Docs: vLLM docs
- Homepage: vLLM GitHub