VLLMServerLLMClient

VLLMServerLLMClient talks to local or self-hosted vLLM inference endpoints that expose the OpenAI-compatible API. It can either launch and manage a vLLM server itself (managed mode) or connect to one you already run (connect mode).

Default behavior

  • Default managed mode: manage_server=True

  • Default startup model: Qwen/Qwen2.5-1.5B-Instruct

  • Default API model alias: qwen2.5-1.5b-instruct

  • Default managed endpoint: http://127.0.0.1:8002/v1
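On the wire, requests to the managed endpoint follow the standard OpenAI chat-completions format. A minimal sketch of the JSON body the client would POST to http://127.0.0.1:8002/v1/chat/completions (the field names come from the OpenAI API spec; the helper function itself is illustrative, not part of this library):

```python
import json

def build_chat_payload(model: str, user_content: str) -> dict:
    # Minimal OpenAI-compatible chat-completions request body.
    return {
        "model": model,  # the API model alias, e.g. "qwen2.5-1.5b-instruct"
        "messages": [{"role": "user", "content": user_content}],
    }

payload = build_chat_payload(
    "qwen2.5-1.5b-instruct", "Give one architecture tradeoff."
)
print(json.dumps(payload, indent=2))
```

The default API model alias above is what goes in the "model" field; vLLM maps that alias back to the startup model it is serving.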

Constructor-first usage

from design_research_agents import VLLMServerLLMClient
from design_research_agents.llm import LLMMessage, LLMRequest

with VLLMServerLLMClient() as client:
    response = client.generate(
        LLMRequest(
            messages=(LLMMessage(role="user", content="Give one architecture tradeoff."),),
            model=client.default_model(),
        )
    )

Prefer the context-manager form so managed servers shut down deterministically. close() remains available for explicit lifecycle control.
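When a with-statement does not fit (for example, the client outlives a single scope), the same shutdown can be done explicitly. A sketch using only the constructor and close() described above, with try/finally standing in for the context manager:

```python
from design_research_agents import VLLMServerLLMClient
from design_research_agents.llm import LLMMessage, LLMRequest

client = VLLMServerLLMClient()
try:
    response = client.generate(
        LLMRequest(
            messages=(LLMMessage(role="user", content="Give one architecture tradeoff."),),
            model=client.default_model(),
        )
    )
finally:
    # Ensure the managed vLLM server process is torn down even on error.
    client.close()
```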

Dependencies and environment

  • Install vLLM extras for managed mode: pip install -e ".[vllm]"

  • For connect mode, pass manage_server=False and base_url=... to point the client at an existing vLLM-compatible endpoint.
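A connect-mode sketch, using only the two constructor arguments named above (manage_server and base_url) and the default endpoint address for illustration:

```python
from design_research_agents import VLLMServerLLMClient
from design_research_agents.llm import LLMMessage, LLMRequest

# Attach to an already-running vLLM OpenAI-compatible endpoint
# instead of spawning a managed server.
with VLLMServerLLMClient(
    manage_server=False,
    base_url="http://127.0.0.1:8002/v1",
) as client:
    response = client.generate(
        LLMRequest(
            messages=(LLMMessage(role="user", content="Ping."),),
            model=client.default_model(),
        )
    )
```

In this mode the vLLM extras are not required on the client side; only the server process needs vLLM installed.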

Examples

  • examples/clients/vllm_server_client.py

Attribution