VLLMServerLLMClient
===================
``VLLMServerLLMClient`` targets local or self-hosted ``vLLM`` servers that
expose an OpenAI-compatible inference API.

Default behavior
----------------

- Default managed mode: ``manage_server=True``
- Default startup model: ``Qwen/Qwen2.5-1.5B-Instruct``
- Default API model alias: ``qwen2.5-1.5b-instruct``
- Default managed endpoint: ``http://127.0.0.1:8002/v1``

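Because the managed endpoint speaks the OpenAI wire format, it can also be
probed directly while a managed client is running. A minimal sketch using
only the standard library and the defaults listed above (endpoint port 8002,
API model alias ``qwen2.5-1.5b-instruct``); the prompt text is illustrative:

.. code-block:: python

    import json
    from urllib.request import Request, urlopen

    # Defaults from the list above; adjust if you changed the managed endpoint.
    BASE_URL = "http://127.0.0.1:8002/v1"

    payload = json.dumps({
        "model": "qwen2.5-1.5b-instruct",  # default API model alias
        "messages": [{"role": "user", "content": "Hello"}],
    }).encode()
    req = Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    # Requires a managed server to already be running at BASE_URL.
    with urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])

This exercises the same ``/v1/chat/completions`` route the client uses
internally, which can be handy when diagnosing endpoint issues separately
from the client.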
Constructor-first usage
-----------------------

.. code-block:: python

    from design_research_agents import VLLMServerLLMClient
    from design_research_agents.llm import LLMMessage, LLMRequest

    with VLLMServerLLMClient() as client:
        response = client.generate(
            LLMRequest(
                messages=(LLMMessage(role="user", content="Give one architecture tradeoff."),),
                model=client.default_model(),
            )
        )

Prefer the context-manager form so managed servers shut down deterministically.
``close()`` remains available for explicit lifecycle control.

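When a ``with`` block is impractical (for example, the client lives on a
long-lived object), ``close()`` can be paired with ``try``/``finally``. A
minimal sketch, assuming the constructor and ``generate`` call shown above;
the prompt text is illustrative:

.. code-block:: python

    from design_research_agents import VLLMServerLLMClient
    from design_research_agents.llm import LLMMessage, LLMRequest

    client = VLLMServerLLMClient()
    try:
        response = client.generate(
            LLMRequest(
                messages=(LLMMessage(role="user", content="Name one caching tradeoff."),),
                model=client.default_model(),
            )
        )
    finally:
        # Release the managed server process even if generate() raises.
        client.close()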
Dependencies and environment
----------------------------

- Install vLLM extras for managed mode: ``pip install -e ".[vllm]"``
- For connect mode, point at an existing vLLM-compatible endpoint with
  ``manage_server=False`` and ``base_url=...``.

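A connect-mode sketch follows. Only ``manage_server`` and ``base_url`` are
taken from the bullet above; the URL value is an assumption, so substitute
your own endpoint:

.. code-block:: python

    from design_research_agents import VLLMServerLLMClient
    from design_research_agents.llm import LLMMessage, LLMRequest

    # Connect mode: reuse a server that is already running; the client
    # spawns nothing and shuts nothing down on exit.
    with VLLMServerLLMClient(
        manage_server=False,
        base_url="http://127.0.0.1:8000/v1",  # assumed endpoint; replace with yours
    ) as client:
        response = client.generate(
            LLMRequest(
                messages=(LLMMessage(role="user", content="Summarize vLLM in one line."),),
                model=client.default_model(),
            )
        )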
Examples
--------

- ``examples/clients/vllm_server_client.py``

Attribution
-----------

- Docs: `vLLM documentation <https://docs.vllm.ai/>`_
- Homepage: `vLLM GitHub <https://github.com/vllm-project/vllm>`_