SGLangServerLLMClient

SGLangServerLLMClient targets local or self-hosted SGLang inference endpoints that expose an OpenAI-compatible API.

Default behavior

  • Default managed mode: manage_server=True

  • Default startup model: Qwen/Qwen2.5-1.5B-Instruct

  • Default managed endpoint: http://127.0.0.1:30000/v1
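The managed endpoint above speaks the standard OpenAI chat-completions protocol, so the request that the client sends on your behalf can be pictured as an ordinary JSON body. A minimal sketch of such a body, aimed at the defaults listed above (field names follow the OpenAI chat-completions schema; the client's exact wire format is an assumption):

```python
import json

# Hypothetical request body for the default managed endpoint
# (http://127.0.0.1:30000/v1/chat/completions). Field names follow the
# OpenAI chat-completions schema; nothing here is specific to this client.
payload = {
    "model": "Qwen/Qwen2.5-1.5B-Instruct",  # default startup model
    "messages": [
        {"role": "user", "content": "Give one architecture tradeoff."},
    ],
}

body = json.dumps(payload)
print(body)
```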

Constructor-first usage

from design_research_agents import SGLangServerLLMClient
from design_research_agents.llm import LLMMessage, LLMRequest

with SGLangServerLLMClient() as client:
    response = client.generate(
        LLMRequest(
            messages=(LLMMessage(role="user", content="Give one architecture tradeoff."),),
            model=client.default_model(),
        )
    )

Prefer the context-manager form so managed servers shut down deterministically. close() remains available for explicit lifecycle control.
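The relationship between the two lifecycles can be sketched with a stand-in class; only the __enter__/__exit__/close() contract is shown, and the real client's internals are an assumption:

```python
class FakeClient:
    """Stand-in for SGLangServerLLMClient; models only the lifecycle contract."""

    def __init__(self):
        self.closed = False

    def close(self):
        # In the real client this would shut down any managed server.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()  # deterministic shutdown, even when an exception escapes
        return False  # do not suppress exceptions

# Context-manager form:
with FakeClient() as client:
    pass  # ... use the client ...
assert client.closed

# Equivalent explicit form:
client = FakeClient()
try:
    pass  # ... use the client ...
finally:
    client.close()
```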

Dependencies and environment

  • Install SGLang extras for managed mode: pip install -e ".[sglang]"

  • For connect mode, point at an existing SGLang-compatible endpoint with manage_server=False and base_url=....
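Connect mode can be sketched as follows. The keyword names come from the bullet above (manage_server, base_url); the exact constructor signature and the host in the URL are assumptions:

```python
# Connect mode: point the client at an existing SGLang-compatible endpoint
# instead of letting it launch one. Keyword names follow the docs above;
# the hostname is a hypothetical placeholder.
connect_kwargs = {
    "manage_server": False,
    "base_url": "http://my-sglang-host:30000/v1",
}

# Hedged usage sketch (requires the package to be installed):
# with SGLangServerLLMClient(**connect_kwargs) as client:
#     ...
```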

Examples

  • examples/clients/sglang_server_client.py

Attribution