SGLang Server Client
====================

Source: ``examples/clients/sglang_server_client.py``

Introduction
------------

SGLang focuses on high-throughput serving and exposes OpenAI-compatible APIs,
making it useful for controlled backend substitution against common response
contracts and HELM-style evaluation framing. This example wires the SGLang
server client into the same traced run surface used by other providers.

Technical Implementation
------------------------

1. Configure ``Tracer`` with JSONL + console output so each run emits
   machine-readable traces and lifecycle logs.
2. Build the runtime surface (public APIs only) and execute
   ``SGLangServerLLMClient.generate(...)`` with a fixed ``request_id``.
3. Construct ``LLMRequest`` inputs and call ``generate`` through the selected
   client implementation.
4. Print a compact JSON payload including ``trace_info`` for deterministic
   tests and docs examples.

.. mermaid::

   flowchart LR
     A["Input prompt or scenario"] --> B["main(): runtime wiring"]
     B --> C["SGLangServerLLMClient.generate(...)"]
     C --> D["LLMRequest/LLMResponse contracts wrap provider behavior"]
     C --> E["Tracer JSONL + console events"]
     D --> F["ExecutionResult/payload"]
     E --> F
     F --> G["Printed JSON output"]

.. literalinclude:: ../../../examples/clients/sglang_server_client.py
   :language: python
   :lines: 85-
   :linenos:

Expected Results
----------------

.. rubric:: Run Command

.. code-block:: bash

   PYTHONPATH=src python3 examples/clients/sglang_server_client.py

Example output captured with ``DRA_EXAMPLE_LLM_MODE=deterministic``
(timestamps, durations, and trace filenames vary by run):
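Since the backend is OpenAI-compatible, the request that produces the
``llm_call`` payload below can be approximated with the standard library
alone. This is a hypothetical sketch, not the client's actual implementation:
``build_chat_request`` and ``BASE_URL`` are illustrative names, and the
endpoint path and payload shape assume the OpenAI chat completions
convention; the real client additionally handles retries and tracing.

.. code-block:: python

   import json
   import urllib.request

   # Assumption: a local SGLang server on the host/port shown in the
   # example output, exposing an OpenAI-compatible /v1 prefix.
   BASE_URL = "http://127.0.0.1:30000/v1"


   def build_chat_request(
       prompt: str, model: str = "Qwen/Qwen2.5-1.5B-Instruct"
   ) -> urllib.request.Request:
       """Build (but do not send) an OpenAI-style chat completion request."""
       payload = {
           "model": model,
           "messages": [{"role": "user", "content": prompt}],
           "temperature": 0.0,  # reduce variation between repeated runs
       }
       return urllib.request.Request(
           f"{BASE_URL}/chat/completions",
           data=json.dumps(payload).encode("utf-8"),
           headers={"Content-Type": "application/json"},
           method="POST",
       )


   req = build_chat_request(
       "Provide one sentence on when SGLang-style serving helps local benchmarking."
   )
   print(req.full_url)  # http://127.0.0.1:30000/v1/chat/completions

   # Sending requires a running server:
   # with urllib.request.urlopen(req, timeout=30) as resp:
   #     body = json.load(resp)
   #     print(body["choices"][0]["message"]["content"])

Building the request separately from sending it keeps the sketch testable
without a live server, mirroring how the example itself runs deterministically
under ``DRA_EXAMPLE_LLM_MODE=deterministic``.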
.. code-block:: text

   {
     "backend": {
       "base_url": "http://127.0.0.1:30000/v1",
       "default_model": "Qwen/Qwen2.5-1.5B-Instruct",
       "host": "127.0.0.1",
       "kind": "sglang_server",
       "max_retries": 3,
       "model_patterns": [
         "Qwen/*",
         "qwen2.5-*"
       ],
       "name": "sglang-local-dev",
       "port": 30000
     },
     "capabilities": {
       "json_mode": "prompt+validate",
       "max_context_tokens": null,
       "streaming": false,
       "tool_calling": "best_effort",
       "vision": false
     },
     "client_class": "SGLangServerLLMClient",
     "default_model": "Qwen/Qwen2.5-1.5B-Instruct",
     "example": "clients/sglang_server_client.py",
     "llm_call": {
       "prompt": "Provide one sentence on when SGLang-style serving helps local benchmarking.",
       "response_has_text": true,
       "response_model": "Qwen/Qwen2.5-1.5B-Instruct",
       "response_provider": "example-test-monkeypatch",
       "response_text": "SGLang-style serving helps when you need stable local throughput for repeated tests."
     },
     "server": {
       "host": "127.0.0.1",
       "kind": "sglang_server",
       "managed": true,
       "port": 30000
     },
     "trace": {
       "request_id": "example-clients-sglang-server-call-001",
       "trace_dir": "artifacts/examples/traces",
       "trace_path": "artifacts/examples/traces/run_20260222T162206Z_example-clients-sglang-server-call-001.jsonl"
     }
   }

References
----------

- `SGLang OpenAI-Compatible API `_
- `OpenAI Responses API `_
- `Holistic Evaluation of Language Models (HELM) `_