OpenAI Compatible HTTP Client#

Source: examples/clients/openai_compatible_http_client.py

Introduction#

OpenAI-compatible HTTP surfaces are valuable because they let one orchestration stack target multiple providers; vLLM and SGLang both expose this style of interface, with the OpenAI Chat Completions API defining the baseline semantics. This example demonstrates that compatibility layer in the framework client runtime.
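The compatibility contract is essentially HTTP plus a shared JSON shape. As a hedged sketch of that wire format (the port, model name, and prompt here are illustrative assumptions, not values any server in this example guarantees), a raw chat-completions request against a compatible server can be built with only the standard library:

```python
import json
import urllib.request

# Shared wire shape assumed by OpenAI-compatible servers such as vLLM and
# SGLang: POST {base_url}/chat/completions with a model name and message list.
def build_chat_request(base_url: str, model: str, user_text: str) -> urllib.request.Request:
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": user_text},
        ],
        "temperature": 0.0,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("http://127.0.0.1:8011/v1", "qwen2.5-1.5b-q4", "ping")
print(req.full_url)  # http://127.0.0.1:8011/v1/chat/completions
```

Sending the request with `urllib.request.urlopen(req)` would require a live server; the sketch stops at request construction so it stays runnable anywhere.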

Technical Implementation#

  1. Configure Tracer with JSONL + console output so each run emits machine-readable traces and lifecycle logs.

  2. Build the runtime surface (public APIs only) and execute OpenAICompatibleHTTPLLMClient.generate(...) with a fixed request_id.

  3. Construct LLMRequest inputs and call generate through the selected client implementation.

  4. Print a compact JSON payload including trace_info for deterministic tests and docs examples.
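The LLMRequest/LLMResponse contracts in step 3 can be pictured as small frozen dataclasses. This is a hedged sketch of only the fields the example touches; the real classes in design_research_agents may differ and carry more:

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical minimal versions of the message/request/response contracts,
# mirroring only the fields this example reads.
@dataclass(frozen=True)
class LLMMessage:
    role: str
    content: str

@dataclass(frozen=True)
class LLMRequest:
    messages: tuple[LLMMessage, ...]
    model: str
    temperature: float = 0.0
    max_tokens: int | None = None

@dataclass(frozen=True)
class LLMResponse:
    text: str
    model: str
    provider: str

# The example's "response_has_text" field reduces to a truthiness check.
resp = LLMResponse(text="ok", model="qwen2.5-1.5b-q4", provider="stub")
print(bool(resp.text.strip()))  # True
```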

    flowchart LR
        A["Input prompt or scenario"] --> B["main(): runtime wiring"]
        B --> C["OpenAICompatibleHTTPLLMClient.generate(...)"]
        C --> D["LLMRequest/LLMResponse contracts wrap provider behavior"]
        C --> E["Tracer JSONL + console events"]
        D --> F["ExecutionResult/payload"]
        E --> F
        F --> G["Printed JSON output"]

from __future__ import annotations

import json
from pathlib import Path

import design_research_agents as drag


def _build_payload() -> dict[str, object]:
    # Run the OpenAI-compatible client using public runtime APIs. The with
    # statement closes the configured HTTP client when the example finishes.
    with drag.OpenAICompatibleHTTPLLMClient(
        name="local-openai-compat",
        base_url="http://127.0.0.1:8011/v1",
        default_model="qwen2.5-1.5b-q4",
        api_key_env="OPENAI_API_KEY",
        api_key="example-key-for-config-demo",
        max_retries=3,
        model_patterns=("qwen2.5-*", "qwen2-*"),
    ) as client:
        description = client.describe()
        prompt = "Provide one sentence on balancing latency and quality in design review assistants."
        response = client.generate(
            drag.LLMRequest(
                messages=(
                    drag.LLMMessage(role="system", content="You are a concise engineering design assistant."),
                    drag.LLMMessage(role="user", content=prompt),
                ),
                model=client.default_model(),
                temperature=0.0,
                max_tokens=120,
            )
        )
        llm_call = {
            "prompt": prompt,
            "response_text": response.text,
            "response_model": response.model,
            "response_provider": response.provider,
            "response_has_text": bool(response.text.strip()),
        }
        return {
            "client_class": description["client_class"],
            "default_model": description["default_model"],
            "llm_call": llm_call,
            "backend": description["backend"],
            "capabilities": description["capabilities"],
            "server": description["server"],
        }


def main() -> None:
    """Run a traced OpenAI-compatible client call and print its payload."""
    # A fixed request id keeps traces and docs output deterministic across runs.
    request_id = "example-clients-openai-compatible-call-001"
    tracer = drag.Tracer(
        enabled=True,
        trace_dir=Path("artifacts/examples/traces"),
        enable_jsonl=True,
        enable_console=True,
    )
    payload = tracer.run_callable(
        agent_name="ExamplesOpenAICompatClientCall",
        request_id=request_id,
        input_payload={"scenario": "openai-compatible-client-call"},
        function=_build_payload,
    )
    assert isinstance(payload, dict)
    payload["example"] = "clients/openai_compatible_http_client.py"
    payload["trace"] = tracer.trace_info(request_id)
    # Print the combined payload as stable, sorted JSON.
    print(json.dumps(payload, ensure_ascii=True, indent=2, sort_keys=True))


if __name__ == "__main__":
    main()

Expected Results#

Run Command

PYTHONPATH=src python3 examples/clients/openai_compatible_http_client.py

Example output captured with DRA_EXAMPLE_LLM_MODE=deterministic (timestamps, durations, and trace filenames vary by run):

{
  "backend": {
    "api_key_env": "OPENAI_API_KEY",
    "base_url": "http://127.0.0.1:8011/v1",
    "default_model": "qwen2.5-1.5b-q4",
    "kind": "openai_compatible_http",
    "max_retries": 3,
    "model_patterns": [
      "qwen2.5-*",
      "qwen2-*"
    ],
    "name": "local-openai-compat"
  },
  "capabilities": {
    "json_mode": "prompt+validate",
    "max_context_tokens": null,
    "streaming": false,
    "tool_calling": "best_effort",
    "vision": false
  },
  "client_class": "OpenAICompatibleHTTPLLMClient",
  "default_model": "qwen2.5-1.5b-q4",
  "example": "clients/openai_compatible_http_client.py",
  "llm_call": {
    "prompt": "Provide one sentence on balancing latency and quality in design review assistants.",
    "response_has_text": true,
    "response_model": "qwen2.5-1.5b-q4",
    "response_provider": "example-test-monkeypatch",
    "response_text": "Use fast drafts for iteration, then escalate critical decisions to higher-quality models."
  },
  "server": null,
  "trace": {
    "request_id": "example-clients-openai-compatible-call-001",
    "trace_dir": "artifacts/examples/traces",
    "trace_path": "artifacts/examples/traces/run_20260222T162206Z_example-clients-openai-compatible-call-001.jsonl"
  }
}
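Since only timestamps, durations, and the trace filename vary between runs, a docs regression check can pin everything else. A minimal sketch (the stable_view helper is hypothetical, not part of the framework):

```python
import json

# A trimmed sample of the captured output above. The trace_path embeds a
# timestamp, so it must be dropped before comparing runs.
captured = json.loads("""
{
  "client_class": "OpenAICompatibleHTTPLLMClient",
  "default_model": "qwen2.5-1.5b-q4",
  "trace": {
    "request_id": "example-clients-openai-compatible-call-001",
    "trace_dir": "artifacts/examples/traces",
    "trace_path": "artifacts/examples/traces/run_20260222T162206Z_example-clients-openai-compatible-call-001.jsonl"
  }
}
""")

def stable_view(payload: dict) -> dict:
    # Copy the payload and strip the run-varying trace filename.
    out = dict(payload)
    out["trace"] = {k: v for k, v in out.get("trace", {}).items() if k != "trace_path"}
    return out

view = stable_view(captured)
print(sorted(view["trace"]))  # ['request_id', 'trace_dir']
```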
