Multi Step Code Tool Calling Agent

Source: examples/agents/multi_step_code_tool_calling_agent.py

Introduction

ReAct and Toolformer motivate augmenting model reasoning with external actions, while AutoGen highlights how multi-agent and tool ecosystems depend on explicit execution boundaries. This example focuses on code-tool calling so you can study how executable outputs are requested, validated, and traced in a controlled loop.

Technical Implementation

  1. Configure Tracer with JSONL + console output so each run emits machine-readable traces and lifecycle logs.

  2. Build the runtime surface (public APIs only) and execute MultiStepAgent.run(...) with a fixed request_id.

  3. Configure and invoke Toolbox integrations (core/script/MCP/callable) before assembling the final payload.

  4. Print a compact JSON payload including trace_info for deterministic tests and docs examples.

    flowchart LR
        A["Input prompt or scenario"] --> B["main(): runtime wiring"]
        B --> C["MultiStepAgent.run(...)"]
        C --> D["WorkflowRuntime loop enforces explicit final-answer and max-step policy"]
        C --> E["Tracer JSONL + console events"]
        D --> F["ExecutionResult/payload"]
        E --> F
        F --> G["Printed JSON output"]

from __future__ import annotations

import json
from pathlib import Path

from design_research_agents import LlamaCppServerLLMClient, MultiStepAgent, Toolbox, Tracer

_EXAMPLE_LLAMA_CLIENT_KWARGS = {
    "model": "Qwen_Qwen3-4B-Instruct-2507-Q4_K_M.gguf",
    "hf_model_repo_id": "bartowski/Qwen_Qwen3-4B-Instruct-2507-GGUF",
    "api_model": "qwen3-4b-instruct-2507-q4km",
    "context_window": 8192,
    "startup_timeout_seconds": 240.0,
    "request_timeout_seconds": 240.0,
}


def main() -> None:
    """Execute one multi-step code-mode run and print a compact result."""
    # A fixed request id keeps traces and docs output deterministic across runs.
    request_id = "example-multi-step-code-design-001"
    tracer = Tracer(
        enabled=True,
        trace_dir=Path("artifacts/examples/traces"),
        enable_jsonl=True,
        enable_console=True,
    )
    # Run the code-tool example using public runtime surfaces. The with statement
    # shuts down the managed LLM client and tool runtime automatically when the
    # example finishes.
    with Toolbox() as tool_runtime, LlamaCppServerLLMClient(**_EXAMPLE_LLAMA_CLIENT_KWARGS) as llm_client:
        code_tool_agent = MultiStepAgent(
            mode="code",
            llm_client=llm_client,
            tool_runtime=tool_runtime,
            max_steps=1,
            normalize_generated_code_per_step=True,
            default_tools_per_step=({"tool_name": "text.word_count"},),
            tracer=tracer,
        )
        result = code_tool_agent.run(
            prompt=(
                "Use executable Python only. In one step, call "
                'stats = call_tool("text.word_count", {"text": "design review metrics"}) '
                'and then call final_answer({"word_count": stats["word_count"]}).'
            ),
            request_id=request_id,
        )

    # Print a compact, deterministic summary of the run.
    summary = result.summary()
    print(json.dumps(summary, ensure_ascii=True, indent=2, sort_keys=True))


if __name__ == "__main__":
    main()
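A note on the final print: `sort_keys=True` plus `ensure_ascii=True` makes the emitted JSON byte-stable across runs, which is what lets docs examples and tests assert on the output. A minimal stdlib illustration, independent of the agent runtime:

```python
import json

# sort_keys makes key order deterministic regardless of insertion order,
# and ensure_ascii escapes any non-ASCII so the output bytes are stable too.
payload = {"b": 1, "a": 2}
text = json.dumps(payload, ensure_ascii=True, indent=2, sort_keys=True)
print(text)  # keys appear alphabetically: "a" before "b"
```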

Expected Results

Run Command

PYTHONPATH=src python3 examples/agents/multi_step_code_tool_calling_agent.py

Example output shape (values vary by run):

{
  "success": true,
  "final_output": "<example-specific payload>",
  "terminated_reason": "<string-or-null>",
  "error": null,
  "trace": {
    "request_id": "<request-id>",
    "trace_dir": "artifacts/examples/traces",
    "trace_path": "artifacts/examples/traces/run_<timestamp>_<request_id>.jsonl"
  }
}
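Each run also writes its events to the JSONL trace path shown above. A hedged sketch for inspecting those traces, assuming only the JSONL convention of one JSON object per non-empty line (the event field names inside each object vary by tracer configuration and are not guaranteed here):

```python
from __future__ import annotations

import json
from pathlib import Path


def load_trace_events(trace_path: Path) -> list[dict]:
    """Parse one event per non-empty line of a JSONL trace file."""
    events = []
    with trace_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                events.append(json.loads(line))
    return events


# Example: count the events in each trace produced by this example.
trace_dir = Path("artifacts/examples/traces")
for path in sorted(trace_dir.glob("run_*.jsonl")):
    print(f"{path.name}: {len(load_trace_events(path))} events")
```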

References