Multi Step Code Tool Calling Agent
Source: examples/agents/multi_step_code_tool_calling_agent.py
Introduction
ReAct and Toolformer motivate letting a model take external actions as part of its reasoning, while AutoGen highlights how multi-agent and tool ecosystems depend on explicit execution boundaries. This example focuses on code-tool calling, so you can study how executable outputs are requested, validated, and traced in a controlled loop.
Technical Implementation
1. Configure Tracer with JSONL + console output so each run emits machine-readable traces and lifecycle logs.
2. Build the runtime surface (public APIs only) and execute MultiStepAgent.run(...) with a fixed request_id.
3. Configure and invoke Toolbox integrations (core/script/MCP/callable) before assembling the final payload.
4. Print a compact JSON payload including trace_info for deterministic tests and docs examples.
flowchart LR
A["Input prompt or scenario"] --> B["main(): runtime wiring"]
B --> C["MultiStepAgent.run(...)"]
C --> D["WorkflowRuntime loop enforces explicit final-answer and max-step policy"]
C --> E["Tracer JSONL + console events"]
D --> F["ExecutionResult/payload"]
E --> F
F --> G["Printed JSON output"]
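The loop node in the flowchart can be pictured with a small sketch. This is not the WorkflowRuntime implementation; it only illustrates the general idea of a step budget combined with an explicit final answer. The function name run_bounded_loop, the step signature, and the terminated_reason labels are all assumptions made for illustration.

```python
from typing import Any, Callable, Optional

# Hypothetical step signature: return a final answer, or None to keep going.
StepFn = Callable[[int], Optional[Any]]


def run_bounded_loop(step_fn: StepFn, max_steps: int) -> dict:
    """Run steps until one returns a final answer or the budget is spent."""
    for step_index in range(max_steps):
        answer = step_fn(step_index)
        if answer is not None:
            return {
                "success": True,
                "final_output": answer,
                "terminated_reason": "final_answer",  # assumed label
                "steps_used": step_index + 1,
            }
    # No step produced an explicit final answer within the budget.
    return {
        "success": False,
        "final_output": None,
        "terminated_reason": "max_steps_reached",  # assumed label
        "steps_used": max_steps,
    }


# A stand-in step that answers on its second call.
result = run_bounded_loop(
    lambda i: {"word_count": 3} if i == 1 else None,
    max_steps=3,
)
print(result["terminated_reason"], result["steps_used"])  # final_answer 2
```

Under this policy a run never succeeds implicitly: either a step hands back an answer, or the loop stops and reports why.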
from __future__ import annotations

import json
from pathlib import Path

from design_research_agents import LlamaCppServerLLMClient, MultiStepAgent, Toolbox, Tracer

_EXAMPLE_LLAMA_CLIENT_KWARGS = {
    "model": "Qwen_Qwen3-4B-Instruct-2507-Q4_K_M.gguf",
    "hf_model_repo_id": "bartowski/Qwen_Qwen3-4B-Instruct-2507-GGUF",
    "api_model": "qwen3-4b-instruct-2507-q4km",
    "context_window": 8192,
    "startup_timeout_seconds": 240.0,
    "request_timeout_seconds": 240.0,
}


def main() -> None:
    """Execute one multi-step code-mode run and print compact result."""
    # Fixed request id keeps traces and docs output deterministic across runs.
    request_id = "example-multi-step-code-design-001"
    tracer = Tracer(
        enabled=True,
        trace_dir=Path("artifacts/examples/traces"),
        enable_jsonl=True,
        enable_console=True,
    )
    # Run the code-tool example using public runtime surfaces. Using this with statement will automatically shut
    # down the managed client and tool runtime when the example is done.
    with Toolbox() as tool_runtime, LlamaCppServerLLMClient(**_EXAMPLE_LLAMA_CLIENT_KWARGS) as llm_client:
        code_tool_agent = MultiStepAgent(
            mode="code",
            llm_client=llm_client,
            tool_runtime=tool_runtime,
            max_steps=1,
            normalize_generated_code_per_step=True,
            default_tools_per_step=({"tool_name": "text.word_count"},),
            tracer=tracer,
        )
        result = code_tool_agent.run(
            prompt=(
                "Use executable Python only. In one step, call "
                'stats = call_tool("text.word_count", {"text": "design review metrics"}) '
                'and then call final_answer({"word_count": stats["word_count"]}).'
            ),
            request_id=request_id,
        )

    # Print the results
    summary = result.summary()
    print(json.dumps(summary, ensure_ascii=True, indent=2, sort_keys=True))


if __name__ == "__main__":
    main()
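The prompt in the example asks the model to emit Python that calls call_tool and then final_answer. How the library executes that code is not shown here, so the following is only a sketch of one way such a step could be run: the generated source is executed in a namespace that exposes the two helpers, and final_answer ends the step by raising. The FinalAnswer exception, the TOOLS registry, run_generated_code, and the whitespace-based word_count stand-in are all hypothetical, and the sketch does no sandboxing.

```python
from typing import Any, Callable, Dict


class FinalAnswer(Exception):
    """Raised by final_answer() to stop execution and carry the payload."""

    def __init__(self, payload: Any) -> None:
        super().__init__("final answer")
        self.payload = payload


def word_count(args: Dict[str, Any]) -> Dict[str, int]:
    # Stand-in for the "text.word_count" tool: counts whitespace-separated tokens.
    return {"word_count": len(str(args["text"]).split())}


# Hypothetical tool registry keyed by tool name.
TOOLS: Dict[str, Callable[[Dict[str, Any]], Any]] = {"text.word_count": word_count}


def call_tool(name: str, args: Dict[str, Any]) -> Any:
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](args)


def final_answer(payload: Any) -> None:
    raise FinalAnswer(payload)


def run_generated_code(source: str) -> Any:
    """Execute one step of generated code; return the final answer or None."""
    namespace = {"call_tool": call_tool, "final_answer": final_answer}
    try:
        exec(source, namespace)  # illustration only: no sandboxing
    except FinalAnswer as done:
        return done.payload
    return None


generated = (
    'stats = call_tool("text.word_count", {"text": "design review metrics"})\n'
    'final_answer({"word_count": stats["word_count"]})\n'
)
print(run_generated_code(generated))  # {'word_count': 3}
```

Code that never calls final_answer returns None in this sketch, which a surrounding loop could treat as "no explicit answer yet".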
Expected Results
Run Command
PYTHONPATH=src python3 examples/agents/multi_step_code_tool_calling_agent.py
Example output shape (values vary by run):
{
"success": true,
"final_output": "<example-specific payload>",
"terminated_reason": "<string-or-null>",
"error": null,
"trace": {
"request_id": "<request-id>",
"trace_dir": "artifacts/examples/traces",
"trace_path": "artifacts/examples/traces/run_<timestamp>_<request_id>.jsonl"
}
}
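Because the payload has a stable shape, a test can check the keys rather than the values. The sketch below assumes only the key names shown in the output shape above. The helpers missing_keys and read_jsonl are hypothetical, and the event fields written to the temporary file are made up, since the real trace event schema is not documented here.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List

# Key names taken from the example output shape.
EXPECTED_KEYS = {"success", "final_output", "terminated_reason", "error", "trace"}
EXPECTED_TRACE_KEYS = {"request_id", "trace_dir", "trace_path"}


def missing_keys(summary: Dict[str, Any]) -> List[str]:
    """Return sorted names of expected keys absent from the summary."""
    missing = EXPECTED_KEYS - set(summary)
    trace = summary.get("trace")
    if isinstance(trace, dict):
        missing |= {f"trace.{key}" for key in EXPECTED_TRACE_KEYS - set(trace)}
    return sorted(missing)


def read_jsonl(path: Path) -> List[Dict[str, Any]]:
    """Parse one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# A summary that lacks trace_path, to show what the check reports.
summary = {
    "success": True,
    "final_output": {"word_count": 3},
    "terminated_reason": None,
    "error": None,
    "trace": {
        "request_id": "example-multi-step-code-design-001",
        "trace_dir": "artifacts/examples/traces",
    },
}
print(missing_keys(summary))  # ['trace.trace_path']

# Made-up events written to a temporary file, then read back.
with tempfile.TemporaryDirectory() as tmp:
    trace_path = Path(tmp) / "run_demo.jsonl"
    trace_path.write_text(
        '{"event": "run_started"}\n{"event": "run_finished"}\n',
        encoding="utf-8",
    )
    events = read_jsonl(trace_path)
    print(len(events))  # 2
```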