API
This page documents the supported top-level public API from
design_research_agents.__all__.
Guaranteed compatibility applies to this top-level API surface and to the
public facade modules documented in docs/reference under “Guaranteed
Public Modules”.
Underscored module paths (for example design_research_agents._contracts)
are internal and unstable. They are documented in the module reference for
contributors, but they are not compatibility-guaranteed.
Top-level groups:
- Metadata: __version__
- Entry points: agents, LLM clients, ModelSelector
- Core contracts: ExecutionResult, LLMRequest, LLMMessage, LLMResponse, ToolResult, with normalized read helpers for structured payload access
- Orchestration: workflow step classes, Workflow, and pattern classes (module homes: design_research_agents.workflow and design_research_agents.patterns)
- Tools: Toolbox, CallableToolConfig, ScriptToolConfig, MCPServerConfig
- Tracing: Tracer
__version__
- design_research_agents.__version__ = '0.2.0'
Entry Points
Agents
- class design_research_agents.DirectLLMCall(*, llm_client, system_prompt=None, temperature=None, max_tokens=None, provider_options=None, tracer=None)[source]
One-shot direct model call with no tool runtime.
Design choices:
Uses a small Workflow with three LogicSteps (prepare, call, finalize) so the trace mirrors multi-step agents.
Keeps defaults (system prompt, temperature, max_tokens, provider_options) on the agent, but allows per-run overrides via
normalized_input.
Initialize a direct-LLM agent with optional default generation args.
- Parameters:
llm_client – LLM client used for prompt execution.
system_prompt – Optional default system prompt.
temperature – Optional default sampling temperature.
max_tokens – Optional default output-token cap.
provider_options – Optional default backend-specific options.
tracer – Optional explicit tracer dependency.
- Raises:
ValueError – If max token configuration is invalid.
- class design_research_agents.MultiStepAgent(*, mode, llm_client, tool_runtime=None, max_steps=5, stop_on_step_failure=True, controller_system_prompt=None, controller_user_prompt_template=None, continuation_system_prompt=None, continuation_user_prompt_template=None, step_user_prompt_template=None, tool_calling_system_prompt=None, tool_calling_user_prompt_template=None, alternatives_prompt_target='user', continuation_memory_tail_items=6, step_memory_tail_items=8, memory_store=None, memory_namespace='default', memory_read_top_k=4, memory_write_observations=True, max_tool_calls_per_step=5, execution_timeout_seconds=5, validate_tool_input_schema=False, normalize_generated_code_per_step=False, default_tools_per_step=None, allowed_tools=None, tracer=None)[source]
Single multi-step runtime entrypoint for direct/json/code strategies.
Initialize one mode-specific multi-step strategy.
- Parameters:
mode – Required strategy mode (direct, json, or code).
llm_client – LLM client shared by all strategy modes.
tool_runtime – Tool runtime required for json and code modes.
max_steps – Maximum number of multi-step iterations.
stop_on_step_failure – Whether to stop loop execution on failed steps.
controller_system_prompt – Direct-mode controller system prompt override.
controller_user_prompt_template – Direct-mode controller user prompt override.
continuation_system_prompt – Continuation system prompt override.
continuation_user_prompt_template – Continuation user prompt override.
step_user_prompt_template – Step action user prompt override.
tool_calling_system_prompt – Json mode tool-calling system prompt override.
tool_calling_user_prompt_template – Json mode tool-calling user prompt override.
alternatives_prompt_target – Prompt insertion target for alternatives blocks.
continuation_memory_tail_items – Continuation memory tail item count.
step_memory_tail_items – Step memory tail item count.
memory_store – Optional persistent memory dependency.
memory_namespace – Memory namespace for read/write operations.
memory_read_top_k – Memory retrieval top-k.
memory_write_observations – Whether to persist per-step observations.
max_tool_calls_per_step – Code-mode per-step tool call cap.
execution_timeout_seconds – Code-mode sandbox timeout.
validate_tool_input_schema – Code-mode tool input schema validation toggle.
normalize_generated_code_per_step – Code-mode code normalization toggle.
default_tools_per_step – Code-mode default tool allowlist.
allowed_tools – Optional json-mode tool allowlist.
tracer – Optional tracer dependency.
- Raises:
ValueError – Raised when mode/tool configuration is invalid.
- compile(prompt, *, request_id=None, dependencies=None)[source]
Compile one run through the selected strategy mode.
- run(prompt, *, request_id=None, dependencies=None)[source]
Execute one run through the selected strategy mode.
- property workflow
Expose the most recently compiled workflow from the selected strategy.
LLM Clients and Selection
All public LLM clients implement the same introspection helpers in addition to
generation methods: default_model(), capabilities(),
config_snapshot(), server_snapshot(), and describe(). They also
implement close() plus with-statement lifecycle support; the
context-manager form is the preferred public usage pattern.
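The close-plus-context-manager lifecycle shared by these clients can be illustrated with a hypothetical stand-in (FakeClient below is not a library class; it only mirrors the documented pattern):

```python
# Hypothetical stand-in client showing the documented lifecycle pattern:
# close() releases resources, and with-statement use guarantees it runs.
class FakeClient:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # never suppress exceptions

# Preferred usage: the context-manager form closes the client automatically,
# even if the body raises.
with FakeClient() as client:
    assert not client.closed
assert client.closed
```

The same `with` form applies to every public client listed below, which is why it is the preferred usage pattern over calling close() manually.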
- class design_research_agents.LlamaCppServerLLMClient(*, name='llama-local', model='Qwen2.5-1.5B-Instruct-Q4_K_M.gguf', hf_model_repo_id='bartowski/Qwen2.5-1.5B-Instruct-GGUF', api_model='qwen2-1.5b-q4', host='127.0.0.1', port=8001, context_window=4096, startup_timeout_seconds=60.0, request_timeout_seconds=60.0, poll_interval_seconds=0.25, python_executable='/opt/hostedtoolcache/Python/3.12.13/x64/bin/python3', extra_server_args=(), max_retries=2, model_patterns=None)[source]
Client for a managed local llama_cpp.server backend.
Initialize a local llama-cpp client with sensible defaults.
- Parameters:
name – Logical name for this client instance, used in logging and provenance.
model – Local model identifier or path for llama_cpp.server to load.
hf_model_repo_id – Optional Hugging Face repo ID to auto-download the model from if not found locally.
api_model – The model name to report in API responses, which can differ from the local model name.
host – Host interface for the local server to bind to.
port – Port for the local server to listen on.
context_window – Context window size (n_ctx) to configure the llama_cpp.server with.
startup_timeout_seconds – Max time to wait for the server process to start and become healthy.
request_timeout_seconds – HTTP timeout for generate and stream requests.
poll_interval_seconds – Time interval between health check polls during startup.
python_executable – Python executable to use for running the server process.
extra_server_args – Additional command-line arguments to pass when starting the server process.
max_retries – Number of times to retry a request in case of failure before giving up.
model_patterns – Optional tuple of model name patterns supported by this client, used for routing decisions. If None, defaults to (api_model,).
- class design_research_agents.AnthropicServiceLLMClient(*, name='anthropic', default_model='claude-3-5-haiku-latest', api_key_env='ANTHROPIC_API_KEY', api_key=None, base_url=None, max_retries=2, model_patterns=None)[source]
Client for the official Anthropic API backend.
Initialize an Anthropic service client with sensible defaults.
- class design_research_agents.GeminiServiceLLMClient(*, name='gemini', default_model='gemini-2.5-flash', api_key_env='GOOGLE_API_KEY', api_key=None, max_retries=2, model_patterns=None)[source]
Client for the official Gemini API backend.
Initialize a Gemini service client with sensible defaults.
- class design_research_agents.GroqServiceLLMClient(*, name='groq', default_model='llama-3.1-8b-instant', api_key_env='GROQ_API_KEY', api_key=None, base_url=None, max_retries=2, model_patterns=None)[source]
Client for the official Groq API backend.
Initialize a Groq service client with sensible defaults.
- class design_research_agents.OpenAIServiceLLMClient(*, name='openai', default_model='gpt-4o-mini', api_key_env='OPENAI_API_KEY', api_key=None, base_url=None, max_retries=2, model_patterns=None)[source]
Client for the official OpenAI API backend.
Initialize an OpenAI service client with sensible defaults.
- class design_research_agents.AzureOpenAIServiceLLMClient(*, name='azure-openai', default_model='gpt-4o-mini', api_key_env='AZURE_OPENAI_API_KEY', api_key=None, azure_endpoint_env='AZURE_OPENAI_ENDPOINT', azure_endpoint=None, api_version_env='AZURE_OPENAI_API_VERSION', api_version=None, max_retries=2, model_patterns=None)[source]
Client for the Azure OpenAI API via the official OpenAI SDK.
Initialize an Azure OpenAI service client with sensible defaults.
- class design_research_agents.OpenAICompatibleHTTPLLMClient(*, name='openai-compatible', base_url='http://127.0.0.1:8001/v1', default_model='qwen2-1.5b-q4', api_key_env='OPENAI_API_KEY', api_key=None, max_retries=2, model_patterns=None)[source]
Client for OpenAI-compatible HTTP endpoints.
Initialize an OpenAI-compatible HTTP client with sensible defaults.
- class design_research_agents.TransformersLocalLLMClient(*, name='transformers-local', model_id='distilgpt2', default_model='distilgpt2', device='auto', dtype='auto', quantization='none', trust_remote_code=False, revision=None, max_retries=2, model_patterns=None)[source]
Client for in-process Transformers local inference.
Initialize a local Transformers client with sensible defaults.
- Parameters:
name – Logical name for this client instance, used in logging and provenance.
model_id – Identifier for the model to load (e.g. “distilgpt2” or a Hugging Face repo ID like “gpt2”).
default_model – Default model name for prompts that don’t specify one.
device – Device to load the model on (e.g. “cpu”, “cuda”, “mps”, or “auto” to automatically select based on availability).
dtype – Data type to use for model weights (e.g. “float16”, “bfloat16”, “int8”, or “auto” to automatically select based on device).
quantization – Quantization level to use when loading the model (e.g. “4-bit”, “8-bit”, “fp16”, or “none” for no quantization).
trust_remote_code – Whether to allow execution of custom code from remote repositories when loading models, which may be required for some models but can be a security risk.
revision – Optional model revision to load (e.g. a git branch, tag, or commit hash), if the model is being loaded from a Hugging Face repository that has multiple revisions.
max_retries – Number of times to retry a request in case of failure before giving up.
model_patterns – Optional tuple of model name patterns supported by this client, used for routing decisions. If None, defaults to (default_model,).
- class design_research_agents.MLXLocalLLMClient(*, name='mlx-local', model_id='mlx-community/Qwen2.5-1.5B-Instruct-4bit', default_model='mlx-community/Qwen2.5-1.5B-Instruct-4bit', quantization='none', max_retries=2, model_patterns=None)[source]
Client for Apple MLX local inference.
Initialize an MLX local client with sensible defaults.
- Parameters:
name – Logical name for this client instance, used in logging and provenance.
model_id – Identifier for the MLX model to load (e.g. “mlx-community/Qwen2.5-1.5B-Instruct-4bit”).
default_model – Default model name for prompts that don’t specify one.
quantization – Quantization level to use when loading the model (e.g. “4-bit”, “8-bit”, “fp16”).
max_retries – Number of times to retry a request in case of failure before giving up.
model_patterns – Optional tuple of model name patterns supported by this client, used for routing decisions. If None, defaults to (default_model,).
- class design_research_agents.VLLMServerLLMClient(*, name='vllm-local', model='Qwen/Qwen2.5-1.5B-Instruct', api_model='qwen2.5-1.5b-instruct', host='127.0.0.1', port=8002, manage_server=True, startup_timeout_seconds=90.0, poll_interval_seconds=0.5, python_executable='/opt/hostedtoolcache/Python/3.12.13/x64/bin/python3', extra_server_args=(), base_url=None, request_timeout_seconds=60.0, max_retries=2, model_patterns=None)[source]
Client for local or self-hosted vLLM OpenAI-compatible inference.
Initialize a vLLM client in managed-server or connect mode.
- Parameters:
name – Logical name for this client instance.
model – Model identifier passed to managed vLLM server startup.
api_model – Model alias exposed by vLLM OpenAI-compatible API.
host – Host interface used in managed mode.
port – TCP port used in managed mode.
manage_server – Whether this client manages the vLLM server lifecycle.
startup_timeout_seconds – Maximum startup wait time in managed mode.
poll_interval_seconds – Delay between readiness probes in managed mode.
python_executable – Python executable used to launch managed vLLM process.
extra_server_args – Additional CLI flags forwarded to vLLM server.
base_url – Optional connect-mode endpoint URL. Required only for remote/self-managed deployments; defaults to http://{host}:{port}/v1.
request_timeout_seconds – HTTP timeout for generate and stream requests.
max_retries – Number of retries for retryable provider/transport errors.
model_patterns – Optional tuple of model patterns for routing decisions.
- Raises:
ValueError – If manage_server and base_url are both configured.
- class design_research_agents.OllamaLLMClient(*, name='ollama-local', default_model='qwen2.5:1.5b-instruct', host='127.0.0.1', port=11434, manage_server=True, ollama_executable='ollama', auto_pull_model=False, startup_timeout_seconds=60.0, poll_interval_seconds=0.25, request_timeout_seconds=60.0, max_retries=2, model_patterns=None)[source]
Client for local or self-hosted Ollama chat inference.
Initialize an Ollama client in managed-server or connect mode.
- Parameters:
name – Logical name for this client instance.
default_model – Default model id used when requests omit model.
host – Host interface used in managed mode or connect mode.
port – TCP port used in managed mode or connect mode.
manage_server – Whether this client manages the ollama serve lifecycle.
ollama_executable – Executable used to invoke ollama commands.
auto_pull_model – Whether to pull default_model after startup.
startup_timeout_seconds – Maximum startup wait time in managed mode.
poll_interval_seconds – Delay between readiness probes in managed mode.
request_timeout_seconds – HTTP timeout for generate and stream requests.
max_retries – Number of retries for retryable provider/transport errors.
model_patterns – Optional tuple of model patterns for routing decisions.
- class design_research_agents.SGLangServerLLMClient(*, name='sglang-local', model='Qwen/Qwen2.5-1.5B-Instruct', host='127.0.0.1', port=30000, manage_server=True, startup_timeout_seconds=90.0, poll_interval_seconds=0.5, python_executable='/opt/hostedtoolcache/Python/3.12.13/x64/bin/python3', extra_server_args=(), base_url=None, request_timeout_seconds=60.0, max_retries=2, model_patterns=None)[source]
Client for local or self-hosted SGLang OpenAI-compatible inference.
Initialize an SGLang client in managed-server or connect mode.
- Parameters:
name – Logical name for this client instance.
model – Model identifier passed to managed SGLang server startup.
host – Host interface used in managed mode.
port – TCP port used in managed mode.
manage_server – Whether this client manages the SGLang server lifecycle.
startup_timeout_seconds – Maximum startup wait time in managed mode.
poll_interval_seconds – Delay between readiness probes in managed mode.
python_executable – Python executable used to launch managed SGLang process.
extra_server_args – Additional CLI flags forwarded to SGLang server.
base_url – Optional connect-mode endpoint URL. Required only for remote/self-managed deployments; defaults to http://{host}:{port}/v1.
request_timeout_seconds – HTTP timeout for generate and stream requests.
max_retries – Number of retries for retryable provider/transport errors.
model_patterns – Optional tuple of model patterns for routing decisions.
- Raises:
ValueError – If manage_server and base_url are both configured.
- class design_research_agents.ModelSelector(*, catalog=None, prefer_local=True, ram_reserve_gb=2.0, vram_reserve_gb=0.5, max_load_ratio=0.85, remote_cost_floor_usd=0.02, default_max_latency_ms=None, local_client_resolver=None)[source]
Flat model selection interface with client/config resolution helpers.
Initialize model selector policy controls and optional resolver hook.
- Parameters:
catalog – Optional model catalog to use for selection.
prefer_local – Whether to prefer local models over remote ones when all else is equal.
ram_reserve_gb – Amount of RAM (in GB) to reserve when evaluating local candidates.
vram_reserve_gb – Amount of GPU VRAM (in GB) to reserve when evaluating local candidates.
max_load_ratio – Maximum system load ratio to consider a local candidate viable (0.0 to 1.0).
remote_cost_floor_usd – Minimum cost threshold (in USD) for remote models to be considered viable.
default_max_latency_ms – Default maximum latency (in milliseconds) to consider when evaluating candidates, if not specified in selection constraints.
local_client_resolver – Optional callable that takes a ModelSelectionDecision and returns a dict with ‘client_class’ and ‘kwargs’ for constructing a local client when the provider is not recognized by the built-in resolver. This allows for custom local providers to be integrated without modifying the ModelSelector code.
- select(*, task, priority='balanced', require_local=False, preferred_provider=None, max_cost_usd=None, max_latency_ms=None, hardware_profile=None, output='client')[source]
Select a model and return a decision, config mapping, or live client.
- Parameters:
task – Description of the task or use case for which a model is being selected.
priority – Selection priority, which may influence the trade-off between quality, latency, and cost in the decision process.
require_local – If True, only consider local models as viable candidates.
preferred_provider – Optional provider name to prioritize in the selection process.
max_cost_usd – Optional maximum cost threshold (in USD) for candidate models.
max_latency_ms – Optional maximum latency threshold (in milliseconds) for candidate models.
hardware_profile – Optional mapping or HardwareProfile instance describing the current hardware state, which may be used to evaluate local candidates.
output – Determines the format of the selection result. “client” returns an instantiated LLMClient ready for use, “decision” returns the raw ModelSelectionDecision object with details of the selection rationale, and “client_config” returns a dict containing the information needed to construct an LLMClient (including ‘client_class’ and ‘kwargs’) without actually instantiating it.
- Returns:
Depending on the ‘output’ parameter –
If output is “client”: An instantiated LLMClient configured according to the selection decision, ready for use in making requests.
If output is “decision”: A ModelSelectionDecision object containing details about the selected model, provider, rationale, and policy information.
If output is “client_config”: A dict containing the resolved client configuration, including ‘client_class’, ‘kwargs’, and metadata from the selection decision, which can be used to instantiate an LLMClient at a later time or in a different context.
- Raises:
ValueError – If output is unsupported or selection/config coercion fails.
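The “client_config” output mode exists so instantiation can be deferred: per the return description above, the mapping carries ‘client_class’ and ‘kwargs’. A hypothetical sketch of that deferred-construction pattern (FakeClient stands in for whichever real client class the selector resolves):

```python
# Hypothetical sketch of deferred construction from a "client_config" result.
# FakeClient is a stand-in, not a library class.
class FakeClient:
    def __init__(self, **kwargs):
        self.kwargs = kwargs

# Shape documented for output="client_config": class plus constructor kwargs.
config = {"client_class": FakeClient, "kwargs": {"name": "demo", "max_retries": 2}}

# Instantiate later, possibly in a different process or context.
client = config["client_class"](**config["kwargs"])
assert client.kwargs["name"] == "demo"
```

This is why “client_config” is useful when the selection decision is made in one place (e.g. a planner) and the client is constructed in another.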
Core Contracts
ExecutionResult and per-step WorkflowStepResult objects expose matching
output access helpers for safe reads from loosely structured payloads. The
public ToolResult contract also includes normalized getters such as
result_dict(), result_list(), error_message, and artifact_paths.
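The normalization contract shared by these helpers (documented under output_dict, output_list, result_dict, and result_list below) can be sketched in isolation. The functions here are illustrations of the documented behavior, not the library implementation:

```python
# Illustrative re-statement of the documented normalization contract:
# mapping-like values normalize to a dict, list/tuple values to a list,
# and anything else to the empty container. (The library's notion of
# "mapping-like" may be broader than plain dict; this sketch uses dict.)
def normalize_dict(value):
    return dict(value) if isinstance(value, dict) else {}

def normalize_list(value):
    return list(value) if isinstance(value, (list, tuple)) else []

output = {"report": {"title": "demo"}, "steps": ("plan", "act"), "note": "text"}
assert normalize_dict(output.get("report")) == {"title": "demo"}
assert normalize_list(output.get("steps")) == ["plan", "act"]
assert normalize_dict(output.get("note")) == {}    # non-mapping -> {}
assert normalize_list(output.get("missing")) == []  # absent key -> []
```

The point of the contract is that callers never need isinstance checks or try/except around loosely structured payloads: the helpers always return a container of the expected type.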
- class design_research_agents.ExecutionResult(*, success, output=<factory>, tool_results=<factory>, model_response=None, step_results=<factory>, execution_order=<factory>, metadata=<factory>)[source]
Structured output produced by one execution entrypoint.
This shape intentionally covers both agent-like executions and workflow-like executions so callers can consume one result contract everywhere.
- property error
Return terminal error payload when present.
- Returns:
Error payload from output mapping, or None.
- execution_order
Step ids in the order they were executed for workflow-style runs.
- property final_output
Return workflow/agent final_output payload when present.
- Returns:
Final output value from output payload, or None.
- metadata
Additional diagnostics, runtime counters, and trace metadata.
- model_response
Final model response associated with the run, when available.
- output
Primary payload produced by the entrypoint.
- output_dict(key)[source]
Return one output value normalized to a dictionary.
- Parameters:
key – Output key to read.
- Returns:
Dictionary value when the output value is mapping-like, else {}.
- output_list(key)[source]
Return one output value normalized to a list.
- Parameters:
key – Output key to read.
- Returns:
List value when the output value is a list/tuple, else [].
- output_value(key, default=None)[source]
Return one output value by key with optional default.
- Parameters:
key – Output key to read.
default – Value returned when key is absent.
- Returns:
Output value for key when present, else default.
- step_results
Per-step results keyed by step id for workflow-style runs.
- success
True when the overall run completed without terminal failure.
- summary()[source]
Return one compact summary payload for user-facing output.
- Returns:
Compact summary payload with canonical execution fields.
- property terminated_reason
Return normalized termination reason when present.
- Returns:
Termination reason string, or
None.
- to_dict()[source]
Return a JSON-serializable dictionary representation of the result.
- Returns:
Dictionary representation of the result payload.
- to_json(*, ensure_ascii=True, indent=2, sort_keys=True)[source]
Return JSON string for deterministic pretty-printing.
- Parameters:
ensure_ascii – Forwarded to json.dumps.
indent – Forwarded to json.dumps.
sort_keys – Forwarded to json.dumps.
- Returns:
JSON representation of this result.
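Because these flags are forwarded to json.dumps, the determinism comes straight from the standard library. A self-contained sketch of the equivalent call (the payload shape is illustrative, not a real result):

```python
import json

# Equivalent standard-library call: sort_keys=True fixes key order, so the
# same payload always serializes to the same string.
payload = {"success": True, "output": {"final_output": "done"}}
text = json.dumps(payload, ensure_ascii=True, indent=2, sort_keys=True)

# Deterministic: repeated serialization yields an identical string.
assert text == json.dumps(payload, ensure_ascii=True, indent=2, sort_keys=True)
# Keys are emitted in sorted order ("output" before "success").
assert text.splitlines()[1].lstrip().startswith('"output"')
```

Sorted keys make the output stable across runs and Python versions, which is what makes to_json suitable for snapshot tests and diff-friendly logs.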
- tool_results
Tool invocation results captured during execution, in call order.
- class design_research_agents.LLMRequest(*, messages, model=None, temperature=None, max_tokens=None, tools=(), response_schema=None, response_format=None, metadata=<factory>, provider_options=<factory>, task_profile=None)[source]
Provider-neutral request payload for LLM generation.
- max_tokens
Maximum output token limit.
- messages
Ordered conversation/messages sent to the model.
- metadata
Caller metadata forwarded for tracing and diagnostics.
- model
Explicit model identifier override for this request.
- provider_options
Backend/provider-specific low-level options.
- response_format
Provider-specific response-format hints.
- response_schema
Optional schema for structured output validation.
- task_profile
Optional routing profile used by selector-aware clients.
- temperature
Sampling temperature override.
- tools
Tool specifications exposed for model tool-calling.
- class design_research_agents.LLMMessage(*, role, content, name=None, tool_call_id=None, tool_name=None)[source]
One chat message in the provider-neutral completion format.
- content
Plain-text message content.
- name
Optional participant name, when supported by the provider.
- role
Message role used by chat-compatible backends.
- tool_call_id
Tool call identifier for tool-response messages.
- tool_name
Tool name associated with a tool-response message.
- class design_research_agents.LLMResponse(*, text, model=None, provider=None, finish_reason=None, usage=None, latency_ms=None, raw_output=None, tool_calls=(), raw=None, provenance=None)[source]
Normalized non-streaming response payload returned by a backend.
- finish_reason
Provider-specific completion reason.
- latency_ms
End-to-end latency in milliseconds.
- model
Model identifier reported by the backend.
- provenance
Execution provenance metadata for auditability.
- provider
Provider/backend name that produced this response.
- raw
Canonical raw backend payload snapshot.
- raw_output
Legacy/raw backend payload for debugging.
- text
Primary response text emitted by the model.
- tool_calls
Tool calls requested by the model in this response.
- usage
Token usage counters when available.
- class design_research_agents.ToolResult(*, tool_name, ok, result=None, artifacts=(), warnings=(), error=None, metadata=None)[source]
Result payload emitted from a tool runtime invocation.
Initialize canonical tool result payload.
- Parameters:
tool_name – Name of the invoked tool.
ok – Invocation success flag.
result – Primary result payload (defaults to empty mapping).
artifacts – Raw or typed artifact entries to normalize.
warnings – Warning messages to attach to the result.
error – Error payload to normalize into ToolError.
metadata – Optional diagnostic metadata mapping.
- property artifact_paths
Return artifact paths in emitted order.
- Returns:
Tuple of artifact path strings.
- artifacts
Artifact list emitted by the invocation.
- error
Structured error details when ok is false.
- property error_message
Return the normalized tool error message when present.
- Returns:
Error message string, or
None.
- metadata
Supplemental runtime metadata for diagnostics and tracing.
- ok
True when invocation succeeded.
- result
Primary tool return payload.
- result_dict()[source]
Return the primary result payload normalized to a dictionary.
- Returns:
Dictionary value when result is mapping-like, else {}.
- result_list()[source]
Return the primary result payload normalized to a list.
- Returns:
List value when result is a list/tuple, else [].
- tool_name
Name of the invoked tool.
- warnings
Non-fatal warnings produced during invocation.
Orchestration
Workflow Steps and Facade
CompiledExecution is the workflow-backed object returned by delegate
compile(...) methods. Calling compiled.run() executes the bound
workflow and applies delegate-specific finalization. Accessing
compiled.workflow gives the raw workflow graph for inspection and testing.
Calling compiled.workflow.run(...) directly bypasses that finalization
layer and returns the raw workflow result.
Workflow step executions surface WorkflowStepResult payloads through
ExecutionResult.step_results. These step results mirror the top-level
ExecutionResult output accessor helpers for consistent reads.
- class design_research_agents.LogicStep(*, step_id, handler, dependencies=(), route_map=None, artifacts_builder=None)[source]
Workflow step that executes deterministic local logic.
- artifacts_builder
Optional callback that extracts user-facing artifact manifests from step context.
- dependencies
Step ids that must complete before this step can run.
- handler
Deterministic local function that computes this step output.
- route_map
Optional route key to downstream-target mapping for conditional activation.
- step_id
Unique step identifier used for dependency wiring and result lookup.
- class design_research_agents.ToolStep(*, step_id, tool_name, dependencies=(), input_data=None, input_builder=None, artifacts_builder=None)[source]
Workflow step that invokes one runtime tool.
- artifacts_builder
Optional callback that extracts user-facing artifact manifests from step context.
- dependencies
Step ids that must complete before this step can run.
- input_builder
Optional callback that derives input payload from runtime step context.
- input_data
Static input payload used when input_builder is not provided.
- step_id
Unique step identifier used for dependency wiring and result lookup.
- tool_name
Registered tool name to invoke through the tool runtime.
- class design_research_agents.DelegateStep(*, step_id, delegate, dependencies=(), prompt=None, prompt_builder=None, artifacts_builder=None)[source]
Workflow step that invokes one direct delegate.
- artifacts_builder
Optional callback that extracts user-facing artifact manifests from step context.
- delegate
Direct delegate object (agent, pattern, or workflow-like runner).
- dependencies
Step ids that must complete before this step can run.
- prompt
Static prompt passed to the delegate when prompt_builder is absent.
- prompt_builder
Optional callback that derives a prompt string from runtime step context.
- step_id
Unique step identifier used for dependency wiring and result lookup.
- class design_research_agents.ModelStep(*, step_id, llm_client, request_builder, dependencies=(), response_parser=None, artifacts_builder=None)[source]
Workflow step that executes one model request through an LLM client.
- artifacts_builder
Optional callback that extracts user-facing artifact manifests from step context.
- dependencies
Step ids that must complete before this step can run.
- llm_client
LLM client used to execute the request built for this step.
- request_builder
Callback that builds the LLMRequest payload from runtime context.
- response_parser
Optional callback that parses model response into structured output.
- step_id
Unique step identifier used for dependency wiring and result lookup.
- class design_research_agents.DelegateBatchStep(*, step_id, calls_builder, dependencies=(), fail_fast=True, artifacts_builder=None)[source]
Workflow step that executes multiple delegate invocations in sequence.
- artifacts_builder
Optional callback that extracts user-facing artifact manifests from step context.
- calls_builder
Callback that builds batch delegate call specs from runtime context.
- dependencies
Step ids that must complete before this step can run.
- fail_fast
Whether to stop executing additional calls after first failure.
- step_id
Unique step identifier used for dependency wiring and result lookup.
- class design_research_agents.LoopStep(*, step_id, steps, dependencies=(), max_iterations=1, initial_state=None, continue_predicate=None, state_reducer=None, execution_mode='sequential', failure_policy='skip_dependents', artifacts_builder=None)[source]
Workflow step that executes an iterative nested workflow body.
- artifacts_builder
Optional callback that extracts user-facing artifact manifests from step context.
- continue_predicate
Predicate deciding whether to execute the next iteration.
- dependencies
Step ids that must complete before loop iteration begins.
- execution_mode
Execution mode used for nested loop-body workflow runs.
- failure_policy
Failure handling policy applied within each loop iteration run.
- initial_state
Initial loop state mapping provided to iteration context.
- max_iterations
Hard cap on the number of loop iterations.
- state_reducer
Reducer that computes next loop state from prior state and iteration result.
- step_id
Unique step identifier used for dependency wiring and result lookup.
- steps
Static loop body steps executed for each iteration.
- class design_research_agents.MemoryReadStep(*, step_id, query_builder, dependencies=(), namespace='default', top_k=5, min_score=None, artifacts_builder=None)[source]
Workflow step that reads relevant records from the memory store.
- artifacts_builder
Optional callback that extracts user-facing artifact manifests from step context.
- dependencies
Step ids that must complete before this step can run.
- min_score
Optional minimum score threshold for returned records.
- namespace
Namespace partition to read from.
- query_builder
Callback that builds query text or query payload from step context.
- step_id
Unique step identifier used for dependency wiring and result lookup.
- top_k
Maximum number of records to return.
- class design_research_agents.MemoryWriteStep(*, step_id, records_builder, dependencies=(), namespace='default', artifacts_builder=None)[source]
Workflow step that writes records into the memory store.
- artifacts_builder
Optional callback that extracts user-facing artifact manifests from step context.
- dependencies
Step ids that must complete before this step can run.
- namespace
Namespace partition to write into.
- records_builder
Callback that builds record payloads from step context.
- step_id
Unique step identifier used for dependency wiring and result lookup.
- class design_research_agents.Workflow(*, tool_runtime=None, memory_store=None, steps, input_schema=None, output_schema=None, prompt_context_key='prompt', base_context=None, default_execution_mode='sequential', default_failure_policy='skip_dependents', default_request_id_prefix=None, default_dependencies=None, tracer=None)[source]
Configured workflow for user-defined step graphs and run defaults.
Store runtime dependencies, step graph, and input handling mode.
- Parameters:
tool_runtime – Tool runtime used by ToolStep executions.
memory_store – Optional memory store used by memory step executions.
steps – Static workflow step graph to execute for each run.
input_schema – Optional schema used to infer input mode and validate mapped input. When omitted, workflow expects prompt-string input.
output_schema – Optional schema enforced against output.final_output when the run succeeds.
prompt_context_key – Context key used to store normalized prompt input.
base_context – Base context merged into every run context.
default_execution_mode – Default runtime step scheduling mode.
default_failure_policy – Default dependency failure handling policy.
default_request_id_prefix – Optional prefix used to derive request ids.
default_dependencies – Default dependency objects injected into each run.
tracer – Optional tracer used for workflow runtime events.
- Raises:
ValueError – If constructor inputs are inconsistent.
- run(input=None, *, execution_mode=None, failure_policy=None, request_id=None, dependencies=None)[source]
Execute one workflow run with input mode inferred from input_schema.
- Parameters:
input – Prompt string when input_schema is omitted; otherwise a schema mapping.
execution_mode – Optional per-run execution mode override.
failure_policy – Optional per-run failure policy override.
request_id – Optional explicit request id for tracing/correlation.
dependencies – Optional per-run dependency overrides.
- Returns:
Aggregated workflow execution result.
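The documented input modes (prompt string when input_schema is omitted, schema mapping otherwise) can be sketched as a normalization function. This is an assumption-level illustration: normalize_input is hypothetical, and the schema is treated as a JSON-Schema-like mapping with a "required" key:

```python
def normalize_input(input, *, input_schema=None, prompt_context_key="prompt"):
    """Mirror the documented input modes: prompt-string mode when no
    schema is configured, mapping mode otherwise."""
    if input_schema is None:
        if not isinstance(input, str):
            raise ValueError("prompt-string input expected when input_schema is omitted")
        return {prompt_context_key: input}
    if not isinstance(input, dict):
        raise ValueError("mapping input expected when input_schema is configured")
    missing = [k for k in input_schema.get("required", []) if k not in input]
    if missing:
        raise ValueError(f"missing required input keys: {missing}")
    return dict(input)
```

In prompt mode the string lands in the run context under prompt_context_key; in mapping mode the validated mapping is passed through.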
- class design_research_agents.CompiledExecution(*, workflow, input, request_id, dependencies, delegate_name, finalize=<function _identity_result>, execution_mode='sequential', failure_policy='skip_dependents', tracer=None, trace_input=<factory>, workflow_request_id=None)[source]
Bound compiled delegate execution that can be run repeatedly.
- delegate_name
Delegate name used for top-level trace metadata.
- dependencies
Bound dependency payload mapping.
- execution_mode
Workflow execution mode used by run().
- failure_policy
Workflow failure policy used by run().
- finalize
Finalizer that maps the raw workflow result into the delegate result.
- input
Bound workflow input payload.
- request_id
Top-level request identifier for delegate tracing.
- trace_input
Input payload attached to the top-level trace scope.
- tracer
Optional tracer used for top-level compile-run traces.
- workflow
Workflow graph compiled for this execution.
- workflow_request_id
Optional nested workflow request id override.
Patterns
Pattern compile(...) methods are the lower-level construction hook for
advanced callers. They return a bound CompiledExecution instead of running
immediately; execute the bound workflow by calling compiled.run().
- class design_research_agents.TwoSpeakerConversationPattern(*, llm_client_a, llm_client_b=None, speaker_a_delegate=None, speaker_b_delegate=None, max_turns=3, speaker_a_name='speaker_a', speaker_b_name='speaker_b', speaker_a_system_prompt=None, speaker_a_user_prompt_template=None, speaker_b_system_prompt=None, speaker_b_user_prompt_template=None, default_request_id_prefix=None, default_dependencies=None, tracer=None)[source]
Two-speaker LLM conversation pattern with per-speaker prompts and clients.
Store dependencies and prompt defaults for conversation orchestration.
- Parameters:
llm_client_a – LLM client used by speaker A.
llm_client_b – Optional LLM client used by speaker B. Defaults to llm_client_a when omitted.
speaker_a_delegate – Optional explicit delegate for speaker A.
speaker_b_delegate – Optional explicit delegate for speaker B.
max_turns – Maximum conversation turns where each turn is A->B.
speaker_a_name – Display name for speaker A in transcript and prompts.
speaker_b_name – Display name for speaker B in transcript and prompts.
speaker_a_system_prompt – Optional override for speaker A system prompt.
speaker_a_user_prompt_template – Optional speaker A user template override.
speaker_b_system_prompt – Optional override for speaker B system prompt.
speaker_b_user_prompt_template – Optional speaker B user template override.
default_request_id_prefix – Optional request-id prefix used for auto-generated ids.
default_dependencies – Default dependency mapping merged into each run.
tracer – Optional tracer used for pattern and nested agent traces.
- Raises:
ValueError – Raised when constructor configuration is invalid.
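The turn structure (max_turns, each turn one A->B exchange) can be illustrated with plain callables standing in for the two speaker delegates. This is a sketch of the documented conversation shape, not the pattern's internals:

```python
def run_conversation(speak_a, speak_b, opening, *, max_turns=3,
                     speaker_a_name="speaker_a", speaker_b_name="speaker_b"):
    """Each turn is one A->B exchange appended to a shared transcript."""
    transcript = []
    last = opening
    for _ in range(max_turns):
        a_msg = speak_a(last, transcript)
        transcript.append((speaker_a_name, a_msg))
        b_msg = speak_b(a_msg, transcript)
        transcript.append((speaker_b_name, b_msg))
        last = b_msg
    return transcript

# Stub speakers that just echo what they received.
log = run_conversation(
    lambda prompt, transcript: f"A hears: {prompt}",
    lambda prompt, transcript: f"B hears: {prompt}",
    "hello",
    max_turns=2,
)
```

Two turns yield four transcript entries, alternating under the configured speaker names.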
- class design_research_agents.DebatePattern(*, llm_client, tool_runtime, affirmative_delegate=None, negative_delegate=None, judge_delegate=None, max_rounds=3, affirmative_system_prompt=None, affirmative_user_prompt_template=None, negative_system_prompt=None, negative_user_prompt_template=None, judge_system_prompt=None, judge_user_prompt_template=None, default_request_id_prefix='debate', default_dependencies=None, tracer=None)[source]
Configured reusable debate pattern with affirmative, negative, and judge phases.
Store dependencies and initialize prompt defaults.
- class design_research_agents.PlanExecutePattern(*, llm_client, tool_runtime, planner_delegate=None, executor_delegate=None, max_iterations=3, max_tool_calls_per_step=5, planner_system_prompt=None, planner_user_prompt_template=None, executor_step_prompt_template=None, default_request_id_prefix=None, default_dependencies=None, tracer=None)[source]
Planner/executor orchestration pattern built on workflow primitives.
Store dependencies and initialize workflow-native orchestration settings.
- Parameters:
llm_client – LLM client used for planner and executor model calls.
tool_runtime – Tool runtime used by executor agent steps.
planner_delegate – Optional planner delegate override.
executor_delegate – Optional executor delegate override.
max_iterations – Maximum number of plan steps executed in one run.
max_tool_calls_per_step – Maximum tool calls allowed per executor step.
planner_system_prompt – Optional override for planner system prompt.
planner_user_prompt_template – Optional override for planner user prompt.
executor_step_prompt_template – Optional override for executor step prompt.
default_request_id_prefix – Optional prefix used to derive request ids.
default_dependencies – Dependency defaults merged into each run.
tracer – Optional tracer used for run-level instrumentation.
- Raises:
ValueError – If max_iterations or max_tool_calls_per_step is invalid.
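The two budgets can be sketched together: max_iterations caps how many plan steps run, and max_tool_calls_per_step caps the tool budget handed to each step. The helpers below (plan_and_execute, the budgeted wrapper) are hypothetical illustrations of those documented limits, not the pattern's implementation:

```python
def plan_and_execute(plan, execute_step, call_tool, *,
                     max_iterations=3, max_tool_calls_per_step=5):
    """Run at most max_iterations plan steps; each step may spend at
    most max_tool_calls_per_step tool calls via the budgeted wrapper."""
    results = []
    for step in plan[:max_iterations]:
        calls = 0

        def budgeted(name, payload):
            nonlocal calls
            if calls >= max_tool_calls_per_step:
                raise RuntimeError("tool-call budget exhausted for this step")
            calls += 1
            return call_tool(name, payload)

        results.append(execute_step(step, budgeted))
    return results

# Only the first max_iterations plan steps run.
out = plan_and_execute(
    ["outline", "draft", "review", "publish"],
    lambda step, tool: tool("echo", {"step": step}),
    lambda name, payload: payload["step"].upper(),
    max_iterations=3,
)
```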
- class design_research_agents.ProposeCriticPattern(*, llm_client, tool_runtime, proposer_delegate=None, critic_delegate=None, max_iterations=3, proposer_system_prompt=None, proposer_user_prompt_template=None, critic_system_prompt=None, critic_user_prompt_template=None, default_request_id_prefix=None, default_dependencies=None, tracer=None)[source]
Propose/critique revision pattern built on workflow primitives.
Store dependencies and initialize workflow-native orchestration settings.
- Parameters:
llm_client – LLM client used by proposer and critic calls.
tool_runtime – Tool runtime used by loop execution runtime.
proposer_delegate – Optional proposer delegate override.
critic_delegate – Optional critic delegate override.
max_iterations – Maximum propose/critic iterations per run.
proposer_system_prompt – Optional override for proposer system prompt.
proposer_user_prompt_template – Optional proposer user prompt template.
critic_system_prompt – Optional override for critic system prompt.
critic_user_prompt_template – Optional critic user prompt template.
default_request_id_prefix – Optional prefix used to derive request ids.
default_dependencies – Dependency defaults merged into each run.
tracer – Optional tracer used for run-level instrumentation.
- Raises:
ValueError – If max_iterations is invalid.
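The propose/critique revision cycle can be sketched with two plain callables in place of the proposer and critic delegates (an illustration of the documented iteration shape; propose_and_revise is hypothetical):

```python
def propose_and_revise(propose, critique, task, *, max_iterations=3):
    """Alternate proposal and critique until the critic accepts or the
    iteration budget is spent; the last draft is returned either way."""
    draft, feedback = None, None
    for _ in range(max_iterations):
        draft = propose(task, draft, feedback)
        accepted, feedback = critique(task, draft)
        if accepted:
            break
    return draft

# Toy proposer grows the draft; toy critic accepts once it is long enough.
result = propose_and_revise(
    lambda task, prev, fb: (prev or "") + "x",
    lambda task, draft: (len(draft) >= 2, "longer please"),
    "write",
    max_iterations=5,
)
```

The critic's feedback feeds the next proposal, which is why both the prior draft and the feedback are passed back in.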
- class design_research_agents.RouterDelegatePattern(*, llm_client, tool_runtime, alternatives, alternative_descriptions=None, router_system_prompt=None, router_user_prompt_template=None, default_request_id_prefix=None, default_dependencies=None, tracer=None)[source]
Routing/delegation pattern built on workflow primitives.
Store dependencies and initialize workflow-native routing settings.
- Parameters:
llm_client – LLM client used by the router agent.
tool_runtime – Tool runtime used for cost and metadata accounting of delegated calls.
alternatives – Mapping of route keys to delegate objects.
alternative_descriptions – Optional descriptions used to guide routing.
router_system_prompt – Optional override for router system prompt.
router_user_prompt_template – Optional override for router user prompt.
default_request_id_prefix – Optional prefix used to derive request ids.
default_dependencies – Dependency defaults merged into each run.
tracer – Optional tracer used for run-level instrumentation.
- Raises:
ValueError – If no valid route alternatives are supplied.
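Routing reduces to choosing one key from alternatives and dispatching to the delegate registered under it. In the real pattern the router LLM picks the key; the route_and_run helper below is a hypothetical sketch of the dispatch step only:

```python
def route_and_run(route_key, alternatives, payload):
    """Dispatch to the delegate registered under route_key."""
    if route_key not in alternatives:
        raise ValueError(f"unknown route: {route_key!r}")
    return alternatives[route_key](payload)

alternatives = {
    "summarize": lambda p: f"summary of {p}",
    "translate": lambda p: f"translation of {p}",
}
routed = route_and_run("summarize", alternatives, "doc")
```

The alternative_descriptions mapping exists to make that key choice easier for the router prompt; it plays no part in dispatch itself.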
- class design_research_agents.RoundBasedCoordinationPattern(*, peers, max_rounds=4, initial_state=None, peer_prompt_builder=None, tracer=None)[source]
Round-based peer coordination pattern with deterministic peer ordering.
Initialize peer-only networked orchestration.
- Parameters:
peers – Mapping of peer ids to delegate objects.
max_rounds – Maximum number of coordination rounds.
initial_state – Optional initial shared state payload.
peer_prompt_builder – Optional prompt builder per peer and round.
tracer – Optional tracer dependency.
- Raises:
ValueError – Raised when peers is empty or max_rounds is invalid.
- class design_research_agents.BlackboardPattern(*, peers, max_rounds=6, stability_rounds=2, initial_state=None, peer_prompt_builder=None, tracer=None)[source]
Networked pattern with explicit blackboard reducer semantics.
Initialize blackboard specialization with convergence controls.
- Parameters:
peers – Peer delegates participating in rounds.
max_rounds – Maximum rounds before termination.
stability_rounds – Number of unchanged state hashes required to declare convergence.
initial_state – Optional initial blackboard override mapping.
peer_prompt_builder – Optional peer prompt builder callback.
tracer – Optional tracer dependency.
- Raises:
ValueError – Raised when stability_rounds is less than one.
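Convergence via stability_rounds means the shared state hash must be unchanged for that many consecutive rounds. A self-contained sketch of that termination rule (illustrative only; the library's state and hashing details may differ):

```python
import hashlib
import json

def state_hash(state):
    """Stable hash of a JSON-serializable state mapping."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()

def run_blackboard(apply_round, initial_state, *, max_rounds=6, stability_rounds=2):
    """Stop at max_rounds, or earlier once the state hash is unchanged
    for stability_rounds consecutive rounds (convergence)."""
    state = dict(initial_state)
    unchanged = 0
    for round_no in range(max_rounds):
        new_state = apply_round(round_no, state)
        unchanged = unchanged + 1 if state_hash(new_state) == state_hash(state) else 0
        state = new_state
        if unchanged >= stability_rounds:
            return state, round_no + 1, True
    return state, max_rounds, False

# Toy round: append round numbers until two items exist, then stabilize.
def toy_round(round_no, state):
    if len(state["items"]) < 2:
        return {"items": state["items"] + [round_no]}
    return state

final_state, rounds, converged = run_blackboard(toy_round, {"items": []})
```

With stability_rounds=2, the loop needs two quiet rounds after the last change before declaring convergence, which guards against states that merely oscillate slowly.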
- class design_research_agents.BeamSearchPattern(*, generator_delegate, evaluator_delegate, max_depth=3, branch_factor=3, beam_width=2, tracer=None)[source]
Beam-style tree search over generated candidate states.
Initialize tree-search reasoning pattern.
- Parameters:
generator_delegate – Delegate that expands one candidate into children.
evaluator_delegate – Delegate that assigns a score to one candidate.
max_depth – Maximum expansion depth.
branch_factor – Max children retained per expanded node.
beam_width – Max frontier width kept after each depth.
tracer – Optional tracer dependency.
- Raises:
ValueError – Raised when depth/branch/beam settings are invalid.
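The interplay of max_depth, branch_factor, and beam_width is ordinary beam search. A compact sketch with plain callables in place of the generator and evaluator delegates (beam_search here is illustrative, not the pattern's implementation):

```python
def beam_search(expand, score, root, *, max_depth=3, branch_factor=3, beam_width=2):
    """Expand each frontier candidate into at most branch_factor
    children, then keep the beam_width best-scoring candidates per
    depth; return the best candidate on the final frontier."""
    frontier = [root]
    for _ in range(max_depth):
        children = []
        for candidate in frontier:
            children.extend(expand(candidate)[:branch_factor])
        if not children:
            break
        children.sort(key=score, reverse=True)
        frontier = children[:beam_width]
    return max(frontier, key=score)

# Toy search: states are integers, children double them, the score
# prefers larger values.
best = beam_search(lambda n: [n * 2, n * 2 + 1], lambda n: n, 1,
                   max_depth=3, branch_factor=2, beam_width=2)
```

Branch_factor bounds how wide each expansion is, while beam_width bounds how much of that width survives to the next depth.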
- class design_research_agents.RAGPattern(*, reasoning_delegate, memory_store, memory_namespace='default', memory_top_k=5, memory_min_score=None, write_back=True, tracer=None)[source]
Reasoning pattern orchestrated as memory read -> reason -> memory write.
Initialize RAG reasoning pattern.
- Parameters:
reasoning_delegate – Delegate object that performs reasoning with retrieved context.
memory_store – Memory store used for retrieval and optional write-back.
memory_namespace – Namespace partition for reads/writes.
memory_top_k – Number of retrieved matches for reasoning context.
memory_min_score – Optional minimum retrieval score threshold.
write_back – Whether to persist one summary record after reasoning.
tracer – Optional tracer dependency.
- Raises:
ValueError – Raised when memory_top_k is less than one.
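The read -> reason -> write orchestration can be sketched against an in-memory store. The store shape, substring matching, and run_rag helper below are hypothetical simplifications; the real MemoryStore API may differ:

```python
def run_rag(reason, store, query, *, namespace="default", top_k=5, write_back=True):
    """Retrieve matching records, reason over them, then optionally
    persist one summary record of the answer (write-back)."""
    records = store.setdefault(namespace, [])
    hits = [r for r in records if query in r["text"]][:top_k]
    answer = reason(query, hits)
    if write_back:
        records.append({"text": answer, "kind": "summary"})
    return answer

store = {"default": [{"text": "paris is the capital of france"}]}
answer = run_rag(
    lambda q, hits: f"{len(hits)} match(es) for {q!r}",
    store, "paris",
)
```

With write_back enabled, each run leaves exactly one summary record behind in the namespace it read from.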
Tools
- class design_research_agents.Toolbox(*, workspace_root='.', enable_core_tools=True, script_tools=None, callable_tools=None, mcp_servers=None)[source]
Tool runtime that routes calls across enabled tool sources.
Initialize toolbox sources from ergonomic constructor arguments.
- Parameters:
workspace_root – Root directory for tools that interact with the filesystem.
enable_core_tools – Whether to enable the built-in core tools.
script_tools – Optional tuple of ScriptToolConfig definitions to expose through a script tool source.
callable_tools – Optional tuple of CallableToolConfig definitions to register as in-process tools.
mcp_servers – Optional tuple of MCP server definitions to connect to and expose tools from.
- property config
Return active runtime configuration.
- Returns:
Fully resolved runtime configuration for this toolbox.
- invoke(tool_name, input, *, request_id, dependencies)[source]
Invoke one tool through the registry routing layer.
- Parameters:
tool_name – Name of the tool to invoke. This will be normalized by stripping leading and trailing whitespace before lookup.
input – Mapping of input values to provide for this tool invocation. This will be validated against the tool’s input schema before invocation.
request_id – Request ID associated with this invocation; passed through to the underlying tool handler for logging and tracing.
dependencies – Dependency mapping for this invocation; passed through to the underlying tool handler to supply any context or resources the tool needs.
- Returns:
The result of the tool invocation, as returned by the underlying tool handler. This will be validated against the tool’s output schema before being returned to the caller.
- invoke_dict(tool_name, input, *, request_id, dependencies)[source]
Invoke one tool and require a successful dictionary payload.
- Parameters:
tool_name – Name of the tool to invoke.
input – Tool input payload mapping.
request_id – Request identifier associated with this invocation.
dependencies – Dependency payload mapping for this invocation.
- Returns:
Tool result mapping.
- Raises:
RuntimeError – If invocation fails or result payload is not a mapping.
- list_tools()[source]
List all tools currently exposed by enabled runtime sources.
- Returns:
Sequence of ToolSpec objects representing all tools currently exposed by enabled runtime sources, in no particular order.
- register_callable_tool(callable_tool)[source]
Register one callable tool wrapper.
- Parameters:
callable_tool – CallableToolConfig definition to register. The name field will be normalized by stripping leading and trailing whitespace, and must be non-empty after normalization.
- Returns:
None
- Raises:
Exception – Raised when this operation cannot complete.
- register_tool(*, spec, handler)[source]
Register a custom in-process tool.
- Parameters:
spec – ToolSpec defining the tool to register. The name field will be normalized by stripping leading and trailing whitespace, and must be non-empty after normalization.
handler – ToolHandler function to execute when this tool is invoked. The handler will be wrapped to match the expected signature for in-process tools, which includes additional parameters for request ID and dependencies that will be ignored by the provided handler.
- Returns:
None
- property registry
Return the source-merging registry.
- Returns:
Registry that owns source routing and invocation dispatch.
- class design_research_agents.CallableToolConfig(*, name, description, handler, input_schema=<factory>, output_schema=<factory>, permissions=(), risky=None)[source]
Simple in-process callable tool wrapper descriptor.
- description
Short description of the tool’s behavior.
- handler
Python callable that implements the tool’s behavior. It should accept a single argument of type Mapping[str, object] and return an arbitrary JSON-serializable object.
- input_schema
JSON Schema describing the expected input structure for the tool. This is used for validation and documentation purposes.
- name
Unique name of the tool.
- output_schema
JSON Schema describing the structure of the tool’s output. This is used for validation and documentation purposes.
- permissions
Optional tuple of permission strings that the tool requires. This can be used to enforce security constraints or to inform users about the tool’s capabilities.
- risky
Whether the tool performs potentially risky operations.
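A handler satisfying the documented contract takes a single Mapping[str, object] and returns a JSON-serializable value. A minimal example definition (the word_count tool name and schemas are hypothetical):

```python
from typing import Mapping

def word_count_handler(input: Mapping[str, object]) -> dict:
    """Count whitespace-separated words in the provided text."""
    text = str(input.get("text", ""))
    return {"words": len(text.split())}

# Hypothetical registration matching the documented constructor:
# tool = CallableToolConfig(
#     name="word_count",
#     description="Count words in a text.",
#     handler=word_count_handler,
#     input_schema={"type": "object",
#                   "properties": {"text": {"type": "string"}},
#                   "required": ["text"]},
#     output_schema={"type": "object",
#                    "properties": {"words": {"type": "integer"}}},
# )
```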
- class design_research_agents.ScriptToolConfig(*, name, path, description, input_schema=<factory>, output_schema=<factory>, filesystem_read=False, filesystem_write=False, network=False, commands=(), timeout_s=30, permissions=(), risky=None)[source]
One explicit script-backed tool definition.
- commands
Optional tuple of shell commands the tool is permitted to execute. When non-empty, any attempt to run a command outside this list is blocked. This is used to enforce security constraints and limit the tool’s capabilities.
- description
Short description of the tool’s behavior. This should be a concise summary of what the tool does, suitable for inclusion in prompts and documentation.
- filesystem_read
Whether the tool needs read access to the filesystem. When True, the tool is granted read access to the workspace root and artifacts directory; when False, no filesystem read access is granted.
- filesystem_write
Whether the tool needs write access to the filesystem. When True, the tool is granted write access to the workspace root and artifacts directory; when False, no filesystem write access is granted.
- input_schema
JSON Schema describing the expected input structure for the tool. This is used for validation and documentation purposes. The tool will receive its input as a JSON-encoded string on its standard input, and it should produce its output as a JSON-encoded string on its standard output. The input schema should describe the structure of the JSON object that the tool expects to receive, including any required properties and their types.
- name
Unique name of the tool. This is used to reference the tool in prompts and logs.
- network
Whether the tool needs network access. When True, the tool is granted network access; when False, all network access is blocked.
- output_schema
JSON Schema describing the structure of the tool’s output. This is used for validation and documentation purposes. The tool’s output should be a JSON-encoded string written to its standard output, and the output schema should describe the structure of the JSON object that the tool produces, including any properties and their types.
- path
Filesystem path to the script that implements the tool’s behavior. This should be an absolute path or a path relative to the configured workspace root. The script will be executed as a subprocess when the tool is invoked, and communicated with via its standard input and output streams.
- permissions
Optional tuple of permission strings that the tool requires. This can be used to enforce security constraints or to inform users about the tool’s capabilities. The specific permission strings and their meanings are not defined by this configuration and should be interpreted by the tool runtime or the user interface accordingly.
- risky
Optional boolean flag indicating whether the tool performs potentially risky operations, such as executing shell commands, accessing the filesystem, or making network requests. This can be used to inform users about the tool’s capabilities and potential risks.
- timeout_s
Timeout in seconds for the tool’s execution. If the tool does not produce output within this time frame, it will be considered unresponsive, and appropriate error handling will be triggered.
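Per the documented contract, a script tool reads one JSON object from standard input and writes one JSON object to standard output. A minimal script body satisfying that protocol (an illustration, not a shipped tool; call main() from the script's entry point):

```python
import json
import sys

def handle(payload: dict) -> dict:
    """Tool logic: count whitespace-separated words in the input text."""
    text = str(payload.get("text", ""))
    return {"words": len(text.split())}

def main() -> None:
    # Protocol: one JSON object in on stdin, one JSON object out on stdout.
    json.dump(handle(json.load(sys.stdin)), sys.stdout)
```

The input_schema would describe handle's expected payload (here an object with a "text" string) and the output_schema its result (an object with an integer "words").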
- class design_research_agents.MCPServerConfig(*, id, type='stdio', command=(), timeout_s=20, env_allowlist=('PATH', 'HOME', 'USER', 'LANG', 'LC_ALL', 'PYTHONPATH', 'VIRTUAL_ENV'), env=<factory>)[source]
External MCP server definition.
- command
Command to launch the server, specified as a tuple of strings. The first element should be the executable, and the subsequent elements are its arguments.
- env
Explicit environment variables to set for the server process. This is a mapping of variable names to their desired values. These variables will be included in the server’s environment in addition to any variables from the allowlist that are present in the parent process.
- env_allowlist
Allowlist of environment variable names that will be passed to the server process. Only variables in this list will be included in the server’s environment, which helps to limit exposure of sensitive information and reduce the attack surface.
- id
Unique identifier for the server. This is used to reference the server in tool definitions and logs.
- timeout_s
Timeout in seconds for server responses before treating it as unresponsive.
- type
Communication protocol to use with the server. Currently, only ‘stdio’ is supported, which means the server will be launched as a subprocess and communicated with via its standard input and output streams.
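As documented, the server environment is built from two layers: allowlisted parent variables pass through, then explicit env entries are added on top. A sketch of that merge (build_server_env is a hypothetical helper):

```python
def build_server_env(parent_env, allowlist, explicit_env):
    """Allowlisted parent variables first; explicit entries override."""
    env = {k: v for k, v in parent_env.items() if k in allowlist}
    env.update(explicit_env)
    return env

parent = {"PATH": "/usr/bin", "SECRET_TOKEN": "x", "HOME": "/home/me"}
env = build_server_env(parent, ("PATH", "HOME"), {"LANG": "C.UTF-8"})
```

Note that SECRET_TOKEN never reaches the server process: anything outside the allowlist is dropped unless explicitly re-added via env.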
Tracing
- class design_research_agents.Tracer(*, enabled=True, trace_dir=PosixPath('traces'), enable_jsonl=True, enable_console=True, console_stream=<_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>)[source]
Explicitly configured tracer dependency injected into runtimes.
- build_sinks(*, trace_path)[source]
Build concrete sinks for this tracer configuration.
- Parameters:
trace_path – Optional JSONL path returned by build_trace_path.
- Returns:
Concrete sink instances enabled by this tracer configuration.
- build_trace_path(*, run_id)[source]
Build a trace JSONL path for one run when JSONL sink is enabled.
- Parameters:
run_id – Request or run identifier used in the trace filename.
- Returns:
JSONL output path for the run, or None when JSONL output is disabled.
- console_stream
Stream used for console trace output.
- enable_console
Whether console trace output should be emitted.
- enable_jsonl
Whether JSONL trace files should be emitted.
- enabled
Whether tracing is enabled for this tracer instance.
- resolve_latest_trace_path(request_id)[source]
Resolve latest emitted JSONL trace path for one request id.
- Parameters:
request_id – Request identifier used in trace filenames.
- Returns:
Latest matching trace file path, or None.
- run_callable(*, agent_name, request_id, input_payload, function, dependencies=None)[source]
Run one callable inside an explicit trace session lifecycle.
- Parameters:
agent_name – Delegate name used in trace metadata.
request_id – Request id used for trace run and file naming.
input_payload – Input payload metadata for trace run start.
function – Zero-argument callable to execute.
dependencies – Optional dependency mapping for trace metadata.
- Returns:
Function return value.
- trace_dir
Directory where JSONL trace files are written.
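The run_callable lifecycle (start event, wrapped execution, error or end event) can be sketched with a plain emitter. This illustrates the session shape only; the Tracer's actual event schema and sink handling are not shown here:

```python
def run_callable_traced(emit, *, agent_name, request_id, input_payload, function):
    """Wrap one zero-argument callable in start/end trace events,
    recording and re-raising any error."""
    emit({"event": "run_start", "agent": agent_name,
          "request_id": request_id, "input": input_payload})
    try:
        result = function()
    except Exception as exc:
        emit({"event": "run_error", "request_id": request_id, "error": repr(exc)})
        raise
    emit({"event": "run_end", "request_id": request_id})
    return result

events = []
value = run_callable_traced(
    events.append,
    agent_name="demo", request_id="req-1",
    input_payload={"prompt": "hi"},
    function=lambda: 42,
)
```

The original exception propagates to the caller after the error event is emitted, so tracing never swallows failures.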