Model Selection Modules

This page documents internal model-selection modules for contributor visibility. These underscored module paths are intentionally unstable.

Model catalog utilities and default catalog entries.

class design_research_agents._model_selection._catalog.ModelCatalog(*, models)

Catalog of known models and their hardware hints.

models

Tuple of model specifications.

Type:

tuple[design_research_agents._model_selection._types.ModelSpec, ...]

classmethod default()

Build the default model catalog.

Returns:

Default model catalog instance.

find(model_id)

Return the model spec with the given id, if present.

Parameters:

model_id – Model identifier to search for.

Returns:

Matching model spec, or None when not found.

models

Tuple of model specifications stored in the catalog.

signature()

Return a stable signature for catalog reproducibility.

Returns:

Stable signature string derived from the catalog contents.
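
The sketch below illustrates how a lookup and an order-independent signature can be derived from catalog contents. It is not the library's implementation: SpecStub, CatalogStub, the example model ids, and the SHA-256-over-canonical-JSON scheme are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SpecStub:
    """Hypothetical stand-in for ModelSpec with only two of its fields."""

    model_id: str
    provider: str


@dataclass(frozen=True)
class CatalogStub:
    """Hypothetical stand-in for ModelCatalog."""

    models: tuple[SpecStub, ...]

    def find(self, model_id: str) -> SpecStub | None:
        # Linear scan; returns None when no entry matches.
        return next((m for m in self.models if m.model_id == model_id), None)

    def signature(self) -> str:
        # Assumed scheme: hash a canonical JSON dump of the entries, sorted by
        # id so that insertion order does not change the result.
        entries = sorted((asdict(m) for m in self.models), key=lambda e: e["model_id"])
        payload = json.dumps(entries, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


catalog = CatalogStub(models=(SpecStub("a-7b", "local"), SpecStub("b-large", "remote")))
reordered = CatalogStub(models=tuple(reversed(catalog.models)))
```

Under this scheme, catalog and reordered yield the same signature, which is the property a reproducibility check would rely on.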

Hardware profiling helpers for model selection.

class design_research_agents._model_selection._hardware.HardwareProfile(*, total_ram_gb, available_ram_gb, cpu_count, load_average, gpu_present, gpu_vram_gb, gpu_name=None, platform_name=None)

Snapshot of system hardware capacity for model selection.

total_ram_gb

Total system RAM in GiB.

Type:

float | None

available_ram_gb

Available system RAM in GiB.

Type:

float | None

cpu_count

Logical CPU count.

Type:

int | None

load_average

Load average tuple when supported.

Type:

tuple[float, float, float] | None

gpu_present

Whether a GPU is detected.

Type:

bool | None

gpu_vram_gb

Detected GPU VRAM in GiB.

Type:

float | None

gpu_name

Optional GPU name.

Type:

str | None

platform_name

Platform identifier string.

Type:

str | None

available_ram_gb

Available system memory in GiB.

cpu_count

Logical CPU count when it can be detected.

classmethod detect()

Collect a best-effort hardware profile for the current system.

Returns:

Detected hardware profile snapshot.

gpu_name

Detected GPU name, when available.

gpu_present

Whether a GPU appears to be available.

gpu_vram_gb

Best-effort GPU memory estimate in GiB.

load_average

One-, five-, and fifteen-minute load averages when supported.

platform_name

Platform identifier used during detection.

to_dict()

Return a JSON-ready representation of the profile.

Returns:

JSON-serializable hardware profile mapping.

total_ram_gb

Total system memory in GiB.
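
As an illustration of best-effort detection, the following sketch fills a reduced profile using only the Python standard library and records None for anything that cannot be detected. ProfileStub is a hypothetical stand-in, not the library's class; RAM and GPU probing are omitted because they need platform-specific tooling that this page does not describe.

```python
from __future__ import annotations

import json
import os
import platform
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProfileStub:
    """Hypothetical stand-in carrying a subset of HardwareProfile's fields."""

    cpu_count: int | None
    load_average: tuple[float, float, float] | None
    platform_name: str | None

    @classmethod
    def detect(cls) -> "ProfileStub":
        # os.getloadavg is missing on Windows and can raise OSError elsewhere,
        # so an undetectable value is recorded as None.
        try:
            load = os.getloadavg()
        except (AttributeError, OSError):
            load = None
        return cls(
            cpu_count=os.cpu_count(),
            load_average=load,
            platform_name=platform.system() or None,
        )

    def to_dict(self) -> dict:
        # Convert the tuple to a list so the mapping equals its JSON round trip.
        data = asdict(self)
        if data["load_average"] is not None:
            data["load_average"] = list(data["load_average"])
        return data


profile = ProfileStub.detect()
encoded = json.dumps(profile.to_dict())
```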

Model selection policy implementation.

class design_research_agents._model_selection._policy.ModelSelectionPolicy(*, catalog=<factory>, config=<factory>)

Policy that selects a model using intent, constraints, and hardware.

catalog

Model catalog used for candidate selection.

Type:

design_research_agents._model_selection._catalog.ModelCatalog

config

Policy configuration values.

Type:

design_research_agents._model_selection._types.ModelSelectionPolicyConfig

catalog

Catalog queried for candidate models.

config

Policy thresholds and default selection behavior.

select_model(*, intent, constraints, hardware_profile)

Select an appropriate model and emit a traceable decision.

Parameters:
  • intent – Task intent and priority preferences.

  • constraints – Optional model selection constraints.

  • hardware_profile – Optional hardware profile override.

Returns:

Selection decision with rationale and safety bounds.

Raises:

Exception – Raised when this operation cannot complete.

Public model selection facade that exposes policy settings as flat constructor arguments.

class design_research_agents._model_selection._selector.ModelSelector(*, catalog=None, prefer_local=True, ram_reserve_gb=2.0, vram_reserve_gb=0.5, max_load_ratio=0.85, remote_cost_floor_usd=0.02, default_max_latency_ms=None, local_client_resolver=None)

Flat model selection interface with client/config resolution helpers.

Initialize model selector policy controls and optional resolver hook.

Parameters:
  • catalog – Optional model catalog to use for selection.

  • prefer_local – Whether to prefer local models over remote ones when all else is equal.

  • ram_reserve_gb – Amount of RAM (in GiB) to reserve when evaluating local candidates.

  • vram_reserve_gb – Amount of GPU VRAM (in GiB) to reserve when evaluating local candidates.

  • max_load_ratio – Maximum system load ratio to consider a local candidate viable (0.0 to 1.0).

  • remote_cost_floor_usd – Minimum cost threshold (in USD) for remote models to be considered viable.

  • default_max_latency_ms – Default maximum latency (in milliseconds) to consider when evaluating candidates, if not specified in selection constraints.

  • local_client_resolver – Optional callable that takes a ModelSelectionDecision and returns a dict with ‘client_class’ and ‘kwargs’ for constructing a local client when the provider is not recognized by the built-in resolver. This allows for custom local providers to be integrated without modifying the ModelSelector code.
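
A resolver hook of the shape described for local_client_resolver might look like the sketch below. DecisionStub, MyLocalClient, the provider name, and the base URL are hypothetical. The description above does not say whether ‘client_class’ should be a class object or a name, so the sketch assumes a class object.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class DecisionStub:
    """Hypothetical stand-in exposing two ModelSelectionDecision fields."""

    model_id: str
    provider: str


class MyLocalClient:
    """Hypothetical client class for a custom local provider."""

    def __init__(self, *, model: str, base_url: str) -> None:
        self.model = model
        self.base_url = base_url


def resolve_custom_local(decision: DecisionStub) -> dict[str, Any]:
    # Return the two keys named in the parameter description.
    return {
        "client_class": MyLocalClient,  # assumption: a class object, not a string
        "kwargs": {"model": decision.model_id, "base_url": "http://localhost:8000"},
    }


config = resolve_custom_local(DecisionStub(model_id="my-model", provider="my-local"))
client = config["client_class"](**config["kwargs"])
```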

select(*, task, priority='balanced', require_local=False, preferred_provider=None, max_cost_usd=None, max_latency_ms=None, hardware_profile=None, output='client')

Select a model and return a decision, config mapping, or live client.

Parameters:
  • task – Description of the task or use case for which a model is being selected.

  • priority – Selection priority, which may influence the trade-off between quality, latency, and cost in the decision process.

  • require_local – If True, only consider local models as viable candidates.

  • preferred_provider – Optional provider name to prioritize in the selection process.

  • max_cost_usd – Optional maximum cost threshold (in USD per 1K tokens) for candidate models.

  • max_latency_ms – Optional maximum latency threshold (in milliseconds) for candidate models.

  • hardware_profile – Optional mapping or HardwareProfile instance describing the current hardware state, which may be used to evaluate local candidates.

  • output – Determines the format of the selection result. “client” returns an instantiated LLMClient ready for use, “decision” returns the raw ModelSelectionDecision object with details of the selection rationale, and “client_config” returns a dict containing the information needed to construct an LLMClient (including ‘client_class’ and ‘kwargs’) without actually instantiating it.

Returns:

Depends on the ‘output’ parameter:

  • If output is “client”: An instantiated LLMClient configured according to the selection decision, ready for use in making requests.

  • If output is “decision”: A ModelSelectionDecision object containing details about the selected model, provider, rationale, and policy information.

  • If output is “client_config”: A dict containing the resolved client configuration, including ‘client_class’, ‘kwargs’, and metadata from the selection decision, which can be used to instantiate an LLMClient at a later time or in a different context.

Raises:

ValueError – If output is unsupported or selection/config coercion fails.
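
The three output modes and the ValueError for unsupported values can be pictured with the small dispatch sketch below. shape_result and EchoClient are hypothetical names, and plain dicts stand in for the decision and client configuration; this shows the documented contract, not the selector's actual code.

```python
from __future__ import annotations

from typing import Any


class EchoClient:
    """Hypothetical client; stands in for an LLMClient implementation."""

    def __init__(self, *, model: str) -> None:
        self.model = model


def shape_result(
    decision: dict[str, Any],
    client_config: dict[str, Any],
    output: str = "client",
) -> Any:
    """Hypothetical helper mirroring the three documented output modes."""
    if output == "decision":
        return decision
    if output == "client_config":
        return client_config
    if output == "client":
        # Instantiate only when a live client is requested.
        return client_config["client_class"](**client_config["kwargs"])
    raise ValueError(f"unsupported output: {output!r}")


decision = {"model_id": "my-model", "provider": "my-local"}
client_config = {"client_class": EchoClient, "kwargs": {"model": "my-model"}}
live_client = shape_result(decision, client_config)
```

Requesting “client_config” defers instantiation, which is useful when the client has to be built later or in another process.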

Shared model selection data types.

class design_research_agents._model_selection._types.ModelCostHint(*, tier, usd_per_1k_tokens=None)

Cost hints for model selection.

tier

Relative cost tier for this model option.

usd_per_1k_tokens

Estimated USD cost per 1K tokens, when available.

class design_research_agents._model_selection._types.ModelLatencyHint(*, tier, note=None)

Latency hints for model selection.

note

Optional annotation for latency assumptions.

tier

Relative latency tier for this model option.

class design_research_agents._model_selection._types.ModelMemoryHint(*, min_ram_gb, min_vram_gb, note=None)

Memory requirement hints for model selection.

min_ram_gb

Suggested minimum system RAM in GiB.

Type:

float | None

min_vram_gb

Suggested minimum GPU VRAM in GiB.

Type:

float | None

note

Optional annotation for the hint.

Type:

str | None

min_ram_gb

Suggested minimum system RAM (GiB) for reliable execution.

min_vram_gb

Suggested minimum GPU VRAM (GiB), when GPU execution is relevant.

note

Optional annotation explaining caveats in the memory hint.

class design_research_agents._model_selection._types.ModelSafetyConstraints(*, max_cost_usd, max_latency_ms)

Safety bounds attached to a model selection decision.

max_cost_usd

Cost bound propagated into the decision.

Type:

float | None

max_latency_ms

Latency bound propagated into the decision.

Type:

int | None

max_cost_usd

Cost bound carried into the final decision payload.

max_latency_ms

Latency bound carried into the final decision payload.

class design_research_agents._model_selection._types.ModelSelectionConstraints(*, require_local=False, preferred_provider=None, max_cost_usd=None, max_latency_ms=None)

Constraints that bound model selection choices.

require_local

Whether to force local-only selection.

Type:

bool

preferred_provider

Optional provider override.

Type:

str | None

max_cost_usd

Optional maximum cost per 1K tokens.

Type:

float | None

max_latency_ms

Optional latency cap in milliseconds.

Type:

int | None

max_cost_usd

Optional maximum cost bound (USD per 1K tokens).

max_latency_ms

Optional maximum latency bound in milliseconds.

preferred_provider

Optional preferred provider key to bias selection.

require_local

When true, only local providers are eligible.
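
One way such constraints could narrow a candidate list is sketched below. Candidate, ConstraintsStub, and apply_constraints are hypothetical. The sketch assumes that an unknown cost never triggers the cost bound and that preferred_provider reorders candidates rather than filtering them; neither behavior is confirmed by this page. The latency cap is left out because catalog latency hints are tiers, not millisecond values.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate with only the fields the filter needs."""

    model_id: str
    provider: str
    is_local: bool
    usd_per_1k_tokens: float | None = None


@dataclass(frozen=True)
class ConstraintsStub:
    """Hypothetical stand-in for three ModelSelectionConstraints fields."""

    require_local: bool = False
    preferred_provider: str | None = None
    max_cost_usd: float | None = None


def apply_constraints(candidates: list[Candidate], constraints: ConstraintsStub) -> list[Candidate]:
    kept = []
    for c in candidates:
        if constraints.require_local and not c.is_local:
            continue
        # Assumption: a candidate with no known cost is not rejected by the bound.
        over_budget = (
            constraints.max_cost_usd is not None
            and c.usd_per_1k_tokens is not None
            and c.usd_per_1k_tokens > constraints.max_cost_usd
        )
        if over_budget:
            continue
        kept.append(c)
    # Assumption: the preferred provider moves to the front; sorted() is stable,
    # so the remaining candidates keep their original relative order.
    return sorted(kept, key=lambda c: c.provider != constraints.preferred_provider)


candidates = [
    Candidate("r1", "remote-a", is_local=False, usd_per_1k_tokens=0.05),
    Candidate("l1", "local-a", is_local=True),
    Candidate("r2", "remote-b", is_local=False, usd_per_1k_tokens=0.01),
]
budgeted = apply_constraints(
    candidates, ConstraintsStub(max_cost_usd=0.02, preferred_provider="remote-b")
)
local_only = apply_constraints(candidates, ConstraintsStub(require_local=True))
```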

class design_research_agents._model_selection._types.ModelSelectionDecision(*, model_id, provider, rationale, safety_constraints, policy_id, catalog_signature)

Selection output describing the chosen model and rationale.

model_id

Selected model identifier.

Type:

str

provider

Selected provider name.

Type:

str

rationale

Human-readable rationale for the choice.

Type:

str

safety_constraints

Safety bounds applied to the selection.

Type:

design_research_agents._model_selection._types.ModelSafetyConstraints

policy_id

Policy identifier for reproducibility.

Type:

str

catalog_signature

Catalog signature used for the decision.

Type:

str

catalog_signature

Catalog signature/version used during selection.

model_id

Selected model identifier.

policy_id

Policy identifier used to produce this decision.

provider

Provider key for the selected model.

rationale

Human-readable explanation of the selection decision.

safety_constraints

Safety/cost/latency constraints attached to the decision.

class design_research_agents._model_selection._types.ModelSelectionIntent(*, task, priority='balanced')

Intent descriptor used by the model selection policy.

priority

Priority tradeoff between quality and speed.

task

Task description used to classify selection intent.

class design_research_agents._model_selection._types.ModelSelectionPolicyConfig(*, policy_id='default', prefer_local=True, ram_reserve_gb=2.0, vram_reserve_gb=0.5, max_load_ratio=0.85, remote_cost_floor_usd=0.02, default_max_latency_ms=None)

Configuration controlling model selection behavior.

policy_id

Identifier used for traceability.

Type:

str

prefer_local

Whether to prefer local models by default.

Type:

bool

ram_reserve_gb

Reserved system RAM in GiB.

Type:

float

vram_reserve_gb

Reserved GPU VRAM in GiB.

Type:

float

max_load_ratio

Load ratio threshold above which remote models are preferred.

Type:

float

remote_cost_floor_usd

Cost floor (USD) below which remote options are deprioritized.

Type:

float

default_max_latency_ms

Default latency cap when none is provided.

Type:

int | None

default_max_latency_ms

Default latency bound applied when callers provide none.

max_load_ratio

System load threshold above which remote models are preferred.

policy_id

Policy identifier used for traceability.

prefer_local

Whether local models are preferred by default.

ram_reserve_gb

Reserved system RAM (GiB) not available to model workloads.

remote_cost_floor_usd

Remote cost floor below which remote options are deprioritized.

vram_reserve_gb

Reserved GPU VRAM (GiB) not available to model workloads.
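
To make the reserve and load thresholds concrete, the sketch below checks whether a local candidate fits. The helper names are hypothetical, and defining the load ratio as the one-minute load average divided by the logical CPU count is an assumption. Only the default values 2.0 and 0.85 come from the signature above.

```python
from __future__ import annotations


def usable_gb(available_gb: float, reserve_gb: float) -> float:
    """Memory left for a model after holding back the reserve (never negative)."""
    return max(available_gb - reserve_gb, 0.0)


def load_ratio(load_average: tuple[float, float, float], cpu_count: int) -> float:
    """Assumed definition: one-minute load average per logical CPU."""
    return load_average[0] / cpu_count


def local_is_viable(
    *,
    available_ram_gb: float,
    min_ram_gb: float,
    load_average: tuple[float, float, float],
    cpu_count: int,
    ram_reserve_gb: float = 2.0,
    max_load_ratio: float = 0.85,
) -> bool:
    # The model must fit in the memory left after the reserve, and the
    # system must not already be busier than the load threshold.
    fits = usable_gb(available_ram_gb, ram_reserve_gb) >= min_ram_gb
    calm = load_ratio(load_average, cpu_count) <= max_load_ratio
    return fits and calm


idle_box = local_is_viable(
    available_ram_gb=16.0, min_ram_gb=8.0, load_average=(2.0, 1.5, 1.0), cpu_count=8
)
tight_ram = local_is_viable(
    available_ram_gb=9.0, min_ram_gb=8.0, load_average=(2.0, 1.5, 1.0), cpu_count=8
)
busy_box = local_is_viable(
    available_ram_gb=16.0, min_ram_gb=8.0, load_average=(7.5, 6.0, 5.0), cpu_count=8
)
```

With these inputs only idle_box passes. tight_ram has 7 GiB left after the reserve against an 8 GiB requirement, and busy_box has a ratio of 0.9375, above the 0.85 threshold.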

class design_research_agents._model_selection._types.ModelSpec(*, model_id, provider, family, size_b, format, quantization, memory_hint, latency_hint, cost_hint, quality_tier, speed_tier)

Catalog entry describing one model option.

model_id

Unique model identifier used by backends.

Type:

str

provider

Backend or provider name.

Type:

str

family

Model family grouping label.

Type:

str

size_b

Approximate parameter count in billions.

Type:

float | None

format

Storage or API format identifier.

Type:

str | None

quantization

Quantization name when applicable.

Type:

str | None

memory_hint

Optional memory requirement hints.

Type:

design_research_agents._model_selection._types.ModelMemoryHint | None

latency_hint

Optional latency hints.

Type:

design_research_agents._model_selection._types.ModelLatencyHint | None

cost_hint

Optional cost hints.

Type:

design_research_agents._model_selection._types.ModelCostHint | None

quality_tier

Relative quality score (higher is better).

Type:

int | None

speed_tier

Relative speed score (higher is faster).

Type:

int | None

cost_hint

Optional cost profile for this model.

family

Model family grouping (for reporting and routing heuristics).

format

Model format identifier (for example GGUF or API-native).

property is_local

Return True when the model runs locally.

Returns:

True when the provider is a local backend.

latency_hint

Optional latency profile for this model.

memory_hint

Optional memory requirements for this model.

model_id

Provider-specific model identifier.

provider

Provider/backend key used to execute the model.

quality_tier

Relative quality ranking used by policy scoring.

quantization

Quantization descriptor when applicable.

size_b

Approximate parameter count in billions.

speed_tier

Relative speed ranking used by policy scoring.
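
The following sketch shows how quality_tier and speed_tier could feed a priority-weighted score, and how is_local could be derived from the provider key. TieredSpec, LOCAL_PROVIDERS, the weights, and the priority names other than 'balanced' are assumptions; the actual scoring used by the policy is not documented on this page.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical set of provider keys treated as local backends.
LOCAL_PROVIDERS = frozenset({"my-local"})

# Hypothetical (quality weight, speed weight) pairs per priority.
WEIGHTS = {
    "quality": (1.0, 0.0),
    "balanced": (0.5, 0.5),
    "speed": (0.0, 1.0),
}


@dataclass(frozen=True)
class TieredSpec:
    """Hypothetical stand-in for four ModelSpec fields."""

    model_id: str
    provider: str
    quality_tier: int | None = None
    speed_tier: int | None = None

    @property
    def is_local(self) -> bool:
        return self.provider in LOCAL_PROVIDERS


def score(spec: TieredSpec, priority: str = "balanced") -> float:
    # Missing tiers count as 0 so that unrated models rank last.
    quality_weight, speed_weight = WEIGHTS[priority]
    return quality_weight * (spec.quality_tier or 0) + speed_weight * (spec.speed_tier or 0)


big = TieredSpec("big", "remote-x", quality_tier=5, speed_tier=2)
small = TieredSpec("small", "my-local", quality_tier=2, speed_tier=5)
fastest = max([big, small], key=lambda s: score(s, "speed"))
```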