Model Selection Modules
This page documents internal model-selection modules for contributor visibility. These underscored module paths are intentionally unstable.
Model catalog utilities and default catalog entries.
- class design_research_agents._model_selection._catalog.ModelCatalog(*, models)[source]
Catalog of known models and their hardware hints.
- models
Tuple of model specifications.
- Type:
tuple[design_research_agents._model_selection._types.ModelSpec, …]
- classmethod default()[source]
Build the default model catalog.
- Returns:
Default model catalog instance.
- find(model_id)[source]
Return the model spec with the given id, if present.
- Parameters:
model_id – Model identifier to search for.
- Returns:
Matching model spec, or None when not found.
- models
Stored models value.
- signature()[source]
Return a stable signature for catalog reproducibility.
- Returns:
Stable signature string derived from the catalog contents.
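This page does not say how signature() derives its string. One common way to get a signature that is stable across runs is to hash a canonical serialization of the entries. The sketch below illustrates that idea only; the helper name catalog_signature, the dict-based entries, and the 16-character truncation are assumptions for illustration, not the library's implementation.

```python
import hashlib
import json


def catalog_signature(models):
    """Hash a canonical JSON form of catalog entries (illustrative only)."""
    # Sorting entries by id and sorting keys means ordering cannot change the hash.
    canonical = json.dumps(
        sorted(models, key=lambda m: m["model_id"]),
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical entries; real catalogs hold ModelSpec objects, not dicts.
entries = [
    {"model_id": "example-small", "provider": "local-example", "size_b": 3.0},
    {"model_id": "example-large", "provider": "remote-example", "size_b": 70.0},
]
sig_a = catalog_signature(entries)
sig_b = catalog_signature(list(reversed(entries)))  # same content, other order
sig_c = catalog_signature(entries[:1])              # different content
```

Under this scheme, reordering the entries leaves the signature unchanged, while adding, removing, or editing an entry produces a different one.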
Hardware profiling helpers for model selection.
- class design_research_agents._model_selection._hardware.HardwareProfile(*, total_ram_gb, available_ram_gb, cpu_count, load_average, gpu_present, gpu_vram_gb, gpu_name=None, platform_name=None)[source]
Snapshot of system hardware capacity for model selection.
- total_ram_gb
Total system RAM in GiB.
- Type:
float | None
- available_ram_gb
Available system RAM in GiB.
- Type:
float | None
- cpu_count
Logical CPU count.
- Type:
int | None
- load_average
Load average tuple when supported.
- Type:
tuple[float, float, float] | None
- gpu_present
Whether a GPU is detected.
- Type:
bool | None
- gpu_vram_gb
Detected GPU VRAM in GiB.
- Type:
float | None
- gpu_name
Optional GPU name.
- Type:
str | None
- platform_name
Platform identifier string.
- Type:
str | None
- available_ram_gb
Available system memory in GiB.
- cpu_count
Logical CPU count when it can be detected.
- classmethod detect()[source]
Collect a best-effort hardware profile for the current system.
- Returns:
Detected hardware profile snapshot.
- gpu_name
Detected GPU name, when available.
- gpu_present
Whether a GPU appears to be available.
- gpu_vram_gb
Best-effort GPU memory estimate in GiB.
- load_average
One-, five-, and fifteen-minute load averages when supported.
- platform_name
Platform identifier used during detection.
- to_dict()[source]
Return a JSON-ready representation of the profile.
- Returns:
JSON-serializable hardware profile mapping.
- total_ram_gb
Total system memory in GiB.
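Every HardwareProfile field is optional because detection is best-effort. The sketch below shows what that pattern can look like using only the standard library. HardwareSnapshot and detect_snapshot are hypothetical stand-ins covering a subset of the fields; the real detect() may use different sources and covers RAM and GPU fields that are omitted here.

```python
import json
import os
import platform
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class HardwareSnapshot:
    """Hypothetical stand-in mirroring three HardwareProfile fields."""

    cpu_count: Optional[int]
    load_average: Optional[Tuple[float, float, float]]
    platform_name: Optional[str]

    def to_dict(self):
        """Return a mapping that json.dumps can serialize directly."""
        data = asdict(self)
        if data["load_average"] is not None:
            data["load_average"] = list(data["load_average"])
        return data


def detect_snapshot() -> HardwareSnapshot:
    """Collect what the platform exposes; leave the rest as None."""
    try:
        load = os.getloadavg()  # not available on every platform
    except (AttributeError, OSError):
        load = None
    return HardwareSnapshot(
        cpu_count=os.cpu_count(),
        load_average=load,
        platform_name=platform.system().lower() or None,
    )


snapshot = detect_snapshot()
payload = snapshot.to_dict()
encoded = json.dumps(payload)
```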
Model selection policy implementation.
- class design_research_agents._model_selection._policy.ModelSelectionPolicy(*, catalog=<factory>, config=<factory>)[source]
Policy that selects a model using intent, constraints, and hardware.
- catalog
Model catalog used for candidate selection.
- Type:
design_research_agents._model_selection._catalog.ModelCatalog
- config
Policy configuration values.
- Type:
design_research_agents._model_selection._types.ModelSelectionPolicyConfig
- catalog
Catalog queried for candidate models.
- config
Policy thresholds and default selection behavior.
- select_model(*, intent, constraints, hardware_profile)[source]
Select an appropriate model and emit a traceable decision.
- Parameters:
intent – Task intent and priority preferences.
constraints – Optional model selection constraints.
hardware_profile – Optional hardware profile override.
- Returns:
Selection decision with rationale and safety bounds.
- Raises:
Exception – Raised when model selection cannot complete.
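The policy weighs hardware capacity against the configured reserves and load threshold, but this page does not give the exact rules. The sketch below is one plausible reading under stated assumptions: a local model fits when its minimum RAM is within available RAM minus the reserve, and the system counts as loaded when the one-minute load average divided by the CPU count exceeds max_load_ratio. The helper names, the load-ratio definition, and the handling of unknown values are all hypothetical.

```python
from typing import Optional


def fits_in_ram(
    min_ram_gb: Optional[float],
    available_ram_gb: Optional[float],
    ram_reserve_gb: float = 2.0,
) -> bool:
    """Check a model's RAM hint against available RAM minus the reserve."""
    # Assumption: a missing value means the fit cannot be confirmed.
    if min_ram_gb is None or available_ram_gb is None:
        return False
    return min_ram_gb <= available_ram_gb - ram_reserve_gb


def is_overloaded(
    load_1m: Optional[float],
    cpu_count: Optional[int],
    max_load_ratio: float = 0.85,
) -> bool:
    """Compare the per-CPU load against the configured threshold."""
    # Assumption: without load data the system is not treated as overloaded.
    if load_1m is None or not cpu_count:
        return False
    return load_1m / cpu_count > max_load_ratio


# 16 GiB available with a 2 GiB reserve leaves 14 GiB for the model.
small_fits = fits_in_ram(8.0, 16.0)
large_fits = fits_in_ram(15.0, 16.0)
busy = is_overloaded(7.5, 8)   # 7.5 / 8 = 0.9375, above 0.85
idle = is_overloaded(4.0, 8)   # 4.0 / 8 = 0.5, below 0.85
```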
Public model selection facade with flattened constructor-first ergonomics.
- class design_research_agents._model_selection._selector.ModelSelector(*, catalog=None, prefer_local=True, ram_reserve_gb=2.0, vram_reserve_gb=0.5, max_load_ratio=0.85, remote_cost_floor_usd=0.02, default_max_latency_ms=None, local_client_resolver=None)[source]
Flat model selection interface with client/config resolution helpers.
Initialize model selector policy controls and optional resolver hook.
- Parameters:
catalog – Optional model catalog to use for selection.
prefer_local – Whether to prefer local models over remote ones when all else is equal.
ram_reserve_gb – Amount of RAM (in GiB) to reserve when evaluating local candidates.
vram_reserve_gb – Amount of GPU VRAM (in GiB) to reserve when evaluating local candidates.
max_load_ratio – Maximum system load ratio to consider a local candidate viable (0.0 to 1.0).
remote_cost_floor_usd – Minimum cost threshold (in USD) for remote models to be considered viable.
default_max_latency_ms – Default maximum latency (in milliseconds) to consider when evaluating candidates, if not specified in selection constraints.
local_client_resolver – Optional callable that takes a ModelSelectionDecision and returns a dict with ‘client_class’ and ‘kwargs’ for constructing a local client when the provider is not recognized by the built-in resolver. This allows for custom local providers to be integrated without modifying the ModelSelector code.
- select(*, task, priority='balanced', require_local=False, preferred_provider=None, max_cost_usd=None, max_latency_ms=None, hardware_profile=None, output='client')[source]
Select a model and return a decision, config mapping, or live client.
- Parameters:
task – Description of the task or use case for which a model is being selected.
priority – Selection priority, which may influence the trade-off between quality, latency, and cost in the decision process.
require_local – If True, only consider local models as viable candidates.
preferred_provider – Optional provider name to prioritize in the selection process.
max_cost_usd – Optional maximum cost threshold (in USD) for candidate models.
max_latency_ms – Optional maximum latency threshold (in milliseconds) for candidate models.
hardware_profile – Optional mapping or HardwareProfile instance describing the current hardware state, which may be used to evaluate local candidates.
output – Determines the format of the selection result. “client” returns an instantiated LLMClient ready for use, “decision” returns the raw ModelSelectionDecision object with details of the selection rationale, and “client_config” returns a dict containing the information needed to construct an LLMClient (including ‘client_class’ and ‘kwargs’) without actually instantiating it.
- Returns:
Depending on the ‘output’ parameter –
If output is “client”: An instantiated LLMClient configured according to the selection decision, ready for use in making requests.
If output is “decision”: A ModelSelectionDecision object containing details about the selected model, provider, rationale, and policy information.
If output is “client_config”: A dict containing the resolved client configuration, including ‘client_class’, ‘kwargs’, and metadata from the selection decision, which can be used to instantiate an LLMClient at a later time or in a different context.
- Raises:
ValueError – If output is unsupported or selection/config coercion fails.
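The three output modes described above can be pictured as a small dispatch over one decision. The sketch below uses hypothetical stand-ins (Decision, EchoClient, resolve_output) so that it runs without the package installed. It mirrors the documented shapes: a decision object, a dict with 'client_class' and 'kwargs', or an instantiated client. It is not the selector's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Hypothetical stand-in for ModelSelectionDecision (subset of fields)."""

    model_id: str
    provider: str


class EchoClient:
    """Hypothetical client class used only for this illustration."""

    def __init__(self, model: str):
        self.model = model


def resolve_output(decision: Decision, output: str = "client"):
    """Return the decision, a client config mapping, or a live client."""
    if output not in ("client", "decision", "client_config"):
        raise ValueError(f"Unsupported output: {output!r}")
    if output == "decision":
        return decision
    config = {"client_class": EchoClient, "kwargs": {"model": decision.model_id}}
    if output == "client_config":
        return config
    # "client": build the client from the same config mapping.
    return config["client_class"](**config["kwargs"])


decision = Decision(model_id="example-small", provider="local-example")
as_decision = resolve_output(decision, "decision")
as_config = resolve_output(decision, "client_config")
as_client = resolve_output(decision, "client")
```

Returning "client_config" rather than "client" is useful when the client has to be constructed later or in another process, since the mapping can be inspected without instantiating anything.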
Shared model selection data types.
- class design_research_agents._model_selection._types.ModelCostHint(*, tier, usd_per_1k_tokens=None)[source]
Cost hints for model selection.
- tier
Relative cost tier for this model option.
- usd_per_1k_tokens
Estimated USD cost per 1K tokens, when available.
- class design_research_agents._model_selection._types.ModelLatencyHint(*, tier, note=None)[source]
Latency hints for model selection.
- note
Optional annotation for latency assumptions.
- tier
Relative latency tier for this model option.
- class design_research_agents._model_selection._types.ModelMemoryHint(*, min_ram_gb, min_vram_gb, note=None)[source]
Memory requirement hints for model selection.
- min_ram_gb
Suggested minimum system RAM in GiB.
- Type:
float | None
- min_vram_gb
Suggested minimum GPU VRAM in GiB.
- Type:
float | None
- note
Optional annotation for the hint.
- Type:
str | None
- min_ram_gb
Suggested minimum system RAM (GiB) for reliable execution.
- min_vram_gb
Suggested minimum GPU VRAM (GiB), when GPU execution is relevant.
- note
Optional annotation explaining caveats in the memory hint.
- class design_research_agents._model_selection._types.ModelSafetyConstraints(*, max_cost_usd, max_latency_ms)[source]
Safety bounds attached to a model selection decision.
- max_cost_usd
Cost bound propagated into the decision.
- Type:
float | None
- max_latency_ms
Latency bound propagated into the decision.
- Type:
int | None
- max_cost_usd
Cost bound carried into the final decision payload.
- max_latency_ms
Latency bound carried into the final decision payload.
- class design_research_agents._model_selection._types.ModelSelectionConstraints(*, require_local=False, preferred_provider=None, max_cost_usd=None, max_latency_ms=None)[source]
Constraints that bound model selection choices.
- require_local
Whether to force local-only selection.
- Type:
bool
- preferred_provider
Optional provider override.
- Type:
str | None
- max_cost_usd
Optional maximum cost per 1K tokens.
- Type:
float | None
- max_latency_ms
Optional latency cap in milliseconds.
- Type:
int | None
- max_cost_usd
Optional maximum cost bound (USD per 1K tokens).
- max_latency_ms
Optional maximum latency bound in milliseconds.
- preferred_provider
Optional preferred provider key to bias selection.
- require_local
When true, only local providers are eligible.
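As a rough illustration of how such constraints can bound the candidate set, the sketch below filters hypothetical candidates by require_local and max_cost_usd. Candidate and apply_constraints are stand-ins invented for this example, and keeping candidates whose cost is unknown is an assumption; the policy's real filtering rules are not described on this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a catalog entry."""

    model_id: str
    is_local: bool
    usd_per_1k_tokens: Optional[float] = None


def apply_constraints(
    candidates: List[Candidate],
    *,
    require_local: bool = False,
    max_cost_usd: Optional[float] = None,
) -> List[Candidate]:
    """Drop candidates that violate the given bounds."""
    kept = []
    for cand in candidates:
        if require_local and not cand.is_local:
            continue
        cost = cand.usd_per_1k_tokens
        # Assumption: a candidate with unknown cost is not excluded by the cap.
        if max_cost_usd is not None and cost is not None and cost > max_cost_usd:
            continue
        kept.append(cand)
    return kept


pool = [
    Candidate("example-small", is_local=True),
    Candidate("example-cheap", is_local=False, usd_per_1k_tokens=0.01),
    Candidate("example-large", is_local=False, usd_per_1k_tokens=0.50),
]
local_only = apply_constraints(pool, require_local=True)
under_cap = apply_constraints(pool, max_cost_usd=0.05)
```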
- class design_research_agents._model_selection._types.ModelSelectionDecision(*, model_id, provider, rationale, safety_constraints, policy_id, catalog_signature)[source]
Selection output describing the chosen model and rationale.
- model_id
Selected model identifier.
- Type:
str
- provider
Selected provider name.
- Type:
str
- rationale
Human-readable rationale for the choice.
- Type:
str
- safety_constraints
Safety bounds applied to the selection.
- Type:
design_research_agents._model_selection._types.ModelSafetyConstraints
- policy_id
Policy identifier for reproducibility.
- Type:
str
- catalog_signature
Catalog signature used for the decision.
- Type:
str
- catalog_signature
Catalog signature/version used during selection.
- model_id
Selected model identifier.
- policy_id
Policy identifier used to produce this decision.
- provider
Provider key for the selected model.
- rationale
Human-readable explanation of the selection decision.
- safety_constraints
Safety/cost/latency constraints attached to the decision.
- class design_research_agents._model_selection._types.ModelSelectionIntent(*, task, priority='balanced')[source]
Intent descriptor used by the model selection policy.
- priority
Priority tradeoff between quality and speed.
- task
Task description used to classify selection intent.
- class design_research_agents._model_selection._types.ModelSelectionPolicyConfig(*, policy_id='default', prefer_local=True, ram_reserve_gb=2.0, vram_reserve_gb=0.5, max_load_ratio=0.85, remote_cost_floor_usd=0.02, default_max_latency_ms=None)[source]
Configuration controlling model selection behavior.
- policy_id
Identifier used for traceability.
- Type:
str
- prefer_local
Whether to prefer local models by default.
- Type:
bool
- ram_reserve_gb
Reserved system RAM in GiB.
- Type:
float
- vram_reserve_gb
Reserved GPU VRAM in GiB.
- Type:
float
- max_load_ratio
Load ratio threshold to prefer remote.
- Type:
float
- remote_cost_floor_usd
Cost below which remote is avoided.
- Type:
float
- default_max_latency_ms
Default latency cap when none is provided.
- Type:
int | None
- default_max_latency_ms
Default latency bound applied when callers provide none.
- max_load_ratio
System load threshold above which remote models are preferred.
- policy_id
Policy identifier used for traceability.
- prefer_local
Whether local models are preferred by default.
- ram_reserve_gb
Reserved system RAM (GiB) not available to model workloads.
- remote_cost_floor_usd
Remote cost floor below which remote options are deprioritized.
- vram_reserve_gb
Reserved GPU VRAM (GiB) not available to model workloads.
- class design_research_agents._model_selection._types.ModelSpec(*, model_id, provider, family, size_b, format, quantization, memory_hint, latency_hint, cost_hint, quality_tier, speed_tier)[source]
Catalog entry describing one model option.
- model_id
Unique model identifier used by backends.
- Type:
str
- provider
Backend or provider name.
- Type:
str
- family
Model family grouping label.
- Type:
str
- size_b
Approximate parameter count in billions.
- Type:
float | None
- format
Storage or API format identifier.
- Type:
str | None
- quantization
Quantization name when applicable.
- Type:
str | None
- memory_hint
Optional memory requirement hints.
- Type:
design_research_agents._model_selection._types.ModelMemoryHint | None
- latency_hint
Optional latency hints.
- Type:
design_research_agents._model_selection._types.ModelLatencyHint | None
- cost_hint
Optional cost hints.
- Type:
design_research_agents._model_selection._types.ModelCostHint | None
- quality_tier
Relative quality score (higher is better).
- Type:
int | None
- speed_tier
Relative speed score (higher is faster).
- Type:
int | None
- cost_hint
Optional cost profile for this model.
- family
Model family grouping (for reporting and routing heuristics).
- format
Model format identifier (for example GGUF or API-native).
- property is_local
Return True when the model runs locally.
- Returns:
True when the provider is a local backend.
- latency_hint
Optional latency profile for this model.
- memory_hint
Optional memory requirements for this model.
- model_id
Provider-specific model identifier.
- provider
Provider/backend key used to execute the model.
- quality_tier
Relative quality ranking used by policy scoring.
- quantization
Quantization descriptor when applicable.
- size_b
Approximate parameter count in billions.
- speed_tier
Relative speed ranking used by policy scoring.
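The tier fields are relative scores where higher is better or faster, and is_local is derived from the provider. The sketch below shows one way such fields could feed a ranking. Everything in it is an assumption for illustration: the Spec stand-in, the LOCAL_PROVIDERS set, the scoring function, and the "quality" and "speed" priority values (only 'balanced' appears on this page).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical provider keys treated as local backends.
LOCAL_PROVIDERS = frozenset({"local-example"})


@dataclass(frozen=True)
class Spec:
    """Hypothetical stand-in for ModelSpec (subset of fields)."""

    model_id: str
    provider: str
    quality_tier: Optional[int] = None
    speed_tier: Optional[int] = None

    @property
    def is_local(self) -> bool:
        return self.provider in LOCAL_PROVIDERS


def score(spec: Spec, priority: str = "balanced"):
    """Return a sortable key; missing tiers count as zero."""
    quality = spec.quality_tier or 0
    speed = spec.speed_tier or 0
    if priority == "quality":
        return (quality, speed)
    if priority == "speed":
        return (speed, quality)
    # Balanced: sum both tiers, break ties on quality.
    return (quality + speed, quality)


small = Spec("example-small", "local-example", quality_tier=2, speed_tier=5)
large = Spec("example-large", "remote-example", quality_tier=5, speed_tier=2)
fastest = max([small, large], key=lambda s: score(s, "speed"))
best = max([small, large], key=lambda s: score(s, "quality"))
balanced = max([small, large], key=score)
```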