MLXLocalLLMClient

MLXLocalLLMClient runs inference locally through Apple's MLX framework.

Default behavior

  • Default model id and name: mlx-community/Qwen2.5-1.5B-Instruct-4bit

  • Local execution optimized for Apple silicon

Constructor-first usage

from design_research_agents import MLXLocalLLMClient
from design_research_agents.llm import LLMMessage, LLMRequest

client = MLXLocalLLMClient()
response = client.generate(
    LLMRequest(
        messages=(LLMMessage(role="user", content="Produce three concise insights."),),
        model=client.default_model(),
    )
)
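The request shapes used above can be sketched as simple frozen dataclasses. This is a self-contained illustration only; the real LLMMessage and LLMRequest classes live in design_research_agents.llm and may carry additional fields.

```python
# Illustrative stand-ins for the request/message containers used in the
# usage example above; the real classes may differ.
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMMessage:
    role: str
    content: str


@dataclass(frozen=True)
class LLMRequest:
    messages: tuple  # tuple of LLMMessage
    model: str


request = LLMRequest(
    messages=(LLMMessage(role="user", content="Produce three concise insights."),),
    model="mlx-community/Qwen2.5-1.5B-Instruct-4bit",  # documented default id
)
```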

Dependencies and environment

  • Install MLX backend extras: pip install -e ".[mlx]"

  • Requires an Apple silicon environment with the MLX stack available
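Before installing the extras, a quick probe can confirm the host looks like Apple silicon. The helper name mlx_supported is our own; MLX itself targets arm64 macOS.

```python
# Hypothetical environment probe: MLX requires Apple silicon (arm64 macOS).
import platform


def mlx_supported() -> bool:
    """Return True when the host looks like an Apple silicon Mac."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"
```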

Model notes for local runs

  • Prefer quantized MLX-ready instruct checkpoints for better on-device throughput.

  • Treat model size as a latency/quality dial; validate with representative prompts before scaling up.

  • Pair with Model Selection when you need hardware-aware fallback behavior.
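The latency/quality dial above can be sketched as a small tier table. Only the "small" id is documented here; the larger checkpoint id and the pick_model helper are illustrative assumptions to be validated against mlx-community before use.

```python
# Hedged sketch of a hardware-aware model dial, per the notes above.
MODEL_TIERS = {
    "small": "mlx-community/Qwen2.5-1.5B-Instruct-4bit",  # documented default
    "medium": "mlx-community/Qwen2.5-7B-Instruct-4bit",   # assumed id; verify it exists
}


def pick_model(tier: str = "small") -> str:
    # Fall back to the smallest checkpoint when a tier is unknown,
    # favoring latency over quality by default.
    return MODEL_TIERS.get(tier, MODEL_TIERS["small"])
```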

Examples

  • examples/clients/mlx_local_client.py

Attribution