MLXLocalLLMClient

MLXLocalLLMClient runs inference locally through Apple's MLX framework.

Default behavior

  • Default model id and name: mlx-community/Qwen2.5-1.5B-Instruct-4bit

  • Local execution optimized for Apple silicon

Constructor-first usage

from design_research_agents import MLXLocalLLMClient
from design_research_agents.llm import LLMMessage, LLMRequest

client = MLXLocalLLMClient()
response = client.generate(
    LLMRequest(
        messages=(LLMMessage(role="user", content="Produce three concise insights."),),
        model=client.default_model(),
    )
)
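The request shapes used above can be sketched as simple frozen dataclasses. This is a self-contained illustration only; the real LLMMessage and LLMRequest classes live in design_research_agents.llm and may carry additional fields.

```python
# Illustrative stand-ins for the request/message containers used in the
# usage example above; the real classes may differ.
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMMessage:
    role: str
    content: str


@dataclass(frozen=True)
class LLMRequest:
    messages: tuple  # tuple of LLMMessage
    model: str


request = LLMRequest(
    messages=(LLMMessage(role="user", content="Produce three concise insights."),),
    model="mlx-community/Qwen2.5-1.5B-Instruct-4bit",  # documented default id
)
```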

Dependencies and environment

  • Install MLX backend extras: pip install -e ".[mlx]"

  • Requires an Apple silicon environment with the MLX stack available
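Before installing the extras, a quick probe can confirm the host looks like Apple silicon. The helper name mlx_supported is our own; MLX itself targets arm64 macOS.

```python
# Hypothetical environment probe: MLX requires Apple silicon (arm64 macOS).
import platform


def mlx_supported() -> bool:
    """Return True when the host looks like an Apple silicon Mac."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"
```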

Model notes for local runs

  • Prefer quantized MLX-ready instruct checkpoints for better on-device throughput.

  • Treat model size as a latency/quality dial; validate with representative prompts before scaling up.

  • Pair with Model Selection when you need hardware-aware fallback behavior.
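The latency/quality dial above can be sketched as a small tier table. Only the "small" id is documented here; the larger checkpoint id and the pick_model helper are illustrative assumptions to be validated against mlx-community before use.

```python
# Hedged sketch of a hardware-aware model dial, per the notes above.
MODEL_TIERS = {
    "small": "mlx-community/Qwen2.5-1.5B-Instruct-4bit",  # documented default
    "medium": "mlx-community/Qwen2.5-7B-Instruct-4bit",   # assumed id; verify it exists
}


def pick_model(tier: str = "small") -> str:
    # Fall back to the smallest checkpoint when a tier is unknown,
    # favoring latency over quality by default.
    return MODEL_TIERS.get(tier, MODEL_TIERS["small"])
```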

Examples

  • examples/clients/mlx_local_client.py

Attribution