# MLXLocalLLMClient

MLXLocalLLMClient runs local inference on Apple silicon via the MLX framework.
## Default behavior

- Default model id: `mlx-community/Qwen2.5-1.5B-Instruct-4bit`
- Local execution optimized for Apple silicon
## Constructor-first usage

```python
from design_research_agents import MLXLocalLLMClient
from design_research_agents.llm import LLMMessage, LLMRequest

client = MLXLocalLLMClient()
response = client.generate(
    LLMRequest(
        messages=(LLMMessage(role="user", content="Produce three concise insights."),),
        model=client.default_model(),
    )
)
```
## Dependencies and environment

Install the MLX backend extras:

```shell
pip install -e ".[mlx]"
```

Requires an Apple silicon environment with the MLX stack available.
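Before constructing the client, you can verify that the MLX stack is importable in the current environment. A minimal sketch; the `has_module` helper is illustrative and not part of this package:

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the named module can be imported in this environment."""
    return importlib.util.find_spec(name) is not None


# Guard client construction on MLX availability (e.g. on non-Apple hardware).
if has_module("mlx"):
    print("MLX stack available; MLXLocalLLMClient can run locally.")
else:
    print('MLX not installed; install with: pip install -e ".[mlx]"')
```

This keeps import errors out of request-handling code by failing fast at startup instead.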
## Model notes for local runs

- Prefer quantized, MLX-ready instruct checkpoints for better on-device throughput.
- Treat model size as a latency/quality dial; validate with representative prompts before scaling up.
- Pair with Model Selection when you need hardware-aware fallback behavior.
## Examples

- `examples/clients/mlx_local_client.py`
## Attribution

- Docs: MLX docs
- Homepage: MLX GitHub