# LLM Clients
The framework exposes constructor-first client classes that implement the
`LLMClient` contract. Choose a client based on deployment constraints first,
then tune model selection.

Every public client also exposes a shared introspection surface for debugging
and diagnostics: `default_model()`, `capabilities()`, `config_snapshot()`,
`server_snapshot()`, and `describe()`.

All public built-in clients also implement `close()` and support `with`-statement
usage. Examples prefer the context-manager form, while `close()` remains
available for explicit lifecycle control.
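The shared surface above can be illustrated with a minimal stub. This is a hypothetical sketch, not the framework's actual implementation: the method names come from the list above, while the class name, constructor, and return values are placeholders.

```python
class StubLLMClient:
    """Hypothetical client illustrating the shared introspection
    surface and lifecycle contract; bodies are placeholders."""

    def __init__(self, model: str = "stub-model") -> None:
        self._model = model
        self._closed = False

    # --- introspection surface ---
    def default_model(self) -> str:
        return self._model

    def capabilities(self) -> dict:
        # Placeholder capability flags; real clients report their own.
        return {"streaming": False, "tools": False}

    def config_snapshot(self) -> dict:
        return {"model": self._model}

    def server_snapshot(self) -> dict:
        # In-process stub: no server state to report.
        return {}

    def describe(self) -> dict:
        # Aggregates the other introspection calls into one view.
        return {
            "model": self.default_model(),
            "capabilities": self.capabilities(),
            "config": self.config_snapshot(),
            "server": self.server_snapshot(),
        }

    # --- lifecycle ---
    def close(self) -> None:
        self._closed = True

    def __enter__(self) -> "StubLLMClient":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()


# Context-manager form: close() runs automatically on exit.
with StubLLMClient() as client:
    info = client.describe()
```

The same shape applies to every built-in client: `with` guarantees `close()` runs even on exceptions, while an explicit `close()` call suits longer-lived client objects.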
For install profiles and platform constraints for backend extras, see Dependencies and Extras.
## Comparison matrix

| Client | Execution location | Default model behavior | Setup burden | Privacy / cost / latency profile |
|---|---|---|---|---|
| `LlamaCppServerLLMClient` | Local managed llama.cpp server | Defaults to … | Medium (local runtime + model download) | Strong privacy, lowest marginal cost, variable latency by hardware |
| `TransformersLocalLLMClient` | Local in-process transformers runtime | Defaults to … | Medium-high (framework + model weights) | Strong privacy, lowest marginal cost, latency depends on device |
| `MLXLocalLLMClient` | Local Apple MLX runtime | Defaults to … | Medium (Apple silicon + MLX stack) | Strong privacy, lowest marginal cost, strong local throughput on Apple hardware |
| `VLLMServerLLMClient` | Local/self-hosted vLLM OpenAI-compatible server | Defaults to … | Medium-high (server runtime + model provisioning) | Strong privacy, strong throughput, ops overhead for service management |
| `OllamaLLMClient` | Local/self-hosted Ollama daemon | Defaults to … | Low-medium (Ollama runtime + model pull) | Strong privacy, low setup burden, laptop-friendly local serving |
| `SGLangServerLLMClient` | Local/self-hosted SGLang OpenAI-compatible server | Defaults to … | Medium-high (server runtime + model provisioning) | Strong privacy, high throughput, ops overhead for service management |
| `OpenAIServiceLLMClient` | Remote OpenAI API | Defaults to … | Low (API key) | Lowest setup effort, network/data egress tradeoff, usage-based cost |
| `AzureOpenAIServiceLLMClient` | Remote Azure OpenAI API | Defaults to … | Low-medium (API key + endpoint + API version) | Azure-native hosted option, network/data egress tradeoff, Azure-managed billing |
| `AnthropicServiceLLMClient` | Remote Anthropic API | Defaults to … | Low (API key) | Low setup effort, network/data egress tradeoff, usage-based cost |
| `GeminiServiceLLMClient` | Remote Gemini API | Defaults to … | Low (API key) | Low setup effort, network/data egress tradeoff, usage-based cost |
| `GroqServiceLLMClient` | Remote Groq API | Defaults to … | Low (API key) | Low setup effort, network/data egress tradeoff, usage-based cost |
| `OpenAICompatibleHTTPLLMClient` | Remote or local OpenAI-compatible endpoint | Defaults to … | Low-medium (compatible server + endpoint config) | Flexible privacy/cost posture based on endpoint hosting |
## When to choose what

- Need strict data-local execution: start with `LlamaCppServerLLMClient`, `TransformersLocalLLMClient`, `MLXLocalLLMClient`, `VLLMServerLLMClient`, `OllamaLLMClient`, or `SGLangServerLLMClient`.
- Need fastest onboarding and hosted quality: use `OpenAIServiceLLMClient`, `AzureOpenAIServiceLLMClient`, `AnthropicServiceLLMClient`, `GeminiServiceLLMClient`, or `GroqServiceLLMClient`.
- Need provider portability or self-hosted OpenAI-compatible infra: use `OpenAICompatibleHTTPLLMClient`.
- Need policy-driven choice between local and remote options: use Model Selection.
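These decision rules can be sketched as a small helper. This is a hypothetical illustration, not part of the framework API; only the client class names come from the documentation.

```python
def pick_clients(data_local: bool, openai_compatible_infra: bool = False) -> list[str]:
    """Hypothetical helper mapping deployment constraints to
    candidate client class names, in the order recommended above."""
    if openai_compatible_infra:
        # Provider portability / self-hosted OpenAI-compatible infra.
        return ["OpenAICompatibleHTTPLLMClient"]
    if data_local:
        # Strict data-local execution: local and self-hosted backends.
        return [
            "LlamaCppServerLLMClient",
            "TransformersLocalLLMClient",
            "MLXLocalLLMClient",
            "VLLMServerLLMClient",
            "OllamaLLMClient",
            "SGLangServerLLMClient",
        ]
    # Fastest onboarding and hosted quality: remote provider APIs.
    return [
        "OpenAIServiceLLMClient",
        "AzureOpenAIServiceLLMClient",
        "AnthropicServiceLLMClient",
        "GeminiServiceLLMClient",
        "GroqServiceLLMClient",
    ]
```

For choices that must be made per request rather than at import time, the Model Selection policy layer is the intended mechanism.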
## See examples

- `examples/clients/README.md`