# TransformersLocalLLMClient

`TransformersLocalLLMClient` runs inference locally via the `transformers` stack.
## Default behavior

- Default model id and model name: `distilgpt2`
- Default device policy: `auto`
- Local in-process execution
## Constructor-first usage
```python
from design_research_agents import TransformersLocalLLMClient
from design_research_agents.llm import LLMMessage, LLMRequest

client = TransformersLocalLLMClient(model_id="distilgpt2")
response = client.generate(
    LLMRequest(
        messages=(LLMMessage(role="user", content="Summarize this transcript section."),),
        model=client.default_model(),
    )
)
```
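A causal LM such as `distilgpt2` consumes a single prompt string, so chat-style messages must be flattened before generation. A hedged sketch of one way a client could do this — the `Message` type and `flatten_messages` helper are illustrative, not the library's confirmed internals:

```python
from dataclasses import dataclass


# Minimal stand-in for the library's LLMMessage; illustrative only.
@dataclass(frozen=True)
class Message:
    role: str
    content: str


def flatten_messages(messages: tuple[Message, ...]) -> str:
    """Join chat messages into one prompt string for a causal LM."""
    lines = [f"{m.role}: {m.content}" for m in messages]
    lines.append("assistant:")  # trailing cue for the model to respond
    return "\n".join(lines)


prompt = flatten_messages(
    (Message(role="user", content="Summarize this transcript section."),)
)
print(prompt)
```

The exact prompt template matters for output quality; instruct-tuned checkpoints usually expect their own chat format rather than a plain `role: content` join.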
## Dependencies and environment

Install the transformers backend extras:

```shell
pip install -e ".[transformers]"
```

You will also need sufficient local CPU/GPU memory for the selected model.
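Because the backend is an optional extra, it can be useful to verify the dependency is importable before constructing the client. A small stdlib-only check (the printed hint is illustrative):

```python
import importlib.util


def transformers_available() -> bool:
    """Return True if the transformers package can be imported."""
    return importlib.util.find_spec("transformers") is not None


if not transformers_available():
    # Assumption: the project is installed from source with extras support.
    print('transformers not installed; run: pip install -e ".[transformers]"')
```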
## Model notes for local runs

- Start with smaller instruct checkpoints for fast iteration and lower memory pressure.
- Move to larger checkpoints only when quality gains justify the added latency and footprint.
- Keep `default_model` aligned with the checkpoint your runtime can sustain for repeated workflow runs.
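One way to encode this guidance is a helper that maps a rough memory budget to a checkpoint tier. The thresholds and tier choices below are assumptions for the sketch, not recommendations from the library:

```python
# Illustrative mapping from an approximate memory budget (GB) to a
# checkpoint. Thresholds are assumptions, not guidance from
# design_research_agents; measure your own workload before committing.
CHECKPOINT_TIERS = [
    (2.0, "distilgpt2"),  # small: fast iteration, low memory pressure
    (8.0, "gpt2-large"),  # medium
    (16.0, "gpt2-xl"),    # large: only if quality gains justify the footprint
]


def pick_checkpoint(memory_gb: float) -> str:
    """Pick the largest checkpoint whose minimum budget fits in memory_gb."""
    chosen = CHECKPOINT_TIERS[0][1]
    for budget, model_id in CHECKPOINT_TIERS:
        if memory_gb >= budget:
            chosen = model_id
    return chosen


print(pick_checkpoint(4.0))   # distilgpt2
print(pick_checkpoint(20.0))  # gpt2-xl
```

Whatever policy you use, the point of the bullet list above is to keep the checkpoint named by `default_model` sustainable across repeated runs, not just a one-off generation.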
## Examples

- `examples/clients/transformers_local_client.py`
## Attribution

Homepage: Hugging Face