# TransformersLocalLLMClient

`TransformersLocalLLMClient` runs inference locally via the `transformers` stack.
## Default behavior

- Default model id and model name: `distilgpt2`
- Default device policy: `auto`
- Local in-process execution
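The `auto` device policy amounts to: prefer a GPU when one is visible to the runtime, otherwise fall back to CPU. A minimal sketch of such a resolver (a hypothetical helper for illustration, not part of the client's public API):

```python
def resolve_device(policy: str = "auto") -> str:
    # Hypothetical sketch of an "auto" device policy:
    # an explicit policy string is passed through unchanged,
    # "auto" probes for a usable CUDA device and falls back to CPU.
    if policy != "auto":
        return policy
    try:
        import torch  # optional dependency; may be absent
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"
```

With this shape, passing an explicit device (e.g. `"cpu"`) always wins, and `"auto"` degrades gracefully on machines without a GPU stack installed.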
## Constructor-first usage

```python
from design_research_agents import TransformersLocalLLMClient
from design_research_agents.llm import LLMMessage, LLMRequest

client = TransformersLocalLLMClient(model_id="distilgpt2")
response = client.generate(
    LLMRequest(
        messages=(LLMMessage(role="user", content="Summarize this transcript section."),),
        model=client.default_model(),
    )
)
```
## Dependencies and environment

- Install the transformers backend extras:

  ```shell
  pip install -e ".[transformers]"
  ```

- Sufficient local CPU/GPU memory for the selected model
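Since the backend is an optional extra, a preflight check can give a clearer error than a late `ImportError`. A small illustrative check (not part of the package):

```python
import importlib.util

def transformers_available() -> bool:
    # True when the optional `transformers` extra is importable
    # in the current environment, without actually importing it.
    return importlib.util.find_spec("transformers") is not None

if not transformers_available():
    print('Missing backend; install with: pip install -e ".[transformers]"')
```

Using `find_spec` avoids paying the import cost of `transformers` just to test for its presence.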
## Model notes for local runs

- Start with smaller instruct checkpoints for fast iteration and lower memory pressure.
- Move to larger checkpoints only when quality gains justify the latency/footprint.
- Keep `default_model` aligned with the checkpoint your runtime can sustain for repeated workflow runs.
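One rough way to reason about what your runtime can sustain is to compare a checkpoint's approximate weight footprint against your memory budget. A back-of-the-envelope sketch (illustrative only: the parameter counts are approximations, the `CHECKPOINTS` table is hypothetical, and real usage also includes activations and KV cache):

```python
# Approximate parameter counts (in billions) for a few small checkpoints.
CHECKPOINTS = [
    ("distilgpt2", 0.082),   # ~82M params
    ("gpt2", 0.124),         # ~124M params
    ("gpt2-medium", 0.355),  # ~355M params
]

def pick_checkpoint(budget_gb: float, bytes_per_param: int = 4) -> str:
    # Keep checkpoints whose raw fp32 weight footprint fits the budget,
    # then take the largest; fall back to the smallest checkpoint.
    fitting = [
        name for name, params in CHECKPOINTS
        if params * 1e9 * bytes_per_param <= budget_gb * 1e9
    ]
    return fitting[-1] if fitting else CHECKPOINTS[0][0]
```

The same heuristic extends to quantized weights by lowering `bytes_per_param` (e.g. `2` for fp16).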
## Examples

- `examples/clients/transformers_local_client.py`
## Attribution

- Homepage: [Hugging Face](https://huggingface.co)