# TransformersLocalLLMClient

`TransformersLocalLLMClient` runs inference locally via the `transformers` stack.
## Default behavior

- Default model id and model name: `distilgpt2`
- Default device policy: `auto`
- Local in-process execution
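The `auto` device policy amounts to: prefer a GPU when one is visible to the runtime, otherwise fall back to CPU. A minimal sketch of such a resolver (a hypothetical helper for illustration, not part of the client's public API):

```python
def resolve_device(policy: str = "auto") -> str:
    # Hypothetical sketch of an "auto" device policy:
    # an explicit policy string is passed through unchanged,
    # "auto" probes for a usable CUDA device and falls back to CPU.
    if policy != "auto":
        return policy
    try:
        import torch  # optional dependency; may be absent
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"
```

With this shape, passing an explicit device (e.g. `"cpu"`) always wins, and `"auto"` degrades gracefully on machines without a GPU stack installed.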
## Constructor-first usage

```python
from design_research_agents import TransformersLocalLLMClient
from design_research_agents.llm import LLMMessage, LLMRequest

client = TransformersLocalLLMClient(model_id="distilgpt2")
response = client.generate(
    LLMRequest(
        messages=(LLMMessage(role="user", content="Summarize this transcript section."),),
        model=client.default_model(),
    )
)
```
## Dependencies and environment

- Install the transformers backend extras:

  ```shell
  pip install -e ".[transformers]"
  ```

- Sufficient local CPU/GPU memory for the selected model
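Since the backend is an optional extra, a preflight check can give a clearer error than a late `ImportError`. A small illustrative check (not part of the package):

```python
import importlib.util

def transformers_available() -> bool:
    # True when the optional `transformers` extra is importable
    # in the current environment, without actually importing it.
    return importlib.util.find_spec("transformers") is not None

if not transformers_available():
    print('Missing backend; install with: pip install -e ".[transformers]"')
```

Using `find_spec` avoids paying the import cost of `transformers` just to test for its presence.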
## Model notes for local runs

- Start with smaller instruct checkpoints for fast iteration and lower memory pressure.
- Move to larger checkpoints only when quality gains justify the latency/footprint.
- Keep `default_model` aligned with the checkpoint your runtime can sustain for repeated workflow runs.
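One rough way to reason about what your runtime can sustain is to compare a checkpoint's approximate weight footprint against your memory budget. A back-of-the-envelope sketch (illustrative only: the parameter counts are approximations, the `CHECKPOINTS` table is hypothetical, and real usage also includes activations and KV cache):

```python
# Approximate parameter counts (in billions) for a few small checkpoints.
CHECKPOINTS = [
    ("distilgpt2", 0.082),   # ~82M params
    ("gpt2", 0.124),         # ~124M params
    ("gpt2-medium", 0.355),  # ~355M params
]

def pick_checkpoint(budget_gb: float, bytes_per_param: int = 4) -> str:
    # Keep checkpoints whose raw fp32 weight footprint fits the budget,
    # then take the largest; fall back to the smallest checkpoint.
    fitting = [
        name for name, params in CHECKPOINTS
        if params * 1e9 * bytes_per_param <= budget_gb * 1e9
    ]
    return fitting[-1] if fitting else CHECKPOINTS[0][0]
```

The same heuristic extends to quantized weights by lowering `bytes_per_param` (e.g. `2` for fp16).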
## Examples

- `examples/clients/transformers_local_client.py`
## Attribution

- Homepage: [Hugging Face](https://huggingface.co)