TransformersLocalLLMClient

TransformersLocalLLMClient runs model inference in-process on the local machine via the Hugging Face transformers stack.

Default behavior

  • Default model id and name: distilgpt2

  • Default device policy: auto

  • Local in-process execution
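
The "auto" device policy typically means the client selects a GPU when one is available and otherwise falls back to CPU. The helper below is a hypothetical sketch of that resolution logic, not the library's actual implementation; it degrades gracefully when torch is not installed:

```python
def resolve_device(policy: str = "auto") -> str:
    # Hypothetical illustration of an "auto" device policy.
    # An explicit policy (e.g. "cpu", "cuda:0") is passed through unchanged.
    if policy != "auto":
        return policy
    try:
        import torch  # optional dependency; only needed for GPU detection
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"
```

An explicit device string always wins; "auto" is resolved once at construction time in clients that follow this pattern.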

Constructor-first usage

from design_research_agents import TransformersLocalLLMClient
from design_research_agents.llm import LLMMessage, LLMRequest

client = TransformersLocalLLMClient(model_id="distilgpt2")
response = client.generate(
    LLMRequest(
        messages=(LLMMessage(role="user", content="Summarize this transcript section."),),
        model=client.default_model(),
    )
)
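
Because distilgpt2 is a base (non-chat) checkpoint, a client like this one typically flattens the request's role-tagged messages into a single prompt string before generation. The sketch below illustrates one plausible flattening scheme; Message and to_prompt are illustrative names, not part of the design_research_agents API, and the library's actual prompt format may differ:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Message:
    # Illustrative stand-in for the library's LLMMessage.
    role: str
    content: str


def to_prompt(messages: Iterable[Message]) -> str:
    # Flatten role-tagged messages into one prompt and cue the model
    # to continue as the assistant. Hypothetical formatting.
    lines = [f"{m.role}: {m.content}" for m in messages]
    return "\n".join(lines) + "\nassistant:"


prompt = to_prompt((Message("user", "Summarize this transcript section."),))
```

Instruct-tuned checkpoints usually ship their own chat template instead, which is one reason the model notes below suggest them for anything beyond quick iteration.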

Dependencies and environment

  • Install the transformers backend extras: pip install -e ".[transformers]"

  • Sufficient local CPU/GPU memory for the selected model

Model notes for local runs

  • Start with smaller instruct checkpoints for fast iteration and lower memory pressure.

  • Move to larger checkpoints only when quality gains justify latency/footprint.

  • Keep default_model aligned with the checkpoint your runtime can sustain for repeated workflow runs.

Examples

  • examples/clients/transformers_local_client.py

Attribution