TransformersLocalLLMClient
==========================

``TransformersLocalLLMClient`` runs inference locally via the ``transformers`` stack.

Default behavior
----------------

- Default model id and model name: ``distilgpt2``
- Default device policy: ``auto``
- Local in-process execution

Constructor-first usage
-----------------------

.. code-block:: python

   from design_research_agents import TransformersLocalLLMClient
   from design_research_agents.llm import LLMMessage, LLMRequest

   client = TransformersLocalLLMClient(model_id="distilgpt2")
   response = client.generate(
       LLMRequest(
           messages=(
               LLMMessage(role="user", content="Summarize this transcript section."),
           ),
           model=client.default_model(),
       )
   )

Dependencies and environment
----------------------------

- Install the transformers backend extras: ``pip install -e ".[transformers]"``
- Sufficient local CPU/GPU memory for the selected model

Model notes for local runs
--------------------------

- Start with smaller instruct checkpoints for fast iteration and lower memory pressure.
- Move to larger checkpoints only when quality gains justify the added latency and footprint.
- Keep ``default_model`` aligned with a checkpoint your runtime can sustain for repeated workflow runs.

Examples
--------

- ``examples/clients/transformers_local_client.py``

Attribution
-----------

- Docs: `Hugging Face Transformers docs `_
- Homepage: `Hugging Face `_
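Device policy sketch
--------------------

The ``auto`` device policy can be approximated with a small resolver that prefers a GPU when one is visible and falls back to CPU otherwise. This is a hedged sketch under stated assumptions: ``resolve_device`` is a hypothetical helper name, and the client's actual ``auto`` logic may differ.

.. code-block:: python

   def resolve_device(policy: str = "auto") -> str:
       """Map a device policy string to a concrete device name.

       Hypothetical sketch: prefers CUDA, then Apple MPS, then CPU.
       """
       if policy != "auto":
           return policy  # explicit device, e.g. "cpu" or "cuda:0"
       try:
           # torch is optional here; it is only used for hardware detection.
           import torch

           if torch.cuda.is_available():
               return "cuda"
           mps = getattr(torch.backends, "mps", None)
           if mps is not None and mps.is_available():
               return "mps"
       except ImportError:
           pass
       return "cpu"

A resolver like this keeps device selection in one place, so explicit overrides (``resolve_device("cpu")``) and the automatic path share the same entry point.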