Unified Table Schema#

Purpose#

The unified table schema is the canonical input contract for every analysis family in this package. It keeps pipelines repeatable while still accepting loosely structured real-world data, in which some of the recommended columns may be missing.

If your input originated in design-research-experiments, see Experiments-To-Analysis Handoff for the recommended events.csv validation and join workflow.

Column Expectations#

Required:

  • timestamp

Strongly recommended:

  • record_id

  • text

  • session_id

  • actor_id

  • event_type

Optional:

  • meta_json
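The three tiers above can be checked before rows are handed to the package. The following is a minimal standalone sketch in plain Python. The `check_columns` helper, its return shape, and the rule that a column counts as present only when every row has a non-None value are all assumptions made for illustration; they are not part of design_research_analysis, which provides its own validate_unified_table.

```python
# Hypothetical pre-flight check; NOT the package's own validator.
REQUIRED = ("timestamp",)
RECOMMENDED = ("record_id", "text", "session_id", "actor_id", "event_type")
OPTIONAL = ("meta_json",)  # listed for completeness; never reported as missing


def check_columns(rows):
    """Return (missing_required, missing_recommended) column names.

    Assumption: a column is "present" only if every row has a non-None value.
    """
    def present(name):
        return all(row.get(name) is not None for row in rows)

    missing_required = [c for c in REQUIRED if not present(c)]
    missing_recommended = [c for c in RECOMMENDED if not present(c)]
    return missing_required, missing_recommended


rows = [
    {"timestamp": "2026-01-01T10:00:00Z", "text": "hello"},
    {"timestamp": "2026-01-01T10:00:01Z", "text": "world"},
]
missing_required, missing_recommended = check_columns(rows)
print("missing required:", missing_required)
print("missing recommended:", missing_recommended)
```

A non-empty first list would mean the table cannot satisfy the contract at all, while the second list points at columns that may need to be derived or supplied upstream.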

Loose Schema Strategy#

Missing values for actor_id and event_type can be derived with deterministic mapper functions before running sequence analyses.
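To illustrate what a deterministic mapper looks like, the sketch below fills a column from each row's own content using plain Python. It is not the package's implementation: `fill_missing` and `actor_from_speaker` are hypothetical names, the `speaker` field is just an example source column, and the choice to leave existing values untouched is an assumption.

```python
def fill_missing(rows, column, mapper):
    """Return new rows where `column` is set by `mapper` when absent or None.

    Assumption: values that are already present are preserved as-is.
    """
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        if new_row.get(column) is None:
            new_row[column] = mapper(row)
        filled.append(new_row)
    return filled


def actor_from_speaker(row):
    # Deterministic: the result depends only on the row's own content.
    return row["speaker"].strip().lower()


rows = [
    {"timestamp": "2026-01-01T10:00:00Z", "speaker": "Alice "},
    {"timestamp": "2026-01-01T10:00:01Z", "speaker": "bob", "actor_id": "b-01"},
]
derived = fill_missing(rows, "actor_id", actor_from_speaker)
derived = fill_missing(derived, "event_type", lambda _row: "utterance")
```

Because each mapper reads only the row it is given, rerunning the derivation on the same input yields the same table, which is what keeps downstream sequence analyses repeatable.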

In the experiments export handoff, record_id may also be derived when the upstream artifact keeps stable event rows but does not emit explicit record identifiers.
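One way such a record_id could be derived is by hashing the stable content of each row. The sketch below is only an illustration under that assumption: `derive_record_id`, the chosen field list, and the 16-character truncation are hypothetical and do not describe how the handoff workflow actually computes identifiers.

```python
import hashlib
import json


def derive_record_id(
    row,
    fields=("timestamp", "session_id", "actor_id", "event_type", "text"),
):
    """Build a stable identifier from a row's content (hypothetical scheme)."""
    payload = {name: row.get(name) for name in fields}
    # sort_keys and fixed separators make the serialization independent of key order
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:16]


row = {"timestamp": "2026-01-01T10:00:00Z", "text": "hello", "actor_id": "alice"}
same_row_reordered = {"actor_id": "alice", "text": "hello", "timestamp": "2026-01-01T10:00:00Z"}

record_id = derive_record_id(row)
print(record_id == derive_record_id(same_row_reordered))
```

Two rows with identical values in the hashed fields would receive the same identifier under this scheme, so an upstream row position or sequence number may need to be added to the hashed fields when exact duplicates are possible.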

Key API surfaces include derive_columns and validate_unified_table, both used in the example below.

Example#

from design_research_analysis import (
    derive_columns,
    validate_unified_table,
)

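# Raw rows carry the required timestamp but no actor_id or event_type yet.
rows = [
    {"timestamp": "2026-01-01T10:00:00Z", "text": "hello", "speaker": "alice"},
    {"timestamp": "2026-01-01T10:00:01Z", "text": "world", "speaker": "bob"},
]

# Derive the missing columns with deterministic mappers.
rows = derive_columns(
    rows,
    actor_mapper=lambda row: row["speaker"],  # actor taken from the speaker field
    event_mapper=lambda _row: "utterance",  # same event type for every row
)

# Validate the derived rows against the unified table schema.
report = validate_unified_table(rows)
assert report.is_valid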