Condition Pair Significance#

Source: examples/condition_pair_significance.py

Introduction#

Turn canonical runs.csv / conditions.csv / evaluations.csv-style rows into a joined condition-metric table, then compute pairwise permutation tests and effect sizes without hand-rolled analysis glue.

Technical Implementation#

  1. Define in-memory canonical export rows for runs, conditions, and evaluations.

  2. Build a normalized run-level table for market_share_proxy by joining condition labels onto evaluation metrics.

  3. Compare ordered condition pairs and print a concise brief plus one significance row shaped for design_research_experiments.render_significance_brief.
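The join in step 2 can be sketched in plain Python. This is an illustrative stand-in, not the library's actual implementation: the helper name `join_condition_metrics` and the exact output row shape are assumptions, and `design_research_analysis.build_condition_metric_table` may normalize differently.

```python
# Hypothetical sketch of the run-level join in step 2. The helper name and
# row shape are illustrative; build_condition_metric_table may differ.

def join_condition_metrics(runs, conditions, evaluations, metric, condition_column):
    """Attach a condition label to each run's value for one metric."""
    # Look up the human-readable label for each condition_id.
    label_by_condition = {c["condition_id"]: c[condition_column] for c in conditions}
    # Look up which condition each run belongs to.
    condition_by_run = {r["run_id"]: r["condition_id"] for r in runs}
    rows = []
    for ev in evaluations:
        if ev["metric_name"] != metric:
            continue  # keep only the requested metric
        cond_id = condition_by_run.get(ev["run_id"])
        if cond_id is None:
            continue  # skip evaluations whose run is not in the export
        rows.append({
            "run_id": ev["run_id"],
            condition_column: label_by_condition[cond_id],
            "metric_value": ev["metric_value"],
        })
    return rows
```

The result is one flat row per (run, metric) pair, which is the shape pairwise comparisons need: a label column to group by and a numeric column to compare.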

from __future__ import annotations

import design_research_analysis as dran


def main() -> None:
    """Run a compact condition-comparison workflow over canonical export rows."""
    runs = [
        {"run_id": "run-1", "condition_id": "cond-random"},
        {"run_id": "run-2", "condition_id": "cond-random"},
        {"run_id": "run-3", "condition_id": "cond-random"},
        {"run_id": "run-4", "condition_id": "cond-neutral"},
        {"run_id": "run-5", "condition_id": "cond-neutral"},
        {"run_id": "run-6", "condition_id": "cond-neutral"},
        {"run_id": "run-7", "condition_id": "cond-profit"},
        {"run_id": "run-8", "condition_id": "cond-profit"},
        {"run_id": "run-9", "condition_id": "cond-profit"},
    ]

    conditions = [
        {"condition_id": "cond-random", "selection_strategy": "random_selection"},
        {"condition_id": "cond-neutral", "selection_strategy": "neutral_prompt"},
        {"condition_id": "cond-profit", "selection_strategy": "profit_focus_prompt"},
    ]

    evaluations = [
        {"run_id": "run-1", "metric_name": "market_share_proxy", "metric_value": 0.40},
        {"run_id": "run-2", "metric_name": "market_share_proxy", "metric_value": 0.43},
        {"run_id": "run-3", "metric_name": "market_share_proxy", "metric_value": 0.41},
        {"run_id": "run-4", "metric_name": "market_share_proxy", "metric_value": 0.57},
        {"run_id": "run-5", "metric_name": "market_share_proxy", "metric_value": 0.59},
        {"run_id": "run-6", "metric_name": "market_share_proxy", "metric_value": 0.60},
        {"run_id": "run-7", "metric_name": "market_share_proxy", "metric_value": 0.69},
        {"run_id": "run-8", "metric_name": "market_share_proxy", "metric_value": 0.72},
        {"run_id": "run-9", "metric_name": "market_share_proxy", "metric_value": 0.71},
    ]

    joined = dran.build_condition_metric_table(
        runs,
        metric="market_share_proxy",
        condition_column="selection_strategy",
        conditions=conditions,
        evaluations=evaluations,
    )
    report = dran.compare_condition_pairs(
        joined,
        condition_pairs=[
            ("neutral_prompt", "random_selection"),
            ("profit_focus_prompt", "neutral_prompt"),
            ("profit_focus_prompt", "random_selection"),
        ],
        alternative="greater",
        seed=17,
    )

    print(f"Joined rows: {len(joined)}")
    print(report.render_brief())
    print(report.to_significance_rows()[0])


if __name__ == "__main__":
    main()
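The `compare_condition_pairs` call hides the resampling mechanics. A minimal sketch of a one-sided ("greater") permutation test on a single condition pair might look like the following; the function name `permutation_pvalue` and the add-one p-value smoothing are assumptions for illustration, and the library's internals may differ.

```python
# Illustrative one-sided permutation test for one condition pair.
# This is a sketch of the general technique, not the library's code.
import random


def permutation_pvalue(a, b, n_resamples=10_000, seed=17):
    """Estimate P(mean(a) - mean(b) >= observed) under random relabeling."""
    rng = random.Random(seed)  # fixed seed makes the estimate reproducible
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        perm_a = pooled[: len(a)]
        perm_b = pooled[len(a):]
        if sum(perm_a) / len(a) - sum(perm_b) / len(b) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

With only three runs per condition, an exact test over all label permutations is feasible and the smallest achievable p-value is bounded below (1/20 for a 3-vs-3 split), which is worth keeping in mind when reading the brief.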

Expected Results#

Run Command

PYTHONPATH=src python examples/condition_pair_significance.py

Prints the normalized joined-row count, a markdown-ready condition comparison brief, and the first structured significance row.
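The effect sizes mentioned in the introduction are commonly reported as a pooled-standard-deviation standardized mean difference (Cohen's d). The sketch below assumes that convention; the library may use a different estimator.

```python
# Pooled-SD standardized mean difference (Cohen's d), one common
# effect-size convention; assumed here for illustration.
import statistics


def cohens_d(a, b):
    """Mean difference divided by the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5
```

On tightly clustered groups like the example data, d can be very large even though the permutation p-value is floor-limited by the tiny sample size, so the brief's two numbers answer different questions.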

References#

  • docs/experiments_handoff.rst

  • docs/analysis_recipes.rst