API#

This page documents the supported top-level public API, as exported through design_research_analysis.__all__.

Top-level groups:

  • Package metadata: __version__

  • Comparison: ComparisonResult

  • Module facades: dataset, embedding_maps, integration, language, runtime, sequence, stats, visualization

  • Unified table contracts: UnifiedTableConfig, UnifiedTableValidationReport, coerce_unified_table, derive_columns, validate_unified_table

  • Sequence: MarkovChainResult, DiscreteHMMResult, GaussianHMMResult, DecodeResult, fit_markov_chain_from_table, fit_discrete_hmm_from_table, fit_text_gaussian_hmm_from_table, decode_hmm, plot_transition_matrix, plot_state_graph

  • Visualization: plot_design_process_timeline, plot_idea_trajectory, plot_convergence_curve

  • Language: compute_language_convergence, compute_semantic_distance_trajectory, fit_topic_model, score_sentiment

  • Embedding maps: EmbeddingResult, EmbeddingMapResult, embed_records, build_embedding_map, cluster_embedding_map, compare_embedding_maps, compute_design_space_coverage, compute_idea_space_trajectory, compute_divergence_convergence, plot_embedding_map, plot_embedding_map_grid

  • Statistics: compare_groups, fit_regression, fit_mixed_effects, permutation_test, build_condition_metric_table, compare_condition_pairs, bootstrap_ci, rank_tests_one_stop, estimate_sample_size, power_curve, minimum_detectable_effect

  • Dataset + runtime: profile_dataframe, validate_dataframe, generate_codebook, capture_run_context, attach_provenance, is_notebook, is_google_colab, write_run_manifest

Typed analysis result objects also support standardized comparison helpers: difference(other) and effect(other), plus operator shorthands left - right and left / right.
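The comparison algebra can be illustrated with a small self-contained sketch. Everything here (ToyResult, ToyComparison, and the pairing of - with difference() and / with effect()) is hypothetical scaffolding that mirrors the documented ComparisonResult field names; it is not the package's implementation:

```python
from dataclasses import dataclass, asdict

@dataclass
class ToyComparison:
    """Mimics a subset of the documented ComparisonResult fields."""
    operation: str
    metric: str
    estimate: float

    def to_dict(self):
        # JSON-serializable view, as ComparisonResult.to_dict is documented to give
        return asdict(self)

@dataclass
class ToyResult:
    metric: str
    estimate: float

    def difference(self, other):
        return ToyComparison("difference", self.metric, self.estimate - other.estimate)

    def effect(self, other):
        return ToyComparison("effect", self.metric, self.estimate / other.estimate)

    __sub__ = difference      # left - right  (assumed pairing)
    __truediv__ = effect      # left / right  (assumed pairing)

left, right = ToyResult("accuracy", 0.9), ToyResult("accuracy", 0.6)
diff, ratio = left - right, left / right
```

The operator shorthands simply delegate to the named helpers, so both spellings return the same structured object.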

Curated public exports for design-research-analysis.

class design_research_analysis.ComparisonResult(operation, left_type, right_type, metric, estimate, statistic=None, p_value=None, effect_size=None, details=<factory>, interpretation='')[source]#

Structured output for algebraic result-object comparisons.

details#
effect_size#
estimate#
interpretation#
left_type#
metric#
operation#
p_value#
right_type#
statistic#
to_dict()[source]#

Convert the comparison output to a JSON-serializable dictionary.

class design_research_analysis.DecodeResult(algorithm, log_probability, states, lengths, backend)[source]#

Serializable decoded-state output from an HMM.

algorithm#
backend#
lengths#
log_probability#
states#
to_dict()[source]#

Convert the decode output to a JSON-serializable dictionary.

class design_research_analysis.DiscreteHMMResult(model, backend='hmmlearn', n_states=0, seed=0, lengths=None, startprob=<factory>, transmat=<factory>, emissionprob=<factory>, vocab=<factory>, token_to_id=<factory>, train_log_likelihood=0.0, config=<factory>)[source]#

Serializable result container for a discrete-emission HMM.

backend#
config#
emissionprob#
lengths#
model#
n_states#
seed#
startprob#
to_dict()[source]#

Convert the result to a JSON-serializable dictionary.

token_to_id#
train_log_likelihood#
transmat#
vocab#
class design_research_analysis.EmbeddingMapResult(coordinates, record_ids, method, config=<factory>, explained_variance_ratio=None)[source]#

Lower-dimensional coordinates plus method metadata for one embedding map.

config#
coordinates#
explained_variance_ratio#
method#
property projection#

Compatibility alias for the legacy projection attribute name.

record_ids#
to_dict()[source]#

Convert result metadata to JSON-serializable format.

class design_research_analysis.EmbeddingResult(embeddings, record_ids, texts, config=<factory>)[source]#

Embedding output container.

config#
embeddings#
record_ids#
texts#
to_dict()[source]#

Convert result metadata to JSON-serializable format.

class design_research_analysis.GaussianHMMResult(model, backend='hmmlearn', n_states=0, covariance_type='diag', seed=0, lengths=None, startprob=<factory>, transmat=<factory>, means=<factory>, covars=<factory>, train_log_likelihood=0.0, config=<factory>)[source]#

Serializable result container for a Gaussian HMM.

backend#
config#
covariance_type#
covars#
lengths#
means#
model#
n_states#
seed#
startprob#
to_dict()[source]#

Convert the result to a JSON-serializable dictionary.

train_log_likelihood#
transmat#
class design_research_analysis.MarkovChainResult(order, states, transition_matrix, startprob, smoothing, n_sequences, n_observations, config=<factory>, _transition_counts=<factory>, _start_counts=<factory>)[source]#

Serializable result container for an order-k Markov chain.

config#
n_observations#
n_sequences#
order#
smoothing#
startprob#
states#
to_dict()[source]#

Convert the result to a JSON-serializable dictionary.

transition_matrix#
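The quantities this result documents (states, startprob, transition_matrix) can be sketched with a plain first-order fit using Laplace smoothing. This is an illustration of the textbook computation, not the package's implementation, and fit_markov_chain_from_table additionally handles tokenization from unified-table rows:

```python
def fit_markov_chain(sequences, smoothing=1.0):
    """Fit a first-order Markov chain with Laplace (add-k) smoothing."""
    states = sorted({s for seq in sequences for s in seq})
    idx = {s: i for i, s in enumerate(states)}
    k = len(states)
    starts = [smoothing] * k                       # smoothed start counts
    counts = [[smoothing] * k for _ in range(k)]   # smoothed transition counts
    for seq in sequences:
        starts[idx[seq[0]]] += 1
        for a, b in zip(seq, seq[1:]):
            counts[idx[a]][idx[b]] += 1
    return {
        "states": states,
        "startprob": [s / sum(starts) for s in starts],
        "transition_matrix": [[c / sum(row) for c in row] for row in counts],
    }

chain = fit_markov_chain([
    ["sketch", "prototype", "test"],
    ["sketch", "test"],
])
```

Each transition-matrix row is a probability distribution over next states, so rows sum to one.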
class design_research_analysis.UnifiedTableConfig(required_columns=('timestamp',), recommended_columns=('record_id', 'text', 'session_id', 'actor_id', 'event_type'), optional_columns=('meta_json',), timestamp_column='timestamp', parse_timestamps=True, sort_by_timestamp=True, allow_extra_columns=True)[source]#

Configuration for coercing and validating a unified table.

Parameters:
  • required_columns – Columns that must be present.

  • recommended_columns – Columns that are strongly encouraged.

  • optional_columns – Common optional fields documented by the package.

  • timestamp_column – Name of the canonical timestamp column.

  • parse_timestamps – Whether to parse timestamp values into datetime objects.

  • sort_by_timestamp – Whether to return rows sorted by timestamp.

  • allow_extra_columns – Whether columns outside known sets are allowed.

allow_extra_columns#
known_columns()[source]#

Return the known column names implied by this configuration.

optional_columns#
parse_timestamps#
recommended_columns#
required_columns#
sort_by_timestamp#
timestamp_column#
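Given the defaults above, known_columns() presumably combines the three column groups. A minimal sketch of that contract (the return type and ordering here are assumptions, not taken from the package):

```python
def known_columns(required, recommended, optional):
    """Order-preserving union of the configured column groups."""
    seen = []
    for col in (*required, *recommended, *optional):
        if col not in seen:
            seen.append(col)
    return seen

cols = known_columns(
    ("timestamp",),
    ("record_id", "text", "session_id", "actor_id", "event_type"),
    ("meta_json",),
)
```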
class design_research_analysis.UnifiedTableValidationReport(is_valid, n_rows, columns, missing_required, missing_recommended, errors, warnings)[source]#

Validation report for a unified table.

Parameters:
  • is_valid – Whether validation passed.

  • n_rows – Number of rows observed.

  • columns – Ordered columns found in the table.

  • missing_required – Required columns missing from the table.

  • missing_recommended – Recommended columns missing from the table.

  • errors – Validation errors.

  • warnings – Validation warnings.

columns#
errors#
is_valid#
missing_required#
n_rows#
to_dict()[source]#

Return a JSON-serializable representation of the report.

warnings#
design_research_analysis.attach_provenance(result, context)[source]#

Return a copy of result enriched with a provenance field.

design_research_analysis.bootstrap_ci(x, *, stat='mean', y=None, n_resamples=10000, ci=0.95, method='percentile', seed=0)[source]#

Estimate bootstrap confidence intervals for one- and two-sample statistics.
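The default method='percentile' behavior can be sketched for the one-sample mean. This is the generic percentile bootstrap, shown for intuition; the package's function also supports two-sample statistics and other methods:

```python
import random
from statistics import mean

def bootstrap_ci_mean(x, n_resamples=10_000, ci=0.95, seed=0):
    """Percentile bootstrap CI for the mean of one sample."""
    rng = random.Random(seed)
    # Resample with replacement, compute the statistic each time, sort
    stats = sorted(mean(rng.choices(x, k=len(x))) for _ in range(n_resamples))
    lo = stats[int((1 - ci) / 2 * n_resamples)]
    hi = stats[int((1 + ci) / 2 * n_resamples) - 1]
    return lo, hi

low, high = bootstrap_ci_mean([2.1, 2.5, 3.0, 2.8, 2.2, 2.9], n_resamples=2000)
```

With a fixed seed the interval is reproducible, matching the documented seed=0 default.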

design_research_analysis.build_condition_metric_table(runs, *, metric, condition_column='condition', evaluations=None, conditions=None, run_id_column='run_id', condition_id_column='condition_id', evaluation_metric_column='metric_name', evaluation_value_column='metric_value')[source]#

Build a normalized run-level condition/metric table from experiment exports.

design_research_analysis.build_embedding_map(embeddings, *, method='pca', n_components=2, record_ids=None, random_state=0, perplexity=30.0, n_neighbors=15, min_dist=0.1, pacmap_mn_ratio=0.5, pacmap_fp_ratio=2.0, trimap_n_inliers=12, trimap_n_outliers=4, trimap_n_random=3)[source]#

Map higher-dimensional vectors into a lower-dimensional embedding space.

design_research_analysis.capture_run_context(*, seed=None, input_paths=None, extra=None)[source]#

Capture deterministic provenance metadata for an analysis run.

design_research_analysis.cluster_embedding_map(embedding_map, *, method='kmeans', n_clusters=3, random_state=0, max_iter=100)[source]#

Cluster embedding-map coordinates.

design_research_analysis.coerce_unified_table(data, *, config=None)[source]#

Coerce input data to normalized row-oriented unified table records.

Parameters:
  • data – Row-oriented sequence of mappings, column-oriented mapping, or a .csv, .tsv, or .json path.

  • config – Optional table configuration.

Returns:

Normalized table rows.
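The accepted input shapes can be sketched as follows. This toy coercer handles only the two in-memory shapes; file-path loading, timestamp parsing, and timestamp sorting from the real function are omitted:

```python
def coerce_rows(data):
    """Return row-oriented records from rows or a column-oriented mapping."""
    if isinstance(data, dict):  # column-oriented: {"col": [v0, v1, ...], ...}
        columns = list(data)
        n = len(next(iter(data.values()), []))
        return [{c: data[c][i] for c in columns} for i in range(n)]
    return [dict(row) for row in data]  # row-oriented: copy each mapping

rows = coerce_rows({"timestamp": ["t0", "t1"], "text": ["a", "b"]})
```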

design_research_analysis.compare_condition_pairs(data, *, condition_column='condition', metric_column='value', metric_name=None, condition_pairs=None, alternative='two-sided', alpha=0.05, exact_threshold=250000, n_permutations=20000, seed=0)[source]#

Compare all or selected condition pairs on one numeric metric.

design_research_analysis.compare_embedding_maps(embeddings, *, methods, n_components=2, record_ids=None, random_state=0, perplexity=30.0, n_neighbors=15, min_dist=0.1, pacmap_mn_ratio=0.5, pacmap_fp_ratio=2.0, trimap_n_inliers=12, trimap_n_outliers=4, trimap_n_random=3)[source]#

Build multiple embedding maps with aligned record IDs.

design_research_analysis.compare_groups(values=None, groups=None, *, data=None, value_column='value', group_column='group', method='auto')[source]#

Compare outcomes across groups using t-test, ANOVA, or Kruskal-Wallis.

design_research_analysis.compute_design_space_coverage(embeddings, *, method='convex_hull')[source]#

Compute geometry-aware coverage summaries for embedding or map spaces.

design_research_analysis.compute_divergence_convergence(trajectory, *, window=3, tolerance=1e-06)[source]#

Summarize divergence and convergence phases from trajectory output.

design_research_analysis.compute_idea_space_trajectory(embeddings, *, timestamps=None, groups=None)[source]#

Compute grouped trajectories through an embedding or map space.

design_research_analysis.compute_language_convergence(data, *, text_column='text', group_column='session_id', window_size=3, slope_tolerance=1e-06, model_name='all-MiniLM-L6-v2', normalize=True, batch_size=32, device='auto', embedder=None, text_mapper=None)[source]#

Compute convergence/divergence of language trajectories by group.

A negative slope indicates convergence toward the final language centroid; a positive slope indicates divergence.
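The sign convention can be illustrated with a plain least-squares slope over a distance series. This is a generic fitting sketch for intuition only; the package computes the distance series itself from sentence embeddings:

```python
def convergence_slope(distances):
    """Least-squares slope of a distance-to-final-centroid series."""
    n = len(distances)
    x_bar = (n - 1) / 2
    y_bar = sum(distances) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(distances))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den

# Distances shrinking toward the final centroid: negative slope, i.e. convergence
slope = convergence_slope([0.9, 0.7, 0.4, 0.2])
```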

design_research_analysis.compute_semantic_distance_trajectory(data, *, text_column='text', group_column='session_id', window_size=3, model_name='all-MiniLM-L6-v2', normalize=True, batch_size=32, device='auto', embedder=None, text_mapper=None)[source]#

Compute semantic distance trajectories to a group’s final language state.

Parameters:
  • data – Unified table rows or a simple text list.

  • text_column – Text column for table input.

  • group_column – Grouping column for trajectory computation.

  • window_size – Sliding window size for centroid estimation.

  • model_name – Sentence transformer model name when embedder is omitted.

  • normalize – Whether to normalize embeddings when using built-in embedding.

  • batch_size – Embedding batch size.

  • device – Embedding device.

  • embedder – Optional custom embedding function.

  • text_mapper – Optional mapper used to derive missing text values.

Returns:

Mapping of group -> [distance_t0, distance_t1, ...].
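The documented return shape can be sketched with toy per-group vectors. Here each utterance is already a vector and the reference point is simply the group's final vector; the real function embeds text first and uses a windowed centroid:

```python
import math

def distance_trajectory(grouped_vectors):
    """For each group, Euclidean distance from each vector to the final one."""
    out = {}
    for group, vecs in grouped_vectors.items():
        final = vecs[-1]
        out[group] = [math.dist(v, final) for v in vecs]
    return out

traj = distance_trajectory({"s1": [[0.0, 2.0], [0.0, 1.0], [0.0, 0.0]]})
```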

design_research_analysis.decode_hmm(model_result, observations, *, algorithm='viterbi', lengths=None)[source]#

Decode the most likely hidden-state sequence for observations.

Parameters:
  • model_result – Fitted Gaussian or discrete HMM result object.

  • observations – Observation matrix (Gaussian) or token sequences (discrete).

  • algorithm – Decoding algorithm, viterbi or map.

  • lengths – Optional sequence lengths for batched observations.

Returns:

Decoded state sequence and log probability.
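What algorithm='viterbi' computes can be sketched with a textbook log-space Viterbi decode. This is the standard algorithm for intuition, not the package's (hmmlearn-backed) implementation:

```python
import math

def viterbi(startprob, transmat, emissionprob, observations):
    """Most likely state path and its log-probability for one sequence."""
    log = lambda p: math.log(p) if p > 0 else float("-inf")
    n = len(startprob)
    # delta[i]: best log-probability of any state path ending in state i
    delta = [log(startprob[i]) + log(emissionprob[i][observations[0]])
             for i in range(n)]
    back = []
    for obs in observations[1:]:
        prev, delta, step = delta, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + log(transmat[i][j]))
            step.append(best)
            delta.append(prev[best] + log(transmat[best][j]) + log(emissionprob[j][obs]))
        back.append(step)
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for step in reversed(back):  # follow the stored backpointers
        state = step[state]
        path.append(state)
    return list(reversed(path)), max(delta)

# Two hidden states with deterministic emissions make the decode easy to check
states, logp = viterbi(
    startprob=[0.8, 0.2],
    transmat=[[0.9, 0.1], [0.2, 0.8]],
    emissionprob=[[1.0, 0.0], [0.0, 1.0]],
    observations=[0, 0, 1, 1],
)
```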

design_research_analysis.derive_columns(table, *, actor_mapper=None, event_mapper=None, session_mapper=None, text_mapper=None, record_id_mapper=None)[source]#

Derive canonical columns from deterministic user-provided mappers.

Existing non-blank values are preserved. Mappers are only applied to blank or missing values. record_id defaults to the row index if not provided.

Parameters:
  • table – Unified table rows.

  • actor_mapper – Optional mapper for actor_id.

  • event_mapper – Optional mapper for event_type.

  • session_mapper – Optional mapper for session_id.

  • text_mapper – Optional mapper for text.

  • record_id_mapper – Optional mapper for record_id.

Returns:

New rows with derived columns.
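The stated rules (mappers fill only blank or missing values; record_id falls back to the row index) can be sketched with two of the documented mappers. A simplified illustration, not the package's code:

```python
def derive_columns(rows, *, actor_mapper=None, record_id_mapper=None):
    """Fill blank/missing actor_id and record_id; preserve non-blank values."""
    out = []
    for i, row in enumerate(rows):
        new = dict(row)
        def blank(key):
            return new.get(key) in (None, "")
        if actor_mapper is not None and blank("actor_id"):
            new["actor_id"] = actor_mapper(new)
        if blank("record_id"):
            # record_id defaults to the row index when no mapper is given
            new["record_id"] = record_id_mapper(new) if record_id_mapper else i
        out.append(new)
    return out

rows = derive_columns(
    [{"actor_id": "p1"}, {"actor_id": ""}],
    actor_mapper=lambda row: "unknown",
)
```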

design_research_analysis.embed_records(data, *, text_column='text', record_id_column='record_id', model_name='all-MiniLM-L6-v2', normalize=True, batch_size=32, device='auto', embedder=None, text_mapper=None, record_id_mapper=None)[source]#

Embed record text from a unified table.

design_research_analysis.estimate_sample_size(effect_size, *, test, alpha=0.05, power=0.8, ratio=1.0, alternative='two-sided')[source]#

Estimate total sample size for supported t-test families.
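The kind of calculation involved can be sketched with the normal-approximation formula for a two-sided, two-sample t test. The package may instead solve the exact noncentral-t problem, so treat this as an approximation for intuition:

```python
from statistics import NormalDist

def two_sample_n_per_group(d, alpha=0.05, power=0.8):
    """Approximate sample size per group at standardized effect size d."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # two-sided critical value
    z_beta = z(power)
    return 2 * ((z_alpha + z_beta) / d) ** 2

n = two_sample_n_per_group(0.5)  # a medium standardized effect
```

The result (about 63 per group at d = 0.5) matches the classic rule-of-thumb tables; exact t-based solutions land slightly higher.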

design_research_analysis.fit_discrete_hmm_from_table(table, *, n_states=3, n_iter=100, seed=0, backend='hmmlearn', event_column='event_type', session_column='session_id', actor_column='actor_id', include_actor_in_token=False, actor_mapper=None, event_mapper=None, session_mapper=None, table_config=None)[source]#

Fit a discrete HMM from unified-table event records.

design_research_analysis.fit_markov_chain_from_table(table, *, order=1, smoothing=1.0, event_column='event_type', session_column='session_id', actor_column='actor_id', include_actor_in_token=False, actor_mapper=None, event_mapper=None, session_mapper=None, table_config=None)[source]#

Fit a Markov chain from unified-table event records.

design_research_analysis.fit_mixed_effects(data, *, formula, group_column, backend='statsmodels', reml=True, max_iter=200)[source]#

Fit a mixed-effects model using statsmodels.

design_research_analysis.fit_regression(X, y, *, feature_names=None, add_intercept=True)[source]#

Fit an ordinary least squares regression model with NumPy.

design_research_analysis.fit_text_gaussian_hmm_from_table(table, *, text_column='text', session_column='session_id', n_states=3, embedder=None, model_name='all-MiniLM-L6-v2', normalize=True, batch_size=32, device='auto', covariance_type='diag', n_iter=100, seed=0, backend='hmmlearn', session_mapper=None, text_mapper=None, table_config=None)[source]#

Embed text from unified-table rows and fit a Gaussian HMM.

design_research_analysis.fit_topic_model(data, *, n_topics=5, max_features=5000, random_state=0, text_column='text', top_k_terms=10)[source]#

Fit an LDA topic model and return topic summaries.

Parameters:
  • data – Unified table rows or a list of texts.

  • n_topics – Number of latent topics.

  • max_features – Maximum vocabulary size.

  • random_state – Random seed.

  • text_column – Text column for table input.

  • top_k_terms – Number of representative terms per topic.

Returns:

JSON-serializable topic summary.

design_research_analysis.generate_codebook(df, *, descriptions=None)[source]#

Generate a compact codebook from a DataFrame or supported dataset file.

design_research_analysis.is_google_colab()[source]#

Return True when running inside Google Colab.

design_research_analysis.is_notebook()[source]#

Return True when running in a notebook-style interactive shell.

design_research_analysis.minimum_detectable_effect(n, *, test, alpha=0.05, power=0.8, ratio=1.0, alternative='two-sided')[source]#

Solve for the smallest detectable standardized effect size.

design_research_analysis.permutation_test(x, y, *, stat='diff_means', n_permutations=20000, alternative='two-sided', seed=0)[source]#

Run a two-sample permutation test.
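The documented default stat='diff_means' behavior can be sketched as the standard label-shuffling procedure. A generic illustration, not the package's implementation:

```python
import random

def permutation_test_diff_means(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of sample means."""
    rng = random.Random(seed)
    observed = sum(x) / len(x) - sum(y) / len(y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = sum(pooled[:len(x)]) / len(x) - sum(pooled[len(x):]) / len(y)
        if abs(diff) >= abs(observed):
            hits += 1
    # +1 correction keeps the p-value away from exactly zero
    return observed, (hits + 1) / (n_permutations + 1)

obs_diff, p = permutation_test_diff_means(
    [5.1, 5.3, 5.0, 5.4, 5.2], [3.9, 4.1, 4.0, 3.8, 4.2]
)
```

Well-separated groups give a small p-value; identical groups give p = 1 by construction.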

design_research_analysis.plot_convergence_curve(metric_series, *, ax=None, title='Convergence Curve', ylabel='Metric Value')[source]#

Plot one or more stepwise convergence or divergence curves.

Parameters:
  • metric_series – Either a single numeric series or group -> series.

  • ax – Optional Matplotlib axis.

  • title – Plot title.

  • ylabel – Y-axis label.

Returns:

(figure, axis) tuple.

design_research_analysis.plot_design_process_timeline(events, *, session_id=None, session_column='session_id', actor_column='actor_id', event_column='event_type', timestamp_column='timestamp', ax=None, title='Design Process Timeline')[source]#

Plot one session as an actor-by-time event timeline.

Parameters:
  • events – Unified-table rows to visualize.

  • session_id – Explicit session to render when multiple sessions are present.

  • session_column – Session identifier column.

  • actor_column – Actor identifier column.

  • event_column – Event label column.

  • timestamp_column – Timestamp column used for ordering.

  • ax – Optional Matplotlib axis.

  • title – Plot title.

Returns:

(figure, axis) tuple.

design_research_analysis.plot_embedding_map(embedding_map, data, *, record_id_column='record_id', trace_column=None, order_column=None, value_column=None, ax=None, cmap='viridis', title=None)[source]#

Plot one embedding map with optional trace overlays and scalar coloring.

design_research_analysis.plot_embedding_map_grid(embedding_maps, data, *, record_id_column='record_id', trace_column=None, order_column=None, value_column=None, cmap='viridis', title='Embedding Map Comparison')[source]#

Plot multiple embedding maps with shared trace overlays and color scale.

design_research_analysis.plot_idea_trajectory(projection, *, groups=None, timestamps=None, ax=None, title='Idea Trajectory')[source]#

Plot ordered 2D trajectories through idea space.

Parameters:
  • projection – Two-dimensional point matrix or projection result.

  • groups – Optional group labels that split the trajectory into paths.

  • timestamps – Optional timestamps used to order points within each group.

  • ax – Optional Matplotlib axis.

  • title – Plot title.

Returns:

(figure, axis) tuple.

design_research_analysis.plot_state_graph(transition, *, state_labels=None, threshold=0.0, ax=None, seed=0, title='State Transition Graph')[source]#

Render a directed state-transition graph.

Parameters:
  • transition – Result object or raw square matrix.

  • state_labels – Optional display labels for states.

  • threshold – Draw edges with probability strictly above this value.

  • ax – Optional Matplotlib axis.

  • seed – Random seed passed to layout generation.

  • title – Plot title.

Returns:

(figure, axis) tuple.

design_research_analysis.plot_transition_matrix(transition, *, state_labels=None, ax=None, cmap='Blues', annotate=True, fmt='.2f', title='Transition Matrix')[source]#

Plot a transition matrix as a heatmap.

Parameters:
  • transition – Result object or raw square matrix.

  • state_labels – Optional display labels for states.

  • ax – Optional Matplotlib axis.

  • cmap – Heatmap colormap.

  • annotate – Whether to annotate each cell with probabilities.

  • fmt – Format string for annotations.

  • title – Plot title.

Returns:

(figure, axis) tuple.
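The documented behavior (heatmap, optional per-cell annotations, a (figure, axis) return) can be sketched with plain Matplotlib. A minimal stand-in under the assumption that Matplotlib is available, not the package's plotting code:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

def plot_transition_heatmap(matrix, labels, fmt=".2f", cmap="Blues"):
    """Heatmap of a square probability matrix with annotated cells."""
    fig, ax = plt.subplots()
    ax.imshow(matrix, cmap=cmap, vmin=0.0, vmax=1.0)
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels)
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    for i, row in enumerate(matrix):
        for j, p in enumerate(row):
            ax.annotate(format(p, fmt), (j, i), ha="center", va="center")
    ax.set_title("Transition Matrix")
    return fig, ax

fig, ax = plot_transition_heatmap([[0.9, 0.1], [0.3, 0.7]], ["ideate", "test"])
```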

design_research_analysis.power_curve(effect_sizes, *, n, test, alpha=0.05, ratio=1.0, alternative='two-sided')[source]#

Compute achieved power over a sequence of effect sizes.

design_research_analysis.profile_dataframe(df, *, max_categorical_levels=20)[source]#

Profile a DataFrame or supported dataset file without mutating inputs.

design_research_analysis.rank_tests_one_stop(x, y=None, groups=None, *, paired=None, kind=None, alternative='two-sided', alpha=0.05)[source]#

Dispatch to a nonparametric rank test with consistent structured output.

design_research_analysis.score_sentiment(data, *, text_column='text')[source]#

Score sentiment with a deterministic lexicon-based approach.

This lightweight scorer is intentionally simple and offline-friendly.

design_research_analysis.validate_dataframe(df, schema)[source]#

Validate a DataFrame or supported dataset file against a declarative schema.

design_research_analysis.validate_unified_table(table, *, config=None)[source]#

Validate a unified table against the configured contract.

Parameters:
  • table – Coerced unified table rows or a supported path-like input.

  • config – Optional table configuration.

Returns:

Validation report with errors and warnings.
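The error/warning split the report documents can be sketched against the default column groups: missing required columns are errors that fail validation, while missing recommended columns only warn. A simplified contract sketch, not the package's validator:

```python
def validate_table(rows, required=("timestamp",), recommended=("record_id", "text")):
    """Produce a report dict shaped like UnifiedTableValidationReport."""
    columns = list(rows[0]) if rows else []
    missing_required = [c for c in required if c not in columns]
    missing_recommended = [c for c in recommended if c not in columns]
    return {
        "is_valid": not missing_required,
        "n_rows": len(rows),
        "columns": columns,
        "missing_required": missing_required,
        "missing_recommended": missing_recommended,
        "errors": [f"missing required column: {c}" for c in missing_required],
        "warnings": [f"missing recommended column: {c}" for c in missing_recommended],
    }

report = validate_table([{"timestamp": "2024-01-01T00:00:00", "text": "hi"}])
```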

design_research_analysis.write_run_manifest(context, outpath)[source]#

Write a run-context dictionary to a JSON manifest file.