API#
This page documents the supported top-level public API from
design_research_analysis.__all__.
Top-level groups:
- Package metadata: `__version__`
- Comparison: `ComparisonResult`
- Module facades: `dataset`, `embedding_maps`, `integration`, `language`, `runtime`, `sequence`, `stats`, `visualization`
- Unified table contracts: `UnifiedTableConfig`, `UnifiedTableValidationReport`, `coerce_unified_table`, `derive_columns`, `validate_unified_table`
- Sequence: `MarkovChainResult`, `DiscreteHMMResult`, `GaussianHMMResult`, `DecodeResult`, `fit_markov_chain_from_table`, `fit_discrete_hmm_from_table`, `fit_text_gaussian_hmm_from_table`, `decode_hmm`, `plot_transition_matrix`, `plot_state_graph`
- Visualization: `plot_design_process_timeline`, `plot_idea_trajectory`, `plot_convergence_curve`
- Language: `compute_language_convergence`, `compute_semantic_distance_trajectory`, `fit_topic_model`, `score_sentiment`
- Embedding maps: `EmbeddingResult`, `EmbeddingMapResult`, `embed_records`, `build_embedding_map`, `cluster_embedding_map`, `compare_embedding_maps`, `compute_design_space_coverage`, `compute_idea_space_trajectory`, `compute_divergence_convergence`, `plot_embedding_map`, `plot_embedding_map_grid`
- Statistics: `compare_groups`, `fit_regression`, `fit_mixed_effects`, `permutation_test`, `build_condition_metric_table`, `compare_condition_pairs`, `bootstrap_ci`, `rank_tests_one_stop`, `estimate_sample_size`, `power_curve`, `minimum_detectable_effect`
- Dataset + runtime: `profile_dataframe`, `validate_dataframe`, `generate_codebook`, `capture_run_context`, `attach_provenance`, `is_notebook`, `is_google_colab`, `write_run_manifest`
Typed analysis result objects also support standardized comparison helpers:
`difference(other)` and `effect(other)`, plus the operator shorthands
`left - right` and `left / right`.
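The operator shorthands can be read as thin wrappers over the named helpers. The following is a hypothetical minimal sketch of that wiring, not the library's implementation; the real result objects return `ComparisonResult` instances with more fields:

```python
from dataclasses import dataclass

# Illustrative sketch: a result type whose "-" and "/" operators delegate
# to difference() and effect(). MetricResult and its fields are invented
# for this example; only the operator/helper correspondence is documented.
@dataclass
class MetricResult:
    metric: str
    estimate: float

    def difference(self, other):
        # "left - right": additive comparison of the two estimates.
        return {"operation": "difference", "metric": self.metric,
                "estimate": self.estimate - other.estimate}

    def effect(self, other):
        # "left / right": ratio-style comparison of the two estimates.
        return {"operation": "effect", "metric": self.metric,
                "estimate": self.estimate / other.estimate}

    __sub__ = difference
    __truediv__ = effect

left = MetricResult("accuracy", 0.9)
right = MetricResult("accuracy", 0.6)
diff = left - right    # same as left.difference(right)
ratio = left / right   # same as left.effect(right)
```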
Curated public exports for design-research-analysis.
- class design_research_analysis.ComparisonResult(operation, left_type, right_type, metric, estimate, statistic=None, p_value=None, effect_size=None, details=<factory>, interpretation='')[source]#
Structured output for algebraic result-object comparisons.
- details#
- effect_size#
- estimate#
- interpretation#
- left_type#
- metric#
- operation#
- p_value#
- right_type#
- statistic#
- class design_research_analysis.DecodeResult(algorithm, log_probability, states, lengths, backend)[source]#
Serializable decoded-state output from an HMM.
- algorithm#
- backend#
- lengths#
- log_probability#
- states#
- class design_research_analysis.DiscreteHMMResult(model, backend='hmmlearn', n_states=0, seed=0, lengths=None, startprob=<factory>, transmat=<factory>, emissionprob=<factory>, vocab=<factory>, token_to_id=<factory>, train_log_likelihood=0.0, config=<factory>)[source]#
Serializable result container for a discrete-emission HMM.
- backend#
- config#
- emissionprob#
- lengths#
- model#
- n_states#
- seed#
- startprob#
- token_to_id#
- train_log_likelihood#
- transmat#
- vocab#
- class design_research_analysis.EmbeddingMapResult(coordinates, record_ids, method, config=<factory>, explained_variance_ratio=None)[source]#
Lower-dimensional coordinates plus method metadata for one embedding map.
- config#
- coordinates#
- explained_variance_ratio#
- method#
- property projection#
Compatibility alias for the legacy projection attribute name.
- record_ids#
- class design_research_analysis.EmbeddingResult(embeddings, record_ids, texts, config=<factory>)[source]#
Embedding output container.
- config#
- embeddings#
- record_ids#
- texts#
- class design_research_analysis.GaussianHMMResult(model, backend='hmmlearn', n_states=0, covariance_type='diag', seed=0, lengths=None, startprob=<factory>, transmat=<factory>, means=<factory>, covars=<factory>, train_log_likelihood=0.0, config=<factory>)[source]#
Serializable result container for a Gaussian HMM.
- backend#
- config#
- covariance_type#
- covars#
- lengths#
- means#
- model#
- n_states#
- seed#
- startprob#
- train_log_likelihood#
- transmat#
- class design_research_analysis.MarkovChainResult(order, states, transition_matrix, startprob, smoothing, n_sequences, n_observations, config=<factory>, _transition_counts=<factory>, _start_counts=<factory>)[source]#
Serializable result container for an order-k Markov chain.
- config#
- n_observations#
- n_sequences#
- order#
- smoothing#
- startprob#
- states#
- transition_matrix#
- class design_research_analysis.UnifiedTableConfig(required_columns=('timestamp',), recommended_columns=('record_id', 'text', 'session_id', 'actor_id', 'event_type'), optional_columns=('meta_json',), timestamp_column='timestamp', parse_timestamps=True, sort_by_timestamp=True, allow_extra_columns=True)[source]#
Configuration for coercing and validating a unified table.
- Parameters:
required_columns – Columns that must be present.
recommended_columns – Columns that are strongly encouraged.
optional_columns – Common optional fields documented by the package.
timestamp_column – Name of the canonical timestamp column.
parse_timestamps – Whether to parse timestamp values into `datetime` objects.
sort_by_timestamp – Whether to return rows sorted by timestamp.
allow_extra_columns – Whether columns outside known sets are allowed.
- allow_extra_columns#
- optional_columns#
- parse_timestamps#
- recommended_columns#
- required_columns#
- sort_by_timestamp#
- timestamp_column#
- class design_research_analysis.UnifiedTableValidationReport(is_valid, n_rows, columns, missing_required, missing_recommended, errors, warnings)[source]#
Validation report for a unified table.
- Parameters:
is_valid – Whether validation passed.
n_rows – Number of rows observed.
columns – Ordered columns found in the table.
missing_required – Required columns missing from the table.
missing_recommended – Recommended columns missing from the table.
errors – Validation errors.
warnings – Validation warnings.
- columns#
- errors#
- is_valid#
- missing_recommended#
- missing_required#
- n_rows#
- warnings#
- design_research_analysis.attach_provenance(result, context)[source]#
Return a copy of `result` enriched with a `provenance` field.
- design_research_analysis.bootstrap_ci(x, *, stat='mean', y=None, n_resamples=10000, ci=0.95, method='percentile', seed=0)[source]#
Estimate bootstrap confidence intervals for one- and two-sample statistics.
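The default `percentile` method can be sketched in a few lines of numpy. This shows the general resampling technique for a one-sample mean, not the library's implementation:

```python
import numpy as np

def percentile_bootstrap_ci(x, *, n_resamples=10_000, ci=0.95, seed=0):
    """Percentile bootstrap confidence interval for a one-sample mean."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Resample with replacement and recompute the statistic each time.
    idx = rng.integers(0, len(x), size=(n_resamples, len(x)))
    stats = x[idx].mean(axis=1)
    alpha = 1.0 - ci
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

lo, hi = percentile_bootstrap_ci([2.1, 2.4, 1.9, 2.6, 2.2, 2.0, 2.5, 2.3])
```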
- design_research_analysis.build_condition_metric_table(runs, *, metric, condition_column='condition', evaluations=None, conditions=None, run_id_column='run_id', condition_id_column='condition_id', evaluation_metric_column='metric_name', evaluation_value_column='metric_value')[source]#
Build a normalized run-level condition/metric table from experiment exports.
- design_research_analysis.build_embedding_map(embeddings, *, method='pca', n_components=2, record_ids=None, random_state=0, perplexity=30.0, n_neighbors=15, min_dist=0.1, pacmap_mn_ratio=0.5, pacmap_fp_ratio=2.0, trimap_n_inliers=12, trimap_n_outliers=4, trimap_n_random=3)[source]#
Map higher-dimensional vectors into a lower-dimensional embedding space.
- design_research_analysis.capture_run_context(*, seed=None, input_paths=None, extra=None)[source]#
Capture deterministic provenance metadata for an analysis run.
- design_research_analysis.cluster_embedding_map(embedding_map, *, method='kmeans', n_clusters=3, random_state=0, max_iter=100)[source]#
Cluster embedding-map coordinates.
- design_research_analysis.coerce_unified_table(data, *, config=None)[source]#
Coerce input data to normalized row-oriented unified table records.
- Parameters:
data – Row-oriented sequence of mappings, column-oriented mapping, or a `.csv`, `.tsv`, or `.json` path.
config – Optional table configuration.
- Returns:
Normalized table rows.
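In spirit, coercion normalizes whatever shape arrives into the same row-oriented records. A stdlib-only sketch of the column-oriented case, assuming ISO-8601 timestamp strings; the library handles more input shapes:

```python
from datetime import datetime

def coerce_columns_to_rows(columns, *, timestamp_column="timestamp",
                           parse_timestamps=True, sort_by_timestamp=True):
    """Turn a column-oriented mapping into sorted row-oriented records."""
    names = list(columns)
    n_rows = len(columns[names[0]])
    rows = [{name: columns[name][i] for name in names} for i in range(n_rows)]
    if parse_timestamps:
        # Parse timestamp strings into datetime objects.
        for row in rows:
            row[timestamp_column] = datetime.fromisoformat(row[timestamp_column])
    if sort_by_timestamp:
        rows.sort(key=lambda row: row[timestamp_column])
    return rows

rows = coerce_columns_to_rows({
    "timestamp": ["2024-01-01T10:05:00", "2024-01-01T10:00:00"],
    "text": ["second utterance", "first utterance"],
})
```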
- design_research_analysis.compare_condition_pairs(data, *, condition_column='condition', metric_column='value', metric_name=None, condition_pairs=None, alternative='two-sided', alpha=0.05, exact_threshold=250000, n_permutations=20000, seed=0)[source]#
Compare all or selected condition pairs on one numeric metric.
- design_research_analysis.compare_embedding_maps(embeddings, *, methods, n_components=2, record_ids=None, random_state=0, perplexity=30.0, n_neighbors=15, min_dist=0.1, pacmap_mn_ratio=0.5, pacmap_fp_ratio=2.0, trimap_n_inliers=12, trimap_n_outliers=4, trimap_n_random=3)[source]#
Build multiple embedding maps with aligned record IDs.
- design_research_analysis.compare_groups(values=None, groups=None, *, data=None, value_column='value', group_column='group', method='auto')[source]#
Compare outcomes across groups using t-test, ANOVA, or Kruskal-Wallis.
- design_research_analysis.compute_design_space_coverage(embeddings, *, method='convex_hull')[source]#
Compute geometry-aware coverage summaries for embedding or map spaces.
- design_research_analysis.compute_divergence_convergence(trajectory, *, window=3, tolerance=1e-06)[source]#
Summarize divergence and convergence phases from trajectory output.
- design_research_analysis.compute_idea_space_trajectory(embeddings, *, timestamps=None, groups=None)[source]#
Compute grouped trajectories through an embedding or map space.
- design_research_analysis.compute_language_convergence(data, *, text_column='text', group_column='session_id', window_size=3, slope_tolerance=1e-06, model_name='all-MiniLM-L6-v2', normalize=True, batch_size=32, device='auto', embedder=None, text_mapper=None)[source]#
Compute convergence/divergence of language trajectories by group.
Negative slope indicates convergence toward the final language centroid. Positive slope indicates divergence.
- design_research_analysis.compute_semantic_distance_trajectory(data, *, text_column='text', group_column='session_id', window_size=3, model_name='all-MiniLM-L6-v2', normalize=True, batch_size=32, device='auto', embedder=None, text_mapper=None)[source]#
Compute semantic distance trajectories to a group’s final language state.
- Parameters:
data – Unified table rows or a simple text list.
text_column – Text column for table input.
group_column – Grouping column for trajectory computation.
window_size – Sliding window size for centroid estimation.
model_name – Sentence transformer model name when `embedder` is omitted.
normalize – Whether to normalize embeddings when using built-in embedding.
batch_size – Embedding batch size.
device – Embedding device.
embedder – Optional custom embedding function.
text_mapper – Optional mapper used to derive missing text values.
- Returns:
Mapping of `group -> [distance_t0, distance_t1, ...]`.
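The shape of the computation can be sketched with numpy: slide a window over a group's ordered embeddings and measure each window centroid's distance to the final-window centroid. This sketch starts from precomputed vectors, whereas the library embeds the text first:

```python
import numpy as np

def distance_trajectory(embeddings, *, window_size=3):
    """Distances from sliding-window centroids to the final-window centroid."""
    E = np.asarray(embeddings, dtype=float)
    final_centroid = E[-window_size:].mean(axis=0)
    distances = []
    for start in range(len(E) - window_size + 1):
        centroid = E[start:start + window_size].mean(axis=0)
        distances.append(float(np.linalg.norm(centroid - final_centroid)))
    return distances

# Toy 2-D "embeddings" drifting toward a fixed end state.
traj = distance_trajectory([[4, 0], [3, 0], [2, 0], [1, 0], [0, 0]],
                           window_size=2)
```

A shrinking trajectory like this one is what the package summarizes as convergence toward the final language state.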
- design_research_analysis.decode_hmm(model_result, observations, *, algorithm='viterbi', lengths=None)[source]#
Decode the most likely hidden-state sequence for observations.
- Parameters:
model_result – Fitted Gaussian or discrete HMM result object.
observations – Observation matrix (Gaussian) or token sequences (discrete).
algorithm – Decoding algorithm, `viterbi` or `map`.
lengths – Optional sequence lengths for batched observations.
- Returns:
Decoded state sequence and log probability.
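For intuition, Viterbi decoding finds the single most probable hidden-state path. A compact log-space numpy sketch over explicit start/transition matrices and per-frame emission log-probabilities; the library operates on fitted result objects instead:

```python
import numpy as np

def viterbi(startprob, transmat, framelogprob):
    """Most likely state path given per-frame emission log-probabilities."""
    n_obs, n_states = framelogprob.shape
    delta = np.log(startprob) + framelogprob[0]
    backptr = np.zeros((n_obs, n_states), dtype=int)
    for t in range(1, n_obs):
        # scores[i, j]: best score ending in state i, then moving to j.
        scores = delta[:, None] + np.log(transmat)
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + framelogprob[t]
    # Backtrack from the best final state.
    states = [int(delta.argmax())]
    for t in range(n_obs - 1, 0, -1):
        states.append(int(backptr[t][states[-1]]))
    return states[::-1], float(delta.max())

# Two sticky states; emissions strongly identify the state at each frame.
startprob = np.array([0.5, 0.5])
transmat = np.array([[0.9, 0.1], [0.1, 0.9]])
framelogprob = np.log(np.array([[0.9, 0.1], [0.9, 0.1], [0.05, 0.95]]))
path, logprob = viterbi(startprob, transmat, framelogprob)
```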
- design_research_analysis.derive_columns(table, *, actor_mapper=None, event_mapper=None, session_mapper=None, text_mapper=None, record_id_mapper=None)[source]#
Derive canonical columns from deterministic user-provided mappers.
Existing non-blank values are preserved. Mappers are only applied to blank or missing values.
`record_id` defaults to the row index if not provided.
- Parameters:
table – Unified table rows.
actor_mapper – Optional mapper for `actor_id`.
event_mapper – Optional mapper for `event_type`.
session_mapper – Optional mapper for `session_id`.
text_mapper – Optional mapper for `text`.
record_id_mapper – Optional mapper for `record_id`.
- Returns:
New rows with derived columns.
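The documented fill-only-when-blank behavior can be sketched for two of the mappers in plain Python. This is an illustrative helper, not the library's code:

```python
def derive(rows, *, text_mapper=None, record_id_mapper=None):
    """Fill blank or missing canonical columns via deterministic mappers."""
    out = []
    for index, row in enumerate(rows):
        row = dict(row)  # never mutate caller-owned rows
        # Mappers only apply where the value is blank or missing.
        if text_mapper is not None and not row.get("text"):
            row["text"] = text_mapper(row)
        if not row.get("record_id"):
            if record_id_mapper is not None:
                row["record_id"] = record_id_mapper(row)
            else:
                row["record_id"] = index  # default: the row index
        out.append(row)
    return out

rows = derive(
    [{"text": "keep me"}, {"text": "", "note": "hello"}],
    text_mapper=lambda row: row.get("note", "").upper(),
)
```

Note that the existing non-blank value "keep me" survives untouched, matching the preservation rule above.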
- design_research_analysis.embed_records(data, *, text_column='text', record_id_column='record_id', model_name='all-MiniLM-L6-v2', normalize=True, batch_size=32, device='auto', embedder=None, text_mapper=None, record_id_mapper=None)[source]#
Embed record text from a unified table.
- design_research_analysis.estimate_sample_size(effect_size, *, test, alpha=0.05, power=0.8, ratio=1.0, alternative='two-sided')[source]#
Estimate total sample size for supported t-test families.
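For a two-sided two-sample t-test, the standard normal approximation gives per-group n of roughly 2 * ((z_{1-alpha/2} + z_{power}) / d)^2. A stdlib sketch of that approximation; the library solves the exact t-test families:

```python
from math import ceil
from statistics import NormalDist

def approx_n_per_group(effect_size, *, alpha=0.05, power=0.8):
    """Normal-approximation sample size per group, two-sided two-sample test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test level
    z_power = z.inv_cdf(power)          # quantile for the target power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

n = approx_n_per_group(0.5)  # medium standardized effect
```

The exact t-based answer is slightly larger (64 per group for this case), which is why dedicated solvers are preferred for small samples.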
- design_research_analysis.fit_discrete_hmm_from_table(table, *, n_states=3, n_iter=100, seed=0, backend='hmmlearn', event_column='event_type', session_column='session_id', actor_column='actor_id', include_actor_in_token=False, actor_mapper=None, event_mapper=None, session_mapper=None, table_config=None)[source]#
Fit a discrete HMM from unified-table event records.
- design_research_analysis.fit_markov_chain_from_table(table, *, order=1, smoothing=1.0, event_column='event_type', session_column='session_id', actor_column='actor_id', include_actor_in_token=False, actor_mapper=None, event_mapper=None, session_mapper=None, table_config=None)[source]#
Fit a Markov chain from unified-table event records.
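The core of order-1 estimation is counting transitions within each session (never across session boundaries) and row-normalizing with additive smoothing. A stdlib sketch with a hypothetical helper, not the library's code:

```python
def fit_first_order(sequences, states, *, smoothing=1.0):
    """Row-stochastic transition matrix from per-session event sequences."""
    index = {state: i for i, state in enumerate(states)}
    counts = [[0.0] * len(states) for _ in states]
    for seq in sequences:
        # Transitions are counted within a session only.
        for prev, nxt in zip(seq, seq[1:]):
            counts[index[prev]][index[nxt]] += 1
    matrix = []
    for row in counts:
        smoothed = [c + smoothing for c in row]  # additive (Laplace) smoothing
        total = sum(smoothed)
        matrix.append([c / total for c in smoothed])
    return matrix

matrix = fit_first_order(
    [["sketch", "discuss", "sketch"], ["discuss", "sketch"]],
    states=["sketch", "discuss"],
)
```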
- design_research_analysis.fit_mixed_effects(data, *, formula, group_column, backend='statsmodels', reml=True, max_iter=200)[source]#
Fit a mixed-effects model using `statsmodels`.
- design_research_analysis.fit_regression(X, y, *, feature_names=None, add_intercept=True)[source]#
Fit an ordinary least squares regression model with `numpy`.
- design_research_analysis.fit_text_gaussian_hmm_from_table(table, *, text_column='text', session_column='session_id', n_states=3, embedder=None, model_name='all-MiniLM-L6-v2', normalize=True, batch_size=32, device='auto', covariance_type='diag', n_iter=100, seed=0, backend='hmmlearn', session_mapper=None, text_mapper=None, table_config=None)[source]#
Embed text from unified-table rows and fit a Gaussian HMM.
- design_research_analysis.fit_topic_model(data, *, n_topics=5, max_features=5000, random_state=0, text_column='text', top_k_terms=10)[source]#
Fit an LDA topic model and return topic summaries.
- Parameters:
data – Unified table rows or a list of texts.
n_topics – Number of latent topics.
max_features – Maximum vocabulary size.
random_state – Random seed.
text_column – Text column for table input.
top_k_terms – Number of representative terms per topic.
- Returns:
JSON-serializable topic summary.
- design_research_analysis.generate_codebook(df, *, descriptions=None)[source]#
Generate a compact codebook from a DataFrame or supported dataset file.
- design_research_analysis.is_notebook()[source]#
Return `True` when running in a notebook-style interactive shell.
- design_research_analysis.minimum_detectable_effect(n, *, test, alpha=0.05, power=0.8, ratio=1.0, alternative='two-sided')[source]#
Solve for the smallest detectable standardized effect size.
- design_research_analysis.permutation_test(x, y, *, stat='diff_means', n_permutations=20000, alternative='two-sided', seed=0)[source]#
Run a two-sample permutation test.
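The default `diff_means` statistic amounts to shuffling the pooled group labels and counting how often the permuted difference is at least as extreme as the observed one. A numpy sketch of the technique, not the library's implementation:

```python
import numpy as np

def permutation_p_value(x, y, *, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel under the null of no group difference
        diff = abs(pooled[: len(x)].mean() - pooled[len(x):].mean())
        hits += diff >= observed
    # Add-one correction keeps the p-value away from an impossible zero.
    return (hits + 1) / (n_permutations + 1)

# Nearly identical groups should yield a large p-value.
p = permutation_p_value([5.1, 4.9, 5.3, 5.0], [5.0, 5.2, 4.8, 5.1])
```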
- design_research_analysis.plot_convergence_curve(metric_series, *, ax=None, title='Convergence Curve', ylabel='Metric Value')[source]#
Plot one or more stepwise convergence or divergence curves.
- Parameters:
metric_series – Either a single numeric series or a mapping of `group -> series`.
ax – Optional Matplotlib axis.
title – Plot title.
ylabel – Y-axis label.
- Returns:
`(figure, axis)` tuple.
- design_research_analysis.plot_design_process_timeline(events, *, session_id=None, session_column='session_id', actor_column='actor_id', event_column='event_type', timestamp_column='timestamp', ax=None, title='Design Process Timeline')[source]#
Plot one session as an actor-by-time event timeline.
- Parameters:
events – Unified-table rows to visualize.
session_id – Explicit session to render when multiple sessions are present.
session_column – Session identifier column.
actor_column – Actor identifier column.
event_column – Event label column.
timestamp_column – Timestamp column used for ordering.
ax – Optional Matplotlib axis.
title – Plot title.
- Returns:
`(figure, axis)` tuple.
- design_research_analysis.plot_embedding_map(embedding_map, data, *, record_id_column='record_id', trace_column=None, order_column=None, value_column=None, ax=None, cmap='viridis', title=None)[source]#
Plot one embedding map with optional trace overlays and scalar coloring.
- design_research_analysis.plot_embedding_map_grid(embedding_maps, data, *, record_id_column='record_id', trace_column=None, order_column=None, value_column=None, cmap='viridis', title='Embedding Map Comparison')[source]#
Plot multiple embedding maps with shared trace overlays and color scale.
- design_research_analysis.plot_idea_trajectory(projection, *, groups=None, timestamps=None, ax=None, title='Idea Trajectory')[source]#
Plot ordered 2D trajectories through idea space.
- Parameters:
projection – Two-dimensional point matrix or projection result.
groups – Optional group labels that split the trajectory into paths.
timestamps – Optional timestamps used to order points within each group.
ax – Optional Matplotlib axis.
title – Plot title.
- Returns:
`(figure, axis)` tuple.
- design_research_analysis.plot_state_graph(transition, *, state_labels=None, threshold=0.0, ax=None, seed=0, title='State Transition Graph')[source]#
Render a directed state-transition graph.
- Parameters:
transition – Result object or raw square matrix.
state_labels – Optional display labels for states.
threshold – Draw edges with probability strictly above this value.
ax – Optional Matplotlib axis.
seed – Random seed passed to layout generation.
title – Plot title.
- Returns:
`(figure, axis)` tuple.
- design_research_analysis.plot_transition_matrix(transition, *, state_labels=None, ax=None, cmap='Blues', annotate=True, fmt='.2f', title='Transition Matrix')[source]#
Plot a transition matrix as a heatmap.
- Parameters:
transition – Result object or raw square matrix.
state_labels – Optional display labels for states.
ax – Optional Matplotlib axis.
cmap – Heatmap colormap.
annotate – Whether to annotate each cell with probabilities.
fmt – Format string for annotations.
title – Plot title.
- Returns:
`(figure, axis)` tuple.
- design_research_analysis.power_curve(effect_sizes, *, n, test, alpha=0.05, ratio=1.0, alternative='two-sided')[source]#
Compute achieved power over a sequence of effect sizes.
- design_research_analysis.profile_dataframe(df, *, max_categorical_levels=20)[source]#
Profile a DataFrame or supported dataset file without mutating inputs.
- design_research_analysis.rank_tests_one_stop(x, y=None, groups=None, *, paired=None, kind=None, alternative='two-sided', alpha=0.05)[source]#
Dispatch to a nonparametric rank test with consistent structured output.
- design_research_analysis.score_sentiment(data, *, text_column='text')[source]#
Score sentiment with a deterministic lexicon-based approach.
This lightweight scorer is intentionally simple and offline-friendly.
- design_research_analysis.validate_dataframe(df, schema)[source]#
Validate a DataFrame or supported dataset file against a declarative schema.
- design_research_analysis.validate_unified_table(table, *, config=None)[source]#
Validate a unified table against the configured contract.
- Parameters:
table – Coerced unified table rows or a supported path-like input.
config – Optional table configuration.
- Returns:
Validation report with errors and warnings.
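The contract behind the report is straightforward: missing required columns become errors, missing recommended columns become warnings, and `is_valid` reflects whether any errors were found. A stdlib sketch whose dict fields mirror `UnifiedTableValidationReport`; it illustrates the contract, not the library's implementation:

```python
def validate_rows(rows, *, required=("timestamp",),
                  recommended=("record_id", "text", "session_id")):
    """Minimal validation report over row-oriented table records."""
    columns = sorted({name for row in rows for name in row})
    missing_required = [c for c in required if c not in columns]
    missing_recommended = [c for c in recommended if c not in columns]
    errors = [f"missing required column: {c}" for c in missing_required]
    warnings = [f"missing recommended column: {c}" for c in missing_recommended]
    return {
        "is_valid": not errors,          # errors invalidate; warnings do not
        "n_rows": len(rows),
        "columns": columns,
        "missing_required": missing_required,
        "missing_recommended": missing_recommended,
        "errors": errors,
        "warnings": warnings,
    }

report = validate_rows([{"timestamp": "2024-01-01T10:00:00", "text": "hi"}])
```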