REPORT — porteretalndpuzzleaboutknowledge

What I processed

Target PDF: inbox/Porter_et_al_2025.pdf
Paper key: porteretalndpuzzleaboutknowledge
Raw dataset used for recomputation: OSF GPP study 3 nlp.csv (https://osf.io/download/59qd6/)

Local extraction outputs:
out/tables/camelot_stream_p4_t1.csv (sample demographics / passed check counts)
out/tables/camelot_stream_p9_t4.csv (published evidence-fixed stakes coefficients)
out/tables/camelot_stream_p13_t7.csv (published evidence-seeking stakes coefficients)
out/fulltext.md
OSF materials:
GPP study 3 nlp.csv (participant-level raw data)
A Puzzle About Knowledge Ascription.Rmd (analysis script used by authors)

papers/porteretalndpuzzleaboutknowledge/analysis/effect_sizes.qmd now computes split effects from raw data:

Evidence-fixed (binary q2_knowledge):
split by evidence strength (weak = num_checks == "O", strong = num_checks == "F")
stakes contrast within each split via exact 2x2 counts
effect size via esc::esc_2x2(es.type = "d")
continuity correction +0.5 only if any 2x2 cell is zero
Evidence-seeking (numeric nlp):
split by evidence strength (weak/strong)
stakes contrast within each split via group means/SDs
effect size via esc::esc_mean_sd(es.type = "d")
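
The two computations above can be sketched in Python for readers who want to check the arithmetic by hand. This is a minimal illustration under stated assumptions, not the code in effect_sizes.qmd and not the internals of the esc package: the function names are hypothetical, the 2x2 path assumes the logit method (log odds ratio times sqrt(3)/pi), and the +0.5 correction is assumed to be added to all four cells. The sign is simply group 1 minus group 2; which stakes condition is group 1 is not fixed by this sketch.

```python
import math


def d_from_2x2(g1_yes, g1_no, g2_yes, g2_no):
    """Approximate standardized mean difference from 2x2 counts.

    Assumption: logit method, d = log(OR) * sqrt(3) / pi.
    Assumption: +0.5 is added to every cell, but only when some cell is zero.
    """
    cells = [float(g1_yes), float(g1_no), float(g2_yes), float(g2_no)]
    if any(x == 0 for x in cells):
        cells = [x + 0.5 for x in cells]
    a, b, c, d = cells
    log_or = math.log((a * d) / (b * c))
    var_log_or = 1 / a + 1 / b + 1 / c + 1 / d
    d_est = log_or * math.sqrt(3) / math.pi
    var_d = var_log_or * 3 / math.pi ** 2
    return d_est, var_d


def d_from_mean_sd(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d from group means and SDs, using the pooled SD."""
    if n1 < 2 or n2 < 2:
        # An SD is undefined for a group with fewer than two observations.
        raise ValueError("each group needs at least two observations")
    sd_pooled = math.sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )
    d_est = (m1 - m2) / sd_pooled
    var_d = (n1 + n2) / (n1 * n2) + d_est ** 2 / (2 * (n1 + n2))
    return d_est, var_d
```

The ValueError branch mirrors the "SD undefined" cases recorded for small strata: a split with a single participant in one stakes group has no defined group SD, so no mean/SD-based effect size is produced for it.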

Sign convention everywhere:

Filtering logic applied (matching paper workflow):

papers/porteretalndpuzzleaboutknowledge/porteretalndpuzzleaboutknowledge.yaml contains four extracted effects per site/language sample:

Each effect now includes:

split-specific groups (low_stakes, high_stakes)
raw-data-based reported_test.notes
effect_size from esc when computable
needs_review: true and quality_flags: [insufficient_data_for_split_effect] when the effect size is not computable

Updated on 2026-04-20: the YAML now treats Porter et al. as one paper-reported cross-cultural dam study, not as 15 separate studies.
The 15 former site/language study entries were moved into effect-level site_id, site_label, language, language_other, and sample fields.
All 60 site×evidence-strength×outcome effect sizes and effect IDs were retained. Effect subgroup labels are prefixed with the site label to keep exported rows readable.
This change is semantic/structural only: sites no longer inflate the study count, while site-specific sample metadata remains available to the exporter through effect-level overrides.
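
The restructuring described above can be pictured with a short Python sketch. It is illustrative only: the actual YAML schema is not reproduced in this report, so the keys "effects", "id", and "subgroup", the function name, and the ": " label separator are all hypothetical. Only the five site-level field names come from the text above.

```python
# Site-level fields that move from study entries onto each effect.
SITE_FIELDS = ("site_id", "site_label", "language", "language_other", "sample")


def merge_site_studies(site_studies):
    """Collapse per-site study entries into one study with site-tagged effects.

    Hypothetical input shape: a list of dicts, each holding the site fields
    plus an "effects" list of dicts with "id" and "subgroup" keys.
    """
    merged_effects = []
    for study in site_studies:
        for effect in study["effects"]:
            merged = dict(effect)  # keep the effect ID and all other fields
            for field in SITE_FIELDS:
                merged[field] = study.get(field)
            # Prefix the subgroup label with the site label for readability.
            merged["subgroup"] = f'{study["site_label"]}: {effect["subgroup"]}'
            merged_effects.append(merged)
    return {"effects": merged_effects}
```

With 15 site entries of four effects each, the merged study carries 60 effects and a study count of one, which is the bookkeeping change the update is meant to achieve.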

Machine-readable audit file:

s6_e1 (India - Meitei, evidence-fixed, weak): n_low=0, n_high=20; reason: missing one stakes group in this evidence stratum.
s6_e3 (India - Meitei, evidence-seeking, weak): n_low=0, n_high=18; reason: insufficient per-group data for esc_mean_sd.
s9_e3 (Peru - Shipibo, evidence-seeking, weak): n_low=0, n_high=1; reason: insufficient per-group data for esc_mean_sd.
s9_e4 (Peru - Shipibo, evidence-seeking, strong): n_low=0, n_high=2; reason: insufficient per-group data for esc_mean_sd.
s12_e3 (South Africa - Sepedi, evidence-seeking, weak): n_low=1, n_high=10; reason: insufficient per-group data for esc_mean_sd (SD undefined in low group).
s12_e4 (South Africa - Sepedi, evidence-seeking, strong): n_low=1, n_high=10; reason: insufficient per-group data for esc_mean_sd (SD undefined in low group).
s13_e3 (South Africa - isiZulu, evidence-seeking, weak): n_low=0, n_high=7; reason: insufficient per-group data for esc_mean_sd.
s13_e4 (South Africa - isiZulu, evidence-seeking, strong): n_low=1, n_high=4; reason: insufficient per-group data for esc_mean_sd (SD undefined in low group).
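
The pattern in the entries above (an empty stakes group, or a group too small for an SD) can be expressed as a small check. The following Python sketch is an assumption about how such a rule could look, not the logic actually used in effect_sizes.qmd: the function name and the returned dictionary shape are hypothetical, and the thresholds (at least one participant per group for the 2x2 path, at least two per group for the mean/SD path) are inferred from the reasons listed.

```python
def split_effect_status(outcome, n_low, n_high):
    """Decide whether a split effect size looks computable from group sizes.

    outcome: "evidence-fixed" (2x2 counts) or "evidence-seeking" (means/SDs).
    Thresholds are assumptions inferred from the audit entries.
    """
    if outcome == "evidence-fixed":
        min_n = 1  # both stakes groups must be present
        reason = "missing one stakes group in this evidence stratum"
    elif outcome == "evidence-seeking":
        min_n = 2  # an SD needs at least two observations per group
        reason = "insufficient per-group data for esc_mean_sd"
    else:
        raise ValueError(f"unknown outcome: {outcome!r}")

    computable = n_low >= min_n and n_high >= min_n
    return {
        "computable": computable,
        "needs_review": not computable,
        "quality_flags": (
            [] if computable else ["insufficient_data_for_split_effect"]
        ),
        "reason": None if computable else reason,
    }
```

For example, split_effect_status("evidence-seeking", 1, 10) flags the effect for review because the low-stakes group has a single participant, matching the s12_e3 entry.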

This report supersedes the older Table-4/Table-7-only computation notes for this paper.
The old statement that all evidence-seeking effects were non-computable from the locally extracted tables still holds for table-only extraction, but the raw OSF data now support split-effect computation for most effects.