# REPORT — porteretalndpuzzleaboutknowledge
## What I processed
- Target PDF: `inbox/Porter_et_al_2025.pdf`
- Paper key: `porteretalndpuzzleaboutknowledge`
- Raw dataset used for recomputation: OSF `GPP study 3 nlp.csv` (`https://osf.io/download/59qd6/`)
## Sources used
- Local extraction outputs:
  - `out/tables/camelot_stream_p4_t1.csv` (sample demographics / passed check counts)
  - `out/tables/camelot_stream_p9_t4.csv` (published evidence-fixed stakes coefficients)
  - `out/tables/camelot_stream_p13_t7.csv` (published evidence-seeking stakes coefficients)
  - `out/fulltext.md`
- OSF materials:
  - `GPP study 3 nlp.csv` (participant-level raw data)
  - `A Puzzle About Knowledge Ascription.Rmd` (analysis script used by authors)
## Current computation workflow (updated)
`papers/porteretalndpuzzleaboutknowledge/analysis/effect_sizes.qmd` now computes split effects from raw data:
- Evidence-fixed (binary `q2_knowledge`):
  - split by evidence strength (`weak` = `num_checks == "O"`, `strong` = `num_checks == "F"`)
  - stakes contrast within each split via exact 2x2 counts
  - effect size via `esc::esc_2x2(es.type = "d")`
  - continuity correction `+0.5` only if any 2x2 cell is zero
- Evidence-seeking (numeric `nlp`):
  - split by evidence strength (`weak`/`strong`)
  - stakes contrast within each split via group means/SDs
  - effect size via `esc::esc_mean_sd(es.type = "d")`
Sign convention everywhere:
- `d = mean(low stakes) - mean(high stakes)`, so a positive `d` means the outcome is higher under low stakes than under high stakes
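
The two effect-size routes above can be illustrated with a minimal Python sketch. It is not the R code in `effect_sizes.qmd`. It assumes that the 2x2 route converts a log odds ratio to `d` with the logit method (`d = ln(OR) * sqrt(3) / pi`), that the `+0.5` correction is added to all four cells, and that the means/SDs route uses a pooled SD. Whether `esc::esc_2x2` and `esc::esc_mean_sd` match these details exactly is an assumption, and the function names `d_from_2x2` and `d_from_means` are hypothetical.

```python
import math


def d_from_2x2(low_yes, low_no, high_yes, high_no):
    """Standardized difference (low minus high stakes) from 2x2 counts.

    Returns None when one stakes group has no participants.
    """
    # A contrast needs at least one participant in each stakes group.
    if low_yes + low_no == 0 or high_yes + high_no == 0:
        return None
    cells = [low_yes, low_no, high_yes, high_no]
    # Continuity correction only if any cell is zero.
    # Assumption: 0.5 is added to all four cells.
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    # Odds of a "yes" under low stakes relative to high stakes.
    log_or = math.log((a * d) / (b * c))
    # Assumption: logit-method conversion from log odds ratio to d.
    return log_or * math.sqrt(3) / math.pi


def d_from_means(mean_low, sd_low, n_low, mean_high, sd_high, n_high):
    """Cohen's d (low minus high stakes) from group means and SDs.

    Returns None when a group is too small for an SD (n < 2).
    """
    if n_low < 2 or n_high < 2:
        return None
    pooled_var = (
        (n_low - 1) * sd_low ** 2 + (n_high - 1) * sd_high ** 2
    ) / (n_low + n_high - 2)
    if pooled_var == 0:
        return None
    return (mean_low - mean_high) / math.sqrt(pooled_var)
```

Both functions follow the sign convention stated above: a positive value means the low-stakes group scores higher. Returning `None` marks a split as non-computable, so downstream code can flag it rather than record a spurious number.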
Filtering logic applied (matching paper workflow):
- `q1_importance < 3`
- merge Russia sub-sites into `russia`
- `age >= 18`
- comprehension-check pass proxy: `stakes == importance`
- for split logic: valid evidence code (`num_checks` in `{O, F}`)
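
As a rough illustration of the filters listed above, the following Python sketch applies them to rows held as dictionaries. The column names `q1_importance`, `age`, `stakes`, `importance`, and `num_checks` come from this report. The `site` column name, the prefix rule used to recognise Russia sub-sites, and the function names are hypothetical, because the report does not give the actual site labels.

```python
def merge_russia(site):
    """Collapse Russia sub-sites into a single `russia` label.

    Hypothetical rule: any label starting with "russia" is a sub-site.
    """
    return "russia" if site.lower().startswith("russia") else site


def passes_filters(row):
    """Row-level inclusion checks taken from the filtering list."""
    return (
        row["q1_importance"] < 3
        and row["age"] >= 18
        # Comprehension-check pass proxy.
        and row["stakes"] == row["importance"]
    )


def prepare(rows):
    """Apply inclusion filters and merge Russia sub-sites."""
    return [
        {**row, "site": merge_russia(row["site"])}
        for row in rows
        if passes_filters(row)
    ]


def rows_for_split(rows):
    """Keep only rows with a valid evidence code for the weak/strong split."""
    return [row for row in rows if row["num_checks"] in {"O", "F"}]
```

Keeping the evidence-code check in a separate function mirrors the list above, where that condition applies only to the split logic and not to the general inclusion filters.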
## YAML update summary
`papers/porteretalndpuzzleaboutknowledge/porteretalndpuzzleaboutknowledge.yaml` contains four extracted effects per site/language sample:
- `sX_e1`: Evidence-fixed, weak evidence
- `sX_e2`: Evidence-fixed, strong evidence
- `sX_e3`: Evidence-seeking, weak evidence
- `sX_e4`: Evidence-seeking, strong evidence
Each effect now includes:
- split-specific `groups` (`low_stakes`, `high_stakes`)
- raw-data-based `reported_test.notes`
- `effect_size` from `esc` when computable
- `needs_review: true` + `quality_flags: [insufficient_data_for_split_effect]` when not computable
## Recoding decision — study/site/effect structure
- Updated on 2026-04-20: the YAML now treats Porter et al. as one paper-reported cross-cultural dam study, not as 15 separate studies.
- The 15 former site/language study entries were moved into effect-level `site_id`, `site_label`, `language`, `language_other`, and `sample` fields.
- All 60 site×evidence-strength×outcome effect sizes and effect IDs were retained. Effect subgroup labels are prefixed with the site label to keep exported rows readable.
- This change is semantic/structural only: sites no longer inflate the study count, while site-specific sample metadata remains available to the exporter through effect-level overrides.
## Computability summary
- Evidence-fixed split effects: `29 / 30` computable
- Evidence-seeking split effects: `23 / 30` computable
- Total: `52 / 60` computable
Machine-readable audit file:
- `papers/porteretalndpuzzleaboutknowledge/scratch/split_effects_from_raw.csv`
## 8 non-computable split effects (explicit)
1. `s6_e1` (India - Meitei, evidence-fixed, weak): `n_low=0`, `n_high=20`
   - reason: the low-stakes group is missing in this evidence stratum.
2. `s6_e3` (India - Meitei, evidence-seeking, weak): `n_low=0`, `n_high=18`
   - reason: insufficient per-group data for `esc_mean_sd` (low group empty).
3. `s9_e3` (Peru - Shipibo, evidence-seeking, weak): `n_low=0`, `n_high=1`
   - reason: insufficient per-group data for `esc_mean_sd` (low group empty; SD undefined in high group).
4. `s9_e4` (Peru - Shipibo, evidence-seeking, strong): `n_low=0`, `n_high=2`
   - reason: insufficient per-group data for `esc_mean_sd` (low group empty).
5. `s12_e3` (South Africa - Sepedi, evidence-seeking, weak): `n_low=1`, `n_high=10`
   - reason: insufficient per-group data for `esc_mean_sd` (SD undefined in low group).
6. `s12_e4` (South Africa - Sepedi, evidence-seeking, strong): `n_low=1`, `n_high=10`
   - reason: insufficient per-group data for `esc_mean_sd` (SD undefined in low group).
7. `s13_e3` (South Africa - isiZulu, evidence-seeking, weak): `n_low=0`, `n_high=7`
   - reason: insufficient per-group data for `esc_mean_sd` (low group empty).
8. `s13_e4` (South Africa - isiZulu, evidence-seeking, strong): `n_low=1`, `n_high=4`
   - reason: insufficient per-group data for `esc_mean_sd` (SD undefined in low group).
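
The eight cases listed above can be cross-checked against the computability summary with a short Python sketch. The group sizes are copied from the list. The rule itself is an assumption made for illustration: an evidence-fixed split needs at least one participant per stakes group, and an evidence-seeking split needs at least two per group so that an SD exists. The name `is_computable` and the outcome labels are hypothetical.

```python
# (effect ID, outcome, n_low, n_high), copied from the list above.
NON_COMPUTABLE = [
    ("s6_e1", "evidence_fixed", 0, 20),
    ("s6_e3", "evidence_seeking", 0, 18),
    ("s9_e3", "evidence_seeking", 0, 1),
    ("s9_e4", "evidence_seeking", 0, 2),
    ("s12_e3", "evidence_seeking", 1, 10),
    ("s12_e4", "evidence_seeking", 1, 10),
    ("s13_e3", "evidence_seeking", 0, 7),
    ("s13_e4", "evidence_seeking", 1, 4),
]


def is_computable(outcome, n_low, n_high):
    """Assumed rule: 1 per group for 2x2 counts, 2 per group for means/SDs."""
    min_n = 1 if outcome == "evidence_fixed" else 2
    return n_low >= min_n and n_high >= min_n


# Count the flagged effects per outcome type.
flagged = {"evidence_fixed": 0, "evidence_seeking": 0}
for effect_id, outcome, n_low, n_high in NON_COMPUTABLE:
    if not is_computable(outcome, n_low, n_high):
        flagged[outcome] += 1

# 30 split effects per outcome type (15 sites x 2 evidence strengths).
computable_fixed = 30 - flagged["evidence_fixed"]
computable_seeking = 30 - flagged["evidence_seeking"]
print(computable_fixed, computable_seeking, computable_fixed + computable_seeking)
```

Under this assumed rule every listed case is flagged, and the remaining counts agree with the computability summary (`29 / 30`, `23 / 30`, `52 / 60`).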
## Notes
- This report supersedes the older Table-4/Table-7-only computation notes for this paper.
- The old statement that all evidence-seeking effects were non-computable from the locally extracted tables still holds for table-only extraction. With the raw OSF data, split effects are now computable for 23 of the 30 evidence-seeking effects (52 of 60 effects overall).