# REPORT — porteretalndpuzzleaboutknowledge

## What I processed

- Target PDF: `inbox/Porter_et_al_2025.pdf`
- Paper key: `porteretalndpuzzleaboutknowledge`
- Raw dataset used for recomputation: OSF `GPP study 3 nlp.csv` (`https://osf.io/download/59qd6/`)

## Sources used

- Local extraction outputs:
  - `out/tables/camelot_stream_p4_t1.csv` (sample demographics / passed check counts)
  - `out/tables/camelot_stream_p9_t4.csv` (published evidence-fixed stakes coefficients)
  - `out/tables/camelot_stream_p13_t7.csv` (published evidence-seeking stakes coefficients)
  - `out/fulltext.md`
- OSF materials:
  - `GPP study 3 nlp.csv` (participant-level raw data)
  - `A Puzzle About Knowledge Ascription.Rmd` (analysis script used by authors)

## Current computation workflow (updated)

`papers/porteretalndpuzzleaboutknowledge/analysis/effect_sizes.qmd` now computes split effects from raw data:

- Evidence-fixed (binary `q2_knowledge`):
  - split by evidence strength (`weak` = `num_checks == "O"`, `strong` = `num_checks == "F"`)
  - stakes contrast within each split via exact 2x2 counts
  - effect size via `esc::esc_2x2(es.type = "d")`
  - continuity correction `+0.5` only if any 2x2 cell is zero
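
The evidence-fixed steps above can be sketched in Python as follows. This is an illustration, not the `esc` code itself: it assumes that `esc_2x2(es.type = "d")` corresponds to the standard logit conversion `d = ln(OR) * sqrt(3) / pi`, and that the `+0.5` correction is added to all four cells. The function name `d_from_2x2` is hypothetical.

```python
import math

def d_from_2x2(low_yes, low_no, high_yes, high_no):
    """Logit-based d for low vs. high stakes from 2x2 counts.

    Assumptions (not confirmed by the report): d = ln(OR) * sqrt(3) / pi,
    and +0.5 is added to every cell, but only when at least one cell is zero.
    Returns None when a stakes group is empty (effect not computable).
    """
    if low_yes + low_no == 0 or high_yes + high_no == 0:
        return None
    cells = [low_yes, low_no, high_yes, high_no]
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    # Odds of "yes" under low stakes divided by odds of "yes" under high stakes.
    log_or = math.log((a * d) / (b * c))
    return log_or * math.sqrt(3) / math.pi
```

With this orientation, a positive value means knowledge is ascribed more often under low stakes, which matches the report's sign convention.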

- Evidence-seeking (numeric `nlp`):
  - split by evidence strength (`weak`/`strong`)
  - stakes contrast within each split via group means/SDs
  - effect size via `esc::esc_mean_sd(es.type = "d")`
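
A comparable sketch for the evidence-seeking outcome, assuming the usual pooled-SD form of Cohen's d (which is what `esc_mean_sd` is understood to compute when no total SD is supplied). The function name `d_from_mean_sd` is hypothetical.

```python
import math

def d_from_mean_sd(m_low, sd_low, n_low, m_high, sd_high, n_high):
    """Cohen's d = (mean_low - mean_high) / pooled SD.

    Returns None when either group has fewer than two observations
    (its SD is undefined) or when the pooled variance is zero.
    """
    if n_low < 2 or n_high < 2:
        return None
    pooled_var = (
        (n_low - 1) * sd_low ** 2 + (n_high - 1) * sd_high ** 2
    ) / (n_low + n_high - 2)
    if pooled_var == 0:
        return None
    return (m_low - m_high) / math.sqrt(pooled_var)
```

For example, `d_from_mean_sd(3.0, 1.0, 10, 2.0, 1.0, 10)` gives `1.0`, and any call with a single-observation group returns `None`.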

Sign convention everywhere:

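- `d` is signed as low stakes minus high stakes (`mean(low stakes) - mean(high stakes)`, standardized), so a positive `d` means higher values under low stakes.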

Filtering logic applied (matching paper workflow):

- `q1_importance < 3`
- merge Russia sub-sites into `russia`
- `age >= 18`
- comprehension-check pass proxy: `stakes == importance`
- for split logic: valid evidence code (`num_checks` in `{O, F}`)
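
The filters above can be sketched on plain row dictionaries as follows. The column names `q1_importance`, `age`, `stakes`, `importance`, and `num_checks` come from this report; the `site` column name and the rule "any label starting with `russia`" are assumptions made for illustration, and rows are assumed to have no missing values.

```python
VALID_EVIDENCE_CODES = {"O", "F"}

def apply_filters(rows, site_key="site", require_evidence_code=True):
    """Return copies of the rows that pass the report's filters.

    `site_key` and the Russia prefix rule are hypothetical; the real
    sub-site labels are not given in this report.
    """
    kept = []
    for row in rows:
        if row["q1_importance"] >= 3:        # keep q1_importance < 3
            continue
        if row["age"] < 18:                  # keep age >= 18
            continue
        if row["stakes"] != row["importance"]:  # comprehension-check proxy
            continue
        if require_evidence_code and row["num_checks"] not in VALID_EVIDENCE_CODES:
            continue
        cleaned = dict(row)                  # leave the input row untouched
        if str(cleaned[site_key]).lower().startswith("russia"):
            cleaned[site_key] = "russia"     # merge Russia sub-sites
        kept.append(cleaned)
    return kept
```

Setting `require_evidence_code=False` skips the last filter, which the report applies only for the split logic.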

## YAML update summary

`papers/porteretalndpuzzleaboutknowledge/porteretalndpuzzleaboutknowledge.yaml` contains four extracted effects per site/language sample:

- `sX_e1`: Evidence-fixed, weak evidence
- `sX_e2`: Evidence-fixed, strong evidence
- `sX_e3`: Evidence-seeking, weak evidence
- `sX_e4`: Evidence-seeking, strong evidence

Each effect now includes:

- split-specific `groups` (`low_stakes`, `high_stakes`)
- raw-data-based `reported_test.notes`
- `effect_size` from `esc` when computable
- `needs_review: true` + `quality_flags: [insufficient_data_for_split_effect]` when not computable
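
As a rough illustration of the fields listed above, the following sketch builds one effect record as a Python dictionary. The field names `groups`, `reported_test.notes`, `effect_size`, `needs_review`, and `quality_flags` come from this report; the nesting, the `id` and `n` keys, and the function name `build_effect` are assumptions and may not match the actual YAML schema.

```python
def build_effect(effect_id, n_low, n_high, d, note):
    """Assemble one effect entry; `d` is None when the effect is not computable."""
    effect = {
        "id": effect_id,  # hypothetical key name
        "groups": {
            "low_stakes": {"n": n_low},    # hypothetical nesting
            "high_stakes": {"n": n_high},
        },
        "reported_test": {"notes": note},
    }
    if d is not None:
        effect["effect_size"] = d
    else:
        # Flag the entry for manual review instead of storing an effect size.
        effect["needs_review"] = True
        effect["quality_flags"] = ["insufficient_data_for_split_effect"]
    return effect
```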

## Recoding decision — study/site/effect structure

- Updated on 2026-04-20: the YAML now treats Porter et al. as one paper-reported cross-cultural dam study, not as 15 separate studies.
- The 15 former site/language study entries were moved into effect-level `site_id`, `site_label`, `language`, `language_other`, and `sample` fields.
- All 60 site×evidence-strength×outcome effect sizes and effect IDs were retained. Effect subgroup labels are prefixed with the site label to keep exported rows readable.
- This change is semantic/structural only: sites no longer inflate the study count, while site-specific sample metadata remains available to the exporter through effect-level overrides.

## Computability summary

- Evidence-fixed split effects: `29 / 30` computable
- Evidence-seeking split effects: `23 / 30` computable
- Total: `52 / 60` computable

Machine-readable audit file:

- `papers/porteretalndpuzzleaboutknowledge/scratch/split_effects_from_raw.csv`

## 8 non-computable split effects (explicit)

1. `s6_e1` (India - Meitei, evidence-fixed, weak): `n_low=0`, `n_high=20`
   reason: missing one stakes group in this evidence stratum.
2. `s6_e3` (India - Meitei, evidence-seeking, weak): `n_low=0`, `n_high=18`
   reason: insufficient per-group data for `esc_mean_sd` (low group empty).
3. `s9_e3` (Peru - Shipibo, evidence-seeking, weak): `n_low=0`, `n_high=1`
   reason: insufficient per-group data for `esc_mean_sd` (low group empty; SD undefined in high group).
4. `s9_e4` (Peru - Shipibo, evidence-seeking, strong): `n_low=0`, `n_high=2`
   reason: insufficient per-group data for `esc_mean_sd` (low group empty).
5. `s12_e3` (South Africa - Sepedi, evidence-seeking, weak): `n_low=1`, `n_high=10`
   reason: insufficient per-group data for `esc_mean_sd` (SD undefined in low group).
6. `s12_e4` (South Africa - Sepedi, evidence-seeking, strong): `n_low=1`, `n_high=10`
   reason: insufficient per-group data for `esc_mean_sd` (SD undefined in low group).
7. `s13_e3` (South Africa - isiZulu, evidence-seeking, weak): `n_low=0`, `n_high=7`
   reason: insufficient per-group data for `esc_mean_sd` (low group empty).
8. `s13_e4` (South Africa - isiZulu, evidence-seeking, strong): `n_low=1`, `n_high=4`
   reason: insufficient per-group data for `esc_mean_sd` (SD undefined in low group).
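
The list above can be cross-checked against the computability summary with a short script. The group sizes are copied from the list; the computability rule (both groups non-empty for the 2x2 case, at least two observations per group for the mean/SD case) is inferred from the stated reasons and is an assumption, as are the function names.

```python
# (n_low, n_high) for each listed effect, copied from the list above.
LISTED = {
    "s6_e1": (0, 20), "s6_e3": (0, 18),
    "s9_e3": (0, 1), "s9_e4": (0, 2),
    "s12_e3": (1, 10), "s12_e4": (1, 10),
    "s13_e3": (0, 7), "s13_e4": (1, 4),
}

def outcome_of(effect_id):
    """e1/e2 are evidence-fixed, e3/e4 are evidence-seeking."""
    suffix = effect_id.rsplit("_", 1)[1]
    return "fixed" if suffix in ("e1", "e2") else "seeking"

def is_computable(outcome, n_low, n_high):
    """Assumed rule: 2x2 needs both groups present; mean/SD needs n >= 2 each."""
    if outcome == "fixed":
        return n_low > 0 and n_high > 0
    return n_low >= 2 and n_high >= 2

missing_fixed = sum(outcome_of(k) == "fixed" for k in LISTED)
missing_seeking = len(LISTED) - missing_fixed
fixed_computable = 30 - missing_fixed
seeking_computable = 30 - missing_seeking
print(f"{fixed_computable}/30 {seeking_computable}/30 "
      f"{fixed_computable + seeking_computable}/60")  # prints "29/30 23/30 52/60"
```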

## Notes

- This report supersedes the older Table-4/Table-7-only computation notes for this paper.
- The earlier statement that no evidence-seeking effect could be computed from the locally extracted tables still holds for table-only extraction. With the raw OSF data, split effects are now computable for most effects (`52 / 60` overall, `23 / 30` evidence-seeking).