---
title: "Effect size computations: wundfolkspuristspragmatic"
format:
  html:
    toc: true
execute:
  echo: true
  warning: true
  message: false
---

This document computes standardized mean differences (`d`) and their sampling
variances (`v`) for entry into the extraction YAML
`papers/wundfolkspuristspragmatic/wundfolkspuristspragmatic.yaml`.

## Data sources

- Experiment 1 knowledge attribution counts: `../out/tables/camelot_stream_p12_t2.csv`
- Experiment 2 knowledge attribution counts: `../out/tables/camelot_stream_p18_t5.csv`
- Experiment 3 knowledge attribution counts: `../out/tables/camelot_stream_p23_t7.csv`

## Outcome coding

- Experiments 1–2: knowledge attribution = options A and B (knows) vs options C and D (does not know).
- Experiment 3: knowledge attribution = option A (knows without checking) vs option B.

Stakes contrast sign convention (per extraction instructions):
`d = mean(low) - mean(high)` where the “mean” is the proportion of knowledge attributions.
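
As a minimal sketch of the coding and sign convention above, the chunk below collapses four response options into a binary knowledge attribution (Experiments 1–2 style) and takes the low-minus-high difference in proportions. The option counts are hypothetical placeholders, not values from the paper, and `collapse_options` and `prop_yes` are illustrative helper names.

```{r}
# Hypothetical counts per response option (NOT from the paper).
collapse_options <- function(counts) {
  # counts: named numeric vector with entries A, B, C, D
  c(yes = unname(counts["A"] + counts["B"]),
    no  = unname(counts["C"] + counts["D"]))
}

ex_low  <- collapse_options(c(A = 20, B = 15, C = 10, D = 5))
ex_high <- collapse_options(c(A = 12, B = 13, C = 15, D = 10))

# Proportion of knowledge attributions in one stakes condition.
prop_yes <- function(x) unname(x["yes"] / sum(x))

# Sign convention: low minus high, so a positive value means more
# knowledge attribution under low stakes.
ex_diff <- prop_yes(ex_low) - prop_yes(ex_high)
ex_diff
```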

## Conversion: exact 2×2 counts (OR) → d

For each stakes contrast, we take the exact 2×2 counts (stakes × knowledge
attribution) and compute Cohen's `d` via `esc::esc_2x2(..., es.type = "d")`,
which computes the odds ratio (OR) from the counts and converts its logarithm to `d`.
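
For reference, the textbook logit conversion from a 2×2 table to `d` can be written as below, with group 1 = low stakes and group 2 = high stakes. It is given here on the assumption that `es.type = "d"` in `esc` implements this logit method; the subscripted cell-count symbols are notation introduced for this note, not taken from the package.

$$
\mathrm{OR} = \frac{n_{1,\text{yes}} \, n_{2,\text{no}}}{n_{1,\text{no}} \, n_{2,\text{yes}}},
\qquad
d = \ln(\mathrm{OR}) \cdot \frac{\sqrt{3}}{\pi},
\qquad
v = \frac{3}{\pi^{2}} \left( \frac{1}{n_{1,\text{yes}}} + \frac{1}{n_{1,\text{no}}} + \frac{1}{n_{2,\text{yes}}} + \frac{1}{n_{2,\text{no}}} \right)
$$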

The group mapping is stated explicitly because it fixes the sign of `d`:

- `grp1 = low stakes`
- `grp2 = high stakes`

A positive `d` therefore means higher knowledge attribution under low stakes than under high stakes.
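
To make the conversion and the sign behaviour concrete, the chunk below redoes the calculation by hand in base R for the Experiment 1 bank scenario, using the counts from the `effects` table in this document. It is a sketch that assumes the standard logit conversion (log OR scaled by `sqrt(3) / pi`); `or_to_d_logit` is an illustrative helper, not a function from `esc`.

```{r}
# Illustrative helper (not part of esc): logit conversion from 2x2 counts to d.
# Assumes all four cells are non-zero.
or_to_d_logit <- function(grp1yes, grp1no, grp2yes, grp2no) {
  log_or <- log((grp1yes * grp2no) / (grp1no * grp2yes))
  var_log_or <- 1 / grp1yes + 1 / grp1no + 1 / grp2yes + 1 / grp2no
  list(
    d = log_or * sqrt(3) / pi,
    v = var_log_or * 3 / pi^2
  )
}

# Experiment 1, bank scenario: grp1 = low stakes, grp2 = high stakes.
ex_bank <- or_to_d_logit(grp1yes = 40, grp1no = 10, grp2yes = 32, grp2no = 12)
ex_bank  # d is about 0.224, v about 0.073

# Swapping the groups flips the sign of d and leaves v unchanged.
ex_swapped <- or_to_d_logit(grp1yes = 32, grp1no = 12, grp2yes = 40, grp2no = 10)
ex_swapped
```

If the values from `esc::esc_2x2()` differ from this hand computation beyond rounding, the assumption about the conversion method is the first thing to revisit.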

```{r}
paper_key <- "wundfolkspuristspragmatic"

# Each row defines one effect (scenario within experiment).
effects <- data.frame(
  study_id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  effect_id = c("s1_e1", "s1_e2", "s1_e3", "s2_e1", "s2_e2", "s2_e3", "s3_e1", "s3_e2", "s3_e3"),
  scenario = c("bank", "airport", "spelling", "bank", "airport", "spelling", "bank", "airport", "spelling"),
  table_ref = c(
    rep("camelot_stream_p12_t2.csv", 3),
    rep("camelot_stream_p18_t5.csv", 3),
    rep("camelot_stream_p23_t7.csv", 3)
  ),
  page = c(rep(12, 3), rep(18, 3), rep(23, 3)),
  # Counts: knowledge attribution (yes) vs no, by stakes.
  yes_low = c(40, 40, 29, 36, 35, 22, 36, 34, 33),
  no_low = c(10, 6, 19, 6, 9, 20, 12, 8, 31),
  yes_high = c(32, 41, 27, 36, 48, 29, 47, 57, 25),
  no_high = c(12, 4, 12, 12, 6, 13, 21, 16, 26),
  stringsAsFactors = FALSE
)

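# Convert one 2x2 table (stakes x knowledge attribution) to d and its variance.
# grp1 = low stakes, grp2 = high stakes, so d > 0 means more knowledge
# attribution under low stakes.
compute_from_2x2 <- function(yes_low, no_low, yes_high, no_high) {
  # A zero cell would make the odds ratio 0 or infinite.
  stopifnot(all(c(yes_low, no_low, yes_high, no_high) > 0))

  # Descriptives reported alongside the effect size.
  n_low <- yes_low + no_low
  n_high <- yes_high + no_high
  p_low <- yes_low / n_low
  p_high <- yes_high / n_high
  odds_ratio <- (yes_low / no_low) / (yes_high / no_high)

  # Requires the esc package to be installed.
  fit <- esc::esc_2x2(
    grp1yes = yes_low,
    grp1no = no_low,
    grp2yes = yes_high,
    grp2no = no_high,
    es.type = "d"
  )

  list(
    n_low = n_low,
    n_high = n_high,
    p_low = p_low,
    p_high = p_high,
    odds_ratio = odds_ratio,
    d = as.numeric(fit$es),   # effect size
    v = as.numeric(fit$var)   # sampling variance of d
  )
}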

rows <- lapply(seq_len(nrow(effects)), function(i) {
  r <- effects[i, ]
  out <- compute_from_2x2(r$yes_low, r$no_low, r$yes_high, r$no_high)
  cbind(r, as.data.frame(out, stringsAsFactors = FALSE))
})

results <- do.call(rbind, rows)
results
```
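
Because the sign of `d` carries the interpretation, a quick consistency check can catch a swapped group mapping. The sketch below uses a hypothetical helper, `sign_agrees`, and applies it to the two Experiment 1 rows (bank, airport) from the `effects` table, with the log odds ratio standing in for `d` so the chunk runs without `esc`.

```{r}
# Illustrative helper: TRUE where sign(d) agrees with sign(p_low - p_high).
sign_agrees <- function(d, p_low, p_high) {
  sign(d) == sign(p_low - p_high)
}

# Two Experiment 1 rows (bank, airport). The logit conversion rescales the
# log odds ratio by a positive constant, so the signs are the same as for d.
ex_counts <- data.frame(
  yes_low = c(40, 40), no_low = c(10, 6),
  yes_high = c(32, 41), no_high = c(12, 4)
)
ex_counts$p_low  <- with(ex_counts, yes_low / (yes_low + no_low))
ex_counts$p_high <- with(ex_counts, yes_high / (yes_high + no_high))
ex_counts$log_or <- with(ex_counts, log((yes_low * no_high) / (no_low * yes_high)))

ex_ok <- sign_agrees(ex_counts$log_or, ex_counts$p_low, ex_counts$p_high)
stopifnot(all(ex_ok))
```

With the objects defined in this document, the equivalent call would be `sign_agrees(results$d, results$p_low, results$p_high)`.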

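## YAML copy/paste snippets

Prints one line per effect, with `d` and `v` at 12 decimal places, for copying into the extraction YAML.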

```{r}
for (i in seq_len(nrow(results))) {
  cat(sprintf(
    "%s (study %d; %s): d=%.12f v=%.12f (p_low=%.4f, p_high=%.4f; OR=%.6f)\n",
    results$effect_id[i],
    results$study_id[i],
    results$scenario[i],
    results$d[i],
    results$v[i],
    results$p_low[i],
    results$p_high[i],
    results$odds_ratio[i]
  ))
}
```
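
If key/value output is more convenient for pasting than the one-line summaries above, something along these lines could work. This is only a sketch: the extraction YAML schema is not shown in this document, so the key names (`effect_id`, `d`, `v`) and the helper `format_yaml_entry` are assumptions.

```{r}
# Illustrative helper: the key names below are assumptions, since the
# extraction YAML schema is not shown in this document.
format_yaml_entry <- function(effect_id, d, v) {
  sprintf("- effect_id: %s\n  d: %.12f\n  v: %.12f\n", effect_id, d, v)
}

# Example with placeholder values (not results from the paper).
cat(format_yaml_entry("example_effect", 0.5, 0.25))
```

Since `sprintf()` is vectorized, `cat(format_yaml_entry(results$effect_id, results$d, results$v), sep = "")` would emit one entry per row of `results`.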