---
title: "Gold-Standard Data Download and Benchmark Setup"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Gold-Standard Data Download and Benchmark Setup}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

This vignette shows how to move from dataset discovery to a runnable benchmark
manifest for external gold-standard validation.

## 1. Preconditions

To download open datasets, you need:

1. `bash` and `wget`
2. DNS/HTTPS access to external hosts (for example, `physionet.org`,
   `ncbi.nlm.nih.gov`, `ftp.ncbi.nlm.nih.gov`)
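
The tool requirement above can be checked from R before running the download script. The chunk below is a minimal sketch using base R's `Sys.which()`; the variable names are illustrative, and it does not test network access to the external hosts.

```{r}
# Look up each required command-line tool on the PATH.
required_tools <- c("bash", "wget")
tool_paths <- Sys.which(required_tools)

# Sys.which() returns an empty string for tools it cannot find.
missing_tools <- required_tools[!nzchar(tool_paths)]

if (length(missing_tools) > 0) {
  message("Missing tools: ", paste(missing_tools, collapse = ", "))
} else {
  message("Found: ", paste(required_tools, collapse = ", "))
}
```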

In this monorepo, the helper script is:

- `publication/scripts/download_open_medrehab_gold_data.sh`

```{bash, eval = FALSE}
bash publication/scripts/download_open_medrehab_gold_data.sh data/external
```

Controlled datasets (for example, the full MIMIC-IV release, eICU, MOST, and
controlled-access OAI resources) require account approval and a signed data use
agreement (DUA) before download.

## 2. Create a benchmark manifest

Start from the template, then replace its file paths with your real
prediction/reference pairs (`.csv`, `.mot`, `.sto`, `.trc`).

```{r}
library(PhysioMoCap)

manifest <- benchmarkManifestTemplate(n = 2)
manifest
```

Write the template to CSV if needed:

```{r}
tmp_manifest <- tempfile("benchmark_manifest_", fileext = ".csv")
writeBenchmarkManifest(tmp_manifest, n = 2, overwrite = TRUE)
tmp_manifest
```

## 3. Point manifest rows to your downloaded files

Below is a minimal pattern. File names are examples; replace them with paths
that exist in your local `data_dir`.

```{r}
data_dir <- tempfile("gold_data_")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

# Demo-only placeholder files so the validation example can run end-to-end.
set.seed(1)  # fixed seed so the demo output is reproducible across builds
pred <- data.frame(knee_angle = rnorm(200, sd = 0.02),
                   hip_angle = rnorm(200, sd = 0.02))
# The reference is the prediction plus a small amount of noise.
ref  <- data.frame(knee_angle = pred$knee_angle + rnorm(200, sd = 0.01),
                   hip_angle = pred$hip_angle + rnorm(200, sd = 0.01))

utils::write.csv(pred, file.path(data_dir, "prediction_trial1.csv"), row.names = FALSE)
utils::write.csv(ref,  file.path(data_dir, "reference_trial1.csv"), row.names = FALSE)

manifest <- benchmarkManifestTemplate(n = 1)
manifest$benchmark_id[1] <- "trial1_external"
manifest$prediction_file[1] <- "prediction_trial1.csv"
manifest$reference_file[1] <- "reference_trial1.csv"
manifest$modality[1] <- "mocap"
manifest$units[1] <- "SI"
manifest$sampling_rate[1] <- 100
manifest
```

## 4. Validate manifest integrity

```{r}
v <- validateBenchmarkManifest(manifest, data_dir = data_dir)
v
```

If `v$valid` is `FALSE`, inspect `v$issues`, then fix the reported file paths or
add any missing required columns (`benchmark_id`, `prediction_file`,
`reference_file`).
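
When tracking down a failed validation, it can help to reproduce the two checks mentioned above by hand. The chunk below is a rough base-R sketch, not the package's validation logic: `check_manifest_files()` is a hypothetical helper, and it assumes that manifest paths are relative to `data_dir`, as in the example in this vignette.

```{r}
# Hypothetical helper: report missing required columns and missing files.
check_manifest_files <- function(manifest, data_dir) {
  required <- c("benchmark_id", "prediction_file", "reference_file")
  issues <- character(0)

  # Stop early if a required column is absent; file checks need those columns.
  missing_cols <- setdiff(required, names(manifest))
  if (length(missing_cols) > 0) {
    return(paste("missing column:", missing_cols))
  }

  # Resolve each path against data_dir and collect the ones that do not exist.
  for (col in c("prediction_file", "reference_file")) {
    paths <- file.path(data_dir, manifest[[col]])
    not_found <- manifest[[col]][!file.exists(paths)]
    if (length(not_found) > 0) {
      issues <- c(issues, paste0(col, " not found: ", not_found))
    }
  }
  issues
}

# Demo: the prediction file exists, the reference file does not.
demo_dir <- tempfile("manifest_check_")
dir.create(demo_dir)
utils::write.csv(data.frame(knee_angle = c(0.1, 0.2)),
                 file.path(demo_dir, "prediction_trial1.csv"),
                 row.names = FALSE)

demo_manifest <- data.frame(
  benchmark_id = "trial1_external",
  prediction_file = "prediction_trial1.csv",
  reference_file = "reference_trial1.csv",
  stringsAsFactors = FALSE
)

demo_issues <- check_manifest_files(demo_manifest, demo_dir)
demo_issues
```

An empty result means both checks passed; it says nothing about any other checks `validateBenchmarkManifest()` may perform.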

## 5. Run benchmark suite

```{r}
suite <- runBenchmarkSuite(
  manifest = manifest,
  data_dir = data_dir,
  thresholds = defaultBenchmarkThresholds("balanced"),
  alignment = "truncate"
)

suite$suite_summary
head(suite$metrics)
```
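
To build intuition for the `alignment = "truncate"` argument, the chunk below sketches one plausible reading: trim both series to the shorter length, then compare them column by column. This is an assumption for illustration only. The package's actual alignment and metrics are not described here, and `truncate_pair()` and `rmse_by_column()` are hypothetical helpers, not package functions.

```{r}
# Assumption: "truncate" means trimming both tables to the shorter row count
# and keeping only the columns they share.
truncate_pair <- function(pred, ref) {
  n <- min(nrow(pred), nrow(ref))
  shared <- intersect(names(pred), names(ref))
  list(pred = pred[seq_len(n), shared, drop = FALSE],
       ref  = ref[seq_len(n), shared, drop = FALSE])
}

# Root-mean-square error per column for two equally sized tables.
rmse_by_column <- function(pred, ref) {
  sqrt(colMeans((as.matrix(pred) - as.matrix(ref))^2))
}

# Toy data: the prediction has one more sample than the reference.
toy_pred <- data.frame(knee_angle = c(0.0, 0.1, 0.2, 0.3),
                       hip_angle  = c(0.0, 0.0, 0.1, 0.1))
toy_ref  <- data.frame(knee_angle = c(0.0, 0.1, 0.2),
                       hip_angle  = c(0.1, 0.1, 0.2))

aligned <- truncate_pair(toy_pred, toy_ref)
toy_rmse <- rmse_by_column(aligned$pred, aligned$ref)
toy_rmse
```

Truncation silently discards the trailing samples of the longer series, so it is only appropriate when both recordings start at the same instant and share a sampling rate.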

## 6. Replace placeholders with real external data

1. Keep the manifest structure unchanged, including the required columns (`benchmark_id`, `prediction_file`, `reference_file`).
2. Replace the placeholder paths with real downloaded files (`.csv`, `.mot`, `.sto`, `.trc`).
3. Re-run `validateBenchmarkManifest()` and `runBenchmarkSuite()`.
4. Optionally set `report_dir` in `runBenchmarkSuite()` to export reports.
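
The path replacement in the steps above can be partly automated. The chunk below is a base-R sketch that lists files with the supported extensions and pairs them into manifest rows. The `prediction_<id>` / `reference_<id>` naming convention is an assumption borrowed from the demo file names in this vignette, and the directory and file names are placeholders.

```{r}
# Demo directory standing in for a real download location.
real_dir <- tempfile("real_data_")
dir.create(real_dir)
invisible(file.create(file.path(
  real_dir,
  c("prediction_trial1.mot", "reference_trial1.mot",
    "prediction_trial2.csv", "reference_trial2.csv", "notes.txt")
)))

# Keep only files with one of the supported extensions.
files <- list.files(real_dir, pattern = "\\.(csv|mot|sto|trc)$")

# Assumed convention: prediction_<id>.<ext> pairs with reference_<id>.<ext>.
pred_files <- sort(files[startsWith(files, "prediction_")])
ref_files  <- sub("^prediction_", "reference_", pred_files)
has_pair   <- ref_files %in% files

# One row per complete pair; the id is the file name without prefix and extension.
paired <- data.frame(
  benchmark_id = sub("^prediction_(.*)\\.[^.]+$", "\\1", pred_files[has_pair]),
  prediction_file = pred_files[has_pair],
  reference_file = ref_files[has_pair],
  stringsAsFactors = FALSE
)
paired
```

These three columns can then be copied into a template of matching length; the remaining fields, such as `modality`, `units`, and `sampling_rate`, still need to be filled in per dataset.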

For repository-level data availability and source links, see:

- `publication/data_availability_and_download_guide_ja.md`
- `publication/gold_standard_data_search_framework_medrehab_ja.md`