---
title: "BIDS Format Support and DuckDB Backend"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{BIDS Format Support and DuckDB Backend}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Introduction

PhysioIO provides two complementary features for managing collections of
physiological recordings:

1. **BIDS support** -- read and write data in the Brain Imaging Data Structure
   (BIDS) format, the community standard for organizing neuroimaging and
   electrophysiology datasets.
2. **DuckDB backend** -- store experiment metadata, channel information,
   events, and (optionally) signal data in an embedded analytical database for
   fast querying across many sessions.

This vignette demonstrates both capabilities and shows how they work together.

## BIDS format support

### What is BIDS?

The Brain Imaging Data Structure (BIDS) is a standard for organizing
neuroimaging data into a consistent directory layout with machine-readable
metadata files. PhysioIO supports the BIDS-EEG and BIDS-iEEG extensions.

A minimal BIDS dataset looks like this:

```
my_dataset/
  dataset_description.json
  sub-01/
    eeg/
      sub-01_task-rest_eeg.edf
      sub-01_task-rest_eeg.json
      sub-01_task-rest_channels.tsv
      sub-01_task-rest_events.tsv
      sub-01_task-rest_electrodes.tsv
```

**Reference:** Gorgolewski KJ, et al. (2016). "The brain imaging data
structure, a format for organizing and describing outputs of neuroimaging
experiments." *Scientific Data*, 3, 160044.

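
The file names in the layout above are built from `key-value` entities joined by underscores. The following base R sketch shows how such a relative path could be assembled. `bids_path()` is a hypothetical helper written for illustration only; it is not part of PhysioIO, and it assumes the usual BIDS entity order (`sub`, `ses`, `task`, `run`) rather than describing how `readBIDS()` or `writeBIDS()` resolve files internally.

```r
# Hypothetical helper (NOT a PhysioIO function): assemble a BIDS-style
# relative path from its entities, using base R only.
bids_path <- function(subject, task, suffix, ext,
                      session = NULL, run = NULL, modality = "eeg") {
  # Entities in their conventional order; optional ones are dropped when NULL
  entities <- c(
    paste0("sub-", subject),
    if (!is.null(session)) paste0("ses-", session),
    paste0("task-", task),
    if (!is.null(run)) paste0("run-", run)
  )
  filename <- paste0(paste(entities, collapse = "_"), "_", suffix, ".", ext)

  # Directory levels: sub-<label>/[ses-<label>/]<modality>/
  dirs <- c(
    paste0("sub-", subject),
    if (!is.null(session)) paste0("ses-", session),
    modality
  )
  paste(c(dirs, filename), collapse = "/")
}

bids_path("01", "rest", suffix = "eeg", ext = "edf")
#> [1] "sub-01/eeg/sub-01_task-rest_eeg.edf"

bids_path("01", "oddball", suffix = "events", ext = "tsv",
          session = "baseline", run = 1)
#> [1] "sub-01/ses-baseline/eeg/sub-01_ses-baseline_task-oddball_run-1_events.tsv"
```

The first call reproduces the data file from the minimal dataset shown above; the second shows where the optional session and run entities would appear.
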
### Reading from a BIDS dataset

```{r bids-read}
library(PhysioIO)

# Read a single recording
pe <- readBIDS("path/to/bids", subject = "01", task = "rest")

# With session and run
pe <- readBIDS(
  "path/to/bids",
  subject = "01",
  session = "baseline",
  task = "oddball",
  run = 1,
  modality = "eeg"
)

# Skip loading events
pe <- readBIDS("path/to/bids", subject = "02", task = "rest",
               load_events = FALSE)
```

`readBIDS()` automatically loads the EDF data file and merges information from
the companion `_channels.tsv`, `_events.tsv`, and `_electrodes.tsv` sidecar
files when they are present.

### Writing to BIDS format

```{r bids-write}
# Export a PhysioExperiment to BIDS layout
writeBIDS(pe, "output/bids", subject = "01", task = "rest")

# With session and run identifiers
writeBIDS(pe, "output/bids",
          subject = "01",
          session = "baseline",
          task = "oddball",
          run = 1)
```

`writeBIDS()` creates the required directory structure, writes the EDF data
file, and generates the sidecar TSV and JSON files.

### Exploring a BIDS dataset

```{r bids-explore}
# List all subjects
subjects <- listBIDSSubjects("path/to/bids")
print(subjects)

# List sessions for a subject
sessions <- listBIDSSessions("path/to/bids", subject = "01")
print(sessions)
```

### Validating BIDS compliance

```{r bids-validate}
result <- validateBIDS("path/to/bids")

if (result$valid) {
  message("Dataset is BIDS-compliant")
  message("Subjects found: ", result$n_subjects)
} else {
  message("Errors: ", paste(result$errors, collapse = "; "))
  message("Warnings: ", paste(result$warnings, collapse = "; "))
}
```

The validator checks for `dataset_description.json`, subject directory naming,
modality directories, and required metadata fields.

## DuckDB database backend

### Why use a database?

When a study involves dozens or hundreds of recording sessions, it becomes
impractical to load every file just to find which subjects had a particular
task or which recordings fall within a date range.
The DuckDB backend lets you register experiment metadata once and query it
efficiently.

**Reference:** Raasveldt M, Muehleisen H (2019). "DuckDB: an embeddable
analytical database." Proceedings of the 2019 International Conference on
Management of Data (SIGMOD).

### Connecting and initializing

```{r db-connect}
# In-memory database (useful for testing)
con <- connectDatabase()

# File-based database (persistent)
con <- connectDatabase("my_study.duckdb")

# Create the schema tables
initPhysioSchema(con)
```

### Database schema

`initPhysioSchema()` creates the following tables:

| Table | Purpose |
|-------|---------|
| `experiments` | One row per recording session (subject, task, sampling rate, etc.) |
| `channels` | Channel metadata (label, type, unit, electrode position) |
| `events` | Experimental events (onset, duration, type) |
| `signal_chunks` | Optional chunked signal data stored as BLOBs |
| `epochs` | Epoched data metadata |
| `annotations` | Free-form annotations |

### Registering experiments

```{r db-register}
# Read a recording
pe <- readEDF("sub01_rest.edf")

# Register metadata only
exp_id <- registerExperiment(con, pe,
  subject_id = "sub-01",
  task = "rest"
)

# Register with signal data stored in the database
exp_id <- registerExperiment(con, pe,
  subject_id = "sub-01",
  task = "rest",
  store_signals = TRUE,
  chunk_size = 10000L
)
```

### Querying experiments

```{r db-query}
# All experiments
all_exps <- queryExperiments(con)

# Filter by subject
sub01 <- queryExperiments(con, subject_id = "sub-01")

# Filter by task
rest <- queryExperiments(con, task = "rest")

# Filter by date range
recent <- queryExperiments(con,
  date_range = c("2025-01-01", "2025-12-31")
)
```

### Loading experiments from the database

```{r db-load}
# Load metadata (signal matrix is NA-filled)
pe <- loadExperiment(con, exp_id)

# Load with signal data
pe <- loadExperiment(con, exp_id, load_signals = TRUE)
```

### Managing experiments

```{r db-manage}
# Get database statistics
stats <- dbStats(con)
cat("Experiments:", stats$n_experiments, "\n")
cat("Channels:", stats$n_channels, "\n")
cat("Events:", stats$n_events, "\n")
cat("Subjects:", paste(stats$subjects, collapse = ", "), "\n")

# Delete an experiment and all related records
deleteExperiment(con, exp_id)
```

### Disconnecting

```{r db-disconnect}
# Always disconnect when done
disconnectDatabase(con)
```

## Combining BIDS and DuckDB

A common workflow is to organize raw data in BIDS format on disk and register
each recording in a DuckDB database for fast metadata queries.

```{r combined-workflow}
# Set up database
con <- connectDatabase("study.duckdb")
initPhysioSchema(con)

# Iterate over BIDS subjects
bids_root <- "path/to/bids"
subjects <- listBIDSSubjects(bids_root)

for (subj in subjects) {
  # Read from BIDS
  pe <- readBIDS(bids_root, subject = subj, task = "rest")

  # Register in database
  registerExperiment(con, pe,
    subject_id = paste0("sub-", subj),
    task = "rest"
  )
}

# Now query across all subjects
all_rest <- queryExperiments(con, task = "rest")
print(all_rest)

disconnectDatabase(con)
```

## Session info

```{r sessioninfo}
sessionInfo()
```