---
title: "Guide to Supported File Formats"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Guide to Supported File Formats}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Overview

PhysioIO provides read and write support for the most common file formats used in physiological signal research. This vignette walks through each format, explains when to use it, and demonstrates the basic API.

| Format | Read | Write | Package dependency |
|--------|:----:|:-----:|:------------------:|
| EDF/EDF+ | `readEDF()` | `writeEDF()` | (none -- built-in) |
| HDF5 | `readPhysioHDF5()` | `writePhysioHDF5()` | rhdf5, HDF5Array |
| CSV/TSV | `readCSV()` | `writeCSV()` | (none -- built-in) |
| MATLAB .mat | `readMAT()` | `writeMAT()` | R.matlab |
| BIDS | `readBIDS()` | `writeBIDS()` | (none -- built-in) |
| Clinical CSV | `readClinicalMetadataCSV()` | -- | (none -- built-in) |
| RDS | `readPhysio()` | `writePhysio()` | (none -- built-in) |

All readers return a `PhysioExperiment` object, so downstream analysis code is identical regardless of the input format.

## EDF / EDF+

European Data Format (EDF) is the *de facto* standard for polysomnography and clinical EEG recordings. EDF+ extends the original format with support for annotations and discontinuous recordings.

### Reading EDF files

```{r edf-read}
library(PhysioIO)

# Read an entire EDF file
pe <- readEDF("recording.edf")

# Read only a subset of channels
pe <- readEDF("recording.edf", channels = c("Fp1", "Fp2", "C3", "C4"))

# Read a specific time window (seconds)
pe <- readEDF("recording.edf", start_time = 10, end_time = 60)
```

### Writing EDF files

```{r edf-write}
writeEDF(pe, "output.edf")
```

**Reference:** Kemp B, et al. (1992). "A simple format for exchange of digitized polygraphic recordings." *Electroencephalography and Clinical Neurophysiology*, 82(5), 391--393.


## HDF5

HDF5 is ideal for large datasets because it supports chunked, compressed, out-of-memory storage. PhysioIO uses the Bioconductor `rhdf5` and `HDF5Array` packages so that the data can remain on disk while you operate on it.

### Writing to HDF5

```{r hdf5-write}
# Save with default compression (level 6)
writePhysioHDF5(pe, "data.h5")

# Higher compression
writePhysioHDF5(pe, "data.h5", overwrite = TRUE, compression_level = 9)
```

### Reading from HDF5

```{r hdf5-read}
# Load into memory
pe <- readPhysioHDF5("data.h5")

# Keep data on disk (HDF5-backed)
pe_lazy <- readPhysioHDF5("data.h5", on_disk = TRUE)

# Check if an object is HDF5-backed
isHDF5Backed(pe_lazy)
```

### Selective assay I/O

```{r hdf5-assay}
# Write a single assay to an existing HDF5 file
writeAssayHDF5(pe, "data.h5", assay_name = "filtered")

# Bring HDF5-backed data into memory
pe_mem <- realizeHDF5(pe_lazy)
```

**Reference:** The HDF Group (1997--2024). "Hierarchical Data Format, version 5."

## CSV / TSV

CSV is the most portable format and is useful for small- to medium-sized datasets or for interoperability with spreadsheet software and Python/pandas.

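
Signal CSVs usually come in one of two layouts: wide (one column per channel) or long (one row per sample and channel). The base R sketch below illustrates the difference without calling PhysioIO. The column names `time`, `channel`, and `value`, the sample values, and the 256 Hz rate are assumptions chosen for illustration, not a statement of what `readCSV()` requires.

```{r csv-layouts}
# Illustration only: base R, no PhysioIO functions involved.
sampling_rate <- 256  # Hz (assumed for this example)
n_samples <- 4

# Wide layout: one row per sample, one column per channel
wide <- data.frame(
  time = (seq_len(n_samples) - 1) / sampling_rate,
  Fp1 = c(1.0, 1.5, 0.5, -0.2),
  Fp2 = c(0.8, 1.1, 0.3, -0.4)
)

# Long (tidy) layout: one row per sample and channel
channels <- setdiff(names(wide), "time")
long <- data.frame(
  time = rep(wide$time, times = length(channels)),
  channel = rep(channels, each = nrow(wide)),
  value = unlist(wide[channels], use.names = FALSE)
)

# Round-trip the long table through a temporary CSV file
path <- tempfile(fileext = ".csv")
write.csv(long, path, row.names = FALSE)
roundtrip <- read.csv(path)
```

The wide table grows by one column per channel, while the long table grows by one block of rows per channel, which is why the long layout suits tidy-data tools and the wide layout suits spreadsheets.
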

### Reading CSV

```{r csv-read}
# Wide format: one column per channel
pe <- readCSV("signals.csv", time_col = "time", sampling_rate = 256)

# Without a time column (auto-generate from sampling rate)
pe <- readCSV("signals.csv", sampling_rate = 256)

# Long (tidy) format: channel, value columns
pe <- readCSV("signals_long.csv", format = "long", sampling_rate = 256)

# TSV variant
pe <- readCSV("signals.tsv", sep = "\t", sampling_rate = 256)
```

### Writing CSV

```{r csv-write}
# Wide format (default)
writeCSV(pe, "output.csv")

# Long / tidy format
writeCSV(pe, "output_long.csv", format = "long")

# Export a specific assay
writeCSV(pe, "filtered.csv", assay_name = "filtered")
```

### Events and electrode positions

```{r csv-events}
# Read events from CSV
events <- readEventsCSV("events.csv")

# Write events
writeEventsCSV(events, "events_out.csv")

# Read electrode positions
positions <- readElectrodePositionsCSV("electrodes.csv")
```

**Reference:** Wickham H (2014). "Tidy Data." *Journal of Statistical Software*, 59(10), 1--23.

## MATLAB .mat files

PhysioIO can read and write MATLAB `.mat` files via the `R.matlab` package. Auto-detection logic handles common EEG toolbox conventions (EEGLAB, FieldTrip).

### Reading .mat files

```{r mat-read}
# Auto-detect data variable
pe <- readMAT("eeg_data.mat")

# Specify variable names explicitly
pe <- readMAT("data.mat", data_var = "signal", sr_var = "fs")

# Read EEGLAB .set file
pe <- readMAT("EEG.set", data_var = "EEG")
```

### Writing .mat files

```{r mat-write}
writeMAT(pe, "output.mat")

# Custom variable name
writeMAT(pe, "output.mat", data_var = "EEG_data")
```

**Reference:** MathWorks (2024). "MAT-File Format." Technical documentation.

## Clinical metadata CSV

For rehabilitation and clinical research, PhysioIO provides utilities to load, validate, and harmonize clinical assessment metadata (e.g., FIM, Barthel Index scores) that accompany physiological recordings.

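
Conceptually, harmonizing clinical codes is a lookup from site-specific labels to standard ones. The base R sketch below shows that idea on a made-up data frame. It is not PhysioIO's implementation: the `assessments` table, its column names, the scores, and the rule of leaving unmapped labels unchanged are all assumptions for illustration.

```{r clinical-lookup-sketch}
# Illustration only: base R, hypothetical data, no PhysioIO functions.
assessments <- data.frame(
  subject_id = c("S01", "S01", "S02"),
  scale_name = c("fim_total", "Berg", "MMSE"),
  score = c(98, 45, 27),
  stringsAsFactors = FALSE
)

# Site-specific label -> standard code
mapping <- c(fim_total = "FIM", Berg = "BBS")

# Look up each label; labels without a mapping come back as NA
mapped <- unname(mapping[assessments$scale_name])

# Keep the original label wherever no mapping exists (assumed policy)
assessments$scale_std <- ifelse(is.na(mapped), assessments$scale_name, mapped)
```

Writing the result to a new column rather than overwriting `scale_name` keeps the site-specific label available for auditing.
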

### Reading and validating clinical metadata

```{r clinical-read}
# Read with column renaming and automatic validation
df <- readClinicalMetadataCSV(
  "assessments.csv",
  col_map = c(sid = "subject_id", vid = "visit_id")
)

# Validate an existing data frame
result <- validateClinicalMetadata(df)
result$valid
```

### Harmonizing clinical codes across sites

```{r clinical-map}
# Map site-specific scale names to standard codes
mapping <- c(fim_total = "FIM", Berg = "BBS")
df_std <- mapClinicalCodes(df, mapping, code_col = "scale_name")
```

**Reference:** Goldberger AL, et al. (2000). "PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals." *Circulation*, 101(23), e215--e220.

## RDS (native R serialization)

For quick save/restore within R, you can serialize a `PhysioExperiment` to an RDS file. This preserves all slots and metadata exactly.

```{r rds}
# Write
writePhysio(pe, "experiment.rds")

# Read
pe <- readPhysio("experiment.rds")
```

## Choosing a format

| Scenario | Recommended format |
|----------|--------------------|
| Long-term archival or sharing | EDF or HDF5 |
| Very large datasets (> 1 GB) | HDF5 (on-disk) |
| Interoperability with Python | CSV or HDF5 |
| Interoperability with MATLAB | .mat |
| BIDS-compliant data sharing | BIDS (EDF underneath) |
| Quick R-only save/restore | RDS |

## Session info

```{r sessioninfo}
sessionInfo()
```