---
title: "Guide to Supported File Formats"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Guide to Supported File Formats}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```
## Overview
PhysioIO provides read and write support for the file formats most commonly
used in physiological signal research. This vignette walks through each
format, explains when to use it, and demonstrates the basic API.

| Format | Read | Write | Package dependency |
|--------|:----:|:-----:|:------------------:|
| EDF/EDF+ | `readEDF()` | `writeEDF()` | (none -- built-in) |
| HDF5 | `readPhysioHDF5()` | `writePhysioHDF5()` | rhdf5, HDF5Array |
| CSV/TSV | `readCSV()` | `writeCSV()` | (none -- built-in) |
| MATLAB .mat | `readMAT()` | `writeMAT()` | R.matlab |
| BIDS | `readBIDS()` | `writeBIDS()` | (none -- built-in) |
| Clinical metadata CSV | `readClinicalMetadataCSV()` | -- | (none -- built-in) |
| RDS | `readPhysio()` | `writePhysio()` | (none -- built-in) |

All signal readers return a `PhysioExperiment` object, so downstream analysis
code is identical regardless of the input format. The exception is
`readClinicalMetadataCSV()`, which returns a data frame of assessment
metadata rather than signals.
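
Because the signal readers share a return type, a thin wrapper can choose a reader from the file extension. The sketch below is illustrative only: `pick_reader()` is a hypothetical helper, not part of PhysioIO, and the extension-to-reader mapping is an assumption based on the format table. It returns the reader's name instead of calling it, so it runs without the package installed.

```{r reader-dispatch-sketch}
# Hypothetical helper (not part of PhysioIO): map a file extension to the
# name of the reader listed in the format table.
pick_reader <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    edf = "readEDF",
    h5 = ,
    hdf5 = "readPhysioHDF5",
    csv = ,
    tsv = "readCSV",   # TSV input additionally needs sep = "\t"
    mat = ,
    set = "readMAT",
    rds = "readPhysio",
    stop("No reader known for extension: ", ext)
  )
}

pick_reader("recording.edf")
pick_reader("data.h5")
```

With PhysioIO attached, the returned name could be passed to `do.call()`, for example `do.call(pick_reader(path), list(path))`. Formats that need extra arguments, such as a sampling rate for CSV, would still have to be handled by the caller.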
## EDF / EDF+
European Data Format (EDF) is the *de facto* standard for polysomnography
and clinical EEG recordings. EDF+ extends the original format with
support for annotations and discontinuous recordings.
### Reading EDF files
```{r edf-read}
library(PhysioIO)
# Read an entire EDF file
pe <- readEDF("recording.edf")
# Read only a subset of channels
pe <- readEDF("recording.edf", channels = c("Fp1", "Fp2", "C3", "C4"))
# Read a specific time window (seconds)
pe <- readEDF("recording.edf", start_time = 10, end_time = 60)
```
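Before choosing channels or a time window, it can help to look at the file header. The sketch below reads the fixed 256-byte EDF header and the channel labels with base R only, following the field widths in the published EDF specification. `read_edf_header()` is a hypothetical helper and is not how `readEDF()` is necessarily implemented.

```{r edf-header-sketch}
# Sketch: read the fixed 256-byte EDF header plus the channel labels.
# read_edf_header() is a hypothetical helper, not a PhysioIO function.
read_edf_header <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  # Every EDF header field is fixed-width, space-padded ASCII
  field <- function(n) trimws(rawToChar(readBin(con, "raw", n = n)))

  version         <- field(8)
  patient         <- field(80)
  recording       <- field(80)
  start_date      <- field(8)              # dd.mm.yy
  start_time      <- field(8)              # hh.mm.ss
  header_bytes    <- as.integer(field(8))
  reserved        <- field(44)             # starts with "EDF+C"/"EDF+D" in EDF+
  n_records       <- as.integer(field(8))
  record_duration <- as.numeric(field(8))  # seconds per data record
  n_signals       <- as.integer(field(4))
  # The signal headers begin with one 16-byte label per channel
  labels <- vapply(seq_len(n_signals), function(i) field(16), character(1))

  list(
    version = version, patient = patient, recording = recording,
    start_date = start_date, start_time = start_time,
    header_bytes = header_bytes, reserved = reserved,
    n_records = n_records, record_duration = record_duration,
    n_signals = n_signals, labels = labels
  )
}

# hdr <- read_edf_header("recording.edf")
# hdr$labels                            # candidate values for `channels`
# hdr$n_records * hdr$record_duration   # total duration in seconds
```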
### Writing EDF files
```{r edf-write}
writeEDF(pe, "output.edf")
```
**Reference:** Kemp B, et al. (1992). "A simple format for exchange of
digitized polygraphic recordings." *Electroencephalography and Clinical
Neurophysiology*, 82(5), 391--393.
## HDF5
HDF5 is well suited to large datasets because it supports chunked,
compressed storage that can be read from disk piece by piece instead of
being loaded into memory in full. PhysioIO uses the Bioconductor
`rhdf5` and `HDF5Array` packages so that the data can remain on disk
while you operate on it.
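
To make chunking and partial reads concrete, the sketch below uses `rhdf5` directly instead of the PhysioIO wrappers. The dataset name `"signal"`, the chunk shape, and the simulated data are illustrative assumptions; the layout PhysioIO writes internally may differ.

```{r hdf5-chunk-sketch}
# Sketch of chunked, compressed storage with rhdf5 (not the PhysioIO API).
library(rhdf5)

h5_path <- tempfile(fileext = ".h5")
signal <- matrix(rnorm(8 * 10000), nrow = 8)  # 8 channels x 10,000 samples

h5createFile(h5_path)
h5createDataset(
  h5_path, "signal",
  dims = dim(signal),
  storage.mode = "double",
  chunk = c(8, 1000),  # each chunk holds all channels for 1,000 samples
  level = 6            # gzip compression level
)
h5write(signal, h5_path, "signal")

# Read only channels 1-2 and samples 1-100; the rest stays on disk
window <- h5read(h5_path, "signal", index = list(1:2, 1:100))
dim(window)
```

Choosing a chunk shape that matches the typical access pattern (here, short time windows across all channels) keeps partial reads from decompressing more data than they need.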
### Writing to HDF5
```{r hdf5-write}
# Save with default compression (level 6)
writePhysioHDF5(pe, "data.h5")
# Higher compression
writePhysioHDF5(pe, "data.h5", overwrite = TRUE, compression_level = 9)
```
### Reading from HDF5
```{r hdf5-read}
# Load into memory
pe <- readPhysioHDF5("data.h5")
# Keep data on disk (HDF5-backed)
pe_lazy <- readPhysioHDF5("data.h5", on_disk = TRUE)
# Check if an object is HDF5-backed
isHDF5Backed(pe_lazy)
```
### Selective assay I/O
```{r hdf5-assay}
# Write a single assay to an existing HDF5 file
writeAssayHDF5(pe, "data.h5", assay_name = "filtered")
# Bring HDF5-backed data into memory
pe_mem <- realizeHDF5(pe_lazy)
```
**Reference:** The HDF Group (1997--2024). "Hierarchical Data Format,
version 5."
## CSV / TSV
CSV is the most portable format and is useful for small- to
medium-sized datasets or for interoperability with spreadsheet software
and Python/pandas.
### Reading CSV
```{r csv-read}
# Wide format: one column per channel
pe <- readCSV("signals.csv", time_col = "time", sampling_rate = 256)
# Without a time column (auto-generate from sampling rate)
pe <- readCSV("signals.csv", sampling_rate = 256)
# Long (tidy) format: channel, value columns
pe <- readCSV("signals_long.csv", format = "long", sampling_rate = 256)
# TSV variant
pe <- readCSV("signals.tsv", sep = "\t", sampling_rate = 256)
```
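The wide and long layouts differ only in how channels are arranged. The base R sketch below builds a small wide table and reshapes it to long form. The column names `time`, `channel`, and `value` follow the comments in the chunk above, but the exact names `readCSV()` expects, and the choice to start the generated time axis at 0, are assumptions.

```{r csv-layout-sketch}
sampling_rate <- 256
n_samples <- 3

# Wide layout: one row per sample, one column per channel.
# The time axis is derived from the sampling rate, starting at 0 (assumed).
wide <- data.frame(
  time = (seq_len(n_samples) - 1) / sampling_rate,
  Fp1 = c(1.0, 1.5, 2.0),
  Fp2 = c(-0.5, 0.0, 0.5)
)

# Long (tidy) layout: one row per sample-channel pair
channels <- setdiff(names(wide), "time")
long <- data.frame(
  time = rep(wide$time, times = length(channels)),
  channel = rep(channels, each = nrow(wide)),
  value = unlist(wide[channels], use.names = FALSE)
)

nrow(wide)  # one row per sample
nrow(long)  # one row per sample-channel pair
```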
### Writing CSV
```{r csv-write}
# Wide format (default)
writeCSV(pe, "output.csv")
# Long / tidy format
writeCSV(pe, "output_long.csv", format = "long")
# Export a specific assay
writeCSV(pe, "filtered.csv", assay_name = "filtered")
```
### Events and electrode positions
```{r csv-events}
# Read events from CSV
events <- readEventsCSV("events.csv")
# Write events
writeEventsCSV(events, "events_out.csv")
# Read electrode positions
positions <- readElectrodePositionsCSV("electrodes.csv")
```
**Reference:** Wickham H (2014). "Tidy Data." *Journal of Statistical
Software*, 59(10), 1--23.
## MATLAB .mat files
PhysioIO can read and write MATLAB `.mat` files via the `R.matlab`
package. Auto-detection logic handles common EEG toolbox conventions
(EEGLAB, FieldTrip).
### Reading .mat files
```{r mat-read}
# Auto-detect data variable
pe <- readMAT("eeg_data.mat")
# Specify variable names explicitly
pe <- readMAT("data.mat", data_var = "signal", sr_var = "fs")
# Read EEGLAB .set file
pe <- readMAT("EEG.set", data_var = "EEG")
```
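As an illustration of what auto-detection could look like, the sketch below picks the variable holding the largest numeric matrix from a named list of the kind `R.matlab::readMat()` returns. This is a hypothetical heuristic, not PhysioIO's actual detection logic, and `guess_data_var()` is an invented name.

```{r mat-detect-sketch}
# Hypothetical heuristic (not PhysioIO's actual logic): choose the variable
# that holds the largest numeric matrix.
guess_data_var <- function(vars) {
  sizes <- vapply(vars, function(v) {
    if (is.numeric(v) && is.matrix(v)) as.numeric(length(v)) else 0
  }, numeric(1))
  if (all(sizes == 0)) stop("No numeric matrix found")
  names(vars)[which.max(sizes)]
}

# Stand-in for a loaded .mat file; scalars arrive as 1 x 1 matrices
vars <- list(
  fs = matrix(256),
  signal = matrix(0, nrow = 4, ncol = 1000),
  label = "demo"
)
guess_data_var(vars)
```

A size-based rule like this can pick the wrong variable when a file holds several large matrices, which is why passing `data_var` explicitly remains the safer option.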
### Writing .mat files
```{r mat-write}
writeMAT(pe, "output.mat")
# Custom variable name
writeMAT(pe, "output.mat", data_var = "EEG_data")
```
**Reference:** MathWorks (2024). "MAT-File Format." Technical
documentation.
## Clinical metadata CSV
For rehabilitation and clinical research, PhysioIO provides utilities to
load, validate, and harmonize clinical assessment metadata (e.g., FIM,
Barthel Index scores) that accompany physiological recordings.
### Reading and validating clinical metadata
```{r clinical-read}
# Read with column renaming and automatic validation
df <- readClinicalMetadataCSV(
  "assessments.csv",
  col_map = c(sid = "subject_id", vid = "visit_id")
)
# Validate an existing data frame
result <- validateClinicalMetadata(df)
result$valid
```
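The kind of checks a validator might run can be sketched in base R. Everything here is an assumption for illustration: `check_clinical()` is a hypothetical function, the `score` column and the `BI` code are invented names, and `validateClinicalMetadata()` may apply different rules. The ranges used are the commonly cited ones for the FIM total (18 to 126) and the original Barthel Index scoring (0 to 100).

```{r clinical-validate-sketch}
# Hypothetical checks; validateClinicalMetadata() may apply different rules.
check_clinical <- function(df,
                           required = c("subject_id", "visit_id",
                                        "scale_name", "score")) {
  problems <- character(0)

  # 1. Required columns must be present
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    problems <- c(problems, paste("missing column:", missing_cols))
  }

  # 2. Scores must fall inside the range of their scale (assumed ranges)
  ranges <- list(FIM = c(18, 126), BI = c(0, 100))
  if (all(c("scale_name", "score") %in% names(df))) {
    for (code in names(ranges)) {
      lo <- ranges[[code]][1]
      hi <- ranges[[code]][2]
      bad <- which(df$scale_name == code & (df$score < lo | df$score > hi))
      if (length(bad) > 0) {
        problems <- c(problems,
                      sprintf("%s score out of range in row %d", code, bad))
      }
    }
  }

  list(valid = length(problems) == 0, problems = problems)
}

demo <- data.frame(
  subject_id = c("s01", "s02"),
  visit_id = c("v1", "v1"),
  scale_name = c("FIM", "FIM"),
  score = c(90, 140)
)
check_clinical(demo)$problems
```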
### Harmonizing clinical codes across sites
```{r clinical-map}
# Map site-specific scale names to standard codes
mapping <- c(fim_total = "FIM", Berg = "BBS")
df_std <- mapClinicalCodes(df, mapping, code_col = "scale_name")
```
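Conceptually, this kind of harmonization is a named-vector lookup. The base R sketch below shows the idea on a plain character vector; how `mapClinicalCodes()` treats names that are missing from the mapping is not documented here, so passing them through unchanged is an assumption.

```{r clinical-map-sketch}
# Named-vector lookup in base R; mapClinicalCodes() may behave differently,
# for example in how it treats unmapped names.
mapping <- c(fim_total = "FIM", Berg = "BBS")
scale_name <- c("fim_total", "Berg", "MMSE")

idx <- match(scale_name, names(mapping))
# Mapped names get the standard code; unmapped names pass through unchanged
standard <- ifelse(is.na(idx), scale_name, unname(mapping)[idx])
standard
```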
**Reference:** Goldberger AL, et al. (2000). "PhysioBank, PhysioToolkit,
and PhysioNet: components of a new research resource for complex
physiologic signals." *Circulation*, 101(23), e215--e220.
## RDS (native R serialization)
For quick save/restore within R, you can serialize a `PhysioExperiment`
to an RDS file. This preserves all slots and metadata exactly.
```{r rds}
# Write
writePhysio(pe, "experiment.rds")
# Read
pe <- readPhysio("experiment.rds")
```
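The exact round trip is a property of R's native serialization. The sketch below demonstrates it with base R's `saveRDS()` and `readRDS()` on a plain list standing in for a `PhysioExperiment`. Whether `writePhysio()` wraps these functions directly is an assumption.

```{r rds-roundtrip-sketch}
# Base R round trip with a plain list standing in for a PhysioExperiment
obj <- list(
  data = matrix(c(0.1, 0.2, 0.3, 0.4), nrow = 2),
  sampling_rate = 256,
  channels = c("Fp1", "Fp2")
)

rds_path <- tempfile(fileext = ".rds")
saveRDS(obj, rds_path)
restored <- readRDS(rds_path)

# TRUE when values, names, and attributes all survive the round trip
identical(obj, restored)
```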
## Choosing a format
| Scenario | Recommended format |
|----------|--------------------|
| Long-term archival or sharing | EDF or HDF5 |
| Very large datasets (> 1 GB) | HDF5 (on-disk) |
| Interoperability with Python | CSV or HDF5 |
| Interoperability with MATLAB | .mat |
| BIDS-compliant data sharing | BIDS (EDF underneath) |
| Quick R-only save/restore | RDS |
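
To judge whether a recording falls into the "very large" row, a rough size estimate is usually enough. The sketch below assumes samples are held as 8-byte doubles and ignores metadata and compression; `estimate_gib()` is a hypothetical helper, and it reports GiB (1024^3 bytes), which is slightly smaller than the decimal GB used in the table.

```{r size-estimate-sketch}
# Rough in-memory size of a recording stored as double precision
estimate_gib <- function(n_channels, sampling_rate, duration_sec,
                         bytes_per_sample = 8) {
  n_channels * sampling_rate * duration_sec * bytes_per_sample / 1024^3
}

# 64 channels at 256 Hz for 8 hours
estimate_gib(64, 256, 8 * 3600)
#> [1] 3.515625
```

An overnight recording like this one is well past the 1 GB mark, so on-disk HDF5 would be the natural choice.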
## Session info
```{r sessioninfo}
sessionInfo()
```