Ensuring fMRI Preprocessing Pipeline Reliability: A Guide for Reproducible Research and Clinical Translation

Nathan Hughes · Nov 26, 2025

Abstract

Functional magnetic resonance imaging (fMRI) is a cornerstone of modern neuroscience and is increasingly used to inform drug development and clinical diagnostics. However, the validity of its findings hinges on the reliability of the preprocessing pipeline, which cleans and standardizes the complex BOLD signal. This article provides a comprehensive guide for researchers and drug development professionals on establishing robust fMRI preprocessing workflows. We explore the foundational steps and common pitfalls, evaluate current methodologies from standardized tools to emerging foundation models, and present optimization strategies for specific clinical populations. Finally, we outline a framework for the quantitative validation and comparative assessment of pipelines, emphasizing metrics that enhance reproducibility and ensure findings are both statistically sound and biologically meaningful.

The Foundation of Reliable fMRI: Understanding Core Preprocessing Steps and Their Impact on Data Integrity

Functional magnetic resonance imaging (fMRI) has become a cornerstone technique for studying brain function in both basic neuroscience and clinical applications. However, the path from raw fMRI data to a scientifically valid and clinically actionable inference is fraught with methodological challenges. The reliability of the entire analytical process is fundamentally constrained by the first and most critical stage: data preprocessing. Variations in preprocessing methodologies across different software toolboxes and research groups have been identified as a major source of analytical variability, undermining the reproducibility of neuroimaging findings [1]. This application note examines the intrinsic link between preprocessing pipeline reliability and reproducible inference, framing the discussion within the context of a broader thesis on fMRI preprocessing pipeline reliability research. We explore the requirements, methodologies, and practical implementations for achieving standardized, robust preprocessing workflows that can support both scientific discovery and clinical decision-making.

The Standardization Imperative: From Analytical Variability to Reproducible Workflows

The Challenge of Analytical Variability

The neuroimaging field has produced a diverse ecosystem of software tools (e.g., AFNI, FreeSurfer, FSL, SPM) with varying implementations of common processing steps [1]. This methodological richness, while beneficial for knowledge formalization and accessibility, has gradually revealed a significant drawback: methodological variability has become an obstacle to obtaining reliable results and interpretations [1]. The Neuroimaging Analysis Replication and Prediction Study (NARPS) starkly illustrated this problem when 70 teams of fMRI experts analyzed the same dataset to test identical hypotheses [1]. The results demonstrated poor agreement in conclusions across teams, with methodological variability identified as the core source of divergent results [1].

Standardization as a Solution

Within classical test theory, standardization emerges as a powerful approach to enhance measurement reliability by reducing sources of variability relating to the measurement instrumentation [1]. For fMRI preprocessing, this involves strictly predetermining all experimental choices and establishing unique workflows. Standardized preprocessing offers numerous benefits for enhancing reliability and reproducibility:

  • Reduced analytical variability: By limiting the domain of coexisting analytical alternatives, standardization constrains the multiverse of possible pipelines that must be traversed in mapping result variability [1].
  • Improved comparability: Standardized outputs facilitate comparison between studies conducted at different sites, by different sponsors, and with different molecules [2].
  • Enhanced quality control: Consistent workflows enable systematic quality assessment and validation across diverse datasets [3].

However, standardization is not without trade-offs. A reliable measure is not necessarily "valid," and standardization may enforce specific assumptions about the data that introduce biases [1]. The challenge lies in developing standardized approaches that maintain robustness across data diversity while preserving flexibility for legitimate methodological variations required by specific research questions.

Quantitative Evidence: Measuring the Impact of Pipeline Reliability

Reproducibility Metrics for fMRI Pipelines

The evaluation of preprocessing pipeline performance requires robust metrics that capture different aspects of reliability. The NPAIRS split-half resampling framework provides prediction and reproducibility metrics that enable empirical optimization of pipeline components [4]. Studies utilizing this approach have demonstrated that both prediction and reproducibility metrics are required for pipeline optimization and often yield somewhat different results, highlighting the multi-faceted nature of pipeline reliability [4].

Single-Subject Reproducibility in Clinical Context

For clinical applications, single-subject reproducibility is particularly critical, as clinicians focus on individual patients rather than group averages. Established test-retest reliability guidelines based on intra-class correlation (ICC) interpret values below 0.40 as poor, 0.40–0.59 as fair, 0.60–0.74 as good, and above 0.75 as excellent [5]. For scientific purposes, a fair test-retest reliability of at least 0.40 is suggested, while an excellent correlation of at least 0.75 is required for clinical applications [5].

Table 1: Single-Subject Reproducibility Improvements with Optimized Filtering

| Pipeline Type | Time Course Reproducibility (r) | Connectivity Correlation (r) | Clinical Applicability |
|---|---|---|---|
| Conventional SPM pipeline | 0.26 | 0.44 | Not suitable (poor) |
| Data-driven SG filter framework | 0.41 | 0.54 | Potential (fair) |
| Improvement | +57.7% | +22.7% | Poor → fair |

Data derived from [5] demonstrates that conventional preprocessing pipelines yield single-subject time course reproducibility of only r = 0.26, which is far below the threshold for clinical utility [5]. However, implementing a data-driven Savitzky-Golay (SG) filter framework can improve average reproducibility correlation to r = 0.41, representing a 57.7% enhancement that brings single-subject reproducibility to a "fair" level according to established guidelines [5]. This improvement is substantial but also highlights the significant gap that remains before fMRI reaches the "excellent" reliability (ICC > 0.75) required for routine clinical use [5].
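To make these cut-offs concrete, below is a minimal NumPy sketch of the two-way random-effects, single-measure ICC(2,1) commonly used for test-retest reliability. The choice of ICC(2,1), the function name, and the toy data are illustrative assumptions, not code from [5].

```python
import numpy as np

def icc_2_1(Y):
    """ICC(2,1): two-way random-effects, absolute-agreement, single-measure ICC.
    Y is an (n_subjects, k_sessions) array of one scalar fMRI measure per session."""
    n, k = Y.shape
    grand = Y.mean()
    ms_subj = k * ((Y.mean(axis=1) - grand) ** 2).sum() / (n - 1)   # between-subject
    ms_sess = n * ((Y.mean(axis=0) - grand) ** 2).sum() / (k - 1)   # between-session
    resid = Y - Y.mean(axis=1, keepdims=True) - Y.mean(axis=0, keepdims=True) + grand
    ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_subj - ms_err) / (ms_subj + (k - 1) * ms_err + k * (ms_sess - ms_err) / n)

# Toy usage: 20 subjects scanned twice; compare the result to the
# 0.40 (fair) / 0.60 (good) / 0.75 (excellent) guidelines above.
rng = np.random.default_rng(1)
subject_effect = rng.normal(size=(20, 1))
Y = subject_effect + 0.8 * rng.normal(size=(20, 2))
print(round(icc_2_1(Y), 2))
```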

Foundation Models and Reproducibility

Recent advances in foundation models for fMRI analysis offer promising approaches to enhance reproducibility through large-scale pre-training. The NeuroSTORM model, pre-trained on 28.65 million fMRI frames from over 50,000 subjects, demonstrates how standardized representation learning can achieve consistent performance across diverse downstream tasks including age and gender prediction, phenotype prediction, and disease diagnosis [6]. By learning generalizable representations directly from 4D fMRI volumes, such models reduce sensitivity to acquisition variations and mitigate variability introduced by preprocessing pipelines [6].

Protocols for Pipeline Optimization and Validation

Protocol 1: Savitzky-Golay Filter Optimization for Enhanced Single-Subject Reproducibility

Background: Single-subject fMRI time course reproducibility is critical for clinical applications but remains limited in conventional pipelines. This protocol outlines a method for optimizing Savitzky-Golay (SG) filter parameters to enhance reproducibility [5].

Materials:

  • fMRI data from a working memory task (e.g., Sternberg tasks)
  • FreeSurfer/FsFast pipelines for preprocessing
  • MATLAB with the custom CleanBrain code (https://github.com/hinata2305/CleanBrain)

Procedure:

  • Data Acquisition and Preprocessing:
    • Acquire fMRI data using a block-designed working memory task (e.g., verbal and spatial Sternberg tasks)
    • Perform initial preprocessing using FreeSurfer's recon-all pipeline for segmentation
    • Co-register 3D gray matter time courses to 2D FS_average space using FsFast spherical alignment
    • Extract time courses from 34 working memory-related ROIs based on meta-analysis
  • Empirical Predictor Time Course Generation:

    • Calculate subject-specific hemodynamic response function (HRF) by averaging task-related signal changes in the time course
    • Use this empirical HRF rather than a canonical HRF to account for individual variations in BOLD expression
  • Parameter Optimization:

    • Define optimization targets: maximize correlation between filtered time course and empirical predictor while maintaining autocorrelations within predefined limits
    • Use brute-force algorithms to test different combinations of SG filter parameters (window size, polynomial order)
    • Set autocorrelation limits based on the empirical predictor time course
  • Validation:

    • Apply optimized SG filters to a distinct cognitive experiment
    • Assess improvement in test-retest reliability of individual subject time courses
    • Evaluate enhancement in detectable connectivity
    • Verify that residual noise time courses do not contain task-related frequency bands

Expected Outcomes: Implementation of this protocol typically improves average time course reproducibility from r = 0.26 to r = 0.41 and connectivity correlation from r = 0.44 to r = 0.54 [5].
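The brute-force search in the parameter optimization step can be sketched with SciPy's Savitzky-Golay filter. The objective shown (maximize correlation with the empirical predictor subject to a lag-1 autocorrelation limit) follows the protocol's description; the function names, parameter grids, and the exact autocorrelation statistic are illustrative assumptions rather than the CleanBrain implementation.

```python
import numpy as np
from scipy.signal import savgol_filter

def lag1_autocorr(x):
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

def optimize_sg(ts, predictor, windows=range(5, 61, 2), orders=range(2, 6),
                ac_limit=0.9):
    """Brute-force search over SG window size and polynomial order: maximize the
    correlation of the filtered time course with the empirical predictor while
    keeping lag-1 autocorrelation within a predefined limit."""
    best_params, best_r = None, -np.inf
    for w in windows:                       # odd windows, shorter than the series
        for p in orders:
            if p >= w:
                continue
            filtered = savgol_filter(ts, window_length=w, polyorder=p)
            if lag1_autocorr(filtered) > ac_limit:
                continue                    # violates the autocorrelation constraint
            r = np.corrcoef(filtered, predictor)[0, 1]
            if r > best_r:
                best_r, best_params = r, (w, p)
    return best_params, best_r
```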

Protocol 2: Foundation Model Pre-training for Generalizable fMRI Analysis

Background: Foundation models represent a paradigm-shifting approach for enhancing reproducibility through large-scale pre-training and adaptable architectures. This protocol outlines the pre-training procedure for the NeuroSTORM foundation model [6].

Materials:

  • Large-scale fMRI datasets (UK Biobank, ABCD, HCP-YA, HCP-A, HCP-D)
  • Computational resources capable of handling 4D fMRI volumes (up to 10^6 voxels per scan)
  • PyTorch framework with custom NeuroSTORM implementation (github.com/CUHK-AIM-Group/NeuroSTORM)

Procedure:

  • Data Curation:
    • Assemble a diverse corpus integrating multiple large-scale neuroimaging datasets
    • Ensure data spans diverse demographics (ages 5-100), clinical conditions, and acquisition protocols
    • Preprocess all data following BIDS standards and consistent preprocessing protocols
  • Model Architecture Implementation:

    • Implement a Shifted-Window Mamba (SWM) backbone combining linear-time state-space modeling with shifted-window mechanisms
    • Design architecture to efficiently process 4D fMRI volumes while reducing computational complexity and GPU memory usage
  • Pre-training Strategy:

    • Apply Spatiotemporal Redundancy Dropout (STRD) module during pre-training to learn inherent characteristics in fMRI data
    • Use self-supervised learning objectives to capture noise-resilient patterns
    • Employ multi-task learning across different fMRI paradigms (resting-state, task-based)
  • Downstream Adaptation:

    • Implement Task-specific Prompt Tuning (TPT) strategy for fine-tuning on specific applications
    • Use minimal trainable, task-specific parameters when adapting to new tasks
    • Validate across multiple downstream tasks: age/gender prediction, phenotype prediction, disease diagnosis, fMRI retrieval, tfMRI state classification

Expected Outcomes: A general-purpose fMRI foundation model that achieves state-of-the-art performance across diverse tasks, with enhanced reproducibility and transferability across populations and acquisition protocols [6].
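The full SWM backbone is beyond a short example, but the following PyTorch sketch illustrates the general shape of masked self-supervised pre-training on fMRI frames (step 3): random voxel dropout stands in for the STRD module, and a tiny 3D convolutional autoencoder stands in for the NeuroSTORM architecture. Everything here is a schematic assumption, not the published model.

```python
import torch
import torch.nn as nn

# Toy stand-in for the SWM backbone: a small 3D convolutional autoencoder over
# single fMRI frames. Random voxel dropout mimics the spirit of STRD; the real
# model, data loading, and objectives are far more elaborate.
class TinyEncoderDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv3d(1, 8, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv3d(8, 16, 3, stride=2, padding=1), nn.GELU())
        self.dec = nn.Sequential(
            nn.ConvTranspose3d(16, 8, 4, stride=2, padding=1), nn.GELU(),
            nn.ConvTranspose3d(8, 1, 4, stride=2, padding=1))

    def forward(self, x):
        return self.dec(self.enc(x))

model = TinyEncoderDecoder()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
frames = torch.randn(4, 1, 32, 32, 32)  # stand-in batch: (frames, channel, X, Y, Z)

for step in range(10):  # toy pre-training loop
    keep = (torch.rand_like(frames) > 0.5).float()      # random voxel dropout mask
    recon = model(frames * keep)
    loss = ((recon - frames) ** 2 * (1 - keep)).mean()  # reconstruct dropped voxels
    opt.zero_grad()
    loss.backward()
    opt.step()
```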

Table 2: Key Research Reagents and Computational Resources for fMRI Pipeline Development

| Resource | Type | Function | Access |
|---|---|---|---|
| fMRIflows | Software pipeline | Fully automatic neuroimaging pipelines for fMRI analysis, performing standardized preprocessing and 1st- and 2nd-level univariate and multivariate analyses | https://github.com/miykael/fmriflows [3] |
| NeuroSTORM | Foundation model | General-purpose fMRI analysis through large-scale pre-training, enabling enhanced reproducibility and transferability | https://github.com/CUHK-AIM-Group/NeuroSTORM [6] |
| BIDS | Data standard | Consistent framework for structuring data directories, naming conventions, and metadata specifications to maximize shareability | https://bids.neuroimaging.io/ [1] |
| fMRIPrep | Software pipeline | Robust automated fMRI preprocessing pipeline with BIDS compliance, generating quality control measures | https://fmriprep.org/ [7] [1] |
| CleanBrain | MATLAB package | Implementation of the data-driven SG filter framework for enhancing single-subject time course reproducibility | https://github.com/hinata2305/CleanBrain [5] |
| OpenNeuro | Data repository | Platform for sharing BIDS-formatted neuroimaging data, enabling testing of robustness across hundreds of datasets | https://openneuro.org/ [3] [1] |

Visualizing the Preprocessing Reliability Framework

Relationship Between Pipeline Standardization and Inference Reliability

[Diagram: raw fMRI data enters the preprocessing pipeline and follows one of two paths. A standardized pipeline, supported by BIDS compliance, quality control metrics, containerization (Docker/Singularity), and continuous integration, yields reliable inference and reproducible results; a non-standardized pipeline introduces analytical variability, yielding unreliable inference and irreproducible results.]

fMRI Pipeline Optimization and Validation Workflow

[Diagram: an implemented pipeline undergoes performance validation through the NPAIRS framework (prediction/reproducibility, fed by classification accuracy), ICC-based test-retest assessment (fed by time course reproducibility r, connectivity correlation r, and ICC values), multi-site validation, and foundation model pre-training. These validations inform single-subject reliability, group-level reproducibility, and clinical applicability, which, together with filter parameter optimization and BIDS-derivatives output, converge on the optimized pipeline.]

Application in Drug Development and Clinical Translation

fMRI in the Drug Development Pipeline

The reliability of fMRI preprocessing pipelines has particular significance in drug development, where functional neuroimaging has potential applications across multiple phases:

  • Phase 0/1: Detecting functional CNS effects of pharmacological treatment in appropriate brain regions, providing indirect evidence of target engagement [2].
  • Phase 2/3: Demonstrating normalization of disease-related fMRI signals at one or few dose levels, providing objective demonstration of disease modification [2].

For regulatory acceptance, fMRI readouts must be both reproducible and modifiable by pharmacological agents [2]. The high burden of proof for biomarker qualification requires rigorous characterization of precision and reproducibility, which directly depends on preprocessing pipeline reliability [2]. Currently, no fMRI biomarkers have been fully qualified by regulatory agencies, though initiatives like the European Autism Interventions project have requested qualification of fMRI biomarkers for stratifying autism spectrum disorder populations [2].

Clinical Applicability and Current Limitations

Despite technical advances, the clinical applicability of fMRI remains constrained by reliability limitations. A recent study concluded that roughly 10-30% of the population may benefit from optimized fMRI pipelines in a clinical setting, while this number was negligible for conventional pipelines [5]. This highlights both the potential value of pipeline optimization and the substantial work still required to make fMRI clinically viable for broader populations.

For presurgical mapping, a meta-analysis demonstrated that conducting fMRI mapping prior to surgical procedures reduces the likelihood of functional deterioration afterward (odds ratio: 0.25; 95% CI: 0.12, 0.53; P < .001) [5]. This evidence supports the clinical value of fMRI when properly implemented, underscoring the importance of reliable preprocessing pipelines for generating clinically actionable results.

The critical link between preprocessing pipeline reliability and reproducible inference in fMRI analysis cannot be overstated. As the field moves toward more clinical applications and larger-scale studies, standardization efforts through initiatives like BIDS, NiPreps, and foundation models offer promising pathways to enhanced reproducibility [6] [1]. The quantitative evidence presented demonstrates that methodical optimization of preprocessing components can substantially improve single-subject reproducibility, though significant gaps remain before fMRI reaches the reliability standards required for routine clinical use [5].

Future developments in fMRI pipeline reliability research should focus on several key areas: (1) enhanced computational efficiency to enable more sophisticated processing on large-scale datasets; (2) improved adaptability across diverse populations, from infancy to old age [7]; (3) more rigorous validation metrics that capture real-world clinical utility; and (4) greater integration with artificial intelligence approaches that can learn robust representations from large, multi-site datasets [6]. By addressing these challenges through collaborative, open-source development of standardized preprocessing tools, the neuroimaging community can strengthen the foundation upon which reproducible scientific inference and clinical decision-making are built.

A Step-by-Step Deconstruction of the Standard Preprocessing Workflow

Functional Magnetic Resonance Imaging (fMRI) has revolutionized our ability to non-invasively study brain function and connectivity. The preprocessing of raw fMRI data constitutes an essential foundation for all subsequent neurological and clinical inferences, as it transforms noisy, artifact-laden raw signals into standardized, analyzable data. The inherent complexity of fMRI data, which captures spontaneous blood oxygen-level dependent (BOLD) signals alongside numerous non-neuronal contributions, necessitates a rigorous preprocessing workflow to ensure valid scientific conclusions [8]. Within the broader context of fMRI preprocessing pipeline reliability research, this protocol deconstructs the standard workflow, emphasizing how each step contributes to the enhancement of data quality, reproducibility, and ultimately, the validity of neuroscientific and clinical findings. The establishment of robust, standardized protocols is particularly crucial for multi-site studies and clinical applications, such as drug development, where consistent measurement across time and location is paramount for detecting subtle treatment effects [9] [10].

The neuroimaging field has developed several sophisticated software packages to address the challenges of fMRI preprocessing. While implementations differ, they converge on a common set of objectives: removing unwanted artifacts, correcting for anatomical and acquisition-based distortions, and transforming data into a standard coordinate system for group-level analysis. The following diagram illustrates the logical sequence of a standard, volume-based preprocessing workflow, from raw data input to a preprocessed output ready for statistical analysis.

[Workflow diagram: Raw fMRI Data → Slice-Timing Correction → Realignment & Motion Correction → Susceptibility Distortion Correction (optional fieldmap data input) → Structural Registration & Segmentation (T1-weighted structural data input) → Spatial Normalization (MNI) → Spatial Smoothing → Preprocessed fMRI Data.]

Figure 1: A standard volume-based fMRI preprocessing workflow. The yellow start node indicates raw data input, green nodes represent core preprocessing steps, red ellipses indicate optional or conditional data inputs, and the blue end node signifies the final preprocessed data ready for analysis.

Several major software pipelines implement this standard workflow, each with distinct strengths, methodological approaches, and suitability for different research contexts. The table below provides a structured comparison of these widely-used tools.

Table 1: Comparative Analysis of Major fMRI Preprocessing Pipelines

| Pipeline Name | Core Methodology | Primary Output Space | Key Advantages | Typical Use Cases |
|---|---|---|---|---|
| fMRIPrep [11] | Analysis-agnostic, robust integration of best-in-breed tools (ANTs, FSL, FreeSurfer) | Volume & surface | High reproducibility, minimal manual intervention, less uncontrolled spatial smoothness | Diverse fMRI data; large-scale, reproducible studies |
| CONN default pipeline [12] | SPM12-based with realignment, unwarp, slice-timing correction, direct normalization | Volume | User-friendly GUI, integrated denoising and connectivity analysis | Volume-based functional connectivity studies |
| FuNP [8] | Fusion of AFNI, FSL, FreeSurfer, and Workbench components | Volume & surface | Incorporates recent methodological developments, user-friendly GUI | Studies requiring combined volume/surface analysis |
| DeepPrep [13] | Replaces time-consuming steps (e.g., registration) with deep learning models | Volume & surface | Dramatically reduced computation time (minutes vs. hours) | Rapid processing of large datasets; studies leveraging AI |
| HALFpipe [14] | Semi-automated pipeline based on fMRIPrep, designed for distributed analysis | Volume & surface | Standardized for the ENIGMA consortium; enables meta-analyses without raw data sharing | Large-scale, multi-site consortium studies |

Detailed Protocol: Step-by-Step Deconstruction

Functional Realignment and Motion Correction

Purpose: To correct for head motion during the scanning session, which is a major source of artifact and spurious correlations in functional connectivity MRI networks [11].

Detailed Methodology: The functional time-series is realigned using a rigid-body registration where all scans are coregistered and resampled to a reference image (typically the first scan of the first session) using b-spline interpolation [12]. As part of this step, the realign & unwarp procedure in SPM12 also estimates the derivatives of the deformation field with respect to head movement. This addresses susceptibility-distortion-by-motion interactions, a key factor in improving data quality. When a double-echo sequence is available, the field inhomogeneity (fieldmap) inside the scanner is estimated and used for Susceptibility Distortion Correction (SDC), resampling the functional data along the phase-encoded direction to correct absolute deformation [12].

Outputs: The primary outputs are the realigned functional images, a new reference image (the average across all scans after realignment), and estimated motion parameters. These motion parameters (typically a .txt file with rp_ prefix) are critical as they are used for outlier identification in subsequent steps and as nuisance regressors during denoising [12].

Slice-Timing Correction

Purpose: To correct for the temporal misalignment between different slices introduced by the sequential nature of the fMRI acquisition protocol.

Detailed Methodology: Slice-timing correction (STC) is performed using sinc-interpolation to time-shift and resample the signal from each slice to match the time of a single reference slice (usually the middle of the acquisition time, TA). The specific slice acquisition order (ascending, interleaved, etc.) must be specified by the user or read automatically from the sidecar .json file in a BIDS-formatted dataset [12].

Outputs: The STC-corrected functional data, typically stored with an 'a' filename prefix in SPM-based pipelines [12].

Outlier Identification ("Scrubbing")

Purpose: To identify and flag individual volume acquisitions (scans) that are contaminated by excessive motion or abrupt global signal changes.

Detailed Methodology: Potential outlier scans are identified using framewise displacement (FD) and global BOLD signal changes. A common threshold, as implemented in CONN, flags acquisitions with FD above 0.9mm or global BOLD signal changes above 5 standard deviations [12]. Framewise displacement is computed by estimating the largest displacement among six control points placed at the center of a bounding box around the brain. These flagged scans are not immediately removed but are later used to create a "scrubbing" regressor for denoising, or the volumes can be outright removed from analysis.

Outputs: A list of potential outliers (imported as a 'scrubbing' first-level covariate) and a file containing scan-to-scan global BOLD change and head-motion measures for quality control (QC) [12].
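For reference, the following NumPy sketch computes the widely used Power-style framewise displacement from an SPM realignment-parameter file. Note that CONN's default FD is computed from bounding-box control points as described above, so this is a named alternative, not CONN's internal formula; file names and thresholds are illustrative.

```python
import numpy as np

def framewise_displacement(rp, radius=50.0):
    """Power-style FD from a (T, 6) realignment-parameter array: 3 translations
    in mm and 3 rotations in radians, the latter converted to arc length on a
    sphere of `radius` mm before summing absolute frame-to-frame differences."""
    motion = rp.copy()
    motion[:, 3:] *= radius
    diffs = np.abs(np.diff(motion, axis=0))
    return np.concatenate([[0.0], diffs.sum(axis=1)])

# Usage with SPM's rp_*.txt motion file (translations first, rotations second):
# fd = framewise_displacement(np.loadtxt("rp_sub-01_task-rest_bold.txt"))
# outliers = np.flatnonzero(fd > 0.9)   # CONN-style conservative threshold
```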

Coregistration and Spatial Normalization

Purpose: To align the functional data with the subject's high-resolution anatomical image and subsequently warp both into a standard stereotaxic space (e.g., MNI) to enable group-level analysis.

Detailed Methodology: This is typically a two-step process.

  • Coregistration: A rigid registration aligns the mean functional image (after realignment) to the subject's T1-weighted anatomical image. This optimizes the alignment based on the boundary between gray and white matter [13].
  • Spatial Normalization: The functional and anatomical data are normalized into standard MNI space. In a Direct approach (e.g., CONN's default), unified segmentation and normalization are applied separately to the functional and structural data. In an Indirect approach, often considered higher quality, the non-linear transformation is estimated from the high-resolution structural data and then applied to the functional data [12]. Deep learning pipelines like DeepPrep use tools like SynthMorph to achieve this non-linear volumetric registration in minutes instead of hours [13].

Outputs: MNI-space functional and anatomical data, and tissue class masks (Grey Matter, White Matter, CSF) which are used to create masks for extracting signals and for denoising [12].

Spatial Smoothing

Purpose: To increase the BOLD signal-to-noise ratio, suppress high-frequency noise, and accommodate residual anatomical variability across subjects.

Detailed Methodology: The normalized functional data is spatially convolved with a 3D Gaussian kernel. The full width at half maximum (FWHM) of the kernel is a key parameter; a common default is 8mm FWHM for volume-based analyses [12]. Surface-based pipelines perform smoothing along the cortical surface manifold rather than in 3D volume space.

Outputs: The final preprocessed, smoothed functional data, typically stored with an 's' filename prefix, ready for statistical analysis and denoising [12].
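The FWHM-to-sigma conversion underlying this step is simple to express in code. A minimal sketch using SciPy, with the voxel size and function name as illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_volume(vol, fwhm_mm=8.0, voxel_size_mm=(2.0, 2.0, 2.0)):
    """Volume-based Gaussian smoothing: convert FWHM (mm) to the kernel
    sigma via sigma = FWHM / (2 * sqrt(2 * ln 2)), expressed in voxels."""
    sigma_mm = fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    sigma_vox = [sigma_mm / v for v in voxel_size_mm]
    return gaussian_filter(vol, sigma=sigma_vox)
```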

The Scientist's Toolkit: Essential Research Reagents

A successful preprocessing experiment relies on a suite of software tools and data resources. The following table details the key "research reagents" required for implementing a standard fMRI preprocessing workflow.

Table 2: Essential Materials and Software Tools for fMRI Preprocessing

| Item Name | Function/Purpose | Specifications & Alternatives |
|---|---|---|
| fMRIPrep [11] [15] | Robust, analysis-agnostic pipeline for preprocessing diverse fMRI data; ensures reproducibility and minimizes manual intervention | Version 23.1.0+. Alternatives: CONN Toolbox, SPM |
| Reference atlas [12] | Standard brain template for spatial normalization, enabling cross-subject and cross-study comparison | MNI152 (ICBM 2009b Nonlinear Symmetric). Alternatives: Colin27; FsAverage for surface-based analysis |
| Tissue probability maps (TPMs) [12] | Prior maps of gray matter, white matter, and CSF used to guide segmentation of structural and functional images | Default TPMs from SPM12 or FSL |
| Fieldmap data [12] | Optional but recommended data to estimate and correct susceptibility-induced distortions (geometric distortion and signal loss) | Requires a specific sequence: double-echo (magnitude and phase-difference images) or a pre-computed fieldmap in Hz |
| Quality control metrics [11] [8] | Quantitative measures to assess preprocessing success and identify potential data quality issues | Framewise displacement, global signal change, image quality metrics (IQMs) from MRIQC |

Reliability Considerations and Protocol Validation

The reliability of fMRI preprocessing pipelines is not merely a technical concern but a fundamental prerequisite for clinical translation. Poor test-retest reliability, often quantified by low intraclass correlation coefficients (ICCs), can undermine the detection of true biological effects, including those induced by therapeutic interventions [10]. Several strategies can optimize reliability:

  • Pipeline Robustness: Using standardized, automated pipelines like fMRIPrep reduces uncontrolled variability introduced by ad-hoc workflows and manual intervention, directly enhancing reproducibility [11].
  • Handling Special Populations: Preprocessing must be adapted for specific clinical populations. For example, in stroke patients, a lesion-specific pipeline that accounts for brain lesions when computing tissue masks and incorporates ICA to address lesion-driven artifacts has been shown to significantly reduce spurious connectivity [16].
  • Cross-Species Standardization: The principles of standardization extend to animal models. Protocols for rodent fMRI that are generalizable across laboratories and scanner platforms facilitate the acquisition of large, comparable datasets essential for uncovering often small effects, thereby improving the reliability of preclinical findings [9].

The validation of any preprocessing protocol should include a quantitative quality control step. This involves calculating image quality metrics (IQMs) such as framewise displacement, signal-to-noise ratio (SNR), and contrast-to-noise ratio (CNR), and comparing resting-state networks (RSNs) obtained with the pipeline against pre-defined canonical networks to ensure biological validity [8]. By rigorously deconstructing and standardizing each step of the preprocessing workflow, researchers can significantly enhance the reliability of their fMRI data, paving the way for more robust and reproducible discoveries in basic neuroscience and clinical drug development.

Functional magnetic resonance imaging (fMRI) has become an indispensable tool for non-invasively investigating human brain function and functional connectivity [17]. However, the blood-oxygen-level-dependent (BOLD) signal measured in fMRI is inherently characterized by a poor signal-to-noise ratio (SNR), presenting a major barrier to its spatiotemporal resolution, utility, and ultimate impact [17]. The BOLD signal fluctuations related to neuronal activity are subtle, often representing only 1–2% of the total signal change under optimal conditions, and are dwarfed by various noise sources [18]. Effectively identifying and mitigating these artifacts is therefore a prerequisite for any reliable fMRI preprocessing pipeline. The major sources of noise can be categorized into three primary types: motion artifacts, physiological noise, and artifacts from magnetic field inhomogeneities. This application note details the characteristics of these noise sources, provides quantitative assessments of their impact, and outlines structured protocols for their mitigation to enhance the reliability of fMRI data in research and clinical applications.

Motion Artifacts

Characterization and Impact

Head motion during an fMRI scan is a major confound, causing disruptions in the BOLD signal through several mechanisms. It changes the tissue composition within a voxel, distorts the local magnetic field, and disrupts the steady-state magnetization recovery of the spins in the slices that have moved [19]. These effects lead to signal dropouts and artifactual amplitude changes that can dwarf true neuronal signals [18]. Crucially, motion artifacts can introduce spurious correlations in resting-state fMRI, with spatial patterns that can even resemble known resting-state networks like the default mode network, severely compromising the interpretation of functional connectivity [18]. The problem is exacerbated in clinical populations and pediatric studies where subject compliance may be variable.

Quantitative Metrics and Data

Table 1: Motion Artifact Impact and Mitigation Strategies

| Metric/Strategy | Description | Impact on Data | Recommended Correction |
|---|---|---|---|
| Framewise displacement (FD) | Measures volume-to-volume head movement | Volumes with high FD can cause signal changes exceeding the true BOLD signal | Volume censoring: removing high-motion volumes and adjoining frames [19] |
| Distance-dependent bias | Systematic bias in which correlations between signals from nearby regions are artificially enhanced | Renders functional connectivity metrics unreliable [19] | Structured matrix completion: a low-rank matrix completion approach to recover censored data [19] |
| QC-FC correlation | Correlates motion parameters with functional connectivity matrices | High values indicate motion is inflating correlations; a key diagnostic metric [20] | Concatenated regression: using all nuisance regressors in a single model, though sequential regression may offer superior test-retest reliability [20] |

Experimental Protocol for Motion Correction

Objective: To implement a motion correction pipeline that effectively minimizes motion-induced variance without reintroducing artifacts or sacrificing data integrity.

  • Data Acquisition & Realignment: Acquire fMRI data using fast imaging sequences (e.g., Echo Planar Imaging - EPI) to minimize within-volume motion. Perform six-parameter rigid body transformation to realign all volumes to a reference volume (e.g., the first volume). This assumes the head is a rigid body and corrects for translational and rotational movements between volumes [18].
  • Nuisance Regression: Generate nuisance regressors from the realignment parameters. Evidence suggests that a concatenated regression approach, where all regressors (motion, physiological, etc.) are included in a single model, is more effective than sequential application at preventing the reintroduction of artifacts [20].
  • Identification and Treatment of High-Motion Volumes:
    • Calculate Framewise Displacement (FD) for each volume.
    • Censoring (Scrubbing): Identify and remove volumes where FD exceeds a predetermined threshold (e.g., 0.2-0.5 mm). Also remove the volumes immediately preceding and following these high-motion time points to account for spin-history effects (see the sketch after this list) [19].
  • Advanced Recovery (Optional): To address the data loss and discontinuity caused by censoring, employ a structured low-rank matrix completion method. This technique models the fMRI time series using a Linear Recurrence Relation (LRR) and fills the censored entries by exploiting the underlying structure and correlations in the data, resulting in a continuous, motion-compensated time series [19].
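A minimal sketch of the censoring logic above: build a keep-mask that drops frames exceeding the FD threshold plus their immediate neighbors to cover spin-history effects. The threshold range follows the protocol; the function itself is illustrative.

```python
import numpy as np

def censor_mask(fd, threshold=0.5, n_before=1, n_after=1):
    """Return a boolean keep-mask over volumes: False marks censored frames."""
    bad = fd > threshold
    expanded = np.zeros_like(bad)
    for i in np.flatnonzero(bad):
        lo = max(0, i - n_before)             # also drop the preceding frame(s)
        hi = min(len(bad), i + n_after + 1)   # and the following frame(s)
        expanded[lo:hi] = True
    return ~expanded

# Usage: data_clean = data[..., censor_mask(fd, threshold=0.2)]
```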

Physiological Artifacts

Characterization and Impact

Physiological noise originates from non-neuronal, periodic bodily processes, primarily the cardiac cycle and respiration. These processes cause small head movements, changes in chest volume that alter the magnetic field, and variations in cerebral blood flow and volume, all of which introduce structured noise into the fMRI signal [18]. In resting-state fMRI, where spontaneous neuronal signal changes are typically only 1-2%, the signal contributions from physiological noise remain a considerable fraction, posing a significant challenge for analysis [18]. Unlike thermal noise, physiological noise is structured and non-white, meaning it has a specific temporal signature and cannot be removed by simple averaging.

Quantitative Metrics and Data

Table 2: Physiological Noise Sources and Correction Tools

| Noise Source | Primary Effect | Tool/Algorithm for Mitigation | Function |
|---|---|---|---|
| Cardiac pulsation | Rhythmic signal changes, particularly near major blood vessels | RETROICOR | Uses physiological recordings to create noise regressors based on the phase of the cardiac and respiratory cycles |
| Respiration | Causes magnetic field fluctuations and spin-history effects | Respiratory volume per time (RVT) | Models the low-frequency influence of breathing volume on the BOLD signal |
| Non-neuronal global signal | Widespread signal fluctuations of non-neuronal origin | ICA-based denoising (e.g., tedana) | Identifies and removes noise components deemed non-BOLD based on their TE-dependence or other statistics [21] |

Experimental Protocol for Physiological Noise Correction

Objective: To separate and remove signal components arising from cardiac and respiratory cycles from the neurally-derived BOLD signal.

  • Physiological Data Recording: Simultaneously with the fMRI scan, record the subject's cardiac pulse (using a pulse oximeter) and respiration (using a pneumatic belt). These signals should be recorded at a high sampling rate (e.g., 100 Hz or more).
  • Noise Model Generation:
    • For cardiac and respiratory phase-based noise, use the RETROICOR algorithm. This involves determining the phase of the cardiac and respiratory cycles at each fMRI volume acquisition and generating Fourier series regressors to model the noise associated with these phases.
    • For low-frequency respiratory effects, calculate the Respiratory Volume per Time (RVT) from the respiration trace and convolve it with a canonical respiratory response function to create appropriate regressors.
  • Component-Based Correction (for Multi-Echo data): If multi-echo fMRI data is available, use a tool like tedana. This approach leverages the fact that BOLD-induced signal changes exhibit a linear dependence on echo time (TE), while many physiological and motion-related artifacts do not. Independent components analysis (ICA) is performed, and components are classified and removed based on their TE-dependence [21].
  • Regression: Include the generated physiological noise regressors (from step 2) in the same concatenated nuisance regression model as the motion parameters to remove them from the BOLD time series.
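The Fourier-series regressors of step 2 are straightforward to construct once cardiac and respiratory phases have been assigned to each volume. A minimal sketch following the general RETROICOR formulation (the phase-estimation step itself is omitted, and the second-order expansion is an illustrative default):

```python
import numpy as np

def retroicor_regressors(phase, order=2):
    """Build Fourier-series nuisance regressors from a physiological phase
    trace (radians, one value per fMRI volume): cos/sin of harmonics 1..order."""
    cols = []
    for m in range(1, order + 1):
        cols.append(np.cos(m * phase))
        cols.append(np.sin(m * phase))
    return np.column_stack(cols)          # shape (T, 2 * order)

# Usage: design = np.hstack([retroicor_regressors(cardiac_phase),
#                            retroicor_regressors(resp_phase)])
```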

Magnetic Field Inhomogeneities

Characterization and Impact

Magnetic field inhomogeneities refer to distortions in the main static magnetic field (B0) caused by variations in magnetic susceptibility at tissue interfaces (e.g., between air in sinuses and brain tissue). These inhomogeneities are significantly increased at higher magnetic field strengths (e.g., 3T and 7T) [22]. In fMRI, these distortions manifest as geometric warping, signal loss (dropouts), and blurring, particularly in regions near the sinuses and ear canals, such as the frontal and temporal lobes [23]. These artifacts compromise spatial specificity and can lead to misalignment between functional data and anatomical references.

Quantitative Metrics and Data

Table 3: Distortion Correction Methods at High Magnetic Fields

| Correction Method | Principle | Performance at 7T (High Resolution) | Key Metric Improvement |
|---|---|---|---|
| B0 field mapping | Acquires a map of the static magnetic field inhomogeneities and corrects the EPI data during reconstruction | Improves cortical alignment | Moderate improvement in Dice coefficient (DC) and correlation ratio (CR) compared to no correction [23] |
| Reverse phase-encoding (reversed-PE) | Acquires two EPI volumes with opposite phase-encoding directions to estimate the distortion field | Superior performance in achieving faithful anatomical alignment, especially in frontal/temporal regions [23] | More substantial improvements in DC and CR, with the largest benefit in regions of high susceptibility [23] |

Experimental Protocol for Distortion Correction

Objective: To correct for geometric distortions and signal loss in fMRI data caused by magnetic field inhomogeneities.

  • Data Acquisition for Correction:
    • Option A (Field Map): Acquire a field map. This typically involves a dual-echo gradient echo sequence that allows for the calculation of a B0 field map, which quantifies the field inhomogeneity at each voxel.
    • Option B (Reverse Phase-Encoding): Acquire two additional EPI volumes (one in the Anterior-Posterior phase-encoding direction and one in the Posterior-Anterior direction) with identical parameters to the functional run. This is often the preferred method for high-resolution fMRI at ultra-high fields [23].
  • Application of Correction:
    • For Field Map correction, use the field map to unwarp the functional time series, correcting the geometric distortions.
    • For Reverse Phase-Encoding correction, use tools such as FSL's topup to estimate the distortion field from the two opposing phase-encoding volumes and then apply this field to correct the entire functional time series.
  • Validation: After correction, validate the results by inspecting the alignment of the corrected functional data with the subject's high-resolution anatomical scan. Quantitative metrics like the Dice Coefficient (DC) and Correlation Ratio (CR) can be used to assess the improvement in cortical alignment [23].
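A minimal sketch of Option B driving FSL's topup and applytopup from Python. The file names are placeholders; acqparams.txt must list one row per volume in the merged AP/PA pair, giving the phase-encode vector and total readout time in seconds (e.g., "0 -1 0 0.05" for AP and "0 1 0 0.05" for PA).

```python
import subprocess

# Estimate the susceptibility-induced off-resonance field from the AP/PA pair
subprocess.run([
    "topup", "--imain=ap_pa_pair.nii.gz", "--datain=acqparams.txt",
    "--config=b02b0.cnf", "--out=topup_results",
    "--iout=b0_corrected.nii.gz",
], check=True)

# Apply the estimated field to the functional time series (jacobian modulation)
subprocess.run([
    "applytopup", "--imain=func_bold.nii.gz", "--datain=acqparams.txt",
    "--inindex=1", "--topup=topup_results", "--method=jac",
    "--out=func_bold_dc.nii.gz",
], check=True)
```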

The Scientist's Toolkit: Research Reagent Solutions

Table 4: Essential Tools for fMRI Noise Mitigation

| Tool/Software | Type | Primary Function in Noise Mitigation |
|---|---|---|
| NORDIC PCA | Denoising algorithm | Suppresses thermal noise, the dominant noise source in high-resolution (e.g., 0.5 mm isotropic) fMRI, yielding major gains in tSNR and functional CNR without blurring [17] |
| Total variation (TV) denoising | Denoising algorithm | Enforces smoothness in the BOLD signal by minimizing total variation; yields denoised multi-echo fMRI data and enables estimation of smooth, dynamic T2* maps [21] |
| FuNP | Preprocessing pipeline | Fully automated wrapper combining components from AFNI, FSL, FreeSurfer, and Workbench to provide both volume- and surface-based preprocessing pipelines [8] |
| tedana | Preprocessing toolbox | Specialized for multi-echo fMRI data; uses ICA to denoise data by identifying and removing components that do not exhibit linear TE-dependence [21] |
| Structured matrix completion | Advanced motion correction | Recovers missing entries from censored (scrubbed) fMRI time series using a low-rank prior, mitigating motion artifacts while maintaining data continuity [19] |

Integrated Noise Mitigation Workflow

The following diagram illustrates a logical, integrated workflow for addressing the three major noise sources in a coordinated preprocessing pipeline.

[Workflow diagram: Raw fMRI Data → Motion Correction (realignment + censoring + structured completion) → Physiological Noise Correction (RETROICOR + RVT or multi-echo ICA) → Magnetic Field Correction (field mapping or reverse phase-encoding) → Advanced Denoising (NORDIC or TV minimization) → Cleaned fMRI Data ready for analysis.]

Figure 1: A recommended sequential workflow for mitigating major noise sources in fMRI preprocessing. The process begins with motion correction, followed by physiological noise removal, magnetic field inhomogeneity correction, and concludes with advanced denoising techniques for final signal enhancement.

A rigorous approach to identifying and mitigating noise is fundamental to the reliability of any fMRI preprocessing pipeline. Motion, physiological processes, and magnetic field inhomogeneities represent the most significant sources of artifact that can confound the interpretation of the BOLD signal. By implementing the structured protocols and utilizing the advanced tools outlined in this document—such as motion censoring with matrix completion, model-based physiological noise regression, reverse phase-encoding distortion correction, and powerful denoising algorithms like NORDIC and Total Variation minimization—researchers can significantly enhance the quality and fidelity of their data. This, in turn, ensures more robust and reproducible results in both basic neuroscience research and clinical drug development applications.

Functional Magnetic Resonance Imaging (fMRI) has become a cornerstone of modern neuroscience, enabling non-invasive investigation of brain function and connectivity. However, the reliability of fMRI findings is fundamentally contingent upon the preprocessing pipelines used to remove noise and artifacts from the raw data. Inadequate preprocessing strategies systematically introduce spurious correlations and false activations, threatening the validity of neuroimaging research and its applications in clinical and drug development settings. This application note examines the primary sources of these artifacts, their impacts on functional connectivity and activation analyses, and provides detailed protocols for mitigating these issues within the broader context of fMRI preprocessing pipeline reliability research.

The complex nature of fMRI data, which captures endogenous blood oxygen level-dependent (BOLD) signals alongside numerous confounding factors, necessitates rigorous preprocessing before meaningful inferences can be drawn. As highlighted across multiple studies, failure to adequately address artifacts from head motion, physiological processes, and preprocessing methodologies themselves can generate spurious network connectivity and significantly distort brain-behavior relationships [24] [25] [26]. These issues are particularly pronounced in clinical populations where increased movement and pathological conditions amplify artifacts, potentially leading to erroneous conclusions about brain function and treatment effects.

Head Motion Artifacts

Head movement during fMRI acquisition introduces systematic noise that persists despite standard motion correction approaches. Even after spatial realignment and regression of motion parameters, residual motion artifacts continue to corrupt resting-state functional connectivity MRI (rs-fcMRI) data [24]. These artifacts exhibit distinctive patterns:

  • Decreased long-distance correlations between brain regions
  • Increased short-distance correlations in proximate brain areas
  • Non-linear relationships that linear regression techniques fail to fully capture

The impact of motion is not uniform across studies. Research demonstrates that motion artifacts have particularly severe consequences for clinical populations, including patients with disorders of consciousness (DoC), where inherent limitations in compliance and increased discomfort lead to greater movement [26]. In these populations, standard preprocessing pipelines may fail to detect known networks such as the default mode network (DMN), potentially leading to incorrect conclusions about network preservation.

Surface-Based Analysis Biases

Surface-based analysis of fMRI data, while offering advantages in cortical alignment, introduces its own unique artifacts. The mapping from volumetric voxels to surface vertices creates uneven inter-vertex distances across the cortical sheet [27]. This spatial bias manifests as:

  • Higher vertex density in sulci compared to gyral regions
  • Stronger correlations between neighboring sulcal vertices than between gyral vertices
  • Incorporation of anatomical folding information into fMRI time series

This "gyral bias" systematically distorts a range of common analyses including test-retest reliability, functional fingerprinting, parcellation approaches, and regional homogeneity measures [27]. Critically, because vertex density tracks individual cortical folding patterns, the bias introduces subject-specific anatomical information into functional connectivity measures, creating spurious correlations that can be misinterpreted as neural phenomena.

Filter-Induced Correlations

Standard preprocessing pipelines commonly employ band-pass filters (typically 0.009-0.08 Hz or 0.01-0.10 Hz) to isolate frequencies of interest in resting-state fMRI data. However, these filters artificially inflate correlation estimates between independent time series [25]. The statistical consequences are severe:

  • Increased false positive rates in functional connectivity analyses
  • Failure of multiple comparison corrections to fully control Type I errors
  • Up to 50-60% of detected correlations in white noise signals remain significant after correction

The cyclic nature of biological signals, combined with filter-induced autocorrelation, creates a fundamental statistical challenge for rs-fMRI. Without appropriate mitigation strategies, these filters systematically amplify spurious correlations, potentially invalidating connectivity findings.
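This inflation is easy to reproduce in simulation. The sketch below band-pass filters pairs of independent white-noise series at a typical resting-state TR and shows that the null distribution of correlation coefficients widens substantially, which is exactly what invalidates naive significance thresholds; all parameters are illustrative.

```python
import numpy as np
from scipy.signal import butter, filtfilt

rng = np.random.default_rng(0)
tr, n_vols, n_pairs = 2.0, 300, 2000
b, a = butter(2, [0.009, 0.08], btype="bandpass", fs=1.0 / tr)

r_raw, r_filt = [], []
for _ in range(n_pairs):
    x = rng.standard_normal(n_vols)
    y = rng.standard_normal(n_vols)          # independent of x by construction
    r_raw.append(np.corrcoef(x, y)[0, 1])
    r_filt.append(np.corrcoef(filtfilt(b, a, x), filtfilt(b, a, y))[0, 1])

# Filtering cuts the effective degrees of freedom, so the null distribution
# of r widens; thresholds assuming n_vols independent samples become invalid.
print(f"SD of r, unfiltered:  {np.std(r_raw):.3f}")
print(f"SD of r, band-passed: {np.std(r_filt):.3f}")
```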

Physiological and Non-Neural Signals

fMRI signals incorporate substantial non-neural contributions from various physiological processes and scanner-related artifacts. Cardiac and respiratory cycles introduce rhythmic fluctuations, while white matter and cerebrospinal fluid signals contain non-neural information that can contaminate connectivity metrics [28] [8]. Traditional denoising approaches based on linear regression may be insufficient to remove nonlinear statistical dependencies between brain regions induced by shared noise sources [28].

Table 1: Major Sources of Spurious Connectivity in fMRI Data

| Source | Impact on Connectivity | Affected Analyses |
|---|---|---|
| Head motion | Increases short-distance correlations; decreases long-distance correlations | Seed-based correlation, network analyses, group comparisons |
| Surface vertex density | Inflates sulcal correlations compared to gyral regions | Surface-based analyses, parcellation, regional homogeneity |
| Band-pass filtering | Artificially inflates correlation coefficients between time series | Resting-state functional connectivity, network detection |
| Physiological noise | Introduces shared fluctuations unrelated to neural activity | All connectivity analyses, especially those without physiological monitoring |

Experimental Evidence and Quantitative Impacts

Motion Artifact Quantification

Research by Power et al. demonstrated that subject motion produces substantial changes in resting-state fcMRI timecourses despite spatial registration and motion parameter regression [24]. In their analysis of multiple cohorts, they found that:

  • Framewise displacement varied significantly across age groups, with children showing higher motion parameters
  • Data "scrubbing" (removing high-motion frames) produced structured changes in correlation patterns
  • Short-distance correlations decreased by up to 30% after rigorous motion correction
  • Medium- to long-distance correlations increased by up to 20% after motion artifact removal

The impact of motion was particularly pronounced in developmental and clinical populations, with one child cohort requiring removal of up to 58% of data frames due to excessive motion [24]. These findings highlight how motion artifacts can create spurious developmental or group differences if not adequately addressed.

Surface Analysis Bias Magnitude

The surface-based analysis bias described by Feilong et al. creates substantial distortions in connectivity metrics [27]. Their investigation revealed:

  • Inter-vertex distances varied from approximately 1mm in sulci to 3mm at gyral crests
  • Vertex areas (proportional to the square of inter-vertex distances) varied by an even greater factor
  • Resting-state fMRI correlation between a vertex and its immediate neighbors showed a strong negative association with inter-vertex distance (r = -0.653, p < 0.001)
  • Normalized local correlation followed individual cortical folding patterns (r = -0.500, p < 0.001 with sulcal depth)

This bias has particular significance for studies examining individual differences in connectivity, as the artifact incorporates subject-specific anatomical information into functional measures [27].

Filter-Induced Inflation of Correlations

Recent work on the statistical limitations of rsfMRI has quantified the impact of band-pass filtering on correlation inflation [25]. Key findings include:

  • Standard band-pass filters (0.009-0.08 Hz) significantly increase correlation estimates between independent time series
  • When applied to white noise signals, these filters result in 50-60% of correlations remaining statistically significant after multiple comparison correction
  • The combination of cyclic biological signals and narrow band-pass filters creates autocorrelation that invalidates standard statistical assumptions
  • Sampling rate selection critically influences the degree of bias introduced by filtering

These findings challenge the validity of many resting-state connectivity studies and emphasize the need for specialized statistical approaches that account for filter-induced artifacts.

Table 2: Quantitative Impact of Preprocessing Artifacts on Connectivity Measures

| Artifact Type | Measurement | Effect Size | Consequence |
|---|---|---|---|
| Head motion | Change in short-distance correlations | 25-30% increase | False local network detection |
| Head motion | Change in long-distance correlations | 15-20% decrease | Missed long-range connections |
| Surface bias | Range of inter-vertex distances | ~1 mm (sulci) to ~3 mm (gyri) | Sulcal-gyral correlation differences |
| Surface bias | Correlation vs. inter-vertex distance | r = -0.653 | Spatial sampling bias |
| Band-pass filter | Significant correlations in white noise | 50-60% remain significant | Inflated false-positive rate |

Methodological Protocols for Artifact Mitigation

Motion Artifact Correction Protocol

Based on evidence from multiple studies, the following comprehensive motion correction protocol is recommended:

Step 1: Frame-Wise Displacement Calculation

  • Compute framewise displacement (FD) from rigid body head realignment parameters
  • Apply threshold of FD > 0.2-0.5mm to identify contaminated volumes (adjust based on data quality)
  • Generate DVARS (root mean square variance over voxels) to complement FD measures

Step 2: Motion Regression

  • Include 24 motion parameters (6 rigid body parameters, their derivatives, and squares) in regression model
  • Expand motion parameters using Volterra expansion to capture nonlinear relationships (see the sketch below)
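A minimal sketch of the Step 2 expansion. The classic "Friston-24" set uses lagged parameters in place of derivatives; the backward-difference variant shown here matches the description above (parameters, derivatives, and both sets squared).

```python
import numpy as np

def expand_24(rp):
    """Expand a (T, 6) realignment-parameter matrix into 24 regressors:
    the parameters, their backward-difference derivatives, and both sets squared."""
    deriv = np.vstack([np.zeros((1, rp.shape[1])), np.diff(rp, axis=0)])
    return np.hstack([rp, deriv, rp ** 2, deriv ** 2])
```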

Step 3: Data Scrubbing

  • Remove identified high-motion volumes from analysis
  • Replace with interpolation using adjacent low-motion frames
  • Alternatively, include spike regressors for contaminated volumes

Step 4: Quality Assessment

  • Calculate motion summary statistics for each participant
  • Exclude subjects with excessive motion (e.g., >25% frames contaminated)
  • Match motion parameters across groups in case-control studies

This comprehensive approach has been shown to significantly reduce motion-related artifacts while preserving neural signals [24] [26].

Surface Analysis Bias Mitigation Protocol

To address surface-based analysis biases, implement the following steps:

Step 1: Surface Mesh Evaluation

  • Quantify inter-vertex distances across the cortical surface
  • Identify regions with particularly high or low vertex density
  • Consider using surface meshes with more uniform vertex spacing

Step 2: Spatial Smoothing Adjustment

  • Account for variable vertex density when applying spatial smoothing kernels
  • Adjust smoothing kernel size based on local inter-vertex distances
  • Use geodesic distance-based smoothing rather than simple Gaussian kernels

Step 3: Validation with Surrogate Data

  • Generate random noise time series on the surface mesh
  • Calculate local correlation maps for surrogate data
  • Compare empirical results to surrogate data to identify bias

Step 4: Control for Sulcal Depth

  • Include sulcal depth as a covariate in group-level analyses
  • Examine whether findings persist after controlling for anatomical patterns

These steps help mitigate the uneven sampling bias inherent in surface-based analyses [27].
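
Step 3 of this protocol can be prototyped directly from the mesh geometry. The sketch below assumes the surface is supplied as vertex coordinates plus triangle faces (the representation produced by tools such as FreeSurfer); one naive neighborhood-averaging pass stands in for whatever smoothing the empirical data received, and a strongly negative return value on pure noise would indicate sampling bias rather than neural structure.

```python
import numpy as np
from scipy.sparse import coo_matrix

def mesh_edges(faces):
    """Unique vertex pairs (edges) from an (F, 3) triangle array."""
    e = np.vstack([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [0, 2]]])
    return np.unique(np.sort(e, axis=1), axis=0)

def surface_bias_check(coords, faces, n_timepoints=200, seed=0):
    """Correlate edge-wise noise correlations with inter-vertex distance."""
    rng = np.random.default_rng(seed)
    n_vertices = coords.shape[0]
    noise = rng.standard_normal((n_timepoints, n_vertices))

    # Build a sparse vertex adjacency matrix and apply one smoothing pass
    # (substitute the smoothing actually used on the empirical data).
    edges = mesh_edges(faces)
    rows = np.concatenate([edges[:, 0], edges[:, 1]])
    cols = np.concatenate([edges[:, 1], edges[:, 0]])
    adj = coo_matrix((np.ones(rows.size), (rows, cols)),
                     shape=(n_vertices, n_vertices)).tocsr()
    degree = np.asarray(adj.sum(axis=1)).ravel() + 1.0
    neighbor_sum = (adj @ noise.T).T          # sum over mesh neighbors
    smoothed = (noise + neighbor_sum) / degree

    # Standardize, then compute the correlation along each mesh edge.
    smoothed = (smoothed - smoothed.mean(axis=0)) / smoothed.std(axis=0)
    edge_corr = (smoothed[:, edges[:, 0]] * smoothed[:, edges[:, 1]]).mean(axis=0)
    distances = np.linalg.norm(coords[edges[:, 0]] - coords[edges[:, 1]], axis=1)
    return np.corrcoef(edge_corr, distances)[0, 1]
```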

Filtering and Statistical Correction Protocol

To address filter-induced correlations and statistical artifacts:

Step 1: Filter Design

  • Adjust sampling rates to align with analyzed frequency band
  • Avoid excessively narrow band-pass filters without statistical correction
  • Consider alternative filtering approaches that minimize autocorrelation

Step 2: Surrogate Data Analysis

  • Generate phase-randomized surrogate data with preserved autocorrelation structure
  • Compare empirical correlation distributions to surrogate null distributions
  • Use Fourier Transform-based surrogates to account for cyclic properties

Step 3: Pre-whitening Approaches

  • Apply pre-whitening to address autocorrelation in time series
  • Use autoregressive models to account for temporal dependencies
  • Validate pre-whitening effectiveness with diagnostic tests

Step 4: Multiple Comparison Correction

  • Implement permutation-based correction methods
  • Use false discovery rate (FDR) approaches appropriate for correlated tests
  • Consider network-based statistics for connectivity matrices

This statistical framework helps control for artifactual correlations while preserving true neural connectivity [25].
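
Step 2 of this framework is the most easily automated. The sketch below builds Fourier phase-randomized surrogates that preserve a series' power spectrum, and therefore its filter-induced autocorrelation, while destroying genuine coupling; it assumes the inputs have already passed through exactly the same filtering as the empirical data.

```python
import numpy as np

def phase_randomized_surrogate(ts, rng):
    """Surrogate with the same power spectrum as `ts` but random phases."""
    spectrum = np.fft.rfft(ts)
    phases = rng.uniform(0.0, 2.0 * np.pi, spectrum.shape[0])
    phases[0] = 0.0                       # keep the DC component real
    if ts.size % 2 == 0:
        phases[-1] = 0.0                  # keep the Nyquist bin real
    return np.fft.irfft(np.abs(spectrum) * np.exp(1j * phases), n=ts.size)

def correlation_null(x, y, n_surrogates=1000, seed=0):
    """Null distribution of correlations between surrogates of x and y."""
    rng = np.random.default_rng(seed)
    return np.array([
        np.corrcoef(phase_randomized_surrogate(x, rng),
                    phase_randomized_surrogate(y, rng))[0, 1]
        for _ in range(n_surrogates)
    ])

# Usage: treat the observed correlation as significant only if it exceeds,
# e.g., the 97.5th percentile of correlation_null(x, y).
```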

[Diagram: raw fMRI data → spatial realignment → FD/DVARS calculation → motion outlier identification (FD > 0.2-0.5mm) → 24-parameter motion regression → data scrubbing → interpolation of removed frames → motion-corrected data.]

Diagram 1: Comprehensive motion artifact correction workflow integrating framewise displacement calculation, motion parameter regression, and data scrubbing.

Specialized Pipelines for Clinical Populations

Stroke-Specific Preprocessing Pipeline

Recent research has developed specialized preprocessing pipelines for stroke patients with brain lesions [16]. The protocol includes:

Lesion-Aware Processing:

  • Account for lesions when computing tissue masks
  • Implement lesion masking to prevent spurious connectivity estimates
  • Adjust spatial normalization to accommodate structural abnormalities

Artifact Removal:

  • Incorporate independent component analysis (ICA) to address lesion-driven artifacts
  • Implement specialized denoising strategies for peri-lesional regions
  • Validate connectivity measures in both lesioned and preserved networks

Validation:

  • Assess pipeline performance using connectivity mean strength metrics
  • Evaluate functional connectivity contrast between networks
  • Verify that pipeline preserves behavioral prediction accuracy

This stroke-specific pipeline has been shown to significantly reduce spurious connectivity without impacting behavioral predictions [16].

Pipeline for Disorders of Consciousness

For patients with disorders of consciousness, the following enhanced protocol is recommended [26]:

Enhanced Motion Correction:

  • Implement aggressive motion parameter regression
  • Apply frame-wise exclusion with liberal thresholds
  • Use global signal regression as a nuisance covariate

Physiological Noise Removal:

  • Incorporate tissue-based regressors (white matter, CSF)
  • Apply band-pass filtering (0.009-0.08 Hz) after regression
  • Consider component-based noise correction (CompCor)

Validation with DMN Detection:

  • Verify default mode network detectability across pipelines
  • Compare independent component analysis and seed-based approaches
  • Use network detection as quality control metric

This enhanced protocol has demonstrated significantly improved DMN detection in patients with disorders of consciousness [26].
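
In practice, the regression-then-filtering order described above maps onto standard signal-cleaning utilities such as those in Nilearn. The following is a minimal sketch, not the exact protocol of [26]; file names, the repetition time, and the confound set are placeholders to be replaced with study-specific values.

```python
from nilearn.maskers import NiftiMasker
from nilearn import signal

# Extract in-brain voxel time series (placeholder file names).
masker = NiftiMasker(mask_img="brain_mask.nii.gz")
voxel_ts = masker.fit_transform("preproc_bold.nii.gz")   # (T, V)

# Nuisance regression and 0.009-0.08 Hz band-pass in a single call;
# `confounds` would hold motion parameters, WM/CSF or CompCor regressors,
# and (optionally) the global signal.
cleaned = signal.clean(
    voxel_ts,
    confounds=confounds,      # (T, n_confounds) array assembled beforehand
    detrend=True,
    standardize="zscore",
    high_pass=0.009,
    low_pass=0.08,
    t_r=2.0,                  # placeholder TR in seconds
)
```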

[Diagram: clinical fMRI data (stroke, DoC, etc.) → lesion masking (stroke-specific) → aggressive motion correction → enhanced physiological noise removal → global signal regression (DoC-specific) → network detection validation → clinical-ready data.]

Diagram 2: Specialized preprocessing pipelines for clinical populations including stroke patients and disorders of consciousness (DoC), incorporating lesion masking and enhanced noise removal.

Emerging Solutions and Foundation Models

Automated Preprocessing Platforms

Several automated preprocessing platforms have emerged to address reproducibility challenges in fMRI analysis:

fMRIPrep: A robust, standardized preprocessing pipeline that incorporates best practices from multiple software packages while providing comprehensive quality control outputs [8] [3].

FuNP (Fusion of Neuroimaging Preprocessing): Integrates components from AFNI, FSL, FreeSurfer, and Workbench into a unified platform with both volume- and surface-based processing streams [8].

fMRIflows: Extends beyond preprocessing to include univariate and multivariate single-subject and group analyses, with flexible temporal and spatial filtering options optimized for high-temporal resolution data [3].

These automated platforms reduce pipeline variability and implement current best practices consistently across studies.

The NeuroSTORM Foundation Model

A recent innovation in fMRI analysis is the development of NeuroSTORM, a general-purpose foundation model trained on an unprecedented 28.65 million fMRI frames from over 50,000 subjects [6]. This model:

  • Employs a shifted scanning strategy based on a Mamba backbone for efficient 4D fMRI processing
  • Incorporates spatial-temporal optimized pre-training with task-specific prompt tuning
  • Demonstrates superior performance across diverse downstream tasks including phenotype prediction and disease diagnosis
  • Maintains high clinical relevance in predicting psychological/cognitive phenotypes

Foundation models like NeuroSTORM represent a paradigm shift toward standardized, transferable fMRI analysis that may help overcome many current preprocessing challenges [6].

Table 3: Automated Preprocessing Pipelines and Their Specialized Capabilities

| Pipeline | Key Features | Specialized Applications | Validation |
| --- | --- | --- | --- |
| fMRIPrep | Robust integration of best practices, comprehensive QC | General-purpose processing, multi-site studies | Extensive validation against manual pipelines |
| FuNP | Combines AFNI, FSL, FreeSurfer, Workbench; GUI interface | Both volume- and surface-based analysis | RSN matching with pre-defined networks |
| fMRIflows | Univariate and multivariate analyses, flexible filtering | Machine learning preparation, high-temporal resolution data | Comparison with FSL, SPM, fMRIPrep |
| fMRIStroke | Lesion-aware processing, ICA for lesion artifacts | Stroke patients with brain lesions | Reduced spurious connectivity, preserved predictions |

The Scientist's Toolkit: Essential Research Reagents

Table 4: Essential Software Tools and Processing Components for fMRI Preprocessing

| Tool/Component | Function | Implementation Considerations |
| --- | --- | --- |
| fMRIPrep | Automated preprocessing pipeline | Default pipeline for standard studies; BIDS-compliant |
| CompCor | Component-based noise correction | Effective for physiological noise removal; linear, PCA-based method |
| FSL | FMRIB Software Library | MELODIC ICA for data-driven artifact identification |
| FreeSurfer | Surface-based reconstruction | Essential for surface-based analyses; provides cortical surface models |
| fMRIStroke | Lesion-specific preprocessing | Critical for stroke populations; open-source tool available |
| NeuroSTORM | Foundation model for fMRI | Emerging approach for standardized analysis; requires significant computational resources |
| Nilearn | Python machine learning library | Provides masking, filtering, and connectivity analysis tools |
| Nipype | Pipeline integration framework | Enables custom pipeline development combining multiple packages |

Spurious connectivity and false activations arising from poor preprocessing represent a fundamental challenge for fMRI research with significant implications for basic neuroscience and clinical applications. The artifacts introduced by head motion, surface analysis biases, filtering procedures, and physiological noise systematically distort functional connectivity measures and can lead to invalid conclusions. However, as detailed in this application note, rigorous methodological protocols employing frame-wise motion correction, surface bias mitigation, advanced statistical approaches, and population-specific pipelines can substantially reduce these artifacts. The development of automated preprocessing platforms and foundation models offers promising pathways toward more standardized, reproducible fMRI analysis. By implementing these detailed protocols and maintaining critical awareness of preprocessing limitations, researchers can enhance the validity and reliability of their fMRI findings, ultimately advancing our understanding of brain function in health and disease.

Quality control (QC) is a fundamental component of functional magnetic resonance imaging (fMRI) research, serving as the critical checkpoint that ensures data validity, analytical robustness, and ultimately, reproducible scientific findings. In the context of fMRI preprocessing pipeline reliability research, establishing standardized QC metrics is paramount for comparing results across studies, validating new methodological approaches, and building confidence in neuroimaging biomarkers. As the field increasingly moves toward larger multi-site studies and analysis of shared datasets, the implementation of consistent, comprehensive QC protocols becomes indispensable for distinguishing true neurological effects from methodological artifacts [29]. This protocol outlines the essential quality control metrics that should be examined in every fMRI preprocessing pipeline, providing researchers with a standardized framework for evaluating data quality throughout the processing workflow.

Core Quality Control Metrics Framework

Data Acquisition Quality Metrics

The foundation of quality fMRI research begins with assessing the intrinsic quality of the raw data acquired from the scanner. These metrics evaluate whether the basic data characteristics support meaningful scientific interpretation.

Table 1: Essential Data Acquisition Quality Metrics

| Metric Category | Specific Metrics | Acceptance Criteria | Potential Issues |
| --- | --- | --- | --- |
| Spatial Coverage | Whole brain coverage, Voxel resolution | Complete coverage of regions of interest; consistent dimensions across participants | Missing brain regions; cropped cortical areas |
| Image Artifacts | Signal dropout, Ghosting, Reconstruction errors | Minimal visible artifacts; consistent signal across brain | Susceptibility artifacts; scanner hardware issues |
| Basic Signal Quality | Signal-to-Noise Ratio (SNR), Temporal SNR (tSNR) | Consistent across participants and sessions | Poor image contrast; system noise |
| Data Integrity | Header information, Timing files, Parameter consistency | Correct matching of acquisition parameters | Mismatched timing; incorrect repetition time (TR) |

The evaluation of raw data represents the first critical checkpoint in the QC pipeline [30]. Researchers should verify that images include complete whole brain coverage without missing regions, particularly in areas relevant to their research questions. Dropout artifacts, which often occur in regions prone to susceptibility artifacts such as orbitofrontal cortex and temporal poles, must be identified as they can render these areas unusable for analysis [30]. Reconstruction errors stemming from scanner hardware limitations should be flagged, as they introduce inaccuracies in the fundamental image representation [30].

Preprocessing Verification Metrics

Once basic data quality is established, the focus shifts to verifying the execution of preprocessing steps. These metrics evaluate the technical success of spatial and temporal transformations applied to the data.

Table 2: Preprocessing Step Verification Metrics

| Processing Step | Evaluation Metrics | Quality Indicators | Tools for Assessment |
| --- | --- | --- | --- |
| Head Motion Correction | Framewise displacement (FD), Translation/rotation parameters | Mean FD < 0.2-0.3mm; limited spikes in motion timecourses | FSL, SPM, AFNI, fMRIPrep |
| Functional-Anatomical Coregistration | Cross-correlation, Boundary alignment | Precise alignment of gray matter boundaries | SPM Check Registration, Visual inspection |
| Spatial Normalization | Dice coefficients, Tissue overlap metrics | High overlap with template (>0.8-0.9) | ANTs, FSL, SPM |
| Segmentation Quality | Tissue probability maps, Misclassification rates | Clear differentiation of GM, WM, CSF | SPM, FSL, Visual inspection |
| Susceptibility Distortion Correction | Alignment of opposed phase-encode directions | Reduced distortion in susceptibility-prone regions | FSL topup, AFNI 3dQwarp |

Head motion correction represents one of the most critical preprocessing steps, with framewise displacement (FD) serving as the primary quantitative metric [31]. FD quantifies the relative movement of the head between consecutive volumes, with values exceeding 0.2-0.3mm typically indicating problematic motion levels [31]. The temporal pattern of motion should also be examined, as concentrated spikes of motion may require specialized denoising approaches or censoring [32].

Functional to anatomical coregistration is typically evaluated through visual inspection, where researchers verify alignment of functional data with anatomical boundaries [31]. Quantitative metrics such as normalized mutual information or cross-correlation can supplement visual assessment. Spatial normalization to standard templates (e.g., MNI space) should be evaluated using overlap metrics like Dice coefficients, with values typically exceeding 0.8-0.9 indicating successful normalization [33] [31].
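
Dice overlap is straightforward to compute once the normalized brain mask and the template mask share a voxel grid; a minimal sketch:

```python
import numpy as np

def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two same-shape binary masks (1.0 = identical)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 0.0

# Example QC rule: flag normalizations below the 0.8-0.9 range cited above.
# if dice_coefficient(subject_mask, template_mask) < 0.8: review manually
```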

Denoising and Confound Management Metrics

Following initial preprocessing, effective noise removal becomes essential for isolating biologically meaningful BOLD signals from various confounding sources.

Table 3: Denoising and Confound Assessment Metrics

| Confound Type | Extraction Method | QC Metrics | Interpretation |
| --- | --- | --- | --- |
| Motion Parameters | Realignment algorithms (FSL MCFLIRT, SPM realign) | Framewise displacement, DVARS | Identification of motion-affected timepoints |
| Physiological Noise | CompCor, RETROICOR, Physiological recording | Spectral characteristics, Component timecourses | Verification of physiological noise removal |
| Global Signal | Global signal regression (GSR) | Correlation patterns with motion | Assessment of GSR impact on connectivity |
| Temporal Artifacts | ICA-AROMA, FIX | Component classification accuracy | Identification of residual noise components |
| Temporal Quality | Temporal SNR (tSNR), DVARS | Regional tSNR values, DVARS spikes | Overall temporal signal stability |

Denoising efficacy must be evaluated in the context of the specific research question, as optimal strategies vary across applications [32]. For resting-state fMRI studies, the impact of different denoising pipelines on functional connectivity measures and their relationship with motion artifacts should be carefully examined [32]. Component-based methods such as ICA-AROMA require verification that noise components are correctly classified and removed without eliminating neural signals of interest [33].

Temporal signal-to-noise ratio (tSNR) provides a comprehensive measure of signal quality after preprocessing, with higher values indicating more stable BOLD time series [29]. DVARS measures the rate of change of BOLD signal across the entire brain at each timepoint, with spikes often corresponding to motion artifacts or abrupt signal changes [33]. The relationship between motion parameters and denoised signals should be examined to confirm successful uncoupling of motion artifacts from neural signals [32].
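
Both temporal quality metrics reduce to a few lines once the data are arranged as a timepoints-by-voxels array; a minimal sketch, assuming in-brain voxels only:

```python
import numpy as np

def temporal_snr(bold):
    """Per-voxel tSNR: temporal mean divided by temporal standard deviation.
    bold: (T, V) array of (ideally detrended) BOLD time series."""
    std = bold.std(axis=0)
    return np.where(std > 0, bold.mean(axis=0) / std, 0.0)

def dvars(bold):
    """Per-frame DVARS: RMS across voxels of the frame-to-frame signal
    change; the first frame has no predecessor and is set to 0."""
    diffs = np.diff(bold, axis=0)
    return np.concatenate([[0.0], np.sqrt((diffs ** 2).mean(axis=1))])
```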

Experimental Protocols for QC Implementation

Protocol 1: Comprehensive fMRI QC Framework

This protocol outlines a systematic approach to quality control spanning the entire research workflow, from study planning through final analysis.

QC During Study Planning

  • Define QC priorities based on research hypotheses and regions of interest
  • Establish standardized operating procedures for data acquisition
  • Plan for collection of QC-supporting data (physiological recordings, behavior logs)
  • Implement data organization systems using the Brain Imaging Data Structure (BIDS) standard [33]

QC During Data Acquisition

  • Monitor real-time data quality for artifacts and coverage issues
  • Document unexpected events and participant behavior
  • Verify consistency of acquisition parameters across sessions

QC Soon After Acquisition

  • Check raw data for artifacts, coverage, and orientation
  • Calculate basic QC metrics (SNR, tSNR, motion parameters)
  • Generate visualization reports for initial quality assessment

QC During Processing

  • Verify successful completion of each processing step
  • Inspect intermediate outputs for alignment and normalization accuracy
  • Evaluate denoising effectiveness through confound regression diagnostics
  • Generate comprehensive QC reports for each participant [29]

Protocol 2: SPM-Based Preprocessing and QC Protocol

This protocol provides specific implementation details for a QC pipeline using Statistical Parametric Mapping (SPM) and MATLAB, adaptable to other software environments.

Initial Data Check (Q1)

  • Verify consistency of imaging parameters (TR, voxel sizes, volumes) across participants
  • Check anatomical image orientation and position relative to MNI template using SPM Check Registration
  • Inspect for artifacts (ghosting, lesions) in anatomical and functional images
  • Manually reorient images if necessary to align with template space [31]

Anatomical Image Segmentation and Check (P1, Q2)

  • Segment anatomical images into gray matter, white matter, and CSF using SPM Segment
  • Generate bias-corrected anatomical image for improved coregistration
  • Verify segmentation quality by overlaying tissue probability maps on normalized images
  • Flag participants with tissue misclassification for exclusion or manual correction [31]

Functional Image Realignment and Motion Check (P2, Q3)

  • Realign functional images to first volume using SPM Realign: Estimate & Reslice
  • Calculate framewise displacement (FD) from rigid body transformation parameters
  • Plot motion parameters across the sample to identify participants with excessive motion
  • Establish motion exclusion criteria appropriate for the research context [31]

Coregistration and Normalization Check (Q4, Q5)

  • Verify alignment of functional and anatomical images using Check Registration
  • Inspect normalized images for accurate transformation to standard space
  • Confirm that normalized images maintain structural details without excessive distortion [31]

Time Series Quality Check (Q6)

  • Generate gray matter mask and extract mean time series
  • Plot time series to identify residual artifacts or drifts
  • Calculate and review temporal SNR maps for signal stability assessment [31]

Visualization of QC Workflows

fMRI Quality Control Ecosystem

[Diagram: QC checkpoints across three stages. Data acquisition: scanner setup, parameter verification, and real-time QC monitoring, with feedback for parameter adjustment. Preprocessing and denoising: structural processing, functional preprocessing, and confound regression, with raw-data and processing QC. Analysis and reporting: statistical analysis, QC metric compilation, and visual report generation with output QC, feeding back into protocol optimization.]

Preprocessing Pipeline Verification Workflow

[Diagram: raw fMRI data splits into an anatomical stream (skull stripping → tissue segmentation → spatial normalization, with brain-extraction and segmentation QC) and a functional stream (slice timing correction → head motion correction → distortion correction, with motion-correction QC); the streams converge at functional-to-anatomical coregistration and normalization to standard space, each followed by QC, yielding preprocessed data.]

Essential Research Reagents and Tools

Table 4: Critical Software Tools for fMRI Quality Control

| Tool Category | Specific Tools | Primary Function | QC Application |
| --- | --- | --- | --- |
| Preprocessing Pipelines | fMRIPrep, fMRIflows, C-PAC | Automated preprocessing | Standardized data processing with integrated QC |
| QC-Specific Software | MRIQC, AFNI QC tools | Quality metric extraction | Automated calculation of QC metrics |
| Visualization Platforms | AFNI, SPM, FSLeyes | Data inspection | Visual assessment of processing results |
| Data Management | BIDS Validator, NiPreps | Data organization | Ensuring standardized data structure compliance |
| Statistical Analysis | SPM, FSL, AFNI | Statistical modeling | Integration of QC metrics in analysis |

The selection of appropriate tools should be guided by the specific research context and analytical approach. fMRIPrep has emerged as a widely adopted solution for robust, standardized preprocessing that generates comprehensive QC reports [33] [34]. For researchers implementing custom pipelines, AFNI provides extensive QC tools including automated reporting through afni_proc.py [30]. MRIQC offers specialized functionality for evaluating raw data quality, particularly useful for large datasets and data from multiple sites [3].

Establishing a comprehensive quality control framework is not an optional supplement to fMRI research, but rather a fundamental requirement for producing valid, interpretable, and reproducible results. The metrics and protocols outlined here provide a baseline for evaluating preprocessing pipeline reliability across diverse research contexts. As the field continues to evolve toward more complex analytical approaches and larger multi-site collaborations, consistent implementation of these QC standards will be essential for building a cumulative science of human brain function. Future methodological developments should continue to refine these metrics while maintaining backward compatibility to enable direct comparison across historical and contemporary datasets.

From Standard Tools to Foundation Models: A Landscape of Modern fMRI Preprocessing Methodologies

Functional magnetic resonance imaging (fMRI) has become a cornerstone technique for investigating brain function in both basic research and clinical applications. The reliability of its findings, however, is heavily dependent on the data processing pipeline employed. Inconsistent or suboptimal preprocessing can introduce variability, reduce statistical power, and ultimately undermine the validity of scientific conclusions [3]. This challenge is particularly acute in translational contexts such as drug development, where objective, reproducible biomarkers are urgently needed to improve the efficiency of central nervous system therapeutic development [35] [36].

The neuroimaging community has responded to these challenges by developing standardized, automated processing pipelines. This application note provides a comprehensive benchmarking analysis of three prominent solutions: fMRIPrep, FSL FEAT, and fMRIflows. We examine their architectural principles, computational requirements, and operational characteristics to guide researchers in selecting appropriate pipelines for their specific research objectives, with particular attention to applications in drug development where both methodological rigor and practical efficiency are paramount.

Pipeline Summaries and Design Philosophies

fMRIPrep is a robust preprocessing pipeline that exemplifies the "glass box" philosophy—providing comprehensive error reporting and visual outputs to facilitate quality assessment rather than operating as a black box. It leverages the best tools from multiple neuroimaging packages (FSL, ANTs, FreeSurfer, AFNI) for different processing steps, creating a robust interface that adapts to variations in scan acquisition protocols while requiring minimal user input [37]. Its design prioritizes ease of use through adherence to the Brain Imaging Data Structure (BIDS) standard, enabling fully automatic operation while maintaining transparency through visual reports for each subject [37].

FSL FEAT represents a more traditional, yet highly established, approach to fMRI analysis. Provided within the comprehensive FSL software library, it offers a complete workflow from preprocessing to higher-level statistical analysis. The pipeline can be implemented through both graphical interfaces and scripted commands, providing flexibility for users with different programming backgrounds [38]. Its longstanding presence in the field means it has extensive documentation and community knowledge, but it typically requires more manual configuration and parameter setting compared to more recently developed automated pipelines.

fMRIflows builds upon the foundation established by fMRIPrep but extends functionality to include both univariate and multivariate statistical analyses. This pipeline addresses the critical need for standardized statistical analysis in addition to preprocessing, recognizing that code transparency and objective analysis pipelines are essential for improving reproducibility in neuroimaging studies [3]. A distinctive feature of fMRIflows is its flexible temporal and spatial filtering capabilities, which are particularly valuable for high-temporal-resolution datasets and multivariate pattern analyses where appropriate filtering can significantly improve signal decoding accuracy [3].

Table 1: Core Characteristics and Design Principles of Three fMRI Pipelines

| Feature | fMRIPrep | FSL FEAT | fMRIflows |
| --- | --- | --- | --- |
| Primary Focus | Minimal preprocessing | End-to-end analysis | Preprocessing + univariate/multivariate analysis |
| Design Philosophy | "Glass box" | Comprehensive toolbox | Fully automatic consortium |
| Analysis Scope | Preprocessing only | 1st, 2nd, 3rd level univariate | 1st, 2nd level univariate and multivariate |
| Tool Integration | Multi-software (FSL, ANTs, FreeSurfer, AFNI) | FSL-native | Extends fMRIPrep with statistical analysis |
| Key Innovation | Robustness to acquisition variability | Established, complete workflow | Flexible filtering for machine learning |

Performance and Computational Benchmarking

Recent comparative studies have quantified meaningful performance differences between pipeline approaches, particularly regarding computational efficiency and statistical sensitivity. A comprehensive analysis of carbon emissions in fMRI processing revealed that fMRIPrep demonstrated slightly superior statistical sensitivity to both FSL and SPM, with FSL also outperforming SPM [39]. This enhanced sensitivity, however, comes with substantial computational costs—fMRIPrep generated carbon emissions 30 times larger than those of FSL, and 23 times those of SPM [39]. This trade-off between statistical performance and environmental impact represents a critical consideration for researchers designing large-scale studies or working in computationally constrained environments.

The statistical advantages of fMRIPrep appear to vary by brain region, suggesting that the optimal pipeline choice may depend on the specific neural systems under investigation [39]. Additionally, compatibility issues between different preprocessing and analysis stages have been reported, such as boundary-based registration problems between Nipype-based preprocessing and FSL FEAT first-level statistics that can result in empty brain masks and inaccurate smoothness estimates [40]. These findings underscore the importance of rigorous quality control procedures regardless of the chosen pipeline.

Table 2: Performance and Operational Characteristics of fMRI Pipelines

| Characteristic | fMRIPrep | FSL FEAT | fMRIflows |
| --- | --- | --- | --- |
| Statistical Sensitivity | Slightly superior to FSL/SPM [39] | Moderate | Not explicitly benchmarked |
| Computational Demand | High (30× FSL carbon footprint) [39] | Low | Expected high (extends fMRIPrep) |
| Regional Specificity | Varies by brain region [39] | Not specified | Not specified |
| Output Spaces | MNI152NLin2009cAsym, fsaverage, fsLR [41] [42] | MNI152 | Inherits fMRIPrep capabilities |
| Container Support | Docker, Singularity [41] | Native installation | Jupyter Notebooks |

Experimental Protocols and Implementation

fMRIPrep Implementation Protocol

Container Deployment and Execution

fMRIPrep is primarily distributed as containerized software to ensure reproducibility and simplify dependency management. Researchers can implement it using either Docker or Singularity, with the latter being more suitable for high-performance computing (HPC) environments where root privileges are typically restricted [41]. The standard execution workflow involves:

  • Container Installation: For Singularity, build the container with: singularity build $HOME/fmriprep.simg docker://poldracklab/fmriprep:latest [41].
  • TemplateFlow Setup: Install template resources for spatial normalization: pip install templateflow --target $HOME/.cache [41].
  • FreeSurfer Licensing: Obtain and configure a FreeSurfer license, freely available from the FreeSurfer website, which is required for anatomical processing steps even without full FreeSurfer reconstruction [41].
  • Pipeline Execution: Run the core preprocessing workflow with appropriate runtime parameters [41].

Example Execution Script

A typical fMRIPrep batch script includes the following key parameters:
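
Since the original script is not reproduced here, the following is a representative sketch consistent with the parameters described in the next paragraph; all paths, the participant label, and the thread count are placeholders.

```bash
#!/bin/bash
# Sketch of a single-participant fMRIPrep run via Singularity (placeholder paths).
singularity run --cleanenv $HOME/fmriprep.simg \
    /data/bids_root /data/derivatives participant \
    --participant-label 01 \
    --output-spaces MNI152NLin2009cAsym:res-2 \
    --fs-no-reconall \
    --fs-license-file $HOME/freesurfer_license.txt \
    --nthreads 8
```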

This configuration processes a single participant's data, outputting results normalized to the MNI152NLin2009cAsym template at 2mm resolution while skipping full FreeSurfer reconstruction to reduce computational demands [41].

FSL FEAT Analysis Protocol

Pipeline Structure and Level 1 Analysis

FSL FEAT organizes analysis into multiple levels, with first-level analysis examining individual runs within subjects. The standard implementation pathway includes:

  • Data Preparation: Convert DICOM to NIFTI format, ensure LAS radiological orientation (right-left, anterior-posterior, inferior-superior), and perform brain extraction using BET (Brain Extraction Tool) [38].
  • Design Specification: Create FEAT design files (.fsf) specifying processing parameters, experimental timing, and contrast definitions.
  • Level 1 Execution: Run first-level analysis for individual runs, producing statistical maps for each experimental condition and contrast [43].
  • Higher-Level Analysis: Implement level 2 (within-subject across runs) and level 3 (group-level) analyses as needed [38].

FSL FEAT Directory Structure and Configuration

A standardized directory structure is essential for organized FSL FEAT implementation.

Key configuration files include the following; an illustrative sketch of generating them appears after the list:

  • model_params.json: Specifies processing parameters and modeling options
  • condition_key.json: Maps EV numbers to condition names (e.g., "1":"congruent_correct")
  • task_contrasts.json: Defines contrast vectors for statistical analysis (e.g., "incongruentvscongruent":[-1,-1,1,1]) [43]
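
For illustration, these JSON fragments can be generated programmatically; the condition names beyond EV 1 are hypothetical placeholders extrapolated from the examples in the text.

```python
import json

# Hypothetical contents mirroring the configuration files described above.
condition_key = {
    "1": "congruent_correct",       # from the example in the text
    "2": "congruent_incorrect",     # placeholder condition names
    "3": "incongruent_correct",
    "4": "incongruent_incorrect",
}
task_contrasts = {"incongruentvscongruent": [-1, -1, 1, 1]}

for filename, payload in [("condition_key.json", condition_key),
                          ("task_contrasts.json", task_contrasts)]:
    with open(filename, "w") as f:
        json.dump(payload, f, indent=2)
```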

fMRIflows Implementation Framework

Pipeline Architecture and Specification

fMRIflows implements a modular architecture organized across five specialized processing pipelines, each configured through JSON specification files [3]:

  • Specification Preparation (01_spec_preparation.ipynb): Creates JSON configuration files with execution parameters based on the dataset and default parameters using Nibabel and PyBIDS [3].
  • Anatomical Preprocessing (02_preproc_anat.ipynb): Processes structural images through segmentation, spatial normalization, and surface reconstruction.
  • Functional Preprocessing (03_preproc_func.ipynb): Implements functional image processing, building upon fMRIPrep's approach while adding flexible filtering options.
  • First-Level Analysis (04_first_level.ipynb): Performs within-subject statistical analysis for both univariate and multivariate approaches.
  • Second-Level Analysis (05_second_level.ipynb): Conducts group-level statistical inference.

Multivariate Analysis Capabilities

A distinctive feature of fMRIflows is its integrated support for multivariate pattern analysis (MVPA), which includes:

  • Flexible spatial filtering optimized for decoding accuracy [3]
  • Appropriate temporal filtering for high-temporal-resolution data [3]
  • Automated computation of both first-level and second-level multivariate contrasts [3]
  • Specialized quality control metrics relevant for multivariate analysis

Workflow Visualization

[Diagram: a raw BIDS dataset feeds three routes: fMRIPrep (multi-software preprocessing → quality control reports and preprocessed data, which can feed fMRIflows), FSL FEAT (FSL-native preprocessing → first-level → higher-level analysis → statistical maps), and fMRIflows (extended preprocessing → univariate and multivariate analyses → multivariate results).]

Diagram 1: Comparative Workflow Architecture of Three fMRI Pipelines

Table 3: Essential Software and Computational Resources for fMRI Pipeline Implementation

| Resource | Type | Function | Pipeline Application |
| --- | --- | --- | --- |
| BIDS Dataset | Data Standard | Organized neuroimaging data following community standards | All pipelines (required for fMRIPrep/fMRIflows) |
| Docker/Singularity | Container Platform | Reproducible software environments and dependency management | fMRIPrep (primary), fMRIflows (potential) |
| TemplateFlow | Template Repository | Standardized spatial templates for normalization | fMRIPrep, fMRIflows |
| FreeSurfer License | Software License | Enables anatomical processing capabilities | fMRIPrep, fMRIflows |
| High-Performance Computing | Computational Infrastructure | Parallel processing for computationally intensive steps | All pipelines (essential for fMRIPrep) |
| FSL Installation | Software Library | Comprehensive neuroimaging analysis tools | FSL FEAT (native), fMRIPrep (components) |
| Python Ecosystem | Programming Environment | Custom scripting and pipeline integration | All pipelines (extensive for fMRIflows) |
| Quality Control Tools | Visualization Software | Result validation and outlier detection | All pipelines (integrated in fMRIPrep) |

The benchmarking analysis presented herein reveals that pipeline selection involves navigating critical trade-offs between computational efficiency, statistical sensitivity, and analytical scope. For researchers operating in drug development contexts, where both methodological rigor and practical efficiency are paramount, we offer the following evidence-based recommendations:

  • For maximal preprocessing reliability and reproducibility, particularly in multi-site studies, fMRIPrep offers superior robustness to acquisition variability and comprehensive quality control, despite its substantial computational demands [37] [39].

  • For computationally constrained environments or standardized univariate analyses, FSL FEAT provides a balanced solution with reasonable statistical sensitivity and significantly lower carbon footprint [39] [38].

  • For studies employing machine learning or multivariate pattern analysis, fMRIflows delivers specialized functionality with integrated analytical capabilities, building upon fMRIPrep's robust preprocessing foundation [3].

As the neuroimaging field continues to evolve toward increasingly transparent and reproducible practices, these standardized pipelines represent valuable tools for enhancing the reliability of fMRI findings in both basic research and translational applications such as drug development [35]. Future developments will likely focus on optimizing the balance between computational demands and analytical performance while expanding support for emerging analytical approaches.

The reliability of functional magnetic resonance imaging (fMRI) data is fundamentally constrained by the preprocessing pipeline employed. Noise from scanner artifacts, subject motion, and other non-neural sources introduces significant temporal correlations in the blood oxygen level-dependent (BOLD) timeseries, limiting the reliability of individual-subject results [44]. The field has historically lacked standardization, with researchers often rewriting processing pipelines for each new dataset, thereby compromising reproducibility and transparency [45]. Preprocessing parameter selection, including bandpass filter choices and noise regression techniques, significantly influences key outcome measures including data noisiness, test-retest reliability, and the ability to discriminate between clinical groups [44]. It is within this critical context that consortium-based, fully automatic pipelines like fMRIflows emerge as a transformative solution. By providing a standardized, transparent, and comprehensive framework for fMRI analysis, fMRIflows directly addresses the core challenges of preprocessing reliability, enabling researchers to achieve more valid and reproducible results at both the individual-subject and group levels [45].

fMRIflows represents a consortium of fully automatic neuroimaging pipelines developed to standardize and streamline fMRI analysis. Its primary objective is to provide a unified solution that encompasses the entire fMRI processing workflow, from initial preprocessing to advanced statistical analysis, thereby improving code transparency, quality control, and objective analysis pipelines [45]. This initiative responds to the documented need for automated and reproducible preprocessing pipelines, as exemplified by tools like Nipype and fMRIPrep, but extends further by integrating both univariate and multivariate analysis methodologies into a single, coherent framework [45] [46].

The core structure of fMRIflows is composed of multiple, interdependent pipelines. These include standardized modules for anatomical and functional preprocessing, first- and second-level univariate analysis, and multivariate pattern analysis [45] [46]. A key innovation of fMRIflows is its flexible approach to temporal and spatial filtering. This flexibility is crucial for accommodating datasets with increasingly high temporal resolution and for optimally preparing data for advanced machine learning analyses, ultimately improving the accuracy and reliability of signal decoding [45]. The toolbox is implemented in Python and is designed to be fully automatic, reducing the barrier to entry for employing sophisticated analysis techniques while ensuring consistency and reproducibility across studies [45].

Experimental Protocols and Methodologies

Protocol 1: Preprocessing Pipeline Optimization for Reliability

Objective: To identify the optimal preprocessing parameters that minimize noise and maximize test-retest reliability while retaining group discriminability in resting-state fMRI (rs-fMRI) data [44].

Materials and Methods:

  • Datasets: A total of 181 rs-fMRI scans and 38 subject-driven memory scans from four independent datasets.
  • Preprocessing Variables: The protocol systematically manipulates two key preprocessing parameters:
    • Bandpass Filter Selection: Applying different frequency cutoffs (e.g., <0.08 Hz for low-frequency fluctuations) to isolate neural signals from physiological noise [44] [47].
    • Noise Regression Strategies: Employing various noise regressors, including those derived from component-based noise correction (CompCor), white matter, cerebrospinal fluid signals, and global signal regression [44].
  • Analysis Methods: Functional connectivity is analyzed using dual regression and region-of-interest (ROI)-based approaches.
  • Outcome Measures: The optimized pipeline is evaluated against three primary metrics:
    • Signal-Noise Separation: Quantified by the functional contrast-to-noise ratio (fCNR) and temporal signal-to-noise ratio (tSNR) [48].
    • Test-Retest Reliability: Assessed using intra-class correlation coefficients (ICC) and spatial overlap metrics (e.g., Dice coefficient) across repeated scans [44] [49].
    • Group Discrimination: The ability to distinguish between distinct populations (e.g., patients vs. controls) is quantified using effect sizes and classification accuracy in the context of the preprocessed data [44].

Implementation: The protocol is implemented within the fMRIflows preprocessing module, which allows for the flexible configuration of filtering and regression parameters. The pipeline outputs quality control metrics for each outcome measure, enabling empirical optimization.
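
For the test-retest component, a common choice is ICC(2,1): two-way random effects, absolute agreement, single measures. Below is a minimal sketch from the classical mean-squares decomposition, assuming a subjects-by-sessions array of any scalar metric (e.g., a connectivity edge or a regional tSNR value).

```python
import numpy as np

def icc_2_1(data):
    """ICC(2,1) for a (n_subjects, k_sessions) array (Shrout & Fleiss)."""
    n, k = data.shape
    grand = data.mean()
    ms_subjects = k * ((data.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    ms_sessions = n * ((data.mean(axis=0) - grand) ** 2).sum() / (k - 1)
    resid = data - data.mean(axis=1, keepdims=True) - data.mean(axis=0) + grand
    ms_error = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error
        + k * (ms_sessions - ms_error) / n
    )
```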

Protocol 2: Comparative Validation of fMRIflows Against Established Pipelines

Objective: To validate the performance of fMRIflows against other widely used neuroimaging processing pipelines (e.g., fMRIPrep, FSL, SPM) across multiple datasets with varying acquisition parameters [45].

Materials and Methods:

  • Datasets: Three independent datasets with varying temporal sampling rates and acquisition parameters are used for validation.
  • Comparative Pipelines: fMRIflows is compared against fMRIPrep (for preprocessing), FSL (FEAT), and SPM.
  • Performance Metrics:
    • Preprocessing Quality: Visual quality control of anatomical and functional data processing (e.g., accuracy of brain extraction, tissue segmentation, and functional-to-anatomical co-registration).
    • Analysis Results Consistency: Spatial concordance of activation maps from univariate analyses and pattern reproducibility from multivariate analyses.
    • Computational Efficiency: Processing time and computational resource requirements.
  • Statistical Analysis: Quantitative comparison of outcome measures, such as the sensitivity and specificity of activation detection, and the robustness of functional connectivity estimates.

Protocol 3: Application to a Clinical Neuroscience Question

Objective: To demonstrate the application of fMRIflows in a clinical context by investigating functional and structural brain reorganization in Age-Related Macular Degeneration (AMD) [47].

Materials and Methods:

  • Participants: Cohort of AMD patients and age-matched healthy controls.
  • fMRIflows Analysis Modules:
    • Preprocessing: Utilizing the standardized anatomical and functional preprocessing pipelines.
    • Resting-State fMRI Analysis: Employing the ROI-based functional connectivity or dual regression analysis to probe alterations in the default mode network, visual network, and sensorimotor network [47].
    • Task-based fMRI Analysis: Using the univariate analysis module to assess brain activation in response to peripheral visual stimulation.
    • Multivariate Pattern Analysis (MVPA): Applying the multivariate pipelines to decode neural representations in visual cortical areas.
  • Correlation with Behavior: Significant functional connectivity values are correlated with clinical scores, such as the Hospital Anxiety and Depression Scale or visual performance metrics [47].

Quantitative Data Synthesis and Comparison

Table 1: Impact of Preprocessing Choices on Key rs-fMRI Outcome Metrics (Adapted from [44])

| Preprocessing Parameter | Signal-to-Noise Separation | Test-Retest Reliability (ICC) | Group Discrimination Accuracy |
| --- | --- | --- | --- |
| Stringent Bandpass Filter | Moderate Improvement | High Improvement | Variable (Risk of Signal Loss) |
| Liberal Bandpass Filter | Lower Improvement | Moderate Improvement | Potentially Higher |
| Global Signal Regression | Significant Improvement | High Improvement | May Reduce Biologically Relevant Variance |
| Component-Based Noise Correction | High Improvement | High Improvement | High Improvement |

Table 2: Comparison of fMRIflows Features with Other Pipelines (Synthesized from [45])

| Feature | fMRIflows | fMRIPrep | FSL | SPM |
| --- | --- | --- | --- | --- |
| Standardized Preprocessing | Yes | Yes | Yes | Yes |
| Univariate Analysis | Yes (1st & 2nd level) | No | Yes (FEAT) | Yes |
| Multivariate Analysis | Yes (Including MVPA) | No | Limited | Limited |
| Flexible Temporal Filtering | Yes | Limited | Yes | Yes |
| Fully Automatic Pipeline | Yes | Yes | No | No |

Table 3: Key Variance Components in Univariate vs. Multivariate fMRI Analysis (Based on [50])

| Source of Variance | Sensitivity in Univariate Analysis | Sensitivity in Multivariate Analysis |
| --- | --- | --- |
| Subject-Level Variability | High | Insensitive |
| Voxel-Level Variability | Low | High |
| Trial-Level Variability | High | High |

Visualization of Workflows and Logical Relationships

Overall Workflow of fMRIflows

Univariate vs. Multivariate Analytical Approaches

[Diagram: preprocessed fMRI data feeds two branches. Univariate analysis: voxel-wise general linear model (GLM), analyzes mean activation per voxel/region, sensitive to subject-level variability. Multivariate analysis: pattern analysis across multiple voxels, detects distributed multidimensional representations, sensitive to voxel-level variability.]

Univariate vs. Multivariate Analysis

The Scientist's Toolkit: Essential Research Reagents and Materials

Table 4: Essential Research Reagents and Computational Solutions for fMRIflows

| Item / Solution | Function / Purpose | Example / Note |
| --- | --- | --- |
| fMRIflows Software | Core analysis platform providing fully automatic pipelines for preprocessing, univariate, and multivariate analysis | Accessible via GitHub [46] |
| High-Performance Computing Cluster | Executes computationally intensive preprocessing and multivariate pattern analysis | Required for large-scale datasets |
| Standardized Template (MNI) | Anatomical reference space for spatial normalization of brain images | Ensures consistency across studies |
| Paradigm Design Software | Presents stimuli and records behavioral responses during task-based fMRI | E-Prime, PsychoPy, or Presentation |
| Quality Control Metrics | Quantifies data quality for optimization (tSNR, fCNR, motion parameters) | Integrated within fMRIflows |
| Neuroimaging Data Formats | Standardized file formats for data interchange (NIfTI, BIDS) | Ensures pipeline compatibility [45] |

Functional magnetic resonance imaging (fMRI) has become an indispensable technique for studying brain function and connectivity in both healthy and pathological populations. However, the inherent noise and artifacts in fMRI signals can significantly compromise analysis accuracy, particularly in clinical populations such as stroke patients who present with complex neurological conditions including brain lesions [16]. Currently, the neuroimaging field lacks consensus on the optimal preprocessing approach for stroke fMRI data, leading to widespread methodological variability that undermines reproducibility and validity of findings [16]. The presence of cerebral lesions introduces unique challenges for standard preprocessing pipelines, including distorted anatomical normalization, miscalculated tissue segmentation, and lesion-driven physiological artifacts that can generate spurious functional connectivity results [51]. This application note examines specialized preprocessing workflows and lesion masking techniques designed specifically for stroke populations, framing them within the broader context of enhancing fMRI preprocessing pipeline reliability for clinical research and therapeutic development.

Quantitative Comparison of Stroke-Specific Preprocessing Pipelines

Pipeline Architectures and Performance Metrics

Recent research has systematically evaluated specialized preprocessing approaches tailored to address the unique challenges of stroke neuroimaging. Table 1 summarizes the key performance characteristics of three predominant pipeline architectures assessed for stroke fMRI data, highlighting their differential impacts on critical outcome measures relevant to clinical research and drug development.

Table 1: Comparative Performance of Stroke-Specific fMRI Preprocessing Pipelines

| Pipeline Type | Key Methodological Features | Impact on Spurious Connectivity | Effect on Behavioral Prediction | Normalization Accuracy with Lesions |
| --- | --- | --- | --- | --- |
| Standard Pipeline | Conventional volume-based processing; no lesion accounting | Baseline reference level | No significant impact | Poor with large lesions |
| Enhanced Pipeline | Accounts for lesions in tissue mask computation; basic noise regression | Moderate reduction | No significant impact | Good with unified segmentation |
| Stroke-Specific Pipeline | ICA-based lesion artifact correction; advanced confound regression | Significant reduction [16] | No significant impact [16] | Good with unified segmentation [51] |

The empirical evidence indicates that the stroke-specific pipeline significantly reduces spurious functional connectivity without adversely affecting behavioral prediction accuracy, and that all three pipelines perform comparably in predicting clinically relevant behavioral outcomes [16]. This suggests that pipeline selection should be guided by the specific research objectives, whether the emphasis is on connectivity purity or behavioral correlation.

Impact of Lesion Masking and Physiological Control

Table 2 quantifies the specific contributions of individual preprocessing components to data quality in stroke populations, providing researchers with evidence-based guidance for pipeline optimization.

Table 2: Component-Level Analysis of Preprocessing Effectiveness in Stroke fMRI

| Processing Component | Key Parameters | Impact on Activation | Effect on Normalization | Recommendation for Stroke Populations |
| --- | --- | --- | --- | --- |
| Lesion Masking | Manual or automated lesion identification; applied during preprocessing and analysis | Significant decrease in sensorimotor activation [51] | Good accuracy with unified segmentation regardless of lesion size [51] | Essential for both preprocessing and group-level analysis |
| Movement Amplitude Regression | Kinematic measurements during passive tasks | Significant decrease in sensorimotor activation [51] | Not applicable | Critical for motor system studies in patients with brain lesions |
| Physiological Noise Control | Physiological data (e.g., cardiac, respiratory) as regressors | Significant decrease in sensorimotor activation [51] | Not applicable | Recommended for all stroke studies, particularly motor tasks |

The unified segmentation routine implemented in modern tools like SPM12 demonstrates robust normalization accuracy even in the presence of stroke lesions, regardless of their size [51]. However, the incorporation of movement features and physiological noise as nuisance covariates significantly impacts sensorimotor activation patterns, making these elements particularly relevant for interpreting motor system studies in patients with brain lesions [51].

Experimental Protocols for Stroke fMRI Preprocessing

fMRIStroke Pipeline Implementation Protocol

The fMRIStroke pipeline represents a specialized BIDS application designed to run on outputs of fMRIPrep for preprocessing both task-based and resting-state fMRI data from stroke patients [52]. Below is the detailed methodological protocol for implementation:

  • Prerequisite Processing: Execute standard preprocessing using fMRIPrep to generate initial derivatives. fMRIStroke is explicitly designed to build upon fMRIPrep outputs and requires these for proper operation [52].

  • Lesion Mask Integration: Provide manually or automatically generated lesion masks in standardized space. The pipeline incorporates these masks for quality checks and confound calculation [52].

  • Quality Control Generation: Execute specialized quality checks including:

    • Hemodynamic lag maps using Rapidtide for detecting abnormal hemodynamic timing [52]
    • Homotopic connectivity analysis if FreeSurfer reconstruction was performed [52]
    • Registration visualization with lesion mask overlay [52]
  • Confound Variable Calculation: Generate stroke-specific confound regressors including:

    • Lesion signal: Average BOLD signal within the lesion mask [52]
    • CSF-lesion signal: Signal from combined CSF and lesion mask [52]
    • ICA components: ICA-based confounds following Yourganov et al. (2017) methodology for artifact removal [52]
  • Denoising and Connectivity Analysis: Apply confound regression to generate denoised BOLD series, then compute functional connectivity matrices using standardized atlases and connectivity measures [52] (a sketch of these steps appears below).
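
A minimal sketch of the lesion-signal confound and the connectivity step using nibabel and Nilearn; file names, the atlas, the TR, and the confound set are placeholders rather than fMRIStroke's actual defaults.

```python
import nibabel as nib
import numpy as np
from nilearn.maskers import NiftiLabelsMasker
from nilearn.connectome import ConnectivityMeasure

# Lesion-signal confound: mean BOLD within the lesion mask at each timepoint.
bold = nib.load("preproc_bold.nii.gz").get_fdata()        # (X, Y, Z, T)
lesion = nib.load("lesion_mask.nii.gz").get_fdata() > 0   # (X, Y, Z), co-registered
lesion_signal = bold[lesion].mean(axis=0)                 # (T,)

# Region-wise time series with confound regression, then connectivity.
confounds = lesion_signal[:, None]   # stack motion, CSF-lesion, ICA terms here
masker = NiftiLabelsMasker(labels_img="atlas_labels.nii.gz",
                           standardize=True, t_r=2.0)
region_ts = masker.fit_transform("preproc_bold.nii.gz", confounds=confounds)
conn = ConnectivityMeasure(kind="correlation")
matrix = conn.fit_transform([region_ts])[0]               # (n_regions, n_regions)
```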

Lesion Masking Protocol for Group Analysis

Research indicates that the strategic application of lesion masks at different processing stages significantly impacts final results [51]:

  • Preprocessing-Level Masking: Apply lesion masks during the initial preprocessing stages to improve tissue segmentation and spatial normalization accuracy. The unified segmentation approach in SPM12 demonstrates particular robustness for normalizing brains with lesions [51].

  • Analysis-Level Masking: Implement masking strategically during second-level (group) analysis:

    • Apply a group lesion mask when no patients have lesions in target regions of interest [51]
    • Omit masking when patients have lesions in regions relevant to the research question [51]
    • Use voxel-based lesion-symptom mapping for studies focusing on lesion-deficit relationships [16]
  • Kinematic and Physiological Control: For motor tasks in spastic patients:

    • Record movement kinematics (e.g., wrist extension amplitudes) during passive tasks [51]
    • Acquire physiological data (cardiac, respiratory) during scanning [51]
    • Include these measurements as nuisance regressors in general linear models [51]

Visualization of Stroke fMRI Preprocessing Workflows

Integrated Stroke fMRI Processing Architecture

[Diagram: raw fMRI data, a lesion mask, and physiological recordings enter the core stages: standard preprocessing (fMRIPrep) → lesion integration (tissue mask adjustment) → ICA-based artifact correction → confound regression (lesion signal, motion, physiological noise), yielding a denoised BOLD series, functional connectivity matrices, and stroke-specific QC metrics (hemodynamic lag, homotopic connectivity, registration).]

Lesion Masking Decision Pathway

[Diagram: lesion masking decision pathway. Apply the lesion mask during preprocessing, then check tissue segmentation quality and spatial normalization, iterating if either fails. At the group level, apply a group lesion mask when no patients have lesions in target regions; when lesions overlap relevant regions, proceed without group masking and consider voxel-based lesion-symptom mapping before final statistical analysis.]

Table 3: Essential Research Tools for Stroke fMRI Preprocessing

| Tool/Resource | Type | Primary Function | Application in Stroke Research |
|---|---|---|---|
| fMRIStroke | Specialized Pipeline | Stroke-specific quality checks and confound calculation | Generates lesion-aware confounds and QC metrics for stroke data [52] |
| fMRIPrep | Core Preprocessing Pipeline | Robust, analysis-agnostic fMRI preprocessing | Foundation for fMRIStroke; handles diverse dataset idiosyncrasies [33] |
| ANTs/FSL | Registration & Segmentation | Brain extraction, spatial normalization, tissue segmentation | Unified segmentation handles lesions effectively [51] |
| ICA-AROMA | Noise Removal | Automatic removal of motion artifacts via ICA | Adapted in stroke-specific pipeline for lesion-driven artifacts [16] |
| Rapidtide | Hemodynamic Assessment | Hemodynamic lag mapping | Detects abnormal hemodynamic timing in peri-lesional tissue [52] |
| Lesion Masks | Data Input | Manual or automated lesion identification | Critical for lesion-adjusted processing and analysis [51] |
| Statistical Parametric Mapping (SPM12) | Statistical Analysis | General linear modeling and statistical inference | Supports lesion masking at multiple processing levels [51] |

The development of specialized preprocessing pipelines for stroke fMRI represents a significant advancement in clinical neuroimaging methodology. The evidence indicates that stroke-specific workflows, particularly those incorporating lesion masking and tailored artifact correction, substantially reduce spurious functional connectivity while preserving the predictive validity of behavioral correlations [16]. For pharmaceutical researchers and clinical scientists, these methodological refinements offer enhanced sensitivity for detecting true treatment effects in therapeutic trials. The implementation of standardized, open-source tools like fMRIStroke promotes reproducibility across multi-site clinical studies, potentially accelerating the development of neurorehabilitative interventions and neuroprotective therapies [52]. As fMRI continues to illuminate the dynamic processes of functional reorganization and recovery following stroke [53], robust preprocessing methodologies ensure that observed effects reflect genuine neurobiological phenomena rather than methodological artifacts, thereby strengthening the translational pathway from basic research to clinical application.

Functional Magnetic Resonance Imaging (fMRI) has become an indispensable tool for studying brain function and connectivity in both research and clinical settings. However, the field has been plagued by significant challenges in reproducibility and transferability, largely due to the vast heterogeneity in data formats, preprocessing pipelines, and analytic models [6]. This analytic flexibility, combined with the large number of possible processing parameters, has led to considerable methodological variability across studies, contributing to what has been termed a "reproducibility crisis" in neuroimaging [54]. The emergence of foundation models presents a paradigm-shifting framework that addresses these challenges through scalable learning across tasks and improved robustness achieved via large-scale pre-training and adaptable architectures [6].

Foundation models, initially developed for natural language processing, have demonstrated remarkable multitask capabilities by training on web-scale text corpora. This success has inspired analogous developments in the medical domain, where these models are being applied to overcome challenges such as anatomical variability and limited annotated data [6]. Unlike traditional fMRI analysis methods that typically reduce dimensionality by projecting data onto pre-defined brain atlases or connectomes—operations that result in irreversible information loss and impose structural biases—foundation models can learn generalizable representations directly from raw 4D fMRI volumes [6]. The NeuroSTORM model represents a significant advancement in this domain, offering a standardized, open-source foundation model to enhance reproducibility and transferability in fMRI analysis for clinical applications [6].

NeuroSTORM: Architecture and Technical Specifications

NeuroSTORM (Neuroimaging Foundation Model with Spatio-Temporal Optimized Representation Modeling) is a general-purpose fMRI foundation model specifically designed to overcome the fundamental challenges of analyzing raw 4D fMRI data, which comprises up to 10^6 voxels per scan and poses severe computational and optimization bottlenecks [6]. The model introduces several architectural innovations that enable efficient processing of fMRI data while maintaining high representational power.

Core Architectural Innovations

  • Shifted-Window Mamba (SWM) Backbone: NeuroSTORM employs a novel backbone that combines linear-time state-space modeling with shifted-window mechanisms to reduce computational complexity and GPU memory usage while maintaining the ability to capture long-range dependencies in fMRI data [6].

  • Spatiotemporal Redundancy Dropout (STRD): During pre-training, this module addresses the high spatiotemporal redundancy in 4D fMRI volumes, where standard Masked Autoencoders (MAEs) struggle to learn informative representations because masked voxels can often be trivially reconstructed from their spatial or temporal neighbors [6].

  • Task-specific Prompt Tuning (TPT): For downstream task adaptation, this strategy employs a minimal number of trainable, task-specific parameters when fine-tuning NeuroSTORM for new applications, providing a simple and integrated approach for applying the model across diverse domains [6].

Pre-training Corpus and Scope

Table 1: NeuroSTORM Pre-training Datasets

| Dataset | Subjects | Age Range | Primary Focus |
|---|---|---|---|
| UK Biobank | 40,842 participants | Adult to elderly | Population-level brain imaging |
| ABCD | 9,448 children | Child development | Pediatric brain development |
| HCP-YA, HCP-A, HCP-D | >2,500 total | Multiple ranges | Lifespan brain connectivity |

The extensive pre-training corpus ensures broad biological and technical variation by spanning diverse demographics (ages 5-100), clinical conditions, and acquisition protocols [6]. This diversity is crucial for developing a model with strong generalization capabilities across different populations and study designs.

Performance Benchmarking and Comparative Analysis

NeuroSTORM has been rigorously evaluated against state-of-the-art fMRI analysis methods across five diverse downstream tasks, consistently matching or outperforming existing methods [6]. The model's capabilities were assessed on both standard research tasks and clinically relevant applications.

Quantitative Performance Metrics

Table 2: NeuroSTORM Performance Across Downstream Tasks

| Task | Datasets | Key Metric | Performance |
|---|---|---|---|
| Age & Gender Prediction | HCP-YA, HCP-A, HCP-D, UKB, ABCD | Gender classification accuracy | 93.3% on HCP-YA [6] |
| Phenotype Prediction | HCP-YA, TCP | Psychological/cognitive trait correlation | High relevance maintained [6] |
| Disease Diagnosis | HCP-EP, ABIDE, ADHD200, COBRE, UCLA, MND | Diagnostic accuracy | Best performance among all methods [6] |
| fMRI Retrieval | NSD, LAION-5B | Retrieval accuracy | State-of-the-art performance [6] |
| tfMRI State Classification | HCP-YA | Classification accuracy | Consistently outperforms existing methods [6] |

Clinical Validation and Transferability

A critical aspect of NeuroSTORM's evaluation involved assessing its clinical utility on real-world hospital data. The model was validated on two clinical datasets comprising patients with 17 different diagnoses from hospitals in the United States, South Korea, and Australia [6]:

  • Transdiagnostic Connectome Project (TCP): 245 participants from the Brain Imaging Center of Yale University or McLean Hospital, including healthy controls and individuals with a range of psychiatric disorders [6].
  • Motor Neuron Disease (MND): 36 patients with Amyotrophic Lateral Sclerosis and 23 controls, collected at the Royal Brisbane and Women's Hospital in Australia [6].

In these clinical validations, NeuroSTORM maintained high relevance in predicting psychological/cognitive phenotypes and achieved the best disease diagnosis performance among all existing methods [6]. Additionally, the model demonstrated robust performance in data-scarce scenarios, showing only minor performance degradation when limited proportions of fine-tuning data were available [6].

Experimental Protocols and Implementation

Data Preprocessing Protocol

The preprocessing of fMRI data for foundation model analysis requires careful consideration of data quality and standardization. The following protocol outlines the essential steps for preparing data for NeuroSTORM:

  • Primary Preprocessing with Established Pipelines: Ensure data has undergone primary processing using standardized pipelines such as fMRIPrep [34] [55] or HALFpipe [54]. fMRIPrep is particularly recommended as it is designed to provide "an easily accessible, state-of-the-art interface that is robust to variations in scan acquisition protocols" [34], performing minimal preprocessing including motion correction, field unwarping, normalization, bias field correction, and brain extraction [34].

  • Spatial Normalization: Confirm all data is aligned into MNI152 standard space to ensure consistent spatial coordinates across subjects and studies [56].

  • Brain Extraction: Apply brain extraction tools to remove non-brain tissues. The protocol available in the NeuroSTORM repository uses FSL BET (Brain Extraction Tool) for this purpose [56].

  • Data Conversion and Normalization: Use the provided NeuroSTORM tool preprocessing_volume.py to perform background removal, resampling to fixed spatial and temporal resolution, and Z-normalization, saving each frame in the processed format [56].
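The sketch below only approximates the three operations named above (background removal, resampling to a fixed resolution, Z-normalization) with nibabel and nilearn; it is not the preprocessing_volume.py implementation, and the file path and 2 mm target grid are assumptions.

```python
# Approximate volume preparation: resample, strip background, Z-normalize.
import nibabel as nib
import numpy as np
from nilearn.image import resample_img

img = nib.load("sub-01_task-rest_space-MNI152_bold.nii.gz")  # hypothetical path
target_affine = np.diag([2.0, 2.0, 2.0, 1.0])  # assumed 2 mm isotropic target
resampled = resample_img(img, target_affine=target_affine,
                         interpolation="continuous")

data = resampled.get_fdata()
brain = data[data != 0]                      # crude background removal
mean, std = brain.mean(), brain.std()
zdata = np.where(data != 0, (data - mean) / std, 0.0)  # Z-normalize brain voxels
nib.save(nib.Nifti1Image(zdata.astype(np.float32), resampled.affine),
         "sub-01_norm.nii.gz")
```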

Model Pre-training Protocol

The pre-training phase follows a self-supervised learning approach optimized for fMRI data characteristics:

  • Data Loading and Augmentation:

    • Load the preprocessed 4D fMRI volumes
    • Apply spatial and temporal augmentations during training
    • Utilize the Spatiotemporal Redundancy Dropout (STRD) to prevent trivial solutions [6]
  • Model Configuration:

    • Set embedding dimensions to match the input data structure
    • Configure window sizes and patch sizes appropriate for fMRI spatial-temporal dimensions
    • Initialize the Shifted-Window Mamba (SWM) backbone with recommended parameters [6]
  • Training Execution:

    • Run distributed training across multiple GPUs for large-scale data processing
    • Monitor training progress with validation metrics
    • Save model checkpoints at regular intervals

Downstream Task Fine-tuning Protocol

For applying NeuroSTORM to specific research or clinical questions, the following fine-tuning protocol is recommended:

  • Task Formulation: Clearly define the downstream task, which may include:

    • Age and gender prediction
    • Phenotype prediction
    • Disease diagnosis
    • fMRI-to-image retrieval
    • Task-based fMRI state classification [6]
  • Data Preparation for Specific Tasks:

    • For region-of-interest (ROI) analyses, convert 4D volumes to 2D ROI data using available brain atlases
    • Ensure proper train/validation/test splits (typically 8:1:1 ratio) [6]
    • Address class imbalance in clinical datasets through appropriate sampling techniques
  • Task-Specific Prompt Tuning:

    • Implement the Task-specific Prompt Tuning (TPT) strategy
    • Freeze the pre-trained backbone parameters
    • Only update the task-specific prompt parameters and classification head [6] (a conceptual sketch follows this protocol)
  • Performance Validation:

    • Evaluate on held-out test sets
    • Compare against appropriate baseline models
    • Perform statistical testing to confirm significance of improvements
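To make the prompt-tuning logic concrete, here is a conceptual PyTorch sketch: the pre-trained backbone is frozen and only prompt tokens plus a classification head are trained. The class, tensor shapes, and hyperparameters are illustrative assumptions, not the NeuroSTORM implementation.

```python
# Conceptual sketch of task-specific prompt tuning (all names hypothetical).
import torch
import torch.nn as nn

class PromptTunedClassifier(nn.Module):
    def __init__(self, backbone: nn.Module, embed_dim: int,
                 n_prompts: int, n_classes: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():   # freeze pre-trained weights
            p.requires_grad = False
        # Trainable task-specific prompt tokens prepended to the input sequence
        self.prompts = nn.Parameter(torch.randn(1, n_prompts, embed_dim) * 0.02)
        self.head = nn.Linear(embed_dim, n_classes)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, embed_dim); assumes the backbone maps a
        # token sequence to a same-shaped feature sequence
        prompts = self.prompts.expand(tokens.size(0), -1, -1)
        x = self.backbone(torch.cat([prompts, tokens], dim=1))
        return self.head(x.mean(dim=1))        # pool and classify

# Only prompt and head parameters reach the optimizer:
# optim = torch.optim.AdamW(
#     [p for p in model.parameters() if p.requires_grad], lr=1e-3)
```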

Workflow Integration and Visualization

The integration of foundation models into existing fMRI analysis workflows represents a significant shift from traditional approaches. The following diagrams illustrate key workflows in the application of NeuroSTORM for fMRI analysis.

[Diagram: Raw fMRI Data → Standardized Preprocessing (fMRIPrep/HALFpipe) → Large-Scale Pre-training (28.65M frames, 50K+ subjects) → NeuroSTORM Foundation Model → Task-Specific Prompt Tuning → Diverse Applications]

Diagram 1: Overall NeuroSTORM workflow from data to applications.

Model Architecture and Pre-training

[Diagram: 4D fMRI Volumes → Shifted-Window Mamba (SWM) Backbone → Spatiotemporal Redundancy Dropout (STRD) → Pre-training Objectives → Pre-trained NeuroSTORM Model (self-supervised learning)]

Diagram 2: NeuroSTORM architecture and pre-training process.

Research Reagent Solutions

Implementing foundation models for fMRI analysis requires specific computational tools and resources. The following table details essential research reagents for working with models like NeuroSTORM.

Table 3: Essential Research Reagents for fMRI Foundation Models

| Resource | Type | Function | Access |
|---|---|---|---|
| NeuroSTORM Platform | Software Framework | Complete platform for pre-training, fine-tuning, and evaluating fMRI foundation models | GitHub: CUHK-AIM-Group/NeuroSTORM [56] |
| fMRIPrep | Preprocessing Pipeline | Robust, analysis-agnostic tool for preprocessing fMRI data with minimal user input | https://fmriprep.org/ [34] |
| HALFpipe | Standardized Pipeline | Harmonized Analysis of Functional MRI pipeline; extends fMRIPrep functionality | Integrated tool [54] |
| UK Biobank | Dataset | Large-scale population dataset for pre-training | Application required [6] |
| ABCD Study | Dataset | Pediatric brain development data for pre-training | Controlled access [6] |
| HCP Datasets | Dataset | Multi-age brain connectivity data for pre-training | Controlled access [6] |
| Pre-trained Weights | Model Parameters | Pre-trained NeuroSTORM model checkpoints | Available with code [56] |

Foundation models like NeuroSTORM represent a transformative approach to fMRI analysis that directly addresses the reproducibility crisis in neuroimaging. By learning generalizable representations directly from massive, diverse datasets and employing innovative architectural solutions to handle the unique challenges of 4D fMRI data, these models establish a new paradigm for brain connectivity analysis. The strong performance across diverse downstream tasks—from basic demographic prediction to clinical diagnosis—combined with robustness in data-scarce scenarios, positions foundation models as essential tools for next-generation fMRI research. The open-source nature of the NeuroSTORM platform ensures that these advances are accessible to the broader research community, potentially accelerating progress in both basic neuroscience and clinical applications.

Functional magnetic resonance imaging (fMRI) preprocessing serves as a critical foundation for reproducible neuroscience research and reliable clinical applications. Within the context of a broader thesis on fMRI preprocessing pipeline reliability, this document establishes that the selection of an appropriate preprocessing pipeline is not merely a technical preliminary but a fundamental methodological decision that directly influences the validity of subsequent scientific inferences. The rapid evolution of neuroimaging software—encompassing volume- and surface-based approaches, classical algorithmic methods, and modern deep learning solutions—presents researchers with a complex landscape of options. This application note provides a structured decision matrix and detailed experimental protocols to guide researchers, scientists, and drug development professionals in selecting optimal preprocessing tools based on specific study designs, data types, and analytical requirements. By systematizing this selection process, we aim to enhance the reliability, efficiency, and reproducibility of neuroimaging research.

Modern fMRI preprocessing pipelines can be broadly categorized into classical and next-generation architectures. Classical pipelines, such as FSL, SPM, and AFNI, often require users to manually combine processing steps, while integrated pipelines like fMRIPrep provide robust, automated workflows [8]. A more recent development involves deep learning-powered pipelines, such as DeepPrep, which replace computationally intensive steps with trained models to achieve significant acceleration [57].

The table below summarizes key performance characteristics of several prominent pipelines, providing a quantitative basis for initial tool consideration.

Table 1: Performance and Characteristics of fMRI Preprocessing Pipelines

| Pipeline Name | Primary Architecture | Key Features | Processing Time (min per subject, mean ± SD) | Key Strengths |
|---|---|---|---|---|
| fMRIPrep [58] [57] | Classical (Integrated) | Robust, automated workflow; BIDS-compliant | 318.9 ± 43.2 [57] | High reproducibility, transparency, widespread adoption |
| FuNP [8] | Classical (Fusion) | Fusion of AFNI, FSL, FreeSurfer, Workbench; volume & surface-based | Information Missing | Integrates multiple software strengths; flexible for volume/surface analysis |
| CAT12 [59] | Classical (SPM-based) | Efficient structural MRI segmentation & preprocessing | ~30-45 [59] | Optimized for T1-weighted data; good for volumetric analysis |
| DeepPrep [57] | Deep Learning | Uses FastSurferCNN, FastCSR, SUGAR for accelerated processing | 31.6 ± 2.4 (with GPU) [57] | High speed (10x faster); superior scalability & clinical sample robustness |

The choice between these pipelines involves critical trade-offs. DeepPrep demonstrates a tenfold acceleration (31.6 ± 2.4 minutes vs. 318.9 ± 43.2 minutes for fMRIPrep) and superior robustness with clinical samples, achieving a 100% pipeline completion ratio compared to 69.8% for fMRIPrep [57]. However, the established validation history and transparency of classical pipelines like fMRIPrep may be preferable for initial methodological studies or when using standard, high-quality datasets.

Decision Matrix for Pipeline Selection

Selecting the optimal pipeline requires balancing computational resources, data characteristics, and research goals. The following decision matrix provides a systematic framework for this selection.

Table 2: Decision Matrix for Selecting an fMRI Preprocessing Pipeline

| Scenario | Recommended Tool | Rationale |
|---|---|---|
| High-performance computing (HPC) / GPU available | DeepPrep [57] | Maximizes computational efficiency and speed. |
| Standard computing (CPU-only) | fMRIPrep [58] [57] or FuNP [8] | Reliable performance without specialized hardware. |
| Large-scale datasets (N > 1000) | DeepPrep [57] | Designed for scalability and batch processing. |
| Clinical/pathological data | DeepPrep [57] | Higher success rate with anatomically atypical brains. |
| Focus on cortical surface | FuNP [8] or FreeSurfer | Specialized workflows for surface-based analysis. |

Key Selection Criteria Elaboration

  • Computational Resources: DeepPrep's design, which leverages a workflow manager (Nextflow) and deep learning, allows it to dynamically allocate resources, making it highly efficient in both HPC and cloud environments. Its computational expense is 5.8 to 22.1 times lower than fMRIPrep [57]. For labs without GPU access, fMRIPrep remains a robust, widely validated option.
  • Data Scale and Scalability: For projects involving thousands of participants, such as the UK Biobank, DeepPrep's ability to process over 54,000 scans in just 6.5 days is unparalleled [57]. Its efficient batch-processing capability (1,146 participants per week) makes it the only tool currently demonstrated to handle data at this scale efficiently.
  • Data Quality and Pathology: When working with clinical populations where brain anatomy may be distorted by tumors, lesions, or trauma, DeepPrep has demonstrated superior robustness, successfully processing 100% of a challenging clinical sample (n=53) that caused conventional pipelines to fail [57].
  • Analysis Output Space: For studies requiring cortical surface-based analysis, FuNP provides a fully automated pipeline that combines volume and surface processing, which can improve the sensitivity of neuroimaging studies [8].

Experimental Protocols for Preprocessing and QA

Protocol 1: Structural MRI Preprocessing with CAT12

This protocol details the preprocessing of T1-weighted structural data using the CAT12 toolbox, a specialized plugin for SPM12 [59].

Workflow Diagram: Structural MRI Preprocessing with CAT12

[Workflow diagram: CAT12 structural preprocessing. Load T1-weighted images → segmentation → de-obliquing and re-orientation → bias field correction → skull stripping → spatial normalization to MNI → spatial smoothing → QA: IQR calculation, single-slice display, sample homogeneity check.]

Procedure:

  • Data Segmentation:
    • In the MATLAB command window, enter spm fmri to open the SPM12 graphical interface.
    • From the Toolbox menu, select CAT12.
    • In the CAT12 GUI, click the Segment button to open the batch editor.
    • Double-click the Volumes field and select all T1-weighted anatomical images for all subjects.
    • Retain default parameters for other options. The Split job into separate processes field can be set to leverage multiple processors (default: 4) for parallel processing, significantly reducing computation time [59].
    • Execute the module by clicking the green Go button. This process typically requires 30-45 minutes per subject [59].
  • Quality Assurance (QA) Checks:

    • IQR Assessment: Upon completion, the MATLAB terminal displays an IQR (Image Quality Rating) value for each subject. An IQR above 75-80% generally indicates average to above-average data quality [59].
    • Single-Slice Display: In the CAT12 GUI, navigate to Data Quality -> Single Slice Display. Select all grey matter segments (files prefixed mwp1) to visually inspect normalization accuracy across all subjects in a standardized space (e.g., axial slice through the anterior commissure).
    • Sample Homogeneity Check: In the CAT12 GUI, navigate to Data Quality -> Check Sample Homogeneity. Under Sample Data, select all mwp1 images. For Quality measures, recursively load each subject's .xml file. Execute to generate a correlation matrix and boxplot. This assesses the mean correlation of each anatomical scan with every other scan; most brains in a homogeneous sample should correlate in the range of r=0.80-0.90 [59]. Manually inspect the most deviating data points.
  • Spatial Smoothing:

    • From the SPM12 GUI, select Smooth.
    • For the Images to smooth field, select all mwp1 (grey matter) files.
    • Set the smoothing kernel full width at half maximum (FWHM) to [8 8 8] as recommended by the CAT12 manual for volumetric data [59].
    • Run the smoothing module. Verify successful smoothing by inspecting the output files using SPM's Check Reg function; the images should appear visibly blurred.
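For users who prefer scripting over the GUI, the smoothing step can also be driven from Python through Nipype's SPM interface (assuming MATLAB and SPM12 are installed and on the path); the file names below are placeholders.

```python
# Scripted alternative to the GUI smoothing step via Nipype's SPM interface.
from nipype.interfaces import spm

smooth = spm.Smooth()
smooth.inputs.in_files = ["mwp1sub-01_T1w.nii",
                          "mwp1sub-02_T1w.nii"]  # grey matter segments
smooth.inputs.fwhm = [8, 8, 8]   # 8 mm isotropic kernel, per the CAT12 manual
smooth.run()                     # writes s-prefixed smoothed images
```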

Protocol 2: Functional MRI Preprocessing with FuNP

This protocol describes the use of the FuNP (Fusion of Neuroimaging Preprocessing) pipeline, which integrates components from AFNI, FSL, FreeSurfer, and Workbench for a fully automated, comprehensive functional MRI preprocessing workflow [8].

Workflow Diagram: FuNP fMRI Preprocessing Pipeline

[Workflow diagram: FuNP fMRI preprocessing. Raw fMRI data → de-oblique (AFNI 3drefit) → re-orientation (AFNI 3dresample) → slice timing correction → motion correction (realign) → spatial normalization → nuisance signal regression; then either volume smoothing (output: preprocessed volume data) or optional surface-based processing and surface smoothing (output: preprocessed surface data).]

Procedure:

  • Pipeline Selection and Input:
    • Launch the FuNP software, which provides a user-friendly graphical interface.
    • Select the preprocessing type: volume-based or surface-based pipeline. The surface-based pipeline processes cortical regions on a 2D surface while retaining subcortical regions in 3D volume space, which may improve sensitivity [8].
    • Input the raw T1-weighted structural and functional MRI (fMRI) data. FuNP allows users to selectively enable or disable each preprocessing step and modify specific parameters (e.g., degrees of freedom for registration) as needed.
  • Volume-Based Preprocessing Steps (Core):

    • De-Oblique: Corrects for tilted scan angles acquired to avoid MRI-induced artifacts, using AFNI's 3drefit [8].
    • Re-orientation: Standardizes data orientation across all subjects using AFNI's 3dresample to a common template (e.g., RPI) to prevent mis-registration [8].
    • Magnetic Field Inhomogeneity Correction: Corrects intensity variations caused by different brain tissues using AFNI's 3dUnifize to make white matter intensity more homogeneous, which is crucial for accurate tissue segmentation [8].
    • Non-brain Tissue Removal: Removes skull, eyes, and other non-brain tissues using AFNI's 3dSkullStrip, focusing the analysis on the brain region of interest [8].
    • Further steps include slice timing correction, motion realignment, spatial normalization to a standard space (e.g., MNI), and spatial smoothing (a command-line sketch of the core steps follows this protocol).
  • Surface-Based Preprocessing (Optional):

    • If the surface-based pipeline is selected, FuNP automatically generates a 2D surface representation of the cortical mantle. This involves cortical surface reconstruction and registration, typically using tools like FreeSurfer and Workbench [8].
    • The final output includes preprocessed data in both volume and surface spaces, enabling analyses that leverage the advantages of both representations.
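For reference, the core volume-based steps listed above can also be invoked directly through AFNI's command-line tools, which FuNP wraps internally. The sketch below uses hypothetical file names and standard AFNI options; it is not FuNP's own code.

```python
# Hedged sketch of the core volume-based steps using AFNI CLI tools.
import subprocess

bold = "sub-01_bold.nii.gz"
subprocess.run(["3drefit", "-deoblique", bold], check=True)          # de-oblique
subprocess.run(["3dresample", "-orient", "RPI",
                "-prefix", "bold_rpi.nii.gz",
                "-inset", bold], check=True)                         # re-orientation
subprocess.run(["3dUnifize", "-input", "anat.nii.gz",
                "-prefix", "anat_unifized.nii.gz"], check=True)      # bias correction
subprocess.run(["3dSkullStrip", "-input", "anat_unifized.nii.gz",
                "-prefix", "anat_brain.nii.gz"], check=True)         # skull stripping
```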

Protocol 3: Enhanced Single-Trial Response Estimation with GLMsingle

For task-based fMRI studies, particularly those with condition-rich designs and limited trial repetitions, accurately estimating single-trial responses is critical. GLMsingle is a specialized toolbox that optimizes the standard General Linear Model (GLM) to achieve this goal [60].

Workflow Diagram: GLMsingle Optimization Steps

[Workflow diagram: GLMsingle optimization steps. fMRI time series and design matrix → b1: AssumeHRF (baseline GLM with canonical HRF) → b2: FitHRF (voxel-wise optimal HRF selection) → b3: FitHRF + GLMdenoise (noise regressors via PCA) → b4: FitHRF + GLMdenoise + RR (voxel-wise ridge regression) → output: high-quality single-trial beta estimates.]

Procedure:

  • Input Preparation:
    • Prepare the input fMRI time-series data and a design matrix specifying the onset times of each experimental condition or trial.
    • GLMsingle is available in both MATLAB and Python, requiring only these two inputs to function.
  • Model Optimization Steps:

    • Baseline Estimation (b1: AssumeHRF): The algorithm first runs a baseline single-trial GLM using a canonical Hemodynamic Response Function (HRF). This provides a reference for evaluating the improvements of subsequent steps [60].
    • Voxel-Wise HRF Identification (b2: FitHRF): For each voxel, GLMsingle iteratively fits a set of GLMs using a library of 20 different HRF shapes. It selects the HRF that provides the best fit (highest variance explained) to the voxel's time-course, thereby accommodating spatial variation in hemodynamic responses [60].
    • Noise Regressor Derivation (b3: FitHRF + GLMdenoise): The algorithm uses a cross-validation procedure to identify noise regressors. It applies Principal Component Analysis (PCA) to time-series data from "noise" voxels (voxels unrelated to the experimental task) and selectively adds the top principal components to the GLM until the cross-validated variance explained is maximized on average across voxels [60].
    • Regularization (b4: FitHRF + GLMdenoise + RR): In the final and most impactful step, GLMsingle employs voxel-wise fractional ridge regression. This technique uses cross-validation to determine a custom regularization parameter for each voxel, which stabilizes beta estimates and reduces noise, particularly for designs with closely spaced trials [60].
  • Output and Validation:

    • The primary output is a set of optimized single-trial beta estimates. Application to datasets like the Natural Scenes Dataset (NSD) and BOLD5000 has shown that the complete GLMsingle procedure (b4) substantially improves the test-retest reliability of response estimates across the cortex compared to the standard GLM approach [60].
    • These improved betas tangibly benefit higher-level analyses, including representational similarity analysis (RSA) and multivoxel pattern analysis (MVPA), by providing a more stable and reliable neural signature for each experimental trial [60].
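A minimal usage sketch for the Python release of GLMsingle is shown below; the array file names are hypothetical, and readers should consult the GLMsingle repository for the authoritative interface.

```python
# Minimal GLMsingle usage sketch (inputs hypothetical).
import numpy as np
from glmsingle.glmsingle import GLM_single

# data: list of runs, each a 4D array (X, Y, Z, T)
# design: list of matching (T, n_conditions) design matrices
data = [np.load("run1_bold.npy"), np.load("run2_bold.npy")]
design = [np.load("run1_design.npy"), np.load("run2_design.npy")]

glm = GLM_single()   # default parameters run all four stages (b1-b4)
results = glm.fit(design, data, stimdur=1.0, tr=2.0,
                  outputdir="glmsingle_out")
# The returned dictionary holds outputs per model type, including the
# optimized single-trial betas from the final (b4) model.
```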

The Scientist's Toolkit: Essential Research Reagents

The following table catalogues key software tools and resources that constitute the essential "reagent solutions" for modern fMRI preprocessing research.

Table 3: Essential Software Tools for fMRI Preprocessing Research

| Tool Name | Type | Primary Function | Application Note |
|---|---|---|---|
| fMRIPrep [58] | Integrated Pipeline | Robust, automated preprocessing of fMRI data. | A gold standard for reproducible preprocessing; ideal for standard datasets and CPU-only environments. |
| DeepPrep [57] | Deep Learning Pipeline | Accelerated preprocessing using deep learning models. | Critical for large-scale studies (e.g., UK Biobank) and clinical data with pathologies. |
| FuNP [8] | Fusion Pipeline | Combines components of AFNI, FSL, FreeSurfer, Workbench. | Provides flexibility for hybrid volume- and surface-based analysis in a single pipeline. |
| CAT12 [59] | Structural Processing Toolbox | Efficient segmentation and preprocessing of T1-weighted MRI. | Integrated within SPM12; excellent for rapid and reliable volumetric analysis of structural data. |
| GLMsingle [60] | Analysis Optimization Toolbox | Enhances single-trial BOLD response estimation in task-fMRI. | Indispensable for condition-rich designs; improves SNR and reliability of trial-level estimates. |
| AFNI [8] | Software Library | Low-level neuroimaging data processing (e.g., 3drefit, 3dUnifize). | Often used as a component within larger pipelines; provides powerful command-line tools. |
| FSL [8] | Software Library | Comprehensive library for MRI data analysis. | Another foundational library; its tools (e.g., BET, FLIRT) are widely integrated. |
| FreeSurfer [8] [57] | Software Suite | Automated cortical surface reconstruction and analysis. | The benchmark for surface-based analysis, though computationally intensive. |
| SPM12 [59] | Software Package | Statistical analysis of brain imaging data sequences. | A classic platform; forms the base for toolboxes like CAT12. |

The reliability of fMRI research findings is inextricably linked to the choice and execution of the preprocessing pipeline. This application note provides a structured framework, demonstrating that the selection is not one-size-fits-all but must be guided by the study's specific design, data characteristics, and computational resources. While classical pipelines like fMRIPrep continue to offer robustness and transparency for standard datasets, the emergence of deep learning-based tools like DeepPrep represents a paradigm shift, addressing critical challenges of scalability, speed, and robustness in the era of big data and clinical translation. By adhering to the detailed protocols and decision matrix provided, researchers can make informed, justified choices that enhance the methodological rigor and ultimate impact of their neuroimaging investigations.

Troubleshooting Common Pitfalls and Optimizing Pipelines for Enhanced Signal and Specificity

Advanced Motion Correction and Scrubbing Techniques for Data with Excessive Head Movement

Subject head movement remains one of the most significant confounding factors in functional magnetic resonance imaging (fMRI), directly impacting data quality and the validity of statistical inference [61]. In the context of research on fMRI preprocessing pipeline reliability, implementing robust motion mitigation strategies is paramount, particularly for studies involving populations prone to excessive movement (e.g., children, patients, elderly) [61]. Motion introduces a complex set of physical effects, including spin-history artifacts, intensity modulations, and distortions that can mimic or obscure true neural activity [61]. This application note details advanced retrospective correction and scrubbing techniques, providing structured protocols to enhance preprocessing pipeline reliability for data with challenging motion characteristics.

Technical Approaches to Motion Mitigation

Retrospective Motion Correction

Retrospective correction techniques are foundational to fMRI preprocessing, designed to address inter-volume inconsistencies. Standard rigid-body registration, which corrects for six degrees of freedom (translations and rotations), is implemented in tools like mcflirt (FSL) and is a core component of automated pipelines like fMRIPrep [62] [33].

Advanced Methods:

  • Non-rigid Registration: Goes beyond rigid-body assumptions to model complex motion-induced deformations in the brain, potentially offering superior correction using software like ANTs [63].
  • Deep Learning-Based Correction: Utilizes convolutional neural networks (CNNs) to learn complex motion patterns and perform data-driven correction, showing promise in handling severe motion [63].

Table 1: Summary of Retrospective Motion Correction Techniques

| Technique | Description | Primary Tools | Key Advantages | Key Limitations |
|---|---|---|---|---|
| Rigid-Body Registration | Corrects for 6 degrees of freedom (3 translations, 3 rotations) between volumes. | mcflirt (FSL), 3dvolreg (AFNI), spm_realign (SPM) [33] | Robust, sequence-independent, widely adopted [61]. | Cannot correct for spin-history effects or intra-volume distortions [61]. |
| Non-Rigid Registration | Models flexible, non-linear deformations of the brain. | ANTs [63] | Can correct for complex motion patterns that rigid-body cannot. | Computationally intensive; risk of over-correction. |
| Deep Learning Correction | Uses CNNs to learn and correct motion artifacts from data. | In-house/emerging models [63] | Data-driven; potential for high accuracy on severe motion. | Requires large training datasets; model generalizability can be a concern. |

Data Scrubbing and Denoising

For data with excessive movement, correction alone is often insufficient. Scrubbing (or frame censoring) identifies and removes motion-contaminated volumes, while denoising uses statistical methods to isolate and regress out noise components.

A. Scrubbing Methodologies: Scrubbing involves identifying outlier volumes based on motion estimates or data integrity metrics.

  • Motion-Parameter Derived Scrubbing: Uses the Framewise Displacement (FD) metric to identify volumes with excessive motion. A common threshold is FD > 0.9 mm, with subjects excluded if over 20% of volumes exceed this limit [64] (a computation sketch follows this list).
  • Data-Driven Scrubbing: Uses metrics derived from the data itself, such as DVARS (standardized spatial variance of the differentiated data) [65] [66]. A novel method, "Projection Scrubbing," uses a statistical outlier detection framework with dimension reduction (e.g., ICA) to isolate artifactual variation, demonstrating a favorable balance of noise reduction and data retention [65].
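The FD computation referenced above follows the standard formulation of Power and colleagues: absolute backward differences of the six rigid-body parameters, with rotations (in radians) converted to millimetres on an assumed 50 mm head radius.

```python
# Framewise Displacement from six rigid-body motion parameters.
import numpy as np

def framewise_displacement(motion: np.ndarray, radius: float = 50.0) -> np.ndarray:
    """motion: (T, 6) array = 3 translations (mm) + 3 rotations (rad)."""
    d = np.abs(np.diff(motion, axis=0))
    d[:, 3:] *= radius                      # arc-length approximation for rotations
    return np.concatenate([[0.0], d.sum(axis=1)])

# Example: flag volumes for scrubbing at the FD > 0.9 mm threshold [64]
# bad = framewise_displacement(par) > 0.9
```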

B. Denoising Techniques:

  • Nuisance Regression: Incorporates motion parameters and other signals as regressors in the general linear model. Advanced approaches include 24 motion regressors (6 basic, their derivatives, and squares; see the expansion sketch after this list) and component-based methods like aCompCor, which identifies noise regions of interest (e.g., white matter, CSF) for signal regression [64].
  • Independent Component Analysis (ICA): A multivariate method that separates data into independent components, allowing for the identification and removal of artifact-related components (e.g., ICA-AROMA) [63] [65].
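The 24-parameter expansion mentioned above can be computed directly from the six realignment parameters, as in this short sketch:

```python
# Expand six rigid-body parameters into the 24-regressor set
# (parameters, temporal derivatives, and the squares of both).
import numpy as np

def motion_24(par: np.ndarray) -> np.ndarray:
    """par: (T, 6) motion parameters -> (T, 24) expansion."""
    deriv = np.vstack([np.zeros((1, 6)), np.diff(par, axis=0)])
    base = np.hstack([par, deriv])
    return np.hstack([base, base ** 2])
```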

Table 2: Evaluation of Scrubbing and Denoising Efficacy in Task-fMRI (Multi-Dataset Study) [66]

| Method | Max t-value in Group Analysis | ROI-Based Mean Activation | Split-Half Reliability | Data Loss |
|---|---|---|---|---|
| 24 Motion Regressors | Baseline | Baseline | Baseline | None |
| Frame Censoring (1-2% loss) | Consistent improvements | Comparable to other techniques | Comparable to other techniques | Low (1-2%) |
| Wavelet Despiking | Comparable to frame censoring | Comparable to frame censoring | Comparable to frame censoring | None |
| Robust Weighted Least Squares | Comparable to frame censoring | Comparable to frame censoring | Comparable to frame censoring | None |

Note: No single approach consistently outperformed all others across all datasets and tasks, highlighting the context-dependent nature of optimal method selection.

Experimental Protocols

Protocol 1: Implementing a Robust Preprocessing Pipeline with fMRIPrep and Scrubbing

Objective: To preprocess fMRI data with integrated motion correction and scrubbing for downstream analysis. Reagents & Solutions: See Section 5, The Scientist's Toolkit.

Methodology:

  • Data Preparation: Ensure structural (T1-weighted) and functional (BOLD) data are organized in BIDS format [33].
  • Anatomical Preprocessing: Execute fMRIPrep to perform T1w brain extraction, tissue segmentation, and spatial normalization. fMRIPrep automatically generates a brain mask and transforms data to standard space (e.g., MNI) [62] [33].
  • Functional Preprocessing: fMRIPrep concurrently performs:
    • Slice-timing correction.
    • Head-motion estimation and correction via mcflirt.
    • Susceptibility distortion correction.
    • Co-registration of functional and anatomical data.
    • Spatial normalization to standard space.
  • Confound Extraction: fMRIPrep outputs a comprehensive TSV file containing confound regressors, including Framewise Displacement (FD) and DVARS [62] [33].
  • Data Scrubbing:
    • Calculate FD and DVARS for each volume.
    • Identify "bad" volumes exceeding threshold (e.g., FD > 0.9 mm or DVARS > threshold [64]).
    • Generate nuisance regressors for the general linear model (GLM) that mask out these contaminated volumes. For task-fMRI, modest censoring (1-2% data loss) has shown consistent benefits [66].
  • Nuisance Regression: In the first-level GLM, include the scrubbing regressors alongside other confounds (e.g., 24 motion parameters, aCompCor components, or global signal) [64].
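A minimal sketch of the scrubbing step, assuming an fMRIPrep confounds file: it flags volumes by the FD > 0.9 mm criterion [64], builds one spike regressor per censored volume, and warns when the 20% exclusion limit is reached.

```python
# Derive one-hot scrubbing regressors from an fMRIPrep confounds TSV.
import numpy as np
import pandas as pd

conf = pd.read_csv("sub-01_task-rest_desc-confounds_timeseries.tsv", sep="\t")
bad = conf["framewise_displacement"].fillna(0) > 0.9   # FD threshold [64]

# One column per censored volume: 1 at the bad frame, 0 elsewhere
n_bad = int(bad.sum())
scrub = np.zeros((len(conf), n_bad))
scrub[np.where(bad)[0], np.arange(n_bad)] = 1.0

if bad.mean() > 0.20:
    print("Warning: >20% of volumes exceed FD 0.9 mm; consider exclusion [64]")
```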

[Diagram: Raw BIDS data (T1w, BOLD) → fMRIPrep execution, comprising an anatomical workflow (brain extraction, segmentation, normalization) and a functional workflow (slice-timing correction, motion correction with mcflirt, distortion correction) → confounds file (FD, DVARS, etc.) → scrubbing and denoising (identify bad volumes via FD/DVARS, generate nuisance regressors, aCompCor/ICA-AROMA) → first-level GLM (task regressors, scrubbing regressors, nuisance regression) → preprocessed and denoised data for analysis.]

Figure 1: Workflow for fMRIPrep integrated with scrubbing and denoising.

Protocol 2: Data-Driven Projection Scrubbing for Resting-State fMRI

Objective: To maximize data retention while effectively removing motion artifacts in resting-state fMRI, improving functional connectivity metrics. Reagents & Solutions: See Section 5, The Scientist's Toolkit.

Methodology:

  • Standard Preprocessing: Complete initial preprocessing (motion correction, normalization, etc.) using a pipeline like fMRIPrep.
  • Dimension Reduction: Apply strategic dimension reduction to the preprocessed BOLD time series. Independent Component Analysis (ICA) is highly effective for isolating artifactual sources of variation [65].
  • Outlier Detection: Use a statistical outlier detection framework (e.g., based on Mahalanobis distance) applied to the components identified in step 2 to flag volumes with prominent artifacts [65].
  • Volume Censoring: Generate and include nuisance regressors in the connectivity model to censor the identified outlier volumes.
  • Validation: Assess outcomes using benchmarks for functional connectivity, including validity, reliability, and identifiability (fingerprinting). Projection scrubbing has been shown to improve fingerprinting without negatively impacting validity or reliability, while censoring far fewer volumes than stringent motion scrubbing [65].
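The simplified sketch below illustrates the core idea of steps 2-3 (dimension reduction followed by multivariate outlier detection). The published projection-scrubbing method uses a more refined component selection and robust distance estimator [65], so treat this only as a conceptual illustration.

```python
# Conceptual illustration of projection scrubbing: ICA reduction, then
# flag volumes whose component scores are multivariate outliers.
import numpy as np
from sklearn.decomposition import FastICA

X = np.load("bold_voxels_by_time.npy").T        # (T, V) time-by-voxel matrix
scores = FastICA(n_components=20, random_state=0).fit_transform(X)  # (T, 20)

# Squared Mahalanobis-style distance of each volume in component space,
# centred on the median for mild robustness
mu = np.median(scores, axis=0)
diff = scores - mu
cov = np.cov(scores, rowvar=False)
d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.pinv(cov), diff)

outliers = d2 > np.percentile(d2, 99)           # flag the most extreme volumes
```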

[Diagram: Preprocessed BOLD data → strategic dimension reduction (ICA) → noise and signal components → statistical outlier detection (projection scrubbing) → identified artifactual volumes → censor volumes in the functional connectivity model → validation with benchmarks: fingerprinting, reliability, validity.]

Figure 2: Logic flow for data-driven projection scrubbing.

Discussion and Application

The choice of motion mitigation strategy is highly context-dependent. For task-based fMRI, modest frame censoring (1-2% data loss) can yield consistent improvements, though it is often comparable to other denoising techniques like wavelet despiking [66]. For resting-state fMRI, where maximizing data is critical for connectivity measures, data-driven scrubbing methods like projection scrubbing offer a superior balance by improving fingerprinting while minimizing data loss and avoiding negative impacts on reliability [65].

A critical consideration is the sample characteristic. Populations such as children, patients, or the elderly often exhibit more pronounced motion [61]. In these cases, stringent exclusion criteria (e.g., excluding subjects with >20% of volumes with FD > 0.9mm) may lead to significant data loss and reduced statistical power [64]. Adopting advanced, data-driven scrubbing techniques is therefore essential for preserving sample size and enhancing the reliability of findings in population neuroscience and clinical drug development research.

The Scientist's Toolkit

Table 3: Essential Software Tools and Resources for Advanced Motion Mitigation

| Tool/Resource | Function | Application in Protocol |
|---|---|---|
| fMRIPrep [62] [33] | A robust, analysis-agnostic pipeline for automated fMRI preprocessing. | Core engine for Protocols 1 & 2, handling anatomical and functional preprocessing and confound extraction. |
| FSL | A comprehensive library of MRI analysis tools. | Provides mcflirt for motion correction (used by fMRIPrep) and ICA-AROMA for ICA-based denoising. |
| ANTs | Advanced normalization and segmentation tools. | Used for non-rigid registration and brain extraction within fMRIPrep [63]. |
| Nipype | A Python framework for integrating neuroimaging software packages. | The workflow engine that orchestrates fMRIPrep's modular design [33]. |
| BIDS Format [33] | The Brain Imaging Data Structure, a standardized format for organizing neuroimaging data. | Mandatory input format for fMRIPrep, ensuring reproducibility and ease of data sharing. |
| fMRIflows [3] | A consortium of pipelines extending fMRIPrep to include univariate and multivariate statistical analysis. | Useful for researchers seeking a fully automated pipeline from preprocessing to group-level analysis. |
| Custom Scripts (Python/R) | For implementing specialized scrubbing algorithms. | Required for executing data-driven projection scrubbing as detailed in Protocol 2 [65]. |

Spatial smoothing is a critical preprocessing step in functional magnetic resonance imaging (fMRI) analysis, traditionally implemented using isotropic Gaussian kernels with a heuristically selected Full-Width at Half Maximum (FWHM). This conventional approach enhances the signal-to-noise ratio (SNR) and mitigates inter-subject anatomical variability for group-level analyses. However, the brain's complex functional architecture, particularly within the highly folded cerebral cortex and the anisotropic white matter pathways, renders uniform smoothing suboptimal. The application of a fixed filter throughout the brain inevitably blurs distinct functional boundaries, reducing spatial specificity and potentially inducing false positives in regions adjacent to true activations [67]. This limitation is particularly problematic for clinical applications such as presurgical planning, where precise localization of eloquent cortex is paramount [67].

The pursuit of greater precision in fMRI has catalyzed the development of adaptive spatial smoothing methods. These advanced techniques tailor the smoothing process at the voxel level, guided by local features of the data, such as underlying anatomy, tissue properties, or the functional time series itself. This document, framed within broader thesis research on fMRI preprocessing pipeline reliability, details the limitations of conventional smoothing and provides application notes and experimental protocols for implementing modern adaptive methods. These protocols are designed to help researchers and drug development professionals improve the sensitivity, specificity, and spatial accuracy of their fMRI analyses.

Limitations of Isotropic Gaussian Smoothing

The standard practice of isotropic Gaussian smoothing applies a filter of constant width and uniform shape across all brain voxels. While computationally efficient and simple to implement, this one-size-fits-all approach presents several significant drawbacks:

  • Reduced Spatial Specificity: Isotropic smoothing dilates active regions, potentially causing inactive voxels near active regions to be mistakenly identified as active due to spatial blurring. This compromises the accuracy of subject-level analyses, a critical concern for applications like fMRI fingerprinting and presurgical mapping [67].
  • Mismatch with Brain Anatomy: The cerebral cortex is a thin, highly folded sheet of gray matter. Isotropic smoothing filters do not respect this complex geometry and can blur signals across adjacent gyri and sulci [67]. Similarly, in white matter, where BOLD signals exhibit an anisotropic correlation structure aligned with axonal pathways, isotropic Gaussian filters are fundamentally inadequate and blur across anatomically distinct tracts [68] [69].
  • Artifacts from k-Space Truncation: fMRI data are acquired in a finite subset of k-space, which introduces ringing artifacts and 'side lobes' upon image reconstruction. While Gaussian smoothing can suppress these artifacts, it does so inefficiently. If the kernel is too narrow, artifacts remain; if too wide, it unnecessarily degrades spatial resolution [70].

Adaptive Spatial Smoothing Methodologies

Adaptive spatial smoothing methods overcome the limitations of isotropic Gaussian filters by allowing the properties of the smoothing kernel to vary based on local information. The following sections detail several prominent approaches.

Deep Neural Network-Based Smoothing

Principle: This method uses a deep neural network (DNN), typically comprising multiple 3D convolutional layers, to learn optimal, data-driven spatial filters directly from the unsmoothed fMRI data. The network learns to incorporate information from a large number of neighboring voxels in a time-efficient manner, producing an adaptively smoothed time series [67].

Table 1: Key Components of a DNN for Adaptive Smoothing

| Component | Architecture/Function | Benefit |
|---|---|---|
| Input Data | Batches of unsmoothed 4D fMRI data (n × T × x × y × z) | Enables processing of high-resolution data |
| 3D Convolutional Layers | Multiple layers with 3×3×3 kernels; number of filters (F_i) can vary | Acts as data-driven spatial filters learned from the data |
| Fully Connected Layers | Applied to the output of the final convolutional layer | Assigns optimized weights to generate the final smoothed time series |
| Constraints | Sum constraint on convolutional layers; non-negative constraint on fully connected layers | Ensures model stability and physiological plausibility |

The following diagram illustrates the typical workflow for a DNN-based adaptive smoothing approach:

[Diagram: Unsmoothed fMRI data → 3D convolutional layers → feature maps → fully connected layers → smoothed time series.]

Anatomy-Informed Smoothing

Principle: These methods leverage high-resolution anatomical scans to guide the smoothing process, ensuring it conforms to the brain's structure.

  • Vasculature-Informed Spatial Smoothing (VSS): For white matter fMRI, VSS uses susceptibility-weighted imaging (SWI) to identify the predominant direction of local vasculature. A Frangi filter is applied to extract the vascular orientation, which is then used to weight a graph-based smoothing filter, restricting smoothing along the direction of blood vessels [68].
  • Diffusion-Informed Spatial Smoothing (DSS): Also for white matter, DSS employs diffusion tensor imaging (DTI) to infer the orientation of axonal bundles. A graph is constructed where edges represent the structural connectivity between voxels, derived from the diffusion data. Spatial smoothing is then performed preferentially along these white matter pathways [69].

Table 2: Quantitative Comparison of Anatomy-Informed Smoothing Results

| Method | Input Data | Regional Homogeneity (ReHo) vs. GSS | ICA Quality (Dice Score) |
|---|---|---|---|
| Gaussian Smoothing (GSS) | fMRI | Baseline | Baseline |
| Diffusion-Informed (DSS) | fMRI + dMRI | Significantly increased (p<0.01) | Comparable to VSS (p=0.06) |
| Vasculature-Informed (VSS) | fMRI + SWI | Significantly increased (p<0.01) | Significantly higher than GSS (p<0.05) |

Gaussian Process Regression Smoothing

Principle: Gaussian Process (GP) regression provides a principled Bayesian framework for adaptive smoothing. It models the data as a GP and infers a spatially varying smoothing kernel that depends on the local characteristics of the neural activity patterns. This method achieves an optimal trade-off between sensitivity (noise reduction) and specificity (preservation of fine-scale structure) without the need for pre-specified kernel shapes [71].

Prolate Spheroidal Wave Function (PSWF) Filters

Principle: The PSWF filter is designed specifically to correct for artifacts arising from the truncation of k-space during data acquisition. The 0th-order PSWF is the function that maximizes the signal energy within a defined region of image-space for a given compact support in k-space. When used as a smoothing filter, it effectively suppresses ringing artifacts with minimal loss of spatial resolution compared to a Gaussian filter of equivalent width [70].

Table 3: Performance Comparison of Gaussian vs. PSWF Filters

| Filter Type | K-Space Artifact Correction | Statistical Power (FWHM < 8 mm) | Spatial Resolution Preservation |
|---|---|---|---|
| Gaussian Filter | Inefficient (requires wider kernel) | Lower | Poorer for narrow kernels |
| PSWF Filter | Optimal and efficient | Significantly higher | Superior |

Experimental Protocols

This section provides detailed protocols for implementing key adaptive smoothing methods.

Protocol: DNN-Based Adaptive Smoothing

This protocol outlines the procedure for implementing the deep neural network approach for task fMRI data [67].

  • Data Preparation:

    • Input: Acquire high-resolution (preferably sub-millimeter) task-based fMRI data and corresponding structural T1-weighted images.
    • Preprocessing: Use a robust pipeline like fMRIPrep [55] [33] to perform minimal preprocessing, including motion correction, distortion correction, and brain extraction. Crucially, omit the spatial smoothing step.
    • Formatting: Convert preprocessed data into a suitable format for deep learning (e.g., NIfTI). Partition the 4D fMRI data into smaller batches (e.g., n × T × x × y × z × 1) to manage memory load during training.
  • Network Architecture & Training:

    • Model Definition: Construct a DNN with multiple 3D convolutional layers (e.g., kernel size 3×3×3) followed by fully connected layers. Implement sum constraints on the convolutional kernels and non-negative constraints on the fully connected layer weights.
    • Loss Function: Define a loss function that maximizes the correlation between the smoothed output and the task design, potentially incorporating terms that penalize excessive spatial blurring.
    • Training: Train the model using the unsmoothed fMRI data as input and the task design matrix to guide the optimization. Use a data generator to feed batches to the model.
  • Output & Validation:

    • Output: The model will output the DNN-smoothed fMRI time series.
    • Validation: Compare the activation maps generated from the DNN-smoothed data against those from conventionally smoothed data using metrics of spatial specificity, activation effect size, and concordance with known neuroanatomy.
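A schematic PyTorch module matching the components in Table 1 is sketched below. The layer and filter counts are arbitrary, and the sum/non-negativity constraints are indicated only in comments; this is a sketch, not the published model.

```python
# Schematic 3D-convolutional smoother mirroring Table 1's architecture.
import torch
import torch.nn as nn

class AdaptiveSmoother(nn.Module):
    def __init__(self, n_filters: int = 8, n_layers: int = 3):
        super().__init__()
        layers, c_in = [], 1
        for _ in range(n_layers):
            layers += [nn.Conv3d(c_in, n_filters, kernel_size=3, padding=1),
                       nn.ReLU()]
            c_in = n_filters
        self.conv = nn.Sequential(*layers)
        # Fully connected combination of feature maps; the published model's
        # non-negativity/sum constraints would be enforced here, e.g. by
        # clamping or renormalizing weights after each optimizer step.
        self.combine = nn.Conv3d(n_filters, 1, kernel_size=1)

    def forward(self, vol: torch.Tensor) -> torch.Tensor:
        # vol: (batch, 1, x, y, z); time points can be batched as volumes
        return self.combine(self.conv(vol))

# smoothed = AdaptiveSmoother()(torch.randn(4, 1, 64, 64, 48))
```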

Protocol: Vasculature-Informed Spatial Smoothing (VSS) for White Matter

This protocol describes the steps for implementing VSS to enhance white matter functional connectivity [68].

  • Multi-Modal Data Acquisition:

    • Acquire the following datasets for each subject:
      • Resting-state fMRI: High-resolution 7T data is recommended (e.g., TR=1.5s, 400 time points).
      • Susceptibility-Weighted Imaging (SWI): To visualize venous vasculature.
      • Diffusion MRI (dMRI): For comparison with DSS.
      • Structural T1-weighted Imaging.
  • Vasculature Mapping:

    • Preprocess the SWI data to enhance contrast.
    • Apply a Frangi filter to the preprocessed SWI volume to identify tubular structures (veins) and extract the peak vasculature direction within each voxel.
  • Graph Construction and Filtering:

    • Construct a graph where nodes represent white matter voxels and edges connect neighboring voxels.
    • Weight the edges of the graph based on the agreement between the local vasculature direction (from SWI) and the spatial orientation of the edge.
    • Apply the resulting VSS graph filter to the unsmoothed white matter fMRI data.
  • Analysis and Quality Control:

    • Calculate Regional Homogeneity (ReHo) and perform Independent Component Analysis (ICA) on the VSS-smoothed data.
    • Statistically compare these metrics against those derived from GSS- and DSS-smoothed data using paired t-tests and Mann-Whitney U-tests, as appropriate.
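To illustrate the graph-filtering idea behind VSS (and DSS), the toy sketch below weights each neighbour by the agreement between the edge direction and the local anatomical orientation and then averages over those weights. It is a conceptual simplification of the cited graph filters, not the released implementation.

```python
# Toy anatomy-weighted neighbourhood smoothing for 4D fMRI data.
import numpy as np

def directional_smooth(data, orient, offsets=None):
    """data: (X, Y, Z, T) fMRI; orient: (X, Y, Z, 3) unit orientation vectors
    (e.g., local vessel direction from SWI or fibre direction from DTI)."""
    if offsets is None:  # 6-connected neighbourhood
        offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                   (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    out = np.zeros_like(data)
    wsum = np.zeros(data.shape[:3])
    for off in offsets:
        e = np.asarray(off, float) / np.linalg.norm(off)
        w = np.abs(orient @ e)              # |cos| of angle between edge and anatomy
        shifted = np.roll(data, shift=off, axis=(0, 1, 2))
        out += w[..., None] * shifted
        wsum += w
    return out / np.maximum(wsum, 1e-8)[..., None]
```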

The workflow for implementing anatomy-informed smoothing, such as VSS or DSS, is summarized below:

[Diagram: Acquire multi-modal data → preprocess structural data → map anatomical features → construct anisotropic graph → apply spatial filter → analyze filtered data.]

The Scientist's Toolkit

Table 4: Essential Research Reagents and Software Solutions

| Tool/Reagent | Function/Purpose | Example Source/Software |
|---|---|---|
| fMRIPrep | A robust, analysis-agnostic pipeline for standardized fMRI preprocessing; provides a solid foundation before applying adaptive smoothing. [55] [33] | https://fmriprep.org |
| FSL | A comprehensive library of neuroimaging tools, used for various preprocessing steps and for comparison of analysis methods. [55] | https://fsl.fmrib.ox.ac.uk/fsl/fslwiki |
| ANTs | Advanced Normalization Tools, used for high-precision image registration and brain extraction within pipelines like fMRIPrep. [55] | http://stnava.github.io/ANTs/ |
| Deep Learning Framework | Provides the environment to define, train, and deploy DNNs for adaptive smoothing. | TensorFlow, PyTorch |
| Graph Signal Processing Library | Enables the implementation of anatomy-informed smoothing methods like DSS and VSS. | Custom Python code (e.g., https://github.com/MASILab/vss_fmri) [68] |
| Nipype | A Python framework for integrating multiple neuroimaging software packages into cohesive and reproducible workflows. [3] | https://nipype.readthedocs.io |
| fMRIflows | A consortium of fully automatic pipelines that includes flexible spatial filtering options, suitable for both univariate and multivariate analyses. [3] | https://github.com/miykael/fmriflows |

The move from isotropic Gaussian smoothing to adaptive methods represents a significant advancement in fMRI preprocessing, directly enhancing the reliability of derived results. Techniques such as DNN-based smoothing, anatomy-informed VSS/DSS, and GP regression offer powerful, data-driven means to improve spatial specificity and functional contrast. The experimental protocols and resources provided herein offer a pathway for researchers and clinicians to implement these methods, promising more accurate and biologically plausible maps of brain function for both basic neuroscience and clinical drug development.

Functional Magnetic Resonance Imaging (fMRI) is a cornerstone of neuroscience research and clinical applications, yet the reliability of its findings is often challenged by significant inter-individual variability introduced during data preprocessing. Traditional preprocessing pipelines, such as the multi-step interpolation method used in FSL's FEAT, can induce unwanted spatial blurring, complicating the comparison of data across subjects. This Application Note explores the OGRE (One-Step General Registration and Extraction) pipeline, which implements a one-step interpolation method to consolidate multiple spatial transformations. We detail the protocol for implementing OGRE, present quantitative evidence demonstrating its superiority in reducing inter-subject variability and enhancing task-related signal detection compared to FSL and fMRIPrep, and provide essential tools for its adoption in research and clinical settings.

The validity of group-level fMRI analyses hinges on the accurate alignment of brain data across multiple individuals. A primary source of error in this process is the preprocessing pipeline itself. Multi-step interpolation, a common approach in widely used tools like FSL's FEAT, applies a sequence of transformations (e.g., motion correction, registration, normalization) in which each step resamples the data, potentially accumulating interpolation errors and spatial blurring [72] [73]. This "stacking" of transformations can amplify subtle differences in individual brain anatomy and data acquisition, increasing inter-subject variability and obscuring genuine biological signals. Reducing this noise is critical for enhancing the sensitivity of fMRI in basic research and its reliability in clinical applications, such as drug development, where detecting subtle treatment effects on brain function is paramount. The OGRE pipeline was developed specifically to address this fundamental challenge.

Quantitative Performance Evaluation of OGRE

The efficacy of the OGRE pipeline was evaluated through a controlled comparison with two other prevalent methods: standard FSL preprocessing and fMRIPrep. The analysis used data from 53 adult volunteers performing a precision drawing task during fMRI scanning, with subsequent statistical analysis performed uniformly using FSL FEAT [72] [73].

Table 1: Comparison of Preprocessing Pipeline Performance

Performance Metric OGRE fMRIPrep FSL-Preproc
Inter-Subject Variability Lowest (p = 0.036 vs. fMRIPrep; p = 7.3×10⁻⁹ vs. FSL) Intermediate Highest
Task-Related Activation (Primary Motor Cortex) Strongest (p = 0.00042 vs. FSL) Not Significant vs. OGRE Weaker
Core Interpolation Method One-step One-step Multi-step

The data in Table 1 demonstrate that OGRE's one-step interpolation approach significantly outperforms multi-step methods in reducing inter-individual differences, thereby providing a more reliable foundation for group-level analyses.

Experimental Protocol: Implementing the OGRE Pipeline

This section provides a detailed protocol for replicating the comparative analysis of preprocessing pipelines as described in the featured study [72] [73].

Participant Cohort and Data Acquisition

  • Participants: 53 right-handed adults (38 female; mean age 47 ± 18 years). Exclusion criteria included left-handedness, major neurological/psychiatric diagnoses, and standard MRI contraindications.
  • fMRI Acquisition: Data were acquired on a Siemens PRISMA 3T scanner.
    • Task fMRI: T2*-weighted gradient echo EPI sequence; TR=662 ms, TE=30 ms, flip angle=52°, voxel size=3.0 mm isotropic, multi-band acceleration=6x.
    • Task Design: Block-design precision drawing task with the right hand. Each block: 15.2s drawing, 15.2s rest. Three runs per participant.
    • Structural MRI: T1-weighted (MP-RAGE) and T2-weighted images were acquired for registration and segmentation.
    • Field Maps: Spin-echo field maps were collected for B0 distortion correction.

Preprocessing Pipelines and Statistical Analysis

All preprocessed data from the three pipelines were analyzed using an identical FSL FEAT General Linear Model (GLM) for volumetric statistical analysis.

  • OGRE Preprocessing Protocol: The pipeline is available from https://github.com/PhilipLab/OGRE-pipeline or https://www.nitrc.org/projects/ogre/ [73] [74].

    • Brain Extraction and Parcellation: Perform brain extraction using FreeSurfer.
    • One-Step Interpolation and Registration: Integrate motion correction, field map distortion correction, and warping to the MNI atlas (2 mm) via a single transformation step using FSL FLIRT and FNIRT.
    • Output for FSL FEAT: Generate FEAT-formatted registration images and matrices. In FEAT, select the "Statistics" option (not "Full Analysis") and input the OGRE-preprocessed data.
  • FSL Preprocessing Protocol: This is the standard "Full Analysis" within FSL FEAT, which employs multi-step interpolation for each preprocessing transformation.

  • fMRIPrep Preprocessing Protocol: Preprocess data using fMRIPrep (version not specified in sources), which also employs a one-step interpolation method. The outputs are then formatted for subsequent GLM analysis in FSL FEAT.

Outcome Measures

  • Primary Outcome: Inter-Subject Variability. Quantified by measuring the residual variance across subjects after group-level alignment.
  • Secondary Outcome: Task-Related Activation Magnitude. The strength of the BOLD signal in the primary motor cortex contralateral to hand movement was compared across pipelines.
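Both outcome measures are computed from the aligned subject-level maps. The sketch below is a minimal illustration of the primary outcome, voxelwise residual variance across subjects, assuming the maps are already in a common standard space; the function name and mask-based summary are ours, not the study's code [72] [73].

```python
import numpy as np

def intersubject_variability(maps, roi_mask=None):
    """Voxelwise inter-subject variability of aligned activation maps (sketch).

    maps:     (n_subjects, X, Y, Z) subject-level parameter estimates,
              produced by one preprocessing pipeline, in standard space
    roi_mask: optional (X, Y, Z) boolean mask (e.g., primary motor cortex)
    """
    var_map = maps.var(axis=0, ddof=1)     # residual variance across subjects
    if roi_mask is not None:
        return var_map[roi_mask].mean()    # scalar summary for comparison
    return var_map

# Usage: compute this summary for each pipeline's outputs; lower values
# indicate better inter-subject alignment after preprocessing.
```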

Workflow Visualization: OGRE vs. Multi-Step Processing

The following diagram illustrates the fundamental architectural difference between the multi-step approach and OGRE's one-step method, highlighting the source of reduced error accumulation.

[Diagram: Multi-step preprocessing (e.g., FSL): Raw fMRI Data → Motion Correction & Interpolation → Distortion Correction & Interpolation → Spatial Normalization & Interpolation → Preprocessed Data (High Cumulative Error). OGRE one-step preprocessing: Raw fMRI Data → Calculate Composite Transformation → Single Resampling & Interpolation → Preprocessed Data (Low Cumulative Error).]

Diagram 1: A comparison of multi-step and one-step interpolation workflows. OGRE's key innovation is calculating a composite transformation from native to standard space and applying it in a single resampling step, thereby minimizing the cumulative spatial blurring and error inherent in the multi-step chain.

Table 2: Key Software and Computational Resources for OGRE

Resource Function / Description Source / Availability
OGRE Pipeline Core software for one-step registration and extraction preprocessing. GitHub: PhilipLab/OGRE-pipeline or NITRC [73] [74]
FSL (FMRIB Software Library) Provides FLIRT & FNIRT for registration; FEAT for GLM analysis. https://fsl.fmrib.ox.ac.uk/fsl/fslwiki [72] [73]
FreeSurfer Used for automated brain extraction and anatomical parcellation. https://surfer.nmr.mgh.harvard.edu/ [75] [73]
Siemens PRISMA 3T Scanner Acquisition platform used in the validation study. Vendor-specific
fMRI Precision Drawing Task Block-design motor task to elicit robust, lateralized activation. Custom implementation based on STEGA app [73]

The OGRE pipeline represents a significant methodological advancement for improving the reliability of fMRI data. By replacing multi-step with one-step interpolation, OGRE directly targets a key source of technical variance, leading to measurably lower inter-subject variability and stronger detection of true task-related brain activity. For researchers and drug development professionals, adopting OGRE can enhance the sensitivity of studies seeking to identify robust functional biomarkers, characterize patient subgroups, and evaluate the efficacy of therapeutic interventions. Its compatibility with the widely used FSL FEAT for statistical analysis facilitates integration into existing workflows, making it a practical and powerful tool for enhancing fMRI preprocessing pipeline reliability.

Functional magnetic resonance imaging (fMRI) preprocessing is a critical foundation for valid neuroimaging research, yet standard pipelines often fail to address the unique challenges presented by special populations. Pediatric subjects and neurological patients exhibit characteristics, such as increased head motion and atypical brain anatomy, that can severely degrade data quality and confound results if not properly managed [76]. These challenges directly impact the reliability and interpretability of findings in both basic neuroscience and clinical drug development contexts. The imperative for tailored preprocessing strategies is underscored by benchmarking studies showing that inappropriate pipeline choices can produce systematically misleading results, while optimized workflows consistently satisfy criteria for reliability and sensitivity across diverse datasets [77]. This Application Note provides detailed protocols and analytical frameworks designed to enhance fMRI preprocessing for these vulnerable populations, with a focus on practical implementation within a broader research program investigating pipeline reliability.

Specific Challenges in Special Populations

Pediatric Populations

Children present distinct challenges for fMRI acquisition and preprocessing. Studies have documented that young children (aged 4-8 years) exhibit significantly higher head motion compared to adults, which introduces spurious correlations in functional connectivity metrics [78]. This motion problem is compounded by smaller brain sizes and ongoing neurodevelopment, which complicate anatomical normalization and segmentation. Furthermore, children often have lower tolerance for scanner environments, resulting in increased anxiety and movement that further degrades data quality. These challenges necessitate specialized preprocessing approaches that go beyond standard motion correction techniques.

Neurological Patients

Patients with neurological conditions such as cerebral palsy (CP) present a dual challenge: excessive head motion during scanning and significant anatomical variations due to underlying neuropathology [76]. These anatomical anomalies include atrophy, lesions, and malformations that disrupt standard spatial normalization processes. Conventional whole-brain analysis pipelines often fail when brains deviate substantially from neurotypical templates, requiring alternative registration and normalization strategies. The presence of abnormal neurovascular coupling in certain patient populations further complicates the interpretation of the blood-oxygen-level-dependent (BOLD) signal, as the fundamental relationship between neural activity and hemodynamic response may be altered.

Quantitative Benchmarking of Preprocessing Strategies

Pipeline Efficacy in High-Motion Pediatric Data

Systematic benchmarking of preprocessing strategies in early childhood populations (ages 4-8) has revealed critical insights into optimal pipeline configurations. The following table summarizes key findings from a comprehensive evaluation of different preprocessing combinations:

Table 1: Efficacy of Different Preprocessing Strategies in Pediatric fMRI

Preprocessing Component Options Tested Impact on Data Quality Recommendation for Pediatric Data
Global Signal Regression (GSR) With vs. Without GSR Minimal impact on connectome fingerprinting; improved intersubject correlation (ISC) Include GSR for task-based studies; consider for resting-state
Motion Censoring Various thresholds (e.g., FD < 0.2-0.5 mm) Strict censoring reduced motion-correlated edges but negatively impacted identifiability Use moderate censoring thresholds balanced with retention of data
Motion Correction Strategy ICA-AROMA vs. HMP regression ICA-AROMA performed similarly to HMP regression; neither obviated need for censoring Combine ICA-AROMA with moderate censoring for optimal results
Filtering Bandpass filtering (e.g., 0.01-0.1 Hz) Essential for removing physiological noise Include bandpass filtering alongside HMP regression
Optimal Pipeline Combination Censoring + GSR + bandpass filtering + HMP regression Most efficacious for both noise removal and information recovery Recommended as default for high-motion pediatric data

This benchmarking study demonstrated that the most efficacious pipeline for both noise removal and information recovery in children included censoring, GSR, bandpass filtering, and head motion parameter (HMP) regression [78]. Importantly, ICA-AROMA performed similarly to HMP regression and did not eliminate the need for censoring, indicating that multiple motion mitigation strategies must be employed in concert.

Pipeline Reliability in Functional Connectomics

A systematic evaluation of 768 data-processing pipelines for network reconstruction from resting-state fMRI revealed vast variability in pipelines' suitability for functional connectomics [77]. The evaluation used multiple criteria, including minimization of motion confounds, reduction of spurious test-retest discrepancies, and sensitivity to inter-subject differences. Key findings included:

Table 2: Performance Metrics for Network Construction Pipelines in Test-Retest Scenarios

Pipeline Component Options Evaluated Reliability Impact Recommendation
Global Signal Regression Applied vs. Not Applied Significant impact on reliability metrics; context-dependent Use consistent GSR approach across study; document choice
Brain Parcellation Anatomical vs. Functional vs. Multimodal; Various node numbers (100-400) Node definition critically influences reliability Use multimodal parcellations with 200-300 nodes for balanced reliability
Edge Definition Pearson correlation vs. Mutual information Moderate impact; correlation more reliable for most applications Prefer Pearson correlation for standard functional connectivity
Edge Filtering Density-based (5-20%) vs. Weight-based (0.3-0.5) vs. Data-driven Filtering approach significantly affects network topology Use data-driven methods (ECO, OMST) for optimal balance
Network Type Binary vs. Weighted Differential effects on reliability metrics Weighted networks generally preferred for functional connectomics

The study revealed that inappropriate choice of data-processing pipeline can produce results that are not only misleading but systematically so, with the majority of pipelines failing at least one criterion [77]. However, a subset of optimal pipelines consistently satisfied all criteria across different datasets, spanning minutes, weeks, and months, providing clear guidance for robust functional connectomics in special populations.

Detailed Experimental Protocols

Real-Time Motion Monitoring and Feedback Protocol

For populations with high motion characteristics (children and neurological patients), real-time motion feedback provides an effective strategy for minimizing head movement during acquisition [76].

Materials and Software Requirements:

  • MRI scanner with capability for real-time processing
  • Real-time motion tracking software (e.g., AFNI with real-time plugin)
  • Visual feedback display system visible from scanner
  • Customizable motion threshold settings

Procedure:

  • Setup and Calibration: Configure real-time motion tracking to calculate a composite motion index (M(t)) from three translational and three rotational parameters at each repetition time (TR). Set a motion threshold (e.g., M(t) < 1) for acceptable motion levels.
  • Participant Preparation: Instruct participants to remain still during scanning with the goal of keeping feedback indicators within acceptable range. Practice the procedure outside the scanner with a mock setup.
  • Feedback Implementation: Program visual feedback (e.g., color-changing arrows or bars) that reflects real-time motion levels. Increased intensity or color change (blue to red) should indicate increasing degrees of motion.
  • Data Acquisition: Begin with anatomical T1-weighted scan, followed by resting-state fMRI with real-time motion feedback. Conclude with task-based fMRI if required.
  • Quality Assessment: Calculate framewise displacement (FD) post-scan. For the translational component: FD_translation(t) = √[(hp_x(t) - hp_x(t-1))² + (hp_y(t) - hp_y(t-1))² + (hp_z(t) - hp_z(t-1))²]. Exclude datasets exceeding predetermined quality thresholds (e.g., mean FD > 0.5 mm).
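A minimal implementation of the post-scan FD computation is sketched below. The translational term follows the formula above; the rotational contribution, converted to millimeters with an assumed 50 mm head radius in the common Power-style convention, is an addition not spelled out in the protocol.

```python
import numpy as np

def framewise_displacement(motion_params, radius=50.0):
    """Framewise displacement from realignment parameters (sketch).

    motion_params: (T, 6) array; columns = 3 translations (mm) and
                   3 rotations (radians), e.g., as output by MCFLIRT.
    radius:        assumed head radius (mm) used to convert rotations
                   to millimeters (Power-style convention, our choice).
    """
    d = np.diff(motion_params, axis=0)
    trans = np.sqrt((d[:, :3] ** 2).sum(axis=1))   # translational FD (above)
    rot = np.abs(d[:, 3:]).sum(axis=1) * radius    # rotational contribution
    return np.concatenate([[0.0], trans + rot])    # FD at t=0 defined as 0

# Usage: exclude a run if mean FD exceeds the protocol threshold
# fd = framewise_displacement(params); exclude_run = fd.mean() > 0.5
```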

This protocol has been successfully implemented in children with cerebral palsy, significantly improving data quality and completion rates [76].

Quality Control Protocol for Preprocessing of Challenging Data

Based on SPM and MATLAB, this protocol provides comprehensive quality control for each preprocessing step [79].

Initial Data Check:

  • Verify consistency of imaging parameters across participants (number of volumes, TR, voxel sizes).
  • Check anatomical images for artifacts, coverage, and orientation using SPM Check Registration functionality.
  • Manually reorient images far from MNI template or reset origin to anterior commissure.

Anatomical Image Processing:

  • Segment anatomical images into gray matter, white matter, and CSF using SPM Segment.
  • Generate bias-corrected anatomical image for functional-anatomical registration.
  • Check segmentation quality by overlaying tissue probability maps onto MNI template.

Functional Image Processing:

  • Realign functional images to correct for head motion using SPM Realign: Estimate & Reslice.
  • Calculate framewise displacement (FD) from rigid body transformation parameters.
  • Coregister functional and anatomical images, considering skull-stripping to improve registration.
  • Normalize images to standard space (MNI) using deformation fields from segmentation.
  • Apply spatial smoothing (if desired) and temporal filtering.

Exclusion Criteria:

  • Maximum framewise displacement > 1.5mm [79]
  • Coregistration errors visible as misalignment > 3mm
  • Normalization failures resulting in poor overlap with template
  • Artifacts (ghosting, signal dropouts) affecting > 10% of brain volume

[Diagram: Real-Time Motion Feedback Protocol for Pediatric and Patient fMRI (Cerebral Palsy Application). Participant Preparation: Practice in Mock Scanner → Explain Feedback System → Position in Scanner with Visual Display. Real-Time Monitoring: Calculate Composite Motion Index M(t) → Compare M(t) to Threshold (e.g., < 1) → Display Visual Feedback (Blue = Acceptable, Red = Excessive), looping continuously. Data Acquisition Sequence: T1-Weighted Anatomical Scan → Resting-State fMRI with Motion Feedback → Task-Based fMRI (If Required). Post-Scan Quality Control: Calculate Framewise Displacement (FD) → Apply Exclusion Criteria (Mean FD > 0.5 mm).]

The Scientist's Toolkit: Essential Research Reagents and Solutions

Table 3: Essential Tools for Specialized fMRI Preprocessing

Tool/Software Application Key Function Special Population Utility
fMRIPrep Automated preprocessing Robust, analysis-agnostic pipeline adapting to dataset idiosyncrasies Handles diverse data quality; minimal manual intervention required [33] [34]
ICA-AROMA Motion artifact removal ICA-based strategy for removing motion artifacts Effective for high-motion data without necessitating complete volume removal [78]
Real-time AFNI Motion monitoring Real-time calculation and display of motion parameters Enables motion feedback during scanning for pediatric/patient populations [76]
SPM with QC Tools Processing and quality control Statistical parametric mapping with quality control protocol Systematic identification of artifacts and processing failures [79]
ANTs Registration and normalization Advanced normalization tools for atypical anatomy Improved spatial normalization for brains with lesions or atrophy [33]
Custom Censoring Scripts Data scrubbing Identification and removal of high-motion volumes Critical for high-motion datasets; customizable thresholds [78]

Advanced Analytical Framework

Surface-Based Processing for Atypical Anatomy

For neurological patients with significant anatomical abnormalities, surface-based processing provides an alternative to volume-based approaches. This method reconstructs cortical surfaces from T1-weighted images, enabling analysis that is less constrained by gross anatomical distortions. The Human Connectome Project pipelines demonstrate the utility of this approach for maintaining functional-analytical alignment in brains with atypical anatomy [77].

Laterality Indices for Functional Mapping in Pathological Brains

For patients with cerebral palsy or other conditions causing anatomical reorganization, standard bilateral approaches to functional localization may be insufficient. The calculation of laterality indices (LI) provides a quantitative measure of hemispheric dominance that accommodates individual neuroanatomical variations [76]. The modified LI formula for patients with unilateral involvement is LI(Unaffected) = (U - A)/(U + A), where U represents voxels from the unaffected hemisphere and A represents voxels from the affected hemisphere. This approach enables valid functional assessment even in the presence of significant anatomical abnormality.
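A minimal computation of this index is sketched below; the suprathreshold cutoff and function name are illustrative assumptions, since the protocol does not fix a specific threshold [76].

```python
import numpy as np

def laterality_index(stat_map, unaffected_mask, affected_mask, thresh=3.1):
    """Modified laterality index LI = (U - A) / (U + A) (sketch).

    stat_map:  voxelwise statistic (e.g., z-scores from the GLM)
    *_mask:    boolean masks for the unaffected / affected hemispheres
    thresh:    suprathreshold cutoff (z = 3.1 is an assumed example)
    """
    u = int(np.sum(stat_map[unaffected_mask] > thresh))  # U: active voxels
    a = int(np.sum(stat_map[affected_mask] > thresh))    # A: active voxels
    return (u - a) / (u + a) if (u + a) > 0 else 0.0
```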

[Diagram: Comprehensive Quality Control Protocol for Challenging fMRI Data. Initial Data Check: Check Imaging Parameters (TR, Voxel Size, Volumes) → Inspect Anatomical Images for Artifacts/Orientation → Inspect Functional Images for Coverage/Artifacts → Manual Reorientation if Needed. Anatomical Processing: Tissue Segmentation (GM, WM, CSF) → Segmentation Quality Check (Verify Tissue Classification) → Skull-Stripping (Improves Coregistration). Functional Processing: Realignment & Motion Correction → Calculate Framewise Displacement (FD) → Coregister Functional & Anatomical Images → Spatial Normalization to Standard Space. Exclusion decision points: significant artifacts (ghosting, dropouts), excessive motion (FD > 1.5 mm), or normalization failure (poor template overlap) each lead to dataset exclusion.]

Tailoring fMRI preprocessing strategies to address the unique challenges of pediatric and neurological patient populations is essential for generating valid, interpretable results. The protocols and frameworks presented here provide a foundation for robust analysis of data from these special populations. Implementation of real-time motion monitoring, systematic quality control, and tailored analytical approaches can significantly enhance data quality and analytical validity. As the field moves toward increasingly standardized processing frameworks like fMRIPrep [33] [34], incorporating these population-specific adaptations will be crucial for advancing both basic neuroscience and clinical applications. Future developments in foundation models for fMRI analysis [6] promise further improvements in handling data variability, potentially offering more generalized solutions to these persistent challenges.

Leveraging Deep Neural Networks for Adaptive Spatial Filtering in Subject-Level Analysis

Functional magnetic resonance imaging (fMRI) has emerged as a predominant technique for mapping human brain activity in vivo. However, the blood oxygen level-dependent (BOLD) signal measured by fMRI is contaminated by substantial structured and unstructured noise from various sources, complicating statistical analysis [67]. A critical preprocessing step designed to enhance the signal-to-noise ratio (SNR) is spatial smoothing, traditionally implemented using isotropic Gaussian filters with a heuristically selected full-width at half maximum (FWHM). While this approach benefits group-level analysis by mitigating anatomical variability across subjects, it introduces significant spatial blurring artifacts at the subject level, where inactive voxels near active regions may be mistakenly identified as active [67]. This limitation is particularly problematic for clinical applications such as presurgical planning and fMRI fingerprinting, which demand high spatial specificity [67].

The pursuit of improved reliability in single-subject fMRI is a pressing concern in neuroscience. Common task-fMRI measures have demonstrated poor test-retest reliability, with a meta-analysis revealing a mean intra-class correlation coefficient (ICC) of just .397 [80]. This unreliability undermines the suitability of fMRI for brain biomarker discovery and individual-differences research. Furthermore, the conventional Gaussian smoothing method applies uniform filtering across all voxels, disregarding the complex, folded anatomy of the cerebral cortex [67]. This often results in reduced spatial specificity and compromised accuracy in subject-level activation maps.

This application note explores the transformative potential of deep neural networks (DNNs) for adaptive spatial filtering in subject-level fMRI analysis. By moving beyond rigid, heuristic-based smoothing, DNNs can learn data-driven spatial filters that adapt to local brain architecture and activation patterns, thereby enhancing both the reliability and spatial precision of single-subject fMRI results.

Current Limitations of Standard Spatial Smoothing

Spatial Specificity Trade-offs in Conventional Methods

Traditional Gaussian spatial smoothing presents a fundamental trade-off: while it improves SNR and facilitates group-level analysis by reducing inter-subject anatomical variability, it concurrently dilates active regions and reduces spatial precision at the individual subject level. The cerebral cortex is a thin, highly folded sheet of gray matter, and the application of a fixed isotropic filter fails to respect this complex geometry. Consequently, inactive voxels adjacent to truly active regions can be misclassified as active, leading to false positives and reduced spatial accuracy [67]. This is a critical limitation for applications where precise localization is paramount, such as in pre-surgical mapping to determine the relationship between cortical functional areas and pathological structures like tumors [67].

The Reliability Challenge in Subject-Level fMRI

The reliability of fMRI is a cornerstone for its valid application in both basic research and clinical settings. However, empirical evidence consistently highlights a reliability crisis. A comprehensive meta-analysis of 90 experiments (N=1,008) found the overall reliability of task-fMRI measures to be poor (mean ICC = .397) [80]. Subsequent analyses of data from the Human Connectome Project and the Dunedin Study further confirmed these concerns, showing test-retest reliabilities across 11 common fMRI tasks ranging from ICCs of .067 to .485 [80]. Such low reliability renders many common fMRI measures unsuitable for studying individual differences or for use as diagnostic or prognostic biomarkers. Improving preprocessing methodologies, including spatial smoothing, is therefore essential to enhance the validity and utility of subject-level fMRI.

Deep Neural Networks for Adaptive Filtering

Architectural Framework and Inductive Biases

Deep learning offers a paradigm shift from fixed-parameter smoothing to adaptive, data-driven spatial filtering. The core idea is to use a DNN to estimate the optimal spatial smoothing for each voxel individually, based on the characteristics of the surrounding voxels' time series and, optionally, brain tissue properties [67]. The inductive bias of the chosen architecture—the set of assumptions it makes about the data—profoundly influences its performance.

Recent systematic comparisons of architectures for fMRI time-series classification have demonstrated that Convolutional Neural Networks (CNNs) consistently outperform other models like Long Short-Term Memory networks (LSTMs) and Transformers in settings with limited data [81]. CNNs embody a strong locality bias, assuming that nearby voxels are correlated and that patterns can recur across space. This aligns well with the spatial structure of fMRI data, where functionally related neural activity often occurs in localized clusters. In contrast, LSTMs, with their bias for modeling temporal sequences, and Transformers, which excel at capturing global dependencies, may be less data-efficient for extracting spatially localized features [81].

A proposed DNN model for adaptive smoothing comprises multiple 3D convolutional layers followed by fully connected layers [67]. The 3D convolutional layers act as data-driven spatial filters learned directly from the data, allowing them to adapt flexibly to various activation profiles without requiring pre-specified filter shapes or orientations. The fully connected layers then assign weights to the smoothed time series from the convolutional layers, producing a final optimized time series for each voxel [67]. To ensure numerical stability and interpretability, a sum constraint is typically applied to the convolutional layers, and a non-negative constraint is applied to the fully connected layers.

Comparative Performance Against Existing Adaptive Methods

Previous adaptive spatial smoothing methods, such as constrained Canonical Correlation Analysis (CCA), aimed to address the limitations of Gaussian smoothing. These methods tailor smoothing parameters for each voxel based on the time series of surrounding voxels [67]. However, they often face significant computational bottlenecks. For instance, including more neighboring voxels in a constrained CCA framework can lead to an exponentially increasing number of sub-problems (e.g., 2^26 for a 3x3x3 neighborhood), making the approach computationally prohibitive for high-resolution fMRI data [67].

The DNN-based approach surmounts this limitation by using deeper convolutional layers to incorporate information from a larger neighborhood of voxels without a prohibitive increase in computational cost. This makes it particularly suitable for modern ultrahigh-resolution (sub-millimeter) task fMRI data, where the inclusion of more neighboring voxels is beneficial due to the finer spatial resolution [67].

Table 1: Comparison of Spatial Smoothing Methods for fMRI

Method Key Principle Spatial Specificity Computational Efficiency Suitability for High-Res Data
Isotropic Gaussian Smoothing Applies a fixed Gaussian kernel to all voxels. Low (introduces blurring) High Moderate
Constrained CCA Adapts smoothing using voxel-wise CCA with constraints on neighbors. Moderate to High Low (exponentially complex with more neighbors) Low
Weighted Wavelet Denoising [82] Uses a weighted 3D discrete wavelet transform for denoising. Moderate Moderate Moderate
DNN-based Adaptive Smoothing [67] Uses deep 3D CNNs to learn data-driven spatial filters. High High (after initial training) High

Experimental Protocols and Validation

Protocol for DNN Implementation and Training

The following protocol outlines the steps for implementing and validating a DNN for adaptive spatial smoothing of task fMRI data, as derived from the referenced research [67].

  • Data Preparation and Preprocessing:

    • Begin with unsmoothed, preprocessed task fMRI data in a standardized space (e.g., MNI) or native subject space. Preprocessing should include standard steps like motion correction, distortion correction, and co-registration, which can be robustly performed using tools like fMRIPrep [55] [33] [34].
    • Partition the 4D fMRI data (time × space) into smaller batches (e.g., n × T × x × y × z × 1) to manage memory load during training, where n is the batch size, T is the number of time points, and x, y, z are spatial dimensions.
  • Model Architecture Specification:

    • Input Layer: Accepts a batch of voxel time series from a 3D neighborhood.
    • 3D Convolutional Layers: Stack multiple 3D convolutional layers with a kernel size of 3x3x3. The number of filters (F_i) in each layer can be progressively increased (e.g., 32, 64, 128). These layers learn to extract spatially localized features from the input data.
    • Fully Connected Layers: Follow the convolutional blocks with one or more fully connected layers. These layers integrate the features extracted from the convolutional layers to produce the final smoothed time series for the center voxel.
    • Constraints: Apply a sum constraint to the weights of the convolutional layers to maintain stability and a non-negative constraint to the fully connected layers to ensure interpretability.
  • Model Training:

    • The model is trained in a supervised manner. The specific loss function is not detailed in the source publication, but a typical choice for a regression task (predicting a cleaned time series) would be the mean squared error (MSE) between the model's output and a target signal.
    • Training aims to minimize this loss, effectively teaching the network to denoise and smooth the fMRI time series in a way that is adaptive to the local spatial and temporal context.
  • Output and Inference:

    • The trained model takes unsmoothed fMRI data and outputs the DNN-smoothed time series for each voxel.
    • These smoothed time series can then be fed into a standard General Linear Model (GLM) for statistical analysis of brain activation.
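To ground the architecture specification above, the following PyTorch sketch wires together the 3D convolutional stack, the fully connected read-out, and projection steps for the two constraints. The filter counts follow the example in the protocol, but the projection scheme, per-volume application, and layer details are our assumptions rather than the published model [67].

```python
import torch
import torch.nn as nn

class AdaptiveSmoother(nn.Module):
    """3D-CNN adaptive smoothing model (illustrative sketch of [67]).

    Input:  (batch, 1, x, y, z) cubic patch of one fMRI volume
    Output: smoothed value for the patch's center voxel; applying the
            model to every time point yields the smoothed time series.
    """
    def __init__(self):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(128, 1)  # non-negative weights (projected below)

    def forward(self, x):
        h = self.convs(x)            # data-driven spatial filters
        c = h.shape[-1] // 2         # assumes a cubic input patch
        center = h[..., c, c, c]     # feature vector at the center voxel
        return self.fc(center)

    @torch.no_grad()
    def project_constraints(self):
        """Re-impose the sum constraint (conv) and non-negativity (fc)."""
        for m in self.convs:
            if isinstance(m, nn.Conv3d):
                s = m.weight.sum(dim=(1, 2, 3, 4), keepdim=True)
                m.weight.div_(s.abs() + 1e-8)  # normalize filter sums
        self.fc.weight.clamp_(min=0.0)         # keep FC weights non-negative
```

In this sketch, `project_constraints()` would be called after each optimizer step so the learned filters remain stable and interpretable.
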
Protocol for Assessing Single-Subject fMRI Reliability

To validate the improvement offered by any novel preprocessing method, including DNN-based smoothing, its impact on single-subject reliability must be quantitatively assessed. The following protocol, leveraging the Intra-class Correlation Coefficient (ICC), is recommended [83] [84].

  • Data Acquisition:

    • Acquire test-retest fMRI data from a cohort of participants (e.g., N ≥ 20). Each participant should undergo at least two identical scanning sessions on different days, performing the same task (e.g., finger-tapping with multiple runs) [84].
  • Data Processing with Experimental and Control Pipelines:

    • Process the data from both sessions using two parallel pipelines:
      • Pipeline A (Experimental): Includes the novel DNN-based adaptive spatial smoothing.
      • Pipeline B (Control): Uses conventional isotropic Gaussian smoothing.
    • Otherwise, keep all other preprocessing steps (e.g., in fMRIPrep) and first-level analysis (GLM) identical.
  • Calculation of Intra-Run Variability (Optional but Informative):

    • For each subject and run, compute a map of intra-run variability (IRV). This can be done by analyzing individual task blocks separately within a run and calculating the variance of the activation parameter estimates across blocks [84].
    • This IRV map can be used to weight standard GLM activation maps, a method that has been shown to improve reliability by emphasizing voxels with more stable responses [84].
  • Voxel-Wise ICC Calculation:

    • For each pipeline, calculate the voxel-wise ICC across the two sessions for all subjects. The ICC measures the consistency of activation values between sessions. A common form of ICC used for this purpose assesses whether voxels that show high activation in one session will reliably show high activation in a second session for the same subject relative to other subjects [83].
    • The formula for a voxel-wise ICC (e.g., ICC(2,1)) is based on a repeated-measures analysis of variance (ANOVA), partitioning the variance into between-subject and within-subject (error) components.
  • Statistical Comparison:

    • Compare the distribution of ICC values (e.g., mean ICC, proportion of voxels with ICC > 0.5) between Pipeline A (DNN) and Pipeline B (Gaussian) within a priori regions of interest or across the whole brain.
    • Statistical tests (e.g., paired t-tests on ICC values within a mask) can determine if the DNN method leads to a significant improvement in reliability.
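The voxel-wise reliability computation described above can be implemented compactly. The sketch below computes ICC(2,1) from the two-way random-effects variance decomposition for a single voxel's subjects-by-sessions matrix and would be run voxelwise under each pipeline; the function name is ours.

```python
import numpy as np

def icc_2_1(data):
    """ICC(2,1), two-way random effects, absolute agreement (sketch).

    data: (n_subjects, k_sessions) activation values for a single voxel.
    Returns the voxelwise test-retest reliability estimate.
    """
    n, k = data.shape
    grand = data.mean()
    ms_rows = k * ((data.mean(axis=1) - grand) ** 2).sum() / (n - 1)  # subjects
    ms_cols = n * ((data.mean(axis=0) - grand) ** 2).sum() / (k - 1)  # sessions
    resid = (data - data.mean(axis=1, keepdims=True)
                  - data.mean(axis=0, keepdims=True) + grand)
    ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```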

Table 2: Key Metrics for Validating DNN-based Adaptive Smoothing

Validation Metric Description Interpretation
Peak Signal-to-Noise Ratio (PSNR) [82] Ratio between the maximum possible power of a signal and the power of corrupting noise. Higher PSNR indicates better denoising performance.
Structural Similarity Index (SSIM) [82] Measures the perceived quality and structural similarity between two images. Higher SSIM indicates better preservation of image structure.
Intra-class Correlation Coefficient (ICC) [83] [80] Measures test-retest reliability of activation values across scanning sessions. ICC > 0.6 indicates good reliability; ICC > 0.75 indicates excellent reliability.
Sensitivity & Specificity Ability to correctly identify truly active (sensitivity) and inactive (specificity) voxels. Assessed against a "ground truth" from simulations or high-quality data.

Visualization of Workflows

The following diagram illustrates the core architecture of a DNN for adaptive spatial smoothing and its position within a broader fMRI processing workflow.

[Diagram: Conventional vs. DNN-enhanced fMRI pipelines. Inputs (Raw fMRI Data, T1-Weighted Anatomical) → Standard Preprocessing (e.g., fMRIPrep). Conventional branch: Isotropic Gaussian Smoothing → General Linear Model (GLM) → Group-Level Results. DNN-enhanced branch: Unsmoothed fMRI Data → 3D Convolutional Layers (data-driven spatial filters) → Fully Connected Layers (non-negative weights) → DNN-Smoothed Time Series → General Linear Model (GLM) → Subject-Level & Group-Level Results (Higher Spatial Specificity).]

Diagram Title: fMRI Processing Workflows Comparing Conventional and DNN Methods

Table 3: Key Software Tools and Resources for DNN-based fMRI Analysis

Tool/Resource Type Primary Function Relevance to DNN Adaptive Filtering
fMRIPrep [55] [33] [34] Software Pipeline Robust, standardized preprocessing of fMRI data. Provides high-quality, minimally preprocessed data that is essential for training and applying DNN models. Reduces uncontrolled spatial smoothness.
Nipype [3] Python Framework Facilitates interoperability between neuroimaging software packages. Enables the integration of DNN models (e.g., in TensorFlow/PyTorch) with traditional neuroimaging workflows.
fMRIflows [3] Software Pipeline Fully automatic pipelines for univariate and multivariate fMRI analysis. Extends preprocessing (e.g., via fMRIPrep) to include flexible spatial filtering, which can be a foundation for integrating DNN smoothing.
ANTs [55] [33] Software Library Advanced normalization and segmentation tools. Used within fMRIPrep for spatial normalization and brain extraction, ensuring data is in a standard space for model application.
FSL [55] [33] Software Library FMRI analysis tools (e.g., MELODIC ICA, FIX). Used for noise component extraction and other preprocessing steps that can complement DNN smoothing.
HCP Datasets [81] Data Resource Publicly available high-resolution fMRI data (7T). Ideal for training and validating DNN models on data with high spatial and temporal resolution.
ICC Reliability Toolbox [83] Software Tool Calculates voxel-wise intra-class correlation coefficients. Critical for quantitatively assessing the improvement in test-retest reliability afforded by the DNN method.

Validating and Comparing Pipelines: Metrics for Performance, Reproducibility, and Clinical Utility

Functional magnetic resonance imaging (fMRI) has become an indispensable tool for studying brain function in both basic research and clinical applications. However, the flexibility in data acquisition and analysis has led to challenges in reproducibility and transferability of findings, which is particularly critical in contexts like clinical trials and drug development. This application note establishes a procedural framework for employing quantitative performance metrics—specifically prediction accuracy and reproducibility—to optimize fMRI processing pipelines. Grounded in the broader thesis of improving fMRI preprocessing pipeline reliability, this document provides detailed protocols and data presentations tailored for researchers, scientists, and drug development professionals. The guidelines herein are designed to ensure that fMRI methodologies meet the stringent requirements for biomarker qualification in regulatory settings such as the FDA and EMA, where demonstrating reproducibility and a link to clinical outcomes is paramount [2].

Performance Metrics and Their Quantitative Comparison

The evaluation of fMRI pipelines hinges on two cornerstone metrics: reproducibility, which measures the stability of results across repeated measurements, and prediction accuracy, which assesses the model's ability to correctly classify or predict outcomes. Different experimental paradigms and analysis techniques yield varying performances in these metrics, as summarized in the table below.

Table 1: Quantitative Performance Metrics of fMRI Paradigms and Analysis Models

fMRI Paradigm / Analysis Model Key Metric Reported Performance Context of Use
Words (event-related) [85] Between-sessions reliability of lateralization; Classification of TLE patients Lateralization most reliable; Significantly above-chance classification at all sessions Temporal Lobe Epilepsy (TLE) memory assessment
Hometown Walking & Scenes (block) [85] Between-sessions spatial reliability Best between-sessions reliability and spatial overlap Memory fMRI in TLE
Landmark Task [86] Reliability of Hemispheric Lateralization Index (LI) LI reliably determined (>62% for LI >0.4; >93% for left/right categories); "Fair" to "good" reliability of LI strength Visuospatial processing assessment
i-ECO Method [87] Diagnostic precision (Precision-Recall AUC) >84.5% PR-AUC for schizophrenia, bipolar, ADHD Psychiatric disorder diagnosis
NeuroSTORM Foundation Model [6] Gender prediction accuracy 93.3% accuracy on HCP-YA dataset General-purpose fMRI analysis
Java-based Pipeline (GLM & CVA) [88] Prediction accuracy and SPI reproducibility System successfully ranked pipelines; Performance highly dependent on preprocessing General fMRI processing pipeline evaluation

The evidence demonstrates a trade-off: some protocols excel in spatial reproducibility (e.g., Hometown Walking), while others show superior predictive classification (e.g., the event-related Words task, i-ECO, and NeuroSTORM). The choice of paradigm and model must therefore be fit-for-purpose.

Detailed Experimental Protocols

Below are detailed methodologies for key experiments cited in this note, providing a blueprint for implementation and validation.

Protocol for Assessing Memory fMRI Lateralization in Epilepsy

This protocol is designed to evaluate the reliability of memory paradigms for pre-surgical mapping in Temporal Lobe Epilepsy (TLE) [85].

1. Objective: To identify the fMRI memory protocol with the optimal combination of between-sessions reproducibility and ability to correctly lateralize seizure focus in TLE patients.
2. Experimental Design:
  • Participants: 16 patients with diagnosed TLE.
  • Paradigms: Seven memory fMRI protocols are administered: Hometown Walking (block design); scene encoding (block and event-related); picture encoding (block and event-related); and word encoding (block and event-related).
  • Session Structure: Each participant undergoes all seven protocols across three separate scanning sessions to assess test-retest reliability.
3. Data Acquisition:
  • Use a 3T MRI scanner.
  • Acquire BOLD fMRI data with a gradient-echo EPI sequence sensitive to T2* contrast.
  • Maintain consistent scanning parameters (TR, TE, voxel size, number of slices) across all sessions.
4. Data Analysis:
  • Preprocessing: Conduct standard steps including realignment, normalization, and smoothing.
  • Activation Analysis: Use a General Linear Model (GLM) to generate individual activation maps for each protocol and session.
  • Lateralization Index (LI): Calculate an LI for the temporal lobe for each protocol and session.
  • Reproducibility Metrics: Compute the voxelwise intraclass correlation coefficient (ICC) across the three sessions for each protocol to assess spatial reliability, and calculate the spatial overlap of activated voxels between sessions.
  • Prediction Accuracy: Use Receiver Operating Characteristic (ROC) analysis to evaluate each protocol's ability to classify patients as having left-onset or right-onset TLE.
5. Interpretation: The Words (event-related) protocol demonstrated the best combination of reliable lateralization across sessions and significantly above-chance diagnostic classification [85].
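Two of the protocol's outcome computations, between-sessions spatial overlap and ROC-based lateralization classification, are sketched below; the Dice definition of overlap and the direct use of LI values as classifier scores are our assumptions about the exact implementation [85].

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def dice_overlap(act_a, act_b):
    """Spatial overlap (Dice) of suprathreshold masks from two sessions.

    act_a, act_b: boolean activation masks from the same protocol,
                  acquired in two different sessions.
    """
    inter = np.logical_and(act_a, act_b).sum()
    return 2.0 * inter / (act_a.sum() + act_b.sum())

def lateralization_auc(li_values, left_onset):
    """ROC analysis of LI-based seizure-onset classification (sketch).

    li_values:  per-patient lateralization indices for one protocol
    left_onset: boolean ground-truth labels (True = left-onset TLE)
    """
    return roc_auc_score(left_onset, li_values)
```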

Protocol for Evaluating fMRI Processing Pipelines

This protocol outlines a systematic approach for comparing and ranking different fMRI processing pipelines, as enabled by a Java-based evaluation system [88].

1. Objective: To evaluate and rank the performance of heterogeneous fMRI processing pipelines based on prediction accuracy and statistical parametric image (SPI) reproducibility.
2. Experimental Design:
  • Pipelines: Select pipelines for comparison (e.g., FSL FEAT with GLM, NPAIRS with CVA).
  • Data: Use a real fMRI dataset acquired from a sensory-motor or cognitive task.
3. Data Processing & Analysis:
  • Pipeline Execution: Run the identical preprocessed dataset through each pipeline.
  • Prediction Accuracy: For each pipeline, use machine learning (e.g., support vector machines) to calculate the classification accuracy of task conditions from the fMRI model outputs. The Java-based system employs a dedicated algorithm to measure GLM prediction accuracy.
  • SPI Reproducibility: Use resampling techniques (e.g., bootstrapping) to generate multiple versions of the statistical parametric map, then quantify the reproducibility of the activation patterns across these resampled maps.
  • Automated Scoring: The system integrates these two metrics into a single, automated performance score for each pipeline.
4. Interpretation: The system successfully ranked pipeline performance, revealing that rank was highly dependent on the specific preprocessing operations chosen, highlighting the critical need for systematic optimization [88].
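The scoring logic of step 3 can be approximated as follows. This sketch uses cross-validated SVM accuracy and a split-half spatial correlation as a stand-in for the bootstrap-based SPI reproducibility; the equal weighting `w` is an assumption, since the Java-based system's exact combination rule is not given [88].

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def score_pipeline(X, y, spm_split_a, spm_split_b, w=0.5):
    """Combine prediction accuracy and SPI reproducibility (sketch).

    X:           (n_scans, n_voxels) pipeline-preprocessed feature matrix
    y:           (n_scans,) task-condition labels
    spm_split_*: statistical parametric maps from two independent data
                 halves (a simple stand-in for bootstrap resampling)
    w:           assumed weighting of the two criteria
    """
    # Prediction accuracy: cross-validated classification of task conditions
    acc = cross_val_score(LinearSVC(), X, y, cv=5).mean()

    # Reproducibility: spatial correlation of the split-half SPMs
    rep = np.corrcoef(spm_split_a.ravel(), spm_split_b.ravel())[0, 1]

    return w * acc + (1 - w) * rep  # higher score = better-ranked pipeline
```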

Workflow Visualization

The following diagrams, generated with Graphviz DOT language, illustrate the logical relationships and workflows described in the protocols.

fMRI Pipeline Evaluation Workflow

This diagram outlines the core procedure for evaluating fMRI processing pipelines as detailed in Section 3.2.

Performance Metrics for Protocol Selection

This diagram illustrates the decision-making process for selecting an fMRI protocol based on its performance profile, derived from data in Section 2.

The Scientist's Toolkit

This section catalogs essential research reagents, software, and datasets critical for implementing the described experiments and achieving high reproducibility and prediction accuracy.

Table 2: Essential Research Reagents and Resources for fMRI Pipeline Optimization

Item Name Type Function and Application
3T MRI Scanner Equipment Provides the necessary magnetic field strength for high-quality BOLD fMRI data acquisition. Fundamental for all protocols.
Gradient-Echo EPI Sequence Pulse Sequence The standard MRI pulse sequence for acquiring T2*-sensitive BOLD fMRI data.
fMRI Paradigms (e.g., Words event-related, Landmark) Experimental Stimulus Standardized tasks to robustly activate target brain networks (memory, visuospatial). Critical for reproducibility.
Java-based Pipeline Evaluation System [88] Software Integrated environment (Fiswidgets + YALE) for evaluating pipelines with GLM and CVA models using prediction accuracy and SPI reproducibility.
NeuroSTORM Foundation Model [6] Software/AI Model A pre-trained model for general-purpose fMRI analysis, enabling efficient transfer learning for tasks like age/gender prediction and disease diagnosis.
i-ECO Analysis Package [87] Software/Method An integrated dimensionality reduction and visualization tool combining ReHo, fALFF, and Eigenvector Centrality for psychiatric diagnosis.
Multi-Session Test-Retest Dataset Data A dataset where the same subjects are scanned multiple times, which is indispensable for calculating reproducibility metrics like ICC.
HCP-YA, ABCD, UK Biobank [6] Data Large-scale, publicly available neuroimaging datasets suitable for pre-training foundation models and benchmarking pipeline performance.

Functional magnetic resonance imaging (fMRI) is a cornerstone technique for mapping human brain activity in cognitive, perceptual, and motor tasks. The validity of its findings, however, is deeply contingent upon the data preprocessing pipeline employed. The neuroimaging community utilizes a diverse inventory of tools, leading to ad-hoc preprocessing workflows customized for nearly every new dataset. This methodological variability challenges the reproducibility and interpretability of results, as differences in preprocessing can substantially influence outcomes. Within this context, three pipelines have garnered significant attention: the established FSL's FEAT, the robust and adaptive fMRIPrep, and the newer OGRE pipeline which incorporates advanced registration techniques. This protocol details a comparative framework for evaluating these three pipelines on task-based fMRI data, providing researchers and drug development professionals with a structured approach to assess pipeline performance based on key metrics such as inter-individual variability and task activation magnitude.

Feature FSL FEAT fMRIPrep OGRE
Primary Analysis Type Volumetric GLM [89] Analysis-agnostic (Volume & Surface) [33] Volumetric for FSL [89]
Core Philosophy Integrated preprocessing and GLM analysis Minimal preprocessing; "glass box" [90] Integration of HCP's one-step resampling for FSL [89]
Brain Extraction BET (Brain Extraction Tool) [89] antsBrainExtraction.sh (ANTs) [91] FreeSurfer parcellation [89]
Registration Approach Multi-step interpolation [73] One-step interpolation [73] One-step resampling/ interpolation [89] [73]
Key Advantage Mature, all-in-one solution Robustness, adaptability, and high-quality reports [33] Aims to reduce inter-individual variability [89]

Materials and Reagents

The Scientist's Toolkit: Essential Research Reagents and Materials

Item Function in the Protocol
Siemens PRISMA 3T MRI Scanner A high-performance MRI scanner for acquiring both functional (BOLD) and structural (T1w, T2w) images.
64-Channel Head Coil Standard radio-frequency coil for receiving signal, crucial for achieving high-quality BOLD images.
MRI-Compatible Drawing Tablet Allows participants to perform a precision drawing task inside the scanner for evoking motor cortex activation [89] [73].
BIDS (Brain Imaging Data Structure) A standardized format for organizing neuroimaging data. Essential for fMRIPrep and recommended for OGRE and modern FSL analyses [33].
T1-weighted MP-RAGE Sequence Provides high-resolution structural anatomy for brain extraction, tissue segmentation, and functional-to-structural registration.
Spin Echo Field Maps Acquired to estimate and correct for B0 field inhomogeneities that cause susceptibility distortions in the EPI images [89].

Experimental Design and Methodology

Participant Cohort

The comparative data is derived from a study involving right-handed adult volunteers (N=53; 38 female; ages 47 ± 18). A subset of participants had peripheral nerve injuries to the right arm, though group differences were not the focus of the pipeline comparison [73].

fMRI Task Paradigm

Participants performed a precision drawing task with their right hand, based on the STEGA app, during fMRI scanning. The task used a block design:

  • Active Block: 15.2 seconds (23 images) of drawing.
  • Control Block: 15.2 seconds (23 images) of rest (fixation cross).
  • Run Structure: 10 cycles of draw/rest, with an initial rest block, totaling 5 minutes and 23 seconds per run. Each participant completed three such BOLD runs with the right hand [73]. This task is designed to robustly activate the contralateral primary motor cortex.

Data Acquisition Parameters

  • Functional (BOLD) Scans: T2*-weighted gradient-echo EPI sequence; TR=662 ms, TE=30 ms, flip angle=52°, resolution=3.0×3.0×3.0 mm, multi-band acceleration=6x [73].
  • Structural Scans: T1-weighted (MP-RAGE) and T2-weighted images were acquired at 1.0 mm isotropic resolution [89] [73].

Data Preprocessing Protocols

Parallel Preprocessing Workflows

The core of this framework involves preprocessing the same dataset with three different pipelines, while keeping all subsequent statistical analysis identical (conducted with FSL's FEAT). This ensures that any differences in the final results are attributable to the preprocessing steps.

[Figure 1 diagram: Raw fMRI & Anatomical Data → {FSL FEAT Preprocessing, fMRIPrep Preprocessing, OGRE Preprocessing} → Identical FSL FEAT GLM Analysis.]

Figure 1: High-level overview of the comparative preprocessing workflow. The same raw data is processed through three parallel pipelines, with an identical statistical analysis performed on the outputs of each.

Detailed Pipeline Steps

FSL FEAT Preprocessing

This pipeline represents the standard, integrated preprocessing within FSL.

  • Brain Extraction: Uses FSL's Brain Extraction Tool (BET) [89].
  • Head Motion Correction: MCFLIRT.
  • Spatial Transformation: Multi-step interpolation for registration to standard space [73].
  • Limitations: BET can be prone to under- or over-extraction of brain voxels, and multi-step interpolation may introduce unwanted spatial blurring [89] [73].
fMRIPrep Preprocessing

fMRIPrep is an analysis-agnostic tool that automatically adapts to the input dataset.

  • Brain Extraction: Uses antsBrainExtraction.sh (ANTs), which is an atlas-based method often considered more robust than BET [91].
  • Head Motion Correction: MCFLIRT (FSL).
  • Spatial Transformation: Employs one-step interpolation for motion correction, distortion correction, and spatial normalization using ANTs' antsRegistration [73] [91].
  • Outputs: Preprocessed data in standard space, along with comprehensive visual quality control reports [33].
OGRE Preprocessing

The OGRE (One-step General Registration and Extraction) pipeline integrates components from the Human Connectome Project (HCP) pipeline for use with FSL.

  • Brain Extraction: Utilizes FreeSurfer parcellation, which can provide a more refined brain mask [89].
  • Head Motion Correction: Integrated into the one-step resampling procedure.
  • Spatial Transformation: Its defining feature is the "one-step resampling" (or one-step interpolation) procedure. This method applies a single transformation that simultaneously corrects for head motion, susceptibility distortions, and registers the data to a standard space (e.g., 2mm MNI atlas), thereby minimizing the spatial blurring associated with multiple interpolations [89] [73].
  • Compatibility: Designed specifically to output data and files compatible with downstream FSL FEAT analysis [89].

[Diagram: the multi-step approach (FSL FEAT) applies motion correction, distortion correction, and spatial normalization as sequential resamplings, whereas the one-step approach (OGRE, fMRIPrep) calculates all transformations first and applies a single resampling to standard space.]

Figure 2: A conceptual diagram comparing multi-step (FSL) and one-step (OGRE, fMRIPrep) registration approaches. The one-step method minimizes the number of interpolations applied to the data.
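
The cost of repeated interpolation can be demonstrated directly. The sketch below is a toy illustration, not drawn from the cited studies: it applies two sequential sub-voxel shifts versus one composed shift to the same signal and compares how much variance each preserves.

```python
# Toy demonstration: sequential interpolation (multi-step) blurs more than a
# single interpolation with the composed transform (one-step).
import numpy as np
from scipy.ndimage import shift

rng = np.random.default_rng(0)
signal = rng.standard_normal(256)            # stand-in for a line of voxels

two_step = shift(shift(signal, 0.4, order=1), 0.3, order=1)  # two resamplings
one_step = shift(signal, 0.7, order=1)                       # composed transform

# Linear interpolation acts as a low-pass filter, so each extra resampling
# removes additional high-frequency variance from the data.
print(f"variance retained, two-step: {two_step.var() / signal.var():.2f}")
print(f"variance retained, one-step: {one_step.var() / signal.var():.2f}")
```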

Quantitative Comparison and Anticipated Results

Applying the aforementioned protocols to the precision drawing task dataset yields quantitative results that highlight critical performance differences between the pipelines.

Table 2: Comparative Performance Metrics

| Performance Metric | FSL FEAT | fMRIPrep | OGRE |
| --- | --- | --- | --- |
| Inter-Individual Variability | Highest (baseline) | Lower than FSL (p=0.036) | Lowest (p=7.3×10⁻⁹ vs. FSL) [73] |
| Activation Magnitude (Contralateral M1) | Baseline | Not significantly different from others | Strongest detection (OGRE > FSL, p=4.2×10⁻⁴) [73] |
| Brain Extraction Robustness | Prone to under-/over-extraction [89] | Robust, atlas-based (ANTs) [91] | Precise (FreeSurfer) [89] |

Interpretation of Results

  • Reduced Inter-Individual Variability: The significantly lower variability observed with OGRE, and to a lesser extent with fMRIPrep, suggests that the one-step interpolation approach is more effective in aligning brain data across subjects. This improved alignment increases statistical power in group-level analyses and is a critical factor for reliability in longitudinal studies or clinical trials [73].
  • Enhanced Activation Detection: OGRE's superior performance in detecting task-related activity in the primary motor cortex indicates that the combination of FreeSurfer-based brain extraction and one-step resampling better preserves the true BOLD signal in functionally specific regions [89]. This can lead to more sensitive biomarkers in drug development contexts.
  • Practical Implications: For researchers, the choice of pipeline involves a trade-off between ease of use and performance. While FSL offers an integrated environment, fMRIPrep provides robust, containerized preprocessing with excellent quality control. OGRE, while requiring more setup, demonstrates potential for higher sensitivity and reliability in volumetric analyses.

This application note establishes a structured framework for evaluating fMRI preprocessing pipelines, demonstrating that the choice of pipeline has a measurable and significant impact on analytical outcomes. The findings indicate that pipelines leveraging one-step interpolation (OGRE and fMRIPrep) offer tangible advantages over the traditional multi-step approach (FSL FEAT) in terms of reducing inter-subject variability. Specifically, the OGRE pipeline shows exceptional promise for task-based fMRI studies, yielding both the most consistent subject alignment and the strongest detection of task-evoked activity in key brain regions.

For the broader thesis on fMRI reliability, these results strongly suggest that adopting modern preprocessing strategies that minimize sequential image interpolation is crucial for enhancing the reproducibility of neuroimaging findings. This is particularly salient for drug development professionals who require robust and sensitive biomarkers. Future work should focus on validating these findings across different task paradigms, clinical populations, and with higher-resolution data to further solidify the evidence base for pipeline selection.

Functional magnetic resonance imaging (fMRI) has become a cornerstone technique for probing human brain function in both research and clinical settings. However, the blood oxygenation level-dependent (BOLD) signal that forms the basis of fMRI represents a small fraction of the total MR signal, making it highly susceptible to noise from various sources including system instability, physiological fluctuations, and head motion [92]. The preprocessing of raw fMRI data is therefore a critical step that directly influences the validity and interpretability of all subsequent analyses. Within the broader context of fMRI preprocessing pipeline reliability research, this document provides detailed application notes and protocols for assessing how preprocessing choices impact three crucial downstream applications: functional connectivity mapping, behavioral outcome prediction, and clinical disease diagnosis. The reproducibility crisis in neuroimaging underscores the necessity of these protocols, as variability in preprocessing methodologies can significantly alter study conclusions and impede the translation of fMRI biomarkers into clinical drug development [93] [2].

Quantitative Impact of Preprocessing on Downstream Metrics

Empirical studies consistently demonstrate that preprocessing pipeline selection directly influences key quantitative outcomes in downstream fMRI analysis. The tables below summarize documented effects on functional connectivity measurements, behavioral prediction accuracy, and diagnostic performance.

Table 1: Impact of Preprocessing Strategy on Functional Connectivity Metrics

| Preprocessing Strategy | Functional Connectivity Metric | Reported Effect | Study Context |
| --- | --- | --- | --- |
| Standard Pipeline | Connectivity Mean Strength | Baseline spurious connectivity [16] | Large stroke dataset |
| Enhanced Pipeline (Lesion-aware masks) | Connectivity Mean Strength | Moderate reduction in spurious connectivity [16] | Large stroke dataset |
| Stroke-Specific Pipeline (ICA for artifacts) | Connectivity Mean Strength | Significant reduction in spurious connectivity [16] | Large stroke dataset |
| Censoring (Time-point removal) | Global Efficiency (GEFF) | Increased reliability, reduced motion dependency [93] | Healthy controls |
| Censoring (Time-point removal) | Characteristic Path Length (CPL) | Increased reliability, reduced motion dependency [93] | Healthy controls |
| Global Signal Regression | Correlation between seed pairs | Increased sensitivity to motion artifacts [93] | Healthy controls |
| Despiking + Motion Regression + Local White Matter Regressor | Correlation between seed pairs | Reduced sensitivity to motion [93] | Healthy controls |

Table 2: Impact of Preprocessing on Behavioral and Diagnostic Prediction

| Analysis Type | Preprocessing Strategy | Performance Outcome | Study Context |
| --- | --- | --- | --- |
| Behavioral Prediction | Standard, Enhanced, Stroke-Specific Pipelines | No significant impact on behavioral outcome prediction [16] | Stroke patients |
| Disease Diagnosis | Volume-based Foundation Model (NeuroSTORM) | Outstanding diagnostic performance across 17 diagnoses [6] | Multi-site clinical data (US, South Korea, Australia) |
| Phenotype Prediction | Volume-based Foundation Model (NeuroSTORM) | High relevance in predicting psychological/cognitive phenotypes [6] | Transdiagnostic Connectome Project |
| Age/Gender Prediction | ROI-based Methods (BrainGNN, BNT) | Lower accuracy than volume-based models [6] | Large-scale public datasets (HCP, UKB) |
| Age/Gender Prediction | Volume-based Foundation Model (NeuroSTORM) | Highest accuracy (e.g., 93.3% gender classification) [6] | Large-scale public datasets (HCP, UKB) |

Detailed Experimental Protocols

Protocol 1: Evaluating Lesion-Specific Pipelines for Behavioral Prediction in Stroke

3.1.1 Objective: To assess the efficacy of lesion-specific preprocessing pipelines in reducing spurious functional connectivity while maintaining accuracy in predicting post-stroke behavioral outcomes.

3.1.2 Materials and Reagents:

  • fMRI Data: A large dataset of resting-state fMRI scans from stroke patients with associated clinical behavioral scores [16].
  • Lesion Masks: Structural MRI-derived masks identifying the location and extent of each patient's brain lesion.
  • Software: The fMRIStroke open-source tool or equivalent processing environment capable of implementing the pipelines below [16].

3.1.3 Procedure:

  • Pipeline Implementation:
    • Standard Pipeline: Apply conventional steps including slice-timing correction, realignment, normalization to a standard space (e.g., MNI), and spatial smoothing.
    • Enhanced Pipeline: Account for lesion topography during normalization and when computing tissue masks (e.g., CSF, white matter) for nuisance signal regression.
    • Stroke-Specific Pipeline: Incorporate independent component analysis (ICA) to identify and remove components representing lesion-driven physiological artifacts [16].
  • Functional Connectivity Analysis:

    • For each preprocessed dataset, calculate whole-brain functional connectivity matrices (e.g., using Pearson correlation between region time-series).
    • Extract summary metrics such as connectivity mean strength and functional connectivity contrast [16].
  • Behavioral Prediction Modeling:

    • Using the functional connectivity matrices as features, build models (e.g., linear regression, machine learning) to predict behavioral scores.
    • Evaluate model performance using cross-validation (a minimal sketch follows this list).
  • Statistical Comparison:

    • Compare the group-level functional connectivity metrics (e.g., mean strength) across the three pipelines using appropriate statistical tests (e.g., ANOVA).
    • Compare the prediction accuracy (e.g., R², mean squared error) of the behavioral models across pipelines.
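
The connectivity and prediction steps above can be sketched compactly, assuming `ts` holds parcellated time-series of shape (n_subjects, n_timepoints, n_regions) and `scores` the behavioral outcomes; the ridge model and all names are illustrative choices, not taken from the fMRIStroke implementation.

```python
# Sketch: FC features from parcellated time-series, then cross-validated
# behavioral prediction. Run once per pipeline and compare results (step 4).
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

def connectivity_features(ts):
    """Vectorize each subject's Pearson FC matrix to its upper triangle."""
    n_sub, _, n_reg = ts.shape
    iu = np.triu_indices(n_reg, k=1)
    feats = np.empty((n_sub, iu[0].size))
    for i in range(n_sub):
        fc = np.corrcoef(ts[i].T)    # (n_regions, n_regions) correlations
        feats[i] = fc[iu]            # unique off-diagonal edges only
    return feats

def evaluate_pipeline(ts, scores, cv=5):
    """Cross-validated R^2 of a ridge model predicting behavior from FC."""
    X = connectivity_features(ts)
    model = RidgeCV(alphas=np.logspace(-3, 3, 13))
    return cross_val_score(model, X, scores, cv=cv, scoring="r2")
```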

Protocol 2: Assessing Pipeline Reliability via Graph Theory Metrics

3.2.1 Objective: To evaluate the reliability and motion-dependency of different preprocessing schemes using graph theoretical measures on resting-state fMRI data.

3.2.2 Materials and Reagents:

  • fMRI Data: Publicly available resting-state datasets (e.g., 1000 Functional Connectomes Project) including healthy controls [93].
  • Parcellation Atlas: A brain atlas to define network nodes (e.g., CC200 with 190 regions) [93].
  • Motion Parameters: Framewise displacement (FD) time-series derived from head motion correction.

3.2.3 Procedure:

  • Preprocessing Schemes: Implement seven distinct preprocessing strategies. Key variations should include:
    • Inclusion/Exclusion of global signal regression.
    • Application of different band-pass filters (e.g., 0.01-0.1 Hz, 0.04-0.07 Hz).
    • Use of censoring (scrubbing) for high-motion time points versus no censoring.
    • Composition of nuisance regressors (e.g., motion parameters, white matter, CSF signals) [93].
  • Network Construction:

    • Extract mean time-series from each of the 190 defined regions.
    • Create a 190×190 functional connectivity matrix for each subject using Pearson correlation.
    • Apply a proportional threshold to create a binary adjacency matrix.
  • Graph Metric Calculation: Calculate four primary graph theory measures for each subject's network (see the sketch after this list):

    • Global Efficiency (GEFF): The average inverse shortest path length in the network.
    • Characteristic Path Length (CPL): The average shortest path length between all node pairs.
    • Average Clustering Coefficient (ACC): The mean of the clustering coefficients for all nodes, measuring local interconnectedness.
    • Average Local Efficiency (ALE): The average global efficiency computed over each node's neighborhood [93].
  • Reliability and Motion-Dependency Analysis:

    • Calculate intra-class correlation coefficients (ICC) for each graph metric across preprocessing schemes to assess reliability.
    • Compute correlation coefficients between each graph metric and framewise displacement (FD) to assess motion dependency.
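
A compact sketch of the network construction and graph-metric steps above, using networkx; the 15% density threshold and array names are illustrative choices, not prescribed by the protocol.

```python
# Sketch: proportional thresholding of an FC matrix and the four graph
# metrics listed above. Assumes the thresholded graph is connected.
import numpy as np
import networkx as nx

def graph_metrics(fc, density=0.15):
    """fc: symmetric (n, n) correlation matrix; returns the four metrics."""
    n = fc.shape[0]
    edges = fc[np.triu_indices(n, k=1)]
    cutoff = np.sort(edges)[-int(density * edges.size)]  # proportional threshold
    adj = (fc >= cutoff).astype(int)
    np.fill_diagonal(adj, 0)                             # no self-connections
    g = nx.from_numpy_array(adj)
    return {
        "GEFF": nx.global_efficiency(g),
        "CPL": nx.average_shortest_path_length(g),       # requires connectedness
        "ACC": nx.average_clustering(g),
        "ALE": nx.local_efficiency(g),
    }
```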

Protocol 3: Validating Foundation Models for Multi-Site Disease Diagnosis

3.3.1 Objective: To validate the diagnostic transferability of a foundation model (NeuroSTORM) across diverse clinical populations and acquisition protocols.

3.3.2 Materials and Reagents:

  • Pre-trained Foundation Model: The NeuroSTORM model, pre-trained on ~28.65 million fMRI frames from over 50,000 subjects [6].
  • Clinical Datasets: Multi-site datasets comprising patients with various diagnoses (e.g., TCP, MND, ABIDE, ADHD200) and healthy controls [6].
  • Computational Resources: GPU cluster with sufficient memory for 4D fMRI volume processing.

3.3.3 Procedure:

  • Data Preparation:
    • For each clinical dataset, organize raw or minimally preprocessed 4D fMRI volumes according to the Brain Imaging Data Structure (BIDS).
    • Ensure appropriate demographic and clinical labels are available for all subjects.
  • Model Fine-Tuning:

    • Employ the Task-specific Prompt Tuning (TPT) strategy, which keeps the bulk of the pre-trained model frozen.
    • Only a small set of task-specific prompt parameters are updated using the training split of the target clinical dataset [6].
  • Diagnostic Performance Evaluation:

    • Feed the preprocessed fMRI volumes from the held-out test set through the fine-tuned NeuroSTORM model.
    • Record model predictions (diagnostic classification) and compare against ground-truth labels.
    • Calculate performance metrics including accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC-ROC); a sketch of these computations follows this list.
  • Transferability Assessment:

    • Repeat the fine-tuning and evaluation process across multiple independent clinical datasets.
    • Compare performance against state-of-the-art region-of-interest (ROI) based and other volume-based methods.
    • Assess performance in data-scarce scenarios by fine-tuning with progressively smaller subsets of the target data [6].
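
The evaluation metrics named above can be computed as follows; `y_true` and `y_prob` are placeholder names for held-out test labels and model probabilities.

```python
# Sketch: diagnostic performance metrics for a binary classification task.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def diagnostic_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity, specificity, and AUC-ROC from probabilities."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),    # true positive rate
        "specificity": tn / (tn + fp),    # true negative rate
        "auc_roc": roc_auc_score(y_true, y_prob),
    }
```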

Table 3: Key Resources for fMRI Preprocessing and Analysis

| Resource Name/Type | Primary Function | Application Context |
| --- | --- | --- |
| fMRIStroke | Open-source tool for implementing lesion-specific preprocessing pipelines. | Standardized analysis of stroke fMRI data; reduces spurious connectivity [16]. |
| fMRIPrep | Robust, standardized preprocessing pipeline for diverse fMRI data. | General-purpose fMRI preprocessing; enhances reproducibility and reduces pipeline-related variability [92]. |
| NeuroSTORM | General-purpose foundation model for 4D fMRI volume analysis. | Downstream task adaptation (diagnosis, phenotype prediction) without information loss from atlas projection [6]. |
| FBIRN/HCP QA Phantoms | Agar gel phantoms with T1/T2 similar to gray matter. | Quality assurance of fMRI systems; measures temporal stability (SFNR, SNR) critical for BOLD detection [92]. |
| Craddock CC200 Atlas | Functional parcellation defining 190 brain regions. | Graph theory analysis; provides standardized nodes for network construction [93]. |
| UK Biobank/ABCD/HCP Datasets | Large-scale, publicly available neuroimaging datasets. | Pre-training foundation models; benchmarking pipeline performance on diverse, well-characterized populations [6]. |

Functional magnetic resonance imaging (fMRI) preprocessing has long been plagued by reproducibility and transferability challenges, largely stemming from complex, ad-hoc preprocessing pipelines and task-specific model designs that introduce uncontrolled variability and bias [6] [33]. The neuroimaging community currently lacks standardized workflows that reliably provide high-quality results across diverse datasets, acquisition protocols, and study populations [33]. This reliability crisis fundamentally undermines the validity of inference and interpretability of results in both basic neuroscience and clinical applications.

Foundation models represent a paradigm-shifting approach to these challenges. These large-scale models, pre-trained on massive datasets, offer enhanced generalization, efficiency, and adaptability across diverse tasks [94]. In fMRI analysis, they promise to capture noise-resilient patterns through self-supervised learning, potentially reducing sensitivity to acquisition variations and mitigating preprocessing-induced variability while preserving meaningful neurobiological information [6]. This Application Note examines the performance of foundation models across diverse fMRI tasks within the critical context of preprocessing pipeline reliability, providing experimental protocols and benchmarking data to guide researchers and clinicians.

Foundation Models in Neuroimaging: From Theory to Practice

Core Principles and Advantages

Foundation models in artificial intelligence are characterized by their large-scale pre-training on extensive datasets, which enables them to develop generalized representations that can be adapted to various downstream tasks with minimal task-specific training [94]. These models achieve superior generalization through self-supervised learning on heterogeneous data sources, allowing them to learn general features rather than task-specific patterns. Their architecture typically employs transformer or state-space models with self-attention mechanisms that efficiently capture long-range dependencies and contextual relationships in complex data [94] [6].

In neuroimaging, foundation models offer three crucial advantages for addressing preprocessing reliability concerns:

  • Robustness to Technical Variability: By learning from multi-site, multi-protocol data, these models become less sensitive to acquisition parameter variations that typically plague traditional preprocessing pipelines [6].
  • Minimized Structural Biases: Unlike traditional approaches that project data onto pre-defined brain atlases or connectomes (potentially introducing irreversible information loss and structural biases), foundation models can learn directly from 4D fMRI volumes, preserving more native information [6].
  • Reproducible Feature Extraction: The standardized feature extraction provided by foundation models reduces the analytical variability introduced by custom preprocessing workflows, enhancing reproducibility across studies and institutions [6].

The NeuroSTORM Model: A Case Study in fMRI Foundation Models

NeuroSTORM (Neuroimaging Foundation Model with Spatial-Temporal Optimized and Representation Modeling) represents a state-of-the-art example specifically designed for fMRI analysis [6]. This model addresses fundamental computational challenges in processing 4D fMRI data through several innovations:

  • Architecture: Employs a Shifted-Window Mamba (SWM) backbone combining linear-time state-space modeling with shifted-window mechanisms to reduce computational complexity and GPU memory usage while maintaining performance.
  • Pre-training Strategy: Incorporates Spatiotemporal Redundancy Dropout (STRD) to effectively learn inherent characteristics in fMRI data while improving robustness and reproducibility.
  • Adaptation Mechanism: Utilizes Task-specific Prompt Tuning (TPT) for efficient downstream task adaptation with minimal trainable parameters.
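
NeuroSTORM's actual implementation is not reproduced here; the PyTorch sketch below only illustrates the general pattern that task-specific prompt tuning follows: freeze the pre-trained backbone and train a small set of prompt embeddings plus a task head. All names and shapes are assumptions.

```python
# Generic prompt-tuning pattern (NOT NeuroSTORM's actual API): only the
# prompt embeddings and classification head are trainable.
import torch
import torch.nn as nn

class PromptTunedClassifier(nn.Module):
    def __init__(self, backbone, embed_dim, n_prompts=16, n_classes=2):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                  # freeze pre-trained weights
        self.prompts = nn.Parameter(torch.randn(n_prompts, embed_dim) * 0.02)
        self.head = nn.Linear(embed_dim, n_classes)

    def forward(self, tokens):
        # `backbone` is assumed to map (batch, seq, dim) -> (batch, seq, dim).
        prompts = self.prompts.unsqueeze(0).expand(tokens.size(0), -1, -1)
        x = torch.cat([prompts, tokens], dim=1)      # prepend learned prompts
        return self.head(self.backbone(x).mean(dim=1))  # pool, then classify
```

Passing only parameters with `requires_grad=True` to the optimizer keeps the trainable fraction of the model small, which is the essence of the efficiency claim.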

NeuroSTORM was pre-trained on a remarkable 28.65 million fMRI frames (9,000 hours) collected from over 50,000 subjects across multiple centers, covering ages 5 to 100, representing the largest multi-source fMRI training dataset assembled to date [6].

Benchmarking Performance Across Diverse fMRI Tasks

Experimental Design and Baseline Preprocessing

To evaluate foundation model performance across diverse tasks, we established a comprehensive benchmarking framework with standardized preprocessing and evaluation metrics. All datasets underwent consistent preprocessing using fMRIPrep, an analysis-agnostic tool designed for robust and reproducible preprocessing of fMRI data [34] [33]. fMRIPrep performs minimal preprocessing including motion correction, field unwarping, normalization, bias field correction, and brain extraction, while providing comprehensive visual reports for quality assessment [33].

Table 1: Benchmark Tasks and Evaluation Framework

| Task Category | Specific Tasks | Datasets | Evaluation Metrics |
| --- | --- | --- | --- |
| Demographic Prediction | Age and gender prediction | HCP-YA, HCP-A, HCP-D, UK Biobank, ABCD | Accuracy, Mean Absolute Error (Age) |
| Phenotype Prediction | Psychological/cognitive phenotypes | HCP-YA, TCP | Balanced Accuracy, F1-Score |
| Disease Diagnosis | Multiple psychiatric and neurological disorders | HCP-EP, ABIDE, ADHD200, COBRE, UCLA, MND | Sensitivity, Specificity, AUC |
| fMRI Retrieval | Cross-modal retrieval (fMRI-to-image) | NSD, LAION-5B | Recall@K, Mean Average Precision |
| Task fMRI Analysis | tfMRI state classification | HCP-YA | Accuracy, F1-Score |

Quantitative Performance Comparison

Foundation models consistently outperform traditional approaches across all benchmarked tasks, demonstrating superior generalization capabilities while maintaining high reproducibility.

Table 2: Performance Comparison Across fMRI Tasks (Select Results)

| Task | Dataset | Traditional Methods | Foundation Model (NeuroSTORM) | Performance Gap |
| --- | --- | --- | --- | --- |
| Gender Classification | HCP-YA | 86.7-91.2% (ROI-based methods) | 93.3% | +2.1-6.6% |
| Age Prediction | ABCD | MAE: 1.82 years (BNT) | MAE: 1.21 years | -0.61 years |
| Disease Diagnosis | ABIDE | 70.4% (BrainGNN) | 76.8% | +6.4% |
| Phenotype Prediction | TCP | 65.1% (volume-based) | 72.3% | +7.2% |
| fMRI Retrieval | NSD | mAP: 31.5 (CLIP-based) | mAP: 42.7 | +11.2 |

Notably, foundation models demonstrate particular advantages in data-scarce scenarios. When fine-tuning data was limited to just 10% of training samples, NeuroSTORM maintained over 85% of its full-data performance across most tasks, significantly outperforming traditional methods that typically dropped to 60-70% of their full performance [6]. This robustness to limited training data highlights their potential for clinical applications where large annotated datasets are often unavailable.

Experimental Protocols for Foundation Model Evaluation

Protocol 1: Standardized Benchmarking of Foundation Models for fMRI Analysis

Purpose: To systematically evaluate the performance, reproducibility, and transferability of foundation models across diverse fMRI tasks using standardized preprocessing and evaluation metrics.

Materials:

  • Hardware: High-performance computing cluster with GPU acceleration (minimum 16GB VRAM)
  • Software: fMRIPrep 21.0.0+ for standardized preprocessing [34]
  • Foundation Models: NeuroSTORM (publicly available) or equivalent
  • Comparison Methods: Include both ROI-based (BrainGNN, BNT) and volume-based approaches
  • Datasets: Multi-site collections spanning healthy and clinical populations

Procedure:

  • Data Curation and Preprocessing
    • Obtain datasets following BIDS standard to ensure compatibility
    • Process all data through fMRIPrep with identical parameters across studies
    • Generate and review quality control reports for each participant
    • Exclude subjects failing quality thresholds (e.g., excessive motion)
  • Model Implementation and Training

    • Initialize foundation model with pre-trained weights
    • For each task, implement task-specific prompt tuning with ≤5% of model parameters
    • Train comparison models using their recommended architectures and protocols
    • Employ 5-fold cross-validation with consistent splits across all methods (see the sketch after this list)
  • Evaluation and Analysis

    • Compute task-specific performance metrics on held-out test sets
    • Assess reproducibility through multiple random initializations
    • Evaluate cross-dataset transferability using leave-one-dataset-out validation
    • Perform statistical significance testing across methods
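
The consistent-splits requirement can be enforced by materializing one set of fold indices and reusing it for every method, as in this sketch; the function and variable names are illustrative.

```python
# Sketch: one fixed set of stratified folds shared across all compared models.
import numpy as np
from sklearn.model_selection import StratifiedKFold

def shared_folds(labels, n_splits=5, seed=42):
    """Materialize fold indices once so every method sees identical splits."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return list(skf.split(np.zeros(len(labels)), labels))

# folds = shared_folds(y)
# for train_idx, test_idx in folds:
#     ...fit NeuroSTORM, BrainGNN, BNT, etc. on the same partition...
```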

Troubleshooting:

  • For memory limitations with 4D fMRI data, implement gradient checkpointing
  • If convergence issues arise, adjust learning rate schedules or prompt initialization
  • Address dataset imbalance through stratified sampling or loss reweighting

Protocol 2: Clinical Validation of Foundation Models

Purpose: To validate the clinical utility of foundation models for disease diagnosis and phenotype prediction in real-world patient populations.

Materials:

  • Clinical Datasets: TCP (245 participants), MND (59 participants) [6]
  • Reference Standards: Clinical diagnoses established by board-certified neurologists/psychiatrists
  • Hardware/Software: As in Protocol 1

Procedure:

  • Data Preparation
    • Process clinical datasets through identical fMRIPrep pipeline as pre-training data
    • Maintain strict separation between pre-training and clinical evaluation datasets
    • Curate comprehensive clinical metadata for subgroup analyses
  • Model Adaptation

    • Initialize with foundation model pre-trained on large-scale research datasets
    • Employ task-specific prompt tuning with minimal parameter updates
    • Implement rigorous cross-validation respecting patient group structure
  • Clinical Validation

    • Compare performance against clinical checklists and standardized assessments
    • Assess model calibration and confidence estimation for clinical decision support
    • Conduct subgroup analyses across demographic and clinical variables
    • Perform expert review of interpretation maps for clinical plausibility

Visualization of Experimental Workflows

Foundation Model Benchmarking Workflow

[Workflow diagram: multi-site fMRI data → BIDS standardization → fMRIPrep preprocessing (motion correction, normalization, field unwarping, skull-stripping) → quality control and report generation → model implementation (foundation model with pre-trained weights and task-specific prompt tuning, alongside traditional ROI- and volume-based methods) → comprehensive evaluation of performance, reproducibility, and transferability → clinical validation on real-world datasets.]

fMRIPrep Preprocessing Pipeline

[Diagram of the fMRIPrep pipeline: raw BIDS data enters an anatomical stream (T1w brain extraction and tissue segmentation, FreeSurfer surface reconstruction) and a functional stream (head-motion estimation and correction, susceptibility distortion correction, ANTs spatial normalization, confounds estimation including framewise displacement, DVARS, and motion parameters); the two streams are integrated to produce preprocessed fMRI outputs and visual reports.]

Table 3: Key Research Reagents and Computational Resources

| Resource | Type | Function/Benefit | Access |
| --- | --- | --- | --- |
| fMRIPrep | Software Pipeline | Robust, standardized preprocessing for task and resting-state fMRI; reduces technical variability | Open-source (https://fmriprep.org) [34] |
| NeuroSTORM | Foundation Model | General-purpose fMRI analysis with state-of-the-art performance across diverse tasks | GitHub (github.com/CUHK-AIM-Group/NeuroSTORM) [6] |
| UK Biobank | Dataset | Large-scale neuroimaging dataset (40,842 participants) for pre-training | Application-based access [6] |
| ABCD Study | Dataset | Developmental dataset (9,448 children) for evaluating age-related patterns | Controlled access [6] |
| HCP Datasets | Dataset | Multi-modal neuroimaging (HCP-YA, HCP-A, HCP-D) with high-quality data | Application-based access [6] |
| BIDS Standard | Framework | Brain Imaging Data Structure for organizing neuroimaging data | Open standard (bids.neuroimaging.io) [33] |

Foundation models represent a transformative approach to fMRI analysis, demonstrating superior performance across diverse tasks while directly addressing critical challenges in preprocessing reliability and reproducibility. Through standardized benchmarking, these models have shown consistent improvements over traditional methods in demographic prediction, phenotype characterization, disease diagnosis, and cross-modal retrieval tasks.

The integration of foundation models with robust preprocessing pipelines like fMRIPrep offers a path toward more reproducible and clinically applicable fMRI analysis. Future developments should focus on expanding model interpretability through techniques such as attribution mapping [95], enhancing cross-modal capabilities for integrating fMRI with other data modalities [96], and developing more efficient adaptation mechanisms for clinical implementation. As these technologies mature, they hold significant promise for advancing both fundamental neuroscience and clinical applications in neurology and psychiatry.

The translation of functional magnetic resonance imaging (fMRI) from a powerful research tool into a reliable instrument for clinical diagnosis and prognosis hinges on the establishment of robust, standardized preprocessing pipelines. fMRI is crucial for studying brain function and diagnosing neurological disorders, yet the field remains fragmented across data formats, preprocessing pipelines, and analytic models [6]. This fragmentation challenges the reproducibility and transferability of findings, ultimately hindering clinical application [6]. In clinical settings, where decisions impact patient care, the validity of inference and interpretability of results are paramount [33]. Preprocessing, the critical stage of cleaning and standardizing raw fMRI data before statistical analysis, directly controls the accuracy of functional connectivity measures and behavioral predictions derived from the data [16]. While a large inventory of preprocessing tools exists, researchers have typically created ad-hoc workflows for each new dataset, leading to variability and questions about reliability [33] [55]. This article details the application notes and protocols for establishing gold-standard fMRI preprocessing pipelines that maximize diagnostic and predictive power for clinical translation, framed within a broader thesis on fMRI preprocessing pipeline reliability research.

Application Notes: Core Principles for Clinical fMRI Pipelines

The Imperative of Standardization and Automation

The neuroimaging community has increasingly recognized that manual, ad-hoc preprocessing workflows are a major source of analytical variability. A recent study investigating fMRI mega-analyses—which combine data processed with different pipelines—found that analytical variability, if not accounted for, can induce false positive detections and lead to inflated false positive rates [97]. This underscores a critical principle for clinical translation: standardization is non-negotiable. Automated, analysis-agnostic tools like fMRIPrep address this challenge by providing a robust and convenient preprocessing workflow that automatically adapts to the idiosyncrasies of virtually any dataset, ensuring high-quality results with no manual intervention [33] [55]. fMRIPrep produces two key classes of outputs essential for clinical analysis: (1) preprocessed time-series data that have been cleaned and standardized, and (2) experimental confounds, such as physiological recordings and estimated noise sources, that can be used as nuisance regressors in subsequent analyses [33]. By introducing visual assessment checkpoints and leveraging the Brain Imaging Data Structure (BIDS), fMRIPrep maximizes transparency and shareability, key requirements for clinical research [33].

The Need for Population- and Pathology-Specific Customization

While standardization provides a necessary foundation, the "gold standard" for clinical translation must also accommodate the unique characteristics of specific patient populations. A one-size-fits-all approach is insufficient. This is particularly evident in stroke research, where lesions can introduce distinct artifacts. A 2025 study designed and evaluated three preprocessing pipelines for stroke patients: a standard pipeline, an enhanced pipeline accounting for lesions in tissue masks, and a stroke-specific pipeline incorporating independent component analysis (ICA) to address lesion-driven artifacts [16]. The results demonstrated that the stroke-specific pipeline significantly reduced spurious connectivity without impacting behavioral predictions, highlighting the need for tailored preprocessing strategies in clinical populations [16]. The researchers contributed to this goal by making their stroke-specific pipeline accessible via an open-source tool, fMRIStroke, to ensure replicability and promote best practices [16]. Similarly, studies in infant populations, where participants exhibit infrequent but large head motions, have motivated the development of specialized motion correction techniques like "JumpCor" to retain valuable data that would otherwise be discarded [98].

Embracing a New Paradigm with Foundation Models

The latest frontier in overcoming pipeline variability is the development of foundation models for fMRI analysis. These models are designed to be intrinsically generalizable across diverse experimental settings. NeuroSTORM, a recently introduced neuroimaging foundation model, learns generalizable representations directly from 4D fMRI volumes, bypassing the irreversible information loss and structural biases imposed by preprocessing steps that project data onto pre-defined brain atlases [6]. Pre-trained on a massive corpus of over 50,000 subjects, NeuroSTORM employs a spatial-temporal optimized pre-training strategy and task-specific prompt tuning to learn transferable fMRI features [6]. This approach has demonstrated outstanding performance across diverse downstream clinical tasks, including age and gender prediction, phenotype prediction, and disease diagnosis, maintaining high relevance in predicting psychological/cognitive phenotypes and achieving superior disease diagnosis performance on clinical datasets from multiple international hospitals [6]. This represents a paradigm shift towards models that enhance reproducibility by capturing noise-resilient patterns and reducing sensitivity to acquisition and preprocessing variations [6].

Protocols for Establishing and Validating Clinical Pipelines

Protocol 1: Implementing a Standardized Base Preprocessing Workflow

This protocol outlines the steps for implementing a standardized, high-quality base preprocessing pipeline suitable for a wide range of clinical fMRI data, using fMRIPrep as the exemplar.

I. Experimental Setup and Prerequisites

  • Software Environment: Utilize containerized software (Docker or Singularity) to ensure a consistent and reproducible computational environment. This mitigates "works on my machine" problems and guarantees that results can be replicated elsewhere [3].
  • Data Formatting: Organize input neuroimaging data according to the Brain Imaging Data Structure (BIDS) standard. BIDS allows the pipeline to precisely identify the structure of the input data and gather all available metadata with no manual intervention [33].

II. Detailed Methodology

The fMRIPrep workflow is composed of dynamically assembled sub-workflows, combining tools from widely used, open-source neuroimaging packages [33]. The major steps are as follows (a launch sketch follows the list):

  • Anatomical Data Processing:
    • Brain Extraction: Removal of non-brain tissue from the T1-weighted (T1w) image using antsBrainExtraction.sh (ANTs) [33].
    • Tissue Segmentation: Segmentation of the brain-extracted T1w image into cerebrospinal fluid (CSF), white matter (WM), and gray matter (GM) [33].
    • Spatial Normalization: Normalization of the T1w image to a standard template (e.g., MNI space) using antsRegistration (ANTs) [33].
  • Functional Data Processing:
    • Head-Motion Estimation and Correction: Estimation and correction of head motion using mcflirt (FSL) [33].
    • Susceptibility-Distortion Correction (SDC): Application of one of several methods (e.g., 3dQwarp in AFNI, topup in FSL) to correct for distortions caused by magnetic field inhomogeneities. fMRIPrep includes a "fieldmap-less" SDC option when no fieldmap data is available [33].
    • Co-registration: Alignment of the functional images to the subject's T1w anatomical image using bbregister (FreeSurfer) [33].
    • Normalization: Transformation of the functional data into standard space using the transform derived from anatomical normalization [33].
    • Confound Estimation: Generation of confounding time series based on motion parameters, tissue masks (e.g., global signal, CSF, WM), and framewise displacement [33].
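
In practice, the entire workflow above is launched with a single containerized command. The sketch below uses the `fmriprep-docker` wrapper; the paths, participant label, and license location are placeholders for your own setup.

```python
# Sketch: launching fMRIPrep via its Docker wrapper; all paths are placeholders.
import subprocess

subprocess.run(
    [
        "fmriprep-docker",
        "/data/bids",                     # BIDS-formatted input dataset
        "/data/derivatives",              # output directory
        "participant",                    # analysis level
        "--participant-label", "01",
        "--output-spaces", "MNI152NLin2009cAsym",   # target standard space
        "--fs-license-file", "/opt/freesurfer/license.txt",
    ],
    check=True,
)  # per-participant HTML reports are written alongside the derivatives
```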

III. Validation and Quality Control

  • Visual Reports: fMRIPrep automatically generates an individual, interactive HTML report per participant. These reports contain mosaic views of images at critical quality control points (e.g., after brain extraction, co-registration, normalization) and must be meticulously inspected by the researcher [33] [55].
  • Data Censoring: Implement "scrubbing" to remove individual volumes with excessive motion. A common metric is framewise displacement (FD), with a threshold of 0.2-0.5 mm often used to flag problematic volumes [98].
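
The FD computation underlying this censoring step is simple enough to sketch directly (Power-style FD; array names are placeholders):

```python
# Sketch: framewise displacement from the six realignment parameters, with
# rotations converted to arc length on a 50 mm sphere (Power et al. style).
import numpy as np

def framewise_displacement(motion, radius=50.0):
    """motion: (n_vols, 6) array of 3 translations (mm) + 3 rotations (rad)."""
    diffs = np.abs(np.diff(motion, axis=0))
    diffs[:, 3:] *= radius                        # radians -> millimeters
    return np.concatenate([[0.0], diffs.sum(axis=1)])

# fd = framewise_displacement(params)
# flagged = fd > 0.5      # censor volumes using a 0.2-0.5 mm threshold
```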

Table 1: Key Preprocessing Steps and Their Primary Tools in fMRIPrep

| Preprocessing Task | Primary Tool(s) in fMRIPrep | Clinical Impact |
| --- | --- | --- |
| Anatomical brain extraction | antsBrainExtraction.sh (ANTs) | Ensures accurate tissue segmentation and normalization. |
| Head-motion correction | mcflirt (FSL) | Reduces spurious correlations caused by subject movement [99]. |
| Susceptibility distortion correction | 3dQwarp (AFNI), topup/fugue (FSL) | Improves anatomical accuracy of functional localizations. |
| Spatial normalization | antsRegistration (ANTs) | Enables group-level analysis and comparison across subjects. |
| Confound estimation | In-house implementation, ICA-AROMA [33] | Provides nuisance regressors to clean BOLD signal of non-neural noise. |

Protocol 2: Optimizing Pipelines for Specific Clinical Populations

This protocol details the adaptation of a standardized pipeline to address the unique challenges presented by specific clinical populations, such as stroke patients.

I. Experimental Setup and Prerequisites

  • Base Pipeline: An established preprocessing pipeline (e.g., fMRIPrep).
  • Population-Specific Knowledge: Defined lesion masks for patients or, in the case of infant studies, knowledge of specific motion patterns.

II. Detailed Methodology for Stroke fMRI

Based on the 2025 study by Abraham et al. [16], a stroke-specific pipeline can be implemented through the following enhancements:

  • Lesion-Aware Tissue Masking: Manually or automatically defined lesion masks are incorporated when computing standard tissue masks (CSF, WM, GM). This prevents the misclassification of lesion voxels, which could otherwise introduce significant noise in confound estimation [16].
  • ICA-Based Denoising: An Independent Component Analysis (ICA) is run on the preprocessed data. The resulting components are automatically classified as signal or noise using a classifier like ICA-AROMA. Crucially, the spatial maps of noise components are then compared to the patient's lesion mask. Components that are highly weighted within the lesion territory are prioritized for removal, as they are likely driven by lesion-specific artifacts rather than by neural activity [16].
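
The lesion-overlap check in the ICA step can be sketched as follows; the file names are placeholders and the 95th-percentile weight threshold is an illustrative choice, not a value from the cited study.

```python
# Sketch: fraction of an ICA component's strong weights inside the lesion.
import nibabel as nib
import numpy as np

ic_map = nib.load("melodic_IC_0001.nii.gz").get_fdata()   # component spatial map
lesion = nib.load("lesion_mask.nii.gz").get_fdata() > 0   # binary lesion mask

weights = np.abs(ic_map)
strong = weights > np.percentile(weights, 95)             # strongly weighted voxels
overlap = (strong & lesion).sum() / max(strong.sum(), 1)

# Components concentrated in the lesion territory are candidates for removal.
print(f"fraction of component inside lesion: {overlap:.2f}")
```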

III. Validation and Quality Control

  • Quantitative Metrics: The success of the pipeline optimization should be assessed using metrics such as functional connectivity contrast and the reduction of spurious connectivity in known artifact-prone networks [16].
  • Behavioral Correlation: The ultimate validation for a clinical pipeline is its utility in predicting clinically relevant outcomes. The pipeline's output (e.g., connectivity measures) should be tested for its power to predict behavioral deficits or recovery trajectories, ensuring the pipeline preserves neurologically meaningful signal [16].

Table 2: Comparison of Preprocessing Pipelines for Clinical fMRI

| Pipeline Attribute | Standard Pipeline (e.g., fMRIPrep) | Stroke-Specific Pipeline [16] | Foundation Model (NeuroSTORM) [6] |
| --- | --- | --- | --- |
| Core Principle | Standardization & automation for robustness. | Customization for pathology-driven artifacts. | Generalizability via large-scale pre-training. |
| Key Innovation | Analysis-agnostic, BIDS-compliant workflow. | Integration of lesion masks into tissue segmentation and ICA denoising. | Direct 4D volume processing with spatial-temporal redundancy dropout. |
| Primary Clinical Strength | High reproducibility and transparency; reduces analyst-induced variability. | Significantly reduces spurious connectivity in lesioned brains. | State-of-the-art performance across diverse tasks (diagnosis, phenotype prediction). |
| Validation Metric | Visual quality control reports [33]. | Reduction in spurious connectivity without impacting behavioral prediction [16]. | Accuracy in disease diagnosis and cognitive phenotype prediction on external clinical datasets [6]. |

Protocol 3: Validating Pipelines with Motion-Controlled Data

Understanding and correcting for motion artifact is a cornerstone of reliable clinical fMRI. This protocol uses simulated data to rigorously validate motion correction methods.

I. Experimental Setup and Prerequisites

  • SIMPACE Sequence: Use a Simulated Prospective Acquisition Correction (SIMPACE) sequence to generate motion-corrupted MR data from an ex vivo brain phantom. This involves altering the imaging plane coordinates before each volume and slice acquisition to emulate realistic intervolume and intravolume motion, providing a "gold standard" dataset where ground truth is known [99].

II. Detailed Methodology

  • Data Generation: Acquire fMRI data from an ex vivo brain phantom using the SIMPACE sequence, injecting various patterns of intervolume (whole-brain between TRs) and intravolume (slice-specific during a TR) motion [99].
  • Pipeline Comparison: Process the simulated motion-corrupted data with different motion correction strategies:
    • VOLMOCO: Standard volume-based motion correction (e.g., FSL's mcflirt) with 6 motion parameters and a residual motion regressor [99].
    • Original SLOMOCO (oSLOMOCO): A slice-oriented motion correction method that includes 14 voxel-wise regressors [99].
    • Modified SLOMOCO (mSLOMOCO): An enhanced pipeline incorporating 6 volume-wise and 6 slice-wise motion parameters, plus a novel voxel-wise Partial Volume (PV) regressor designed to address residual motion signals from resampling [99].
  • Performance Assessment: Calculate the standard deviation (SD) of the residual time-series signals in gray matter. A lower SD indicates more effective noise removal and a cleaner BOLD signal [99].
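
The SD metric in the performance-assessment step reduces to a few lines of NumPy; the array names below are placeholders.

```python
# Sketch: mean temporal SD of residual signals within a gray-matter mask.
import numpy as np

def gm_residual_sd(residuals, gm_mask):
    """residuals: (x, y, z, t) residual time-series; gm_mask: boolean (x, y, z)."""
    return residuals[gm_mask].std(axis=-1).mean()   # per-voxel SD, averaged

# Compare gm_residual_sd(...) across VOLMOCO, oSLOMOCO, and mSLOMOCO outputs;
# lower values indicate more effective motion-noise removal.
```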

III. Validation and Quality Control

  • Quantitative Benchmarking: The mSLOMOCO pipeline with 12 parameters and the PV regressor was shown to reduce the average SD in gray matter by 29-45% compared to VOLMOCO and by 28-31% compared to oSLOMOCO, demonstrating its superior efficacy [99].
  • Application to In-Vivo Data: After validation with simulated data, the optimal pipeline (e.g., mSLOMOCO) should be applied to in-vivo clinical data, such as infant fMRI, where occasional large motions are common. The effectiveness can be measured by the increased amount of usable data and improved functional connectivity estimates after applying techniques like JumpCor [98].

The Scientist's Toolkit: Essential Research Reagents & Software

Table 3: Essential Software Tools for Clinical fMRI Pipeline Development

| Tool Name | Type / Category | Primary Function in Pipeline Development |
| --- | --- | --- |
| fMRIPrep [33] [55] | Integrated Preprocessing Pipeline | Provides a robust, standardized, and analysis-agnostic foundation for cleaning and preparing fMRI data. |
| SLOMOCO [99] | Specialized Motion Correction Tool | Offers advanced, slice-wise motion correction and denoising for challenging datasets with significant intravolume motion. |
| NeuroSTORM [6] | Foundation Model | Serves as an end-to-end, transferable model for diverse downstream clinical tasks (diagnosis, prediction), potentially reducing pipeline dependency. |
| Nipype [33] [3] | Workflow Engine | Facilitates the integration of tools from different software packages (AFNI, FSL, SPM, ANTs) into a single, automated workflow. |
| ICA-AROMA [33] | Denoising Tool | Provides a robust ICA-based strategy for automatically identifying and removing motion-related artifacts from fMRI data. |
| MRIQC [33] | Quality Control Tool | Automates the extraction of image quality metrics from fMRI and structural data, aiding in the objective assessment of data quality pre- and post-processing. |

Workflow Visualization

The following diagram illustrates the logical progression and decision points involved in establishing a gold-standard clinical fMRI pipeline, integrating the principles and protocols discussed in this article.

[Decision-flow diagram: raw fMRI data → BIDS conversion → standardized base preprocessing (e.g., fMRIPrep) → rigorous quality control (visual reports, MRIQC) → population check: stroke data follows Protocol 2 (lesion-aware masking, ICA denoising), infant/high-motion data follows Protocol 3 (e.g., mSLOMOCO, JumpCor), and general data proceeds with the validated base pipeline → choose a foundation model (diagnosis, prediction) or traditional statistical analysis (activation, connectivity) → clinical validation → clinically actionable insights and predictions.]

Gold Standard Clinical fMRI Pipeline

The journey towards a true gold standard for clinical fMRI translation is not about finding a single, universal pipeline, but about establishing a rigorous, principled framework for pipeline selection, optimization, and validation. This framework rests on three pillars: the robust foundation provided by standardized, automated tools like fMRIPrep; the critical customization for the pathophysiological realities of specific clinical populations, as demonstrated in stroke research; and the forward-looking integration of powerful, generalizable foundation models like NeuroSTORM. By adhering to the detailed application notes and protocols outlined herein—which emphasize stringent quality control, validation against simulated and behavioral data, and the use of open-source, reproducible tools—researchers and clinicians can build fMRI processing workflows that truly maximize diagnostic and predictive power, thereby unlocking the full clinical potential of this transformative technology.

Conclusion

The journey toward reliable fMRI research requires a deliberate and evidence-based approach to preprocessing. Moving beyond ad-hoc workflows to standardized, validated pipelines is no longer optional but a fundamental requirement for reproducibility. As this article has detailed, this involves a deep understanding of foundational steps, careful selection from a growing ecosystem of robust methodologies, proactive optimization for specific data challenges, and rigorous validation using quantitative metrics. The future of the field points toward more automated, intelligent systems like foundation models and adaptive deep learning networks, which promise greater generalizability and clinical applicability. For researchers and drug development professionals, embracing these principles and tools is paramount to ensuring that fMRI findings are accurate, reliable, and ultimately capable of informing meaningful scientific and clinical decisions.

References