Optimal Parameter Tuning for iCanClean: A Comprehensive Guide to R2 and Window Size Selection for Mobile EEG

Charles Brooks, Dec 02, 2025

Abstract

This article provides a definitive guide for researchers and biomedical professionals on tuning the two critical parameters in the iCanClean algorithm—the R2 correlation threshold and the window length—for processing mobile electroencephalography (EEG) data. We synthesize findings from recent peer-reviewed studies to establish foundational concepts, detail methodological applications for various experimental conditions, and offer troubleshooting strategies for optimization. A comparative analysis validates iCanClean's performance against other artifact removal methods like Artifact Subspace Reconstruction (ASR). The guide concludes with key takeaways and future directions for employing iCanClean in clinical and drug development research to achieve high-fidelity brain source identification during movement.

Understanding iCanClean: How R2 and Window Size Govern Motion Artifact Removal

Frequently Asked Questions (FAQs)

FAQ 1: What makes motion artifacts particularly problematic for ICA? Motion artifacts are highly problematic for Independent Component Analysis (ICA) because they violate the algorithm's core assumption of statistical independence between sources. The large amplitude and non-stationary nature of motion-induced signals can dominate the mixture, leading to an inaccurate decomposition. This often results in brain and artifact sources being improperly merged into single components, a failure known as "over-mixing," which makes it difficult to isolate and remove artifacts without also discarding neural data [1].

FAQ 2: How does movement intensity affect my ICA decomposition? Research shows that within individual studies, increased movement intensity significantly decreases ICA decomposition quality. The greater the motion, the more the recorded data is dominated by artifactual signals, which reduces the algorithm's ability to identify maximally independent brain components effectively [1].

FAQ 3: Can't I just use ICLabel to identify and remove motion artifact components after ICA? While ICLabel is a valuable tool, it has a significant limitation for mobile EEG data: its underlying classifier was not trained on mobile EEG data containing substantial motion artifacts. Large motion artifacts can also compromise ICA's ability to separate sources in the first place, so ICLabel may be classifying components that are already flawed. Relying solely on post-ICA correction is therefore often insufficient [2] [3].

FAQ 4: What is the fundamental difference between a method like iCanClean and traditional ICA? Traditional ICA is a blind source separation technique applied to the mixed EEG signals. In contrast, iCanClean is a preprocessing algorithm that uses reference noise signals (from dedicated sensors or created from the EEG itself) to identify and subtract noise subspaces before ICA is run. It leverages canonical correlation analysis (CCA) to find and remove components in the EEG that are highly correlated with known noise, thereby cleaning the data so that ICA can perform a more effective decomposition [4] [5] [2].

Troubleshooting Guide: Poor ICA Decomposition in Mobile EEG

Symptom

Your independent component analysis (ICA) yields few brain-like components, components that are a mixture of brain and artifact, or components that are poorly fit by a single dipole (high residual variance).

Diagnosis

This is a classic sign that motion artifacts are hindering the ICA decomposition. The large amplitude and complex nature of motion artifacts (from head movement, cable sway, electrode pops) are dominating the signal mixture, preventing the algorithm from cleanly separating brain sources.

Solution: Implementing a Pre-ICA Cleaning Pipeline

The most effective strategy is to reduce the motion artifact burden before performing ICA. The table below compares two prominent methods for this purpose.

Table 1: Comparison of Pre-ICA Motion Artifact Removal Methods

Method	How It Works	Key Parameters	Optimal Use Case
Artifact Subspace Reconstruction (ASR)	Uses a sliding-window PCA to identify and remove high-variance signal subspaces that deviate from a clean baseline recording [2] [3].	`k`: Standard deviation threshold for artifact rejection. A lower `k` is more aggressive. A `k` between 10 and 20 is often used for locomotion data to avoid over-cleaning [2].	Effective for removing large, transient motion artifacts when a clean baseline period is available.
iCanClean	Leverages canonical correlation analysis (CCA) and reference noise signals (from dual-layer electrodes or created as pseudo-references) to subtract noise subspaces from the EEG [4] [5].	`R²`: Correlation threshold (aggressiveness). Window Length: Segment size for CCA. Optimal parameters for walking data are often an `R²` of 0.65 and a 4-second window [4] [5].	Superior performance when dedicated noise references are available. Also effective with pseudo-reference signals, making it highly versatile [6] [2].

Action Plan

Preprocess Your Data: Apply a high-pass filter (e.g., 1 Hz cutoff) and remove bad channels.
Apply a Cleaning Algorithm: Choose either ASR or iCanClean based on your data and setup. For iCanClean, start with the recommended parameters of R²=0.65 and a 4-s window [4] [5].
Run ICA: Perform ICA on the cleaned data using a reliable algorithm (e.g., AMICA).
Validate: Check if the number of "good" (dipolar, brain-like) components has increased. For example, iCanClean has been shown to improve the average number of good components from 8.4 to 13.2 (+57%) in walking data [4].

The following tables consolidate key performance metrics from recent studies to aid in the evaluation and selection of artifact removal methods.

Table 2: iCanClean Performance Metrics on Walking EEG Data

Study	Key Metric	Performance with iCanClean	Performance Baseline
Gonsisko et al. (2023) [4] [5]	Number of "good" ICA components (dipolar, brain-like)	13.2	8.4
Gonsisko et al. (2023) [4] [5]	Effective number of noise channels	16-64 channels maintained good performance (12.0-12.7 good components)	—

Table 3: Comparative Performance in Running EEG Paradigm (Ledwidge et al., 2025) [6] [2]

Method	ICA Component Dipolarity	Reduction at Gait Frequency	P300 ERP Congruency Effect Recovered?
iCanClean (pseudo-reference)	Most effective	Significant reduction	Yes
Artifact Subspace Reconstruction (ASR)	Improved	Significant reduction	No
Standard Preprocessing	Least effective	No significant reduction	No

Experimental Protocols for Method Validation

Researchers validating artifact removal methods often use the following experimental paradigms and metrics.

Protocol 1: Validating with a Flanker Task During Locomotion

This protocol tests whether neural markers can be recovered after artifact removal in a dynamic setting [6] [2].

Task: Participants perform a cognitive (Flanker) task while either standing statically (control condition) or jogging (test condition).
EEG Recording: Mobile EEG is recorded during both conditions.
Analysis:
- Preprocess the dynamic data with different methods (e.g., iCanClean, ASR).
- Run ICA and compute the number of dipolar brain components.
- Calculate the power at the step frequency and its harmonics to quantify residual motion artifact.
- Extract and compare the P300 event-related potential (ERP) from both static and dynamic conditions. A successful method will show the expected "congruency effect" (higher P300 amplitude for incongruent stimuli) in the dynamic condition.
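The residual-artifact step of this protocol (power at the step frequency and its harmonics) can be sketched as follows. This is a minimal illustration using a plain periodogram on synthetic data. The function name `power_at_step_harmonics`, the number of harmonics, and the 0.2 Hz half-width are assumptions made for the example and are not taken from the cited studies.

```python
import numpy as np

def power_at_step_harmonics(signal, fs, step_hz, n_harmonics=3, half_width_hz=0.2):
    """Sum periodogram power in narrow bands around the step frequency and its harmonics."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    # Simple periodogram; the absolute scaling only matters for relative comparisons here.
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * x.size)
    df = freqs[1] - freqs[0]
    total = 0.0
    for k in range(1, n_harmonics + 1):
        band = np.abs(freqs - k * step_hz) <= half_width_hz
        total += psd[band].sum() * df
    return total

# Synthetic single-channel demonstration (all values are made up).
rng = np.random.default_rng(0)
fs = 250.0
t = np.arange(0, 20.0, 1.0 / fs)
background = rng.standard_normal(t.size)             # stand-in for brain activity
gait_artifact = 10.0 * np.sin(2 * np.pi * 2.8 * t)   # stand-in for a step-locked artifact

power_raw = power_at_step_harmonics(background + gait_artifact, fs, step_hz=2.8)
power_clean = power_at_step_harmonics(background, fs, step_hz=2.8)
print(power_raw > power_clean)
```

Comparing this value before and after cleaning, for each method, gives a rough per-channel measure of how much gait-locked power remains.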

Protocol 2: Parameter Sweep for Algorithm Tuning

This methodology is used to determine the optimal settings for a cleaning algorithm like iCanClean [4] [5].

Data: High-density EEG recorded during walking, ideally with a dual-layer setup (120 scalp electrodes + 120 noise electrodes).
Parameter Sweep:
- Vary R² threshold: Test values from 0.05 to 1.00 in increments of 0.05.
- Vary window length: Test different window sizes (e.g., 1 s, 2 s, 4 s, and an infinite window using the entire dataset).
Quality Metric: For each parameter combination, run ICA and count the number of "good" independent components. A good component is typically defined as one that is well-localized by a single dipole (residual variance < 15%) and has a high probability of being a brain source according to ICLabel (> 50%) [4] [5].
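The sweep described above is a full grid over both parameters. A minimal sketch of how that grid could be enumerated is shown below. The variable names are illustrative, and representing the "infinite" window as `math.inf` is an assumption of this example.

```python
import itertools
import math

# R² thresholds from 0.05 to 1.00 in steps of 0.05 (20 values).
r2_values = [round(0.05 * k, 2) for k in range(1, 21)]

# Window lengths in seconds; math.inf stands for "use the entire dataset".
window_lengths_s = [1.0, 2.0, 4.0, math.inf]

# Every (window length, R²) combination to be cleaned, decomposed, and scored.
grid = list(itertools.product(window_lengths_s, r2_values))
print(len(grid))
```

Each grid entry corresponds to one cleaning run followed by one ICA decomposition per participant, so the computational cost grows with the grid size.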

Research Reagent Solutions

Table 4: Essential Materials for Advanced Mobile EEG Research

Item / Technique	Function in Research
Dual-Layer EEG Cap	A cap with two layers of electrodes: scalp electrodes that record brain signal + noise, and mechanically coupled outward-facing electrodes that record only environmental and motion noise. This provides an ideal reference for noise-cancellation algorithms like iCanClean [5].
High-Density EEG Systems (64+ channels)	Provide adequate spatial resolution (~1 cm) for source-level analysis and improve the performance of blind source separation techniques like ICA by supplying more spatial information [5].
iCanClean Algorithm	A dedicated preprocessing algorithm that uses canonical correlation analysis (CCA) to remove motion and other artifacts by leveraging reference noise signals, thereby improving subsequent ICA decomposition [4] [5].
AMICA Algorithm	An adaptive mixture ICA algorithm considered one of the most powerful implementations for decomposing EEG data, including data from mobile paradigms [1].
ICLabel	A convolutional neural network-based classifier that automatically labels independent components (e.g., as brain, muscle, eye, noise). It is a standard tool for post-ICA evaluation [4] [5].

Technical Diagrams

[Figure: Motion Artifact Impact on ICA]

[Figure: iCanClean Cleaning Pipeline]

Frequently Asked Questions

What is the iCanClean R2 parameter? The R2 parameter is a correlation threshold between 0 and 1 that determines the cleaning aggressiveness of the iCanClean algorithm. It works by using Canonical Correlation Analysis (CCA) to identify and remove noisy subspaces from EEG data that are correlated with reference noise recordings [7] [3]. A lower R2 value means the algorithm will remove components that have even a weak correlation with noise, resulting in more aggressive cleaning. A higher R2 value (closer to 1) protects more of the signal by only removing components with a very strong correlation to noise, resulting in less aggressive cleaning [7].
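To make the role of the threshold concrete, here is a simplified sketch of CCA-based cleaning for a single window. It is not the iCanClean toolbox code: the function name `cca_clean_window`, the QR/SVD route to the canonical correlations, and the projection step are assumptions chosen for a compact illustration, and the data are synthetic.

```python
import numpy as np

def cca_clean_window(eeg, noise, r2_threshold=0.65):
    """Remove EEG subspaces whose squared canonical correlation with the noise exceeds the threshold.

    eeg:   (n_samples, n_eeg_channels)
    noise: (n_samples, n_noise_channels)
    Assumes both matrices have full column rank.
    """
    eeg_mean = eeg.mean(axis=0)
    Xc = eeg - eeg_mean
    Yc = noise - noise.mean(axis=0)
    # Orthonormal bases of the two channel spaces.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    Uo, s, _ = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    r2 = np.clip(s, 0.0, 1.0) ** 2
    variates = Qx @ Uo                      # EEG-side canonical variates (orthonormal columns)
    bad = variates[:, r2 > r2_threshold]    # subspaces treated as noise
    cleaned = Xc - bad @ (bad.T @ Xc) + eeg_mean
    return cleaned, r2

# Synthetic demonstration: 4 "scalp" channels sharing one artifact, 2 noise reference channels.
rng = np.random.default_rng(1)
n = 2000
brain = rng.standard_normal((n, 4))
artifact = rng.standard_normal((n, 1))
eeg = brain + 5.0 * artifact @ np.ones((1, 4))
noise = artifact @ np.array([[1.0, -0.5]]) + 0.1 * rng.standard_normal((n, 2))

cleaned, r2 = cca_clean_window(eeg, noise, r2_threshold=0.65)
corr_before = abs(np.corrcoef(eeg[:, 0], artifact[:, 0])[0, 1])
corr_after = abs(np.corrcoef(cleaned[:, 0], artifact[:, 0])[0, 1])
print(corr_before > corr_after)
```

Lowering `r2_threshold` in this sketch removes more canonical components, which mirrors the more aggressive cleaning described above. Raising it toward 1 removes fewer.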

What are the optimal R2 settings for mobile EEG data? Based on a systematic parameter sweep, the optimal setting for cleaning mobile EEG data during walking was determined to be an R2 value of 0.65 combined with a 4-second window length [7]. This configuration significantly improved the quality of the Independent Component Analysis (ICA) decomposition.

How does the R2 parameter interact with the window length? The R2 threshold is applied within a specific time window that slides through the data. The window length determines the segment of data used to calculate the local correlation between cortical electrodes and noise electrodes [7]. Shorter windows (e.g., 1-2 seconds) can adapt to rapidly changing noise but may be less stable, while the recommended 4-second window provides a good balance for capturing the structure of motion artifacts during walking [7].
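The relationship between window length and the data segments that the threshold is applied to can be sketched as below. The helper `window_bounds` is hypothetical, and the use of non-overlapping windows is an assumption of this example. The actual toolbox may step or overlap its windows differently.

```python
import math

def window_bounds(n_samples, fs, window_s):
    """Return (start, stop) sample indices for consecutive, non-overlapping windows.

    window_s = math.inf means a single window covering the whole recording.
    """
    if math.isinf(window_s):
        return [(0, n_samples)]
    win = int(round(window_s * fs))
    return [(start, min(start + win, n_samples)) for start in range(0, n_samples, win)]

# 10 seconds of data at 500 Hz, split into 4-second windows.
print(window_bounds(5000, 500, 4.0))
print(window_bounds(5000, 500, math.inf))
```

Each window would then receive its own correlation estimate and its own thresholding decision, which is why shorter windows adapt faster but rest on fewer samples.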

What happens if I set the R2 value too low or too high? Setting the R2 value too low (e.g., below 0.5) risks over-cleaning, where genuine brain signal components may be mistakenly identified as noise and removed from the data [7] [3]. Conversely, setting the R2 value too high (e.g., above 0.8) leads to under-cleaning, leaving a significant amount of motion artifact in the data, which can hinder subsequent source analysis with ICA [7].

Can I use iCanClean effectively with a standard EEG system? Yes, while iCanClean is ideally used with a dual-layer EEG cap that has dedicated noise sensors, it can also generate "pseudo-reference" noise signals from standard EEG data. This is often done by applying a temporary notch filter to the raw EEG to isolate noise in a specific frequency band, such as below 3 Hz, which is then used for the CCA-based cleaning [3].
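As a rough illustration of the pseudo-reference idea, the sketch below isolates the content of each channel below 3 Hz so that it could serve as a stand-in noise reference. The function `pseudo_reference_below` and its crude FFT mask are assumptions for this example only. They are a simple substitute for whatever temporary filter the real implementation applies.

```python
import numpy as np

def pseudo_reference_below(eeg, fs, cutoff_hz=3.0):
    """Keep only spectral content below cutoff_hz in each channel (crude FFT mask).

    eeg: (n_samples, n_channels)
    """
    spectrum = np.fft.rfft(eeg, axis=0)
    freqs = np.fft.rfftfreq(eeg.shape[0], d=1.0 / fs)
    spectrum[freqs >= cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=eeg.shape[0], axis=0)

# Synthetic single-channel example: a slow 1 Hz drift plus a 10 Hz rhythm.
fs = 100.0
t = np.arange(1000) / fs
slow = np.sin(2 * np.pi * 1.0 * t)
fast = np.sin(2 * np.pi * 10.0 * t)
eeg = (slow + fast)[:, None]

pseudo = pseudo_reference_below(eeg, fs, cutoff_hz=3.0)
print(np.allclose(pseudo[:, 0], slow, atol=1e-6))
```

The resulting low-frequency signals would play the role of the noise channels in the CCA step. A sharp FFT mask can cause edge effects on real recordings, so a properly designed filter is preferable in practice.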

Experimental Protocols & Data

The following quantitative data on R2 parameter tuning is drawn from a foundational study that performed a comprehensive parameter sweep [7].

Table 1: Parameter Sweep Results for R2 and Window Length

This table shows how the number of "good" independent components (ICs) changes with different settings. A good IC is well-localized (Residual Variance < 15%) and has a high brain probability (ICLabel > 50%).

R2 Value	Window Length	Avg. Number of Good ICs	Performance Notes
0.65	4 seconds	13.2	Optimal performance [7]
Varied (0.05-1.0)	1 second	< 13.2	Less stable cleaning [7]
Varied (0.05-1.0)	2 seconds	< 13.2	Less stable cleaning [7]
Varied (0.05-1.0)	Infinite	< 13.2	Less adaptive to non-stationary noise [7]
Baseline (No cleaning)	N/A	8.4	Performance before iCanClean processing [7]

Table 2: Performance with a Reduced Number of Noise Channels

This table demonstrates that iCanClean remains effective even when the number of available noise channels is reduced, which is relevant for systems with fewer reference sensors. The data was obtained using the optimal R2 value of 0.65 and a 4-second window [7].

Number of Noise Channels	Avg. Number of Good ICs
120 (Full Set)	13.2
64	12.7
32	12.2
16	12.0

Experimental Methodology from Cited Studies

The primary study that established the optimal R2 value involved 45 participants across three groups: young adults, high-functioning older adults, and low-functioning older adults [7]. The key experimental steps were:

Data Collection: Mobile high-density EEG was recorded using a dual-layer cap with 120 scalp electrodes and 120 outward-facing noise electrodes during treadmill walking. Participants walked for approximately 48 minutes under various terrain and speed conditions [7].
Basic Preprocessing: Data was high-pass filtered at 1 Hz and average re-referenced. Abnormal channels were rejected based on amplitude (standard deviation > 3 times the median) [7].
Parameter Sweep Execution: The preprocessed data was cleaned using iCanClean across a full factorial combination of parameters. The R2 threshold was varied from 0.05 to 1.0 in increments of 0.05, and the window length was tested at 1s, 2s, 4s, and infinite (the entire dataset) [7].
Outcome Measurement: Each cleaned dataset was decomposed using ICA. The quality of the decomposition was measured by counting the number of "good" independent components that were both dipolar (Residual Variance < 15%) and brain-like (ICLabel probability > 50%) [7].
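The channel-rejection and re-referencing parts of the basic preprocessing step can be sketched as follows. The function names are illustrative, the data are synthetic, and applying rejection before re-referencing is an assumption of this example. The 1 Hz high-pass filter is omitted for brevity.

```python
import numpy as np

def reject_outlier_channels(data, factor=3.0):
    """Return a boolean mask of channels to keep.

    data: (n_samples, n_channels). A channel is rejected when its standard
    deviation exceeds `factor` times the median standard deviation.
    """
    sd = data.std(axis=0)
    return sd <= factor * np.median(sd)

def average_reference(data):
    """Subtract the across-channel mean from every sample."""
    return data - data.mean(axis=1, keepdims=True)

# Synthetic demonstration: 8 channels, one of which is abnormally large.
rng = np.random.default_rng(2)
data = rng.standard_normal((5000, 8))
data[:, 3] *= 10.0

keep = reject_outlier_channels(data)
referenced = average_reference(data[:, keep])
print(int(keep.sum()), referenced.shape)
```

With a dual-layer cap, the same two helpers would be applied to the scalp and noise channel sets independently.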

A subsequent 2025 study validated these findings in a running paradigm, confirming that iCanClean (with both dual-layer and pseudo-reference signals) improved ICA dipolarity and enabled the recovery of expected event-related potential components, outperforming other common methods like Artifact Subspace Reconstruction (ASR) [3].

The Scientist's Toolkit: Essential Research Reagents & Materials

Table 3: Key Materials and Software for iCanClean Research

Item	Function / Description
Dual-Layer EEG Cap	A specialized cap where scalp electrodes are mechanically coupled with outward-facing noise electrodes. The noise electrodes record motion artifacts without brain signals, providing an ideal reference for iCanClean [7] [3].
High-Density EEG System	An EEG system with 64 or more channels, providing the spatial resolution needed for effective source separation using ICA, which is crucial for validating cleaning outcomes [7].
MATLAB	The primary computational environment used for implementing iCanClean and associated data processing scripts in the cited studies [7].
EEGLAB	An interactive MATLAB toolbox used for processing, analyzing, and visualizing EEG data. It is essential for performing ICA, dipole fitting, and using the ICLabel classifier [7] [3].
ICLabel	An EEGLAB plugin that uses a trained convolutional neural network to automatically classify independent components as brain, muscle, eye, heart, line noise, channel noise, or other. Its brain-probability output is a key metric for evaluating cleaning success [7] [3].
DIPFIT (EEGLAB Plugin)	A toolbox within EEGLAB used to localize the neural sources of independent components by fitting an equivalent dipole model. A low residual variance (< 15%) is a marker of a high-quality, brain-like component [7].

Frequently Asked Questions (FAQs)

Q1: What is the "window length" parameter in the iCanClean algorithm? The window length is a key user-selectable parameter in the iCanClean algorithm that determines the duration of the data segment used to calculate local correlations between cortical EEG electrodes and reference noise electrodes. It controls the timescale over which motion artifacts are identified and removed [7] [5]. Shorter windows (e.g., 1-2 seconds) can capture rapidly changing noise, while longer windows provide a more global noise estimate.

Q2: How does window length interact with the R² cleaning aggressiveness setting? Window length and the R² threshold work together to determine cleaning performance. The R² threshold (ranging from 0 to 1) defines the correlation level above which data subspaces are considered noise and removed, with lower values being more aggressive [7]. The optimal window length provides the temporal framework for calculating these correlations. Research has found that a 4-second window paired with an R² value of 0.65 provides optimal results for gait-related motion artifacts [7].

Q3: What happens if I choose a window length that is too short or too long? An improperly chosen window length can reduce cleaning efficacy [7]:

Too Short (e.g., 1s): May be insufficient to stably capture the correlation structure between EEG and noise channels, potentially leading to under-cleaning or over-cleaning depending on the R² setting.
Too Long (e.g., "Infinite"): Uses the entire dataset to calculate a single, global noise model, which may fail to adapt to non-stationary, time-varying motion artifacts that occur during different movement phases.

Q4: What is the empirically determined optimal window length for mobile EEG during walking? A comprehensive parameter sweep demonstrated that a 4-second window was optimal for cleaning high-density mobile EEG data collected during walking on various terrains [7]. This timescale effectively captured the noise structure associated with gait and other whole-body movements.

Q5: Can I use fewer noise channels than the 120 used in the original study? Yes, iCanClean maintains good performance even with reduced noise channels. Testing with 64, 32, and 16 noise channels showed only a gradual decline in the number of "good" independent components identified after cleaning [7]. This makes the method applicable to systems with fewer available reference channels.

Troubleshooting Guides

Problem: Poor ICA Decomposition After iCanClean Cleaning

Description: After running iCanClean, the subsequent Independent Component Analysis (ICA) yields few brain-like components, as determined by dipole localization and ICLabel classification.

Potential Causes and Solutions:

Cause: Overly Conservative Cleaning (R² too high)
- Solution: Reduce the R² value to be more aggressive. A high R² threshold (e.g., near 1.0) removes only the most strongly correlated noise, leaving significant artifacts that can corrupt the ICA decomposition [7]. Try systematically lowering the R² value, using 0.65 as a starting point [7].
Cause: Suboptimal Window Length
- Solution: Adjust the window length to match the dynamics of your noise. For walking data, the 4-second window is optimal [7]. If your experiment involves different types of movement (e.g., very rapid head turns), you may need to test shorter windows (1s or 2s) to see if they better capture the transient noise.
Cause: Insufficient Number of Noise Reference Channels
- Solution: Ensure you are using an adequate number of noise channels. While the algorithm works with 16 channels, performance is better with 32 or 64 [7]. If possible, use the maximum number of functional noise channels available in your setup.

Problem: Excessive Removal of Neural Signal

Description: The cleaned EEG data appears over-processed, with attenuated event-related potentials or a loss of high-frequency brain activity.

Potential Causes and Solutions:

Cause: Overly Aggressive Cleaning (R² too low)
- Solution: Increase the R² threshold. A very low R² value will remove any signal subspace that shows even a weak correlation with the noise references, which can include brain signals. Increase the R² value to preserve more data [7].
Cause: Mismatch Between Noise and EEG Channels
- Solution: Verify the integrity of your noise channels during preprocessing. Ensure that noisy or malfunctioning noise channels are identified and removed, as they can provide an erroneous noise reference that leads the algorithm to incorrectly remove brain activity [7].

Experimental Protocols & Data

Protocol: Parameter Sweep for Optimizing Window Length and R²

This protocol is derived from the methodology used to establish optimal iCanClean settings [7].

1. Objective: To systematically determine the optimal window length and R² threshold for cleaning mobile EEG data in a specific experimental context.

2. Materials and Setup:

EEG System: A dual-layer EEG cap is recommended. The original study used 120 scalp electrodes and 120 outward-facing noise electrodes [7].
Data: Mobile EEG data (e.g., collected during walking) from multiple participants. The study used 45 participants across different age and functional groups [7].
Software: MATLAB, EEGLAB toolbox, and the iCanClean algorithm [7].

3. Procedure:

Step 1: Basic Preprocessing. High-pass filter the data (1 Hz cutoff), apply average reference separately to EEG and noise channels, and reject outlier channels [7].
Step 2: Set Parameter Ranges.
- Window Length: Test values of 1 second, 2 seconds, 4 seconds, and an "infinite" window (entire recording) [7].
- R² Threshold: Test a range from 0.05 to 1.00 in increments of 0.05 [7].
Step 3: Run iCanClean Parameter Sweep. Process the preprocessed data through the iCanClean algorithm for every combination of window length and R² value.
Step 4: Perform ICA. Decompose each cleaned dataset using an ICA algorithm (e.g., AMICA is recommended) [7].
Step 5: Identify "Good" Components. Classify independent components (ICs) as 'good' based on two primary criteria:
- Dipole Fit: Residual Variance (RV) of the component's topographic map should be less than 15% [7].
- Brain Probability: The ICLabel classifier should assign a probability greater than 50% that the component is a brain source [7].
Step 6: Analyze Results. The optimal parameter set is the one that maximizes the number of 'good' brain components per subject.
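Steps 5 and 6 can be sketched as below, assuming the residual variances and ICLabel brain probabilities have already been exported as plain arrays. The helper names and the toy numbers are invented for illustration and are not results from the study.

```python
import numpy as np

def count_good_ics(residual_variance, brain_probability, rv_max=0.15, brain_min=0.50):
    """Count components that are both dipolar (RV < 15%) and brain-like (probability > 50%)."""
    rv = np.asarray(residual_variance, dtype=float)
    pb = np.asarray(brain_probability, dtype=float)
    return int(np.sum((rv < rv_max) & (pb > brain_min)))

def best_setting(good_counts):
    """Pick the (window length, R²) key with the highest mean count across subjects.

    good_counts: dict mapping (window_s, r2) -> list of per-subject good-IC counts.
    """
    return max(good_counts, key=lambda key: np.mean(good_counts[key]))

# Toy example for one subject: four components (values are made up).
n_good = count_good_ics([0.05, 0.10, 0.30, 0.12], [0.90, 0.40, 0.80, 0.60])
print(n_good)

# Toy sweep summary for two settings and two subjects (values are made up).
toy_counts = {(4.0, 0.65): [13, 14], (1.0, 0.65): [10, 12]}
print(best_setting(toy_counts))
```

In a real sweep the dictionary would hold one entry per parameter combination, filled in after each ICA run.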

4. Outcome Measures:

Primary: Average number of 'good' ICs per subject.
Secondary: Impact of reduced noise channels (64, 32, 16) on performance at optimal settings.

Table 1: Performance of iCanClean at Optimal vs. Baseline Settings [7]

Condition	Window Length	R² Value	Average Number of "Good" ICs	Performance Change
Basic Preprocessing (Baseline)	Not Applicable	Not Applied	8.4	Baseline
iCanClean (Optimal)	4 seconds	0.65	13.2	+57%
iCanClean (64 noise channels)	4 seconds	0.65	12.7	+51%
iCanClean (32 noise channels)	4 seconds	0.65	12.2	+45%
iCanClean (16 noise channels)	4 seconds	0.65	12.0	+43%
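The "Performance Change" column follows directly from the component counts and the 8.4-component baseline. A short check of that arithmetic, using only the values in the table above:

```python
baseline = 8.4

# Average number of "good" ICs by number of noise channels, as listed in the table.
good_ics = {120: 13.2, 64: 12.7, 32: 12.2, 16: 12.0}

def percent_change(after, before):
    """Relative change in percent, rounded to the nearest integer."""
    return round(100.0 * (after - before) / before)

changes = {channels: percent_change(value, baseline) for channels, value in good_ics.items()}
print(changes)
```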

Table 2: Key Research Reagents and Solutions for iCanClean Protocol [7] [5]

Item	Function / Relevance in the Protocol
Dual-Layer EEG Cap	Provides mechanically coupled noise reference electrodes essential for the iCanClean algorithm. The original study used a 120+120 electrode configuration [7].
iCanClean Algorithm	The core cleaning algorithm that uses Canonical Correlation Analysis (CCA) to remove motion artifacts based on correlations with reference noise signals [7].
AMICA (Adaptive Mixture ICA)	An implementation of Independent Component Analysis used for source separation after cleaning. It was identified as a high-performing ICA algorithm [7].
ICLabel Classifier	A convolutional neural network-based tool for automatically classifying independent components into categories like brain, muscle, eye, and noise [7].
Dipole Fitting Tool (e.g., DIPFIT)	Used to localize the source of an independent component and calculate its Residual Variance (RV), a measure of how well its topography is explained by a single dipole [7].

Workflow and Conceptual Diagrams

[Figure: iCanClean Parameter Optimization and Evaluation Workflow]

[Figure: Logic for Selecting an Initial Window Length]

Frequently Asked Questions: Hardware & Setup

Q1: What is dual-layer EEG and how does its hardware setup function? A dual-layer EEG system employs two layers of electrodes: a scalp layer that records a mixture of brain signals and motion artifacts, and a noise layer with electrically isolated electrodes that record primarily motion and non-biological artifacts [8]. These two layers are mechanically joined using 3D-printed couplers, and their wires are secured together with tape, ensuring both sets of cables experience nearly identical motion, which is a primary source of artifact [8] [9].

Q2: What are the critical steps for setting up a dual-layer EEG system for a mobile experiment?

Electrode Placement: Verify that scalp and noise electrode impedances are below 20 kΩ for optimal signal quality [8].
Cable Management: Shorten ribbon cables to minimize swing (e.g., 14 cm used in table tennis studies) and secure cables to the cap using tape or velcro to reduce movement-induced artifacts [8] [10].
Amplifier Configuration: Use separate, electrically isolated amplifiers for the scalp and noise layers. The reference and ground electrodes for each layer should also be kept separate [8].
System Portability: House amplifiers in a secure, comfortable backpack worn by the participant to facilitate whole-body movement without restricting motion [8].

Q3: Why are my noise channels not correlating well with motion artifacts in the scalp data? This is often due to improper mechanical coupling. Ensure that the scalp and its corresponding noise electrode are firmly joined with a 3D-printed coupler and that their wires are taped together along their entire length. If the cables move independently, the noise channel will not accurately capture the artifact profile affecting the scalp channel [8].

Troubleshooting Guides

Problem: Poor quality Independent Components (ICs) after ICA decomposition.

Potential Cause 1: Inadequate artifact reference from the noise layer.
Solution: Use an algorithm like iCanClean that leverages the dual-layer noise electrodes to identify and remove artifact subspaces before performing ICA. This has been shown to yield a higher number of "good" brain components [8] [4].
Potential Cause 2: Insufficient recording time for a stable ICA decomposition.
Solution: For high-density mobile EEG, record at least 30 minutes of data at a sampling frequency of 500 Hz or higher to provide enough data for ICA to reliably separate sources [11].

Problem: Persistent motion artifacts despite using a dual-layer system.

Potential Cause: Cable sway is a dominant source of motion artifact, often more significant than head acceleration itself [8].
Solution: Re-evaluate the physical setup. Further shorten and bundle cables, and use a neoprene cap or similar solution to fix cables firmly in place, minimizing their independent movement [10].

Problem: Muscle artifacts (EMG) contaminating the signal.

Potential Cause: Muscle contractions from the neck, jaw, or face during movement.
Solution: Incorporate dedicated EMG electrodes (e.g., on neck muscles) to provide a reference signal. The iCanClean algorithm can use such reference signals to remove these biological artifacts effectively [8] [11].

Performance Data for Parameter Tuning

The following table summarizes key quantitative findings from research using dual-layer EEG and the iCanClean algorithm, which are critical for informing parameter tuning in mobile EEG studies.

Study Focus	Key Metric	Performance Before Processing	Performance After Processing	Recommended Parameters / Conditions
iCanClean Parameter Sweep (Walking) [4]	Number of "good" ICA components	8.4 components	13.2 components (+57%)	Window length: 4 seconds; R² aggressiveness: 0.65
iCanClean on Phantom Head (All Artifacts) [11]	Data Quality Score (correlation with ground truth)	15.7%	55.9%	iCanClean outperformed ASR, Auto-CCA, and Adaptive Filtering.
Noise Channel Reduction Test [4]	Number of "good" ICA components	-	12.7 (64 channels); 12.2 (32 channels); 12.0 (16 channels)	Performance maintained even with a reduced set of noise channels.

Experimental Protocol: Validating Setup with a Table Tennis Paradigm

This protocol, adapted from Studnicki et al. (2022), provides a robust methodology for testing dual-layer EEG hardware and processing pipelines in a high-motion environment [8].

1. Objective: To characterize and remove motion artifacts in EEG data during a discrete, responsive, whole-body task (table tennis) and to identify optimal processing strategies.

2. Participant Preparation:

Apply a custom dual-layer EEG cap (e.g., 120 scalp electrodes + 120 noise electrodes).
Repurpose select scalp electrodes (e.g., TP9, P9, PO9, O9, O10, PO10, P10, TP10) to record neck muscle EMG activity.
Ensure all electrode impedances are below 20 kΩ.
Securely fit a backpack containing the amplifiers on the participant's upper back.

3. Data Acquisition & Synchronization:

Record EEG data at a minimum of 500 Hz using multiple synchronized amplifiers.
Place Inertial Measurement Units (IMUs) on the participant's head/body, paddles, and ball machine to capture movement kinematics and event timing.
Use a synchronization pulse (e.g., from an Arduino timer module every five seconds) to align EEG and IMU data streams.

4. Experimental Tasks:

The participant engages in multiple 15-minute blocks of table tennis.
Blocks should include cooperative and competitive rallies with a human opponent, as well as drills using a ball machine.
This variety ensures the data contains a mix of predictable and responsive whole-body movements.

5. Data Processing & Analysis:

Correlation Analysis: Correlate individual scalp channels with their noise-matched counterparts and with head/body acceleration data to confirm that noise channels best capture the artifact profile.
Component Analysis: Process the data using ICA with and without the aid of the dual-layer noise electrodes (e.g., using the iCanClean algorithm). The quality of independent components can be assessed by the fit of a dipole model and an automated labeling algorithm (e.g., ICLabel). The number of well-localized brain components is the primary outcome measure.
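The correlation analysis in this protocol amounts to a Pearson correlation between each scalp channel and its coupled noise channel. A minimal sketch on synthetic data is given below. The function name and the simulated signals are assumptions for the example.

```python
import numpy as np

def paired_channel_correlation(scalp, noise):
    """Pearson correlation between each scalp channel and its matched noise channel.

    scalp, noise: (n_samples, n_pairs); column i of each array forms a coupled pair.
    """
    s = scalp - scalp.mean(axis=0)
    n = noise - noise.mean(axis=0)
    return (s * n).sum(axis=0) / np.sqrt((s ** 2).sum(axis=0) * (n ** 2).sum(axis=0))

# Synthetic demonstration: a shared motion signal drives both layers.
rng = np.random.default_rng(3)
motion = rng.standard_normal((3000, 1))
noise = motion + 0.2 * rng.standard_normal((3000, 4))
scalp = 2.0 * motion + rng.standard_normal((3000, 4))

r = paired_channel_correlation(scalp, noise)
print(r.shape)
```

The same function could be reused with an acceleration trace repeated across columns in place of `noise`, to compare how well each candidate reference tracks the artifact.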

The Scientist's Toolkit: Essential Research Reagents & Materials

Item	Function / Explanation
Dual-Layer EEG Cap	A cap with mechanically coupled scalp and noise electrodes (e.g., 120+120 channels). The noise layer provides a dedicated reference for motion artifacts [8] [9].
Portable Amplifiers	Lightweight, battery-powered amplifiers (e.g., LiveAmp) that enable mobile brain imaging in real-world settings [8].
Conductive Fabric	Acts as an artificial skin circuit to bridge noise electrodes, completing the electrical pathway for artifact recording [8].
Inertial Measurement Units (IMUs)	Placed on the participant, equipment, and environment to synchronize motion kinematics with brain data and mark behavioral events [8] [9].
iCanClean Algorithm	A cleaning algorithm that uses canonical correlation analysis on scalp and noise electrodes to find and reject artifact subspaces, improving subsequent ICA [4] [11].
Neck EMG Electrodes	Repurposed EEG electrodes placed on neck muscles to provide a reference signal for myogenic artifacts, which can be used by cleaning algorithms like iCanClean [8] [11].
3D-Printed Cable Couplers & Cases	Custom components to mechanically join scalp/noise electrodes and securely house amplifiers, which are critical for system integrity and portability [8].

Signaling Pathways and Workflows

[Figure: iCanClean Artifact Removal Flow]

[Figure: Experimental Workflow for Validation]

A Practical Protocol: Implementing iCanClean with Optimized Parameters for Your Study

Frequently Asked Questions (FAQs)

Q1: What are the established starting parameters for cleaning mobile EEG data collected during walking? Based on a comprehensive parameter sweep using high-density, dual-layer EEG data from participants walking on a treadmill, the optimal parameters for the iCanClean algorithm are an R2 value of 0.65 and a window length of 4 seconds [7]. These settings were determined to maximize the number of "good" independent components recovered after ICA decomposition.

Q2: Why is an R2 value of 0.65 recommended, and what happens if I use a more or less aggressive value? The R2 threshold controls the cleaning aggressiveness. A higher R2 value (closer to 1) results in less cleaning, while a lower value (closer to 0) results in more aggressive cleaning [7]. The parameter sweep found that an R2 of 0.65 optimally balances the removal of motion artifacts with the preservation of underlying brain signals. Straying significantly from this value may result in either insufficient cleaning (leaving too much noise) or over-cleaning (removing brain activity).

Q3: Can I use these parameters with a standard EEG system, or do I need a special cap? The original validation was performed using a dual-layer EEG cap with 120 scalp electrodes and 120 dedicated noise electrodes [7]. However, the study also demonstrated that good performance can be maintained with a reduced set of noise channels. The parameters are effective with 64, 32, or even 16 noise channels, though the number of high-quality brain components identified may decrease slightly as noise channels are reduced [7].

Q4: How much improvement in data quality can I expect using these parameters? In the validation study, using the optimal parameters (R2=0.65, 4-s window) improved the average number of "good" independent components (ICs) from 8.4 to 13.2, an increase of 57% [7]. "Good" components were defined as those well-localized by a dipole model (residual variance < 15%) and with a high probability of being brain activity (ICLabel > 50%).

Q5: Are these parameters specific to certain populations? The study tested these parameters across three different participant groups: young adults, high-functioning older adults, and low-functioning older adults [7]. The optimal parameters were consistent across these groups, suggesting they are a robust starting point for neurotypical adult populations.

Troubleshooting Guide

Problem	Possible Cause	Solution
Poor ICA decomposition after cleaning.	Over-cleaning; R2 value is too low, removing brain signals along with noise.	Increase the R2 value (e.g., to 0.7 or 0.75) to make the algorithm less aggressive [7].
Residual motion artifacts are still visible in the data.	Under-cleaning; R2 value is too high, leaving too much noise in the data.	Decrease the R2 value (e.g., to 0.55 or 0.6) to increase cleaning aggressiveness [7].
Cleaning performance is lower than expected with a reduced number of noise channels.	The algorithm has insufficient reference information to isolate noise subspaces effectively.	Ensure noise channels are evenly spaced around the scalp. Use the `loc_subsets` function in EEGLAB to select the most geometrically representative channels [7].
The algorithm struggles with very high-frequency noise (e.g., muscle artifacts).	The 4-second window may be too long to capture non-stationary, high-frequency bursts.	Shorten the window length (e.g., to 2 seconds) to better capture and remove local, high-frequency artifacts [7].

Methodology for Parameter Optimization

The following workflow was used to establish the optimal R2 and window length parameters [7]:

Quantitative Performance of iCanClean Parameters

Table 1: Improvement in ICA Decomposition Quality with Optimal iCanClean Settings [7]

Condition	Average Number of "Good" ICs	Change from Baseline
Before iCanClean (Baseline)	8.4	-
After iCanClean (R2=0.65, 4-s window)	13.2	+57%

Table 2: Performance with a Reduced Number of Noise Channels (using R2=0.65, 4-s window) [7]

Number of Noise Channels	Average Number of "Good" ICs
120 (Full Set)	13.2
64	12.7
32	12.2
16	12.0

The Scientist's Toolkit: Essential Research Reagents & Materials

Table 3: Key Materials and Software for iCanClean Experiments

Item	Function / Description	Role in the Protocol
Dual-Layer EEG Cap	A specialized cap with scalp electrodes and mechanically coupled, outward-facing noise electrodes [7].	Provides concurrent recordings of brain signals (EEG channels) and reference noise (noise channels) essential for the iCanClean algorithm.
High-Density EEG System	An EEG system capable of recording from 64+ channels, often 120+ channels in validation studies [7].	Ensures adequate spatial sampling for effective source separation using ICA after cleaning.
iCanClean EEGLAB Plugin	The software implementation of the iCanClean algorithm, available as a plugin for the EEGLAB environment [12].	Performs the core cleaning function using Canonical Correlation Analysis (CCA) to remove artifact subspaces.
Independent Component Analysis (ICA)	A blind source separation algorithm (e.g., AMICA, Infomax) to decompose cleaned EEG data into independent components [7].	Used after iCanClean processing to isolate brain and non-brain sources for analysis.
ICLabel	A classifier that automatically labels independent components based on their type (e.g., brain, muscle, eye) [7].	Provides a quantitative metric (brain probability) for identifying "good" brain components post-ICA.
Dipolar Source Localization	A method for fitting an equivalent current dipole to an independent component's scalp topography [7].	Provides a quantitative metric (Residual Variance) for identifying well-localized, "good" brain components.

Parameter Decision Workflow

The following diagram outlines the logic for adjusting parameters based on your cleaning goals and data quality:

Frequently Asked Questions

What is a parameter sweep and why is it critical for iCanClean research? A parameter sweep is a systematic research approach where multiple parameters of an algorithm are varied across a defined range of values to determine the optimal combination for a specific goal [13]. For iCanClean, a novel algorithm for cleaning motion artifacts from mobile EEG data, conducting a parameter sweep is essential because its performance is highly dependent on two key user-defined parameters: the R² threshold (cleaning aggressiveness) and the window length (the segment of data used for local correlation calculations) [5]. Empirical optimization ensures the algorithm removes noise without accidentally degrading the underlying brain signals of interest [14] [5].

What are the optimal parameter values for iCanClean when processing walking data? Research involving high-density EEG recorded during walking has identified an optimal window length of 4 seconds and an optimal R² threshold of 0.65 [5] [2]. This combination significantly improved the quality of the subsequent independent component analysis (ICA), increasing the average number of "good" brain components extracted from the data by 57% [5].

Can I use iCanClean if I don't have a dual-layer EEG system with dedicated noise sensors? Yes, iCanClean can still be implemented using "pseudo-reference" noise signals derived from the raw EEG data itself [2]. This is typically done by applying a temporary notch filter to the EEG to isolate noise within a specific frequency band (e.g., below 3 Hz for motion artifacts), which then serves as the reference for the canonical correlation analysis (CCA) [2].
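One possible way to build such a pseudo-reference is sketched below: it keeps only the spectral content of each EEG channel below a cutoff (3 Hz here) and returns that slow component as the noise reference. The FFT mask is a stand-in chosen to keep the example dependency-free; the filter type, its design, and the function name are assumptions and may differ from what published pipelines use.

```python
import numpy as np

def low_frequency_pseudo_reference(eeg, srate, cutoff_hz=3.0):
    """Return the part of each channel below cutoff_hz.

    eeg is (n_channels, n_samples). A hard FFT mask is used purely
    for illustration; a properly designed filter is preferable on
    real data because a hard mask can ring at sharp transients.
    """
    n_samples = eeg.shape[-1]
    spectrum = np.fft.rfft(eeg, axis=-1)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / srate)
    spectrum[..., freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=n_samples, axis=-1)

# Synthetic example: a 1 Hz "motion" wave plus a 10 Hz "alpha" wave.
srate = 250
t = np.arange(1000) / srate
slow = np.sin(2 * np.pi * 1.0 * t)
fast = 0.5 * np.sin(2 * np.pi * 10.0 * t)
eeg_demo = np.vstack([slow + fast, 2.0 * slow + fast])
pseudo_ref = low_frequency_pseudo_reference(eeg_demo, srate)
```

The original EEG is left untouched; the filtered copy only serves as the reference input to the CCA step.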

Troubleshooting Guides

Problem: Overcleaning—Suspected Loss of Brain Signal After iCanClean

Description: After processing with iCanClean, the resulting data appears too clean, and known neural signals, such as event-related potentials (ERPs), are diminished or absent.
Potential Cause: The R² threshold is set too low, making the algorithm overly aggressive. A low R² value causes iCanClean to remove components with lower correlations to the noise reference, which risks also removing neural activity that shares some properties with the noise [5].
Solution:
- Verify with a Ground Truth: If possible, compare the cleaned data to a recording from a static condition (e.g., seated or standing) where the neural response is well-established [2].
- Adjust the R² Parameter: Increase the R² threshold to be less aggressive. The parameter sweep study found that a value of 0.65 was optimal for walking [5]. If you are working with a different type of movement, you may need to run a new parameter sweep to find a better value, starting with a higher R² (e.g., 0.7-0.8).
- Inspect Components: Use tools like ICLabel to check if independent components classified as "brain" have been inadvertently removed.

Problem: Ineffective Cleaning—Motion Artifacts Persist After Processing

Description: Visual inspection of the EEG data or spectral analysis shows that significant motion artifact remains after running iCanClean.
Potential Causes:
- The R² threshold is set too high, making the algorithm not aggressive enough [5].
- The window length is too short to capture the characteristic pattern of the motion artifact [5].
- The pseudo-reference noise signals were not properly constructed and do not adequately represent the true noise subspace [2].
Solution:
- Lower the R² Threshold: Gradually decrease the R² value (e.g., from 0.65 to 0.55) to remove more correlated components. Monitor the data to avoid overcleaning.
- Adjust the Window Length: For rhythmic motion like walking or running, a 4-second window is effective [5] [2]. For more sporadic motion, test longer windows to capture a more representative noise profile.
- Re-evaluate Pseudo-Reference Creation: Ensure the filtering method used to create pseudo-references effectively isolates the frequency band of your specific motion artifact.

Problem: Inconsistent Results Across Participants or Experimental Conditions

Description: iCanClean parameters that worked well for one participant or task are ineffective for another.
Potential Cause: The amount and type of noise can vary substantially between individuals (e.g., due to cap fit, hair density, or movement style) and between different tasks (e.g., walking vs. running) [5] [2].
Solution:
- Collect Ample Baseline Data: Record a few minutes of clean, resting-state data for each participant. This can be used for calibration or as a reference for data quality.
- Perform Individualized Parameter Sweeps: For critical studies, consider running a small-scale parameter sweep for each individual participant to determine their personal optimal settings.
- Group-Specific Optimization: If studying distinct groups (e.g., young vs. older adults), the parameter sweep from [5] suggests that optimal settings may be stable across groups, but verifying this for your specific cohort is good practice.

Experimental Protocol: A Step-by-Step Parameter Sweep for iCanClean

This protocol is adapted from published research that successfully optimized iCanClean for mobile EEG data [5].

1. Define the Parameter Space Create a table of the parameters and the values you will test.

Parameter	Description	Values to Test
R² Threshold	Controls cleaning aggressiveness. Lower values remove more components.	Test a range from 0.05 to 1.00 in increments of 0.05 [5].
Window Length	Duration of the data segment used for local correlation analysis.	Test 1, 2, and 4 seconds, and potentially the entire recording ("infinite") [5].

2. Select a Sweep Strategy For this type of discrete parameter optimization, an Exhaustive Sweep is the most straightforward strategy, as it evaluates every possible combination of the listed R² and window length values [13]. This ensures you do not miss the optimal configuration.
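A minimal sketch of the exhaustive grid, assuming the values listed in the table above: twenty R² thresholds and four window settings give 80 combinations. Using `None` to stand for the "infinite" window is a convention of this sketch only.

```python
from itertools import product

# R² thresholds from 0.05 to 1.00 in steps of 0.05 (rounded to avoid
# floating-point drift such as 0.15000000000000002).
r2_values = [round(0.05 * k, 2) for k in range(1, 21)]

# Window lengths in seconds; None marks the whole-recording window.
window_lengths = [1, 2, 4, None]

# Every (r2, window) pair is evaluated in an exhaustive sweep.
parameter_grid = list(product(r2_values, window_lengths))
```

Because each combination requires its own cleaning pass and ICA decomposition, the grid size is a direct estimate of the computational cost per participant.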

3. Prepare the EEG Data

Data Collection: Record high-density mobile EEG data during your task of interest (e.g., walking). Also, record data during a matched static condition (e.g., standing) for validation [2].
Basic Preprocessing: Apply a high-pass filter (e.g., 1 Hz cutoff) and average re-reference the data. Identify and remove any obviously bad channels with abnormally high amplitudes [5].

4. Execute the Parameter Sweep

Automate the Process: Use a script (e.g., in MATLAB or Python) to automatically run the iCanClean algorithm on your dataset with each unique combination of R² and window length.
Parallelize if Possible: To save time, distribute different parameter combination jobs across multiple CPU cores or machines [15].
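The automation step might look roughly like the following, where each parameter combination is handed to a worker and the results are collected in a dictionary keyed by configuration. `placeholder_worker` is hypothetical: in a real sweep it would call the cleaning routine, run ICA, and return quality metrics. A thread pool is used here only to keep the sketch portable; CPU-bound cleaning would more likely be distributed with a process pool or a cluster scheduler.

```python
from concurrent.futures import ThreadPoolExecutor

def run_sweep(configs, worker, max_workers=4):
    """Apply worker to every configuration and map config -> result."""
    configs = list(configs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs results correctly.
        results = list(pool.map(worker, configs))
    return dict(zip(configs, results))

def placeholder_worker(config):
    """Hypothetical stand-in for 'clean, decompose, and score'."""
    r2, window_sec = config
    # No real metrics are computed in this sketch.
    return {"r2": r2, "window_sec": window_sec, "n_good_ics": None}

sweep_results = run_sweep([(0.65, 4), (0.70, 4)], placeholder_worker)
```

Saving each result to disk as soon as it finishes makes it possible to resume an interrupted sweep without recomputing completed combinations.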

5. Evaluate the Results For each cleaned dataset, run an ICA decomposition and calculate the following quality metrics:

Number of "Good" Brain Components: The count of independent components that are both well-localized (dipole residual variance < 15%) and have a high probability of being brain (ICLabel probability > 50%) [5].
Power at Gait Frequency: The reduction in spectral power at the fundamental frequency of stepping and its harmonics [2].
ERP Integrity: The presence and amplitude of expected event-related potentials, such as the P300 in a Flanker task [2].
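The gait-frequency metric can be approximated as follows: estimate a periodogram for one channel, sum the power in narrow bands around the step frequency and its first few harmonics, and compare before and after cleaning. The band half-width, the number of harmonics, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def power_at_harmonics(signal, srate, step_hz, n_harmonics=3, half_width_hz=0.1):
    """Summed periodogram power near step_hz and its harmonics."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    freqs = np.fft.rfftfreq(x.size, d=1.0 / srate)
    psd = np.abs(np.fft.rfft(x)) ** 2 / (srate * x.size)
    total = 0.0
    for k in range(1, n_harmonics + 1):
        band = np.abs(freqs - k * step_hz) <= half_width_hz
        total += psd[band].sum()
    return total

def reduction_db(power_before, power_after):
    """Positive values mean power dropped after cleaning."""
    return 10.0 * np.log10(power_before / power_after)

# Synthetic example: broadband "brain" activity plus a 2 Hz stepping
# artifact with one harmonic; the "cleaned" signal lacks the artifact.
rng = np.random.default_rng(0)
srate = 250
t = np.arange(0, 20, 1.0 / srate)
brain = rng.standard_normal(t.size)
artifact = 3.0 * np.sin(2 * np.pi * 2.0 * t) + np.sin(2 * np.pi * 4.0 * t)

p_before = power_at_harmonics(brain + artifact, srate, step_hz=2.0)
p_after = power_at_harmonics(brain, srate, step_hz=2.0)
```

On real data the step frequency would come from the IMU or gait events rather than being fixed in advance.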

6. Identify the Optimal Configuration The optimal parameter set is the one that maximizes the number of "good" brain components while also successfully reducing gait-related power and preserving expected ERPs [5] [2].

iCanClean Parameter Sweep Workflow

The diagram below outlines the logical flow for designing and executing a parameter sweep for iCanClean.

Research Reagent Solutions

The following table details key materials and computational tools required for implementing the iCanClean parameter sweep.

Item	Function in the Experiment	Specification / Note
High-Density EEG System	Records scalp potentials containing mixtures of brain signal and noise.	Systems with 64+ channels are recommended for effective ICA [5].
Dual-Layer EEG Cap	Provides dedicated noise reference signals. Outer-layer electrodes are mechanically coupled to scalp electrodes but record only noise [5] [2].	Ideal for iCanClean. A 120+120 channel configuration was used in foundational studies [5].
Pseudo-Reference Signals	An alternative noise reference when a dual-layer cap is not available. Created by filtering the raw EEG to isolate noise bands [2].	For motion, a notch filter below 3 Hz can be used to create these signals [2].
Computing Environment	Runs the parameter sweep and iCanClean processing.	MATLAB with EEGLAB is commonly used. Parallel processing is recommended to reduce computation time [5] [15].
ICA Algorithm	Decomposes cleaned EEG into independent sources for quality assessment.	AMICA is recommended for high-quality decompositions in mobile settings [5].
ICLabel Classifier	Automatically classifies independent components as "brain", "muscle", "eye", etc.	A trained neural network used to quantify the number of "good" brain components post-cleaning [5].

Frequently Asked Questions

How does the number of noise channels affect iCanClean's performance? The number of reference noise channels affects cleaning efficacy, but performance degrades only gradually as channels are removed. With an r2 value of 0.65 and a 4-second window, the average number of "good" components was 12.7, 12.2, and 12.0 with 64, 32, and 16 noise channels, respectively, compared with 13.2 for the full 120-channel set [5].

What is the minimum number of noise channels required? While a dual-layer setup with 120 noise channels was used in the foundational research, subsequent testing showed that good performance could be maintained with as few as 16 noise channels [5]. The key is to adjust the r2 parameter to compensate for having fewer noise references.

Should I change the window length if I have fewer noise channels? The primary research indicates that the 4-second window length is optimal across different numbers of noise channels [5]. Your focus should be on tuning the r2 value, as it is the parameter most sensitive to changes in the noise channel setup.

Parameter Tuning Guide for Noise Channel Configurations

The following table summarizes the key experimental findings for different numbers of noise channels. The baseline performance before cleaning was 8.4 good components, established using a 120-channel dual-layer EEG setup [5].

Table 1: Performance of iCanClean with Different Noise Channel Setups

Number of Noise Channels	Number of "Good" ICs Retained After Cleaning	Performance Relative to Baseline (8.4 Good ICs)
64	12.7	+51% improvement
32	12.2	+45% improvement
16	12.0	+43% improvement

Note: The data above was obtained using the identified optimal parameters of a 4-second window length and an r2 value of 0.65 [5].
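The percentages in the table follow directly from the reported component counts. A short check, using only the values quoted in this guide:

```python
def percent_change(after, before):
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before

baseline_good_ics = 8.4
good_ics_by_noise_channels = {120: 13.2, 64: 12.7, 32: 12.2, 16: 12.0}

improvement = {
    n_channels: round(percent_change(n_good, baseline_good_ics))
    for n_channels, n_good in good_ics_by_noise_channels.items()
}
```

Going from 120 to 16 noise channels costs about 1.2 good components on average, which is small relative to the 4.8-component gain over the uncleaned baseline.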

Experimental Protocol for Determining Optimal r2

The r2 threshold is the most critical parameter to adjust when your noise channel setup changes. Here is a methodology for determining the optimal value for your specific system, based on established research practices [5].

Data Collection: Record a segment of mobile EEG data (e.g., during walking) using your specific experimental setup, including the total number of noise channels available.
Basic Preprocessing: Apply high-pass filtering (e.g., 1 Hz cutoff) and average re-referencing. Identify and remove any outlier channels with amplitudes greater than 3 times the median [5].
Parameter Sweep: Process the same dataset multiple times with iCanClean, sweeping the r2 parameter from a low value (e.g., 0.05) to 1.0 in increments of 0.05. Keep the window length fixed at 4 seconds [5].
ICA Decomposition: For each cleaned dataset, perform an Independent Component Analysis (ICA) using a preferred algorithm (e.g., AMICA, Infomax) [5].
Component Classification: Identify "good" independent components (ICs) based on two criteria [5]:
- Dipole Fit: The component must be well-localized by a single dipole (Residual Variance < 15%).
- Brain Probability: The component must have a high probability of being a brain source (ICLabel probability > 50%).
Optimal Parameter Selection: The optimal r2 value for your setup is the one that yields the highest number of "good" ICs. Using an overly aggressive (low) r2 will remove brain activity, while a too-conservative (high) r2 will leave noise in the data.
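The two classification criteria can be combined into a simple count. In the sketch below, residual variance and brain probability are expressed as fractions between 0 and 1; the function name and the example values are illustrative only and do not come from any dataset.

```python
def count_good_ics(residual_variances, brain_probabilities,
                   rv_max=0.15, brain_prob_min=0.50):
    """Count components with RV < 15% and brain probability > 50%."""
    return sum(
        1
        for rv, prob in zip(residual_variances, brain_probabilities)
        if rv < rv_max and prob > brain_prob_min
    )

# Made-up values for four components (not real results).
example_rv = [0.05, 0.30, 0.10, 0.14]
example_brain_prob = [0.90, 0.95, 0.40, 0.51]
n_good = count_good_ics(example_rv, example_brain_prob)
```

Only the first and last components pass both tests: the second is poorly localized and the third is unlikely to be brain activity.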

The Scientist's Toolkit

Table 2: Essential Materials for iCanClean Research

Item	Function in the Context of iCanClean
High-Density EEG System (64+ channels)	Records the mixture of brain signals and artifacts from the scalp. Essential for high-fidelity source separation with ICA [5].
Dual-Layer EEG Cap or Separate Noise Sensors	Provides the reference noise recordings. The outer layer of electrodes is mechanically coupled to the scalp electrodes but records primarily non-brain noise, which is crucial for iCanClean's operation [5] [11].
Portable EEG Amplifier	Enables data collection during mobile, whole-body movement tasks, which are the primary use-case for iCanClean [5].
Electrical Phantom Head	A validation tool containing embedded "brain" sources. It allows for quantitative testing of iCanClean's performance with known ground-truth signals, free from biological variability [11].

Workflow for Parameter Optimization

The following diagram illustrates the logical process for adapting iCanClean's core parameters to your specific hardware setup, particularly the number of noise channels.

Troubleshooting Guide

Problem	Possible Cause	Solution
Too few "good" brain components after cleaning.	`r2` value is too aggressive (too low), removing brain signals along with noise.	Increase the `r2` value to be more conservative (e.g., try 0.7 or 0.75) and re-run the analysis.
Excessive noise remains in the data after cleaning.	`r2` value is too conservative (too high), failing to remove enough noise.	Decrease the `r2` value to be more aggressive (e.g., try 0.6 or 0.55). Also, verify the quality of your noise channel recordings.
Cleaning performance is poor with a low number of noise channels.	The algorithm lacks sufficient noise reference information.	If possible, increase the number of noise channels. If not, you may need to accept a more aggressive `r2` setting, acknowledging a potential for minor loss of brain signal.
Inconsistent cleaning across the recording.	The default "infinite" window may not capture non-stationary noise well.	Ensure you are using a shorter, sliding window (e.g., 4 seconds) to adapt to changing noise conditions during movement [5].

Frequently Asked Questions

Q1: What are the primary iCanClean parameters I need to tune for mobile EEG studies? The two primary user-selectable parameters for the iCanClean algorithm are the window length and the r² cleaning aggressiveness threshold [5]. The window length determines the segment of data used to calculate correlations between cortical and noise electrodes. The r² threshold (ranging from 0 to 1) determines which correlated components are removed, with a lower value resulting in more aggressive cleaning [4] [5].

Q2: I am analyzing walking data. What are the recommended starting parameters? For EEG data corrupted by walking motion artifacts, research has identified an optimal window length of 4 seconds and an r² value of 0.65 [4] [5]. This configuration improved the average number of "good" independent components (ICs)—well-localized dipoles with high brain probability—from 8.4 to 13.2, a 57% increase [4].

Q3: Can I use iCanClean effectively with a reduced number of dedicated noise sensors? Yes, performance can be maintained with a reduced set of noise channels. One study found that using 64, 32, and 16 noise channels still yielded 12.7, 12.2, and 12.0 good components, respectively, demonstrating robust performance even with fewer reference channels [4].

Q4: How does iCanClean compare to other common artifact removal methods? iCanClean has been shown to consistently outperform other real-time-capable methods like Artifact Subspace Reconstruction (ASR), Auto-CCA, and Adaptive Filtering, regardless of the type or number of artifacts present [14]. In a phantom head study with all artifact types simultaneously present, iCanClean improved the Data Quality Score from 15.7% to 55.9%, whereas ASR, Auto-CCA, and Adaptive Filtering only improved it to 27.6%, 27.2%, and 32.9%, respectively [14].

Q5: Is preprocessing necessary before applying iCanClean? Basic preprocessing is recommended. A typical pipeline includes high-pass filtering (e.g., 1 Hz cutoff) and average re-referencing of both EEG and noise channels separately [5]. A basic channel rejection step to remove large-amplitude channels (e.g., those with amplitudes greater than 3 times the median) is also commonly applied before running iCanClean [5].
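A minimal sketch of the channel rejection and re-referencing steps is shown below, with high-pass filtering omitted for brevity. Using the per-channel standard deviation as the "amplitude" measure, rejecting before re-referencing, and the function names are all assumptions of this sketch; the source only specifies the 3-times-median criterion and separate average references for EEG and noise channels.

```python
import numpy as np

def reject_high_amplitude_channels(data, factor=3.0):
    """Drop channels whose amplitude exceeds factor x the median.

    data is (n_channels, n_samples). Amplitude is taken to be the
    per-channel standard deviation (an assumption of this sketch).
    """
    amplitude = data.std(axis=1)
    keep = amplitude <= factor * np.median(amplitude)
    return data[keep], keep

def average_reference(data):
    """Subtract the instantaneous mean across channels."""
    return data - data.mean(axis=0, keepdims=True)

def preprocess(eeg, noise, factor=3.0):
    """Reject outliers, then re-reference EEG and noise separately."""
    eeg_kept, eeg_mask = reject_high_amplitude_channels(eeg, factor)
    noise_kept, noise_mask = reject_high_amplitude_channels(noise, factor)
    return (average_reference(eeg_kept), average_reference(noise_kept),
            eeg_mask, noise_mask)

# Synthetic example with one obviously bad EEG channel.
rng = np.random.default_rng(2)
eeg_demo = rng.standard_normal((6, 1000))
eeg_demo[2] *= 20.0
noise_demo = rng.standard_normal((4, 1000))
eeg_ref, noise_ref, eeg_mask, noise_mask = preprocess(eeg_demo, noise_demo)
```

With a dual-layer cap, it may also be sensible to drop the paired channel whenever one member of a scalp/noise pair is rejected; that bookkeeping is not shown here.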

Experimental Protocols for Parameter Optimization

The following methodology outlines how key parameters for iCanClean were established, providing a template for validating parameters in new experimental scenarios.

1. Data Collection Setup

EEG System: High-density, dual-layer EEG caps are used. A typical setup includes 120 scalp electrodes and 120 outward-facing noise electrodes that are mechanically coupled but electrically isolated [5].
Participant Task: Data is collected from participants performing the movement of interest. For walking protocols, this may involve walking on a treadmill at a fixed speed or over terrain of varying difficulty for an extended period (e.g., 48 minutes) [5].
Population: Studies can include different participant groups (e.g., young adults, high-functioning older adults, low-functioning older adults) to ensure robustness across populations [5].

2. Data Processing and Parameter Sweep

Preprocessing: EEG data is high-pass filtered (1 Hz cutoff) and average re-referenced. Noise channels are referenced to their own average. Outlier channels are rejected based on amplitude [5].
ICA Decomposition: The Infomax or AMICA algorithm in EEGLAB is used to decompose the preprocessed data into independent components (ICs) [5].
Component Classification: "Good" brain components are identified based on two primary criteria [4]:
- Dipole Fit: The component must be well-localized by a single dipole model (Residual Variance < 15%).
- Brain Probability: The component must have a high probability of being a brain source (ICLabel probability > 50%).
Parameter Testing: A sweep of iCanClean parameters is performed:
- Window Length: Test various durations (e.g., 1 s, 2 s, 4 s, and an infinite window using the entire dataset) [5].
- R² Threshold: Test values from 0.05 to 1.00 in increments of 0.05 [5].
Outcome Measure: The optimal parameter set is determined by identifying the combination that yields the highest number of "good" ICs after cleaning with iCanClean [4] [5].

Parameter Performance Across Conditions

The table below summarizes quantitative findings from published studies to guide parameter selection.

Use Case Scenario	Optimal Window Length	Optimal R² Value	Key Performance Outcome	Source
General Walking	4 seconds	0.65	Increased good ICs by 57% (from 8.4 to 13.2)	[4] [5]
Phantom Head (All Artifacts)	Not Specified	Not Specified	Data Quality Score improved from 15.7% to 55.9%	[14]

The Scientist's Toolkit: Essential Research Reagents and Materials

Item	Function in iCanClean Research
Dual-Layer EEG Cap	A cap with paired scalp and noise electrodes provides the reference noise recordings essential for the iCanClean algorithm to function [4] [5].
High-Density EEG System	Systems with 64+ channels are recommended for mobile brain imaging to provide adequate spatial resolution for source localization after ICA [5].
Motion Platform & Phantom Head	An electrically conductive phantom head with embedded sources provides known ground-truth brain signals, enabling quantitative validation of cleaning performance against motion, muscle, and other artifacts [14].
MATLAB & EEGLAB	A standard software environment for implementing custom iCanClean scripts, performing parameter sweeps, and conducting ICA with toolboxes like ICLabel [5].

Workflow: Parameter Tuning for Mobile EEG

This diagram illustrates the logical workflow for determining the optimal iCanClean parameters for a mobile EEG experiment.

Decision Logic for iCanClean Parameters

This diagram outlines the decision-making process for selecting and adjusting the key iCanClean parameters based on your data and research goals.

Advanced Troubleshooting: Balancing Artifact Removal and Brain Signal Preservation

A critical challenge in mobile electroencephalography (EEG) research is balancing effective artifact removal with the preservation of neural signals. Over-cleaning, often resulting from inappropriate parameter selection in algorithms like iCanClean, can lead to the accidental removal of brain components, compromising data integrity and experimental outcomes [4] [14]. This guide provides troubleshooting and solutions for this common issue within the context of parameter tuning for iCanClean.

Frequently Asked Questions

Q1: What are the primary indicators that I have over-cleaned my mobile EEG data using iCanClean? A significant drop in the number of brain components identified post-ICA is a key indicator. Brain components are typically characterized by a dipolar topography (Residual Variance < 15%) and a high brain probability (ICLabel > 50%) [4] [5]. If the count of such "good" components decreases substantially after cleaning, over-cleaning is likely. A reduction in the Data Quality Score, which measures the correlation between preserved signals and known brain sources, also suggests over-aggressive cleaning [14].

Q2: Which iCanClean parameters most directly influence cleaning aggressiveness and the risk of brain signal loss? The two most critical parameters are the R² correlation threshold and the window length [4] [5].

R² Threshold: This value (range 0-1) sets the correlation level at which data subspaces are removed. A lower R² value (e.g., 0.2) results in more aggressive cleaning and a higher risk of removing brain activity. A higher R² value (e.g., 0.8) is more conservative and preserves more data [4] [14].
Window Length: This defines the temporal segment used to calculate correlations. Shorter windows (e.g., 1-second) adapt quickly to changing noise but may be overly sensitive. The "infinite" window uses the entire dataset, which can be less adaptive to non-stationary noise [5].
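To make the roles of the two parameters concrete, the sketch below implements a simplified CCA-based cleaner: within each window it computes canonical correlations between EEG and noise channels, and regresses out of the EEG every noise-side canonical variate whose squared correlation exceeds the R² threshold. This is a teaching sketch under stated assumptions (non-overlapping windows, full-rank data, more samples than channels per window, removal of noise-side variates), not the iCanClean plugin's implementation, and all function names are hypothetical.

```python
import numpy as np

def canonical_r2(X, Y):
    """Squared canonical correlations and noise-side variates.

    X (EEG) and Y (noise) are (n_samples, n_channels). The singular
    values of Qx^T Qy, with Qx and Qy orthonormal bases of the
    centered data, are the canonical correlations.
    """
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    _, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    return np.clip(s, 0.0, 1.0) ** 2, Qy @ Vt.T

def clean_window(eeg, noise, r2_threshold):
    """Regress out variates whose R² exceeds the threshold."""
    r2, variates = canonical_r2(eeg, noise)
    bad = variates[:, r2 > r2_threshold]   # orthonormal columns
    return eeg - bad @ (bad.T @ eeg)

def clean_recording(eeg, noise, srate, r2_threshold=0.65, window_sec=4.0):
    """Clean window by window; window_sec=None uses the whole recording."""
    n_samples = eeg.shape[0]
    step = n_samples if window_sec is None else int(round(window_sec * srate))
    cleaned = np.empty(eeg.shape, dtype=float)
    for start in range(0, n_samples, step):
        stop = min(start + step, n_samples)
        cleaned[start:stop] = clean_window(eeg[start:stop], noise[start:stop],
                                           r2_threshold)
    return cleaned

# Synthetic example: four EEG channels share one large artifact that
# is also picked up by two noise channels.
rng = np.random.default_rng(0)
srate, n_samples = 250, 2000
artifact = rng.standard_normal((n_samples, 1))
brain = rng.standard_normal((n_samples, 4))
eeg_demo = brain + 5.0 * artifact @ np.ones((1, 4))
noise_demo = artifact @ np.ones((1, 2)) + 0.1 * rng.standard_normal((n_samples, 2))

cleaned_demo = clean_recording(eeg_demo, noise_demo, srate)
untouched_demo = clean_recording(eeg_demo, noise_demo, srate, r2_threshold=1.0)
```

Lowering `r2_threshold` marks more variates for removal, which is why low values clean more aggressively, while a threshold of 1.0 removes nothing. Shortening `window_sec` re-estimates the correlations more often at the cost of noisier estimates from fewer samples.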

Q3: What are the empirically optimized iCanClean parameter settings to avoid over-cleaning? Research on mobile EEG data collected during walking recommends an R² threshold of 0.65 and a window length of 4 seconds as optimal starting points [4]. This combination significantly improved the number of "good" brain components from 8.4 to 13.2 (a 57% increase) without evidence of over-cleaning [4] [5].

Troubleshooting Guide

Diagnosis: Identifying the Root Cause of Over-Cleaning

Profile Your Data: The optimal parameters can depend on your specific experimental conditions (e.g., level of participant movement, muscle engagement, and EEG cap type). Visually inspect your raw data to assess the amplitude and type of noise present [5].
Perform a Parameter Sweep: Systematically test a range of values for the R² threshold (e.g., from 0.05 to 1.0 in increments of 0.05) and window length (e.g., 1s, 2s, 4s, infinite) [4] [5].
Benchmark Against Ground Truth: For each parameter combination, calculate key output metrics. The following table summarizes expected outcomes from a well-tuned cleaning process versus an over-cleaned one [4] [14]:

Table 1: Diagnostic Metrics for Identifying Over-Cleaned Data

Metric	Well-Cleaned Data	Over-Cleaned Data	Measurement Method
Number of "Good" ICs	Increased (>50% improvement reported) [4]	Decreased or unchanged	ICA decomposition followed by ICLabel and dipole fitting [4]
Data Quality Score	Significantly improved [14]	Worsened or not improved	Correlation between cleaned EEG channels and ground-truth brain sources [14]
Spectral Content	Preserved brain oscillatory patterns (e.g., alpha, beta, theta) [16]	Attenuated or absent brain oscillations	Power spectral density analysis [16]

Solutions: Correcting and Preventing Over-Cleaning

Adopt a Conservative R² Threshold: Begin with a higher R² value (e.g., 0.7-0.8) and gradually decrease it until a satisfactory level of noise removal is achieved, stopping before a drop in brain components occurs [4].
Utilize a Moderate Window Length: A 4-second window offers a good balance between local noise adaptation and stable correlation estimates for walking data [4].
Validate with a Reduced Noise Reference Set: iCanClean maintains good performance even with fewer noise reference channels. If you are concerned about over-cleaning, you can validate your parameters using a subset of noise channels (e.g., 16 or 32). Performance should remain stable; a significant drop may indicate an overly aggressive setting [4].
Compare Against Alternative Methods: Use other cleaning algorithms as a benchmark. The following table compares iCanClean's performance with other common methods, highlighting its effectiveness when properly tuned [14]:

Table 2: Performance Comparison of EEG Cleaning Algorithms

Cleaning Method	Data Quality Score (All Artifacts Condition)	Requires Clean Calibration Data?	Computational Efficiency
Uncleaned Data	15.7% [14]	N/A	N/A
iCanClean (Optimal)	55.9% [14]	No	High [14] [17]
Artifact Subspace Reconstruction (ASR)	27.6% [14]	Yes [14]	Moderate [14]
Auto-CCA	27.2% [14]	No	High [14]
Adaptive Filtering	32.9% [14]	Requires reference noise signals [14]	High [14]

Experimental Protocols for Parameter Tuning

Workflow for Optimizing iCanClean Parameters

The following diagram illustrates the systematic workflow for diagnosing over-cleaning and identifying optimal R² and window length parameters.

Protocol: Quantitative Assessment of Cleaning Fidelity

Aim: To quantitatively evaluate iCanClean parameters and prevent over-cleaning by comparing cleaned data to ground-truth brain signals or established component quality metrics [4] [14].

Materials:

Raw mobile EEG dataset with reference noise recordings (e.g., from a dual-layer EEG cap).
Computing environment with iCanClean algorithm and ICA software (e.g., EEGLAB).

Methodology:

Preprocessing: High-pass filter the data (e.g., 1 Hz cutoff) and perform average re-referencing. Reject outlier channels with amplitudes greater than 3 times the median [5].
Parameter Sweep: Run the iCanClean algorithm on your dataset, systematically varying the R² threshold and window length as described in the workflow above.
Source Decomposition: For each cleaned dataset, perform ICA (e.g., using AMICA algorithm) to decompose the data into independent components (ICs) [4] [5].
Component Classification: Classify each IC using the following criteria to mark it as a "good" brain component [4]:
- Dipole Fit: Residual Variance (RV) of less than 15%.
- Brain Probability: ICLabel probability of greater than 50% for the "brain" category.
Data Quality Calculation: If ground-truth brain sources are available (e.g., from a phantom head), calculate the Data Quality Score as the average correlation between the brain sources and the cleaned EEG channels [14].
Analysis: Plot the number of "good" components and the Data Quality Score against the R² threshold for each window length. The optimal parameters are those that maximize these metrics before a decline is observed.
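The channel-level preprocessing in the first step can be sketched in a few lines of NumPy. This is a minimal illustration, not the cited studies' code: the function names are hypothetical, "amplitude" is assumed to mean the per-channel standard deviation, and the 1 Hz high-pass filter is omitted for brevity.

```python
import numpy as np

def average_rereference(eeg):
    """Subtract the across-channel mean at every sample. eeg: (n_channels, n_samples)."""
    return eeg - eeg.mean(axis=0, keepdims=True)

def reject_outlier_channels(eeg, factor=3.0):
    """Drop channels whose amplitude exceeds `factor` times the median amplitude.

    Assumption: amplitude is summarized by the per-channel standard deviation.
    Returns the retained data and the indices of the retained channels.
    """
    amplitude = eeg.std(axis=1)
    keep = amplitude <= factor * np.median(amplitude)
    return eeg[keep], np.flatnonzero(keep)

# Synthetic example: 8 channels of unit-variance noise, one of them 20x too large.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((8, 1000))
eeg[5] *= 20.0

reref = average_rereference(eeg)
kept, kept_idx = reject_outlier_channels(reref)
print("retained channels:", kept_idx.tolist())
```

In practice the rejected channels would also be excluded from the noise-reference layer before running iCanClean, so that EEG and noise channels stay paired.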

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Materials and Tools for iCanClean Research

Item	Function / Description	Example Use in Protocol
Dual-Layer EEG Cap	A cap with inner (scalp) and outer (noise) electrode layers. Provides reference noise recordings mechanically coupled to the EEG sensors [5] [14].	Provides the reference noise signals required for the iCanClean algorithm to separate motion artifacts from brain activity [5].
iCanClean Algorithm	A novel cleaning algorithm that uses Canonical Correlation Analysis (CCA) and reference noise to remove artifact subspaces from EEG data [14] [17].	The core method for removing motion, muscle, eye, and line-noise artifacts while preserving brain activity prior to ICA [4].
ICLabel	A convolutional neural network for automatically classifying independent components from ICA [4] [5].	Used to calculate the "brain probability" of each component, which is a key metric for identifying "good" components and diagnosing over-cleaning [4].
Dipolar Source Localization	A method for fitting an equivalent current dipole to the scalp topography of an independent component [4] [5].	Used to calculate the Residual Variance (RV), helping to identify components that are likely of brain origin (RV < 15%) [4].
Artifact Subspace Reconstruction (ASR)	An alternative cleaning method based on principal component analysis, useful for comparative benchmarking [14].	Serves as a performance benchmark when validating iCanClean's effectiveness and tuning parameters [14].

Troubleshooting Guide

Q1: My EEG data still shows significant motion artifacts after using iCanClean with default settings. What should I do?

This is a classic symptom of under-cleaning, often caused by suboptimal parameter selection. The iCanClean algorithm has two primary parameters that directly impact cleaning aggressiveness: the r² correlation threshold and the window length.

Diagnosis: Residual motion artifacts typically indicate that your r² threshold is set too high (too conservative) or your window length is too short to capture the complete structure of the motion artifacts.

Recommended Solution: Based on systematic parameter sweeps, adjust your parameters as follows:

Decrease your r² threshold to increase cleaning aggressiveness. The optimal value identified for walking motion artifacts is r² = 0.65 [4] [7] [5].
Increase your window length to better model artifacts. A 4-second window was found to be optimal for gait-related motions, outperforming shorter (1s, 2s) and infinite-length windows [4] [5].

Table 1: Optimal iCanClean Parameters for Motion Artifact Removal

Parameter	Suboptimal Setting (Causes Under-Cleaning)	Recommended Optimal Setting	Impact of Adjustment
r² Threshold	High (e.g., > 0.8)	0.65 [4] [5]	Increases cleaning aggressiveness by removing more correlated noise subspaces.
Window Length	Short (e.g., 1s or 2s)	4 seconds [4] [5]	Allows the algorithm to better capture the temporal structure of motion artifacts like those from walking.
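To make the roles of the two parameters concrete, the sketch below applies a windowed canonical correlation analysis (CCA) to synthetic data and removes every EEG-side canonical component whose squared correlation with the noise reference exceeds the R² threshold. It is a simplified illustration under stated assumptions, not the published iCanClean implementation: the function names, the non-overlapping windows, and the choice to project out the EEG-side canonical variates are all assumptions made here.

```python
import numpy as np

def cca_clean_window(X, Y, r2_threshold):
    """Remove EEG subspaces correlated with the noise reference in one window.

    X: (n_samples, n_eeg) EEG data, Y: (n_samples, n_noise) reference noise.
    Returns the cleaned window and the number of components removed.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)          # orthonormal basis of the EEG subspace
    Qy, _ = np.linalg.qr(Yc)          # orthonormal basis of the noise subspace
    U, s, _ = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    Zx = Qx @ U                       # EEG canonical variates (orthonormal columns)
    bad = Zx[:, s ** 2 > r2_threshold]  # s holds the canonical correlations
    return X - bad @ (bad.T @ Xc), int(bad.shape[1])

def windowed_clean(X, Y, fs, window_s=4.0, r2_threshold=0.65):
    """Apply cca_clean_window to consecutive, non-overlapping windows."""
    win = int(round(window_s * fs))
    out = np.array(X, dtype=float)
    removed = []
    for start in range(0, X.shape[0], win):
        stop = min(start + win, X.shape[0])
        if stop - start <= X.shape[1] + Y.shape[1]:
            continue  # too few samples for a stable correlation estimate
        out[start:stop], n_bad = cca_clean_window(X[start:stop], Y[start:stop], r2_threshold)
        removed.append(n_bad)
    return out, removed

def mean_abs_corr(A, B):
    """Mean absolute Pearson correlation between matching columns of A and B."""
    return float(np.mean([abs(np.corrcoef(A[:, i], B[:, i])[0, 1]) for i in range(A.shape[1])]))

# Synthetic data: 3 "brain" sources mixed into 8 channels plus one large shared artifact.
rng = np.random.default_rng(0)
fs, n = 250, 2000
t = np.arange(n) / fs
artifact = np.sin(2 * np.pi * 1.8 * t) + 0.5 * rng.standard_normal(n)
brain = rng.standard_normal((n, 3)) @ rng.standard_normal((3, 8))
X = brain + 10.0 * np.outer(artifact, rng.uniform(0.5, 1.5, 8)) + 0.05 * rng.standard_normal((n, 8))
Y = np.outer(artifact, rng.uniform(0.5, 1.5, 4)) + 0.1 * rng.standard_normal((n, 4))

cleaned, removed = windowed_clean(X, Y, fs, window_s=4.0, r2_threshold=0.65)
artifact_cols = np.outer(artifact, np.ones(8))
before = mean_abs_corr(X, artifact_cols)
after = mean_abs_corr(cleaned, artifact_cols)
brain_kept = mean_abs_corr(cleaned, brain)
print(f"artifact correlation: {before:.2f} -> {after:.2f}; brain correlation kept: {brain_kept:.2f}")
```

Lowering `r2_threshold` removes more components per window (more aggressive cleaning), while `window_s` controls how much data each correlation estimate is based on.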

Q2: What experimental evidence supports these parameter recommendations?

The recommended parameters are not theoretical; they are derived from controlled, empirical studies. The key validation experiments are summarized below.

Experimental Protocol 1: Validation on Human Mobile EEG Data

Objective: To determine iCanClean's optimal parameters for improving ICA decomposition of EEG data corrupted by walking motion artifacts [4] [5].
Methodology:
- Participants: 45 adults across three cohorts (young, high-functioning older, low-functioning older) [7] [5].
- Setup: High-density dual-layer EEG (120 scalp electrodes + 120 noise electrodes) recorded during treadmill walking [7] [5].
- Processing: Data was processed with a parameter sweep (window length: 1s, 2s, 4s, infinite; r²: 0.05 to 1.0). Cleaned data was decomposed with ICA [7] [5].
- Outcome Measure: The number of "good" independent components (ICs) defined by dipolar localization (residual variance < 15%) and high brain probability (ICLabel > 50%) [4] [5].
Key Result: Using the optimal parameters (4s window, r²=0.65), the average number of "good" brain components increased by 57%, from 8.4 to 13.2 [4] [5].

Experimental Protocol 2: Validation on a Phantom Head with Ground Truth

Objective: To quantitatively compare iCanClean's performance against other artifact removal methods using known ground-truth brain signals [11] [14].
Methodology:
- Setup: An electrical phantom head with 10 embedded brain sources and 10 contaminating artifact sources (motion, muscle, eye, line-noise) [11] [14].
- Conditions: Data was collected under multiple scenarios, including a "Brain + All Artifacts" condition [11] [14].
- Metric: A Data Quality Score (0-100%) was calculated based on the average correlation between the known brain sources and the cleaned EEG channels [11] [14].
Key Result: In the challenging "Brain + All Artifacts" condition, iCanClean dramatically improved the Data Quality Score from 15.7% (uncleaned) to 55.9%. This performance significantly exceeded that of Artifact Subspace Reconstruction (ASR), Auto-CCA, and Adaptive Filtering [11] [14].
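A correlation-based score of this kind can be sketched as follows. The exact aggregation used in the cited phantom study is not spelled out here, so this version makes a loudly stated assumption: for each ground-truth source it takes the best-matching channel (highest absolute Pearson correlation) and then averages across sources. The function name is hypothetical.

```python
import numpy as np

def data_quality_score(sources, channels):
    """Score in percent (0-100) relating ground-truth sources to EEG channels.

    sources:  (n_samples, n_sources) known brain signals.
    channels: (n_samples, n_channels) recorded or cleaned EEG.
    Assumption: per source, use the maximum |Pearson r| over channels, then average.
    """
    S = (sources - sources.mean(axis=0)) / sources.std(axis=0)
    C = (channels - channels.mean(axis=0)) / channels.std(axis=0)
    corr = (S.T @ C) / sources.shape[0]        # (n_sources, n_channels) Pearson r
    return 100.0 * float(np.mean(np.max(np.abs(corr), axis=1)))

# Synthetic comparison: the same mixture with and without a large shared artifact.
rng = np.random.default_rng(1)
sources = rng.standard_normal((1000, 3))
clean_eeg = sources @ rng.standard_normal((3, 6)) + 0.1 * rng.standard_normal((1000, 6))
artifact = rng.standard_normal(1000)
noisy_eeg = clean_eeg + 5.0 * np.outer(artifact, rng.uniform(0.5, 1.5, 6))

score_clean = data_quality_score(sources, clean_eeg)
score_noisy = data_quality_score(sources, noisy_eeg)
print(f"clean: {score_clean:.1f}%  contaminated: {score_noisy:.1f}%")
```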

Q3: How does iCanClean's performance compare to other common cleaning methods?

iCanClean has been shown to consistently outperform other real-time-capable methods, especially when multiple artifact types are present simultaneously. The following table summarizes a quantitative comparison from a phantom head study.

Table 2: Performance Comparison of Artifact Removal Methods on Phantom EEG Data ("Brain + All Artifacts" Condition)

Cleaning Method	Data Quality Score After Cleaning	Key Requirement / Limitation
Uncleaned Data	15.7% [11] [14]	Baseline for comparison.
iCanClean	55.9% [11] [14]	Requires reference noise signals (e.g., from a dual-layer cap).
Adaptive Filtering	32.9% [11] [14]	Assumes noise projects identically to EEG and noise sensors; can struggle with motion artifacts [11] [14].
Artifact Subspace Reconstruction (ASR)	27.6% [11] [14]	Requires clean calibration data [11] [14].
Auto-CCA	27.2% [11] [14]	Risks removing brain activity, which is also low-frequency/high-correlation [11] [14].

Frequently Asked Questions (FAQs)

Q4: Can I use iCanClean effectively with a standard EEG system, or do I need a special setup?

For the best performance, iCanClean is designed to work with reference noise recordings. The most effective setup is a dual-layer EEG cap, where outward-facing noise electrodes are mechanically coupled to the scalp electrodes to provide ideal noise references [7] [5]. However, the algorithm can still function with reduced noise channels. Research shows that good cleaning performance can be maintained with 64, 32, or even 16 properly spaced noise channels [4] [5].

Q5: What is the logical workflow for diagnosing and resolving under-cleaning?

The following diagram illustrates the step-by-step troubleshooting process for addressing residual motion artifacts in your data.

Parameter Optimization Workflow

The Scientist's Toolkit: Essential Research Reagents & Materials

Table 3: Key Materials and Software for iCanClean Experimentation

Item	Function / Relevance	Example from Literature
Dual-Layer EEG Cap	Provides dedicated reference noise electrodes that are mechanically coupled to scalp electrodes to record artifact signals without brain activity. Essential for optimal performance [7] [5].	120 scalp electrodes + 120 outward-facing noise electrodes [7] [5].
High-Density EEG System	Enables sufficient spatial sampling for effective source separation via ICA after cleaning with iCanClean.	Systems with 64+ channels; studies used 120-channel setups [4] [5].
Mobile/Portable EEG Amplifier	Allows for data collection during whole-body movement, which is the primary scenario where motion artifacts are encountered.	Used for recordings during treadmill walking [4] [7].
Signal Processing Software with EEGLAB	Provides the environment for implementing preprocessing, running iCanClean, and performing subsequent ICA and ICLabel analysis.	Custom MATLAB scripts with EEGLAB toolbox [7] [5].
ICLabel Classifier	A machine learning-based tool for automatically classifying independent components (ICs) after ICA. Used to quantify "brain" vs. "non-brain" components [4] [5].	Components with >50% brain probability classified as "good" [4] [5].

Troubleshooting Guides

Guide 1: Resolving Polarity Indeterminacy in Group-Level ICA

Problem Statement: After running ICA, my group-level Independent Component (IC) scalp topographies are inconsistent, and the Event-Related Potential (ERP) amplitudes appear weakened, suggesting polarity cancellation issues within IC clusters.

Background: A fundamental property of ICA is polarity indeterminacy [18]. This means that for any given independent component, the sign of the scalp topography and its time course (e.g., an ERP) can be arbitrarily flipped without affecting the decomposition's validity. When performing group-level analysis and clustering ICs from multiple participants, this can lead to components within the same cluster having mixed polarities. When averaged, these opposing polarities cancel each other out, reducing the amplitude and sensitivity of the resulting ERP and potentially obscuring true effects [18].

Solution: Implement a polarity alignment method during the group-level clustering stage. The default method in toolboxes like EEGLAB is often Iterative Correlation Maximization, which aligns polarities based on the scalp topographies. However, for studies prioritizing ERP fidelity, the Covariance Maximization method is recommended.

Methodology:

Covariance Maximization Workflow: This method determines the polarity of each IC from the sign of its entry in the eigenvector associated with the largest eigenvalue of the covariance matrix computed across the ICs in a cluster [18].
Application to ERPs: Apply the covariance maximization method directly to the IC ERPs, not just the scalp topographies. A published study demonstrated that this approach increased the number of IC clusters showing significant ERP amplitudes from 5 out of 9 to all 9 clusters, thereby minimizing within-cluster ERP amplitude cancellation and maximizing sensitivity [18].
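One way to read covariance maximization is as a search for the sign pattern that maximizes the summed covariance between IC ERPs in a cluster, which can be approximated by the signs of the leading eigenvector of the between-IC covariance matrix. The sketch below follows that reading; it is an assumption-laden illustration, the function name is hypothetical, and the cited study's exact procedure may differ.

```python
import numpy as np

def align_polarity(erps):
    """Flip IC ERPs so that their polarities agree within a cluster.

    erps: (n_ics, n_times) ERPs of the ICs in one cluster.
    Returns the aligned ERPs and the sign (+1 or -1) applied to each IC.
    """
    centered = erps - erps.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / erps.shape[1]   # between-IC covariance matrix
    _, eigvecs = np.linalg.eigh(cov)              # eigenvalues in ascending order
    lead = eigvecs[:, -1]                         # eigenvector of the largest eigenvalue
    signs = np.where(lead >= 0, 1.0, -1.0)
    if signs.sum() < 0:                           # convention: flip the minority of ICs
        signs = -signs
    return erps * signs[:, None], signs

# Synthetic cluster: one ERP template with arbitrary polarity per IC plus noise.
rng = np.random.default_rng(2)
times = np.linspace(0.0, 1.0, 200)
template = np.exp(-0.5 * ((times - 0.3) / 0.05) ** 2)
true_signs = np.array([1.0, -1.0, 1.0, -1.0, -1.0, 1.0])
erps = true_signs[:, None] * template + 0.05 * rng.standard_normal((6, 200))

aligned, signs = align_polarity(erps)
print("cluster-average peak before:", round(float(np.abs(erps.mean(axis=0)).max()), 2))
print("cluster-average peak after: ", round(float(np.abs(aligned.mean(axis=0)).max()), 2))
```

The before/after comparison shows the cancellation effect directly: with mixed polarities the cluster average is close to zero, while after alignment it approaches the template amplitude.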

Decision Flowchart: The following diagram illustrates the logical process for diagnosing and resolving polarity-related issues in your ICA results.

Guide 2: Balancing Dipolarity and ERP Fidelity When Tuning iCanClean's R² Threshold and Window Size

Problem Statement: When tuning the window size over which iCanClean evaluates its R² threshold, I face a trade-off. A smaller window improves the preservation of ERP fidelity but may leave more non-dipolar artifacts. A larger window enhances dipolarity (a key marker of a neural source) but risks distorting or smearing genuine neural signals in the ERP.

Background: In iCanClean, R² is the correlation threshold that decides which noise-correlated subspaces are removed; it is not itself a measure of dipolarity. Dipolarity is assessed after ICA: a component is considered dipolar when its scalp topography is consistent with a single neural generator within the brain [19]. The window size over which the R² criterion is evaluated is nonetheless critical. Your research must decide which outcome to prioritize: the quality of the source separation (dipolarity) or the fidelity of the time-domain signal (ERP).

Solution: The optimal window size is experiment-dependent. There is no universal value. The choice should be guided by the primary research question and the subsequent analysis plans.

Methodology for Parameter Optimization:

Define Your Objective: Clearly state the goal of your analysis. Is it to (a) identify and study specific neural generators, or (b) to analyze the precise timing and amplitude of ERPs?
Systematic Parameter Sweep: Run iCanClean on a representative subset of your data (e.g., 5-10 participants) using a range of window sizes (e.g., from small to large).
Quantitative Comparison: For each resulting dataset, calculate the following metrics and compile them in a table for direct comparison (see table below).
Final Selection: Choose the window size that offers the best compromise, favoring the metric most critical to your research question.

Quantitative Comparison Table: The table below summarizes the trade-offs and provides a framework for evaluating different window size parameters.

Window Size	IC Dipolarity	ERP Peak Amplitude (e.g., P3, µV)	ERP Peak Latency (e.g., P3, ms)	Residual Noise Level	Recommended Use Case
Small (e.g., 100 ms)	Lower	Higher Fidelity	More Accurate	Higher	Primary: ERP Analysis. Use when precise timing/amplitude of cognitive events is critical.
Medium (e.g., 250 ms)	Moderate	Moderate	Slightly Shifted	Moderate	Balanced Approach. Suitable for studies investigating both sources and ERPs.
Large (e.g., 500 ms)	Higher	Reduced/Smeared	Delayed/Blurred	Lower	Primary: Source Analysis. Use when localization and dipolarity are the main goals.

Frequently Asked Questions (FAQs)

FAQ 1: What is the single most important factor for ensuring reliable ICA results in individual differences research?

Answer: The psychometric reliability of your EEG measures is the most critical factor. High internal consistency and test-retest reliability are prerequisites for any analysis seeking to differentiate individuals [19]. Regardless of how well you optimize ICA parameters, if the underlying EEG measures (e.g., power, ERP amplitudes) are not stable and reliable over time, your study will lack the power to detect meaningful correlations with behavior or traits. It is essential to consult reliability profiles for your specific EEG measures (e.g., power, ERPs, functional connectivity) and employ denoising techniques and data quality metrics to improve the reliability of your individual differences analyses [19].

FAQ 2: My primary goal is to enhance the signal-to-noise ratio of my ERPs for drug development studies. Should I prioritize ICA dipolarity?

Answer: Not primarily. In the context of drug development, where detecting a subtle change in a cognitive ERP (like the P3) between a drug and placebo group is often the goal, you should prioritize ERP fidelity [20]. While ensuring a baseline level of data cleanliness via dipolarity is good practice, an over-emphasis on maximizing dipolarity with large window sizes can smear and distort the temporal dynamics of the ERP. This distortion can mask the very drug effects you are trying to measure. Your optimization strategy should favor parameters that best preserve the timing and amplitude of your ERP components of interest.

FAQ 3: How can I formally optimize multiple ICA and post-processing parameters at once?

Answer: For complex optimization problems involving multiple interacting parameters, we recommend using a structured Design of Experiments (DoE) approach [21]. The general workflow is:

Identify Factors: List the parameters you want to optimize (e.g., iCanClean window size, R² threshold, number of ICs).
Choose a Design: Select an experimental design (e.g., full factorial, response surface) that efficiently explores the parameter space.
Generate Design Matrix: This specifies the different parameter combinations to test.
Run Experiments: Process your data with each parameter combination.
Analyze Results: Fit a model to your results to understand how each parameter and its interactions affect your outcome (e.g., ERP signal-to-noise ratio). This method is far more efficient and informative than the traditional "one-factor-at-a-time" approach and is widely used in scientific optimization [21].
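The design-matrix step can be sketched with the Python standard library. The example below builds a full factorial design; the factor names and levels are hypothetical placeholders chosen for illustration, not recommended values.

```python
from itertools import product

def full_factorial(factors):
    """Return one dict per run, covering every combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*(factors[n] for n in names))]

# Hypothetical factors and levels for illustration only.
factors = {
    "window_s": [1, 2, 4],
    "r2_threshold": [0.5, 0.65, 0.8],
    "n_ics": [32, 64],
}

design = full_factorial(factors)
print(len(design), "runs; first run:", design[0])
```

A full factorial grows multiplicatively with the number of levels, which is why fractional or response-surface designs become attractive once more than a few factors are involved.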

The Scientist's Toolkit: Research Reagent Solutions

Item Name	Function / Explanation
Human Neocortical Neurosolver (HNN)	A biophysical modeling software used to simulate the cellular and circuit-level mechanisms that generate scalp-recorded EEG/ERP signals. It helps generate testable predictions for interpreting ICA components and ERPs [20].
EEGLAB Toolbox	A foundational open-source MATLAB toolbox for processing EEG data. It provides the core environment and functions for performing ICA, component clustering, and calculating ERPs [20].
iCanClean Plugin	An EEGLAB plugin that denoises EEG data by using canonical correlation analysis (CCA) to remove subspaces correlated with reference noise. Its two key parameters, the R² threshold and the window size, are central to the trade-off between dipolarity and ERP fidelity.
Design of Experiments (DoE) Software	Software that assists in designing efficient parameter optimization studies. It helps systematically vary multiple parameters (like window size and threshold) to find the optimal combination for a desired outcome [21].
Covariance Maximization Script	A custom or toolbox-integrated script for resolving polarity indeterminacy in group-level ICA analyses. It is essential for maximizing the sensitivity of IC-clustered ERPs [18].

Leveraging ICLabel and Residual Variance for Objective Parameter Validation

Frequently Asked Questions (FAQs)

Q1: What is ICLabel, and why is it a superior tool for classifying Independent Components (ICs) in EEG research? ICLabel is an automated, publicly available IC classifier for EEG data that assigns components to source categories like Brain, Muscle, Eye, and Line Noise [22]. It improves upon prior methods by offering enhanced computational efficiency and label accuracy, performing comparably to or better than other public classifiers while computing labels ten times faster [22]. This speed and accuracy are crucial for parameter validation in methods like iCanClean, where rapid, objective assessment of signal sources is needed.

Q2: What does Residual Variance (RV) measure, and how should I interpret its value for a component? Residual Variance (RV) is a measure of how well an IC's scalp topography can be fit by an equivalent current dipole (ECD) model [23]. A lower RV indicates a more "dipolar" component, which is often characteristic of a true brain source. The ICLabel tutorial notes that while a two-dipole fit will almost always yield a lower RV than a one-dipole fit, most components are not perfectly dipolar, and of those that are, the majority require only one dipole [23]. Therefore, RV is one piece of evidence to be weighed alongside other features like the power spectrum and time series.

Q3: My ICA decomposition seems poor, with many non-brain components having high power. What preprocessing steps are critical? A key preprocessing step for a stable ICA decomposition is high-pass filtering the data. It is recommended to apply a high-pass filter with a 1-Hz pass-band edge (equivalent to a 0.5 Hz cutoff) before running ICA [24]. This removes slow baseline drift, which can bias ICA toward high-amplitude, low-frequency artifacts, allowing the algorithm to better isolate brain signals in the 3-13 Hz range [24].

Q4: How can ICLabel and RV be used together to objectively validate cleaning parameters? ICLabel provides a probabilistic classification of a component's origin. RV quantitatively measures its fit to a generative brain source model. Used in tandem, they offer a multi-faceted validation metric. For instance, when tuning a parameter like the window size in iCanClean, you can objectively track its effect on the resulting ICs. A successful parameter set should yield a higher proportion of components labeled "Brain" by ICLabel and, for those brain components, a lower average RV, indicating more physiologically plausible sources.

Troubleshooting Guides

Issue 1: Poor ICA Decomposition

Observed Problem	Potential Causes	Solutions and Validation Checks
High proportion of components classified as Muscle, Eye, or Channel Noise by ICLabel [22].	Insufficient data length or data quality for ICA. Inadequate high-pass filtering. Noisy or bad channels included in the ICA computation.	Ensure sufficient clean data is available (e.g., ≥ 30 min for high-density mobile EEG) [11]. Apply a 1-Hz high-pass filter before ICA [24]. Identify and remove bad channels before running ICA.

Issue 2: Ambiguous or Counterintuitive ICLabel Results

Observed Problem	Potential Causes	Solutions and Validation Checks
A component with a high "Brain" probability from ICLabel has a high Residual Variance (RV).	The underlying source may not be well-modeled by a single equivalent dipole. The component may represent a valid but non-dipolar brain source.	Do not rely on a single metric. Examine the component's power spectrum and time series for brain-like characteristics [23]. Consider the two-dipole RV; a much lower value may indicate a bilateral source.

Issue 3: High Residual Variance on Theoretically "Good" Brain Components

Observed Problem	Potential Causes	Solutions and Validation Checks
Components that appear neurogenic based on topography and spectrum have unexpectedly high RV.	Forward head model inaccuracies. Imperfections in the ICA decomposition itself. The brain source is genuinely non-dipolar.	Consult the ICLabel tutorial, which cautions that "most components are not dipolar" [23]. Use RV as a relative measure for comparison between parameter sets (e.g., different iCanClean windows), not as an absolute indicator of quality.

Experimental Protocols for Parameter Validation

Protocol: Validating iCanClean Window Size Using ICLabel and RV

Objective: To determine the optimal window size parameter for the iCanClean algorithm by objectively assessing the quality of resulting independent components.

Methodology:

Data Processing: Apply the iCanClean algorithm to a standardized, artifact-laden EEG dataset (e.g., from a phantom head with known ground-truth sources [11]) using a range of window size parameters (e.g., 1s, 2s, 5s).
ICA Decomposition: Perform ICA decomposition on the cleaned data from each parameter set using a standardized algorithm (e.g., Infomax) and settings.
Component Labeling: Process all resulting ICs through the ICLabel classifier to obtain probabilistic labels (Brain, Muscle, Eye, etc.) [22].
Residual Variance Calculation: Calculate the RV for the one-dipole and two-dipole models for each IC [23].
Quantitative Analysis: For each parameter set, calculate the following metrics:
- Percentage of components classified as "Brain".
- Average RV for components with a "Brain" probability > 80%.
- Data Quality Score, calculated as the average correlation between known brain sources and EEG channels in a phantom setup [11].

Expected Outcome: The optimal window size will maximize the percentage of Brain components, minimize the average RV of those components, and achieve the highest Data Quality Score, indicating superior artifact removal and signal preservation.
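The first two metrics in the quantitative analysis can be computed from two per-component arrays: the ICLabel "Brain" probability and the residual variance. The sketch below is a minimal illustration with hypothetical names and made-up example values; it assumes a component counts as "Brain" when its brain probability exceeds 0.5, which is a simplification of ICLabel's multi-class output.

```python
import numpy as np

def summarize_components(brain_prob, rv, brain_cut=0.5, confident_cut=0.8):
    """Return (% of ICs labeled Brain, mean RV of ICs with brain probability > confident_cut).

    brain_prob: ICLabel brain probabilities in [0, 1], one per IC.
    rv: residual variance of the dipole fit as a fraction in [0, 1], one per IC.
    """
    brain_prob = np.asarray(brain_prob, dtype=float)
    rv = np.asarray(rv, dtype=float)
    pct_brain = 100.0 * float(np.mean(brain_prob > brain_cut))
    confident = brain_prob > confident_cut
    mean_rv = float(rv[confident].mean()) if confident.any() else float("nan")
    return pct_brain, mean_rv

# Made-up values for five ICs from one parameter set.
brain_prob = [0.90, 0.85, 0.60, 0.30, 0.10]
rv = [0.05, 0.10, 0.20, 0.40, 0.50]

pct_brain, mean_rv = summarize_components(brain_prob, rv)
print(f"{pct_brain:.0f}% Brain ICs, mean RV of confident Brain ICs = {mean_rv:.3f}")
```

Running this once per window size yields one row per parameter set, which can then be compared alongside the Data Quality Score.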

Signaling Pathways and Workflows

IC Validation Workflow

The Scientist's Toolkit: Essential Research Reagents & Materials

The following tools are critical for conducting and validating EEG artifact removal research.

Tool / Material	Function in Research	Application in Parameter Validation
ICLabel Classifier	An automated classifier to label Independent Components (ICs) from EEG data into categories like Brain, Muscle, and Eye [22].	Provides a primary, objective metric (% Brain components) for evaluating the output of iCanClean under different parameters.
iCanClean Algorithm	A novel framework for removing motion, muscle, eye, and line-noise artifacts from EEG in real-time capable scenarios [11].	The algorithm whose parameters (e.g., R² threshold and window size) are the subject of the tuning and validation process.
Phantom Head Apparatus	A physical model with embedded brain signal sources and contaminating artifact sources [11].	Provides ground-truth data with known brain signals, enabling quantitative calculation of a Data Quality Score to validate cleaning efficacy.
Residual Variance (RV)	A quantitative measure of how well an IC's scalp map is fit by an equivalent current dipole model [23].	Serves as a secondary, objective metric to assess the neurophysiological plausibility of components labeled as "Brain".
High-Pass Filter (1 Hz)	A preprocessing step to remove slow baseline drift from the EEG signal prior to ICA [24].	A critical, standardized step to ensure a stable and meaningful ICA decomposition, which is foundational for all subsequent validation.

Evidence and Comparison: How iCanClean Stacks Up Against Other Methods

Troubleshooting Guides

Guide 1: Addressing Poor Independent Component Analysis (ICA) Decomposition

Problem: After running ICA on mobile EEG data, you find too few brain components, or the components have poor dipolarity (high residual variance).

Explanation: Large motion artifacts can overwhelm the ICA algorithm, preventing it from effectively separating brain signals from noise [3] [5]. The goal of preprocessing is to reduce these large-amplitude artifacts to enable a successful decomposition.

Solutions:

First, try iCanClean. Studies show iCanClean is particularly effective at improving ICA. Use a 4-second window and an R² threshold of 0.65 for walking data. This combination has been shown to increase the average number of "good" (dipolar, brain-like) ICA components by 57% [5] [7].
If using ASR, avoid over-cleaning. A very low k parameter (below 10) can over-clean the data, removing neural signals and reducing component dipolarity. For locomotion data, a k value of 20-30 is often recommended [3].
Check your reference data. If using ASR, ensure the calibration data is as clean as possible. The algorithm's performance is dependent on its method for selecting this reference period [3].

Guide 2: Managing Residual Spectral Power at Gait Frequency

Problem: After cleaning, you notice a strong peak in the power spectrum at the step frequency and its harmonics, suggesting motion artifact remains.

Explanation: Motion artifacts from activities like running are often time-locked to the gait cycle, producing rhythmic, high-amplitude signals that can be difficult to fully remove [3].

Solutions:

Leverage pseudo-reference signals in iCanClean. If you do not have dual-layer EEG hardware, you can create pseudo-reference noise signals from your raw EEG data. Temporarily apply a notch filter (e.g., below 3 Hz) to identify the low-frequency noise subspaces most associated with motion, which iCanClean can then target for removal [3].
Compare methods quantitatively. Research shows that both ASR and iCanClean significantly reduce power at the gait frequency, but iCanClean may be somewhat more effective [3].
Consider the artifact type. For motion artifacts caused by cable sway, iCanClean has demonstrated superior performance in phantom head tests by specifically targeting these noise subspaces [11].
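Residual power at the step frequency can be quantified before and after cleaning with a simple periodogram. The sketch below sums spectral power in narrow bands around the step frequency and its harmonics; the function name, the Hann window, the band half-width, and the number of harmonics are all assumptions for illustration, and the result is only meaningful as a relative measure between datasets processed the same way.

```python
import numpy as np

def power_at_harmonics(x, fs, step_hz, n_harmonics=3, half_width_hz=0.2):
    """Sum periodogram power near step_hz and its harmonics (relative units)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    window = np.hanning(x.size)
    spectrum = np.abs(np.fft.rfft(x * window)) ** 2 / (fs * np.sum(window ** 2))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    df = freqs[1] - freqs[0]
    total = 0.0
    for k in range(1, n_harmonics + 1):
        band = np.abs(freqs - k * step_hz) <= half_width_hz
        total += float(spectrum[band].sum() * df)
    return total

# Synthetic example: background noise with and without a 1.8 Hz gait-locked artifact.
rng = np.random.default_rng(3)
fs = 250
t = np.arange(0, 20, 1.0 / fs)
background = rng.standard_normal(t.size)
contaminated = background + 5.0 * np.sin(2 * np.pi * 1.8 * t)

p_background = power_at_harmonics(background, fs, step_hz=1.8)
p_contaminated = power_at_harmonics(contaminated, fs, step_hz=1.8)
print(f"gait-band power ratio: {p_contaminated / p_background:.0f}x")
```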

Guide 3: Recovering Expected ERP Components During Movement

Problem: You are running a task during movement (e.g., a Flanker task while jogging) but cannot recover the expected ERP components, like the P300.

Explanation: Motion artifacts can obscure stimulus-locked neural responses. The cleaning method must be aggressive enough to remove noise without distorting or removing the underlying cognitive signal [3].

Solutions:

Use iCanClean for cognitive effects. In a direct comparison during a dynamic Flanker task, the expected P300 congruency effect (greater amplitude for incongruent stimuli) was only identified when preprocessing with iCanClean [3].
Validate with a static condition. Always run a matched, stationary version of your task (e.g., seated Flanker task). This provides a ground-truth ERP waveform to compare against the cleaned data from the dynamic condition [3].
Confirm latency alignment. After cleaning with ASR or iCanClean, the ERP components from the dynamic task should have a similar latency to those from the static task. This is a key indicator of successful cleaning [3].
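The latency check can be sketched as a peak search within a component window. The function name, the 250-500 ms search window, and the synthetic waveforms below are hypothetical; real ERPs would come from averaged epochs of the static and dynamic conditions.

```python
import numpy as np

def peak_latency_ms(erp, times_ms, window_ms=(250.0, 500.0)):
    """Latency of the largest positive deflection inside window_ms."""
    erp = np.asarray(erp, dtype=float)
    times_ms = np.asarray(times_ms, dtype=float)
    mask = (times_ms >= window_ms[0]) & (times_ms <= window_ms[1])
    return float(times_ms[mask][np.argmax(erp[mask])])

# Synthetic P300-like waveforms sampled every 4 ms (250 Hz).
times_ms = np.arange(-200.0, 800.0, 4.0)
static_erp = np.exp(-0.5 * ((times_ms - 352.0) / 50.0) ** 2)
dynamic_erp = 0.8 * np.exp(-0.5 * ((times_ms - 360.0) / 50.0) ** 2)

shift_ms = peak_latency_ms(dynamic_erp, times_ms) - peak_latency_ms(static_erp, times_ms)
print(f"dynamic minus static peak latency: {shift_ms:.0f} ms")
```

A small shift relative to the static condition is consistent with successful cleaning, whereas a large or inconsistent shift suggests residual artifact or signal distortion.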

Frequently Asked Questions (FAQs)

FAQ 1: I don't have a dual-layer EEG system. Can I still use iCanClean?

Answer: Yes. While iCanClean is most effective with true dual-layer noise sensors, it can be configured to use "pseudo-reference" noise signals derived from your existing EEG data. The algorithm creates these by temporarily filtering the data to isolate noise-dominated frequency bands (e.g., very low frequencies for motion artifacts), which are then used as the reference for cleaning [3] [11].

FAQ 2: How does iCanClean differ from traditional Adaptive Filtering?

Answer: They are fundamentally different. Adaptive Filtering assumes a simple, linear relationship between a reference noise signal and the corruption in the EEG signal. In contrast, iCanClean uses Canonical Correlation Analysis (CCA) to identify and remove entire subspaces of the EEG data that are correlated with noise subspaces. This allows it to handle more complex mixing scenarios and consistently outperforms Adaptive Filtering, especially for motion artifacts [11] [14].

FAQ 3: Which method is best for real-time processing?

Answer: iCanClean, ASR, and Adaptive Filtering are all capable of real-time implementation [11] [14]. ICA, however, is computationally intensive and generally not suitable for real-time applications due to slow decomposition times [11].

FAQ 4: Can Auto-CCA remove low-frequency motion artifacts?

Answer: Theoretically, yes, by rejecting high-correlation components. However, you must exercise caution because brain activity is also low-frequency and has high correlation, creating a risk of accidentally removing neural signals along with the artifact [11] [14].

Quantitative Performance Comparison

The table below summarizes key performance metrics from controlled studies, providing a basis for method selection.

Table 1: Performance Comparison of Artifact Removal Methods

Method	Data Quality Score (Brain + All Artifacts) [11]	Good ICA Components (Before/After) [5] [7]	Optimal Parameters	Key Strength
iCanClean	55.9% (from 15.7%)	13.2 (from 8.4)	R²=0.65, 4-s window [5]	Effectively removes multiple artifact types without calibrated data [11].
ASR	27.6% (from 15.7%)	Not reported	`k`=20-30 [3]	Good for burst-like artifacts; integrated into EEGLAB.
Auto-CCA	27.2% (from 15.7%)	Not reported	Not reported	Computationally efficient; no reference signals needed.
Adaptive Filtering	32.9% (from 15.7%)	Not reported	Not reported	Effective for simple, linear artifacts like eye blinks.

Table 2: Impact of iCanClean Noise Channels on Performance

Number of Noise Channels	Average Good ICA Components	Performance Note
120 (Full Set)	13.2	Baseline optimal performance [5] [7].
64	12.7	Good performance maintained [5] [7].
32	12.2	Moderate performance loss [5] [7].
16	12.0	Performance remains acceptable [5] [7].
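As a quick worked check of Table 2, the relative loss in good components for each reduced set of noise channels can be computed directly from the reported averages. The variable and function names below are illustrative only; the numbers are those listed in the table.

```python
# Average "good" ICA components reported for each noise-channel count (Table 2).
good_ics = {120: 13.2, 64: 12.7, 32: 12.2, 16: 12.0}

baseline = good_ics[120]


def relative_loss_percent(n_channels: int) -> float:
    """Percentage drop in good components relative to the full 120-channel set."""
    return 100.0 * (baseline - good_ics[n_channels]) / baseline


for n in sorted(good_ics, reverse=True):
    print(f"{n:>3} noise channels: {good_ics[n]:.1f} good ICs "
          f"({relative_loss_percent(n):.1f}% below baseline)")
```

On these figures, dropping from 120 to 16 noise channels costs roughly 9% of the good components.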

Experimental Protocols

Protocol 1: Parameter Tuning for iCanClean's R² and Window Length

This protocol systematically identifies the optimal R² threshold and window length for your specific dataset.

Methodology:

Basic Preprocessing: Begin by high-pass filtering the data at 1 Hz and applying average re-referencing. Identify and remove bad channels whose amplitude exceeds 3 times the median standard deviation across channels [5] [7].
Define Parameter Sweep:
- R² Threshold: Test values from 0.05 to 1.00 in increments of 0.05. A higher R² (near 1) is less aggressive, while a lower R² is more aggressive [5] [7].
- Window Length: Test 1-second, 2-second, and 4-second windows. A longer window considers more data for the correlation analysis. An "infinite" window using the entire dataset can also be tested [5] [7].
Run iCanClean: Process your dataset through the iCanClean algorithm for each combination of R² and window length.
Evaluate Output: For each cleaned dataset, run ICA (e.g., using AMICA). Then, quantify the number of "good" independent components. A "good" component is typically defined as one that is well-localized by a dipole model (Residual Variance < 15%) and is classified as "brain" by ICLabel with a probability >50% [5] [7].
Select Optima: The parameter set that yields the highest number of "good" components is optimal for your data. Research on walking data suggests an R² of 0.65 with a 4-second window is often ideal [5] [7].
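The sweep in the steps above can be sketched as a simple grid search. This is a minimal illustration, not the published pipeline: `score_fn` is a hypothetical callable that you would implement to run iCanClean, ICA, and component counting for one parameter pair, and `toy_score` is a made-up stand-in used only to exercise the loop.

```python
import itertools

import numpy as np

# Grid described in the protocol: R² from 0.05 to 1.00 in steps of 0.05,
# and window lengths of 1, 2, 4 s plus an "infinite" (whole-recording) window.
R2_VALUES = np.round(np.arange(1, 21) * 0.05, 2)
WINDOW_LENGTHS_S = [1.0, 2.0, 4.0, float("inf")]


def sweep(score_fn):
    """Evaluate score_fn(r2, window_s) on the full grid and return the best pair.

    score_fn is a hypothetical user-supplied callable expected to return the
    number of "good" components obtained with the given settings.
    """
    results = {
        (float(r2), win): score_fn(float(r2), win)
        for r2, win in itertools.product(R2_VALUES, WINDOW_LENGTHS_S)
    }
    best = max(results, key=results.get)
    return best, results


def toy_score(r2, window_s):
    """Made-up scoring function (not real data) that peaks at (0.65, 4 s)."""
    window_penalty = 0.0 if window_s == 4.0 else 1.0
    return -abs(r2 - 0.65) - window_penalty


best_setting, all_scores = sweep(toy_score)
print(best_setting, len(all_scores))
```

The grid contains 20 R² values and 4 window lengths, so a full sweep requires 80 cleaning-plus-ICA runs per dataset; in practice a coarser grid may be a reasonable first pass.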

Protocol 2: Phantom Head Validation for Motion Artifact Removal

Using a phantom head with known ground-truth brain signals provides the most rigorous validation of any cleaning method.

Methodology:

Setup: Use an electrically conductive phantom head apparatus with embedded "brain" sources (antennae). Contaminate the EEG recording with controlled artifacts: motion (via cable sway), muscle, and eye artifacts [11].
Data Collection: Record EEG data under several conditions: "Brain" only, and "Brain + All Artifacts" [11].
Apply Methods: Clean the contaminated data using iCanClean, ASR, Auto-CCA, and Adaptive Filtering, running a parameter sweep for each.
Quantitative Analysis: Calculate a Data Quality Score for each result. This score is the average squared correlation (R²) between the known ground-truth brain source signals and the cleaned EEG channels. A higher score indicates better preservation of brain signal and removal of artifacts [11].
Conclusion: The method that achieves the highest Data Quality Score, especially in the challenging "All Artifacts" condition, is the most effective. Phantom studies have shown iCanClean achieves a score near the theoretical maximum (e.g., 55.9% vs. 57.2% for clean "Brain" data), significantly outperforming other methods [11].

Method Workflow Visualization

Algorithm Selection Workflow

The Scientist's Toolkit: Essential Research Reagents & Materials

Table 3: Essential Materials for Mobile EEG Artifact Research

Item	Function in Research
Dual-Layer EEG Cap	The key hardware for iCanClean. Outer-layer electrodes act as noise references, mechanically coupled to scalp electrodes to record only environmental and motion artifacts [5] [11].
Conductive Phantom Head	A physical model with embedded "brain" signal sources. Provides known ground-truth signals to quantitatively validate and compare the performance of cleaning algorithms [11].
Portable EEG System with Active Electrodes	Enables data collection during whole-body movement. Active electrodes help mitigate motion artifacts by amplifying signals close to the source before transmission [11].
High-Performance Computing (HPC) or Workstation	Necessary for running computationally intensive processes like ICA on high-density, mobile EEG datasets, which can require hours of computation time [11].
Software Platform (EEGLAB/BCILAB)	Standard software environments that provide implementations of ASR and a framework for integrating and testing other algorithms like iCanClean [5] [11].

Frequently Asked Questions (FAQs)

Q1: What is a phantom head, and why is it critical for validating EEG processing techniques like iCanClean?

A phantom head is a physical model of the human head, engineered to simulate the electrical conductivity of biological tissues and skull structures [25]. It contains embedded antennae that deliver known "ground-truth" signals. For algorithms like iCanClean that are designed to remove motion artifacts from mobile EEG data, phantom heads provide an indispensable validation tool. They allow researchers to test whether their processing steps can accurately recover known signals that have been corrupted by real-world volume conduction and motion artifacts, a scenario that computer simulations alone cannot fully replicate [25] [26].

Q2: My iCanClean-processed data still shows high residual noise. Which parameter should I adjust first?

The primary parameter to adjust is the r² threshold, which controls the cleaning aggressiveness [4] [5]. A lower r² value (e.g., 0.3) results in more aggressive noise removal, while a higher value (e.g., 0.8) is more conservative. If high noise persists, try lowering the r² value incrementally. A parameter sweep study identified an optimal r² value of 0.65 for data collected during walking, which serves as an excellent starting point for tuning [4] [5].

Q3: How does the choice of "window length" in iCanClean affect the cleaning of my data?

The window length determines the segment of data over which the algorithm calculates the correlation between cortical electrodes and noise electrodes [5]. Using shorter windows (e.g., 1-2 seconds) allows iCanClean to adapt to rapidly changing noise, which is ideal for highly dynamic tasks. Longer windows (e.g., 4 seconds or the entire recording) can be more effective for stabilizing the decomposition when noise is more consistent. Research suggests a 4-second window paired with an r² of 0.65 provides a robust configuration for mobile data [5].
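To make the window-length parameter concrete, the sketch below splits a recording into consecutive segments of a chosen duration. It assumes non-overlapping windows, which is a simplification made here for illustration; the source does not state how the actual implementation steps or overlaps its windows. The function name `window_bounds` is hypothetical.

```python
import math


def window_bounds(n_samples: int, fs: float, window_s: float):
    """Return (start, stop) sample indices for consecutive, non-overlapping windows.

    window_s = math.inf stands for the "infinite" window (whole recording).
    Non-overlapping windows are an assumption of this sketch.
    """
    if math.isinf(window_s):
        return [(0, n_samples)]
    step = int(round(window_s * fs))
    if step <= 0:
        raise ValueError("window_s * fs must be at least one sample")
    return [(start, min(start + step, n_samples))
            for start in range(0, n_samples, step)]


# Example: 10 s of data sampled at 500 Hz, cut into 4-s windows.
print(window_bounds(5000, 500.0, 4.0))
print(window_bounds(5000, 500.0, math.inf))
```

Each window would then be cleaned independently, which is what lets shorter windows follow noise whose relationship to the reference channels changes over time.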

Q4: What are the key metrics for quantifying the performance of iCanClean after using a phantom for validation?

When using a ground-truth phantom, two primary quantitative metrics are:

Cross-correlation: Measures how well the recovered signal matches the original ground-truth signal injected into the phantom. A value above 0.8 indicates excellent recovery [25].
Signal-to-Noise Ratio (SNR): Quantifies the power of the true signal relative to the noise. iCanClean has been shown to maintain an SNR near 10 dB even at fast walking speeds, whereas raw scalp data can drop to ~2 dB [25].

After iCanClean processing and ICA, performance is also measured by the number of "good" independent components: those that are well-localized as dipoles (residual variance < 15%) and have a high brain probability (>50% via ICLabel) [4] [5].
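The two phantom metrics named above, cross-correlation and SNR, can be computed as follows. The exact definitions used in [25] are not given here, so this sketch assumes common ones: zero-lag Pearson correlation, and SNR in dB with the difference between recovered and ground-truth signals treated as noise (which presumes the two signals are already on the same amplitude scale).

```python
import numpy as np


def pearson_r(ground_truth, recovered):
    """Zero-lag Pearson correlation between ground-truth and recovered signals."""
    return float(np.corrcoef(ground_truth, recovered)[0, 1])


def snr_db(ground_truth, recovered):
    """SNR in dB, treating (recovered - ground_truth) as the noise."""
    noise = recovered - ground_truth
    return float(10.0 * np.log10(np.mean(ground_truth**2) / np.mean(noise**2)))


# Synthetic check: a 10 Hz sine with a small added cosine as "residual noise".
t = np.arange(1000) / 1000.0
truth = np.sin(2 * np.pi * 10 * t)
recovered = truth + 0.1 * np.cos(2 * np.pi * 10 * t)

print(round(pearson_r(truth, recovered), 3), round(snr_db(truth, recovered), 1))
```

In this synthetic case the residual has one hundredth of the signal power, which corresponds to 20 dB.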

Troubleshooting Guides

Issue 1: Poor Independent Component Analysis (ICA) Decomposition After iCanClean Processing

Problem: After processing mobile EEG data with iCanClean, subsequent ICA fails to produce a sufficient number of brain-like components.

Solution:

Step 1: Verify iCanClean Parameters. Re-run iCanClean with the recommended parameters (4-s window, r²=0.65) and ensure you are using an adequate number of noise reference channels. Performance remains strong even with 32 noise channels [5].
Step 2: Check Basic Preprocessing. Ensure data was high-pass filtered at 1 Hz and that grossly abnormal channels were rejected before running iCanClean [5].
Step 3: Validate with a Ground-Truth Phantom. If the issue persists, use a phantom head to verify your entire pipeline. This will isolate whether the problem is with the data, the iCanClean settings, or the ICA algorithm itself [25].

Issue 2: Inconsistent Results Across Participant Groups

Problem: iCanClean performs well on data from young adults but seems less effective on data from older adult populations.

Solution:

Step 1: Investigate Noise Characteristics. Noise properties may differ between groups. Use the phantom head to determine if the algorithm is equally effective at recovering known signals under different simulated noise conditions.
Step 2: Group-Specific Tuning. The optimal r² value and window length may vary depending on the magnitude and type of motion artifact. Perform a parameter sweep on a subset of your data from each group to determine if group-specific tuning is necessary [5]. The same study validated iCanClean across young adults, high-functioning older adults, and low-functioning older adults, establishing a solid baseline for comparison [5].

Issue 3: Signal Distortion or Attenuation

Problem: After aggressive cleaning with iCanClean, the resulting EEG signals appear overly smoothed, and evoked responses are attenuated.

Solution:

Step 1: Increase r² Threshold. This is the most direct fix. A higher r² value (e.g., 0.7-0.8) makes the algorithm more conservative, preserving more of the signal at the potential cost of leaving some noise [5].
Step 2: Quantify Signal Fidelity. Using your phantom head, calculate the cross-correlation and Dynamic Fidelity between the cleaned signal and the ground-truth input. This provides an objective measure of whether crucial signal dynamics are being lost [27].

Experimental Protocols for Validation

Protocol 1: Validating iCanClean Parameters Using a Phantom Head

This protocol outlines how to use a phantom head to empirically determine the optimal iCanClean parameters for your specific experimental setup.

1. Materials and Setup

Phantom Head: A mannequin head filled with conductive material (e.g., dental plaster mixture) and embedded with 6-8 antennae [25].
Signal Generator: An input/output interface (e.g., dSPACE MicroLabBox) to deliver predefined signals to the antennae [25].
EEG System: A high-density EEG system (64+ channels) with a dual-layer cap for noise reference is ideal [5].
Motion Platform: A platform to mimic human head motion at various walking speeds [25].

2. Ground-Truth Signal Generation

Generate complex, physiologically relevant signals for the antennae using a Neural Mass Model (NMM). Create signals with peak frequencies in different EEG bands (e.g., theta: 6.5 Hz, alpha: 10 Hz, gamma: 41 Hz). Incorporate intermittent, known connections between these signals to serve as a ground truth for connectivity analysis [25].

3. Data Collection & Processing

Mount the phantom on the motion platform and record EEG data while it executes a pre-programmed motion pattern (e.g., walking at different speeds).
Process the collected data through your pipeline, applying iCanClean with a range of parameters in a sweep (e.g., window lengths: 1, 2, 4, infinite seconds; r²: 0.05 to 1 in 0.05 increments) [5].

4. Performance Quantification

For each parameter combination, calculate the following metrics:

Cross-correlation between recovered components and original NMM signals [25].
Signal-to-Noise Ratio (SNR) of the recovered signals [25].
For connectivity measures, compute the accuracy in identifying the known interconnections from the ground truth [25].

The table below summarizes key performance metrics from published studies to serve as a benchmark.

Table 1: Benchmark Performance Metrics for iCanClean and Phantom Validation

Metric	Target Performance	Context / Conditions
Optimal iCanClean Parameters	Window: 4-s, r²: 0.65 [4] [5]	Mobile EEG data from walking; improves good ICs from 8.4 to 13.2 (+57%) [5]
Cross-correlation with Ground-Truth	Primarily > 0.8 [25]	Phantom head validation after ICA source separation [25]
Signal-to-Noise Ratio (SNR)	~10 dB (maintained with iCanClean) [25]	During fast walking speeds; compared to ~2 dB in raw scalp data [25]
Good Independent Components (Residual Variance <15%, ICLabel >50%)	13.2 components on average [5]	After iCanClean processing with optimal parameters [5]
Scanner Instability/Noise in Phantom Data	6–18% of total noise [27]	Measured as multiplicative noise contribution in "best-case" fMRI scanners [27]

Protocol 2: Fabricating a Simple EEG Phantom for System Validation

For labs without a commercial phantom, this protocol provides a method for creating a basic validation tool.

1. Phantom Fabrication

Skull Model: Use a 3D-printed skull based on CT/MRI data (PLA plastic is acoustically opaque and can serve as a base) [28] [29].
Tissue-Mimicking Material: Fill the skull with a conductive mixture. A simple option is polyvinyl chloride (PVC) plastisol, which has been validated for its tissue-like properties [28] [29]. For a more accessible material, a mixture of dental plaster, sodium propionate, and water can simulate realistic tissue conductance [25].
Targets: Insert needles or small metal/plastic objects (1-3 mm in size) to serve as navigation or detection targets during imaging [28] [29].

2. Basic System Validation

Inject simple, known signals (e.g., sine waves) into the antennae or targets embedded in the phantom.
Record the EEG and process it with your standard pipeline.
The recovered signal should closely match the input, confirming that your entire acquisition system from electrodes to amplifier is functioning correctly [26].

The Scientist's Toolkit: Essential Research Reagents & Materials

Table 2: Key Materials for Phantom Fabrication and Validation

Item	Function / Application	Example Use Case
Polyvinyl Chloride (PVC) Plastisol	Tissue-mimicking material for simulating brain tissue in phantoms [28] [29]	Used as a primary filler in 3D-printed skull models to create realistic head phantoms for transcranial ultrasound [28].
Polylactic Acid (PLA)	Filament for Fused Deposition Modeling (FDM) 3D printing; creates acoustically opaque structures like the skull [28] [29]	3D printing the structural components of a head phantom, such as the skull bone [28].
Photopolymer Resin	Material for LCD 3D printing; used to create parts with specific acoustic properties, like temporal acoustic windows [28] [29]	Printing the "acoustic windows" in a head phantom to simulate areas of the skull that allow ultrasound to pass [28].
Dental Plaster Mixture	Conductive medium that simulates realistic tissue conductance for EEG phantoms [25]	Combined with sodium propionate and water to fill a mannequin head for EEG electrode testing [25].
Neural Mass Model (NMM)	Computational model that generates complex, physiologically relevant signals with known interconnectivity [25]	Providing the "ground-truth" brain signals that are played into the antennae of a phantom head during validation [25].
dSPACE MicroLabBox	Input/output interface hardware for delivering precise, predefined signals to antennae within a phantom [25]	Used to feed NMM-generated signals into the antennae of an EEG phantom head [25].

Workflow Visualization

The following diagram illustrates the complete experimental workflow for phantom head validation, from setup to quantitative analysis.

Experimental Workflow for Phantom Head Validation

Troubleshooting Guide: iCanClean Parameter Tuning for Mobile EEG

This guide provides targeted support for researchers optimizing the iCanClean algorithm for mobile brain imaging, focusing on the critical parameters of R² threshold and window size.

Optimal Parameter Settings for iCanClean

The table below summarizes the key parameters for the iCanClean algorithm, their functions, and recommended values based on systematic testing.

Parameter	Function & Effect	Recommended Value	Performance Impact
R² Threshold	Controls cleaning aggressiveness; lower values remove more data components [7].	0.65 (for walking motion artifacts) [7] [5]	Increased good ICA components from 8.4 to 13.2 (+57%) at this setting [7].
Window Length	Duration of data segments cleaned in one analysis cycle; affects noise correlation detection [7].	4 seconds (for walking motion artifacts) [7] [5]	Balances local noise correlation tracking with sufficient data for stable cleaning [7].
Noise Channels	Number of reference noise electrodes used for artifact detection [7].	16-64 channels (from 120 available) [7]	Performance maintained with reduced channels (12.0-12.7 good components vs. 13.2 with 120 channels) [7].

Detailed Experimental Protocols

Protocol 1: Validating iCanClean on a Phantom Head

This protocol was designed to quantitatively compare iCanClean against other artifact removal methods using a known ground truth [14].

Apparatus: An electrically conductive phantom head with 10 embedded brain signal sources and 10 contaminating artifact sources [14].
Test Conditions: Data was recorded under six conditions: Brain only, Brain + Eyes, Brain + Neck Muscles, Brain + Facial Muscles, Brain + Walking Motion, and Brain + All Artifacts [14].
Comparison Methods: iCanClean was compared to Artifact Subspace Reconstruction (ASR), Auto-CCA, and Adaptive Filtering [14].
Quality Metric: A Data Quality Score (0-100%) was calculated as the average squared correlation (R²) between known brain sources and the recorded EEG channels [14].
Key Outcome: In the "Brain + All Artifacts" condition, iCanClean improved the Data Quality Score from 15.7% (before cleaning) to 55.9%. This outperformed ASR (27.6%), Auto-CCA (27.2%), and Adaptive Filtering (32.9%) [14].
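The size of each method's improvement in the "Brain + All Artifacts" condition follows directly from the reported scores. The short calculation below expresses it in percentage points; the dictionary layout is illustrative only.

```python
# Data Quality Scores (%) reported for the "Brain + All Artifacts" condition.
before = 15.7
after = {"iCanClean": 55.9, "Adaptive Filtering": 32.9, "ASR": 27.6, "Auto-CCA": 27.2}

# Gain in percentage points over the uncleaned recording.
gains = {method: round(score - before, 1) for method, score in after.items()}

for method, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{method}: +{gain} points")
```

On these numbers iCanClean gains about 40 points, more than twice the gain of the next-best method.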

Protocol 2: iCanClean Parameter Sweep for Mobile EEG

This protocol established optimal parameters for cleaning motion artifacts during walking [7] [5].

Data Collection:
- Participants: 45 adults across three groups: young adults, high-functioning older adults, and low-functioning older adults [7] [5].
- EEG Setup: High-density dual-layer cap with 120 scalp electrodes and 120 outward-facing noise electrodes [7] [5].
- Task: Participants walked on a treadmill at varying speeds and over terrain of differing difficulty for approximately 48 minutes [7] [5].
Data Processing:
- EEG data was high-pass filtered (1 Hz cutoff) and average-referenced [7] [5].
- Basic preprocessing removed bad channels [7] [5].
Parameter Sweep:
- R² Threshold: Tested from 0.05 to 1.00 in increments of 0.05 [7] [5].
- Window Length: Tested at 1s, 2s, 4s, and an "infinite" window using the entire dataset [7] [5].
Outcome Measure: Data cleaned with each parameter set was decomposed using Independent Component Analysis (ICA). "Good" brain components were identified as those well-localized by a dipole model (Residual Variance < 15%) and classified as brain-like by the ICLabel algorithm (probability > 50%) [7] [5].
Key Finding: The combination of a 4-second window and an R² threshold of 0.65 was found to be optimal, maximizing the number of "good" brain components extracted [7] [5].
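The "good" component criterion used as the outcome measure reduces to two thresholds per component. The sketch below assumes you already have, for each independent component, an ICLabel brain probability and a dipole residual variance (both as fractions between 0 and 1); the function name and the example values are made up for illustration and are not study data.

```python
import numpy as np


def count_good_components(brain_prob, residual_variance,
                          min_brain_prob=0.5, max_rv=0.15):
    """Count ICs with ICLabel brain probability > 50% and dipole RV < 15%."""
    brain_prob = np.asarray(brain_prob, dtype=float)
    residual_variance = np.asarray(residual_variance, dtype=float)
    if brain_prob.shape != residual_variance.shape:
        raise ValueError("expected one probability and one RV per component")
    good = (brain_prob > min_brain_prob) & (residual_variance < max_rv)
    return int(good.sum())


# Made-up values for five components (illustration only, not study data).
example_prob = [0.92, 0.61, 0.40, 0.75, 0.55]
example_rv = [0.05, 0.20, 0.08, 0.14, 0.149]
print(count_good_components(example_prob, example_rv))
```

Both conditions must hold at once: a component with high brain probability but a poor dipole fit (the second example) is not counted.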

iCanClean Algorithm Workflow

The following diagram illustrates the logical flow of the iCanClean algorithm and its key parameters based on the described research.

The Scientist's Toolkit: Research Reagent Solutions

Item	Function in Experiment
Dual-Layer EEG Cap	A specialized cap with 120 scalp electrodes and 120 mechanically coupled but electrically isolated noise electrodes. It provides the reference noise signals essential for the iCanClean algorithm [7] [5].
Electrical Phantom Head	A bench-test apparatus with embedded brain and artifact sources. It provides known ground-truth signals for validating cleaning algorithms without human subjects [14].
INDIP Reference System	A multi-sensor system (INertial modules, DIstance sensors, Pressure insoles) used as a gold standard for validating real-world gait and mobility digital outcomes [30].
GaitPy Algorithm	An open-source method for analyzing gait from a single lumbar-worn accelerometer, enabling validation of gait speed in naturalistic environments [31].
ICLabel Classifier	A convolutional neural network that automatically classifies Independent Components from ICA, helping researchers identify "good" brain components post-cleaning [7] [5].

Frequently Asked Questions (FAQs)

Q1: My data is from a seated task with eye and muscle artifacts, not walking. Should I use the same R²=0.65 setting? A: The R²=0.65 and 4-second window were optimized specifically for cleaning motion artifacts during walking [7]. For other artifact types (e.g., pure eye-blink or EMG noise), a different parameter combination might be more effective. It is recommended to run a small parameter sweep on a representative segment of your data to find the optimal setting for your specific task and artifact profile.

Q2: I have fewer than 16 noise channels available. Can I still use iCanClean effectively? A: The research shows that performance declines gracefully as noise channels are reduced [7]. While 16-64 noise channels are recommended for best results, the algorithm can still function with fewer, though cleaning efficacy may be lower. Prioritize the most evenly spaced subset of your available noise channels.

Q3: How does iCanClean's performance compare to traditional methods like ICA alone? A: iCanClean is not a replacement for ICA but a powerful preprocessing step. The study showed that cleaning data with iCanClean before running ICA significantly improved the results of the ICA decomposition, increasing the number of identifiable, high-quality brain components [7] [5].

Q4: What is considered a successful outcome after using iCanClean? A: Success depends on your downstream analysis. For source-level analysis with ICA, success is measured by an increase in the number of "good" brain components that are dipolar and have high brain probability [7]. For other analyses, success could be a higher Data Quality Score or improved signal-to-noise ratio in event-related potentials.

Frequently Asked Questions (FAQs)

Q1: What are the key performance metrics used to validate iCanClean's effectiveness?

The primary metrics for evaluating iCanClean are:

Number of "Good" Independent Components (ICs): Components classified as originating from the brain (ICLabel probability >50%) that are well-localized (dipole residual variance <15%) [5] [7].
Data Quality Score: A score from 0% to 100% representing the average squared correlation between known ground-truth brain sources and the recorded/cleaned EEG channels. A higher score indicates better preservation of brain signals [14] [11].
Spectral Power Reduction: The algorithm's effectiveness in removing artifacts is also observable through the reduction of high-frequency power in the EEG signal, characteristic of muscle and motion artifacts [5].

Q2: How do the r² threshold and window length parameters affect cleaning performance?

These are iCanClean's core tuning parameters [5]:

r² Threshold (Cleaning Aggressiveness): Controls which correlated noise subspaces are removed. A lower r² value results in more aggressive cleaning (more components removed), while a value near 1.0 results in less cleaning [5] [7].
Window Length: Determines the segment of data used to calculate correlations between cortical and noise electrodes. Shorter windows (e.g., 1-4 seconds) adapt to non-stationary noise, while an "infinite" window uses the entire dataset [5].

Q3: What are the optimal parameter settings for iCanClean?

Based on a systematic parameter sweep with mobile EEG data during walking, the optimal settings were found to be [5] [7]:

r² Threshold: 0.65
Window Length: 4 seconds

At these settings, the average number of "good" brain components increased by 57%, from 8.4 to 13.2 [5].

Q4: How does iCanClean performance change with fewer noise reference channels?

Performance remains robust even with a reduced set of noise channels. After finding the optimal parameters, a subsequent test showed that the number of "good" ICs decreased only slightly with fewer channels [5] [7]:

Number of Noise Channels	Average Good ICs
120 (Full Set)	13.2
64	12.7
32	12.2
16	12.0

Q5: How does iCanClean compare to other artifact removal methods?

In a phantom head study with known ground-truth signals, iCanClean consistently outperformed other real-time-capable methods, especially when multiple artifacts were present simultaneously. Starting from a Data Quality Score of 15.7% (before cleaning) in the "Brain + All Artifacts" condition, results were [14] [11]:

Cleaning Method	Data Quality Score After Cleaning
iCanClean	55.9%
Adaptive Filtering	32.9%
Artifact Subspace Reconstruction (ASR)	27.6%
Auto-Canonical Correlation Analysis (Auto-CCA)	27.2%

Experimental Protocols & Workflows

Protocol 1: Parameter Sweep for Mobile EEG

Objective: To determine the optimal r² threshold and window length for iCanClean to maximize the number of high-quality brain components extracted via ICA from mobile EEG data [5] [7].
Dataset: High-density (120-channel) dual-layer EEG recorded from 45 participants (young adults, high-functioning older adults, low-functioning older adults) during 48 minutes of treadmill walking [5] [7].
Preprocessing:
- High-pass filter at 1 Hz cutoff.
- Average re-referencing of EEG and noise channels separately.
- Reject outlier channels with amplitude >3x the median standard deviation [5] [7].
Parameter Sweep Execution:
- r² Threshold: Tested from 0.05 to 1.00 in increments of 0.05.
- Window Length: Tested 1s, 2s, 4s, and infinite (full recording) windows.
- Apply iCanClean to the preprocessed data for each parameter combination [5].
Outcome Measurement:
- Run ICA (e.g., AMICA algorithm) on the cleaned data.
- Classify components using ICLabel (brain probability >50%) and dipole fitting (residual variance <15%).
- Count the number of "good" components for each parameter set [5] [7].
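The outlier-channel step in the preprocessing above can be sketched as follows. This assumes the rule means "a channel's standard deviation exceeds 3 times the median of all channels' standard deviations", which is one reading of the description; the function name and the synthetic data are hypothetical.

```python
import numpy as np


def outlier_channels(data, factor=3.0):
    """Return indices of channels whose standard deviation exceeds
    factor x the median standard deviation across channels.

    data: array of shape (n_channels, n_samples).
    """
    channel_std = np.std(data, axis=1)
    threshold = factor * np.median(channel_std)
    return np.flatnonzero(channel_std > threshold)


# Synthetic example: 8 unit-variance channels, one of which is scaled by 10.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((8, 5000))
eeg[3] *= 10.0
print(outlier_channels(eeg))
```

Using the median rather than the mean keeps the threshold stable even when a few channels are grossly abnormal.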

Protocol 2: Phantom Head Validation

Objective: To quantitatively compare iCanClean against other methods (ASR, Auto-CCA, Adaptive Filtering) using a ground-truth phantom head model [14] [11].
Apparatus: An electrically conductive phantom head with 10 embedded "brain" signal sources and the capability to inject various artifact types (eye, muscle, motion, line-noise) [14] [11].
Experimental Conditions: Record EEG data under six scenarios: Brain (clean), Brain + Eyes, Brain + Neck Muscles, Brain + Facial Muscles, Brain + Walking Motion, and Brain + All Artifacts [14] [11].
Processing:
- Apply each cleaning algorithm (iCanClean, ASR, Auto-CCA, Adaptive Filtering) to the contaminated data.
- For iCanClean, use a range of parameters to find the best performance.
Outcome Measurement:
- Calculate the Data Quality Score for each condition and algorithm: DQS = 100% * mean(R²(brain_source_i, EEG_channel_j)) for all i brain sources and j EEG channels [14] [11].
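The Data Quality Score formula above translates almost directly into NumPy. This is a sketch under the assumption that R² is the squared zero-lag Pearson correlation and that the mean is taken over every source-channel pair; the function name and array layout are illustrative.

```python
import numpy as np


def data_quality_score(brain_sources, eeg_channels):
    """DQS = 100 * mean R² over all (brain source, EEG channel) pairs.

    brain_sources: (n_sources, n_samples); eeg_channels: (n_channels, n_samples).
    """
    n_sources = brain_sources.shape[0]
    # corrcoef on the stacked rows gives every pairwise Pearson correlation;
    # the off-diagonal block holds the source-vs-channel values.
    r = np.corrcoef(np.vstack([brain_sources, eeg_channels]))[:n_sources, n_sources:]
    return float(100.0 * np.mean(r**2))


# Sanity check with a sine, an identical copy, and an orthogonal cosine.
t = np.arange(1000) / 1000.0
source = np.sin(2 * np.pi * 5 * t)[None, :]
same = source.copy()
orthogonal = np.cos(2 * np.pi * 5 * t)[None, :]
print(data_quality_score(source, same), data_quality_score(source, orthogonal))
```

Because the score averages over all pairs, even perfectly clean data does not reach 100% unless every channel is perfectly correlated with every source, which is consistent with the clean "Brain" condition scoring well below 100%.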

The Scientist's Toolkit: Research Reagent Solutions

Item	Function in iCanClean Research
Dual-Layer EEG Cap	A specialized cap with inner-layer (scalp) electrodes recording brain + noise and outer-layer electrodes recording only reference noise. Mechanically coupled but electrically isolated [5] [7].
Electrical Phantom Head	A physical model with embedded antennas to simulate known "brain" signals. Allows for controlled injection of artifacts and provides ground-truth for validating cleaning algorithms [14] [11].
ICLabel	A convolutional neural network-based classifier that automatically labels independent components from ICA by estimating the probability that a component comes from a specific source (e.g., brain, muscle, eye) [5] [7].
Dipolar Source Localization	A method to fit an equivalent current dipole to an ICA component's scalp topography. A low residual variance (<15%) indicates a component that is physically plausible and likely originates from a compact brain source [5].
Canonical Correlation Analysis (CCA)	The core statistical engine of iCanClean. It identifies linear subspaces of the cortical EEG data that are maximally correlated with subspaces in the reference noise data, which are then removed [5] [14].
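To illustrate the CCA idea described in the table, the sketch below cleans a single window of data: it computes canonical correlations between scalp and noise channels, flags canonical components whose R² exceeds the threshold, and subtracts them from the scalp data by least squares. This is a simplified illustration written for this guide, not the published iCanClean implementation, which may differ in how flagged components are projected out. The function name `cca_clean` and the synthetic data are assumptions.

```python
import numpy as np


def cca_clean(eeg, noise, r2_threshold=0.65):
    """Remove EEG subspaces whose canonical R² with the noise channels exceeds
    r2_threshold. Inputs are (n_samples, n_channels) arrays for one window.

    Simplified sketch only: the published algorithm may differ in how the
    flagged components are projected out.
    """
    X = eeg - eeg.mean(axis=0)
    Y = noise - noise.mean(axis=0)
    # Orthonormal bases for the two channel spaces.
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    # Singular values of Qx^T Qy are the canonical correlations.
    U, s, _ = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    flagged = s**2 > r2_threshold
    variates = Qx @ U[:, flagged]  # orthonormal canonical variates of X
    # Least-squares removal of the flagged variates from every EEG channel.
    cleaned = X - variates @ (variates.T @ X)
    return cleaned, int(flagged.sum())


# Synthetic demo: 4 "brain" and 3 "noise" sources mixed into 8 EEG channels;
# 6 reference channels see only the noise sources.
rng = np.random.default_rng(0)
n = 2000
brain = rng.standard_normal((n, 4))
artifact = rng.standard_normal((n, 3))
brain_part = brain @ rng.standard_normal((4, 8))
artifact_part = 5.0 * artifact @ rng.standard_normal((3, 8))
eeg = brain_part + artifact_part + 0.01 * rng.standard_normal((n, 8))
ref = artifact @ rng.standard_normal((3, 6)) + 0.01 * rng.standard_normal((n, 6))

cleaned, n_removed = cca_clean(eeg, ref)
print(n_removed)
```

Lowering `r2_threshold` flags more canonical components, which is why a lower R² setting corresponds to more aggressive cleaning.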

Conclusion

Systematic tuning of iCanClean's R² threshold and window length is essential for reliable mobile brain imaging. Current research points to an R² threshold of 0.65 and a 4-second window as a robust starting point for human walking studies; at these settings the average yield of high-quality, dipolar brain components rose from 8.4 to 13.2. Successful application requires a careful balance: an overly aggressive (low) R² can suppress neural signals, while an overly conservative (high) one leaves artifacts in the data. In the phantom and human studies reviewed here, iCanClean outperformed methods such as ASR, particularly in preserving brain signals while removing complex motion and muscle artifacts. For biomedical and clinical research, these optimized cleaning protocols enable more sensitive detection of electrocortical biomarkers during dynamic behaviors, supporting deeper investigation of the neurological mechanisms of mobility, more objective assessment in neurorehabilitation, and potentially sharper endpoints for clinical trials in drug development for neurological disorders. Future work should focus on developing fully automated parameter selection and adapting these guidelines to a wider range of populations and real-world activities.