Open Access Article
Arabella Stockdale
*a,
Joe Bibleb,
Eli Mattingly
c and
Olin Thompson Mefford
a
aDepartment of Materials Science and Engineering, Clemson University, Clemson, SC 29634, USA. E-mail: arabelh@clemson.edu
bSchool of Mathematical and Statistical Sciences, Clemson University, Clemson, SC 29634, USA
cIndependent Researcher, USA
First published on 12th June 2026
Magnetic particle spectroscopy (MPS) is an emerging analytical technique with potential for rapid, accessible, and affordable point of care biomarker detection based on magnetic signal fluctuations. Accurate interpretation of MPS signals requires both colloidal stability of nanoparticles in aqueous and complex media and statistical models that appropriately represent experimental variability. Conventional least squares (LS) approaches assume stationary and homoscedastic noise, assumptions which break down in experimental settings due to inherent temporal and within sample variability. This study investigates the signal variability in 22 nm iron oxide nanoparticles (IONPs) coated with 20 kDa poly(ethylene glycol). We evaluate the validity of the standard linear model and reveal that the least squares approach fails to account for inherent temporal and within sample variability. Repeated measurements under identical treatment conditions reveal substantial structured fluctuations consistent with a non-stationary measurement process. Transformations commonly used to normalize mass loading effects, including harmonic ratios, redistribute variance without resolving the underlying variability. A hierarchical mixed effects model is implemented to partition within-run and temporal sources of variation and to provide confidence estimates that reflect the observed variance structure across observation levels. These findings establish a statistical framework for representing intrinsic variability in MPS measurements and support more rigorous interpretation of harmonic signals. Explicit modeling of hierarchical uncertainty is an important step towards quantitative, reproducible MPS analysis and translation of MPS for sensitive biomarker detection.
Signal quality in MPS is often reported using metrics such as the signal-to-noise ratio (SNR) or harmonic decay ratio (e.g., A5/A3) with uncertainty typically estimated using the standard error of the mean (see eqn (2)).11–13 The standard error asymptotically shrinks with increasing measurement acquisition making the estimates of uncertainty artificially small and result in unrealistically narrow confidence intervals. These approaches implicitly assume stationary means and statistically equivalent observations, neglecting between-sample variation that may overstate reproducibility. In practice, MPS datasets frequently include multiple aliquots, repeated runs, and day to day measurements making the definition of an independent replicate nontrivial. Treating these as independent observations collapses distinct sources of variability into a single estimate and overstates reproducibility. As a result, conventional uncertainty metrics can give the appearance of a well defined expected mean even when substantial structured variability persists in the data.
Magnetic particle spectroscopy (MPS) is widely used for biosensing applications due to its exceptional sensitivity to changes in nanoparticle media, surface chemistry, and interfacial dynamics. Ligand binding and particle mobility can induce subtle measurable shifts in the harmonics spectra enabling the detection of a variety of biomarkers.6–9,14,15 To support these applications, surface coatings such as poly(ethylene glycol) (PEG) are widely used to enhance colloidal stability and provide chemical handles for target moieties.11,16–18 MPS is inherently sensitive to minute changes in nanoparticle dynamics, providing sensing capabilities for subtle perturbations in magnetic behavior. While this enables powerful bio sensing applications, it is highly prone to measurement variability in external factors such as temperature, aggregation, and instrumentation even under tightly controlled experimental conditions. Such variability may mimic true biological binding events that could lead to false positives or in-differentiable results in clinical applications.19,20
Aggregation, local viscosity changes, and medium induced fluctuations have been shown to cause non-constant frequency scaling, while colloidal stabilization can nonlinearly affect relaxation behavior, hydrodynamics, and electrophoretic mobilities.6,21 Importantly, these fluctuations can be hidden by the intrinsic nature of scale in magnetic measurements. When harmonic data for grossly different samples are visualized such is the case in calibration dilutions, fluctuations may appear suppressed due to logarithmic scaling despite suffering from the same variability problems on local signal interpretation as shown in Fig. 3.
These observations highlight the need for a robust statistical model that explicitly accounts for known causes of variation in MPS harmonic measurements. A common approach for analyzing harmonic data is using least squares (LS) regression or standard linear models that implicitly assume stationary and homoscedastic noise. Such methods offer practical estimates of harmonic behavior and are frequently adapted in the MPS literature.7,9,11,14,22 However, these assumptions are poorly aligned with non-stationary systems which fail to account for within-sample and temporal variability. As a result, uncertainty is systematically underestimated, raising the risk of overly confident interpretations when random noise, system drift, or sudden fluctuations in the signal occur.
In MPS measurements, LS based models collapse temporal and within sample variation into a single error term, thus leading to systematically underestimated uncertainty. This produces artificially narrow confidence intervals that mask real signal fluctuations and create a false impression of precision. This limitation reflects the known behavior inherent to MPS and MPI devices where signal distortions arise from a combination of factors, such as background feed-through, system drift, random noise, drive field fluctuations, and abrupt fluctuations of unidentified origin.23,24 While there will always be mechanical improvements to limit variability, such occurrences cannot be fully eliminated in practice. Consequently, robust statistical modeling is required to accurately represent and interpret MPS in data in non ideal experimental systems.
Recognizing this gap, the present study treats within and between sample variability as an inherent property of this material system, to establish a statistically rigorous framework for evaluating MPS measurements. We systematically examine signal reproducibility in PEG coated iron oxide nanoparticles by quantifying both aliquot-level (within sample, σWi,j and day-to-day variation/temporal, σBi) for liquid phase 20 kDa PEG-coated 22 nm iron oxide nanoparticles. We demonstrate the necessity of mixed effect models that explicitly account for the aforementioned random effects, providing an accurate and statistically defensible description of MPS data. This framework enables meaningful uncertainty quantification and reproducible interpretation in non-ideal experimental setups. By looking closely into the origin and persistence of these perturbations, this work advances the reliability of MPS measurements, introduces a new standard for analysis protocol, and establishes a statistically sound foundation for the clinical translation in biomarker detection, bio imaging, and targeted therapeutic applications.
X-ray diffraction patterns were indexed to magnetite confirming the crystalline phase (see Fig. 1d). Positions and profile changes across the series. The diffraction patterns are indexed to the cubic inverse spinel structure of Fe3O4 (JCPDS 85-1436). The full width at half maximum value of the (311) reflection as determined from a Gaussian fitting was input to the Debye Scherrer equation, as described in eqn (5). The crystallite size was estimated to be 14.2 nm. This value is slightly smaller than the average particle size obtained from TEM (21.6 ± 2.7 nm). This discrepancy is expected as XRD provides the size of diffracting crystalline domains, whereas TEM measures the overall particle size; in this case, it suggests multiple domains within a single particle.
Applying a polymer ligand is well established method to increase colloidal stability of nanoparticles in water.18,25,26 This molecular weight (20 kDa) was chosen due to its high steric repulsion and stabilization properties.16,18 The PEG coating is anchored to the iron oxide surface via nitroDOPA, a catechol amine, forming strong chelation interactions that are known to provide stable surface functionalization under acqueous conditions.16,17,26 Dynamic light scattering (DLS) was therefore used to asses colloidal stability in DI water over 24 hours. Fig. 1a and b show the DLS intensity average and Z-average distributions for PEG coated IONPs in DI water respectively. The hydrodynamic diameter decreased slightly from 96.1 to 90.4 nm over 24 hours with narrow intensity average distributions and no emergence of large aggregates. This indicates that gross colloidal instability or irreversible aggregation is not present, and any variability in MPS measurements likely originate from mechanisms beyond simple sedimentation within the timescale of a day. Together, these results suggest that ligand binding and colloidal stability are maintained over the course of measurements across multiple days.
To illustrate the consequences of these assumptions, least-squares (LS) regression was applied to the third harmonic (A3) and harmonic ratio (A5/A3) to estimate the mean and corresponding 95% confidence intervals. Three independently prepared aliquots of 20 kDa 22 nm IONPs were measured randomly over three days to evaluate within-run and between-day variability. When 95% confidence intervals for mean third harmonic, Ā3, were calculated by treating each day as an independent sample, the removal of temporal structure resulted in non overlapping confidence intervals. The resulting LS confidence intervals are summarized in Table 1. This approach concludes that measurements of the same aliquot collected on different days are statistically different.
| Day | 95% CI [A.U.] | Sample estimates mean [A.U.] |
|---|---|---|
| 1 | [2.34 × 10−10−2.36 × 10−10] | 2.35 × 10−10 |
| 2 | [1.59 × 10−10−1.61 × 10−10] | 1.60 × 10−10 |
| 3 | [1.69 × 10−10−1.71 × 10−10] | 1.70 × 10−10 |
When all three days are pooled into a single dataset, the resulting LS confidence intervals [1.85 × 10−10−1.92 × 10−10 A.U.] shown in Fig. 2b, remain unrealistically narrow. In fact the resulting intervals never contains a single data point for the series. Consequently, all measured values fall well outside the 95% confidence bounds, indicating a systemic underestimation of uncertainty and complete misrepresentation of the data. This failure arises because LS regression collapses within aliquot and day to day variability random effects into a single residual error term thus producing artificially narrow confidence intervals. Critically, this behavior persists even when data appear well behaved on logarithmic scales with well known commercial agent, Synomag™-D 70 nm lot: 54525104-04 (see Fig. 3). This underscores the necessity of a model that attributes variation to random effects such as temporal variability.
![]() | ||
| Fig. 3 Synomag-D 70 nm dilution series plotted as signal intensity of the third harmonic A3 vs. (a) iron mass and (b) run number. | ||
To properly account for this variability, a linear mixed effects model was applied (see eqn (3)) where μ is the mean of the targeted harmonic, Bi represents day to day fluctuations, Wij captures between-run within-day variation and εijk denotes residual measurement noise. The random effect structure should be selected based on known or potential sources of variation as a function of the physical workflow of the observation process. For example, variability arising from day to day changes (e.g., ambient temperature, sample aging, or instrument drift) should be modeled at the day level capturing valuable timescales for degradation studies or biomarker uptake. Additionally, variability introduced within a measurement session (e.g., local heating, sample repositioning, and sonication induced rearrangement of nanoparticle clusters) may be captured at the run level. In practice, researchers should identify candidate sources of variability and determine whether they occur at predictable intervals, then assign random effects accordingly (see Section Statistical modeling).
Unlike LS regression, the mixed effects model produces confidence intervals that accurately summarize the uncertainty in our estimates based on the observed data. Applying this model to the dataset of aliquot one across all days shown in Fig. 2 illustrates the lack of confidence in the estimated value of the harmonic data. With this method, the individual sources of temporal variability are accounted for and result in all harmonic data falling within the LMM confidence intervals. Assigning defined jumps in the data to random effects Bi and Wij with the LMM method, yield a 95% confidence bound [8.72 × 10−11−2.89 × 10−10] as highlighted in Fig. 2b for aliquot one across all days. This method well encompasses the amplitude data in contrast to the LS method.
It should be noted that the power of the LMM method arises from its ability to accommodate temporal and between-run fluctuations. If the dataset were collapsed by day as shown in Table 1, the mixed effect model simplifies to a LS model where the random effect terms go to zero such that all variability is attributed to residual measurement noise. While the mixed effects model is general and can, in principle, be applied to other nanoparticle formulations and MPS platforms, the present study evaluates a single well characterized IONP system. Extension to particles with different core sizes, coatings, or interaction effects may yield different variance structures and should be validated in future studies.
Alternative strategies such as harmonic ratio analysis or observation of harmonic phase are commonly employed to mitigate mass loading effects.7,9,14 In the present data, such techniques did not reduce overall variability but instead redistributed it, amplifying within-run stochastic noise relative to between-run and between-day variation. Although amplitude measurements exhibit abrupt fluctuations, these changes occur at predictable intervals and can be ascribed to identifiable aspects of the observation process that are naturally accommodated within a hierarchical random effects framework. In contrast, ratio operation introduces a nonlinear combination of two random variables spanning an order of magnitude. This operation removes information about the absolute signal magnitude, which reflects factors such as particle mass, interactions and system sensitivity, while rescaling variance contributions. As a result, direct comparison of the noise structure within a linear mixed effects model becomes more complex. The apparent compactness of the ratio distribution therefore reflects a scaling effect rather than an intrinsic reduction in signal variability (see Fig. 2a) When mass is independently known, modeling amplitude data provides a more direct representation of drift, trends, and structured fluctuations across observation intervals. Ratios may instead be viewed as additional derived observations that require explicit modeling of their constituent harmonic relationships.
Prior work has demonstrated empirical correlations between harmonic amplitudes to reduce quantification uncertainty.19 While such relationships improve calibration, they do not account for intrinsic signal variability arising from non-stationary particle dynamics. Importantly, the observed variance in amplitude measurements reflects fundamental characteristics of MPS which will vary instrument to instrument. Spatial sensitivity of detection coils and temperature dependent nanoparticle dynamics introduce hierarchical sources of uncertainty that persist even in optimized systems. As the demand for measurement sensitivity increases, statistical frameworks that explicitly model hierarchical variability become essential for resolving subtle signal perturbations necessary for advanced magnetic characterization and bio diagnostic applications.
, A5/A3), far exceeding the instrument noise floor (see Fig. 3b). Confidence intervals obtained via standard LS methods fail to accommodate these extra sources of variation and systematically underestimate uncertainty. The operation of computing the harmonic ratio, while commonly used in the field, propagated noise and masked the attributions of underlying variances observed in the amplitude data. While the phase data could pose as a more stationary mass independent metric, this falls out of the scope of this project and is suggested as future work. We hypothesize that a portion of the observed variance arises from reversible interacting nanostructures (e.g. dimers or small chains) dynamically created during MPS acquisition. Similar field induced clustering has been extensively shown in literature.3,11,18,27–29 These structures would transiently modulate Brownian relaxation times on millisecond to second timescales, producing instantaneous changes in harmonic amplitudes.30 This mechanism is consistent with the colloidal stability confirmed through dynamic light scattering (DLS) (Fig. 1b) and non-stationary fluctuations within a single run.
Mixed-effects modeling provides a statistically rigorous method to partition variance and correctly represent uncertainty in MPS measurements. While this paints a clear picture of the data, this method does not inherently improve sample distinguishability, highlighting a limitation of diagnostic applications. Future improvements may involve instrumentation adjustments to control extraneous variation (e.g., feedback controlled drive fields, improved sample assembly), or the use of grossly different materials in which the threshold of positive signal in the case of disease testing would vastly out scale the variance. For example, nanoparticles with highly different hydrodynamic sizes, surface coatings, or morphology would be expected to significantly influence the magnetic response.27,28,31 These considerations are also relevant for smaller superparamagnetic iron oxide nanoparticles (SPIONs), where reduced core size impacts signal characteristics and potentially interactions and variance structure. While these effects may differ from the present system, the mixed effects framework remains applicable for quantifying variability when the relevant sources and timescales are properly defined.
In this study, a 20 kDa PEG coating was selected to provide a stable, well dispersed system; however, varying PEG molecular weights may further alter hydrodynamic size, interparticle interactions, and relaxation dynamics, therefore impacting both signal magnitude, harmonic decay, and variability. This highlights that material driven differences can outweigh measurement variability as illustrated in Fig. 3, where separation is possible even in the presence of large fluctuations. Together, these findings highlight the need for both statistical rigor and careful experimental design in reliable MPS instrumentation.
In a standard LS approach, we assume that the perturbations we observe in harmonic signals are independent with a mean of 0 and common variance about the true mean μ. Typically, normality is assumed although approximations can typically be obtained of non-normal data. We would define
![]() | (1) |
is obtained (for the balanced case) as
where I, J and K are the number of days, runs and observations respectively. The standard error of an estimate
LS is
![]() | (2) |
With the LMM we assume
![]() | (3) |
LS =
LMM, however
![]() | (4) |
Size analysis of the magnetic core was obtained using transmission electron microscopy (TEM, Hitachi H7830). The samples were prepared by drop casting diluted hexane dispersions onto carbon-coated copper grids and dried under ambient conditions. Image analysis of approximately 300 particles was performed using ImageJ©.
Iron concentration was determined by inductively coupled plasma mass spectroscopy (ICP-MS) following nitric acid digestion. Two microliters of each sample were dried in a 500 °C oven over night to remove organic matter, then digested in 5 mL of trace metals nitric acid at 90 °C for one hour. The sample was then dispersed in 5 mL of 2% nitric acid, sonicated, filtered with a 0.2 µm Teflon syringe filter, and quantified using calibration curves derived from iron standards.
X-ray diffraction (XRD) was performed using a Rigaku SmartLab powder X-ray diffractometer to determine the phase and composition. Measurements were carried out using Cu Kα radiation (λ = 1.54 Å.) Samples were prepared by dropwise deposition of hexane dispersions onto a circular glass pane, forming uniform dry films. Diffraction patterns were collected over the 2θ range of 15 to 80 degrees with a scan rate of of 5 degrees/minute and a step size of 0.02 degrees.
To improve the signal to noise ratio, a smoothing function was applied using a Savitzky–Golay filter in Origin. The smoothing was used only to assist peak visualization and fitting does not affect peak broadening. Peak positions and full width at half maximum (FWHM) values were obtained by fitting the diffraction peaks with a Gaussian function in Origin. The (311) reflection was selected due to its high intensity and minimal overlap with neighboring peaks. The crystallite size in nanometers was estimated using the Debye Scherrer equation
![]() | (5) |
For each sample type, three aliquots containing 60 µL of substance, were prepared in 0.6 mL microcentrifuge Eppendorf tubes. Each observation consisted of five sequential runs. Between runs, the motor was homed to null any accrued motor position offset and the baseline was reset between excitation pulses. This procedure yielded a total of 110 harmonic spectra per observation.
Prior to measurement, samples were sonicated in a bath sonicator at 25 °C for 30 seconds to ensure thermal equilibrium and minimize settling effects before entering the bore. The series of samples measured each day were input in a randomized sequence. The instrument noise floor was defined as Ak = μk + 10σk where Ak is the amplitude of the kth harmonic, μk is the mean of the kth harmonic and σk is the standard deviation obtained from 10 blank runs, respectively, following the guidance of the International Union of Pure and Applied Chemistry (IUPAC).
To adequately sample potential sources of variation, a five day measurement protocol was conducted. On day 1, three aliquots from the 20 kDa PEG IONP sample were prepared to serve as the initial aliquot reference. On subsequent days, three aliquots were freshly pipetted and measured in a randomized sequence alongside day 1 references to evaluate within sample and temporal variation. This allowed for analysis of aliquot-to-aliquot variation between two series of a given sample measured on the same day. Additionally, the same aliquots from day 1 were remeasured daily to assess instrument drift.
| This journal is © The Royal Society of Chemistry 2026 |