Supplement to the manuscript ‘Modeling atmospheric aging of small-scale wood combustion emissions: distinguishing causal effects from non-causal associations’

Small-scale wood combustion is a significant source of particulate emissions. Atmospheric transformation of wood combustion emissions is a complex process involving multiple compounds interacting simultaneously. Thus, advanced methodology is needed...

Figures: Figure S1 Effect of filtering and smoothing on the goodness-of-fit parameters. Tables: Table S1 Coefficients for the smaller simulated dataset.

S1 Motivation for time series filtering
The measurement y_{i,t} at time t for time series i is the sum of the true value x_{i,t} and the measurement error ε_{i,t}. These measurement errors comprise sampling from the chamber, error related to the estimation of particle and gas losses to the chamber walls, and error related to processing the raw measurement instrument data into a more useful form.
Measurements y_{i,t} were presented as y_{i,t} = x_{i,t} + ε_{i,t}, where the error term ε_{i,t} is independent in time and follows a specified distribution representing all uncertainties. The state x_{i,t} is the estimate of the real state of the variable in the chamber, with an associated error term representing the error related to the estimation of the state.
Understanding the evolution of the state x_{i,t} of the variable is the question of interest: we would like to understand the factors affecting the change of state during the aging of the emissions.
Therefore, we estimated the state x_{i,t} from the measurements y_{i,t} for each variable.
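As an illustration of this additive measurement model, the following sketch (with made-up values, not the study's data or smoother) simulates a noisy measurement series and estimates the state with a simple centered moving average, one of many possible smoothers:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical true state x_t: a smooth decay, mimicking a loss-corrected concentration.
t = np.linspace(0.0, 10.0, 201)
x = np.exp(-0.3 * t)

# Measurement model: y_t = x_t + eps_t, with eps_t independent in time.
y = x + rng.normal(0.0, 0.05, size=t.size)

def moving_average(series, window):
    """One simple state estimate: a centered moving average."""
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="same")

x_hat = moving_average(y, window=9)

# Compare errors away from the edges (mode="same" pads with zeros there).
inner = slice(4, -4)
mse_raw = np.mean((y[inner] - x[inner]) ** 2)
mse_smooth = np.mean((x_hat[inner] - x[inner]) ** 2)
```

Away from the window edges, the smoothed estimate tracks the true state considerably more closely than the raw measurements.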

S2 Methods for simulation studies
Two simulated data sets (see Table 1 in the main text) were formed to study how the model would perform in a situation where the correct evolution and structure are known. Both data sets were formed using the R package deSolve (Soetaert et al., 2010). The data sets describe the evolution of an Ordinary Differential Equation (ODE) system whose length is 100. The data sets differ in how the differential equations of the variables are linked to each other. In the smaller data set, the differential equations follow the Laws of Mass Action, as applied in the R package episode (Mikkelsen, 2017; Seinfeld and Pandis, 2016; see Table S1 for the coefficients used in the system). In the larger data set, the equations were formed independently for each variable, using a pre-defined causal structure for the variables (see Table S2 for the coefficients used in the system). We refer to these as simulated data sets throughout the text.
Several questions of interest existed related to the properties of the input data set and the data pretreatment (Table S3). Firstly, we were interested in studying how the precision of the measurements made by analytical instrumentation is related to the model fit. This was assessed using different proportions of random noise mimicking the measurement error and different numbers of measurements made in time. To mimic the measurement error, normally distributed random noise was added to the variables. The standard deviation of the added random noise was proportional to the standard deviation of the simulated variables due to the evolution process. This proportion is called the fraction of added uncertainty. The number of measurements during the same evolution length (100) was varied between 26 and 401.
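This setup can be sketched as follows. The sketch is an illustrative Python analogue (the study used the R packages deSolve and episode); the two-reaction chain A -> B -> C and its rate coefficients are made up for illustration, not those of Table S1:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy mass-action chain A -> B -> C (illustrative coefficients only).
k1, k2 = 0.08, 0.04

def rhs(y):
    a, b, c = y
    return np.array([-k1 * a, k1 * a - k2 * b, k2 * b])

# Evolution length 100, sampled at 101 equally spaced time points,
# integrated with a fixed-step RK4 scheme standing in for an ODE solver.
n_out, substeps = 101, 10
h = 100.0 / ((n_out - 1) * substeps)
y = np.array([1.0, 0.0, 0.0])
true_states = np.empty((3, n_out))
true_states[:, 0] = y
for i in range(1, n_out):
    for _ in range(substeps):
        k_a = rhs(y)
        k_b = rhs(y + 0.5 * h * k_a)
        k_c = rhs(y + 0.5 * h * k_b)
        k_d = rhs(y + h * k_c)
        y = y + h * (k_a + 2 * k_b + 2 * k_c + k_d) / 6.0
    true_states[:, i] = y

def add_uncertainty(states, fraction, rng):
    """Add normal noise whose sd is `fraction` of each variable's sd over
    the evolution -- the 'fraction of added uncertainty'."""
    sds = states.std(axis=1, keepdims=True)
    return states + rng.normal(size=states.shape) * fraction * sds

measured = add_uncertainty(true_states, 0.5, rng)
```

Because the noise is scaled by each variable's own standard deviation, a given fraction of added uncertainty degrades all time series equally regardless of their absolute scales.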
Secondly, we were interested in whether the methods we applied to increase the quality of the data also increase the quality of the fit, i.e., the accuracy of the fit and of the obtained causal structure.
Does filtering or smoothing of the time series improve the fit and the accuracy of prediction, and is there an optimal time resolution to which the data should be averaged?
Thirdly, the amount of necessary prior information was of interest. We studied how important the prior information given to the causal discovery algorithm is for the modeled structure. Does the addition of prior information improve the accuracy of the modeled structure, and how much prior information is necessary to obtain a reasonably good fit for the model?
The question about the necessity of prior information is also related to the dependence between the model fit and the correctness of the structure. Intuitively, one might think that the correct structure would produce the best fit for the evolution. As many of the variables are highly dependent, it is probable that we will fail to obtain the exactly correct structure between the variables. In addition to the differences between the obtained and the real structure, we are interested in the predictive value of the obtained model compared to the simulated evolution. If the dependent variables which the model uses to explain the evolution are correlated with the real causes in the data set, the model might still be able to predict the evolution of the emission in the chamber. Normally distributed random noise was added to the simulated evolution to represent the possible measurement uncertainty.
The purpose was to see whether the measurement uncertainty reduces both the structural accuracy and the fit accuracy.
Filtering and smoothing (Figure S1): filtering and smoothing were applied to the original variables of the simulated data set. Would filtering or smoothing improve the fit and the structure of the model? Is it reasonable to filter or smooth the data set before fitting a model?
Prior information (Tables S6 and S7): prior information about the edges possible in the model was used, including both correct and incorrect prior information. The fraction of all correct prior information that was supplied was used as the measure of information. Does the addition of prior information help the model to reach the correct structure and a good fit?
Accuracy tests for causal discovery algorithms have been performed earlier (Scheines and Ramsey, 2016; Singh et al., 2017). However, those tests depend on the data set used. In our case, the variables that explain the evolution of a given variable do not originate directly from the discovery algorithm. Our situation therefore differs from the earlier tests, as the variables from the algorithm are used to form possible interaction variables, not directly to explain the evolution.

We measured the performance of the model on the simulated data sets in two ways. The first is the accuracy of the model fit to the simulated data set: how well the model can capture the simulated evolution and how well it can predict the simulated evolution beyond the fitted data set.
The second can be called structural accuracy: how well the model recovers the underlying causal structure of the variables.
For measuring the accuracy of the model fit, the model was fitted to the measurements and the obtained evolution was then compared to the true evolution, not including the error added to the simulations, using the Root Mean Squared Error (RMSE) for each time series. To weight each time series equally when calculating the RMSE, each time series was scaled by dividing it by its standard deviation beforehand. In the following, we refer to this scaled version simply as RMSE.
In addition to the accuracy of the model fit, we also evaluated the predictive accuracy of the model.
We used the coefficients obtained from the model to predict further time steps of the evolution of the system, compared the prediction to the same time steps of the real system, and evaluated the accuracy of the prediction using the RMSE. The prediction length was 30% of the length of the simulated data set used to fit the model.
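The scaled RMSE and the 30% prediction horizon can be sketched as follows (illustrative Python with made-up series, not the simulated data sets or the actual model output):

```python
import numpy as np

def scaled_rmse(estimate, truth):
    """RMSE after dividing each time series (a row) by its standard
    deviation, so that every series is weighted equally."""
    sds = truth.std(axis=1, keepdims=True)
    return float(np.sqrt(np.mean(((estimate - truth) / sds) ** 2)))

rng = np.random.default_rng(0)

# Two toy series on very different scales; the scaling makes their
# contributions to the RMSE comparable.
truth = np.vstack([np.linspace(1.0, 0.0, 130), np.linspace(0.0, 500.0, 130)])

# Fit on the first 100 points; predict a horizon of 30% of the fitted length.
n_fit = 100
n_pred = int(round(0.3 * n_fit))
fit_part = truth[:, :n_fit]
pred_part = truth[:, n_fit:n_fit + n_pred]

# A stand-in "model output" with residuals of 0.1 standard deviations.
model_out = fit_part + 0.1 * fit_part.std(axis=1, keepdims=True) * rng.normal(size=fit_part.shape)
fit_error = scaled_rmse(model_out, fit_part)
```

Without the per-series scaling, the large-scale series would dominate the error measure entirely.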
For measuring the structural accuracy, we used adjacency precision (AP), adjacency recall (AR) (Scheines and Ramsey, 2016), and the F-score (Singh et al., 2017). AP was defined as the fraction of correct edges among all edges proposed by the model. AR was defined as the fraction of all correct edges that appear in the model. The F-score was defined based on AP and AR as F = 2 · AP · AR / (AP + AR). In addition to the F-score, we wanted to study whether the incorrect predictors of the variables were close to the correct causes, and whether the model could find a good replacement for each correct predictor that was not chosen for the modeled structure. Correlation was used here as the measure of closeness. For each correct predictor, we calculated the correlation between it and each predictor in the model (for the same modeled difference Δ). The maximum of these correlations was taken as the value for that predictor; thus, if the correct predictor was in the model, its correlation was 1. In the results section, we report the mean value of these correlations over all predictors.
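A minimal sketch of these structural accuracy measures (illustrative Python; the edge sets and variable names are made up for the example):

```python
import numpy as np

def structural_accuracy(model_edges, true_edges):
    """Adjacency precision (AP), adjacency recall (AR) and the F-score
    F = 2*AP*AR/(AP+AR) for sets of directed edges (cause, effect)."""
    correct = model_edges & true_edges
    ap = len(correct) / len(model_edges) if model_edges else 0.0
    ar = len(correct) / len(true_edges) if true_edges else 0.0
    f = 2 * ap * ar / (ap + ar) if ap + ar > 0 else 0.0
    return ap, ar, f

def mean_max_correlation(data, model_predictors, correct_predictors):
    """For each correct predictor, the maximum |correlation| with any
    predictor kept in the model (1 if the correct one itself was kept);
    returns the mean over the correct predictors."""
    best = [max(abs(np.corrcoef(data[c], data[m])[0, 1]) for m in model_predictors)
            for c in correct_predictors]
    return float(np.mean(best))

# Hypothetical structures: the model misses B -> C and adds B -> D.
true_edges = {("A", "C"), ("B", "C"), ("C", "D")}
model_edges = {("A", "C"), ("B", "D"), ("C", "D")}
ap, ar, f = structural_accuracy(model_edges, true_edges)
```

In this example two of the three proposed edges are correct and two of the three true edges are recovered, so AP = AR = F = 2/3.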

S3 Results of simulation studies
As expected, the model fit to the data was better when the error assigned to it was lower. The effect of the measurement error is reported for the simulated data sets (Tables S4 and S5). The effect of the uncertainty was small for the F-score and for the correlation between the correct edges and the edges in the model. This was expected, as the error makes the data set noisier. Furthermore, as the simulations also studied the effect of the time resolution of the measurements, in both data sets the optimal number of measurements during one experiment was related to the amount of error applied (Tables S4 and S5). For low error, a higher time resolution led to better results, as the real change between subsequent time points dominated the observed change and many observations of the evolution process were available. In the case of high error, fewer observations of the phenomenon were preferred, as the signal-to-noise ratio was low and increasing the number of observations would make it even lower. However, the dependence between the sample size and the fraction of error was weak, which is the opposite of what was expected and could be due to a small sample size.
We found that in some cases there were instabilities in the computations. These can be seen as large RMSE values in Tables S4 and S5. They result from some variable obtaining large values in the differential equation system, making the system unstable. We have not studied the exact reason for the instability. It could be due to inaccurate estimation of the structure, i.e., some incorrect edge in the model producing a dependence with a large coefficient whose differential equation makes the system unstable. Another option is that the coefficient estimate itself is overestimated, e.g., due to a low number of points or inaccuracies in the measured differences that are modeled.

Filtering or smoothing of the time series representing the measurements reduces the error, especially when the measurement uncertainty is high (Figure S1). These methods also improved the accuracy of the model prediction. To understand the importance of applying a smoothing or filtering method to a real data set, we need to know whether the error in the real data set is large enough that filtering or smoothing is reasonable. We take the difference between the original and the filtered data set as an estimate of the error in the time series. The standard deviation of this error was then compared to the standard deviation of the filtered data set to estimate the fraction of the error of the whole standard deviation of the time series.
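This error-fraction estimate can be sketched as follows (illustrative Python; the sine signal, the noise level, and the moving-average filter are assumptions for the example, not the filters used in the study):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical measured series: a smooth signal plus measurement noise.
t = np.linspace(0.0, 1.0, 301)
true = np.sin(2 * np.pi * t)
measured = true + rng.normal(0.0, 0.15, t.size)

# A simple moving-average smoother stands in for the filtering step.
kernel = np.ones(15) / 15
filtered = np.convolve(measured, kernel, mode="same")

# Error estimate: difference between the measured and the filtered series.
error = measured - filtered
fraction_of_error = error.std() / filtered.std()
```

With these assumed noise levels the fraction comes out around 20%, comparable to the 10-20% reported below for the AMS, PTR-ToF-MS, and coarse SMPS variables.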

Figure S1 Effect of filtering and smoothing on the goodness-of-fit parameters. The method 'Original' means that the time series is used without filtering or smoothing. In automatic and manual filtering, the filter has been applied by selecting the length of the filtering window by cross-validation (automatic) or by manual selection. The window length in smoothing was chosen by cross-validation. The variable 'Fraction of added uncertainty' is the standard deviation of the random noise added to the simulation data (the standard deviation of each time series is adjusted to 1); see S2 for more information. Number of edges is the number of edges in the model; RMSE is the Root Mean Squared Error; the F-score is a measure of structural accuracy based on the numbers of correct and incorrect edges, see S2 for more information. Mean correlation is the mean correlation between the proposed and the correct predictor variables.
There are some additional points that lie outside the figure and are due to instabilities in the fit. For RMSE, there is one larger value for smoothing, two larger values for manual filtering, and two for the original time series, all with a fraction of added uncertainty of 1. For the predicted RMSE, there is one larger value for the smoothed series (uncertainty 0.1), three for the manually filtered series (uncertainties 0.3, 0.3, and 1), and one for the automatically filtered series (uncertainty 0.5).

In the Aerosol Mass Spectrometer (AMS) and Proton-Transfer-Reaction Time-of-Flight Mass Spectrometer (PTR-ToF-MS) data sets, and in the coarse size bin (>300 nm) of the Scanning Mobility Particle Sizer (SMPS) measurements, the fraction of the standard deviation of the error was around 10-20% of the whole standard deviation. The variables measured by gas analyzers contained the least error, approximately 1-5% of the standard deviation of the filtered time series. However, because the fraction of error of some variables is high, applying the filtering methods to the real data set was reasonable.
The reasons why applying the filtering methods to the data set does not improve the fit until the error is relatively high could be assessed further, but this is beyond the scope of this study. In addition to the fraction of error, the sample size is probably related to the necessity of the filtering method. These simulations for the larger data set were performed using a sample size of 100, which is the same sample size as in the shortest experiment.
Fourthly, prior information was closely related especially to the correctness of the obtained dependence structure between the variables in the data set. In some situations, a large amount of prior information from previous studies may be available, which can help to construct the structure based on prior knowledge. In that case the correctness of the structure is high, and the causal inference based on the structure and the observed data set has higher quality.

For the wood combustion emission experiments, we do not have much prior knowledge about the phenomenon. Even though there has been, and currently is, a vast amount of research on the oxidative aging of combustion emissions, the number of possible reactions occurring during the experiments is very high and the reaction coefficients are therefore hard to estimate. Thus, constructing the complete causal structure between the variables might be out of reach based on the low amount of prior information.
One question of interest is how the incorrect edges in the structure are related to the missing or present correct edges. Based on the nature of the causal discovery algorithm, one might expect that the algorithm cannot separate causal from non-causal associations. Therefore, the real cause might be replaced in the model by an indicative one. This decreases the accuracy of the structure but does not necessarily affect the fit or the prediction accuracy of the model, especially if the dependence between the real and the indicative cause is strong. Decreased accuracy in the structure may, however, produce incorrect interpretations of the causal effects.
We found only a small difference in both the goodness-of-fit and the structural accuracy parameters when the numbers of correct and incorrect edges in the prior information were varied in the smaller, artificial data set. A larger amount of correct prior information led to a slightly lower RMSE for the model and the prediction (Table S6). The addition of correct prior information led to a lower RMSE and a higher F-score, whereas it had little effect on the prediction. Incorrect prior information did not affect the prediction accuracy, nor the F-score, as would have been expected. The F-score was even higher when more incorrect information was available (Table S7). This indicates that the prior information is not optimally used in the current version of the model and should be handled better in the future.

Table S6 Effect of correct prior information on the goodness-of-fit parameters of the model in the smaller simulated dataset. The fractions of correct and incorrect information refer to X -> Y dependencies given as prior information to the causal discovery algorithm. The variable 'Fraction of added uncertainty' is the standard deviation of the random noise added to the simulation data (the standard deviation of each time series is adjusted to 1). Number of edges is the number of edges in the model; RMSE is the Root Mean Squared Error; the F-score is a measure of structural accuracy based on the numbers of correct and incorrect edges, see S2 for more information. Mean correlation is the mean correlation between the proposed and the correct predictor variables. For each row, the number of replications is 10, the number of observations is 101, and the fraction of added uncertainty is 0.5.

Table S7 Effect of incorrect prior information on the goodness-of-fit parameters of the model in the smaller simulated dataset. The fractions of correct and incorrect information refer to X -> Y dependencies given as prior information to the causal discovery algorithm.
The variable 'Fraction of added uncertainty' is the standard deviation of the random noise added to the simulation data (the standard deviation of each time series is adjusted to 1). Number of edges is the number of edges in the model; RMSE is the Root Mean Squared Error; the F-score is a measure of structural accuracy based on the numbers of correct and incorrect edges, see S2 for more information. Mean correlation is the mean correlation between the proposed and the correct predictor variables. For each row, the number of replications is 10, the number of observations is 101, and the fraction of added uncertainty is 0.5.