Evaluating plant residue decline data with KinGUII and TREC: results from case studies involving also non-SFO kinetic models

Foliage residue decline data are used to refine the risk assessment for herbivorous birds and mammals foraging in fields treated with plant protection products. For evaluation, current EFSA guidance has a clear focus on single first-order (SFO) kinetic models. However, other kinetic models are well established in other areas of environmental risk evaluations (e.g., soil residue assessment), and easy-to-use calculation tools have now become available. Using case studies with 6 fungicides, we show how such evaluations can be conducted with two of these tools (KinGUII and TREC) that have been developed by Bayer. SFO kinetics provided the best fits for only 13 of 36 residue decline studies conducted in a standardized design under field conditions. Biphasic models (double first order in parallel, hockey stick) were often superior and sometimes more conservative for risk assessment. The additional effort is manageable when using software such as KinGUII and TREC, and appears justified by the more reliable outcome of the evaluations. Further research would be useful to better assess the extent to which non-SFO models better fit foliage residue decline, but our study suggests that it may be a significant proportion. Therefore, we encourage the use of biphasic models in the regulatory risk assessment for herbivorous birds and mammals, in the ongoing revision of the European Food Safety Authority (EFSA) guidance document from 2009.


Background
The regulatory risk assessment for birds and wild mammals which may be exposed in fields treated with plant protection products in the European Union (EU) is conducted according to guidance from the European Food Safety Authority EFSA [1]. A key element of this guidance is the calculation of residues that animals would ingest with their food when they forage on plant foliage, seeds or invertebrates from treated fields. Typically, the exposure assessment for herbivorous birds and mammals drives the overall risk characterization for terrestrial vertebrates because of the high residues for foliage (due to a high surface to volume ratio) and high food intake rates (due to the low usable energy content of leaves), compared to seeds or invertebrates.
For spray applications, these residues are estimated with a standard equation that employs the application rate (in kg active substance per hectare), the so-called RUD (residue per unit dose, specific for different food items), and two factors related to the time course of the residue concentrations: the multiple-application factor MAF (to account for residue accumulation) and the time-weighted average factor fTWA (to account for residue dissipation over a certain time window, usually 21 days).
In the initial ("Tier 1") risk assessment, both MAF and 21d-fTWA are calculated with a default dissipation half-life (DT50) of 10 days. This "generic" DT50 of 10 days on foliage is appropriately conservative for a Tier 1 assessment since most real DT50 values are clearly lower [2,3]. These real DT50 values are usually generated in field residue studies, conducted either as part of the standard residue trials for MRL (maximum residue limit) setting or in specific residue decline trials conducted to inform a refined bird and mammal risk assessment.
Not much guidance is given in EFSA [1] on how to conduct and evaluate these residue decline trials for ecotoxicological risk assessment purposes, other than the equations in its Appendix H for calculation of MAF and fTWA with SFO kinetics. A more recent EFSA publication on discussions from the EFSA expert meetings (EFSA [4]) provides much more guidance on how to report and evaluate field residue trials for ecotoxicological purposes. However, there is still little information on the kinetic models to use for the evaluation of the DT50, other than the reference to SFO and the remark that some experts also considered the option to calculate a surrogate DT50 as DT90/3.32, with the DT90 determined with the first-order multi-compartment (FOMC) model.
In other areas of the environmental risk assessment for plant protection products, more elaborate guidance and techniques for kinetic evaluation of residue decline data have been generated over the last years, one of the most prominent being FOCUS [5]. In this guidance four models are proposed, one being the well-known SFO and three other models, named below, to describe non-SFO behaviour such as bi-phasic decline curves. In this guidance, the SFO model is preferred mainly for two reasons. First, it is parsimonious and has only two parameters while the other models have three or four. Second, there is an operational reason: almost all regulatory exposure models used for soil or water exposure assessment (e.g. FOCUS-PEARL) which employ these kinetic parameters can handle only SFO kinetics. For this reason, rules were developed to derive SFO DT50 surrogates from bi-phasic kinetic parameters, such as the one mentioned above for FOMC kinetics. Criteria for the selection of a specific model were (i) visually acceptable fit, (ii) statistical measures for goodness of fit, and (iii) sufficiently small parameter uncertainty. The latter criterion in particular was introduced to safeguard a certain robustness of predictions beyond the experimental time period.
Parts of the FOCUS [5] guidance have been adopted by EFSA [4], amongst others the preference for SFO. However, the main reason for FOCUS [5] to prefer SFO, i.e. that the relevant fate models can only deal with SFO, does not apply to residue decline in plant foliage or the calculation of MAF and fTWA values. More complex kinetic models than SFO have also already been applied to plant residue decline studies in the consumer risk assessment context [6,7] and in the honeybee risk assessment [8], so it is unclear why the evaluation of plant foliage residues for birds and wild mammals would only be conducted with SFO models when suitable calculation tools are available.
In our study, we, therefore, aimed to conduct case studies on evaluation of foliage residue decline data with the 4 established kinetic standard models SFO, FOMC, DFOP (double-first-order-in parallel) and HS (hockey-stick) which are used for regulatory exposure assessment in soil, groundwater or surface waters according to FOCUS [5] guidance.
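For orientation, the four model equations can be written compactly as follows. This is a minimal sketch of the standard FOCUS formulations as commonly stated; the function and parameter names (c0, k, alpha, beta, g, tb) are illustrative notation and are not taken from KinGUII or TREC.

```python
import math

def sfo(t, c0, k):
    """Single first-order: one rate constant k (1/day)."""
    return c0 * math.exp(-k * t)

def fomc(t, c0, alpha, beta):
    """First-order multi-compartment (Gustafson-Holden form)."""
    return c0 / (t / beta + 1.0) ** alpha

def dfop(t, c0, k1, k2, g):
    """Double first-order in parallel: fraction g declines with k1, the rest with k2."""
    return c0 * (g * math.exp(-k1 * t) + (1.0 - g) * math.exp(-k2 * t))

def hs(t, c0, k1, k2, tb):
    """Hockey-stick: rate k1 up to the breakpoint tb, rate k2 afterwards."""
    if t <= tb:
        return c0 * math.exp(-k1 * t)
    return c0 * math.exp(-k1 * tb) * math.exp(-k2 * (t - tb))

# Illustrative check: with the Tier 1 default DT50 of 10 days,
# half of the initial residue remains after 10 days under SFO.
k_default = math.log(2) / 10.0
half = sfo(10.0, 100.0, k_default)
```

SFO has two parameters (c0, k), FOMC three and DFOP and HS four each, which is the parsimony argument referred to above. DFOP and HS collapse to SFO when k1 equals k2.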
Even if the EFSA documents [1,4] do not provide any explanation for the reluctance to use non-SFO models also for plant residue decline data in bird and mammal risk assessment, it appears likely that the lack of appropriate calculation tools is part of the answer. Time-weighted average (TWA) calculations in the area of birds and mammals are typically needed for multiple applications with a moving time window, and this is not trivial for non-SFO kinetics. Until now, there do not seem to be appropriate tools published and accepted for regulatory uses in the EU.
Bayer has developed such tools and made them publicly available: KinGUII and TREC. KinGUII is an R-based tool to analyze residue decline data for complete metabolic pathways with 4 kinetic models, and generates the optimum fit parameters (kinetic parameters) for the decline data at hand. It is a follow-up version of KinGUI, which was based on the mathematical software MATLAB. KinGUII has been evaluated and recommended in systematic reviews of available calculation tools [9,10]. KinGUII as well as its predecessor version have been frequently used for kinetic evaluations submitted for pesticide authorisation, e.g. according to Regulation 1107/2009 in Europe.
TREC is a new Microsoft Excel®-based tool that applies these kinetic fit parameters to any new exposure scenario including multiple applications, and generates MAF and TWA residues [11].
Here, we use KinGUII to analyse residue decline data measured within 10-14 days after a single application with SFO, DFOP, FOMC and HS kinetic models, and then we feed TREC with the kinetic parameters from KinGUII to predict the residue time course and TWA concentrations for a multi-application scenario (2 repeated applications with a 10-day interval).
We demonstrate the use of KinGUII and TREC with the data from residue decline trials conducted with 6 fungicidal active substances. There is no specific reason for selecting these compounds other than that the available data set is particularly suitable for this exercise. Fungicides are often used in multiple applications with short intervals so that a moving time window calculator is needed to identify the worst-case 21-d window.
The data set of 36 trials used for this exercise was conducted with a standardized design following OECD TG 509 [12], where a single spray application was made under field conditions on young cereals (wheat or barley) in spring at BBCH stages [13] close to BBCH 30, a surrogate for grassy ground vegetation that may be eaten by herbivorous birds and mammals, with 7 foliage samplings following the application.
In our evaluations, we employed all 4 kinetic FOCUS models in KinGUII (SFO, DFOP, FOMC and HS) to each residue decline data set, and then fed the KinGUII output into TREC to calculate the residue time course with each of the models for an application pattern of 2 applications with a 10d interval. With that approach, we could directly compare the fit quality of each trial under standardized conditions, and also determine the predicted residue concentrations from the case study application pattern.
We aimed to answer the following questions in our evaluations:
1. How suitable is the standard design in the trials employed in our evaluations? Do we get at least one model with an acceptable fit?
2. Which kinetic model gives the best fit to each data set? Is it justified to limit the accepted kinetic models to SFO? How often is a non-SFO fit better than SFO? Is it conservative to focus on SFO?
3. How conservative is the option to calculate a surrogate SFO-DT50 as FOMC-DT90 divided by 3.32 when compared to the best-fit model predictions?
4. Is it necessary to have 7 samplings per trial, or would a reduced sampling design with e.g. 3 well-spaced sampling times also generate acceptable predictions for the residues in our case study application scenario?
With these evaluations, we hope to inform the discussions and development of the revised EFSA GD for birds and mammals, particularly with regard to embracing non-SFO models for bird and wild mammal risk assessment, and also present 2 candidate tools (KinGUII and TREC) that would be suited for these calculations.

Data
Residue decline data were obtained from field studies conducted in the EU. In each of the studies, a single spray application was made with a fungicidal product on young cereal plants.
In each trial, 7 samplings were normally taken on days 0, 1, 2, 3, 5, 7 and 10 after application. However, 2 trials are included which were sampled at days 0, 1, 3, 5, 7, 10 and 14, and some trials included samplings on day 4 or 6 instead of day 5 or similar. Nevertheless, no trials are included which lack samplings on day 0, 3 and 10, because day 0 is necessary for a proper decline curve, and days 3 and 10 are necessary for a specific side-investigation where we looked at the impact on TWA calculations of having trials with only 3 but suitably spaced sampling time points (here: days 0, 3 and 10).
Many of these trials were conducted with mixed formulations, but specific care was taken that each trial is included only once in the database (i.e., trials that were conducted with a mixture of 2 compounds a and b were either used in the data set for compound a or compound b, but never twice). Otherwise, the assignment of the trials was arranged in such a way that the data set for each compound comprised data from both European residue zones (North and South), where possible.
All trials were conducted according to regulatory standard guidelines (OECD TG 509), and under Good Laboratory Practice (GLP). Validated residue analysis methods were employed with certified analytical standards, and most of the trials were already submitted and reviewed by regulatory authorities in the EU.
Fit quality assessment: Each fit was quantitatively assessed based on the Chi²-value, a measure of the goodness of fit. The Chi²-value mainly describes the average deviation between measured and fitted values relative to the average of the measured values. According to FOCUS [5], good fits should provide Chi²-values ≤ 15% for laboratory soil residue dissipation studies, whereas under field conditions Chi² ≤ 25% may be acceptable. Additionally, the visual fit of the curve was scored (good fit = 1, acceptable fit = 2, bad fit = 3) for the residues themselves, and for the residuals as proposed by EFSA [4]. These scores aim to express as a number how well the curve visually appeared to capture the observed residue decline pattern. Essential features assessed are whether deviations of the fitted curve from the measurements are random or systematic, and how the absolute deviations vary along different parts of the decline curve.
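The Chi² error percentage described above can be sketched as follows. This follows the FOCUS definition as we understand it (the smallest relative measurement error, scaled to the mean of the observations, at which the fit passes a chi-square test at the 5% level); the function name and the example numbers are hypothetical, and KinGUII's own implementation may differ in detail.

```python
import math

# Tabulated chi-square quantiles at a significance level of 0.05
# for 1 to 6 degrees of freedom (standard statistical table values).
CHI2_TAB_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}

def chi2_error_percent(observed, fitted, n_params):
    """Minimum error level (%) at which the chi-square test is passed.

    Degrees of freedom are taken as the number of observations minus
    the number of fitted model parameters (assumption of this sketch).
    """
    n = len(observed)
    dof = n - n_params
    mean_obs = sum(observed) / n
    ssq = sum((c - o) ** 2 for o, c in zip(observed, fitted))
    return 100.0 * math.sqrt(ssq / (CHI2_TAB_05[dof] * mean_obs ** 2))

# Hypothetical example: three observations, a two-parameter model.
err = chi2_error_percent([10.0, 10.0, 10.0], [11.0, 10.0, 9.0], n_params=2)
```

A fit that reproduces every observation exactly returns 0%, and the value grows in proportion to the size of the deviations, which is why it can be read as an average relative deviation.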
Parameter uncertainty was not considered as a criterion because this factor becomes more relevant when the model predictions are made beyond the conditions of the experiment from which the parameter was derived. This was not the case here because typically most of the residues had dissipated within the experimental period of 10 days. For the simulations in our case studies (2 applications with 10 day interval and a moving time window of 21 days) the determination of a TWA does not involve much extrapolation because the window usually ends on day 11 after the second application.
These 3 fit quality descriptors were combined in one single value per trial and kinetic model which we call "fit quality" (FQ), calculated as the product of Chi² × visual fit score (fit) × residual fit score (res). For instance, a fit with a Chi²-value of 7.3%, a good visual fit (fit score = 1) and acceptable distribution of the residuals (res score = 2) gets an FQ value of 7.3 × 1 × 2 = 14.6. The kinetic model that provides the lowest FQ value for a given trial is called the best-fit model for this trial.
The combination of the visual fit score (fit) × residual fit score (res) puts more weight on the visual fit assessment than on the statistical goodness of fit (Chi²), because the visual assessment is the most important measure to distinguish between SFO and non-SFO kinetics. Assessing the visual fit based on both the normal plots and the residual plots makes the decision on the score more robust, because each of the views reveals different aspects of fit quality. For instance, the overall fit, and its shape (for comparing the different kinetic types), is best assessed in the normal plot. The number of consecutive points on the same side of the zero line and their distance from it, and thus the pattern and extent of deviations, are seen at first glance in the residual plot. The use of 2 visual scores (fit and res) often accelerates and facilitates the visual assessment of a fit, and generates the intended predominance in a systematic manner. Chi² then mainly serves to select among fits of equal visual quality rating.
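The FQ calculation and the selection of the best-fit model can be sketched as below. The function names and the FQ values for models other than the worked example (7.3 × 1 × 2) are hypothetical and serve only as an illustration.

```python
def fit_quality(chi2_percent, fit_score, res_score):
    """FQ = Chi² (%) x visual fit score x residual score; lower is better."""
    for score in (fit_score, res_score):
        if score not in (1, 2, 3):
            raise ValueError("visual scores must be 1 (good), 2 (acceptable) or 3 (bad)")
    return chi2_percent * fit_score * res_score

def best_fit_model(fq_by_model):
    """Return the model name with the lowest FQ.

    Ties are not resolved here; this sketch simply returns the first
    minimum it encounters.
    """
    return min(fq_by_model, key=fq_by_model.get)

# Worked example from the text: Chi² = 7.3 %, fit score 1, res score 2.
fq_example = fit_quality(7.3, 1, 2)

# Hypothetical scores for one trial, for illustration only.
fq_trial = {
    "SFO": fit_quality(12.0, 3, 3),
    "FOMC": fit_quality(9.0, 2, 2),
    "DFOP": fit_quality(7.3, 1, 2),
    "HS": fit_quality(8.0, 2, 1),
}
winner = best_fit_model(fq_trial)
```

Because both visual scores enter as multipliers, a visually bad fit (3 × 3) is penalised ninefold relative to a visually good fit with the same Chi².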
Additionally, KinGUII was also employed with a reduced data set, consisting only of 3 measurements (days 0, 3 and 10) for each trial, termed SFO3. These reduced data sets were only evaluated with SFO kinetics, as the degrees of freedom with only 3 data points are not sufficient for non-SFO models. This reduced data was used to assess the impact of a low-quality data set on the residue predictions in the risk assessment compared to the best-fit model for the full data set. This reduced data set provided the SFO3-DT50.
Finally, a surrogate SFO-DT50 was generated by dividing the FOMC-DT90 by 3.32. Again, the residue prediction with this surrogate SFO-DT50 was compared against the best fit model, to assess the level of conservativeness associated with such procedure. This approach is called FOMC90 in this article.
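The divisor 3.32 is the ratio DT90/DT50 that holds for SFO kinetics, ln(10)/ln(2). The sketch below shows the surrogate calculation for a FOMC fit; the DTx expression is the standard FOMC formula, while the function names and the parameter values (alpha = 1, beta = 2) are hypothetical.

```python
import math

SURROGATE_DIVISOR = 3.32  # rounded ln(10)/ln(2), the SFO ratio of DT90 to DT50

def fomc_dtx(x_percent, alpha, beta):
    """Time until x percent of the initial residue has dissipated under FOMC."""
    return beta * ((100.0 / (100.0 - x_percent)) ** (1.0 / alpha) - 1.0)

def surrogate_sfo_dt50(alpha, beta):
    """Surrogate SFO-DT50 derived from the FOMC-DT90 (the FOMC90 approach)."""
    return fomc_dtx(90.0, alpha, beta) / SURROGATE_DIVISOR

# Hypothetical FOMC parameters, for illustration only.
alpha, beta = 1.0, 2.0
true_dt50 = fomc_dtx(50.0, alpha, beta)        # FOMC-DT50
dt90 = fomc_dtx(90.0, alpha, beta)             # FOMC-DT90
surrogate = surrogate_sfo_dt50(alpha, beta)    # DT90 / 3.32
```

In this example the surrogate DT50 is clearly longer than the FOMC-DT50 itself, which illustrates why the approach tends to be conservative for bi-phasic decline.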

Evaluations with TREC
TREC is an Excel®-based calculator that allows residue decline simulations for the 4 kinetic models in KinGUII (SFO, DFOP, FOMC and HS) for any agricultural use scenario (multiple applications with varying inter-application intervals and use rates), and provides multi-application factors (MAF) and TWA factors (fTWA) for use in bird and mammal risk assessments according to EFSA [1]. The TREC tool was presented at the SETAC conference in Helsinki 2019 [11] and is included in the supporting information files of this article.
The kinetic parameters derived by KinGUII analysis for the 36 field residue decline data sets were employed in TREC, separated per kinetic model (i.e. one file with all 36 SFO evaluations, one file with all 36 DFOP evaluations etc.). To run TREC, the same application scenario was applied to all compounds (2 applications with a 10-day interval), which is not untypical for fungicides like the 6 model compounds. The mean RUD of 54.2 for grass and cereals was selected (EFSA [1]), so that the RUD category in TREC matched the matrix from the decline trials. The selected application rate of 1 kg a.s./ha was, however, arbitrary and chosen for the sake of simplicity, because for our case study calculations it only mattered that the same scenario is simulated for all compounds. As we limit our evaluation of the TREC results to MAF and fTWA, the settings of application rate and RUD are irrelevant, since they only influence the absolute residue concentrations and not the dissipation; application rate and RUD are only needed for TREC to run properly.
From the TREC output, the MAF and the 21-d moving time window TWA factor (21d fTWA) were extracted and multiplied to compute the trial- and kinetic-specific 21-d residues (21-d RES), which were the key metric for the following comparisons.
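To illustrate what a moving time window calculation involves for this scenario, the following is a minimal numerical sketch. It is not the TREC algorithm: it assumes that each application declines independently along the same single-application curve (simple superposition), uses a fixed time grid, and all function names are hypothetical. The decline function can be any of the four kinetic models normalised to an initial value of 1.

```python
import math

STEPS_PER_DAY = 100  # time grid resolution (assumption of this sketch)

def residue_course(decline, app_days, t_end):
    """Total residue over time, in multiples of one application's initial residue."""
    n = int(t_end * STEPS_PER_DAY) + 1
    times = [i / STEPS_PER_DAY for i in range(n)]
    conc = [sum(decline(t - a) for a in app_days if t >= a) for t in times]
    return times, conc

def max_moving_twa(conc, window_days):
    """Highest time-weighted average over any window of the given length."""
    dt = 1.0 / STEPS_PER_DAY
    cum = [0.0]  # cumulative trapezoidal integral of the residue curve
    for i in range(1, len(conc)):
        cum.append(cum[-1] + 0.5 * (conc[i - 1] + conc[i]) * dt)
    w = int(window_days * STEPS_PER_DAY)
    best, best_start = 0.0, 0.0
    for i in range(len(conc) - w):
        twa = (cum[i + w] - cum[i]) / window_days
        if twa > best:
            best, best_start = twa, i * dt
    return best, best_start

# Case study pattern: 2 applications, 10-day interval, 21-day window.
# SFO with the Tier 1 default DT50 of 10 days is used as the decline curve.
k = math.log(2) / 10.0
times, conc = residue_course(lambda t: math.exp(-k * t), [0, 10], 42)
twa, window_start = max_moving_twa(conc, 21)
```

For this example the worst-case window starts at the first application and therefore ends on day 11 after the second application.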
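For SFO kinetics only, MAF and fTWA have simple closed forms, so the product can be checked by hand. The sketch below uses the commonly cited SFO expressions; the function names are hypothetical, and the fTWA here is for a window starting at the last application, so the result need not coincide with a moving-window value from TREC.

```python
import math

def maf_sfo(k, n_apps, interval_days):
    """Multiple-application factor for n equally spaced applications (SFO)."""
    return (1.0 - math.exp(-n_apps * k * interval_days)) / (1.0 - math.exp(-k * interval_days))

def ftwa_sfo(k, window_days):
    """Time-weighted average factor over a window starting at the peak (SFO)."""
    return (1.0 - math.exp(-k * window_days)) / (k * window_days)

# Tier 1 default: DT50 = 10 days, i.e. k = ln(2) / 10 per day.
k_default = math.log(2) / 10.0
maf = maf_sfo(k_default, n_apps=2, interval_days=10)
ftwa = ftwa_sfo(k_default, window_days=21)
res_21d = maf * ftwa  # dimensionless 21-d RES as defined in the text
```

With the default DT50 of 10 days this gives MAF = 1.5 and a 21-d fTWA of about 0.53. Application rate and RUD do not appear in either factor, which is why their settings are irrelevant for the comparisons.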

Results
Fit quality FQ was calculated as Chi² × fit × res, i.e. the smaller the FQ, the better the fit. In 5 cases, FQ was identical in DFOP and HS models so that the total number of best FQ was 36 + 5 = 41 (in these 5 cases the model delivering the higher 21-d RES was selected as the best-fit model for later evaluations) (Table 1 and Fig. 1).
For 13 out of the 36 trials, SFO model kinetics provided the lowest FQ. Thus, non-SFO fits were better than SFO fits in the majority of cases, with HS most frequently providing the best fit. Overall, FOMC was least often the best fit model. Giving SFO a slight preference, by allowing its Chi² to be up to 2 percentage points higher than that of the best model while requiring the same scores for visual fit and residual fit as the best model, would increase the number of trials associated with the SFO model to 17 (one of these cases is trial 18-2950-02). So there seem to be a number of borderline cases, but still about half of the trials would remain non-SFO (Table 2).
SFO model kinetics provided the best FQ for 6 trials with fluopyram, 3 trials with trifloxystrobin and for only one trial with each of the other 4 compounds. Thus, the best-fit decline models for 4 out of 6 compounds were dominated by non-SFO kinetics (Table 3).
The values for the 21-d RES (calculated as MAF × 21d fTWA) are displayed in Table 4, with the best-fit model highlighted in italics. The maximum ratio between the highest and the lowest 21-d RES with the 4 models was 2.15 for a single trial (17-2950-02), and the mean ratios per compound ranged from 1.02 (fluopyram) to 1.52 (spiroxamine). The extreme ratios for max/min and SFO/best fit 21-d RES are for the datasets where SFO gave a poor fit in the kinetic evaluation (visual fit and/or residual score of 3) (Table 4).
The comparison of 21-d RES prediction with surrogate DT50 (FOMC DT90/3.32) against 21-d RES with the best fit model is presented in Table 5. A significant overestimation with the FOMC90-DT50 was observed for the 5 compounds where the best fit models are not predominantly SFO (Table 5).

Fig. 1 Examples for fit quality scores (trial 16-2958-01). Left panel: SFO with score 3 for a bad fit (all points missed from day 3) and score 3 for large and systematic residuals (y-axis scales > 10% of the y-axis for the fit because of relatively large residuals, more than 3 consecutive data points on the same side of the x-axis). Right panel: HS with score 1 for a good if not excellent fit (most of the points on the curve) and score 1 for good residuals (y-axis scales < 10% of the y-axis for the fit because of relatively small residuals, not more than 2 consecutive data points on the same side of the x-axis)

We also investigated the difference when using the SFO3-DT50 (calculated with KinGUII after removal of all data points except days 0, 3 and 10) for the TREC calculations instead of the best fit model. Accepting such thin data would be of regulatory concern if it leads to a significant overestimation of the true dissipation rate, and in turn an underestimation of 21-d RES as the exposure concentrations for risk assessment. For the 6 case study compounds at hand, the mean 21-d RES estimate with the best fit model was very similar (within ± 10%) to the mean 21-d RES estimate with SFO3 for 5 compounds, and for the remaining compound (fluopyram) the mean deviation was only 16% and in the direction of an underestimation of the true dissipation (i.e., of no regulatory concern) (Table 6).

Discussion
The first question we wanted to answer with the case studies was whether the standard design employed in the 36 trials was appropriate, i.e. providing a high chance to generate the data needed for a successful kinetic evaluation. In our evaluation we have not used an exclusion criterion, but rather aimed to "punish" bad fits by the multiplication of Chi² × fit score × residuals score, which results in the fit quality parameter FQ. Since the best possible score for fit and residuals is 1, the best possible FQ-value equals the Chi²-result. Following FOCUS [5] guidance, a threshold of 15% for Chi² is used for laboratory trials, and 25% would still be good enough for field trials. Thus, FQ values of ≤ 25 would indicate an overall good fit of the model to the data. FQ values of up to 100 would result for trials with acceptable visual scores (2) for the fit and the residuals, combined with a Chi²-value of up to 25%.

Table 4 21-d RES (MAF × 21d-fTWA) for a simulation of 2 applications with a 10-day interval with TREC using the kinetic parameters from KinGUII, and ratio between max and min with the 4 models (best-fit model highlighted in italics)
Hence, a possible target for a standard design could be to achieve FQ ≤ 100. In our data set of 36 trials, all FQ values for the best fit were below 100 except for trial 18-2954-02 with FQ = 104, which is only very slightly above this target (Table 1). That means our target was achieved in 97% of the 36 evaluated trials, indicating that the sampling scheme of days 0, 1, 2, 3, 5, 7 and 10 may be a suitable balance between measurement effort and the reliability of the outcome. However, the comparison with the results from the truncated datasets (SFO3: only sampling data retained for days 0, 3 and 10) indicated that even such a limited design would often provide acceptable estimates for the residue dissipation rates (Table 6). Compared to the 21-d RES from the best fit model for each trial, the 21-d RES for the SFO3 evaluations were on average within ± 10%, except for one compound (fluopyram) where the mean difference was 16% (and that on the side of an underestimation of the dissipation rate, i.e. not more critical and thus of no regulatory concern). With only 3 data points, the uncertainty of the fit is potentially higher than with more data points. For our data set, the number of trials where 21-d RES for SFO3 is < 95% of the 21-d RES of the best fit is 12 of 36 (33%). This is comparable to the number of trials (14) where 21-d RES for SFO7 (SFO fitted to the full data set) is < 95% of the best fit. However, the variation of the mean 21-d RES for SFO3 (CoV 30.6%) is slightly higher than the variation with the best fit 21-d RES (CoV 24.6%).

Table 5 21-d RES (MAF × 21d-fTWA) for a simulation of 2 applications with a 10-day interval with TREC using the surrogate DT50 (FOMC-DT90/3.32), and comparison to best fit 21-d RES
EFSA [4] had stated that the number of samplings should never be < 4 for an acceptable residue dissipation trial. Our assessment here would indicate, however, that the error from accepting SFO evaluation of trials with only 3 samplings is often small, at least if the 3 sampling dates are suitably arranged. Taking into account that a typical foliage residue SFO-DT50 was found to be about 3 days [3], a good coverage of the dissipation can be expected with samplings on day 0 (100%), day 3 (50%) and day 10 (ca. 10%), and that is probably the reason why the 21-d RES results with SFO3 are so similar to the 21-d RES from the respective best-fit model with the complete data (all 7 samplings) (Table 6). Another reason is that 21-d RES is an integrated quantity which smooths deviations of single measuring points and is inherently more robust. Thus, regulatory acceptance of data sets with a low but well-timed number of samplings could be considered case by case, for instance when the initially measured residues have declined to about 10-20% by the last sampling and the resulting DT50 is no obvious outlier.
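The coverage argument can be checked with a few lines of arithmetic. This is a minimal sketch assuming ideal SFO decline with a DT50 of 3 days; the function name is hypothetical.

```python
def sfo_fraction_remaining(t_days, dt50_days):
    """Fraction of the initial residue left after t days under SFO."""
    return 0.5 ** (t_days / dt50_days)

# Reduced sampling design (days 0, 3 and 10) with a typical foliage DT50 of 3 days.
coverage = {day: sfo_fraction_remaining(day, 3.0) for day in (0, 3, 10)}
```

The three samplings then fall at 100%, 50% and roughly 10% of the initial residue, so they span about one order of magnitude of the decline curve.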
On the other hand, our data evaluation suggests that using a surrogate DT50 from dividing the FOMC-DT90 by 3.32 (as mentioned in EFSA [4]) is often a significant and unnecessary underestimation of the dissipation rate as determined in the respective best fit model (Table 5). This surrogate DT50 can also be calculated from the DT90 in DFOP or HS fits that are generated with KinGUII, if their fit is better than FOMC (which is often the case in our data set), and this would be appropriate where a tool like TREC is not available or in use. However, when an FOMC model has already been fitted to the data (as would be needed to determine an FOMC-DT90), it would appear simple enough and appropriate to directly use the real FOMC parameters alpha and beta in a tool like TREC instead of a surrogate SFO-DT50.
Justification to include all 4 kinetic KinGUII models in the evaluation instead of limiting the assessment to SFO can also be deduced from the comparison of the 21-d RES with SFO to the 21-d RES with the respective best fit model (Table 4): in ca. 40% of all cases (14 of 36), the 21-d RES with SFO is smaller than with the respective best-fit model (i.e., in these cases SFO underestimates the MAF and TWA). In general, the difference is not large (on average 7% over all trials), but for a specific compound the difference can be larger (e.g., about 40% for spiroxamine), particularly if the SFO fits are visually bad or borderline and thus difficult to accept. For individual trials, the difference was over 50% (16-2958-01, 17-2950-02). Especially for these trials with bad or borderline SFO fits, bi-phasic modelling is the best way to use the information from these trials. Thus, the inclusion of non-SFO kinetics allows more realistic and robust DT50 determinations, and can lead to more protective refined risk assessments.
In their literature review, Fantke and Juraske [6] concluded that residue dissipation would generally be well described by single first-order kinetics, which appears to contradict our findings. However, they could calculate non-SFO fits only for a part of the studies in their database (due to lack of detailed residue concentrations reported in many of the original publications), and apparently applied a factor of 2 as the threshold for differences between DT50s. In view of the large variability in the data set, this may be a reasonable approach, but in a regulatory context a factor of 2 may often be relevant for decision making. Furthermore, the apparent DT50 in non-SFO kinetics can only inform about the time to the dissipation of the first 50% and tells very little about the dissipation rates of the second half of the residues, so it is not surprising that they did not see the better fit of the non-SFO kinetics that we found in our case studies. However, Fantke and Juraske [6] also acknowledged that there are residue decline data that are significantly better described with kinetics other than SFO, and suggested checking different kinetic models for their fit to the data in future experimental studies. There may also be reasons for non-SFO behaviour from a theoretical point of view. We consider here the total plant residues, which are the sum of residues on the plant surface and in different plant tissues. These residues may be subjected simultaneously to different dissipation processes depending on where they reside. Residues in the plant may be redistributed, degraded, diluted by growth or recharged by uptake from the plant surface. Residues on the plant surface may dissipate by photolysis, volatilisation, wash-off or uptake into the plant. Even assuming that each single process could be described by SFO, the overlay of these processes with different individual rate constants can easily produce an overall non-SFO behaviour.
Thus, there are overall good reasons to expand the range of kinetic models beyond the standard SFO and include at least the 3 additional models used for regulatory environmental exposure assessment (DFOP, FOMC, HS) also for evaluating residue decline trials in the ecotoxicology area, at least for foliage residues as the key element of the herbivorous bird and mammal assessment, which is typically the driver behind the need for more realistic exposure assessments. Appropriate tools for this purpose are now available with KinGUII and TREC, and the outcome of our case study evaluations confirms that this is possible without undue extra effort and that it leads to more robust refinements in the risk assessment. We would, therefore, recommend considering these suggestions in the ongoing revision of the EFSA GD for birds and mammals.
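A simple way to see how SFO processes can combine into non-SFO behaviour is to sum two residue pools that each decline by SFO at a different rate. The sketch below is a deliberately simplified illustration: it assumes two independent pools (surface and tissue) with no exchange between them, and the fraction and rate constants are hypothetical. Under SFO the time to 75% dissipation is exactly twice the DT50, so a larger ratio signals bi-phasic decline.

```python
import math

def total_fraction(t, g, k_surface, k_tissue):
    """Sum of two independent SFO pools; g is the initial surface fraction."""
    return g * math.exp(-k_surface * t) + (1.0 - g) * math.exp(-k_tissue * t)

def time_to_fraction(target, g, k_surface, k_tissue, t_max=1000.0):
    """Bisection for the time at which the total residue falls to `target`."""
    lo, hi = 0.0, t_max
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if total_fraction(mid, g, k_surface, k_tissue) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical pools: 70 % on the surface (fast), 30 % in tissue (slow).
dt50 = time_to_fraction(0.50, g=0.7, k_surface=1.0, k_tissue=0.1)
dt75 = time_to_fraction(0.25, g=0.7, k_surface=1.0, k_tissue=0.1)
ratio_two_pools = dt75 / dt50

# Control: identical rate constants reduce the sum to plain SFO.
ratio_sfo = (time_to_fraction(0.25, 0.7, 0.3, 0.3)
             / time_to_fraction(0.50, 0.7, 0.3, 0.3))
```

The sum of the two pools has the same mathematical form as the DFOP model, which is one reason why DFOP is a natural candidate for total plant residues.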
Interestingly, all our 6 case study compounds are included in the residue data compilation and evaluation of Fantke et al. [15] (3.17 − 3.87) for propineb. Except for spiroxamine the modelled DT50s with their confidence ranges are similar to our results, but for spiroxamine their modelled dissipation appears much slower than measured in our trials. This may be related to the pronounced non-SFO profile of spiroxamine in our case study trials, which might be less well captured in the model fits used by [14]. Additionally, their modelled DT50 of spiroxamine is based also on trials with other plant material than the cereal foliage in our case studies, where the dissipation rate may be different. Fantke et al. [15] have also compared and discussed the range of factors behind the variability between compounds, crops and trials, and conclude that "there is more than one process contributing to overall dissipation from plants and that these processes go in a counter-direction". For that discussion, the reader is directed to their paper.
In any case, it should be noted that the purpose of our evaluations was not to establish a regulatory DT50 for the 6 compounds but to evaluate a set of trials with a standard design. That meant that we had to exclude some additional other residue decline studies which are available but in unequal numbers of trials, and partly with deviating sampling schemes. Here we decided to work with a deliberately reduced but standard data set, whilst for a DT50 proposal for regulatory risk assessments all available data per compound should be considered. Therefore, the results of our case studies are not to be used directly in regulatory assessments without taking also into account the results of the other residue decline trials which are not included here. However, we would not expect large changes in the conclusions if we had evaluated all currently available studies for our 6 case study compounds. Overall, the rapid dissipation rates which we found here in cereal foliage are in very good agreement with the findings by Ebeling and Wang [3] for a variety of leafy plant matrices.
Based on these findings, short DT50 values may be typical for foliage residues in ground vegetation of relevance for exposure of herbivorous birds and mammals on treated fields. We used an application interval of 10 days, which is in our experience quite typical for fungicides that need to be applied repeatedly to maintain efficacy (in fact, our short DT50s may explain why they need to be applied at relatively short intervals). Certainly, there are cases with more than 2 applications of fungicides, but 2 is quite typical for the compounds which we assessed, and overly frequent applications can also pose problems with resistance development. Where many applications with short intervals falling into the 21-d time window need to be assessed, the use of TREC is even more attractive because the automatic calculation saves even more work. The 21-d time window for bird and mammal risk assessment is a convention with no explicit justification. However, this duration appears to fit the duration of key reproductive phases in the toxicity studies that generate the risk assessment endpoint, like the embryonic phase for the avian test species (Bobwhite quail, Mallard duck), or the gestation and lactation phases in the rat reproduction study which is typically used for wild mammal risk assessment.
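To illustrate how repeated applications and a 21-d time window interact, the following minimal Python sketch computes the time-weighted average (TWA) residue for SFO kinetics. It is not the algorithm implemented in TREC. The function names are hypothetical, and the sketch assumes equal initial residues per application (normalised to 1) and simple additive carry-over between applications.

```python
import math

def sfo_window_integral(c0, dt50, app_day, start, end):
    """Integral of one application's SFO residue curve over [start, end] (days)."""
    k = math.log(2) / dt50
    lo = max(start, app_day)  # no residue from this application before it is made
    if end <= lo:
        return 0.0
    return c0 / k * (math.exp(-k * (lo - app_day)) - math.exp(-k * (end - app_day)))

def twa(c0, dt50, app_days, start, window=21.0):
    """TWA residue over [start, start + window], summing all applications."""
    total = sum(sfo_window_integral(c0, dt50, a, start, start + window)
                for a in app_days)
    return total / window

def max_twa(c0, dt50, app_days, window=21.0, step=0.1):
    """Largest moving-window TWA, scanning window starts from first to last application."""
    first, last = min(app_days), max(app_days)
    n = int(round((last - first) / step))
    starts = [first + i * step for i in range(n + 1)]
    return max(twa(c0, dt50, app_days, s, window) for s in starts)

# Illustrative values only (not taken from the case studies):
one_app = max_twa(1.0, 2.0, [0.0])         # single application, DT50 = 2 d
two_apps = max_twa(1.0, 2.0, [0.0, 10.0])  # two applications, 10-d interval
```

With a DT50 of 10 days and a single application, the sketch gives a 21-d TWA factor of about 0.53. With a short DT50 and a 10-day interval, the second application falls well inside the window, so the TWA increases markedly with each additional application.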
We developed the approach of calculating fit quality scores as the product of Chi² × fit score × res score specifically for this article and found that it worked well and was reasonably effective: the visual fit helps to detect a biphasic nature, and the residual plot helps to assess the amount of scatter, not only as an average number (as in the Chi² value) but also with regard to its location on the curve, the systematicity of the scatter, and the relative distance to the straight line (i.e. the "weight" of the scatter). For our exercise, we feel quite comfortable not applying strict triggers for the acceptability of a trial, but rather penalizing bad fits so that the best fit is (relatively) easy to identify. Whether that best fit is good enough in a regulatory context depends on that context, e.g. the level of conservativeness required or the overall weight of evidence under consideration.
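The scoring logic can be sketched in a few lines of Python. The numerical values and the scoring scale below are purely hypothetical and are not taken from our trials. The sketch assumes that the visual fit score and the residual score are assessor-assigned integers where 1 is best, and that a lower product indicates a better fit, in line with the Chi² error.

```python
def fit_quality_score(chi2_err, fit_score, res_score):
    """Product of Chi² error (%), visual fit score and residual plot score."""
    return chi2_err * fit_score * res_score

# Hypothetical inputs for one trial: (Chi² error %, fit score, res score)
candidates = {
    "SFO": (18.0, 3, 3),
    "DFOP": (6.0, 1, 1),
    "HS": (7.5, 1, 2),
}

scores = {model: fit_quality_score(*values) for model, values in candidates.items()}
best_model = min(scores, key=scores.get)  # lowest product = best fit (assumption)
```

Because the three terms are multiplied, a model with a poor visual fit and a systematic residual pattern is penalized strongly even if its Chi² error alone is moderate, which makes the best fit easy to single out.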
In our opinion, the visual fit scores should be more important than other criteria like parameter uncertainty, unless significant temporal extrapolation is needed (as for example in FOCUS groundwater assessment, where comparatively low residues matter if they persist over a long time).
Our focus is primarily on the use of foliar residue decline data in the risk assessment for herbivorous birds and mammals, where the time window is short and residues that have declined below 10-20% of the peak are usually not of concern. Furthermore, the vegetation on arable fields is regularly removed by harvest, mowing, plowing and other measures, so that long-term kinetics in foliage are of much lower relevance than long-term kinetics in soil. Therefore, we did not incorporate parameter uncertainty in our evaluation. Sources of prediction uncertainty, such as the representation of variable environmental conditions, would in our view be best addressed by a sufficient number of trials conducted under contrasting but relevant conditions.
The primary purpose of our paper is to explore how newly available calculation tools could be used to evaluate plant residue dissipation kinetics in a regulatory context. We found that the additional application of non-SFO kinetics certainly increases the workload for the evaluation, but not very much when using KinGUII and TREC as calculation tools. Further research would be useful to better assess the extent to which non-SFO models better fit foliage residue decline, but our limited explorations suggest that this may be the case for a significant proportion of trials. Therefore, we would like to encourage the use of non-SFO kinetic models in the regulatory risk assessment for herbivorous birds and mammals, and the provision of detailed guidance on their use in the ongoing revision of the EFSA GD (2009).
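For readers less familiar with the biphasic models, the following minimal Python sketch writes out the SFO, DFOP and HS equations in their usual FOCUS kinetics form and compares 21-d TWA residues after a single application. It is not the KinGUII or TREC implementation. The parameter values are hypothetical and chosen only to show that a biphasic curve with a fast initial phase can still yield a higher TWA than an SFO curve.

```python
import math

def sfo(t, c0, k):
    """Single first-order decline."""
    return c0 * math.exp(-k * t)

def dfop(t, c0, g, k1, k2):
    """Double first-order in parallel: fraction g declines at k1, the rest at k2."""
    return c0 * (g * math.exp(-k1 * t) + (1 - g) * math.exp(-k2 * t))

def hs(t, c0, k1, k2, tb):
    """Hockey stick: rate k1 up to the breakpoint tb, rate k2 afterwards."""
    if t <= tb:
        return c0 * math.exp(-k1 * t)
    return c0 * math.exp(-k1 * tb) * math.exp(-k2 * (t - tb))

def twa(model, window=21.0, n=2100):
    """Time-weighted average over [0, window] using the trapezoidal rule."""
    h = window / n
    total = 0.5 * (model(0.0) + model(window))
    total += sum(model(i * h) for i in range(1, n))
    return total * h / window

ln2 = math.log(2)
# Hypothetical parameters: SFO with DT50 = 2 d; DFOP with 80% declining
# at a 1-d half-life and 20% declining at a 10-d half-life.
twa_sfo = twa(lambda t: sfo(t, 1.0, ln2 / 2.0))
twa_dfop = twa(lambda t: dfop(t, 1.0, 0.8, ln2 / 1.0, ln2 / 10.0))
```

In this example the DFOP curve drops faster than the SFO curve during the first days, but its slow second phase results in a 21-d TWA of about 0.16 compared with about 0.14 for SFO. This is the kind of situation in which a biphasic fit is the more conservative choice.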

Conclusions
The standardized design in the 36 residue decline trials with the 6 fungicides allowed an acceptable kinetic fit with KinGUII for all cases. Best fits with SFO were obtained for only 13 cases; thus non-SFO kinetics clearly dominated. This non-SFO pattern was only visible because of the high initial sampling frequency in our trials. Removing all data points except for sampling dates 0, 3 and 10 allowed only an SFO-DT50 calculation, but showed that even small data sets may be informative, without major impacts on the level or the variability of time-weighted average residues calculated with TREC for the tested case studies. The biphasic models DFOP and HS most often provided the best fit in KinGUII, and sometimes also more conservative exposure predictions with TREC. Therefore, we encourage the adoption of biphasic models in the regulatory exposure and risk assessment for herbivorous birds and mammals, in the ongoing revision of the EFSA guidance document from 2009.