Comparison of software tools for kinetic evaluation of chemical degradation data

Background

For evaluating the fate of xenobiotics in the environment, a variety of degradation or environmental metabolism experiments are routinely conducted. The data generated in such experiments are evaluated by optimising the parameters of kinetic models in such a way that the model simulation fits the data. No comparison of the main software tools currently in use has been published to date. This article presents a comparison of numerical results as well as an overall, somewhat subjective comparison based on a scoring system using a set of criteria. The scoring was performed separately for two types of use. Uses of type I are routine evaluations involving standard kinetic models and up to three metabolites in a single compartment. Evaluations involving non-standard model components, more than three metabolites or more than a single compartment belong to use type II. For use type I, usability is most important, while the flexibility of the model definition is most important for use type II.

Results

Test datasets were assembled that can be used to compare the numerical results of different software tools. These datasets can also be used to ensure that no unintended or erroneous behaviour is introduced in newer versions. In the comparison of numerical results, good agreement between the parameter estimates was observed for datasets with up to three metabolites. For the now unmaintained reference software DegKin Manager/ModelMaker, and for OpenModel, which is still under development, user options were identified that should be taken care of in order to obtain results that are as reliable as possible. Based on the scoring system mentioned above, the software tools gmkin, KinGUII and CAKE received the best scores for use type I. Of the 15 software packages compared with respect to use type II, gmkin and KinGUII again ranked first and second, followed by the script-based tool mkin, which is the technical basis for gmkin, and by OpenModel.
Conclusions

Based on the evaluation using the system of criteria mentioned above and the comparison of numerical results for the suite of test datasets, the software tools gmkin, KinGUII and CAKE are recommended for use type I, and gmkin and KinGUII for use type II. For users who prefer to work with scripts instead of graphical user interfaces, mkin is recommended. For future software evaluations, it is recommended to include a measure of the total time that a typical user needs for a kinetic evaluation in the scoring scheme. It is the hope of the authors that the publication of test data, source code and overall rankings fosters the evolution of useful and reliable software in the field.

Electronic supplementary material
The online version of this article (10.1186/s12302-018-0145-1) contains supplementary material, which is available to authorized users.


Introduction
This document contains annotated results of the evaluation of test datasets assembled in the course of two projects carried out by the author for the German Environment Agency (UBA). A first version of these evaluations using different software tools was performed in 2014 in Project No 27452. An update of this comparison with the current software versions was commissioned to JR in Project No 92570. This document is published as supporting information to a manuscript about a more general software comparison (Ranke et al., 2018). Further updates and/or extensions of this document may be published elsewhere.

General remarks
The datasets were evaluated with DegKin Manager/ModelMaker, KinGUII, CAKE, OpenModel and mkin. For the integration of kinetic models with ModelMaker, Runge-Kutta integration with 200 output points was used, the integration accuracy was set to 0.001, and constant error scaling was specified. For the automatic step length calculation with OpenModel, an error factor of at least 1e-5 was specified.
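The effect of such integration settings can be illustrated with a minimal fixed-step Runge-Kutta sketch. This is not the ModelMaker implementation; the model, rate constant and step count below are illustrative, chosen so that the numerical result can be checked against the analytic SFO solution C(t) = C0 exp(-k t):

```python
import math

def rk4(f, y0, t_end, n_steps):
    """Classical fourth-order Runge-Kutta with a fixed step size."""
    h = t_end / n_steps
    t, y = 0.0, y0
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

# SFO model dC/dt = -k * C with illustrative values k = 0.1, C0 = 100
k, C0 = 0.1, 100.0
numeric = rk4(lambda t, y: -k * y, C0, t_end=100.0, n_steps=200)
analytic = C0 * math.exp(-k * 100.0)
```

With 200 steps, the classical fourth-order scheme reproduces the analytic solution to well below the 0.001 integration accuracy mentioned above.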
For the optimisation, the settings predefined in the model files supplied by DegKin Manager were generally not changed. For the termination criterion, the value for the fractional change was 0.01 in some model files and 0.001 in others. In OpenModel, the change threshold for convergence was set to 1e-5 and the maximum number of iterations was set to 200.
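The optimisation that these settings control is, in essence, an unweighted least-squares fit of the model simulation to the data. The following sketch illustrates this for the SFO model only; it is a simplified stand-in, not the optimiser of any of the tools compared, and exploits the fact that for a fixed rate constant k the optimal C0 has a closed-form solution, so that a one-dimensional search over k suffices:

```python
import math

def fit_sfo(times, obs, k_lo=1e-4, k_hi=1.0, iters=100):
    """Fit C(t) = C0 * exp(-k * t) by unweighted least squares.

    For fixed k the optimal C0 follows by linear least squares; k is
    found by a ternary search on the sum of squared errors (SSE).
    """
    def sse_and_c0(k):
        e = [math.exp(-k * t) for t in times]
        c0 = sum(o * ei for o, ei in zip(obs, e)) / sum(ei * ei for ei in e)
        sse = sum((o - c0 * ei) ** 2 for o, ei in zip(obs, e))
        return sse, c0

    lo, hi = k_lo, k_hi
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if sse_and_c0(m1)[0] < sse_and_c0(m2)[0]:
            hi = m2
        else:
            lo = m1
    k = (lo + hi) / 2
    return sse_and_c0(k)[1], k

# noise-free example data generated with C0 = 100, k = 0.1
times = [0, 3, 7, 14, 28, 56]
obs = [100 * math.exp(-0.1 * t) for t in times]
c0_fit, k_fit = fit_sfo(times, obs)
dt50 = math.log(2) / k_fit  # SFO DT50 = ln(2) / k
```

For noise-free data, the input parameters are recovered; with real data, the convergence criterion and starting values mentioned above determine where such a search stops.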
No weighting methods were enabled. For parameter starting values, the values predefined by the software packages were used; where these were not available, the values given in Table S1 were used.

Results for test datasets from the FOCUS guidance

The FOCUS guidance (FOCUS, 2006, 2014) reports fits of the kinetic models to these test datasets obtained with several software packages. These results were discussed in the course of the validation of the kinfit package, which is a predecessor of the mkin package, and the median of the parameters obtained with the different packages was calculated. As no χ² error level values were reported in the FOCUS guidance, these values were calculated at the time with KinGUI version 1 for the kinfit package vignette (Ranke, 2011).
In this section, the median parameter values from the FOCUS guidance and the χ² error level values calculated with KinGUI version 1 are compared to the values obtained with DegKin Manager, KinGUII, CAKE, OpenModel and mkin.

FOCUS A
As FOCUS dataset A is well described by the two-parameter SFO model, the FOMC model with its three parameters is already overparameterised. This leads to a lack of convergence of the FOMC fit to this dataset in mkin. Also, the covariance matrix used for describing parameter uncertainty cannot be estimated by CAKE, due to the large correlation of the parameters alpha and beta in this fit (Table S5). The large relative deviations between the tools found for the alpha and beta parameters for this dataset also reflect this overparameterisation, while the resulting DT50 and DT90 values show good agreement (Table S6).
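The weak identifiability of alpha and beta can be made concrete with the FOMC disappearance-time formula from the FOCUS guidance, DT50 = beta (2^(1/alpha) − 1): as alpha and beta grow with their ratio held fixed, FOMC approaches SFO with k = alpha/beta, so very different (alpha, beta) pairs yield almost the same DT50. The parameter values below are illustrative and are not taken from the fits to dataset A:

```python
import math

def fomc_dt50(alpha, beta):
    """FOMC DT50 = beta * (2^(1/alpha) - 1), per the FOCUS guidance."""
    return beta * (2 ** (1 / alpha) - 1)

# (alpha, beta) pairs with the ratio alpha/beta held fixed at 0.1
pairs = [(10, 100), (100, 1000), (1000, 10000)]
dt50s = [fomc_dt50(a, b) for a, b in pairs]
# all values lie close to the SFO limit ln(2) / 0.1
sfo_limit = math.log(2) / 0.1
```

The three DT50 values agree to within a few percent although alpha and beta each differ by two orders of magnitude, which is the pattern seen in the deviations between the tools.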
The DFOP model (Table S7) and the HS model (Table S9) are also overparameterised for this dataset. With DegKin Manager, no results could be obtained because the fits terminated with the error message "singular curvature matrix encountered".

FOCUS C
For this dataset, the results obtained with the different tools were very similar, with the exception of the DT90 value obtained with DegKin Manager for the Hockey Stick model. Deviations between the tools were less than 1% for the FOMC model.
Where a comparison with the reference was possible, deviations between the tools were less than 1% for the DFOP model. Deviations between the tools were also less than 1% for the HS model, with the exception of the DT90 value calculated by DegKin Manager, which appears to be erroneous in this case. Calculating the DT90 from the parameters found by DegKin Manager using the formula from the FOCUS guidance yields 26.1.
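The FOCUS formula used for this check can be sketched as follows. Up to the breakpoint tb the residue declines with rate constant k1 and afterwards with k2, so DTx follows by inverting the piecewise exponential. The parameter values below are illustrative and are not the parameters reported by DegKin Manager:

```python
import math

def hs_dtx(x, k1, k2, tb):
    """DTx for the hockey-stick (HS) model, per the FOCUS guidance.

    Residue: C = C0 * exp(-k1 * t)                        for t <= tb
             C = C0 * exp(-k1 * tb) * exp(-k2 * (t - tb)) for t >  tb
    """
    target = math.log(100 / (100 - x))  # ln(2) for DT50, ln(10) for DT90
    if target / k1 <= tb:  # level already reached before the breakpoint
        return target / k1
    return tb + (target - k1 * tb) / k2

# illustrative parameters: fast initial phase, slower second phase
dt50 = hs_dtx(50, k1=0.3, k2=0.05, tb=5)
dt90 = hs_dtx(90, k1=0.3, k2=0.05, tb=5)
```

Recomputing DTx from the reported parameters in this way is a simple cross-check on the values a tool prints.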

FOCUS D
Results for dataset FOCUS D are shown for the SFO-SFO model (SFO used for parent and metabolite). Differences between the results and the median used as reference here are shown in Table S25 and are smaller than 1%, with the exception of the χ² error level for metabolite m1, where DegKin Manager takes the residual at t = 0 into account in the calculation, which is against the FOCUS recommendation. KinGUII takes the sampling at time 0 into account for the degrees of freedom, because it has a residue greater than zero, which is also not in accordance with the FOCUS recommendation. Current versions of CAKE and mkin handle this case according to the FOCUS guidance. For KinGUII, this can be seen in the source code of the underlying KineticEval package current at the time of this writing (link to source code at github). Here, only values at time zero that are zero are filtered out, as in mkin versions before version 0.9-33, which introduced the code currently used in mkin for this purpose (link to source code at github). However, values different from zero occurring at time zero should also be filtered out if the respective initial value is fixed to zero (FOCUS, 2014, p. 90, 166).
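The χ² error level and the treatment of time-zero residues discussed above can be sketched as follows. The error level is the smallest relative error, in percent of the mean observation, at which the χ² test at the 5% level would just pass; per FOCUS, observed values at time zero are excluded when the corresponding initial value is fixed to zero. This is a simplified sketch with hardcoded χ² quantiles for small degrees of freedom, not the code of any of the tools compared:

```python
import math

# 0.95 quantiles of the chi-squared distribution for df = 1..8
CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488,
           5: 11.070, 6: 12.592, 7: 14.067, 8: 15.507}

def chi2_error_level(times, obs, pred, n_params, initial_fixed_to_zero=False):
    """FOCUS chi-squared error level in percent of the mean observation."""
    data = [(t, o, p) for t, o, p in zip(times, obs, pred)
            if not (initial_fixed_to_zero and t == 0)]  # drop t = 0 points
    df = len(data) - n_params
    mean_obs = sum(o for _, o, _ in data) / len(data)
    sse = sum((o - p) ** 2 for _, o, p in data)
    return 100 / mean_obs * math.sqrt(sse / CHI2_95[df])

# metabolite-style example: initial value fixed to zero, small t = 0 residue
times = [0, 3, 7, 14, 28]
obs = [0.5, 12.0, 18.0, 16.0, 9.0]
pred = [0.0, 11.5, 18.5, 16.0, 9.0]
err = chi2_error_level(times, obs, pred, n_params=2, initial_fixed_to_zero=True)
```

With `initial_fixed_to_zero=True`, the non-zero residue at time zero affects neither the sum of squares nor the degrees of freedom, which is the behaviour the FOCUS guidance asks for.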

Results for synthetic datasets
A graphical representation of the models used for the generation of the synthetic datasets is shown in Figure 1. For the evaluations of the synthetic datasets, confidence intervals for the parameter estimates from the parent-only evaluations using the SFO, FOMC, DFOP and HS models are reported in the following tables. This makes it possible to check whether the confidence intervals include the parameters that were used in the generation of the data; the latter are shown in the column "Input" in the result tables.
For the coupled fits, confidence intervals obtained with mkin are shown.
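For the coupled SFO-SFO case, the curves underlying such synthetic datasets follow from the analytic solution of the parent-metabolite system dP/dt = −kp P, dM/dt = ff kp P − km M with M(0) = 0, valid for kp ≠ km. The parameter values below are illustrative and are not the input values actually used for the synthetic datasets:

```python
import math

def sfo_sfo(t, p0, kp, km, ff):
    """Analytic parent and metabolite residues for an SFO-SFO model.

    ff is the formation fraction of the metabolite; kp != km is assumed.
    """
    parent = p0 * math.exp(-kp * t)
    metab = ff * kp * p0 / (km - kp) * (math.exp(-kp * t) - math.exp(-km * t))
    return parent, metab

# illustrative input parameters for a parent-metabolite curve
curve = [sfo_sfo(t, p0=100.0, kp=0.1, km=0.02, ff=0.5)
         for t in [0, 3, 7, 14, 28, 56, 100]]
```

Sampling such curves at the observation times and adding an error term yields a synthetic dataset whose known input parameters can then be compared against the fitted values and their confidence intervals.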