Evaluating the predictive ability of models
Yamamura K (2016) Estimation of the predictive ability of ecological models.
Communications in Statistics - Simulation and Computation 45: 2122-2144
DOI: 10.1080/03610918.2014.889161.
A preprint version of the author manuscript is available here: RD_criterion.pdf (574KB)
The final version of the article is available from the Taylor & Francis site:
http://www.tandfonline.com/doi/full/10.1080/03610918.2014.889161
R function and SAS macro for calculating RD
The R function and SAS macro for calculating RD are available from the following link page. The R function is for generalized linear models (GLMs), including normal linear models; it uses the glm function of R. In contrast, the SAS macro is for generalized linear mixed models (GLMMs), including normal linear mixed models and generalized linear models; it uses Proc GLIMMIX, MIXED, and GENMOD.
Model evaluation is more important than model selection
We can perform model selection by using conventional criteria such as AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion). However, these criteria do not indicate how good a model is; they indicate only the relative goodness of the candidate models. The best model selected by these criteria will usually still be a poor model if the data are insufficient in quality or quantity. We should therefore evaluate the goodness of the data set in both quality and quantity; in this sense, model evaluation is more important than model selection. If the data are not good enough, we should collect another set of data to improve the model. However, conventional criteria such as AIC and BIC never give us information about the goodness of the data, and hence we cannot judge whether we should collect more data to improve the model further.
Proposal of RD criterion
To solve this practical problem, I proposed a criterion, RD, that lies between 0 and 1. RD is an asymptotic estimate of the proportion of improvement in the predictive ability under a given error structure. Here the predictive ability is defined as the expected logarithmic probability of the next data set (2nd data set), evaluated at the model constructed from the current data set (1st data set). An appropriate choice of error structure is important in the calculation of RD. I illustrate the calculation of RD using a small data set on moth abundance.
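The definition of predictive ability can be illustrated numerically. The following Python sketch is not the RD calculation from the paper, and it is not the distributed R function or SAS macro. It only approximates, by simulation, the expected logarithmic probability of a 2nd data set under a model fitted to a 1st data set. The constant-mean Poisson model, all function names, and the values in the usage line (true mean 5, 20 observations per data set, 2000 replications) are assumptions made for illustration.

```python
import math
import random


def poisson_logpmf(k, mu):
    """Log probability of count k under a Poisson distribution with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def draw_poisson(mu, rng):
    """Draw one Poisson count with Knuth's multiplication method (small mu only)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def fit_mean(counts):
    """Maximum-likelihood estimate of a constant Poisson mean."""
    return sum(counts) / len(counts)


def log_prob(counts, mu):
    """Total log probability of a data set under a given mean."""
    return sum(poisson_logpmf(k, mu) for k in counts)


def predictive_ability(true_mu, n, n_rep, seed=1):
    """Monte Carlo average of log P(2nd data set | model fitted to 1st data set).

    Hypothetical setting: both data sets are simulated from the same
    Poisson distribution, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rep):
        first = [draw_poisson(true_mu, rng) for _ in range(n)]
        second = [draw_poisson(true_mu, rng) for _ in range(n)]
        # Guard against log(0) in the unlikely case that all counts are zero.
        mu_hat = max(fit_mean(first), 1e-9)
        total += log_prob(second, mu_hat)
    return total / n_rep


if __name__ == "__main__":
    print(predictive_ability(true_mu=5.0, n=20, n_rep=2000))
```

The value returned is always negative because it is a sum of log probabilities. It measures how well a model built from one data set anticipates a fresh data set, which is the quantity whose proportional improvement RD is meant to estimate.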