Modeling the Number of Research Papers Produced by Graduate Students Using Zero-Inflated Models

  • Zaihra Tasneem


The objective of this case study is to present a step-by-step approach to modeling zero-inflated, overdispersed counts, using data on the number of research papers produced by a group of biochemistry PhD students. The model can also be used to study factors associated with differences in productivity among students in the biochemistry PhD program. We fit a Zero-Inflated Negative Binomial (ZINB) regression model to predict the number of articles produced during the last three years of the PhD from the student's gender, marital status, number of children aged five or younger, and the number of articles produced by the student's PhD mentor during the last three years. The dispersion parameter is significantly different from zero, which indicates that the counts are overdispersed and that a Negative Binomial (NB) model is more appropriate than a Poisson model. Vuong's test further indicates that the zero-inflated model is a significant improvement over a standard NB model. Thus, among the models considered, the ZINB model offers the best trade-off between parsimony and goodness of fit for these data. Based on our model, we find significantly lower productivity for female students and for students with children under five, and a large positive effect of the number of publications by the mentor. The presentation is accessible to readers with an intermediate level of statistics.
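To make the two key ingredients above concrete, here is a minimal Python sketch. It is not the case study's own code. It writes out the ZINB probability mass function, a mixture of a point mass at zero (probability pi) and a negative binomial count distribution, and Vuong's statistic for comparing two non-nested models from their per-observation log-likelihoods. It assumes the common NB2 parameterisation, with variance mu + alpha * mu**2, so "dispersion significantly different from zero" corresponds to alpha > 0, and alpha -> 0 recovers the Poisson model. The function names and parameter values are hypothetical. In an actual ZINB regression, mu and pi vary from student to student through log and logit links on the covariates rather than being fixed numbers.

```python
import math
import statistics


def nb_logpmf(y: int, mu: float, alpha: float) -> float:
    """Log-probability of count y under a negative binomial with mean mu and
    variance mu + alpha * mu**2 (NB2 parameterisation; assumes alpha > 0)."""
    r = 1.0 / alpha  # NB "size" parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu))
            + y * math.log(mu / (r + mu)))


def zinb_logpmf(y: int, mu: float, alpha: float, pi: float) -> float:
    """Log-probability under a zero-inflated NB: with probability pi the count
    is a structural zero, otherwise it is drawn from NB(mu, alpha)."""
    nb = nb_logpmf(y, mu, alpha)
    if y == 0:
        # Zeros come from both the point mass and the NB component.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    return math.log(1.0 - pi) + nb


def vuong_statistic(ll_model1, ll_model2) -> float:
    """Uncorrected Vuong statistic from per-observation log-likelihoods.
    Large positive values favour model 1, large negative values favour
    model 2; compare with a standard normal (|V| > 1.96 at the 5% level)."""
    m = [a - b for a, b in zip(ll_model1, ll_model2)]
    return math.sqrt(len(m)) * statistics.mean(m) / statistics.stdev(m)


if __name__ == "__main__":
    # Toy counts and made-up parameter values, for illustration only;
    # these are NOT estimates from the case study's data.
    counts = [0, 0, 0, 0, 1, 2, 0, 3, 0, 5]
    mu, alpha, pi = 1.5, 0.8, 0.3
    ll_zinb = [zinb_logpmf(y, mu, alpha, pi) for y in counts]
    ll_nb = [nb_logpmf(y, mu, alpha) for y in counts]
    print(f"Vuong statistic (ZINB vs NB): {vuong_statistic(ll_zinb, ll_nb):.3f}")
```

In practice both models would first be fitted by maximum likelihood (for example with R's `pscl::zeroinfl`), and the fitted per-observation log-likelihoods would be passed to the Vuong statistic. The fixed values in the usage block only show how the pieces connect.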