Fig. 1

(a) Overall reliability categorization results using the Klimisch (n = 121) and the CRED (n = 104) evaluation methods. (b) Study-specific reliability categories assigned using the Klimisch and CRED evaluation methods for studies A–H. Significant differences were found for studies D and E (**p ≤ 0.01) and study G (*p ≤ 0.05) using the exact (permutation) version of the chi-square test in R [38].