Representing H1 with flexible maxima, and a comparison of the Rouder et al (2009) and Dienes (2008) calculators

A Bayes factor compares two models of the world. The important thing to know about a Bayes factor is therefore whether the two models are ones we were scientifically interested in comparing. So the question is: does the representation of H1 match the predictions of the relevant theory (the theory together with other supporting knowledge, such as effect sizes from experiments testing that theory)? For example, if based on past research we predict an effect size of P for our experiment, we would use P in specifying the representation of H1. When H1 is represented with a Normal with a mean of P (and an SD of P/2), or with a half-Normal with an SD of P (i.e. in both cases following the recommendations of Dienes, 2014), the maximum plausible effect is roughly 2*P. For example, suppose we replicate an experiment which found that increasing judges' belief in determinism rather than free will reduced the sentences given to criminals by 3 years. For our replication attempt we could then use a half-normal with SD = 3 years (the effect from the original study), giving a plausible range for the mean effect from 0 to roughly 6 years.
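
As a minimal check of the "roughly 2*P" claim in R (nothing here depends on the calculators below; it is just the tail area of the half-normal):

2 * pnorm(-6, mean = 0, sd = 3)   # P(effect > 6 years) under a half-normal with
                                  # SD = 3: ~0.046, i.e. only about 5%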

If our predicted value is P, setting the maximum plausible effect at 2*P is in general arbitrary. One has to judge whether 2*P is reasonable in the scientific context. In the case of sentencing just considered it might seem reasonable. In fact, 2*P is often a remarkably good default. Consider effect size expressed as Cohen's d: d = 0.5 is a frequent effect size in studies in psychology, and an effect size greater than d = 1 is very hard to obtain (cf Pashler, Coburn, & Harris, 2012), so a maximum of 2*P will often be sensible. But there is no reason why it should always be. Rouder et al (2009) suggest representing H1 with a Cauchy distribution (i.e. a t-distribution with 1 degree of freedom) rather than a Normal. While Rouder et al derived the Cauchy for different reasons, if the appropriate way of evaluating the representation of H1 is whether it simply and adequately represents scientific intuitions, then the Cauchy has an interesting property. Namely, if we scale it with P (i.e. set sdtheory = P in the code below) and use a half-Cauchy, then there is a 50% probability of the true population effect lying between 0 and P, but only about a 9% probability of it being greater than 7*P (and about 6% of it being greater than 10*P). So the Cauchy distribution with scale parameter P represents the scientific intuition that the likely effect size is around P and the maximum plausible effect size is roughly 10*P. So using the Cauchy allows a different maximum than that provided by the Normal, for the same predicted effect size P. In the sentencing example, the scientific claim would be that while a change in sentence of around 3 years is likely, there could plausibly be a change of up to roughly 20-30 years. Thus, having both the Normal and the Cauchy available may allow us to match scientific intuition better in different situations.
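
These half-Cauchy figures can be checked directly in R (pt with df = 1 gives the Cauchy; doubling the areas folds it at zero):

2 * (pt(1, df = 1) - 0.5)   # P(0 < effect < P) under the half-Cauchy: 0.5
2 * pt(-7, df = 1)          # P(effect > 7*P):  ~0.09
2 * pt(-10, df = 1)         # P(effect > 10*P): ~0.06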

To run the calculator, download R and copy and paste the code below into R. If the obtained mean difference was 2.0 years and the standard error 1.0 years, then for the example we have been considering (where the expected effect size is roughly 3 years), type into R: "Bf(sd = 1, obtained = 2, sdtheory = 3)". You will get an answer of 2.81. If you enter "Bf(sd = 1, obtained = 2, sdtheory = 3, dftheory = 1000)", the t-distribution has sufficiently high degrees of freedom that it behaves as a Normal, and you obtain an answer of 3.71, essentially the result from the Dienes (2008) online calculator (3.72). In the first case (B = 2.81), the representation of H1 was more spread out, the theory more vague, so the result favoured H1 less than in the second case (B = 3.71). To notate how H1 has been represented, the former Bayes factor can be labelled Bhalf-t(0,3,1), indicating that the predictions of H1 have been represented as a half t-distribution with a mode of 0, a scale of 3 years, and 1 degree of freedom.


Code for Bt with a normal likelihood (Dienes & McLatchie, 2018)

*************************************************************************

Bf <- function(sd, obtained, meanoftheory = 0, sdtheory, dftheory = 1, tail = 1)
{
        # sd:        standard error of the obtained mean
        # obtained:  obtained mean difference
        # meanoftheory, sdtheory, dftheory: mode, scale, and degrees of freedom
        #            of the t-distribution representing the predictions of H1
        # tail:      1 for a one-tailed (half-t) representation of H1;
        #            anything else for two-tailed
        area <- 0
        normarea <- 0
        # numerically integrate over theta, from -10 to +10 scale units around the mode
        theta <- meanoftheory - 10 * sdtheory
        incr <- sdtheory / 200
        for (A in -2000:2000) {
                theta <- theta + incr
                tscore <- (theta - meanoftheory) / sdtheory
                dist_theta <- dt(tscore, df = dftheory)
                if (identical(tail, 1)) {
                        # half-t: remove the mass below zero and double the rest
                        if (theta <= 0) {
                                dist_theta <- 0
                        } else {
                                dist_theta <- dist_theta * 2
                        }
                }
                # weight the normal likelihood of the data by the plausibility
                # of theta under H1
                height <- dist_theta * dnorm(obtained, theta, sd)
                area <- area + height * incr
                normarea <- normarea + dist_theta * incr
        }
        LikelihoodTheory <- area / normarea        # average likelihood under H1
        Likelihoodnull <- dnorm(obtained, 0, sd)   # likelihood under H0 (effect = 0)
        BayesFactor <- LikelihoodTheory / Likelihoodnull
        BayesFactor
}


***********************************************************************
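
Once the function has been pasted into R, the two calls from the example above give:

Bf(sd = 1, obtained = 2, sdtheory = 3)                    # half-Cauchy scaled by 3: ~2.81
Bf(sd = 1, obtained = 2, sdtheory = 3, dftheory = 1000)   # approximates the half-normal: ~3.71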

This comparison illustrates how the Dienes (2008) calculator differs from the Rouder et al (2009) calculator: the Dienes calculator (e.g. assuming a half-normal with SD = P) assumes that for a predicted effect or scaling factor of P, the maximum is about 2*P; the Rouder et al calculator assumes a plausible maximum closer to 10*P. A further difference is that the Rouder et al calculator scales with a standardized effect size (Cohen's d); the Dienes one can use any effect size (so long as the sampling distribution of the parameter is roughly normal). Typically, raw effect sizes are better for testing theories (see Dienes, 2014, 2015, for arguments; for example, a standardised regression coefficient will be sensitive to the theoretically irrelevant factor of range restriction). But what units one uses ultimately depends on the scientific problem. Putting aside the issue of what scales are best for making scientific predictions (which is a domain-specific scientific matter), Table 1 shows how, compared to the Dienes (2008) calculator, the t-distribution calculator above gives answers similar to the Rouder et al one for corresponding situations in the one-sample case. The Rouder et al calculator always favours the null more than the Normal Dienes calculator; the Dienes t-distribution calculator falls in between.


Table 1

Assume observations are normally distributed with a standard deviation of 1. Thus, SE = 1/sqrt(N), and t = obtained/SE. From these numbers, the Rouder et al (2009) Bayes factor, BR, can be calculated with the Rouder et al online calculator. The Dienes (2008) Bayes factor based on the Normal, BN, and modified to be based on the t-distribution, Bt, can also be determined (here using the correction for small numbers advised by Dienes, 2008, 2014; i.e. the SE is increased by a factor of (1 + 20/df^2)). BR has been scaled with r = 1, and thus Bt is correspondingly Bt(0,1,1) (i.e. a t-distribution with a mode of 0, a scale of 1, and 1 degree of freedom). The Bayes factor based on a normal is BN(0,1) (i.e. H1 is modelled as a normal with a mean of 0 and an SD of 1). All distributions for representing H1 are 2-tailed. All B's are reported as evidence in favour of H1 over H0. Column A is explained in the text.

N     SE     Obtained sample mean   BR      Bt(0,1,1)   BN(0,1)   A
10    0.32   0                      0.23    0.25        0.36      0.25
10    0.32   0.32                   0.33    0.39        0.49      0.40
10    0.32   0.85                   2.71    2.76        3.03      3.02
10    0.32   1.17                   10.64   11.15       14.84     11.90
10    0.32   1.60                   59.87   309.83      367.25    64.78
30    0.18   0                      0.14    0.15        0.18      0.15
30    0.18   0.18                   0.23    0.24        0.29      0.24
30    0.18   0.45                   2.22    2.87        3.66      2.38
30    0.18   0.58                   10.08   19.24       25.24     11.39
30    0.18   0.70                   54.05   198.96      280.22    58.16
100   0.10   0                      0.08    0.08        0.10      0.08
100   0.10   0.10                   0.13    0.14        0.16      0.14
100   0.10   0.27                   2.58    3.01        3.67      2.76
100   0.10   0.33                   11.40   17.70       18.57     12.21
100   0.10   0.38                   52.29   84.13       105.01    57.09
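
As a sketch of how the input columns of Table 1 are derived (assuming the one-sample case, so df = N - 1):

N   <- 10
SE  <- 1 / sqrt(N)             # 0.32
df  <- N - 1
SEc <- SE * (1 + 20 / df^2)    # corrected SE used for BN and Bt: ~0.39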


Bt(0,1,1) does not favour the null as much as BR. A further adjustment can be made to mimic BR: instead of using a Normal distribution for the likelihood (as in the Dienes 2008 calculator, but not in the Rouder et al 2009 one, which takes into account that the variance of the data is estimated), we can use the t-distribution not just to model the predictions of H1 but also as the likelihood for the data (one could notate this Bt(0,1,1), L = t). Column A shows the results. (In this case one does not adjust the SE with the correction factor; use of the t-distribution already takes into account that the variance of the data is estimated.) A combination of the Cauchy representation of H1 and the t-distribution for the likelihood produces a Bayes factor very much like the Rouder et al one. A key difference is that this calculator can be used to assess raw effect sizes.

Bt(0,P,1) is a Cauchy representation of H1 implying a likely effect around P and a maximum effect around 7 to 10*P. Bt(0,P,1.2), i.e. where dftheory = 1.2, implies a likely effect around P and a maximum of about 5*P. (Run the R command "pt(-5, df = 1.2)"; it gives the area of the t-distribution below -5, which is about 5% in this case.) Bt(0,P,2), i.e. where dftheory = 2, implies a likely effect around P and a maximum of about 3*P. Thus, the Bt calculator can be used to scale one's maximum between about 2*P and 10*P for a predicted value of P, by changing dftheory.
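
The same tail-area criterion can be checked in R for each choice of dftheory (the last line is the Normal limit, recovering the 2*P maximum):

pt(-7, df = 1)     # ~0.045: dftheory = 1 (Cauchy), maximum around 7-10*P
pt(-5, df = 1.2)   # ~0.05:  dftheory = 1.2,        maximum around 5*P
pt(-3, df = 2)     # ~0.048: dftheory = 2,          maximum around 3*P
pnorm(-2)          # ~0.023: Normal,                maximum around 2*P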


Code for Bt with a t-distributed likelihood.

*********************************************

Bf <- function(sd, obtained, dfdata, meanoftheory = 0, sdtheory = 1, dftheory = 1, tail = 2)
{
        # As above, except that the likelihood for the data is a t-distribution
        # with dfdata degrees of freedom, which takes into account that the
        # variance of the data is estimated (so no small-sample correction to
        # the SE is needed)
        area <- 0
        normarea <- 0
        theta <- meanoftheory - 10 * sdtheory
        incr <- sdtheory / 200
        for (A in -2000:2000) {
                theta <- theta + incr
                dist_theta <- dt((theta - meanoftheory) / sdtheory, df = dftheory)
                if (identical(tail, 1)) {
                        # half-t: remove the mass below zero and double the rest
                        if (theta <= 0) {
                                dist_theta <- 0
                        } else {
                                dist_theta <- dist_theta * 2
                        }
                }
                # t-distributed likelihood of the data, given population effect theta
                height <- dist_theta * dt((obtained - theta) / sd, df = dfdata)
                area <- area + height * incr
                normarea <- normarea + dist_theta * incr
        }
        LikelihoodTheory <- area / normarea
        Likelihoodnull <- dt(obtained / sd, df = dfdata)
        BayesFactor <- LikelihoodTheory / Likelihoodnull
        BayesFactor
}

*****************************************
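
For example, the third row of Column A in Table 1 (N = 10, obtained mean 0.85) should be reproduced, to rounding, by the call below (using the exact SE, 1/sqrt(10), rather than the rounded 0.32):

Bf(sd = 1/sqrt(10), obtained = 0.85, dfdata = 9, sdtheory = 1, dftheory = 1, tail = 2)
# ~3.02, close to BR = 2.71 for the same data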

REFERENCES

Baguley, T., & Kaye, W. S. (2010). Review of Understanding psychology as a science: An introduction to scientific and statistical inference. British Journal of Mathematical & Statistical Psychology, 63, 695-698.

Dienes, Z. (2008). Understanding Psychology as a Science: An Introduction to Scientific and Statistical Inference. Palgrave Macmillan.

Dienes, Z. (2014). Using Bayes to get the most out of non-significant results. Frontiers in Psychology, 5: 781. doi: 10.3389/fpsyg.2014.00781

Dienes, Z. (2015). How Bayesian statistics are needed to determine whether mental states are unconscious. In M. Overgaard (Ed.), Behavioural Methods in Consciousness Research (pp. 199-220). Oxford: Oxford University Press.

Dienes, Z., & McLatchie, N. (2018). Four reasons to prefer Bayesian over significance testing. Psychonomic Bulletin & Review, 25, 207-218. https://doi.org/10.3758/s13423-017-1266-z

Pashler, H., Coburn, N., & Harris, C. (2012). Priming of social distance? Failure to replicate effects on social and food judgements. PLoS ONE, 7(8), e42510.

Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16, 225-237.