This page will give you the means for performing simple Bayesian analyses. You should read Chapter Four of Understanding psychology as a science for background. Chapter Four provides definitions, explanations and details. This page can be used by students, course instructors, and researchers to get the most out of their data. (And put difficult journal reviewers in their place! By how much should your confidence really change in one theory rather than another?)
To test your intuitions concerning Bayesian versus Orthodox statistics try this QUIZ.
For more explication: Dienes, Z. (2011). Bayesian versus Orthodox statistics: Which side are you on? Perspectives on Psychological Science, 6(3), 274-290.
__________________________________________________________________________________________________________________________
The Bayes factor tells you how strongly data support one theory (e.g. your pet scientific theory under test) over another (e.g. the null hypothesis). It is a simple, intuitive way of performing the Bayesian equivalent of significance testing, giving you the sort of answer that many people mistakenly think they obtain from significance testing but cannot. A "null" result in significance testing, for example, does not automatically mean you should reduce your confidence in the theory under test; often you should actually increase your confidence. A non-significant p-value does not tell you whether you have evidence for the null or no evidence for any conclusion at all (or indeed evidence against the null). Yet people routinely take a non-significant result as indicating they should reduce their confidence in a theory that predicts a difference.
The Bayes factor needs two types of input: 1) a summary of the data and 2) a specification of what the theories predict.
1) In a situation where you could do a t-test, the data summary is exactly the same as would be used for a t-test:
a) the sample mean difference between conditions, or between a mean and a baseline, call this meandiff; and
b) the standard error of the difference, call this SE.
Note that t = meandiff/SE. Thus, if you know t and the mean difference between conditions, you can get the relevant SE from SE = meandiff/t. This applies for any type of t-test.
The calculator will ask you for sample meandiff and SE. (Note more generally: a) could be any sample statistic, such as a median, and b) is its standard error.)
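As a small illustration of recovering the standard error from a reported t value, here is a minimal Python sketch. The function name and the example numbers are hypothetical and are not part of the calculator.

```python
def standard_error_from_t(meandiff: float, t: float) -> float:
    """Recover the standard error of a difference from a reported t statistic.

    Uses SE = meandiff / t. The absolute value keeps the SE positive when the
    difference (and hence t) is negative.
    """
    if t == 0:
        raise ValueError("t must be non-zero to recover the standard error")
    return abs(meandiff / t)


# Hypothetical example: a mean difference of 5 units reported with t = 2.5.
se = standard_error_from_t(meandiff=5.0, t=2.5)
print(se)  # 2.0
```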
2) What are the predictions of the theories? This Bayes factor calculator compares a theory to the null hypothesis. In the calculator, the null hypothesis is taken to predict a population value of zero (e.g. a population difference in means between conditions of zero). To assess the evidence for the theory over the null, one must know what the theory predicts. This is the difficult part of using Bayes. But without knowing what a theory predicts, it is impossible to evaluate it. The calculator gives you some flexibility in specifying the predictions of any theory. To specify the predictions, you need to represent how plausible different possible population mean differences are according to the theory. You can approximate the plot of plausibility against different possible population mean differences by i) a uniform distribution; ii) a half normal; or iii) a normal. The calculator will ask you to pick one of these representations. We consider each in turn.
i) The uniform. This indicates that all possible population mean differences between a lower limit and upper limit are equally plausible as far as the theory is concerned, and all values outside those limits are ruled out. See the figure below:

If the theory is that children will show implicit memory for words played when they are under general anaesthetic, what is the range of population mean differences we could reasonably expect? Lopez et al (2009, British Journal of Anaesthesia) tested their priming paradigm on awake children and found a priming effect of 6% above a chance baseline. We know from adults that priming under anaesthesia happens but is considerably less than when awake. Thus 6% is the very upper limit of what could be expected for children under anaesthesia. Thus, the prediction that implicit memory can occur for material presented to children under anaesthesia could be represented as a uniform with a lower limit of 0 and an upper limit of 6%. (We could make this more precise, as 6% is rather a high expectation. As adults do show priming when under anaesthesia, we could run the Lopez et al paradigm on adults when awake and when under anaesthesia. If their priming when awake is W and when under is U, then we might reasonably expect children to reduce their priming in the same proportion when under general anaesthesia as compared to awake, i.e. to reduce the 6% when awake to a fraction U/W of 6% when under anaesthesia.)
The lower limit can often be set to zero, but when there is a minimal value below which the effect is too small for the theory to be either true or interesting, that can form the lower limit. If there is no theoretical constraint on the upper limit, but the scale has a natural upper limit (e.g. on a rating scale from 0 to 7, a difference between conditions cannot exceed 7), then that can be specified as the upper limit, if you really have no grounds for otherwise limiting it.
ii) The half normal. Often it is unreasonable to expect all values to be equally plausible. In the implicit memory example above, smaller effects are more likely than larger effects. Iselin-Chaves et al (2005, Anesthesiology) found a priming effect in adults of 3% and children will not have automated word recognition as much as adults. Thus we can represent the plausibility as a half normal with a mode of zero:
Now you need to specify the standard deviation, SD, of the distribution. The height of a normal comes close to zero by about two standard deviations from the mean. Thus if we let the SD equal 3%, we have effectively represented plausible values as lying between 0 and 6%, with smaller values more likely.
iii) The normal. Relevantly similar past research may lead you to expect one possible population mean difference to be most likely, and population mean differences more or less than that value increasingly unlikely. Such predictions can be represented as a normal.

In this case, specify which value is most likely (the mean of the normal), and the SD of the normal. You can obtain the SD by thinking what range of values is plausible. If a range of 0 to 6 is plausible, and 3 is the most likely value based on past research, specify the mean as 3 and the SD as 1.5. (Why 1.5? Because 2 times 1.5 is 3. The mean plus two standard deviations is 3 + 2*1.5 = 6 and the mean minus two standard deviations is 3 - 2*1.5 = 0. So the plausible range is 0 to 6.)
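The "two standard deviations" rule of thumb used above can be written out as a short Python sketch. The helper names are hypothetical. The normal case assumes the most likely value sits at the midpoint of the plausible range, as in the 0 to 6 example.

```python
def normal_prior_from_range(lower: float, upper: float) -> tuple[float, float]:
    """Mean and SD of a normal whose mean +/- 2 SD spans the plausible range.

    Assumes the most likely value is the midpoint of the range.
    """
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    mean = (lower + upper) / 2.0
    sd = (upper - lower) / 4.0
    return mean, sd


def half_normal_sd_from_upper(upper: float) -> float:
    """SD of a half normal (mode zero) whose plausible values run from 0 to upper."""
    if upper <= 0:
        raise ValueError("upper must be positive")
    return upper / 2.0


print(normal_prior_from_range(0.0, 6.0))  # (3.0, 1.5)
print(half_normal_sd_from_upper(6.0))     # 3.0
```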
In short, to use the program to calculate a Bayes factor for your data, you need to enter your sample meandiff (which the calculator just calls "sample mean") and the standard error. You also need to decide what your theory actually predicts. The calculator calls the plot of different plausibilities "p(population effect|theory)", and asks if this is uniform. If you say 'yes' it asks for a lower and upper limit. If you say 'no' it asks for the mean and SD of the normal and how many tails it has. For a half-normal say 1 tail, set the mean to zero, and then you are left with the SD, which you need to decide based on your theory and past research. For a normal say 2 tails, and enter the mean and SD as determined by the theory and past research.
A Bayes factor of 3 or more can be taken as substantial evidence for your theory (and against the null) and of 1/3 or less as evidence for the null (and against your theory). Bayes factors between 1/3 and 3 show the data do not provide much evidence to distinguish your theory from the null.
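For readers who want to see the arithmetic behind such a calculation, here is a minimal Python sketch. It is not the code of the calculator on this page. It assumes that the sample mean difference is normally distributed around the population value with standard deviation SE, and it averages that likelihood over p(population effect|theory) by simple numerical integration. All function names, the integration settings and the example numbers are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Height of a normal curve with the given mean and SD at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# Each prior is a (density, lower, upper) triple: the plausibility curve
# p(population effect | theory) and the range over which to integrate it.
def uniform_prior(lower, upper):
    height = 1.0 / (upper - lower)
    return (lambda theta: height), lower, upper


def half_normal_prior(sd):
    # Mode at zero, positive effects only; doubled so the area is 1.
    return (lambda theta: 2.0 * normal_pdf(theta, 0.0, sd)), 0.0, 8.0 * sd


def normal_prior(mean, sd):
    # Integrate over mean +/- 8 SD, which covers essentially all of the prior.
    return (lambda theta: normal_pdf(theta, mean, sd)), mean - 8.0 * sd, mean + 8.0 * sd


def bayes_factor(meandiff, se, prior, steps=20000):
    """Bayes factor for the theory over the null (population value of zero)."""
    density, lower, upper = prior
    width = (upper - lower) / steps
    like_theory = 0.0
    for i in range(steps):  # midpoint rule
        theta = lower + (i + 0.5) * width
        like_theory += density(theta) * normal_pdf(meandiff, theta, se) * width
    like_null = normal_pdf(meandiff, 0.0, se)
    return like_theory / like_null


def interpret(bf):
    """Apply the 3 and 1/3 conventions described above."""
    if bf >= 3.0:
        return "substantial evidence for the theory over the null"
    if bf <= 1.0 / 3.0:
        return "evidence for the null over the theory"
    return "data do not distinguish the theory from the null"


# Hypothetical data: sample meandiff of 4 with SE of 1.5 (in % priming).
for name, prior in [("uniform 0 to 6", uniform_prior(0.0, 6.0)),
                    ("half normal, SD 3", half_normal_prior(3.0)),
                    ("normal, mean 3, SD 1.5", normal_prior(3.0, 1.5))]:
    bf = bayes_factor(4.0, 1.5, prior)
    print(f"{name}: B = {bf:.2f} ({interpret(bf)})")
```

The three priors correspond to the uniform, half normal and normal representations described above, so the same data can be run against each to see how the specification of the theory's predictions affects the Bayes factor.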
For more advice see Dienes, Z. (2011). Bayesian versus Orthodox statistics: Which side are you on? Perspectives on Psychological Science, 6(3), 274-290.
Notes:
Now click here to calculate your Bayes factor!
For those who use Matlab, here is Matlab code for calculating the Bayes factor in the same way as the Flash program above. Baguley and Kaye (2010) provide equivalent R code.
As well as, or instead of, a Bayes factor, it is usually useful to determine what the most plausible set of population mean differences is, given your data and other constraints. Please read Chapter Four to understand the following.
Assume you can represent your prior by a normal distribution (without grave misrepresentation) and also that your data are normal. In determining your prior it is often useful to make use of Normal tables. This Java applet will tell you areas under the normal curve. For example, consider a study in which either an advert or no advert is presented and liking ratings on a 10 point scale for a product are later taken. You think liking is more likely to be higher after the advert. In fact, all things considered, you think the most likely population mean difference in that direction (call it the positive direction) is a difference of 1 rating point. The mean of the prior is 1. You think there is a 30% probability that the mean population effect could go the other way (i.e. that the advert decreases liking on average). In the first calculator enter a mean of 1, click 'below', specify below '0'. Then change the standard deviation until the area below 0 is .30. This is the standard deviation of your prior. In this case, you should find the standard deviation to be 1.9. (See Chapter Four for further discussion and checks you can perform in order to be happy with the specification of your prior distribution.) Remember in forming your prior, the more uncertain you are, the more you should spread the prior out to reflect that uncertainty ("when in doubt, spread it out"). If you felt maximally uncertain, or wished to allow the data to completely determine your posterior, choose a flat prior, i.e. one with equal height between -10 and +10 units (the maximum changes allowed by the scale in this case). A flat prior effectively is the same as a normal with an infinite standard deviation. In this case, the posterior is determined completely by the likelihood. 
Bear in mind such a flat prior implies that if someone told you that they ran three subjects and the mean difference was 10 units, you would take a difference of 10 units, the maximum conceivable, as the most probable population value, because prior to the data you had 'come to doubt all that you once held true, you stood alone without belief' (Paul Simon). If this seems unreasonable you can set prior constraints. Make sure you have reasons for how you set constraints on your prior that you could defend if critically cross examined.
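If you prefer to compute the prior's standard deviation directly rather than adjusting it by hand in the applet, the advert example can be sketched in Python with the standard library's `statistics.NormalDist`. The function name is hypothetical. The sketch simply solves for the SD that puts the stated probability below the cutoff.

```python
from statistics import NormalDist


def prior_sd_from_tail(mean: float, cutoff: float, prob_below: float) -> float:
    """SD of a normal prior with the given mean and P(value < cutoff) = prob_below."""
    if not 0.0 < prob_below < 1.0:
        raise ValueError("prob_below must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(prob_below)  # standard normal quantile
    if z == 0.0:
        raise ValueError("prob_below of 0.5 only fits when cutoff equals the mean")
    sd = (cutoff - mean) / z
    if sd <= 0.0:
        raise ValueError("cutoff and prob_below are inconsistent with this mean")
    return sd


# Advert example: prior mean of 1 rating point, 30% probability the effect is below 0.
sd = prior_sd_from_tail(mean=1.0, cutoff=0.0, prob_below=0.30)
print(round(sd, 1))  # 1.9
```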
Once you have determined the mean and standard deviation of your prior, collected data and hence found the mean and standard deviation of your likelihood, use this flash program to determine the mean and standard deviation of your posterior and look at graphs of the prior, likelihood and posterior distributions.
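For a normal prior and a normal likelihood, the posterior is also normal and can be found with the standard precision-weighted formulas. The Python sketch below shows that textbook update. I am assuming, rather than asserting, that it matches what the program computes. The function name and the example numbers are hypothetical.

```python
import math


def normal_posterior(prior_mean, prior_sd, like_mean, like_sd):
    """Posterior mean and SD from a normal prior and a normal likelihood.

    like_mean is the sample mean difference and like_sd its standard error.
    Precision is 1 / SD^2; the precisions add, and the posterior mean is the
    precision-weighted average of the prior mean and the sample mean.
    """
    prior_precision = 1.0 / prior_sd ** 2
    like_precision = 1.0 / like_sd ** 2
    post_precision = prior_precision + like_precision
    post_mean = (prior_precision * prior_mean + like_precision * like_mean) / post_precision
    post_sd = math.sqrt(1.0 / post_precision)
    return post_mean, post_sd


# Hypothetical example: the advert prior (mean 1, SD 1.9) combined with
# data giving a sample mean difference of 2 with a standard error of 1.
post_mean, post_sd = normal_posterior(1.0, 1.9, 2.0, 1.0)
print(f"posterior mean = {post_mean:.2f}, posterior SD = {post_sd:.2f}")
```

Note that as the prior SD grows very large (a nearly flat prior), the posterior mean and SD approach those of the likelihood.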
Once you have found your posterior distribution you can perform something similar to significance testing. Imagine you are testing a theory that predicts a reaction time difference. Based on your knowledge of the literature you believe this sort of effect should be about 20 ms in magnitude, roughly speaking, but you cannot be sure of direction. In any case, an effect less than 5 ms in magnitude would be so small as to render the theory practically false or irrelevant. We could take a null hypothesis to be the claim that the population effect falls in an interval 5 ms around zero. If your posterior had a mean of 15 ms and a standard deviation of 8 ms, what is the probability of the null? In the first calculator enter a mean of 15, a standard deviation of 8, click 'between', specify between -5 and +5. The calculator tells you that the probability of the null in the light of the data is 0.10. You could compare this probability with the probability of the null given by your prior. Alternatively, if your theory predicted an effect in a certain direction, you could find the posterior probability that the effect is less than zero by clicking on 'below' and specifying '0' (or '5', if 5 is the minimally interesting effect). The middle 95% of the distribution provides a 95% credibility interval, which can be used to assess hypotheses as well. If the middle 95% of the distribution is contained within the 5 ms around zero, the null could be accepted and the theory rejected with 95% confidence. Likewise, if the middle 95% of the distribution lay outside of this interval, the null could be rejected (see Kruschke, 2011). This latter method of hypothesis testing usually requires a lot of data for clear answers. The Bayes factor can give reasonable answers with less data because it takes into account the full range of predictions of a theory, not just the minimal interesting value, if such a value even exists.
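The reaction time example can be checked without the applet. The Python sketch below uses the standard library's `statistics.NormalDist` to find the posterior probability of the null interval and the middle 95% of the posterior. The helper names are hypothetical.

```python
from statistics import NormalDist


def prob_between(mean, sd, lower, upper):
    """Posterior probability that the population effect lies between lower and upper."""
    posterior = NormalDist(mu=mean, sigma=sd)
    return posterior.cdf(upper) - posterior.cdf(lower)


def credibility_interval(mean, sd, level=0.95):
    """Limits of the middle `level` proportion of a normal posterior."""
    posterior = NormalDist(mu=mean, sigma=sd)
    tail = (1.0 - level) / 2.0
    return posterior.inv_cdf(tail), posterior.inv_cdf(1.0 - tail)


# Posterior with a mean of 15 ms and an SD of 8 ms; null interval of -5 to +5 ms.
p_null = prob_between(15.0, 8.0, -5.0, 5.0)
low, high = credibility_interval(15.0, 8.0)
print(f"P(null) = {p_null:.2f}")                  # P(null) = 0.10
print(f"95% interval: {low:.2f} to {high:.2f}")   # 95% interval: -0.68 to 30.68
```

Here the 95% credibility interval neither falls inside the null interval nor lies wholly outside it, so on the interval method the data would not yet license accepting or rejecting the null.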
_____________________________________________________________________________________________________________________________
The above programs are to enable use of Bayesian inference in simple situations. For more advanced researchers, try the free downloadable "WinBUGS" Bayesian software for more complex statistical models. Here is another site providing a Bayes factor calculator, which constrains p(population value|theory) with the information that psychologists rarely investigate effects where the mean difference is more than one standard deviation (see Rouder et al, 2009). Note their calculator provides Bayes factors in favour of the null over the alternative hypothesis (whereas the calculator I provide is for the alternative over the null) and all predictions based on the alternative are two-tailed. For this calculator you can use as your prior the sort of effect size (Cohen's d) you think is likely on your theory (e.g. whether you think the sort of mean difference you expect is about one within-group standard deviation, i.e. Cohen's d = 1, or half a standard deviation, i.e. Cohen's d = 0.5, etc). For potential problems comparing across studies with standardised effect sizes see Baguley (2009).
For tutors: A lecture on Bayes and an alternative version I gave to our undergraduates this year as an introduction to Chapter Four; please feel free to adapt for your own purposes. I ask students to discuss each question with the person sitting next to them. The lecture lasts about an hour.
An essay I set students is: "Perform a Bayesian analysis on a part of the data from your project or from a paper published this year (consider an interesting question tested by a t-test – one test will do). Compare and contrast the conclusions from your analysis with those that follow from an analysis using Neyman-Pearson (classic) statistics."
See also this assessment of several topics from the book.
IN PUBLICATIONS: For examples of Bayes factors used in publications, and for possible models of how to write them up, see these submitted papers: Dienes, Baddeley and Jansari on page 17 and footnote 2, and Guo et al, on pages 12-13.
Thanks to Online Experiments for providing the Flash conversions of my Matlab. Use Online Experiments for convenient programming of all your experiments!