The Sampling Distribution of Means Game

Instructions

A "Begin" button will appear at the bottom of this file when the applet is finished loading. This may take a minute or two depending on the speed of your internet connection and computer. Please be patient.

This Java applet lets you explore various aspects of sampling distributions. When the applet begins, a histogram of a normal distribution is displayed at the top of the screen.

The distribution portrayed at the top of the screen is the population from which samples are taken. The mean of the distribution is indicated by a small blue line and the median is indicated by a small purple line. Since the mean and median are the same, the two lines overlap. The red line extends from the mean one standard deviation in each direction. Note the correspondence between the colors used on the histogram and the statistics displayed to the left of the histogram.

The second histogram displays the sample data. This histogram is initially blank. The third and fourth histograms show the distribution of statistics computed from the sample data. The number of samples (replications) that the third and fourth histograms are based on is indicated by the label "Reps=."

Basic operations
The simulation is set to initially sample five numbers from the population, compute the mean of the five numbers, and plot the mean. Click the "Animated sample" button and you will see the five numbers appear in the histogram. The mean of the five numbers will be computed and the mean will be plotted in the third histogram. Do this several times to see the distribution of means begin to be formed. Once you see how this works, you can speed things up by taking 5, 1,000, or 10,000 samples at a time.
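
The same procedure can be sketched outside the applet. The Python below is only an illustration, not the applet's code: the function name sample_means and the population mean of 16 and standard deviation of 5 are assumptions chosen for the example (the applet shows its own values beside the top histogram).

```python
import random
import statistics

def sample_means(population_mean, population_sd, sample_size, reps, seed=0):
    """Draw `reps` samples of `sample_size` normal scores and return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(population_mean, population_sd) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means

# Assumed population parameters, for illustration only.
means = sample_means(population_mean=16.0, population_sd=5.0, sample_size=5, reps=10_000)
print("Reps =", len(means))
print("mean of the sample means:", round(statistics.fmean(means), 2))
print("sd of the sample means:  ", round(statistics.pstdev(means), 2))
```

A histogram of the returned list plays the role of the applet's third histogram.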

Choosing a statistic
The following statistics can be computed from the samples by choosing from the pop-up menu:

Mean
Standard deviation of the sample (N is used in the denominator)
Variance of the sample (N is used in the denominator)
Unbiased estimate of variance (N-1 is used in denominator)
Mean absolute value of the deviation from the mean
Range
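
To make the definitions above concrete, here is a minimal Python sketch of the statistics whose denominators are easy to confuse. The function names and the example scores are assumptions made for illustration; they are not taken from the applet.

```python
import statistics

def sample_sd(scores):
    """Standard deviation of the sample (N in the denominator)."""
    return statistics.pstdev(scores)

def sample_variance(scores):
    """Variance of the sample (N in the denominator)."""
    return statistics.pvariance(scores)

def unbiased_variance(scores):
    """Unbiased estimate of variance (N-1 in the denominator)."""
    return statistics.variance(scores)

def mean_abs_deviation(scores):
    """Mean absolute value of the deviation from the mean."""
    m = statistics.fmean(scores)
    return statistics.fmean([abs(x - m) for x in scores])

def sample_range(scores):
    """Largest score minus smallest score."""
    return max(scores) - min(scores)

# Hypothetical sample of eight scores.
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
for f in (statistics.fmean, sample_sd, sample_variance,
          unbiased_variance, mean_abs_deviation, sample_range):
    print(f.__name__, f(scores))
```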

Selecting a sample size
The size of each sample can be set to 2, 5, 10, 16, 20 or 25 from the pop-up menu. Be sure not to confuse sample size with number of samples.

Comparison to a normal distribution
By clicking the "Fit normal" button you can see a normal distribution superimposed over the simulated sampling distribution.

Changing the population distribution
You can change the population by clicking on the top histogram with the mouse and dragging.

Exercises

Understanding sampling distributions
1. Click the "Animated sample" button. Five scores from a normal distribution will be sampled and plotted in a histogram. The mean of the sample will be computed and plotted in a second histogram. Repeat this 3 or 4 times or until you understand how the "Distribution of Means" is created. The red line extends from the mean one standard deviation in each direction. The colored vertical bars on the X-axis correspond to the statistic of the same color.

2. Click the "5 samples" button to draw 5 samples of 5 scores each. The five means will be plotted. Click the "500 samples" and/or "2000 samples" button until the distribution of means has stabilized. The sampling distribution of the mean is the distribution that is approached as the number of samples approaches infinity. With 5,000 to 10,000 samples you get a pretty good approximation.

3. The distribution plotted in (2) above is the sampling distribution of the mean of a sample size of 5. Approximate the sampling distribution of the mean for other sample sizes.

4. Any statistic you can compute in a sample has a sampling distribution. Approximate the sampling distribution of other statistics. The statistics available to compute are:

Mean
Median
Standard deviation (sd) (Using N in the denominator)
Variance (Using N in the denominator)
Mean absolute deviation from the mean (MAD)
Range

Understanding the Standard error
1. The standard error is the standard deviation of the sampling distribution. Approximate the sampling distribution of the mean for N=5. The standard deviation of the distribution is the standard error of the mean. Find the standard error of the mean and the standard error of the range for N=10 using the normal distribution.

2. Determine how the standard error is affected by sample size. Plot the standard error of the mean as a function of sample size for different standard deviations. Can you discover a formula relating the standard error of the mean to the sample size and the standard deviation? If so, see if it holds for distributions other than the normal distribution.

3. Redo #2 above for the median.
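
If you want to check your applet results for these exercises, the standard error can also be approximated in a few lines of Python. This is a rough sketch under stated assumptions: the function name standard_error, the 5,000 replications, and the normal population with mean 0 and standard deviation 1 are illustrative choices, not values from the applet.

```python
import random
import statistics

def standard_error(statistic, sample_size, reps=5000, mean=0.0, sd=1.0, seed=1):
    """Approximate a statistic's standard error as the sd of its simulated values."""
    rng = random.Random(seed)
    values = [
        statistic([rng.gauss(mean, sd) for _ in range(sample_size)])
        for _ in range(reps)
    ]
    return statistics.pstdev(values)

# Tabulate the standard error of the mean and of the median by sample size.
for n in (2, 5, 10, 16, 20, 25):
    se_mean = standard_error(statistics.fmean, n)
    se_median = standard_error(statistics.median, n)
    print(n, round(se_mean, 3), round(se_median, 3))
```

Rerunning with a different sd argument gives the data needed to plot standard error against sample size for several standard deviations.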

Understanding Bias
1. A statistic is unbiased if the mean of the sampling distribution of the statistic equals the parameter it estimates. Test to see if the sample mean is an unbiased estimate of the population mean. Try out different sample sizes and distributions.

2. Find a distribution/sample size combination for which the sample median is a biased estimate of the population median.

3. Is the sample variance an unbiased estimate of the population variance? If not, see if you can find a correction based on sample size. Does the correction hold for distributions other than the normal distribution?

4. For what statistic is the mean of the sampling distribution dependent on sample size?
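
The bias check in exercise 3 can be reproduced with a short simulation. The Python below is a sketch, not the applet's method: the helper mean_of_statistic, the 5,000 replications, and the standard normal population (variance 1.0) are assumptions made for the example.

```python
import random
import statistics

def mean_of_statistic(statistic, sample_size, reps=5000, seed=2):
    """Average a statistic over many samples from a standard normal population."""
    rng = random.Random(seed)
    values = [
        statistic([rng.gauss(0.0, 1.0) for _ in range(sample_size)])
        for _ in range(reps)
    ]
    return statistics.fmean(values)

# The population variance is 1.0, so an unbiased statistic should average near 1.0.
for n in (2, 5, 10, 25):
    with_n = mean_of_statistic(statistics.pvariance, n)         # N in the denominator
    with_n_minus_1 = mean_of_statistic(statistics.variance, n)  # N-1 in the denominator
    print(n, round(with_n, 3), round(with_n_minus_1, 3))
```

Comparing the two printed columns across sample sizes shows which version drifts away from the population variance and how that drift depends on N.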

Understanding Efficiency
1. For a normal distribution, compare the size of the standard error of the median and the standard error of the mean. Find a relationship that holds (approximately) across sample sizes.

2. Does this relationship hold for a uniform distribution?

3. Find a distribution for which the standard error of the median is smaller than the standard error of the mean. (You may find this difficult, but don't give up.)

4. Compare the standard error of the standard deviation and the standard error of the mean absolute deviation from the mean (MAD). Does the relationship depend on the distribution?
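
One way to explore exercises 1 to 3 numerically is to compute the ratio of the two standard errors for several populations. The Python below is a minimal sketch; the function se_ratio, the draw callables, and the two populations shown are assumptions for illustration, and the applet's own distributions may differ.

```python
import random
import statistics

def se_ratio(draw, sample_size, reps=5000, seed=3):
    """Return SE(median) / SE(mean) for samples whose scores come from draw(rng)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pstdev(medians) / statistics.pstdev(means)

# Each entry maps a label to a function that draws one score from that population.
populations = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "uniform": lambda rng: rng.uniform(0.0, 1.0),
}
for name, draw in populations.items():
    for n in (5, 25):
        print(name, n, round(se_ratio(draw, n), 2))
```

A ratio above 1 means the mean is the more efficient statistic for that population; to attempt exercise 3, add your own draw function to the dictionary and look for a ratio below 1.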

Understanding the Central Limit Theorem
1. The central limit theorem states that the sampling distribution of the mean approaches a normal distribution as the sample size increases. Sample from the uniform distribution and determine how large a sample size is needed for the distribution to be a very close approximation of the normal distribution.

2. Do the same thing sampling from the skewed distribution.

3. Determine whether the sampling distribution of the median approaches a normal distribution as sample size increases.
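
Judging "close to normal" by eye can be supplemented with a numeric measure. The Python sketch below tracks the skewness of the simulated distribution of means, which is 0 for a normal distribution. It assumes an exponential population as a stand-in for the applet's skewed distribution; the function names and the 10,000 replications are also illustrative choices.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score; 0 for a symmetric distribution such as the normal."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def skewness_of_means(draw, sample_size, reps=10_000, seed=4):
    """Skewness of the simulated sampling distribution of the mean."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([draw(rng) for _ in range(sample_size)])
        for _ in range(reps)
    ]
    return skewness(means)

# Assumed skewed population: exponential with rate 1 (not the applet's distribution).
def draw_skewed(rng):
    return rng.expovariate(1.0)

for n in (2, 5, 10, 25):
    print(n, round(skewness_of_means(draw_skewed, n), 2))
```

Passing statistics.median in place of statistics.fmean inside skewness_of_means gives a starting point for exercise 3. Skewness alone cannot detect departures from normality in a symmetric population such as the uniform, so it is only a partial check.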

Source: http://www.ruf.rice.edu/~lane/rvls.html

It is a wonderful source for statistical material of this kind.